In this episode, our hero (that’s you!) will be learning a tool that is helpful in virtually any kind of development you will wind up working in: Version control. Version control is rarely taught in school. It can be somewhat a deep topic, but we’ll go over some basics of how you see it in the open source world, which can carry over significantly to the commercial world.
1. what’s version control?
Version control is like the double-entry bookkeeping technique of saving all of your hard work. If you recall on the last lesson, there was an off-the-cuff statement about how your computer is just a fancy interface for working with files. Version control typically excels at handling text files. This blog post is a text file. HTML is text files. The original file that the blog post is generated from is a text file. SVG images are also text files. There’s also non-text files. Generally we call these binary files, but really we could also call them shit-files. That’s a completely objective industry term and not me showing any bias at all. Honest.
Double entry bookkeeping is something you might have had some exposure to in grade school, where you have to write down every credit and debit that hits your account, and there will be a corresponding debit or credit on some other account. It’s a ruthlessly simple model for handling money. We’re going to use a ruthlessly simple model for handling changes in text files. Version control systems can handle binary files too, but things are more eww there, so we’re just going to skip over that for now.
Like the bookkeeping stuff, version control is expressed as just a series of changes. Sure, you can write down what the final amount is somewhere in your checking account, but the final amount is always some composition of all the credits and debits since the dawn of time against your account. In the same way any given text file can be seen as the composition of changes happening over time to a text file.
2. why?
There’s a lot of benefits that fall out of version control, but perhaps a primary driver for it is it allows us to capture a state in time that your source code was in. As of version x in the software, some things might be known such as features, bugs, and other behaviors. If we need to, we can go back to that point. We can make fixes against software that we released a while back.
Making fixes against older versions of the software can be helpful if we’re working on some shiny new feature, but we found out some critical bug in our software that our uses are on right now. Caught with our proverbial software pants down, we don’t want to have to discard our current work nor do we want to inflict some half-complete feature upon our users. Instead we can get in our version-control-time-machine and assume a prior state of the code base, make the fix, and roll it out. Once we’re done we can switch back to working on our shiny new feature.
At its most basic form, version control can serve as a super efficient text backup system. It keeps a history of all of your changes made in little slices at a time, and it stores these changes are just a series of changes (add this line here, remove this other one, etc). Even if you aren’t planning on doing releases/rollbacks, or don’t want to share code with others, you need to be on version control. And believe me, if you want to find work out there, it will be a boon in a job interview for your first position to be able to say “yeah I use this”. Otherwise they’ll have to teach it to you, and while that’s all find and good, this is a fundamental skill we expect software engineers in the industry to know.
3. git
There are lots of version control systems out there, but git
is kind of the
gold standard. Open source, free, and serverless, git
can be very powerful
(thus very complicated) and an easy defacto standard for version control
software.
If you’re on MacOS, you might have git
installed already. That’s okay but
let’s make sure we’re on the latest version with this:
$ brew install git
While waiting for that to install, let’s go over some git
trivia. git
was
a tool born of necessity as part of working on the Linux kernel. Torvalds
stated he named it “git” (roughly meaning idiot) because he needed to be
reminded that he was an idiot, or something. git
transformed an insanely
vast open source project into something more manageable. git
is
“distributed”, meaning there’s no central server that you must use. This
allows for arbitrary organization of your repository. Repository is the word
we give to a single code base, but repository doesn’t imply code since it
works with just text. Sometimes we call a repository a “repo” for short.
This arbitrary organization allowed the Linux kernel to be divided up in a more organic fashion. Linus Torvalds could assign a kind of trust to “generals” in his open source army, who in turn could have lieutenants. This means there’s a kind of flow to the work being done. Lieutenants contribute to various changes in the code base, which get reviewed, corrected, and vetted by the generals. The generals then present the changes to Linus directly. This is really helpful when you maintain a complex operating system kernel on largely your spare time.
4. creating your first repo
From working on our prior cat-hatred
directory, we already have a code base
ready to become a repository.
$ cd ~/dev/cat-hatred
From here, we can do our initialization command.
$ git init Initialized empty Git repository in /Users/me/dev/cat-hatred/.git/
That’s it for making the repo! Kind of.
5. ignoring files
We’re not going to add everything to the repository.
5.1. generated files
Generally we want to not add generated files to our version control. This is because generated files can get large, unwieldy, and they are generally derived from some other authoritative file. This means pretty much anyone can regenerate those files at any time if we needed them. We want to discourage editing of generated files because when you run the generator again, and now you can see there’s a change, who is correct in that change? It’s hard to provide a concretely correct answer there, so best to opt out of that risk in the first place. If we need the generated files changed, we should assume the generator is always safe to run and that the source file that produces the generated file will somehow reflect the needed changes.
Since we’re in a Node project, the node_modules
directory fits this bill
perfectly of a series of generated files. The entire directory is managed by
our package manager (for us, yarn
). We want to ignore node_modules
. git
conventionally uses a .gitignore
file to list file/directory names and
patterns in order to ensure those files aren’t “checked in”. “Checking in” is
something we do with our files. I don’t recall if git
specifically uses that
terminology but I’m a little old and some the vocabulary from older systems
will bleed over there. Rest assured there are people older than me in software
engineering probably.
We can create our .gitignore
file and add node_modules
to it in one
stroke:
$ echo "node_modules" > .gitignore
5.2. editor specific files
Depending on your editor choice, this may not be the complete story. Editors
love to store their project+editor unique settings in a given repository. It
would be easy to find out what those files are named and also add them to
.gitignore
but I urge you not to do that. There are a lot of editors out
there, and you don’t want to inflict your holy editor upon others (just like
you wouldn’t want their garbage editors inflicted upon you). In this case
“others” can also be your future self. For example, I’m going strong on Emacs
right now, but I can quickly recall using seven other editors I’ve used
before that, each with its own config files. It’s your responsibility to
avoid adding unnecessary clutter to the repository.
With git
you can add a global ignore file like so:
$ git config --global core.excludesfile '~/.gitignore'
Find out what your editor’s temporary and hidden files are and add them to
this global git ignore file. If you change editors later, simply add to the
file. You can even add ignore settings for your operating system as well
(think Thumbs.db
from Windows and .DS_Store
on MacOS).
While digging around for examples I found this excellent gitignore repository
where you can just lift settings for your editor and/or operating system. You
can copy across multiple files, just separate them with lines. You can add
comments starting with the #
character to document sections if you like.
6. adding your files
Make sure if you’re jumping around that you’ve handled everything in the
5 section. Now we’re going to add files to this repository.
First, we can look to see what files are available with the git status
command.
$ git status On branch master No commits yet Untracked files: (use "git add <file>..." to include in what will be committed) accept-test.sh .gitignore package.json server.js yarn.lock nothing added to commit but untracked files present (use "git add" to track)
Your version may be more colorful than my post’s, and that’s okay. Generally untracked and modified files are displayed in red. Files that are “staged” for commit are shown in green. I realized red/green color blindness is very common, and I think these colors can be configured, but I haven’t looked into that yet.
To stage these files, we can use git add <file1>, <file2>, ...
. You can also
indicate a directory and it will get everything under that directory that
isn’t being ignored. Ignored files won’t show up in the git status
output
so long as they weren’t committed earlier.
If your output differs than mine above then definitely ask about it, or retrace your steps. It can be a pain to take out files you didn’t mean to add. You also lose badass points.
Once you’ve made sure that’s all correct, we’ll add these files just one at a time to get used to the flow:
$ git add accept-test.sh .gitignore package.json server.js yarn.lock
And now when you do git status
again, you should see these are ready to
commit:
$ git status On branch master No commits yet Changes to be committed: (use "git rm --cached <file>..." to unstage) new file: .gitignore new file: accept-test.sh new file: package.json new file: server.js new file: yarn.lock
All of the files will be green! And it describes that they are ready to be committed. Let’s commit them!
7. committing files
Regardless of where you stand with commitment, you must commit your files or
this just isn’t going to work between us. Generally when you issue a commit
command, you are presented with your editor (whatever your system has listed
for EDITOR
in your shell’s environment variables - frequently this is nano
or vim
but could be configured to be a number of things). This is so we can
write a meaningful message for our commit. Instead of getting into those
editors, we’re going to use the -m
flag to specify the commit message for
today.
$ git commit -m "initial commit" [master (root-commit) 8aed320] initial commit 5 files changed, 380 insertions(+) create mode 100644 .gitignore create mode 100755 accept-test.sh create mode 100644 package.json create mode 100644 server.js create mode 100644 yarn.lock
The exact commit hash (8aed320
) and insertions may differ for you and that’s
okay. Although the files listed should be the same. That’s our first commit!
In git
, everything is represented as a series of commits along these things
called branches or “refs”. We’ll get more into branching later.
8. pushing files
Right now, if your machine just caught fire and melted, you’d still lose your
work forever and you’d probably be really sad because you lost all of this and
those questionable pictures you weren’t sure if you should put in a shared
location for privacy reasons. All the work we’ve done thus far in git
has
been local. That means you could do this without any server connection. You
can now bring your laptop out to the wilderness on that stupid camping trip
and tell your hippie friends to piss off while you whip up a script that
replaces their dog walking talents with a Roomba, and then commit that shit.
I’m not bitter.
We want to push your commit up to a server. There’s a number of places we can put this, and they are free. Generally I use Github for my public stuff, and Bitbucket for my private stuff (both of which are free for their public and private use, respectively). There’s also Gitlab, which you can self host and there’s an online version, but I’m less familiar with its monetary aspects.
8.0.1. github
Go to Github and setup an account there. Follow their instructions for setting up an SSH key. This will take some time and that’s okay.
Once it’s done, you can create a new repository by going to their new repository page. Name the project “cat-hatred” (avoid spaces and caps!). You can leave the rest blank or default, and just click the create repository button. If you decided to go with something that’s not Github, that’s okay and the steps will likely be similar.
Once you did that successfully, you’ll be on a page that shows you what the
repository git URL is. For me, it’s
git@github.com:LoganBarnett/cat-hatred.git
. Yours will be similar but
using your username instead of mine. Make sure you use yours and not mine!
Okay?
Okay.
Copy that URL and go back to your terminal. You can add the repository to “origin”, which is kind of a default “remote” (we’ll dive into that more later).
$ git remote add origin git@github.com:LoganBarnett/cat-hatred.git
That’s what my command looks like - don’t forget that yours should be using your username so it will be different than mine!
Once that’s done, we can do our first push with this command:
$ git push Counting objects: 7, done. Delta compression using up to 4 threads. Compressing objects: 100% (6/6), done. Writing objects: 100% (7/7), 4.26 KiB | 2.13 MiB/s, done. Total 7 (delta 0), reused 0 (delta 0) To github.com:LoganBarnett/cat-hatred.git * [new branch] master -> master
Assuming everything went without a hitch, you’ll see some output like that. It will take a couple of seconds typically, as now you’re reaching over a network. Congrats! You can even browse to your repo’s page on Github and see the source code there. Mine is here: https://github.com/LoganBarnett/cat-hatred
9. more, but…
There’s a lot more to git
than what I’ve shown you thus far, but now you
can continue to do work, commit files that you change or add, and push them to
your repository’s online copy. In the future I will share code with you using
git
. For a solo endeavor, the skillset you have now will suffice. Now we can
get back to making awesome software!