Software Engineering - Starter Web app 4

In this episode, our hero (that’s you!) will be learning a tool that is helpful in virtually any kind of development you will wind up working in: version control. Version control is rarely taught in school. It can be a somewhat deep topic, but we’ll go over some basics of how you see it in the open source world, which carry over significantly to the commercial world.

1. what’s version control?

Version control is like the double-entry bookkeeping technique of saving all of your hard work. If you recall from the last lesson, there was an off-the-cuff statement about how your computer is just a fancy interface for working with files. Version control typically excels at handling text files. This blog post is a text file. HTML files are text files. The original file that the blog post is generated from is a text file. SVG images are also text files. There are also non-text files. Generally we call these binary files, but really we could also call them shit-files. That’s a completely objective industry term and not me showing any bias at all. Honest.

Double-entry bookkeeping is something you might have had some exposure to in grade school, where you have to write down every credit and debit that hits your account, and there will be a corresponding debit or credit on some other account. It’s a ruthlessly simple model for handling money. We’re going to use a ruthlessly simple model for handling changes in text files. Version control systems can handle binary files too, but things are more eww there, so we’re just going to skip over that for now.

Like the bookkeeping stuff, version control is expressed as just a series of changes. Sure, you can write down what the final amount is somewhere in your checking account, but the final amount is always some composition of all the credits and debits since the dawn of time against your account. In the same way, any given text file can be seen as the composition of all the changes made to it over time.
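
To make the ledger idea concrete, here’s a minimal sketch using the plain diff tool rather than git. The scratch directory and the file names and contents are made up for illustration.

```shell
# Work in a throwaway directory so nothing real is touched.
scratch=$(mktemp -d)
cd "$scratch"

# Two "versions" of the same text file (hypothetical contents).
printf 'cats are fine\ndogs are great\n' > v1.txt
printf 'cats are the worst\ndogs are great\n' > v2.txt

# diff exits non-zero when the files differ, so "|| true" keeps us going.
diff -u v1.txt v2.txt > change.diff || true

# The recorded change: one line removed (-) and one line added (+).
cat change.diff
```

The second version is just the first version plus that one recorded change, the same way a balance is the old balance plus one ledger entry.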

2. why?

There are a lot of benefits that fall out of version control, but perhaps the primary driver is that it allows us to capture the state your source code was in at a point in time. As of version x of the software, some things might be known, such as features, bugs, and other behaviors. If we need to, we can go back to that point. We can make fixes against software that we released a while back.

Making fixes against older versions of the software can be helpful if we’re working on some shiny new feature, but we find a critical bug in the version our users are on right now. Caught with our proverbial software pants down, we don’t want to have to discard our current work, nor do we want to inflict some half-complete feature upon our users. Instead we can get in our version-control-time-machine and assume a prior state of the code base, make the fix, and roll it out. Once we’re done we can switch back to working on our shiny new feature.
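
Here’s a rough sketch of what that time-machine trip can look like. It leans on commands we haven’t covered yet (tags, branches, and the stash), and everything named here (the scratch directory, app.txt, fix.txt, the v1.0 tag, the hotfix branch, the example identity) is hypothetical. Treat it as one possible way to do it, not the only one.

```shell
# Throwaway repository so nothing real is touched.
scratch=$(mktemp -d)
cd "$scratch"
git init -q
# Made-up identity, set only for this scratch repo.
git config user.name "Example Hero"
git config user.email "hero@example.com"

# The "released" state of the code, marked with a tag.
echo "released code" > app.txt
git add app.txt
git commit -q -m "release 1.0"
git tag v1.0
main_branch=$(git rev-parse --abbrev-ref HEAD)   # master or main

# Half-finished shiny feature: set it aside without committing it.
echo "half-done feature" >> app.txt
git stash push -q

# Time machine: branch off the released state and make the fix there.
git checkout -q -b hotfix v1.0
echo "critical fix" > fix.txt
git add fix.txt
git commit -q -m "fix critical bug"

# Back to the present, with the feature work restored.
git checkout -q "$main_branch"
git stash pop -q
cat app.txt
```

The fix lives on its own hotfix branch, and the half-done feature comes back out of the stash as you left it.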

In its most basic form, version control can serve as a super efficient text backup system. It keeps a history of all of your changes made in little slices at a time, and it stores these as just a series of changes (add this line here, remove this other one, etc). Even if you aren’t planning on doing releases/rollbacks, or don’t want to share code with others, you need to be on version control. And believe me, if you want to find work out there, it will be a boon in a job interview for your first position to be able to say “yeah I use this”. Otherwise they’ll have to teach it to you, and while that’s all fine and good, this is a fundamental skill we expect software engineers in the industry to know.

3. git

There are lots of version control systems out there, but git is kind of the gold standard. Open source, free, and usable without a server, git is very powerful (and thus can get very complicated), and it is the de facto standard for version control software.

If you’re on macOS, you might have git installed already. That’s okay, but let’s make sure we’re on the latest version using Homebrew:

$ brew install git

While waiting for that to install, let’s go over some git trivia. git was a tool born of necessity as part of working on the Linux kernel. Linus Torvalds stated he named it “git” (roughly meaning idiot) because he needed to be reminded that he was an idiot, or something. git transformed an insanely vast open source project into something more manageable. git is “distributed”, meaning there’s no central server that you must use. This allows for arbitrary organization of your repository. Repository is the word we give to a single code base, though a repository doesn’t have to hold code, since git works with any text. Sometimes we call a repository a “repo” for short.

This arbitrary organization allowed the Linux kernel to be divided up in a more organic fashion. Linus Torvalds could assign a kind of trust to “generals” in his open source army, who in turn could have lieutenants. This means there’s a kind of flow to the work being done. Lieutenants contribute various changes to the code base, which get reviewed, corrected, and vetted by the generals. The generals then present the changes to Linus directly. This is really helpful when you maintain a complex operating system kernel largely in your spare time.

4. creating your first repo

From working on our prior cat-hatred directory, we already have a code base ready to become a repository.

$ cd ~/dev/cat-hatred

From here, we can do our initialization command.

$ git init

Initialized empty Git repository in /Users/me/dev/cat-hatred/.git/

That’s it for making the repo! Kind of.
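
If you’re curious what init actually did, here’s a small sketch you can run in a throwaway directory (the scratch directory is an assumption so we don’t disturb cat-hatred).

```shell
# Throwaway directory standing in for a project.
scratch=$(mktemp -d)
cd "$scratch"
git init -q

# The whole repository lives in a hidden .git directory.
ls -a

# Ask git whether we are inside a repository's working tree.
git rev-parse --is-inside-work-tree
```

Delete that .git directory and the folder goes back to being a plain folder, history and all gone, so leave it alone in your real project.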

5. ignoring files

We’re not going to add everything to the repository.

5.1. generated files

Generally we don’t want to add generated files to our version control. This is because generated files can get large and unwieldy, and they are generally derived from some other authoritative file. This means pretty much anyone can regenerate those files at any time if we need them. We want to discourage editing of generated files, because if someone edits one by hand and the generator later produces something different, which version is correct? It’s hard to provide a concretely correct answer there, so it’s best to opt out of that risk in the first place. If we need the generated files changed, we should assume the generator is always safe to run and that the source file that produces the generated file will somehow reflect the needed changes.

Since we’re in a Node project, the node_modules directory fits the bill perfectly as a set of generated files. The entire directory is managed by our package manager (for us, yarn). We want to ignore node_modules. git conventionally uses a .gitignore file to list file/directory names and patterns in order to ensure those files aren’t “checked in”. “Checking in” is something we do with our files. I don’t recall if git specifically uses that terminology, but I’m a little old and some of the vocabulary from older systems will bleed over there. Rest assured there are people older than me in software engineering, probably.

We can create our .gitignore file and add node_modules to it in one stroke:

$ echo "node_modules" > .gitignore
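
If you want to double check that the ignore rule took, here’s a minimal sketch in a throwaway repository. The scratch directory and junk.js are made up for the demo.

```shell
# Throwaway repository so cat-hatred is left alone.
scratch=$(mktemp -d)
cd "$scratch"
git init -q

# Fake a generated directory and ignore it.
mkdir node_modules
echo "generated" > node_modules/junk.js
# Note: > replaces the file; use >> to append more entries later.
echo "node_modules" > .gitignore

# check-ignore reports which rule (file and line) matched the path.
git check-ignore -v node_modules/junk.js

# The untracked list should mention .gitignore but not node_modules.
git status --short
```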

5.2. editor specific files

Depending on your editor choice, this may not be the complete story. Editors love to store their project+editor unique settings in a given repository. It would be easy to find out what those files are named and also add them to .gitignore, but I urge you not to do that. There are a lot of editors out there, and you don’t want to inflict your holy editor upon others (just like you wouldn’t want their garbage editors inflicted upon you). In this case “others” can also be your future self. For example, I’m going strong on Emacs right now, but I can quickly recall seven other editors I’ve used before that, each with its own config files. It’s your responsibility to avoid adding unnecessary clutter to the repository.

With git you can add a global ignore file like so:

$ git config --global core.excludesfile '~/.gitignore'

Find out what your editor’s temporary and hidden files are and add them to this global git ignore file. If you change editors later, simply add to the file. You can even add ignore settings for your operating system as well (think Thumbs.db from Windows and .DS_Store on macOS).

While digging around for examples I found this excellent gitignore repository where you can just lift settings for your editor and/or operating system. You can copy from multiple files; just separate them with blank lines. You can add comments starting with the # character to document sections if you like.
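
As a sketch of what such a file might contain, here’s a small example with commented sections. To keep it safe to run, it writes the patterns to a temporary file and points git at it for a single command with -c instead of touching your real global config. The temporary file, the scratch repository, and notes.txt are assumptions for the demo; the Emacs patterns cover its backup (*~) and autosave (#name#) files.

```shell
# Throwaway repository plus a temporary stand-in for the global ignore file.
scratch=$(mktemp -d)
ignore_file=$(mktemp)
cd "$scratch"
git init -q

cat > "$ignore_file" <<'EOF'
# macOS
.DS_Store

# Windows
Thumbs.db

# Emacs backup and autosave files
*~
\#*\#
EOF

# Some clutter next to one real file.
touch .DS_Store notes.txt notes.txt~

# -c applies the setting to this one command only.
git -c core.excludesFile="$ignore_file" status --short
```

Only notes.txt should show up as untracked. Once you’re happy with the patterns, paste them into the file you configured as your global ignore file.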

6. adding your files

If you’re jumping around, make sure you’ve handled everything in section 5. Now we’re going to add files to this repository. First, we can look to see what files are available with the git status command.

$ git status

On branch master

No commits yet

Untracked files:
  (use "git add <file>..." to include in what will be committed)

  accept-test.sh
  .gitignore
  package.json
  server.js
  yarn.lock

nothing added to commit but untracked files present (use "git add" to track)

Your version may be more colorful than my post’s, and that’s okay. Generally untracked and modified files are displayed in red. Files that are “staged” for commit are shown in green. I realize red/green color blindness is very common, and I think these colors can be configured, but I haven’t looked into that yet.
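
For what it’s worth, git does have color.status settings for this. Here’s a minimal sketch that sets them for a single throwaway repository; the particular colors are just an example choice, and adding --global to each config command would apply them everywhere.

```shell
# Throwaway repository; the settings below apply only to it.
scratch=$(mktemp -d)
cd "$scratch"
git init -q

# Swap the red/green defaults for colors that are easier to tell apart.
git config color.status.untracked "yellow"
git config color.status.changed "yellow bold"
git config color.status.added "cyan"

# Read one setting back to confirm it stuck.
git config --get color.status.added
```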

To stage these files, we can use git add <file1> <file2> ... with the file names separated by spaces. You can also indicate a directory and it will get everything under that directory that isn’t being ignored. Ignored files won’t show up in the git status output so long as they weren’t committed earlier.

If your output differs from mine above then definitely ask about it, or retrace your steps. It can be a pain to take out files you didn’t mean to add. You also lose badass points.

Once you’ve made sure that’s all correct, we’ll add these files by naming each one explicitly to get used to the flow:

$ git add accept-test.sh .gitignore package.json server.js yarn.lock

And now when you do git status again, you should see these are ready to commit:

$ git status

On branch master

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)

  new file:   .gitignore
  new file:   accept-test.sh
  new file:   package.json
  new file:   server.js
  new file:   yarn.lock

All of the files will be green! And it describes that they are ready to be committed. Let’s commit them!
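
If something slipped into that list by mistake, the hint in the status output tells you how to back it out. Here’s a minimal sketch in a throwaway repository; secrets.txt is a made-up example of a file you didn’t mean to stage.

```shell
# Throwaway repository so cat-hatred is left alone.
scratch=$(mktemp -d)
cd "$scratch"
git init -q

echo "keep me" > server.js
echo "oops" > secrets.txt
git add server.js secrets.txt

# Unstage the mistake. --cached means the file stays on disk.
git rm -q --cached secrets.txt

# server.js is still staged; secrets.txt is back to untracked.
git status --short
```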

7. committing files

Regardless of where you stand with commitment, you must commit your files or this just isn’t going to work between us. Generally when you issue a commit command, you are presented with your editor (whatever your system has listed for EDITOR in your shell’s environment variables - frequently this is nano or vim but could be configured to be a number of things). This is so we can write a meaningful message for our commit. Instead of getting into those editors, we’re going to use the -m flag to specify the commit message for today.

$ git commit -m "initial commit"

  [master (root-commit) 8aed320] initial commit
  5 files changed, 380 insertions(+)
  create mode 100644 .gitignore
  create mode 100755 accept-test.sh
  create mode 100644 package.json
  create mode 100644 server.js
  create mode 100644 yarn.lock

The exact commit hash (8aed320) and insertions may differ for you and that’s okay, although the files listed should be the same. That’s our first commit!
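
To look back at what you just committed, git log and git show are the usual tools. Here’s a minimal sketch in a throwaway repository; the identity and the file contents are made up so it runs anywhere.

```shell
# Throwaway repository with a made-up identity.
scratch=$(mktemp -d)
cd "$scratch"
git init -q
git config user.name "Example Hero"
git config user.email "hero@example.com"

echo 'console.log("hi")' > server.js
git add server.js
git commit -q -m "initial commit"

# One line per commit: short hash plus message.
git log --oneline

# Details of the latest commit, with a per-file change summary.
git show --stat HEAD
```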

In git, everything is represented as a series of commits along these things called branches or “refs”. We’ll get more into branching later.

8. pushing files

Right now, if your machine just caught fire and melted, you’d still lose your work forever and you’d probably be really sad because you lost all of this and those questionable pictures you weren’t sure if you should put in a shared location for privacy reasons. All the work we’ve done thus far in git has been local. That means you could do this without any server connection. You can now bring your laptop out to the wilderness on that stupid camping trip and tell your hippie friends to piss off while you whip up a script that replaces their dog walking talents with a Roomba, and then commit that shit. I’m not bitter.

We want to push your commit up to a server. There are a number of places we can put this, and they are free. Generally I use GitHub for my public stuff, and Bitbucket for my private stuff (both of which are free for their public and private use, respectively). There’s also GitLab, which you can self host and there’s an online version, but I’m less familiar with its monetary aspects.

8.1. github

Go to GitHub and set up an account there. Follow their instructions for setting up an SSH key. This will take some time and that’s okay.

Once it’s done, you can create a new repository by going to their new repository page. Name the project “cat-hatred” (avoid spaces and caps!). You can leave the rest blank or default, and just click the create repository button. If you decided to go with something that’s not GitHub, that’s okay and the steps will likely be similar.

Once you did that successfully, you’ll be on a page that shows you what the repository git URL is. For me, it’s git@github.com:LoganBarnett/cat-hatred.git. Yours will be similar but using your username instead of mine. Make sure you use yours and not mine! Okay?

Okay.

Copy that URL and go back to your terminal. You can add the repository as “origin”, which is the conventional name for the default “remote” (we’ll dive into that more later).

$ git remote add origin git@github.com:LoganBarnett/cat-hatred.git

That’s what my command looks like - don’t forget that yours should be using your username so it will be different from mine!
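
You can check what got recorded without touching the network. Here’s a minimal sketch in a throwaway repository, where your-username is a placeholder rather than a real account.

```shell
# Throwaway repository; adding a remote does not contact the server.
scratch=$(mktemp -d)
cd "$scratch"
git init -q

# "your-username" is a placeholder for your own account name.
git remote add origin git@github.com:your-username/cat-hatred.git

# List remotes with their fetch and push URLs.
git remote -v

# If you pasted the wrong URL, this would correct it in place:
# git remote set-url origin git@github.com:your-username/cat-hatred.git
```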

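Once that’s done, we can do our first push. Because the master branch has no upstream on the remote yet, a bare git push will typically refuse to run with default settings, so we name the remote and branch explicitly. The -u flag tells git to remember that pairing so later pushes can be a plain git push:

$ git push -u origin master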

Counting objects: 7, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (6/6), done.
Writing objects: 100% (7/7), 4.26 KiB | 2.13 MiB/s, done.
Total 7 (delta 0), reused 0 (delta 0)
To github.com:LoganBarnett/cat-hatred.git
 * [new branch]      master -> master

Assuming everything went without a hitch, you’ll see some output like that. It will take a couple of seconds typically, as now you’re reaching over a network. Congrats! You can even browse to your repo’s page on GitHub and see the source code there. Mine is here: https://github.com/LoganBarnett/cat-hatred

9. more, but…

There’s a lot more to git than what I’ve shown you thus far, but now you can continue to do work, commit files that you change or add, and push them to your repository’s online copy. In the future I will share code with you using git. For a solo endeavor, the skillset you have now will suffice. Now we can get back to making awesome software!
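
The day-to-day loop then looks something like this sketch. So that it runs without a network or an account, it uses a local bare repository as a stand-in for the server; that stand-in, the scratch paths, notes.txt, and the identity are all assumptions for the demo.

```shell
# A bare repository plays the role of the server; "work" is our project.
scratch=$(mktemp -d)
git init -q --bare "$scratch/remote.git"
git init -q "$scratch/work"
cd "$scratch/work"
git config user.name "Example Hero"
git config user.email "hero@example.com"
git remote add origin "$scratch/remote.git"

# First commit and first push; -u records the upstream branch.
echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "add notes"
branch=$(git rev-parse --abbrev-ref HEAD)   # master or main
git push -q -u origin "$branch"

# The everyday loop: edit, check, stage, commit, push.
echo "second thoughts" >> notes.txt
git status --short
git add notes.txt
git commit -q -m "expand notes"
git push -q
```

In your real project the remote is the one you already added, so the loop is just the last five commands.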