
Git for Authors

Section 2.1 Commits

Collections of changes can be called changesets, but we will be more likely to call each such collection a commit. That is a noun, not a verb. If you have some experience with other revision control systems, then you might be familiar with the notion of “committing”, or “checking in.” Try to avoid confusing the new noun with your old verbs.
How do you make a commit? Roughly, you edit your files, so that your directory of files (your working directory) is dirty. The dirty or clean directory is a good mental image as you start working with git. Edit your files, and save your files. Normally, you feel pretty secure at this point. You have made changes, and by saving the edited files, you feel like you have saved your changes. But from git’s perspective, your files are dirty and you have not made your changes known to git yet. Here is the drill, using two commands at the command line in a terminal.
List 2.1.1. Making a Commit
  1. Edit some files and save them, making your working directory dirty.
  2. git add <file1> <file2> <file3>
  3. git commit -m "Add the incident at the train station"
You will get no reaction (output) from the git add command, but when you actually make the commit, you should get a response like
[master c0f19a2] Add the incident at the train station
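If you would like to watch the recipe run from start to finish, here is a minimal sketch in a throwaway directory. The file names, their contents, and the placeholder author identity are our assumptions for illustration, and we set the branch name to master only so the response resembles the one above. Your hash will certainly differ.

```shell
# Work in a throwaway directory so nothing real is at risk.
scratch=$(mktemp -d)
cd "$scratch"
git init -q
git symbolic-ref HEAD refs/heads/master    # match the branch name used in this section
git config user.name "Scratch Author"      # placeholder identity, stored in this repository only
git config user.email "author@example.com"

# Step 1: edit and save some files (the working directory is now dirty).
echo "The train was late." > chapter1.txt
echo "Nobody noticed the suitcase." > chapter2.txt

# Step 2: stage the changes. No output is expected here.
git add chapter1.txt chapter2.txt

# Step 3: make the commit. This is the command that responds.
git commit -m "Add the incident at the train station"
```

Because this is the very first commit in a brand-new repository, the response will also say (root-commit) after the branch name, followed by a short summary of the files changed.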
OK, that is a basic recipe, but what actually happened? In the add command you would have listed some, or all, of the files you had edited and saved. If you only listed some, the commit would only contain some of your changes, and the remaining changes would contribute to keeping your working directory dirty. The add command moves your changes in the indicated files to a staging area, a sort of purgatory, called the index. We say those changes are staged. You can incrementally add changes to the index to form a coherent set of changes that will eventually become a commit. For example, above you could have run the add command three times, once for each file, to stage the same collection of changes. If you further edit a file after git add, you can add that file again to move the subsequent edits into the index.
Realize that git add does two similar things. If git is unaware of some file, then add will make it one of the files that git tracks and will put the current contents of that file into the index. And from now on, git will include relevant details about this file in reports. For example, if the file is dirty, then certain reports will show the changes (see next paragraph). But “tracking” a file does not mean git automatically packages up changes. That is your job. You have control of exactly which changes git will manage, and when you want git to become aware of those changes. Subsequently, for a file git already tracks, git add moves changes from the file into the index, and you can do this repeatedly to update which collection of changes is staged in the index.
With all this talk of a dirty directory, how can you tell if your directory is even dirty at all? The command is git diff. It takes no action and is merely informative. You can run it anytime you like and it is wise to do it often, especially when getting started. RAB often walks away from his writing with a dirty directory (not best practice). So he has made it a habit to always run git diff when first returning to a project. The output of git diff is all of the changes in your working directory that are not staged into the index. It is organized by file (given in yellow on my computer), with red text being removed and green text being added. White text is unchanged and provides context for changed text, in order to help git apply changes in the right places. Solid red squares or bars are extraneous whitespace that serves no purpose other than to potentially confuse git. It is a good idea to become comfortable understanding this information. When all your changes are in the index, your working directory is clean, and git diff reports nothing.
git diff drops you into a simple program known as a pager. The down and up arrows work to scroll through the output, the spacebar advances by a screenful, and the b key takes you back a screenful. Press h for help on more commands, and use q to quit and exit.
As you add changes to the index, you can see what your future commit looks like by running git diff --cached, which will report the accumulated changes in the index, using the same format.
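The interplay of git diff and git diff --cached is easier to see than to describe. The following is a sketch, assuming a scratch repository with a single hypothetical file named story.txt and a placeholder author identity. The --no-pager option is not discussed above; we use it here only so git prints directly instead of starting the pager.

```shell
# Set up a scratch repository with one committed file.
scratch=$(mktemp -d)
cd "$scratch"
git init -q
git config user.name "Scratch Author"      # placeholder identity
git config user.email "author@example.com"
echo "First draft." > story.txt
git add story.txt
git commit -q -m "Add first draft of the story"

# Edit the file: the change lives in the working directory only.
echo "A second sentence." >> story.txt
git --no-pager diff            # shows the new line as an addition
git --no-pager diff --cached   # shows nothing, the index has no changes yet

# Stage the edit: it moves into the index.
git add story.txt
git --no-pager diff            # now shows nothing
git --no-pager diff --cached   # now shows the new line

# Edit again after staging: each report now shows a different edit.
echo "A third sentence." >> story.txt
unstaged=$(git diff --name-only)
staged=$(git diff --cached --name-only)
echo "unstaged: $unstaged, staged: $staged"
```

At the end, the same file appears in both reports: the second sentence is staged, while the third sentence is still only in the working directory until you run git add again.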
After all this add’ing and diff’ing, making the commit itself is straightforward. git commit will do the job, moving changes from the index into a single collection of changes, a changeset, to be stored, managed and manipulated by git. Technically, this is an irreversible action, but in practice there are many ways to back up and have a do-over, especially when you are working solo. So don’t panic.
The -m switch allows you to make a commit message on the command line, which you should enclose in quotation marks (single or double, allowing use of the other kind in your message, if needed). Without it, git will dump you in your editor, a step we prefer to avoid. Either way, you will always want to include a commit message. They can have multiple lines, but in practice we like to keep them to one concise line, leading with a capitalized action verb, and not more than about sixty characters. These messages will help you find your way in your git repository, and they will be the first thing others see if they peruse your repository. We think they are worth some thought toward making them informative and helpful, rather than sloppy and uninformative. You are an author, no? Treat your commit messages much like the entries that form a Table of Contents.
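Here is a small sketch of the two quoting styles, assuming a scratch repository, a hypothetical file notes.txt, and two invented commit messages. Whichever kind of quotation mark wraps the message, the other kind can appear inside it.

```shell
# Set up a scratch repository.
scratch=$(mktemp -d)
cd "$scratch"
git init -q
git config user.name "Scratch Author"      # placeholder identity
git config user.email "author@example.com"

echo "The conductor spoke first." > notes.txt
git add notes.txt
# Double quotes outside, so the message may contain an apostrophe.
git commit -m "Fix the conductor's dialogue"

echo "The train departed." >> notes.txt
git add notes.txt
# Single quotes outside, so the message may contain double quotes.
git commit -m 'Rename chapter to "Departure"'
```

Both messages lead with a capitalized action verb and stay well under sixty characters.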
Figure 2.1.2. xkcd “Git Commit” (used with permission)
Unlike git add, the git commit command does acknowledge that something happened. The message shown above has the name of the branch (master) and the commit message. But what is c0f19a2? Every commit you ever create gets a hexadecimal identifier that is probabilistically unique across all the git repositories ever made and that will ever be made. And the first seven characters are usually good enough to uniquely identify commits within your repository.
Try the new command git log. It may not show much yet, but it will list information on every commit on your current branch. And you will see some huge commit hashes. There are \(16^7 = 268\,435\,456\) different possibilities for seven hexadecimal digits. A full 40-digit commit hash has \(16^{40}\), or about \(10^{48}\), possibilities. This is not just a technical aside; we will see soon enough the critical role commit hashes play in a git repository (see Section 2.3). Even though your commit about the train station incident might often be shown shorthand as c0f19a2, it may in reality be
c0f19a223404c394d592661532747527038754e
which you would see in the log.
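To convince yourself that the short form is simply the beginning of the long form, you might try something like the sketch below. The scratch repository, file name, and identity are assumptions for illustration, and the --format option of git log (with the placeholders %H for the full hash and %h for the abbreviated one) goes beyond what this section covers.

```shell
# Set up a scratch repository with a single commit.
scratch=$(mktemp -d)
cd "$scratch"
git init -q
git config user.name "Scratch Author"      # placeholder identity
git config user.email "author@example.com"
echo "The train was late." > chapter1.txt
git add chapter1.txt
git commit -q -m "Add the incident at the train station"

# The usual report, including the full hash on its first line.
git --no-pager log

# Ask for just the hash of the latest commit, in both forms.
full=$(git log -1 --format=%H)
short=$(git log -1 --format=%h)
echo "full:  $full"
echo "short: $short"

# Check that the short hash is a prefix of the full hash.
case "$full" in
  "$short"*) echo "the short hash begins the full hash" ;;
  *)         echo "unexpected: the hashes do not match" ;;
esac
```

The particular hashes you see will differ from ours and from run to run, since the hash depends on details such as the time of the commit.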
Here are two more useful diagnostic commands. git status will tell you which files are dirty, which files have changes staged in the index and destined for the next commit, and which files are lurking about in your directory, but which you have not ever told git about. This is a good command to run frequently, especially when you are beginning. Finally, git ls-files will output all the files git is tracking. This one is interesting, but less useful day-to-day.
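A sketch of the three situations git status distinguishes, assuming a scratch repository and three hypothetical files whose names say which situation each one is in:

```shell
# Set up a scratch repository with one committed file.
scratch=$(mktemp -d)
cd "$scratch"
git init -q
git config user.name "Scratch Author"      # placeholder identity
git config user.email "author@example.com"
echo "Chapter one." > tracked.txt
git add tracked.txt
git commit -q -m "Add chapter one"

echo "An edit." >> tracked.txt         # dirty: tracked and changed, but not staged
echo "Chapter two." > staged.txt
git add staged.txt                     # staged: destined for the next commit
echo "Scribbles." > untracked.txt      # lurking: git has never been told about it

git status      # reports all three situations, each under its own heading
git ls-files    # lists staged.txt and tracked.txt, but not untracked.txt
```

Notice that staged.txt already appears in the output of git ls-files even though it has not been committed yet, since git add is what made git aware of it.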
We will primarily teach by guiding you through exercises. They almost always have extra information, so read them just for that. Experiment with a scratch repository where you can try different things without the inevitable mistakes also being worrisome disasters. And realize you can always start over.

Checkpoint 2.1.3. My First Repository.

  1. Consult Appendix A for instructions, set up git, and init an empty repository.
  2. Make several commits, creating and adding at least three files into the repository. Use git diff, git diff --cached, git status, and git log liberally in the process.
    Put some non-trivial content into each file (though it does not need to be excessive). We will use this repository for future exercises, so do not get rid of it. Put a typographical mistake into one of your three files.
  3. Do not experiment with branches; we will do that next.
  4. Once completed, use git show master to see the changes in your last commit, in the diff format. Use git show master~1 to see the changes in the commit just prior to that one. And git show master~2 for the one before that. Try replacing the references (master~N) by the first seven or eight digits of the commit hash, which you can get from the output of git log, and see that this is the functional equivalent of using branch names with relative references.
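One way the last step might play out is sketched below, with three hypothetical one-line files standing in for your own work. We set the branch name to master to match the exercise, and we use the --format option of git log (not covered in this section) only to look up a short hash without copying it by hand.

```shell
# Set up a scratch repository with three commits on master.
scratch=$(mktemp -d)
cd "$scratch"
git init -q
git symbolic-ref HEAD refs/heads/master    # match the branch name used in the exercise
git config user.name "Scratch Author"      # placeholder identity
git config user.email "author@example.com"
for n in 1 2 3; do
  echo "Content of file $n." > "file$n.txt"
  git add "file$n.txt"
  git commit -q -m "Add file $n"
done

# Show the commit just prior to the tip, named relative to the branch.
by_name=$(git --no-pager show master~1)
echo "$by_name"

# Look up that commit's short hash, then show it by hash instead.
hash=$(git log -1 --format=%h master~1)
by_hash=$(git --no-pager show "$hash")

# The two reports should be the same, character for character.
if [ "$by_name" = "$by_hash" ]; then
  echo "master~1 and $hash name the same commit"
fi
```

In your own repository you would read the hash from the output of git log and type its first seven or eight digits yourself.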
When you are done, the logical arrangement of your commits might look like the following diagram. We list older commits at the bottom and do not include commit messages. We use a 4-digit hash, which is enough to uniquely identify each commit here. The name with the arrow points to the tip of the branch with that name. The commit at the bottom, the first commit ever, is known as the root commit. As your repository gains more branches, it will look more like a tree than a twig. This diagram should be similar in spirit to what git log reports for this simple first exercise.
Figure 2.1.4. Completed First Repository