Git for Authors

Section 3.1 Collaborating across Time and Space

To get started, visit Appendix B, read the advice on making a GitHub account, and make an account. Then come back here to work the following example. To keep things simple, we will walk through an exercise with just two co-authors, but you might imagine up to about five individuals participating in the following.

Checkpoint 3.1.1. Alice and Bob Write Crypto.

Alice and Bob are two professional cryptographers who have discovered a weakness in a critical algorithm underlying much of the world’s electronic banking programs. They need to get the details out quickly as a research paper that they will host on the arXiv for the security community to vet. Alice and Bob have known each other for years. They trust each other’s technical skills and writing style, and even better, they both have GitHub accounts. In the best traditions of cryptography research, they decide to write their paper openly as a public GitHub repository, and they decide to host the repository in Alice’s account. Everything else will be discussed on GitHub.
Work this exercise playing the role of Alice. If you have a friend who can be Bob, all the better, but you can also play both sides of the collaboration yourself and get almost as much out of the exercise. (If Bob is somebody else, then he needs a GitHub account, but if you are playing both sides, then your one GitHub account is enough.) Alice (you!) will log into her GitHub account and initiate a new repository. Recall that in Chapter 2 we created a new repository on our local computer at the command line with git init. Now Alice will let GitHub do that step since GitHub will automatically configure the repository for subsequent communication.
See Section B.2 and Section B.3 for instructions on the steps in this paragraph. Alice will create a new repository and name it banking-paper. She will make Bob a collaborator on the repository since she knows Bob’s username on GitHub from their previous collaborations. So there is now a fresh repository on GitHub, which Alice and Bob can manipulate. We are going to call this the definitive repository, as it will hold the “official” version of their paper. In a minute we will set up Alice and Bob with local copies, but they have agreed that those are just their local workspaces and the repository on GitHub always holds the latest, and presumably best, version of their paper.
Section B.4 contains the necessary instructions for this paragraph, but they are more general, so read them and this paragraph through completely before doing anything. In particular, ignore any discussion of “forks” until Chapter 4. Alice should make a copy of the fresh repository onto her work computer, and Bob should do the same. If you are playing both sides of this exercise yourself, copy the repository once, and then rename the banking-paper directory to alice-banking. Then copy again and rename the resulting directory as bob-banking. These changes have zero effect on how your repository behaves, but you will need to keep track of which files you should be working with in the remainder of the exercise.
In principle, Alice and Bob are totally set up and organized, and never need to visit the GitHub site again. But GitHub has some nice tools and Alice and Bob have decided to be 100% transparent in their work. A GitHub issue is like a topic on an online discussion forum. It is designed mostly for reporting and discussing bugs in software, or requesting and implementing new features in software. But issues can also be used for planning and discussion. Alice and Bob would like to plan their writing as an open discussion on GitHub, deciding that Alice will concentrate on the introduction since she is the better overall writer, and Bob will therefore get started on the section with the details of the vulnerability. They will work more closely on the final section containing recommendations.
So in our exercise, Alice should create a branch off of master named intro, create a file introduction.txt, add it to her branch, make some edits, commit the changes, and so on. Bob should do similarly but make a branch off master named vulnerable where he adds and edits a file vulnerability.txt as a series of commits. Recall that Principle 2.2.3 says Alice and Bob should do all of their work on branches.
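If you are playing both sides, the two local copies can be sketched at the command line as follows. This is only a sketch under stated assumptions: so that it runs without a network connection or a GitHub account, a bare repository named definitive.git in a temporary directory (a hypothetical stand-in) plays the role of the repository on GitHub, and the author identity is a made-up sandbox value. With the real repository you would give git clone its GitHub URL instead. Supplying a directory name to git clone has the same effect as cloning and then renaming the directory.

```shell
set -e
# Sandbox identity so commits work on any machine (hypothetical values).
export GIT_AUTHOR_NAME="Sandbox Author" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="Sandbox Author" GIT_COMMITTER_EMAIL="author@example.com"

sandbox=$(mktemp -d)
cd "$sandbox"

# Stand-in for the fresh repository on GitHub: one initial commit on master.
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
git -C seed commit -q --allow-empty -m "Initial commit"
git clone -q --bare seed definitive.git

# One reader playing both roles: clone twice under different directory names.
git clone -q "$sandbox/definitive.git" alice-banking
git clone -q "$sandbox/definitive.git" bob-banking

# Both copies carry the same remote, named origin by default.
git -C alice-banking remote -v
git -C bob-banking remote -v
```

The directory names matter only to you. Each copy is a complete repository that knows where it came from.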
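Alice's side of this step might look like the following sketch. It assumes a throwaway repository in a temporary directory rather than her real copy, a made-up sandbox identity, and a single line of placeholder text standing in for her introduction. Bob's side is the same with vulnerable and vulnerability.txt.

```shell
set -e
# Sandbox identity so commits work on any machine (hypothetical values).
export GIT_AUTHOR_NAME="Sandbox Author" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="Sandbox Author" GIT_COMMITTER_EMAIL="author@example.com"

sandbox=$(mktemp -d)
cd "$sandbox"
git init -q alice-banking
cd alice-banking
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "Initial commit"

# Create the intro branch off master and switch to it.
git checkout -q -b intro master

# Create the file, add it to the branch, and commit.
echo "Placeholder text for the introduction." > introduction.txt
git add introduction.txt
git commit -q -m "Introduction"

# The intro branch is now one commit ahead of master.
git show-branch
```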
Alice had the simpler task, so let us assume she finishes the introduction first. She does not know she is first; she does not even have any idea where Bob is in his writing. She has been doing her best to get the introduction right, and to not disturb Bob, who is presumably also working hard. So Alice suspects there are no new commits on the master branch, but does not really know. OK, Alice is going to update master with a pull, see no new commits there, do a fast-forward merge of her intro branch into master locally, and then push her master branch to GitHub. We will do the details carefully, but recognize that the push and pull are the only new concepts we did not cover in Chapter 2.
But first, a bit of diagnostic work. Alice’s repository was copied from GitHub and therefore is aware of its heritage.
alice@work:~/papers/banking-paper$ git remote -v
origin  https://github.com/alice-jones/banking-paper.git (fetch)
origin  https://github.com/alice-jones/banking-paper.git (push)
Alice’s local version of the repository has a remote that carries the information necessary to communicate with the definitive repository. Since Bob made a similar copy he has an identical remote (remember the co-authors are sharing a definitive repository in Alice’s account). The remote goes by the name origin, which is customary, similar to the master branch. You can add as many remotes as you like, putting your repository in contact with as many different copies as you can think of.
alice@work:~/papers/banking-paper$ git checkout master
Switched to branch 'master'
Your branch is up-to-date with 'origin/master'.
Notice that git seems to know something about the state of your local master in relation to the origin/master in the copy on GitHub. Let us check anyway.
alice@work:~/papers/banking-paper$ git pull
Already up-to-date.
So Alice attempted to update her local master branch from the definitive repository on GitHub, but there was nothing new to use as an update (her master is up-to-date). As we suspected (or hoped!), Bob is still working on the technical details locally. In case it was not obvious, we did not have to bother Bob with an email asking where he was with his task. Now Alice is going to merge her intro branch into master, which will be a fast-forward merge since master has not evolved beyond her original branch point for intro.
alice@work:~/papers/banking-paper$ git merge intro
Fast-forward
 introduction.txt | +++++++++++++
 1 file changed, 25 insertions(+), 0 deletions(-)
 create mode 100644 introduction.txt
Alice has incorporated her introduction to the master branch, but now will make it part of the definitive repository with a push. Note that Alice is still on her master branch.
alice@work:~/papers/banking-paper$ git push
Username for 'https://github.com': alice-jones
Password for 'https://alice-jones@github.com': xxxxxxxxxx
Counting objects: 3, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (2/2), done.
Writing objects: 100% (3/3), 287 bytes | 0 bytes/s, done.
Total 3 (delta 0), reused 0 (delta 0)
To https://github.com/alice-jones/banking-paper.git
   80a9eb3..1aaedaf  master -> master
Notice that Alice must sign in to GitHub, since only she and Bob are allowed to modify the repository (the world can view it and copy it, not modify it). So Alice has placed her introduction in the definitive repository without any additional coordination with Bob. Time is important, so Alice initiates an issue on GitHub to discuss their recommendations, which will now interrupt Bob, but they need to form a plan. However, without waiting for Bob’s reply, Alice creates a new branch off master named last-chapter where she simply adds an empty file named recommendations.txt. She repeats the steps above and with a fast-forward merge, updates her local master and pushes it to the definitive repository on GitHub. Alice can now clean up by deleting her intro and last-chapter branch pointers that have become obsolete.
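Alice's whole sequence, from finished branch to cleaned-up repository, can be gathered into one sketch. The assumptions: a bare repository named definitive.git in a temporary directory (hypothetical) stands in for the repository on GitHub, so no login is requested; the identity and file contents are placeholders; and the --ff-only options are an addition not used in the sessions above. They tell git to stop rather than create a merge commit if a fast-forward turns out to be impossible.

```shell
set -e
# Sandbox identity so commits work on any machine (hypothetical values).
export GIT_AUTHOR_NAME="Sandbox Author" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="Sandbox Author" GIT_COMMITTER_EMAIL="author@example.com"

sandbox=$(mktemp -d)
cd "$sandbox"

# Stand-in for the definitive repository on GitHub.
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
git -C seed commit -q --allow-empty -m "Initial commit"
git clone -q --bare seed definitive.git
git clone -q "$sandbox/definitive.git" alice-banking
cd alice-banking

# The work happens on a branch, not on master.
git checkout -q -b intro
echo "Placeholder text for the introduction." > introduction.txt
git add introduction.txt
git commit -q -m "Introduction"

# Update master, fast-forward it to intro, publish, and clean up.
git checkout -q master
git pull -q --ff-only origin master
git merge -q --ff-only intro
git push -q origin master
git branch -d intro
```

Deleting the branch removes only the pointer named intro. The commit it pointed to is now part of master, both locally and in the definitive repository.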
Now that Alice has communicated with a public repository, known to the world, it is the right time to introduce a principle that we will illustrate subsequently.
We discussed the nature of commit hashes in Section 2.3. We have seen how a local rebase changes commit hashes in Checkpoint 2.2.1. And we have a principle about a rebase changing hashes, while a merge does not (Principle 2.3.1). Alice can rebase her branch all she wants within her repository on her local computer, but the instant she pushes commits to the definitive repository, they become available to her co-author (Bob), and to the entire world. The commit hash for each of these commits is a globally unique ID (GUID). There are ways to modify these public commits in the definitive repository, but this would be tantamount to chopping off somebody’s finger and replacing it with a new one with a different fingerprint. Don’t do it!
Why not? All of git’s coordination is predicated on identical commits having identical IDs (the commit hashes). In Chapter 4 we will expand our circle of contributors to anybody in the world (don’t panic, we will have a procedure for approving changes before they go into the definitive repository). Manipulating a public commit will totally confuse git, make a big mess, and infuriate your collaborators, whose copies of the repository are no longer consistent with your ill-advised action. This may be the only advice that all the Internet git commentators can agree on. If you follow the procedures we are describing, this will never be a danger. But when you push a commit to the master branch of the definitive repository and feel like you made a mistake, resist the temptation to go backwards locally and then do a “forced push” to the definitive repository. You might get away with it for a while, but eventually you will regret it. Just live with it (a misspelled commit message), or add another commit to fix your mistake (a grammatically poor sentence). And forget we even mentioned the possibility of changing public commits.
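The "add another commit" remedy can be sketched as follows. The assumptions are the same kind as before: a bare repository named definitive.git in a temporary directory (hypothetical) stands in for GitHub, the identity and the sentences in the file are placeholders, and the commits are made directly on master only to keep the sketch short. The point to observe is that the commit already pushed keeps its hash and stays in the history.

```shell
set -e
# Sandbox identity so commits work on any machine (hypothetical values).
export GIT_AUTHOR_NAME="Sandbox Author" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="Sandbox Author" GIT_COMMITTER_EMAIL="author@example.com"

sandbox=$(mktemp -d)
cd "$sandbox"

# Stand-in for the definitive repository on GitHub.
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
git -C seed commit -q --allow-empty -m "Initial commit"
git clone -q --bare seed definitive.git
git clone -q "$sandbox/definitive.git" alice-banking
cd alice-banking

# A commit containing a mistake is pushed, and so becomes public.
echo "Their is a weakness." > introduction.txt
git add introduction.txt
git commit -q -m "Introduction"
git push -q origin master
published=$(git rev-parse HEAD)

# The repair is a new commit on top; the public commit is left alone.
echo "There is a weakness." > introduction.txt
git commit -q -am "Fix grammar in introduction"
git push -q origin master

# The published commit is still an ancestor of the definitive master.
git log --oneline
git merge-base --is-ancestor "$published" origin/master && echo "public history preserved"
```

Both pushes are ordinary pushes. No forced push is needed at any point, which is a useful sign that no public commit has been altered.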
Back to our exercise, now Bob has finished up his section, so he wants to make it part of the definitive repository. He suspects Alice has finished the introduction, since she was eager to discuss the recommendations. So, just like Alice, he is going to update his master branch.
bob@laptop:~/publications/banking-paper$ git checkout master
Switched to branch 'master'
Your branch is up-to-date with 'origin/master'.
Notice that this particular up-to-date message is misleading. Bob will really check with a pull. Remember that Bob has a remote named origin, with connection information for the definitive repository. (Why doesn’t a pull require a login?)
bob@laptop:~/publications/banking-paper$ git pull
remote: Counting objects: 2, done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 2 (delta 0), reused 2 (delta 0), pack-reused 0
Unpacking objects: 100% (2/2), done.
From https://github.com/alice-jones/banking-paper
   80a9eb3..1912c1b  master     -> origin/master
Updating 80a9eb3..1912c1b
Fast-forward
 introduction.txt    | +++++++++++++
 recommendations.txt | 0
 2 files changed, 25 insertions(+), 0 deletions(-)
 create mode 100644 introduction.txt
 create mode 100644 recommendations.txt
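The misleading up-to-date message that Bob saw before his pull has a simple explanation: git compares master with origin/master, which is a local record of where the definitive master was the last time this copy talked to the remote. The sketch below reproduces the situation. It assumes a bare repository named definitive.git in a temporary directory (hypothetical) in place of GitHub, a made-up identity, and empty placeholder commits. A git fetch refreshes origin/master without touching master, so the comparison can be made before and after.

```shell
set -e
# Sandbox identity so commits work on any machine (hypothetical values).
export GIT_AUTHOR_NAME="Sandbox Author" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="Sandbox Author" GIT_COMMITTER_EMAIL="author@example.com"

sandbox=$(mktemp -d)
cd "$sandbox"

# Stand-in for the definitive repository, plus a copy for each author.
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
git -C seed commit -q --allow-empty -m "Initial commit"
git clone -q --bare seed definitive.git
git clone -q "$sandbox/definitive.git" alice-banking
git clone -q "$sandbox/definitive.git" bob-banking

# Alice publishes a commit; Bob's copy hears nothing about it.
git -C alice-banking commit -q --allow-empty -m "Introduction"
git -C alice-banking push -q origin master

cd bob-banking
# Compares master with Bob's stale origin/master: nothing new is reported.
git status -sb
behind_before=$(git rev-list --count master..origin/master)

# A fetch contacts the definitive repository and refreshes origin/master.
git fetch -q origin
git status -sb
behind_after=$(git rev-list --count master..origin/master)

echo "behind before fetch: $behind_before, after fetch: $behind_after"
```

The last line prints `behind before fetch: 0, after fetch: 1`. A pull is a fetch followed by a merge, which is why the pull is the real check.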
Bob just picked up Alice’s introduction section and empty recommendations section from the definitive repository. That’s good. Do you see the subtlety? Think carefully about it for a minute, since this is a major concept. The branch pointer master in Bob’s local repository just advanced two commits forward from the branch point of the vulnerable branch he has been working on. Bob sees:
bob@laptop:~/publications/banking-paper$ git show-branch
* [master] Empty chapter for recommendations
 ! [vulnerable] Vulnerability section
--
*  [master] Empty chapter for recommendations
*  [master^] Introduction
 + [vulnerable] Vulnerability section
*+ [master~2] Initial commit
Before Bob pushes his technical section, he will rebase his vulnerable branch onto master. This would normally risk a merge conflict, but he knows Alice has work only in introduction.txt, his new work is only in vulnerability.txt, and recommendations.txt is empty. So Bob switches to vulnerable, rebases onto master, switches to master, merges vulnerable into master (which is a fast-forward merge, as expected), pushes master to origin/master, and cleans up by deleting the obsolete vulnerable branch pointer.
bob@laptop:~/publications/banking-paper$ git checkout vulnerable
Switched to branch 'vulnerable'
bob@laptop:~/publications/banking-paper$ git rebase master
First, rewinding head to replay your work on top of it...
Applying: Vulnerability section
bob@laptop:~/publications/banking-paper$ git checkout master
Switched to branch 'master'
Your branch is up-to-date with 'origin/master'.
bob@laptop:~/publications/banking-paper$ git merge vulnerable
Updating 1912c1b..bceacaa
Fast-forward
 vulnerability.txt | +++++++++++++
 1 file changed, 90 insertions(+), 0 deletions(-)
 create mode 100644 vulnerability.txt
bob@laptop:~/publications/banking-paper$ git push
Username for 'https://github.com': bob-smith
Password for 'https://bob-smith@github.com': xxxxxxxxxxx
Counting objects: 2, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (2/2), done.
Writing objects: 100% (2/2), 279 bytes | 0 bytes/s, done.
Total 2 (delta 1), reused 0 (delta 0)
To https://github.com/alice-jones/banking-paper.git
   1912c1b..bceacaa  master -> master
bob@laptop:~/publications/banking-paper$ git branch -d vulnerable
Deleted branch vulnerable (was bceacaa).
That looks complicated, but you will do it over and over if you collaborate with this model. Update master from the definitive repository, rebase (advance) your working branch to the tip of master, fast-forward merge your working branch into your local master, then push your improved master to the definitive repository. The clear and present danger here is to forget the first step, updating your master. You may have little idea what has happened to the definitive repository if you have had your head down working, so you need to always remember to update before interacting with the definitive repository.
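The four-step cycle can be rehearsed safely in a sandbox before trying it on a real paper. The sketch below assumes a bare repository named definitive.git in a temporary directory (hypothetical) in place of GitHub, a made-up identity, and placeholder file contents. Alice commits directly on master here only to keep the setup short, and --ff-only is an addition that makes git stop rather than create a merge commit.

```shell
set -e
# Sandbox identity so commits work on any machine (hypothetical values).
export GIT_AUTHOR_NAME="Sandbox Author" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="Sandbox Author" GIT_COMMITTER_EMAIL="author@example.com"

sandbox=$(mktemp -d)
cd "$sandbox"
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
git -C seed commit -q --allow-empty -m "Initial commit"
git clone -q --bare seed definitive.git
git clone -q "$sandbox/definitive.git" alice-banking
git clone -q "$sandbox/definitive.git" bob-banking

# Bob works on his branch.
cd "$sandbox/bob-banking"
git checkout -q -b vulnerable
echo "Placeholder vulnerability details." > vulnerability.txt
git add vulnerability.txt
git commit -q -m "Vulnerability section"

# Meanwhile Alice publishes her introduction.
cd "$sandbox/alice-banking"
echo "Placeholder text for the introduction." > introduction.txt
git add introduction.txt
git commit -q -m "Introduction"
git push -q origin master

# Bob's cycle.
cd "$sandbox/bob-banking"
git checkout -q master
git pull -q --ff-only origin master    # 1. update master
git checkout -q vulnerable
git rebase -q master                   # 2. rebase the working branch
git checkout -q master
git merge -q --ff-only vulnerable      # 3. fast-forward merge into master
git push -q origin master              # 4. push to the definitive repository
git branch -d vulnerable
git log --oneline
```

The final log is a straight line of three commits with no merge commits, which is the shape this collaboration model aims for.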
Where are we now? Alice and Bob both have the same introduction, and they both have an empty section destined to hold recommendations. Bob has pushed his technical section to the definitive repository, just now. However, Alice does not yet have the technical section on the vulnerability. Notice that at any time, Alice (or Bob) can make a commit on their present branch (so their working directory is clean), check out master, and pull origin/master from the definitive repository. This will likely advance the branch pointer for master but there is no harm in that at all. It is good, since now Alice’s local master has the latest changes from the definitive repository and she can see how the project is progressing.
Alice can return to her working branch, exactly as it was when she switched away from it. At any time, she can elect to rebase her working branch on a new tip of the master branch. And it would be best practice to rebase frequently and perhaps only encounter minor, easily resolvable merge conflicts regularly, rather than a head-in-the-sand approach that waits to do a single massive rebase before adding to the definitive repository. Even better would be to rebase onto an updated master immediately and then push immediately. Work can continue on a fresh local branch. That feels like a principle.
What does “private” mean here? We have seen that commit hashes form a chain of repeated hashes all the way back to the root commit (Section 2.3), and that we can identify commits by the leading digits of a commit hash. Also, a rebase will always change some commit hashes (Principle 2.3.1). So while the present principle advocates frequent rebases, never perform a rebase that changes a commit hash that has been made available to somebody else. This is the advice contained in Principle 3.1.2, and now could be a good time to go back and re-read the discussion that follows it.
We have seen how to pull from the master branch of the definitive repository, and how to push commits to the master branch of the definitive repository. So we have the tools for two-way communication between repositories. We can pull from public repositories at will, but we can push only to repositories where we have been given permission, that is, where we are trusted to make unilateral changes.
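Looking in on master in the middle of other work might go like the following sketch. It assumes a bare repository named definitive.git in a temporary directory (hypothetical) in place of GitHub, a made-up identity, an empty placeholder commit for Bob's section, and a placeholder line for the work in progress.

```shell
set -e
# Sandbox identity so commits work on any machine (hypothetical values).
export GIT_AUTHOR_NAME="Sandbox Author" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="Sandbox Author" GIT_COMMITTER_EMAIL="author@example.com"

sandbox=$(mktemp -d)
cd "$sandbox"
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
git -C seed commit -q --allow-empty -m "Initial commit"
git clone -q --bare seed definitive.git
git clone -q "$sandbox/definitive.git" alice-banking
git clone -q "$sandbox/definitive.git" bob-banking

# Bob publishes his section while Alice is in the middle of her own branch.
git -C bob-banking commit -q --allow-empty -m "Vulnerability section"
git -C bob-banking push -q origin master

cd alice-banking
git checkout -q -b last-chapter
echo "Placeholder recommendations." > recommendations.txt

# Commit first, so the working directory is clean, then look at master.
git add recommendations.txt
git commit -q -m "Draft recommendations"
git checkout -q master
git pull -q --ff-only origin master
git log --oneline

# Return to the working branch, which is just as it was left.
git checkout -q last-chapter
cat recommendations.txt
```

The pull moves only master. The last-chapter branch still starts from the old branch point until Alice chooses to rebase it.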
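One way to check whether a branch is still safe to rebase is to ask git which of its commits have not yet reached the definitive repository. The sketch below is an illustration and not a guarantee: it assumes a bare repository named definitive.git in a temporary directory (hypothetical) in place of GitHub, a made-up identity, and a placeholder commit, and it only knows about remotes configured in this copy, as of the most recent fetch.

```shell
set -e
# Sandbox identity so commits work on any machine (hypothetical values).
export GIT_AUTHOR_NAME="Sandbox Author" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="Sandbox Author" GIT_COMMITTER_EMAIL="author@example.com"

sandbox=$(mktemp -d)
cd "$sandbox"
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
git -C seed commit -q --allow-empty -m "Initial commit"
git clone -q --bare seed definitive.git
git clone -q "$sandbox/definitive.git" alice-banking
cd alice-banking

# Local work on a branch that has never been pushed.
git checkout -q -b last-chapter
git commit -q --allow-empty -m "Recommendations outline"

# Refresh what this copy knows about the definitive repository.
git fetch -q origin

# Commits on the branch that are not in origin/master: still private.
unpublished=$(git rev-list --count origin/master..last-chapter)

# Remote-tracking branches that already contain the branch tip.
published_on=$(git branch -r --contains last-chapter)

echo "unpublished commits: $unpublished"
echo "remote branches containing the tip: ${published_on:-none}"
```

Here the output reports one unpublished commit and no remote branch containing it, so a rebase of last-chapter changes only hashes that nobody else has seen.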