Alice and Bob are two professional cryptographers who have discovered a weakness in a critical algorithm underlying much of the world’s electronic banking programs. They need to get the details out quickly as a research paper that they will host on the arXiv (arxiv.org/) for the security community to vet. Alice and Bob have known each other for years. They trust each other’s technical skills and writing style, and even better, they both have GitHub accounts. In the best traditions of cryptography research, they decide to write their paper openly as a public GitHub repository, and they decide to host the repository in Alice’s account. Everything else will be discussed on GitHub.
Work this exercise playing the role of Alice. If you have a friend who can be Bob, all the better, but you can also play both sides of the collaboration yourself and get almost as much out of the exercise. (If Bob is somebody else, then he needs a GitHub account, but if you are playing both sides, then your one GitHub account is enough.) Alice (you!) will log into her GitHub account and initiate a new repository. Recall that in Chapter 2 we created a new repository on our local computer at the command line with git init. Now Alice will let GitHub do that step, since GitHub will automatically configure the repository for subsequent communication. See Section B.2 and Section B.3 for instructions on the steps in this paragraph. Alice will create a new repository and name it banking-paper. She will make Bob a collaborator on the repository, since she knows Bob’s GitHub username from their previous collaborations. So there is now a fresh repository on GitHub, which Alice and Bob can both manipulate. We are going to call this the definitive repository, as it will hold the “official” version of their paper. In a minute we will set up Alice and Bob with local copies, but they have agreed that those are just their local workspaces and that the repository on GitHub always holds the latest, and presumably best, version of their paper.
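If you would like to rehearse the exercise without GitHub, or simply see what the definitive repository amounts to, the following shell sketch builds a local stand-in. This is an assumption of the sketch, not part of the exercise: a bare repository (one that stores history but has no working files) in a scratch directory plays the role of github.com/alice-jones/banking-paper, and a throwaway repository with the hypothetical name seed supplies the initial commit that GitHub can create for you. The directory names are illustrative only.

```shell
#!/bin/sh
# Sketch only: a local stand-in for the definitive repository on GitHub.
# Assumption: a bare repository in a scratch directory plays GitHub's role.
set -eu

sandbox=$(mktemp -d)
trap 'rm -rf "$sandbox"' EXIT
# Keep your own git settings out of the experiment.
export HOME="$sandbox" GIT_CONFIG_NOSYSTEM=1
cd "$sandbox"

# Hypothetical "seed" repository holding one initial commit, much like the
# README commit GitHub can create when a repository is initialized.
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
git -C seed config user.name "Alice"
git -C seed config user.email "alice@example.com"
echo "Banking paper" > seed/README
git -C seed add README
git -C seed commit -q -m "Initial commit"

# A bare clone keeps the history but no working files:
# this is the repository everyone pushes to and pulls from.
git clone -q --bare seed definitive.git
rm -rf seed

git -C definitive.git log --oneline master
```

With the real GitHub repository none of this is needed; GitHub performs the equivalent step when Alice creates banking-paper.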
Section B.4 contains the necessary instructions for this paragraph, but they are more general, so read them and this paragraph through completely before doing anything. In particular, ignore any discussion of “forks” until Chapter 4. Alice should make a copy of the fresh repository on her work computer, and Bob should do the same. If you are playing both sides of this exercise yourself, copy the repository once, and then rename the banking-paper directory to alice-banking. Then copy it again and rename the resulting directory to bob-banking. These renames have no effect on how your repositories behave, but you will need to keep track mentally of which files you should be working with for the remainder of the exercise.
In principle, Alice and Bob are now completely set up and organized, and need never visit the GitHub site again. But GitHub has some nice tools, and Alice and Bob have decided to be 100% transparent in their work. A GitHub issue is like a topic on an online discussion forum. It is designed mostly for reporting and discussing bugs in software, or for requesting and implementing new features, but it can also be used for planning and discussion. Alice and Bob would like to plan their writing as an open discussion on GitHub. They decide that Alice will concentrate on the introduction, since she is the better overall writer, and Bob will therefore get started on the section with the details of the vulnerability. They will work more closely together on the final section containing recommendations.
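Returning to the copying step described above: if you are playing both sides, git clone accepts an optional final argument naming the directory to create, so the renaming can be skipped entirely. The sketch below assumes a local bare repository in a scratch directory as a stand-in for GitHub; with the real repository you would pass https://github.com/alice-jones/banking-paper.git in place of the local path.

```shell
#!/bin/sh
# Sketch only: clone one definitive repository into two named workspaces.
# Assumption: a local bare repository stands in for the GitHub repository.
set -eu

sandbox=$(mktemp -d)
trap 'rm -rf "$sandbox"' EXIT
export HOME="$sandbox" GIT_CONFIG_NOSYSTEM=1   # ignore personal git settings
cd "$sandbox"

# Stand-in definitive repository holding a single initial commit.
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
git -C seed config user.name "Alice"
git -C seed config user.email "alice@example.com"
echo "Banking paper" > seed/README
git -C seed add README
git -C seed commit -q -m "Initial commit"
git clone -q --bare seed definitive.git

# The last argument of git clone names the new directory, so no rename
# is needed. For the real exercise, replace the path with
# https://github.com/alice-jones/banking-paper.git
git clone -q "$sandbox/definitive.git" alice-banking
git clone -q "$sandbox/definitive.git" bob-banking

# Both workspaces get an "origin" remote aimed at the same repository.
git -C alice-banking remote -v
git -C bob-banking remote -v
```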
So in our exercise, Alice should create a branch off of master named intro, create a file introduction.txt, add it to her branch, make some edits, commit the changes, and so on. Bob should do similarly, but make a branch off master named vulnerable, where he adds and edits a file vulnerability.txt as a series of commits. Recall that Principle 2.2.3 says Alice and Bob should do all of their work on branches.
Alice had the simpler task, so let us assume she finishes the introduction first. She does not know she is first; she does not even have any idea where Bob is in his writing. She has been doing her best to get the introduction right, and to not disturb Bob, who is presumably also working hard. So Alice suspects there are no new commits on the master branch, but does not really know. OK, Alice is going to update master with a pull, see no new commits there, do a fast-forward merge of her intro branch into master locally, and then push her master branch to GitHub. We will go through the details carefully, but recognize that push and pull are the only new concepts we did not cover in Chapter 2.
But first, a bit of diagnostic work. Alice’s repository was copied from GitHub and therefore is aware of its heritage.
alice@work:~/papers/banking-paper$ git remote -v
origin https://github.com/alice-jones/banking-paper.git (fetch)
origin https://github.com/alice-jones/banking-paper.git (push)
Alice’s local version of the repository has a remote that carries the information necessary to communicate with the definitive repository. Since Bob made a similar copy, he has an identical remote (remember, the co-authors are sharing a definitive repository in Alice’s account). The remote goes by the name origin, which is customary, just as master is the customary name for the first branch. You can add as many remotes as you like, putting your repository in contact with as many different copies as you can think of. Now Alice switches to her master branch.
alice@work:~/papers/banking-paper$ git checkout master
Switched to branch 'master'
Your branch is up-to-date with 'origin/master'.
Notice that git seems to know something about the state of your local master in relation to the origin/master in the copy on GitHub. Let us check anyway.
alice@work:~/papers/banking-paper$ git pull
Already up-to-date.
So Alice attempted to update her local master branch from the definitive repository on GitHub, but there was nothing new to use as an update (her master is up-to-date). As we suspected (or hoped!), Bob is still working on the technical details locally. In case it was not obvious, we did not have to bother Bob with an email asking where he was with his task. Now Alice is going to merge her intro branch into master, which will be a fast-forward merge since master has not evolved beyond her original branch point for intro.
alice@work:~/papers/banking-paper$ git merge intro
Fast-forward
introduction.txt | +++++++++++++
1 file changed, 25 insertions(+), 0 deletions(-)
create mode 100644 introduction.txt
Alice has incorporated her introduction into her master branch, and now she will make it part of the definitive repository with a push. Note that Alice is still on her master branch.
alice@work:~/papers/banking-paper$ git push
Username for 'https://github.com': alice-jones
Password for 'https://alice-jones@github.com': xxxxxxxxxx
Counting objects: 3, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (2/2), done.
Writing objects: 100% (3/3), 287 bytes | 0 bytes/s, done.
Total 3 (delta 0), reused 0 (delta 0)
To https://github.com/alice-jones/banking-paper.git
80a9eb3..1aaedaf master -> master
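Gathered into one place, Alice’s part of the exercise up to this point (branch off master, commit on intro, update master, fast-forward merge, push) might look like the following shell sketch. It is a rehearsal under stated assumptions, not a transcript: a local bare repository stands in for GitHub, so no sign-in is needed, and the file contents and commit message are placeholders. The --ff-only options are standard git flags added here by us; they make git refuse to do anything other than the fast-forward the text expects.

```shell
#!/bin/sh
# Sketch only: Alice's branch -> commit -> pull -> fast-forward -> push cycle.
# Assumptions: a local bare repository stands in for GitHub; file contents
# and commit messages are placeholders.
set -eu

sandbox=$(mktemp -d)
trap 'rm -rf "$sandbox"' EXIT
export HOME="$sandbox" GIT_CONFIG_NOSYSTEM=1   # ignore personal git settings
cd "$sandbox"

# Stand-in definitive repository holding a single initial commit.
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
git -C seed config user.name "Alice"
git -C seed config user.email "alice@example.com"
echo "Banking paper" > seed/README
git -C seed add README
git -C seed commit -q -m "Initial commit"
git clone -q --bare seed definitive.git

# Alice's local workspace.
git clone -q definitive.git banking-paper
cd banking-paper
git config user.name "Alice"
git config user.email "alice@example.com"

# All work happens on a branch (Principle 2.2.3).
git checkout -q -b intro master
echo "Introduction draft" > introduction.txt
git add introduction.txt
git commit -q -m "Introduction"

# Update master (nothing new is expected), then fast-forward it to intro.
git checkout -q master
git pull -q --ff-only
git merge -q --ff-only intro

# Publish: the definitive master now matches Alice's master.
git push -q origin master
git log --oneline master
```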
Notice that Alice must sign in to GitHub, since only she and Bob are allowed to modify the repository (the world can view it and copy it, but not modify it). (GitHub no longer accepts an account password at this prompt; a personal access token is used in its place.) So Alice has placed her introduction in the definitive repository without any additional coordination with Bob. Time is important, so Alice opens an issue on GitHub to discuss their recommendations; this will interrupt Bob, but they need to form a plan. However, without waiting for Bob’s reply, Alice creates a new branch off master named last-chapter, where she simply adds an empty file named recommendations.txt. She repeats the steps above and, with a fast-forward merge, updates her local master and pushes it to the definitive repository on GitHub. Alice can now clean up by deleting her intro and last-chapter branch pointers, which have become obsolete.
Now that Alice has communicated with a public repository, known to the world, it is the right time to introduce a principle that we will illustrate subsequently.
Principle 3.1.2. Never Alter a Public Commit.
Never, ever, alter in any way a commit that has been made available to anybody else.
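A quick way to see why this principle matters is to watch a commit’s ID change when the commit is rewritten. The following sketch works in a throwaway local repository with illustrative branch and file names (draft, draft.txt, other.txt are our inventions). It records the hash of a branch tip, rebases the branch onto a master that has moved on, and records the hash again. The change the commit introduces is the same, but its ID is not, which is exactly the inconsistency a rewritten public commit would inflict on everyone who already holds the old one.

```shell
#!/bin/sh
# Sketch only: rebasing rewrites a commit, giving it a new hash.
# Assumption: a throwaway local repository; names are illustrative.
set -eu

sandbox=$(mktemp -d)
trap 'rm -rf "$sandbox"' EXIT
export HOME="$sandbox" GIT_CONFIG_NOSYSTEM=1   # ignore personal git settings
git init -q "$sandbox/demo"
cd "$sandbox/demo"
git symbolic-ref HEAD refs/heads/master
git config user.name "Alice"
git config user.email "alice@example.com"

echo "Banking paper" > README
git add README
git commit -q -m "Initial commit"

# A commit on a branch: pretend this is one that has already been pushed.
git checkout -q -b draft
echo "A draft paragraph" > draft.txt
git add draft.txt
git commit -q -m "Draft"
before=$(git rev-parse draft)

# master moves on, then the branch is rebased onto it.
git checkout -q master
echo "Other work" > other.txt
git add other.txt
git commit -q -m "Other work"
git checkout -q draft
git rebase -q master
after=$(git rev-parse draft)

echo "draft before rebase: $before"
echo "draft after rebase:  $after"

# A fast-forward merge, by contrast, moves master without rewriting anything.
git checkout -q master
git merge -q --ff-only draft
echo "master after fast-forward: $(git rev-parse master)"
```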
We discussed the nature of commit hashes in Section 2.3. We have seen how a local rebase changes commit hashes in Checkpoint 2.2.1. And we have a principle about a rebase changing hashes, while a merge does not (Principle 2.3.1). Alice can rebase her branch all she wants within her repository on her local computer, but the instant she pushes commits to the definitive repository, they become available to her co-author (Bob), and to the entire world. The commit hash for each of these commits is a globally unique ID (GUID). There are ways to modify these public commits in the definitive repository, but this would be tantamount to chopping off somebody’s finger and replacing it with a new one with a different fingerprint. Don’t do it!
Why not? All of git’s coordination is predicated on identical commits having identical IDs (the commit hash). In Chapter 4 we will expand our circle of contributors to anybody in the world (don’t panic, we will have a procedure for approving changes before they go into the definitive repository). Manipulating a public commit will totally confuse git, make a big mess, and infuriate your collaborators, whose copies of the repository are no longer consistent with your ill-advised action. This may be the only advice that all the Internet git commentators can agree on. If you follow the procedures we are describing, this will never be a danger. But when you push a commit to the master branch of the definitive repository and feel like you made a mistake, resist the temptation to go backwards locally and then do a “forced push” to the definitive repository. You might get away with it for a while, but eventually you will regret it. Just live with it (a misspelled commit message), or add another commit to fix your mistake (a grammatically poor sentence). And forget we even mentioned the possibility of changing public commits.
Back to our exercise. Bob has now finished his section, so he wants to make it part of the definitive repository. He suspects Alice has finished the introduction, since she was eager to discuss the recommendations. So, just like Alice, he is going to update his master branch.
bob@laptop:~/publications/banking-paper$ git checkout master
Switched to branch 'master'
Your branch is up-to-date with 'origin/master'.
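Notice that this particular up-to-date message is misleading: git is comparing Bob’s master with its record of origin/master from the last time Bob’s repository communicated with GitHub, not with the definitive repository as it stands now. So Bob will really check with a pull. Remember that Bob has a remote named origin, with connection information for the definitive repository. (Why doesn’t a pull require a login?)
bob@laptop:~/publications/banking-paper$ git pull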
remote: Counting objects: 2, done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 2 (delta 0), reused 2 (delta 0), pack-reused 0
Unpacking objects: 100% (2/2), done.
From https://github.com/alice-jones/banking-paper
1aaedaf..1912c1b master -> origin/master
Updating 80a9eb3..1912c1b
Fast-forward
introduction.txt | +++++++++++++
recommendations.txt | 0
2 files changed, 25 insertions(+), 0 deletions(-)
create mode 100644 introduction.txt
create mode 100644 recommendations.txt
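With its default settings, git pull is a git fetch, which updates Bob’s record of origin/master, followed by a merge of origin/master into his current branch. Splitting the two halves, as in the following sketch, shows why the earlier up-to-date message could not be trusted until Bob talked to the definitive repository. As before, a local bare repository stands in for GitHub (an assumption of the sketch), and the file contents are placeholders.

```shell
#!/bin/sh
# Sketch only: a pull is a fetch followed by a merge.
# Assumption: a local bare repository stands in for GitHub.
set -eu

sandbox=$(mktemp -d)
trap 'rm -rf "$sandbox"' EXIT
export HOME="$sandbox" GIT_CONFIG_NOSYSTEM=1   # ignore personal git settings
cd "$sandbox"

# Stand-in definitive repository plus Alice's and Bob's workspaces.
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
git -C seed config user.name "Alice"
git -C seed config user.email "alice@example.com"
echo "Banking paper" > seed/README
git -C seed add README
git -C seed commit -q -m "Initial commit"
git clone -q --bare seed definitive.git
git clone -q definitive.git alice-banking
git clone -q definitive.git bob-banking

# Alice publishes a commit (made on a branch) that Bob has not seen yet.
cd alice-banking
git config user.name "Alice"
git config user.email "alice@example.com"
git checkout -q -b intro
echo "Introduction draft" > introduction.txt
git add introduction.txt
git commit -q -m "Introduction"
git checkout -q master
git merge -q --ff-only intro
git push -q origin master
cd ../bob-banking

# Bob's record of origin/master is stale, so status still reports no difference.
git status
stale=$(git rev-list --count master..origin/master)

# Half one: fetch updates origin/master without touching Bob's files.
git fetch -q origin
behind=$(git rev-list --count master..origin/master)
git status   # now reports that master is behind origin/master

# Half two: merge origin/master (a fast-forward here) into Bob's master.
git merge -q --ff-only origin/master
```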
Bob just picked up Alice’s introduction section and empty recommendations section from the definitive repository. That’s good. Do you see the subtlety? Think carefully about it for a minute; this is a major concept. The branch pointer master in Bob’s local repository just advanced two commits beyond the branch point of the vulnerable branch he has been working on. Bob sees:
bob@laptop:~/publications/banking-paper$ git show-branch
* [master] Empty chapter for recommendations
 ! [vulnerable] Vulnerability section
--
*  [master] Empty chapter for recommendations
*  [master^] Introduction
 + [vulnerable] Vulnerability section
*+ [master~2] Initial commit
Before Bob pushes his technical section, he will rebase his vulnerable branch onto master. This would normally risk a conflict, but he knows Alice has worked only in introduction.txt, his new work is only in vulnerability.txt, and recommendations.txt is empty. So Bob switches to vulnerable, rebases onto master, switches to master, merges vulnerable into master (which is a fast-forward merge, as expected), pushes master to origin/master, and cleans up by deleting the obsolete vulnerable branch pointer.
bob@laptop:~/publications/banking-paper$ git checkout vulnerable
Switched to branch 'vulnerable'
bob@laptop:~/publications/banking-paper$ git rebase master
First, rewinding head to replay your work on top of it...
Applying: Vulnerability section
bob@laptop:~/publications/banking-paper$ git checkout master
Switched to branch 'master'
Your branch is up-to-date with 'origin/master'.
bob@laptop:~/publications/banking-paper$ git merge vulnerable
Updating 1912c1b..bceacaa
Fast-forward
vulnerability.txt | +++++++++++++
1 file changed, 90 insertions(+), 0 deletions(-)
create mode 100644 vulnerability.txt
bob@laptop:~/publications/banking-paper$ git push
Username for 'https://github.com': bob-smith
Password for 'https://bob-smith@github.com': xxxxxxxxxxx
Counting objects: 2, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (2/2), done.
Writing objects: 100% (2/2), 279 bytes | 0 bytes/s, done.
Total 2 (delta 1), reused 0 (delta 0)
To https://github.com/alice-jones/banking-paper.git
1912c1b..bceacaa master -> master
bob@laptop:~/publications/banking-paper$ git branch -d vulnerable
Deleted branch vulnerable (was bceacaa).
That looks complicated, but you will do it over and over if you collaborate with this model: update master from the definitive repository, rebase (advance) your working branch to the tip of master, fast-forward merge your working branch into your local master, then push your improved master to the definitive repository. The clear and present danger here is forgetting the first step, updating your master. You may have little idea what has happened to the definitive repository while you had your head down working, so always remember to update before interacting with the definitive repository.
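The cycle above is mechanical enough to capture in a few lines of shell. The following sketch wraps it in a hypothetical helper named publish (our invention, not a git command) and replays Bob’s situation: Alice publishes her introduction while Bob is still working on vulnerable. A local bare repository stands in for GitHub, file contents are placeholders, and the --ff-only flags are our addition so that an unexpected divergence stops the script instead of creating a merge commit. The rebase rewrites only Bob’s unpublished commit, in keeping with Principle 3.1.2.

```shell
#!/bin/sh
# Sketch only: the update / rebase / fast-forward / push cycle.
# Assumptions: a local bare repository stands in for GitHub, and
# "publish" is a hypothetical helper name, not part of git.
set -eu

sandbox=$(mktemp -d)
trap 'rm -rf "$sandbox"' EXIT
export HOME="$sandbox" GIT_CONFIG_NOSYSTEM=1   # ignore personal git settings
cd "$sandbox"

# Stand-in definitive repository plus Alice's and Bob's workspaces.
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
git -C seed config user.name "Alice"
git -C seed config user.email "alice@example.com"
echo "Banking paper" > seed/README
git -C seed add README
git -C seed commit -q -m "Initial commit"
git clone -q --bare seed definitive.git
for who in alice bob; do
    git clone -q definitive.git "${who}-banking"
    git -C "${who}-banking" config user.name "$who"
    git -C "${who}-banking" config user.email "${who}@example.com"
done

# publish BRANCH: run inside a workspace; lands BRANCH on the definitive master.
publish() {
    git checkout -q master
    git pull -q --ff-only              # 1. update master
    git checkout -q "$1"
    git rebase -q master               # 2. advance the branch to master's tip
    git checkout -q master
    git merge -q --ff-only "$1"        # 3. fast-forward master
    git push -q origin master          # 4. publish
    git branch -q -d "$1"              # drop the obsolete branch pointer
}

# Bob starts his section on a branch...
cd "$sandbox/bob-banking"
git checkout -q -b vulnerable
echo "Vulnerability details" > vulnerability.txt
git add vulnerability.txt
git commit -q -m "Vulnerability section"

# ...while Alice publishes her introduction first.
cd "$sandbox/alice-banking"
git checkout -q -b intro
echo "Introduction draft" > introduction.txt
git add introduction.txt
git commit -q -m "Introduction"
publish intro

# Bob's turn: step 1 picks up Alice's work before his branch is rebased.
cd "$sandbox/bob-banking"
publish vulnerable
git log --oneline master
```

Had Bob skipped step 1, his push would have been refused, because his master would no longer be a fast-forward of the definitive master.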