Git for Authors

Section 2.3 Commit Hashes

In computer science, a hash function is a many-to-one function that takes lengthy input, massages it, and produces a much shorter output of fixed length, called the hash. While the output looks random, the function involves no randomness, and the function itself is publicly known. Identical inputs always produce the same output. But if two inputs differ even slightly, their outputs will be wildly different; this is a design criterion for most hash functions. Similarly, we might require that, given a hash (output), it is very hard to manufacture an input that produces that output. For the hash function used in git, it is highly unlikely that two different inputs (commits) will produce the same output (hash). So a commit hash identifies a commit much as fingerprints or retinal scans identify humans. If you want to learn more, git uses the SHA-1 hash function, whose tamper-resistant properties were designed for use in cryptographic applications. It is no longer considered secure for that purpose, but works fine for use in git.
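The properties described above can be observed directly. The following is a minimal sketch in Python using the standard `hashlib` module; the helper name `sha1_hex` and the sample strings are made up for illustration and are not anything git itself uses.

```python
import hashlib

def sha1_hex(text: str) -> str:
    """Return the SHA-1 hash of text as a string of hexadecimal digits."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

# Hypothetical sample inputs, chosen only for illustration.
short_input = "Fix typo"
long_input = "Fix typo in chapter one. " * 1000  # a much longer input
near_copy = "Fix typo."                           # differs by one character

# Every hash has the same length, however long the input is.
for text in (short_input, long_input, near_copy):
    print(len(text), sha1_hex(text))

# Identical inputs always give identical hashes (no randomness involved).
print(sha1_hex(short_input) == sha1_hex("Fix typo"))

# A one-character difference gives a completely different hash.
print(sha1_hex(short_input) == sha1_hex(near_copy))
```

Running this prints three hashes of identical length that look unrelated to one another, even though two of the inputs differ by a single period.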
The inputs git uses to form the hash include things like your name, the date, the commit message, the changes in the commit, and most importantly, the commit hash of the previous commit in the sequence on the branch. This is similar to the blockchain technology used in Bitcoin, lately a darling of fintech (financial technology). A sequence of blocks of information and their hashes, each formed with input that includes the previous hash, makes it practically impossible to tamper with any one piece of information without radically disturbing every subsequent hash.
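To see why tampering disturbs every subsequent hash, here is a toy sketch of a hash chain in Python. It is a simplification under stated assumptions: the function names, the field layout, and the sample history are all hypothetical, and the real format git hashes is more structured than a few joined strings.

```python
import hashlib

def toy_commit_hash(author, date, message, changes, parent_hash):
    """Hash a commit's fields together with the previous commit's hash.

    Illustration only: this is NOT the format git actually hashes.
    """
    record = "\n".join([author, date, message, changes, parent_hash])
    return hashlib.sha1(record.encode("utf-8")).hexdigest()

def chain_hashes(commits):
    """Return the hashes for a sequence of (author, date, message, changes)."""
    hashes = []
    parent = ""  # assumption: the first commit has no parent
    for author, date, message, changes in commits:
        parent = toy_commit_hash(author, date, message, changes, parent)
        hashes.append(parent)
    return hashes

# A made-up history of three commits.
history = [
    ("Ann Author", "2024-01-01", "Start chapter", "+ Once upon a time"),
    ("Ann Author", "2024-01-02", "Add hero", "+ there was a hero"),
    ("Ann Author", "2024-01-03", "Add ending", "+ The end"),
]

# Tamper with the message of the second commit only.
tampered = list(history)
tampered[1] = ("Ann Author", "2024-01-02", "Add villain", "+ there was a hero")

original_hashes = chain_hashes(history)
tampered_hashes = chain_hashes(tampered)

for before, after in zip(original_hashes, tampered_hashes):
    print(before[:7], after[:7], "same" if before == after else "changed")
```

The first commit keeps its hash, but the altered commit and the one after it both change, even though the third commit's own fields were never touched.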
Return to the fast-forward merge above to see that there was absolutely no change in the information for each commit and no disruption in the sequence (chain) of commits. As explained, the only thing that really happened was that the master branch pointer was changed to point to a new commit, the tip of the heroine branch.
Contrast that with when we rebased the hero branch onto the master branch that contained the typo fix. git rewound the commits on hero back to the old branch point, but then replayed them onto a commit (the tip of master) with a different commit hash. All of the replayed commits from the hero branch (we just had one) have their commit hashes recomputed, and the new hashes are radically different, since the hash of the tip of master is an input to the first replayed commit. The previous hashes are meaningless (and lost to time). Notice that git updates the branch pointer hero to use the new hash from the tip of the replayed branch.
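The contrast between a rebase and a fast-forward can be imitated with a toy model in Python. This is a sketch under loud assumptions: the helper `toy_commit_hash`, the commit messages, and the tiny history are hypothetical, and a commit is reduced to just a message and a parent hash, which is far less than git really records.

```python
import hashlib

def toy_commit_hash(message, parent_hash):
    """Toy stand-in for a commit hash: depends on the message and the parent."""
    return hashlib.sha1(f"{parent_hash}\n{message}".encode("utf-8")).hexdigest()

# A made-up history: hero and master both start from the same branch point.
branch_point = toy_commit_hash("Initial draft", "")
hero_old = toy_commit_hash("Introduce the hero", branch_point)
master_tip = toy_commit_hash("Fix typo", branch_point)

# Branches are just names pointing at commit hashes.
branches = {"master": master_tip, "hero": hero_old}

# Rebase: replay the same commit on top of the tip of master.
# The content is unchanged, but the parent differs, so the hash differs.
hero_new = toy_commit_hash("Introduce the hero", branches["master"])
branches["hero"] = hero_new
print("rebase changed the hash:", hero_old != hero_new)

# Fast-forward: only a pointer moves; no hash is recomputed.
branches["master"] = branches["hero"]
print("master now points at:", branches["master"][:7])
```

In this model the rebase produces a brand-new hash for an otherwise identical commit, while the fast-forward merely copies an existing hash into the master pointer.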
This is a principle that will be important once we get social and work with others.