Building Git

English | 2021 | 737 Pages | PDF, EPUB, MOBI | 12 MB

Building Git is a deep dive into the internals of the Git version control system. By rebuilding it in a high-level programming language, we explore the computer science behind this widely used tool. In the process, we gain a deeper understanding of Git itself and cover a wide array of broadly applicable programming topics, including:

Unix concepts
Reading from and writing to files, making writes appear atomic, and preventing race conditions between processes
Launching child processes in the foreground and background, and communicating with them concurrently
Displaying output in the terminal, including colour formatting, paged output, and interacting with the user’s text editor
Parsing various file formats, including Git’s Merkle-tree-based commit model, the index, configuration files and packed object files
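Two of the Unix topics above, atomic writes and avoiding races between processes, meet in the lock-file pattern Git uses to guard files such as the index and refs. The following is a minimal Python sketch of that pattern, assuming a POSIX filesystem. `write_atomically` and `LockDenied` are hypothetical names chosen for illustration, not the book's own code: a `<path>.lock` file created with `O_EXCL` acts as a mutex, and the finished file is renamed over the original.

```python
import os


class LockDenied(Exception):
    """Raised when another process already holds the lock file (hypothetical)."""


def write_atomically(path: str, data: bytes) -> None:
    """Replace the contents of `path` via a `<path>.lock` file (hypothetical helper)."""
    lock_path = path + ".lock"
    try:
        # O_CREAT | O_EXCL makes creation fail if the lock already exists,
        # so only one process at a time can be preparing a new version.
        fd = os.open(lock_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    except FileExistsError:
        raise LockDenied(f"unable to create {lock_path}: file exists") from None
    try:
        with os.fdopen(fd, "wb") as lock_file:
            lock_file.write(data)
            lock_file.flush()
            os.fsync(lock_file.fileno())
        # A rename within one filesystem is atomic on POSIX: readers see either
        # the old file or the complete new one, never a half-written file.
        os.replace(lock_path, path)
    except BaseException:
        # On failure, delete the lock so later writers are not blocked forever.
        if os.path.exists(lock_path):
            os.unlink(lock_path)
        raise
```

A second process calling `write_atomically` on the same path while the lock exists gets `LockDenied` instead of silently overwriting the first writer's changes.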

Data structures
How Git stores content on disk to make effective use of space, make the history efficient to search, and make it easy to detect differences between commits
Using diffs to efficiently update the contents of the workspace when checking out a new commit
Effectively using simple in-memory data structures to solve programming problems
Parsing and interpreting a query language for addressing commits
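As a small illustration of how Git stores content on disk, the sketch below computes a blob's object ID and its loose-object bytes. It assumes the default SHA-1 object format, and the function names are hypothetical. Git hashes a short `<type> <size>\0` header together with the content, so identical files share one stored object. Trees and commits refer to blobs and to each other by these IDs, which is what gives the commit model its Merkle-tree structure.

```python
import hashlib
import zlib


def object_header(obj_type: str, content: bytes) -> bytes:
    # Every Git object starts with "<type> <size in bytes>\0".
    return f"{obj_type} {len(content)}".encode("ascii") + b"\0"


def blob_id(content: bytes) -> str:
    # The object ID is the SHA-1 of header + content, so identical file
    # contents always map to the same ID and are stored only once.
    return hashlib.sha1(object_header("blob", content) + content).hexdigest()


def loose_object_path(oid: str) -> str:
    # Loose objects live under .git/objects/<first 2 hex chars>/<remaining 38>.
    return f".git/objects/{oid[:2]}/{oid[2:]}"


def loose_object_bytes(content: bytes) -> bytes:
    # On disk, the header and content are deflated together with zlib.
    return zlib.compress(object_header("blob", content) + content)
```

For example, `blob_id(b"")` gives `e69de29bb2d1d6434b8b29ae775ad8c2e48c5391`, the same ID `git hash-object` reports for an empty file.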

Concurrent editing
How Git uses branches to model concurrent edits
Algorithms for detecting differences between file versions and merging branches back together
Why merge conflicts happen, how they can be avoided, and how Git helps users prevent lost updates
How merging can be used as the basis for numerous operations to edit the commit history

Software engineering
Bootstrapping and growing a self-hosting system
Test-driven development
Refactoring to enable new feature development
Crash-only software design that allows programs to be interrupted and resumed

Networking
Using SSH to bootstrap a network protocol
How Git repositories communicate to minimise the data they need to transfer when fetching content
How the network protocol uses atomic operations to prevent users overwriting each other’s changes