Demystifying Git: Internals Explained

Introduction
Git was created by Linus Torvalds (the creator of the Linux kernel) in April 2005, as a system for managing the Linux codebase over time. He created the initial version of Git in a very short time span of 10 days! Fast forward to 2025, over 90% of developers use this amazing software in every project, multiple businesses have been built around it, such as GitHub, GitLab, etc., and it has truly revolutionized writing software forever. Imagine creating the foundation of an entire system that will change coding forever, only in a span of 10 days!
However, many of us developers still do not understand how Git actually works internally. The understanding about the internals of Git is surely not going to help you in your day-to-day git activities, but it will make you a better engineer overall, helping you to understand the thought process behind such a robust system.
Pre-Requisites
I assume if you’re reading this blog post, you already know about what Git is, what problem does it solve, the basic git workflow and some basic commands too. If you do not, you can read my previous blog posts that cover these exact concepts:
Once you know these concepts, you will be ready to deep-dive into the mysteries of Git!
Porcelain vs Plumbing Commands
All the Git commands are divided into 2 categories:
Porcelain Commands: These are high-level, friendly commands that we actually use in our day-to-day Git usage. You might already be familiar with some of these commands like
init,add,commit,status, etc.Plumbing Commands: These are low-level internal commands that are not meant for daily use. They can be used to inspect/manipulate Git internals. Some of these commands are
hash-object,cat-file,write-tree, etc.
It is also important to understand that the Porcelain commands that we use in our day-to-day Git usage, are simply a high-level abstraction on top of one or more Plumbing Commands. In other words, they are built on top of one or more Plumbing commands only.
The .git Directory: Git’s Brain
The first command you run in order to start using Git in any project is:
git init
This simple porcelain command essentially converts your normal folder holding together a bunch of random code files, into a proper git source code repository. But how does git do it?
When you run this command, Git adds a hidden directory called .git at the root of your project directory. You can confirm this using the following command on unix terminals:
ls -a
This command will basically list out all the files and folders in that directory, including hidden ones (because of the -a flag). The output would look something like this:
. .. main.c .git
And that is literally it! A single directory is all that’s needed to manage all the files of your projects, every commit you make, every branch you make, every merge you do, literally everything! Let’s see how.
Anatomy of the .git Directory
The .git directory consists of several files and sub-directories that have specific purposes in Git’s overall design. A high-level overview of the directory would look something like this:
.git/
├── objects/
├── refs/
│ └── heads/
│ ├── main
│ ├── feature1_branch
│ └── feature2_branch
├── HEAD
├── index
└── config
There are a bunch of other files and directories too, but we will focus upon these for now.
How Git Stores Data?
Git is a content-addressable file system. This means that Git stores data by using the content itself as the “address” (ID). That address/ID is more properly referred to as a “hash” in Git terminology.
So for example, if you have the following content in any of your files:
Hello World
Then Git will generate a hash for this content that would look something like this:
0a4d55a8d778e5022fab701977c5d840bbc486d0
Technically, Git hashes the content along with a small header (object type + size).
Objects: Git’s Driving Force
Now that we understand that Git stores data using hashes, we need to understand: What exactly is Git storing behind those hashes? The answer is: Git stores everything as objects. Every commit you make, every file, and every folder structure is represented using Git objects, which are stored in the .git/objects directory.
Git primarily works with three main types of objects:
Blob (Binary Large Object)
Blob objects store the actual contents of your file. They do not store any metadata, no filename, no path, no permissions, nothing other than the contents of your file.
Do note that the contents of your files are not simply copied over to the .git/objects directory as blob objects. Instead, Git creates a blob object for the content and stores it in a compressed format (using zlib) to save space.
The following plumbing commands can be used on blob objects:
git cat-file -p blob_hash # shows original content
git cat-file -s blob_hash # shows uncompressed size
Tree
Tree objects store the actual directory structure. They store a mapping that looks like this:
100644 blob a1b2c3 main.cpp
040000 tree d4e5f6 src
The above output uses the format :
mode object_type hash filename
Basically, trees map blobs to their filenames. Trees also map sub-trees (i.e. sub-directories) to their corresponding tree objects.
The following plumbing command can be used to view a tree object tied to a particular commit:
git ls-tree commit_hash
In simpler words, a tree represents a directory and can point to:
blobs (raw file content)
other trees (sub-directories)
Commit
A commit object stores the snapshot, the history, and all the metadata. The following plumbing command can be used to view a commit object:
git cat-file -p commit_hash
The output would look something like this:
tree root_tree_hash
parent parent_commit_hash (0 or more)
author name email timestamp
committer name email timestamp
commit message
Relation Between all the Objects
Commits define history.
Trees define structure.
Blobs define content.
A commit points to exactly one tree. That tree represents the entire directory structure of the repository and also further points to blobs which hold the actual data.
Other Parts of the .git Directory
Let’s quickly go over the other files and directories of the .git directory:
refs
refs is a directory with the following structure:
refs/
└── heads/
├── main
├── feature1_branch
└── feature2_branch
These files represent different branches and store the hash that points to the latest commit in the respective branches.
HEAD
This is a file that simply stores the name of the current branch. If your current branch is main, the HEAD file’s content would be this:
ref: refs/heads/main
index
This is a file that represents the Git staging area. It stores information about the files you’ve added using git add, so Git knows what exactly to include in the next commit.
config
This is a file that stores repository-specific settings, like the remote URL (origin), default branch, and other local configurations.
Internal Flow of git add and git commit
Staging
Let’s say you edit a file called
file.txtin your working directory and run thegit add file.txtcommand.Git reads the content of the file, compresses it, and stores it as a blob object inside the
.git/objectsdirectory. Do note that if the same content already exists, Git just reuses the existing blob hash, without creating a new one.Git updates the index i.e., the staging area. It simply adds a record in the
.git/indexfile, which basically states “Forfile.txt, the staged version is this blob hash”.
Committing
After the staging phase is over, and you run the
git commitcommand, Git first reads the staging area (index) with the aim of turning it into a permanent snapshot in Git history.Git converts the staging area into a directory structure and creates a tree object.
Git creates a commit object that stores:
the tree hash (snapshot)
the parent commit hash (previous commit)
author info + timestamp
commit message
Git moves the branch pointer by updating the
.git/refs/heads/branch_namefile, to make sure that particular branch now points to the new commit hash.
Summary
Commit → points to a tree → points to blobs and sub-trees
Refs/branches → point to commits
HEAD → tells which branch you’re on
Conclusion
Git might look like magic from the outside, but internally it’s built on a very simple idea: store everything as objects, and reference them using hashes. Once you understand blobs, trees, commits, and how files like HEAD, refs, and index work together, Git starts to feel a lot less mysterious.




