Demystifying Git: Internals Explained

Full-stack developer obsessed with performance, scalability, and clean systems. I use Arch btw.

Introduction

Git was created by Linus Torvalds (the creator of the Linux kernel) in April 2005 to manage the Linux kernel’s source code over time. He wrote the initial version of Git in about 10 days! Fast forward to 2025: over 90% of developers use Git, entire businesses such as GitHub and GitLab have been built around it, and it has truly changed how software is written. Imagine laying the foundation of a system that would change coding forever in a span of only 10 days!

However, many of us developers still do not understand how Git actually works internally. Knowing Git’s internals probably won’t change your day-to-day Git workflow, but it will make you a better engineer overall by helping you understand the thought process behind such a robust system.


Pre-Requisites

I assume that if you’re reading this blog post, you already know what Git is, what problem it solves, the basic Git workflow, and some basic commands. If you don’t, you can read my previous blog posts, which cover exactly these concepts:

  1. Why Version Control Exists: A Video Game Perspective

  2. Learning Git Commands Through Video Game Mechanics

Once you know these concepts, you will be ready to deep-dive into the mysteries of Git!


Porcelain vs Plumbing Commands

All the Git commands are divided into 2 categories:

  1. Porcelain Commands: These are high-level, friendly commands that we actually use in our day-to-day Git usage. You might already be familiar with some of these commands like init, add, commit, status, etc.

  2. Plumbing Commands: These are low-level internal commands that are not meant for daily use. They can be used to inspect/manipulate Git internals. Some of these commands are hash-object, cat-file, write-tree, etc.

It is also important to understand that the porcelain commands we use day to day are simply high-level abstractions over the same low-level operations that the plumbing commands expose. For example, git commit does roughly what the plumbing commands write-tree, commit-tree, and update-ref do when run in sequence.


The .git Directory: Git’s Brain

The first command you run in order to start using Git in any project is:

git init

This simple porcelain command turns an ordinary folder of code files into a proper Git repository. But how does Git do it?

When you run this command, Git adds a hidden directory called .git at the root of your project directory. You can confirm this with the following command in a Unix-like terminal:

ls -a

This command will basically list out all the files and folders in that directory, including hidden ones (because of the -a flag). The output would look something like this:

.  ..  .git  main.c

And that is literally it! This single directory is all Git needs to manage every file in your project, every commit you make, every branch you create, every merge you do: literally everything! Let’s see how.


Anatomy of the .git Directory

The .git directory consists of several files and sub-directories that have specific purposes in Git’s overall design. A high-level overview of the directory would look something like this:

.git/
├── objects/
├── refs/
│   └── heads/
│       ├── main
│       ├── feature1_branch
│       └── feature2_branch
├── HEAD
├── index
└── config

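There are a bunch of other files and directories too, but we will focus on these for now. (In a brand-new repository, refs/heads/ stays empty until your first commit, and the index file only appears after your first git add.)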


How Does Git Store Data?

Git stores content using hashes as keys; same content produces the same hash.

Git is a content-addressable file system. This means that Git stores each piece of data under an “address” (ID) that is computed from the content itself. That ID is a cryptographic hash of the content, which is why Git terminology usually just calls it a “hash”.

So for example, if you have the following content in any of your files:

Hello World

Then Git will generate a hash for this content that would look something like this:

0a4d55a8d778e5022fab701977c5d840bbc486d0

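Strictly speaking, the value above is the plain SHA-1 of the text. Git actually hashes the content together with a small header (the object type and size), so the ID Git stores for this file is different, even though it looks just like this.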

💡
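By default, Git uses the SHA-1 algorithm to generate these hashes. SHA-1 produces a 160-bit value that Git writes out in hexadecimal, and since 1 hex digit = 4 bits, 160 / 4 = 40 characters. In short, each hash ID is a 40-character hexadecimal string. (Newer Git versions can also create repositories that use SHA-256, whose IDs are 64 characters long.)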
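To make the header part concrete, here is a small Python sketch of how a blob ID can be computed with only the standard library. It assumes the object is a blob and that its content is already in memory as bytes; `hash_object` is a hypothetical helper name, not part of any Git API.

```python
import hashlib


def hash_object(content: bytes, obj_type: str = "blob") -> str:
    """Return the SHA-1 object ID Git would assign to `content`.

    Git does not hash the raw bytes alone: it hashes a header of the form
    "<type> <size in bytes>" plus a NUL byte, followed by the content.
    """
    header = f"{obj_type} {len(content)}".encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    text = b"Hello World"
    print("plain SHA-1:", hashlib.sha1(text).hexdigest())
    print("Git blob ID:", hash_object(text))
    # Identical content always yields an identical ID.
    print(hash_object(text) == hash_object(b"Hello World"))  # True
```

If Git is installed, `printf 'Hello World' | git hash-object --stdin` should print the same ID as the second line, since `printf` (unlike `echo`) does not add a trailing newline.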

Objects: Git’s Driving Force

Now that we understand that Git stores data using hashes, we need to understand: What exactly is Git storing behind those hashes? The answer is: Git stores everything as objects. Every commit you make, every file, and every folder structure is represented using Git objects, which are stored in the .git/objects directory.

Git has four types of objects: blobs, trees, commits, and (annotated) tags. The three you will deal with most are:

Blob (Binary Large Object)

Blob objects store the actual contents of a file and nothing else: no filename, no path, no permissions, no metadata of any kind.

Note that the contents of your files are not simply copied into the .git/objects directory. Instead, Git prepends the object header, compresses the result with zlib to save space, and writes it to a file named after its hash: the first 2 hex characters of the hash become a sub-directory and the remaining 38 become the filename.
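A minimal Python sketch of that storage step might look like the following. It assumes a loose (unpacked) object and writes the file directly; real Git writes to a temporary file first, renames it into place, and later bundles objects into packfiles, none of which is modeled here. `write_loose_object` is a hypothetical name.

```python
import hashlib
import os
import tempfile
import zlib


def write_loose_object(git_dir: str, content: bytes, obj_type: str = "blob") -> str:
    """Store `content` as a zlib-compressed loose object and return its ID."""
    data = f"{obj_type} {len(content)}".encode() + b"\0" + content
    oid = hashlib.sha1(data).hexdigest()
    # objects/<first 2 hex chars>/<remaining 38 hex chars>
    path = os.path.join(git_dir, "objects", oid[:2], oid[2:])
    if not os.path.exists(path):  # identical content is only stored once
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(zlib.compress(data))
    return oid


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as git_dir:
        oid = write_loose_object(git_dir, b"Hello World")
        stored = os.path.join(git_dir, "objects", oid[:2], oid[2:])
        print(oid, os.path.getsize(stored), "bytes on disk")
```

The existence check is what makes storing the same content twice essentially free: the second call computes the same ID and finds the file already there.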

The following plumbing commands can be used on blob objects:

git cat-file -p blob_hash   # shows original content
git cat-file -s blob_hash   # shows uncompressed size
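Under the hood, `git cat-file` has to undo that storage step. The Python sketch below shows the idea for a loose object only: decompress the file, split the header from the content at the first NUL byte, and return the type, size, and content. It does not look inside packfiles, so in a real repository it only works for objects that have not been packed yet; `read_loose_object` is a hypothetical helper.

```python
import hashlib
import os
import tempfile
import zlib
from typing import Tuple


def read_loose_object(git_dir: str, oid: str) -> Tuple[str, int, bytes]:
    """Return (type, size, content) for a loose object, in the spirit of
    `git cat-file -t`, `-s` and `-p` (no pretty-printing of trees here)."""
    path = os.path.join(git_dir, "objects", oid[:2], oid[2:])
    with open(path, "rb") as f:
        raw = zlib.decompress(f.read())
    header, _, content = raw.partition(b"\0")  # header looks like b"blob 11"
    obj_type, size = header.decode().split(" ")
    if int(size) != len(content):
        raise ValueError(f"corrupt object {oid}: size mismatch")
    return obj_type, int(size), content


if __name__ == "__main__":
    # Hand-write one loose blob into a throwaway directory, then read it back.
    with tempfile.TemporaryDirectory() as git_dir:
        data = b"blob 11\0Hello World"
        oid = hashlib.sha1(data).hexdigest()
        os.makedirs(os.path.join(git_dir, "objects", oid[:2]))
        with open(os.path.join(git_dir, "objects", oid[:2], oid[2:]), "wb") as f:
            f.write(zlib.compress(data))
        print(read_loose_object(git_dir, oid))  # ('blob', 11, b'Hello World')
```

Against a real repository you would pass the path of your `.git` directory and a full 40-character hash; if the object has already been packed, the loose file will not exist and `open` will fail.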

Tree

Tree objects store the actual directory structure. They store a mapping that looks like this:

100644 blob a1b2c3  main.cpp
040000 tree d4e5f6  src

Each line of the above output follows the format:

mode object_type hash filename

In other words, a tree maps each filename to the blob holding that file’s content, and each sub-directory name to its own tree object (a sub-tree).

The following plumbing command can be used to view a tree object tied to a particular commit:

git ls-tree commit_hash

In simpler words, a tree represents a directory and can point to:

  • blobs (raw file content)

  • other trees (sub-directories)

💡
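Why are the filenames stored in the tree object and not in the blob itself? If multiple files have the same content, Git stores that content only once, and the tree simply holds multiple entries that point to the same blob. The same goes for unchanged files across commits: a new commit’s tree can keep pointing at the old blobs. This makes Git space-efficient.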
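The raw bytes of a tree differ slightly from what `git ls-tree` prints: each entry is stored as `<mode> <name>`, a NUL byte, and then the 20-byte binary hash (not hex), and a directory’s mode is written as `40000` without the leading zero. The Python sketch below hashes a tree from such entries under that assumption, and follows Git’s ordering rule where entries are sorted by name and directory names compare as if they ended with `/`. `hash_tree` is a hypothetical helper, and the file contents in the demo are made up.

```python
import hashlib
from typing import List, Tuple


def hash_tree(entries: List[Tuple[str, str, str]]) -> str:
    """Hash a tree object from (mode, name, hex object ID) entries."""

    def sort_key(entry: Tuple[str, str, str]) -> bytes:
        mode, name, _ = entry
        # Directories sort as if their name had a trailing "/".
        return name.encode() + (b"/" if mode == "40000" else b"")

    body = b""
    for mode, name, oid in sorted(entries, key=sort_key):
        body += f"{mode} {name}".encode() + b"\0" + bytes.fromhex(oid)
    return hashlib.sha1(f"tree {len(body)}".encode() + b"\0" + body).hexdigest()


if __name__ == "__main__":
    def blob_id(content: bytes) -> str:
        return hashlib.sha1(f"blob {len(content)}".encode() + b"\0" + content).hexdigest()

    same = blob_id(b"int main(void) { return 0; }\n")
    # Two filenames, one blob: identical content is stored only once.
    src = hash_tree([("100644", "main.c", same), ("100644", "copy.c", same)])
    root = hash_tree([("40000", "src", src), ("100644", "README.md", blob_id(b"# demo\n"))])
    print("src tree:", src)
    print("root tree:", root)
```

Because entries are sorted before hashing, the same directory contents always produce the same tree ID, no matter in which order the files were listed.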

Commit

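A commit object ties everything together: it points to a snapshot (the root tree), records the history (its parent commits), and stores metadata such as the author, committer, and message. The following plumbing command can be used to view a commit object: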

git cat-file -p commit_hash

The output would look something like this:

tree root_tree_hash
parent parent_commit_hash (0 or more)
author name email timestamp
committer name email timestamp

commit message
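Since a commit object is just text, it can be built by hand. The Python sketch below assembles the fields shown above and hashes them with a `commit` header. It is a simplification under stated assumptions: the same person is used as author and committer, the timezone defaults to `+0000`, and signatures and other optional headers are ignored. `build_commit` and the sample name and email are made up for illustration.

```python
import hashlib
import time
from typing import List, Optional, Tuple


def build_commit(tree: str, parents: List[str], person: str, message: str,
                 timestamp: Optional[int] = None, tz: str = "+0000") -> Tuple[str, bytes]:
    """Serialize a commit object and return (commit ID, raw body).

    `person` is "Name <email>" and is used for both author and committer.
    """
    ts = int(time.time()) if timestamp is None else timestamp
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]  # 0 = root commit, 2+ = merge
    lines.append(f"author {person} {ts} {tz}")
    lines.append(f"committer {person} {ts} {tz}")
    if not message.endswith("\n"):
        message += "\n"  # Git normally stores the message with a trailing newline
    body = ("\n".join(lines) + "\n\n" + message).encode()
    header = f"commit {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest(), body


if __name__ == "__main__":
    empty_tree = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # ID of a tree with no entries
    oid, body = build_commit(empty_tree, [], "Jane Doe <jane@example.com>", "Initial commit",
                             timestamp=1700000000)
    print(body.decode())
    print("commit ID:", oid)
```

Note that the parent IDs are part of the hashed text, so changing any earlier commit would change the ID of every commit after it.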
💡
The entire commit history forms a graph, not a simple list. This is because a commit can have multiple parents (a merge), and multiple commits can share the same parent (branching). More precisely, it is a DAG (Directed Acyclic Graph). It is acyclic because a commit’s hash covers its parents’ hashes, so no commit can ever point back to itself, directly or indirectly.
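Because each commit only stores links to its parents, walking history means following those links while remembering which commits were already visited, since a merge makes two paths lead back to the same ancestor. Here is a small Python sketch of that walk over a hypothetical parent map (commit ID to list of parent IDs) in breadth-first order. `git log` performs a similar traversal, but shows commits in reverse chronological order by default, so its output order can differ.

```python
from collections import deque
from typing import Dict, List


def ancestors(start: str, parents: Dict[str, List[str]]) -> List[str]:
    """Return every commit reachable from `start` via parent links, each once."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        commit = queue.popleft()
        order.append(commit)
        for parent in parents.get(commit, []):
            if parent not in seen:  # a merge can reach the same ancestor twice
                seen.add(parent)
                queue.append(parent)
    return order


if __name__ == "__main__":
    # A is the root; B and C branch off A; D merges B and C.
    history = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
    print(ancestors("D", history))  # ['D', 'B', 'C', 'A']
```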

How the Objects Relate

Git commit points to a tree, and the tree points to blobs and subtrees.
  • Commits define history.

  • Trees define structure.

  • Blobs define content.

A commit points to exactly one tree: the root tree, which represents the entire directory structure of the project at that commit. That tree in turn points to sub-trees and to the blobs that hold the actual file data.
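Following those pointers from a commit down to every file is what a command like `git ls-tree -r` does. The Python sketch below performs the same walk over a hypothetical in-memory object store in which trees are already parsed into (mode, name, ID) tuples. It ignores the on-disk byte format, and short IDs like `t_root` are placeholders rather than real hashes.

```python
from typing import Dict


def list_files(store: Dict[str, tuple], commit_id: str) -> Dict[str, str]:
    """Map every file path in a commit's snapshot to its blob ID.

    `store` maps IDs to parsed objects:
      ("commit", {"tree": tree_id}), ("tree", [(mode, name, id), ...]), ("blob", bytes)
    """
    files: Dict[str, str] = {}

    def walk(tree_id: str, prefix: str) -> None:
        _, entries = store[tree_id]
        for mode, name, oid in entries:
            if mode == "40000":  # sub-tree: descend into the directory
                walk(oid, prefix + name + "/")
            else:                # blob: record the file path
                files[prefix + name] = oid

    _, commit = store[commit_id]
    walk(commit["tree"], "")  # a commit points to exactly one root tree
    return files


if __name__ == "__main__":
    store = {
        "c1": ("commit", {"tree": "t_root"}),
        "t_root": ("tree", [("100644", "README.md", "b1"), ("40000", "src", "t_src")]),
        "t_src": ("tree", [("100644", "copy.c", "b2"), ("100644", "main.c", "b2")]),
        "b1": ("blob", b"# demo\n"),
        "b2": ("blob", b"int main(void) { return 0; }\n"),
    }
    print(list_files(store, "c1"))
```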


Other Parts of the .git Directory

Let’s quickly go over the other files and directories of the .git directory:

refs

refs is a directory with the following structure:

refs/
└── heads/
    ├── main
    ├── feature1_branch
    └── feature2_branch

Each file represents a branch and stores the hash of the latest commit on that branch.

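HEAD

This is a file that records which branch you are currently on. If your current branch is main, the HEAD file’s content would be this:

ref: refs/heads/main

If you check out a specific commit instead of a branch (a “detached HEAD”), the file holds that commit’s hash directly.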
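Resolving HEAD to a commit is then a matter of following these files. The Python sketch below handles a HEAD that points to a branch as well as a detached HEAD that holds a commit hash directly. It also assumes one extra detail: Git may move branch pointers into a single `.git/packed-refs` file (for example after `git gc`), so the sketch falls back to that file when the loose branch file is missing. Worktrees and other edge cases are ignored; `resolve_head` is a hypothetical helper that roughly mirrors what `git rev-parse HEAD` returns in simple cases.

```python
import os
import tempfile


def resolve_head(git_dir: str) -> str:
    """Follow .git/HEAD to the commit hash it ultimately points to."""
    with open(os.path.join(git_dir, "HEAD")) as f:
        head = f.read().strip()
    if not head.startswith("ref: "):
        return head                              # detached HEAD: already a hash
    ref = head[len("ref: "):]                    # e.g. "refs/heads/main"
    loose = os.path.join(git_dir, ref)
    if os.path.exists(loose):
        with open(loose) as f:
            return f.read().strip()
    packed = os.path.join(git_dir, "packed-refs")
    if os.path.exists(packed):
        with open(packed) as f:
            for line in f:
                if line.startswith(("#", "^")):  # header comment / peeled tag line
                    continue
                oid, _, name = line.strip().partition(" ")
                if name == ref:
                    return oid
    raise LookupError(f"{ref} does not exist yet (no commits on this branch?)")


if __name__ == "__main__":
    # Throwaway layout: HEAD -> refs/heads/main -> a made-up commit hash.
    with tempfile.TemporaryDirectory() as git_dir:
        os.makedirs(os.path.join(git_dir, "refs", "heads"))
        with open(os.path.join(git_dir, "HEAD"), "w") as f:
            f.write("ref: refs/heads/main\n")
        with open(os.path.join(git_dir, "refs", "heads", "main"), "w") as f:
            f.write("1" * 40 + "\n")
        print(resolve_head(git_dir))
```

The `LookupError` case is exactly the state of a freshly initialized repository: HEAD already names `refs/heads/main`, but that file does not exist until the first commit.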

index

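This is a binary file that represents the Git staging area. It lists every file that will go into the next commit (its path, mode, and blob hash, plus file-system metadata that helps Git detect changes quickly). Running git add updates the entries for the files you add, so Git knows exactly what to include in the next commit.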
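The index is a binary file, so you can’t just read it as text, but its layout is documented in Git’s index format documentation. Assuming the common version-2 format, the Python sketch below reads the header and each entry’s path, mode, and blob ID. It skips the meaning of the stat fields, ignores extensions and the trailing checksum, and does not handle versions 3 or 4; `parse_index` is a hypothetical helper. On a real repository, `git ls-files --stage` prints the same information.

```python
import struct
from typing import List, Tuple


def parse_index(data: bytes) -> List[Tuple[str, str, int]]:
    """Return (path, blob ID, mode) for each entry of a version-2 index file."""
    signature, version, count = struct.unpack(">4sLL", data[:12])
    if signature != b"DIRC" or version != 2:
        raise ValueError("not a version-2 Git index")
    entries, pos = [], 12
    for _ in range(count):
        # 10 x 32-bit stat fields, 20-byte SHA-1, 16-bit flags = 62 bytes
        fields = struct.unpack(">10L20sH", data[pos:pos + 62])
        mode, sha, flags = fields[6], fields[10], fields[11]
        name_len = flags & 0x0FFF                # low 12 bits hold the path length
        if name_len == 0x0FFF:                   # very long path: scan for the NUL
            name_len = data.index(b"\0", pos + 62) - (pos + 62)
        path = data[pos + 62:pos + 62 + name_len].decode()
        entries.append((path, sha.hex(), mode))
        pos += (62 + name_len + 8) & ~7          # entry padded with 1-8 NUL bytes
    return entries
```

For example, reading `.git/index` in binary mode and printing each entry with `f"{mode:o} {oid} {path}"` gives lines such as `100644 <hash> main.c`, provided the repository’s index is version 2.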

config

This is a file that stores repository-specific settings, such as remote URLs (e.g. origin), which remote branch each local branch tracks, and other local configuration.


Internal Flow of git add and git commit

Staging

Git add flow: working directory → blob object → staging area (index).
  1. Let’s say you edit a file called file.txt in your working directory and run the git add file.txt command.

  2. Git reads the content of the file, prepends the blob header, hashes the result to get the blob ID, compresses it, and stores it as a blob object inside the .git/objects directory. If a blob with that hash already exists, Git simply reuses it instead of creating a new one.

  3. Git updates the index (i.e., the staging area). It records in the .git/index file that “for file.txt, the staged version is this blob hash”.
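Put together, those staging steps fit in a few lines of Python. This is a toy model under explicit assumptions: the index is a plain dict (path to blob ID) instead of the real binary .git/index, only regular files are handled, and modes and file-system metadata are ignored. `git_add` is a hypothetical helper, not how Git itself is implemented.

```python
import hashlib
import os
import tempfile
import zlib
from typing import Dict


def git_add(work_tree: str, index: Dict[str, str], rel_path: str) -> str:
    """Toy `git add`: store the file's content as a blob and stage it."""
    with open(os.path.join(work_tree, rel_path), "rb") as f:
        content = f.read()
    data = f"blob {len(content)}".encode() + b"\0" + content
    oid = hashlib.sha1(data).hexdigest()
    obj_path = os.path.join(work_tree, ".git", "objects", oid[:2], oid[2:])
    if not os.path.exists(obj_path):     # same content: reuse the existing blob
        os.makedirs(os.path.dirname(obj_path), exist_ok=True)
        with open(obj_path, "wb") as f:
            f.write(zlib.compress(data))
    index[rel_path] = oid                # "the staged version of rel_path is oid"
    return oid


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as work:
        with open(os.path.join(work, "file.txt"), "w") as f:
            f.write("Hello World\n")
        index: Dict[str, str] = {}
        git_add(work, index, "file.txt")
        print(index)
```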

Committing

Git commit flow: staging area → tree object → commit object → branch pointer updated.
  1. When you run git commit after staging your changes, Git first reads the index, whose contents are about to become a permanent snapshot in Git history.

  2. Git converts the flat list of paths in the index into tree objects: one per directory, with the root tree representing the whole project.

  3. Git creates a commit object that stores:

    • the tree hash (snapshot)

    • the parent commit hash (previous commit)

    • author and committer info + timestamps

    • commit message

  4. Git moves the branch pointer: it updates the .git/refs/heads/branch_name file of the branch HEAD points to, so that the branch now points to the new commit hash.
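The whole commit flow can be sketched the same way. This Python toy keeps objects and refs in dictionaries rather than files, treats every file as mode `100644`, and uses one person as both author and committer; `store`, `write_tree`, and `git_commit` are hypothetical names. What it does show is the order of operations: build trees from the flat index (children before parents, because a parent tree needs its children’s IDs), create the commit with the current branch tip as its parent, then move the branch.

```python
import hashlib
import time
from typing import Dict, Optional


def store(objects: Dict[str, bytes], obj_type: str, body: bytes) -> str:
    """Hash `body` with its Git header and keep it in an in-memory object store."""
    raw = f"{obj_type} {len(body)}".encode() + b"\0" + body
    oid = hashlib.sha1(raw).hexdigest()
    objects[oid] = raw
    return oid


def write_tree(objects: Dict[str, bytes], index: Dict[str, str], prefix: str = "") -> str:
    """Turn the flat index (path -> blob ID) into nested tree objects; return the root ID."""
    files: Dict[str, str] = {}
    subdirs = set()
    for path, oid in index.items():
        if path.startswith(prefix):
            rest = path[len(prefix):]
            if "/" in rest:
                subdirs.add(rest.split("/", 1)[0])
            else:
                files[rest] = oid
    # (sort key, mode, ID); directories sort as if named "dir/"
    entries = [(name.encode(), b"100644", oid) for name, oid in files.items()]
    for d in subdirs:
        child = write_tree(objects, index, prefix + d + "/")  # children first
        entries.append((d.encode() + b"/", b"40000", child))
    body = b""
    for key, mode, oid in sorted(entries):
        body += mode + b" " + key.rstrip(b"/") + b"\0" + bytes.fromhex(oid)
    return store(objects, "tree", body)


def git_commit(objects: Dict[str, bytes], refs: Dict[str, str], index: Dict[str, str],
               branch: str, person: str, message: str,
               timestamp: Optional[int] = None) -> str:
    """Toy `git commit`: index -> trees -> commit object -> move the branch."""
    tree = write_tree(objects, index)
    parent = refs.get(branch)  # None on the very first commit
    ts = int(time.time()) if timestamp is None else timestamp
    lines = [f"tree {tree}"] + ([f"parent {parent}"] if parent else [])
    lines += [f"author {person} {ts} +0000", f"committer {person} {ts} +0000"]
    oid = store(objects, "commit", ("\n".join(lines) + "\n\n" + message + "\n").encode())
    refs[branch] = oid  # like rewriting .git/refs/heads/<branch>
    return oid


if __name__ == "__main__":
    objects: Dict[str, bytes] = {}
    refs: Dict[str, str] = {}
    empty_blob = "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"
    index = {"README.md": empty_blob, "src/main.c": empty_blob}
    first = git_commit(objects, refs, index, "main", "Jane Doe <jane@example.com>", "First")
    second = git_commit(objects, refs, index, "main", "Jane Doe <jane@example.com>", "Second")
    print(refs["main"] == second, f"parent {first}".encode() in objects[second])  # True True
```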


Summary

  • Commit → points to a tree → points to blobs and sub-trees

  • Refs/branches → point to commits

  • HEAD → tells which branch you’re on


Conclusion

Git might look like magic from the outside, but internally it’s built on a very simple idea: store everything as objects, and reference them using hashes. Once you understand blobs, trees, commits, and how files like HEAD, refs, and index work together, Git starts to feel a lot less mysterious.