How Git Works Internally: Understanding the Magic Behind Version Control
Explore Git's internal architecture — blobs, trees, commits, and SHA-1 hashes. Learn what really happens when you run git add and git commit.
Ever stop and think about what really happens when you run git add or git commit? A lot of developers use Git every day, but honestly, most don’t know much about what’s going on behind the scenes. Let’s change that. Let’s take a peek under the hood and see what makes Git tick.
Introduction
So, here’s the thing: Git isn’t just a version control tool. At its core, it’s more like a content-addressable filesystem, with a version control system layered on top. Basically, you give Git some content, and it hands you back a unique key you can use to get that content again. It’s a pretty clever system.
Once you get how Git actually works inside, you stop just memorizing commands. You start to really understand it. Suddenly, debugging weird issues gets easier. You can fix mistakes without panicking. And those advanced features? They don’t seem so intimidating anymore.
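To make the "content-addressable" idea concrete, here is a minimal Python sketch of a store whose keys are derived from the content itself. The class name ContentStore and its put/get methods are made up for illustration, and it hashes the raw bytes directly, so treat it as a picture of the idea rather than Git's actual storage format.

```python
import hashlib


class ContentStore:
    """Toy content-addressable store: the key is the SHA-1 of the value."""

    def __init__(self):
        self._objects = {}

    def put(self, content: bytes) -> str:
        key = hashlib.sha1(content).hexdigest()
        # Storing the same content twice lands on the same key.
        self._objects[key] = content
        return key

    def get(self, key: str) -> bytes:
        return self._objects[key]


store = ContentStore()
key = store.put(b"Hello, World!")
print(key, store.get(key))
```

You never choose the key yourself. The content decides it, which is why the same content always maps to the same key.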
Prerequisites
- Basic familiarity with Git commands (init, add, commit)
- Access to a terminal/command line
- A code editor
The .git Folder Explained
When you run git init, Git creates a hidden .git directory. This folder is your repository — everything Git needs lives here.
$ git init my-project
$ cd my-project
$ ls -la .git/
Here's what you'll find:
.git/
├── HEAD          # Points to the current branch
├── config        # Repository-specific configuration
├── description   # Used by GitWeb (rarely needed)
├── hooks/        # Client- and server-side hook scripts
├── info/         # Repository-local exclude patterns (info/exclude)
├── objects/      # All content (blobs, trees, commits)
└── refs/         # Pointers to commits (branches, tags)

The Four Critical Components
| Component | Purpose |
|-----------|---------|
| HEAD | A symbolic reference pointing to the current branch |
| index | The staging area (created after the first git add) |
| objects/ | The object database, which stores all your content |
| refs/ | References to commit objects (branches, tags, remotes) |
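💡 Tip: Want to back up your entire repository? Just copy the .git folder. It contains the full history, all branches and tags, and the repository configuration (uncommitted changes in your working directory are not part of it).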
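HEAD and refs/ from the table above are plain text files, so following them by hand is easy. The sketch below is a simplified assumption of how that lookup works: the function name resolve_head is hypothetical, it only handles loose refs (not a packed-refs file), and the demo runs against a throwaway directory that mimics a tiny .git layout rather than a real repository.

```python
import tempfile
from pathlib import Path


def resolve_head(git_dir: Path) -> str:
    """Return the commit hash HEAD points to, following one symbolic ref."""
    head = (git_dir / "HEAD").read_text().strip()
    if head.startswith("ref: "):
        # Symbolic ref, e.g. "ref: refs/heads/main"
        ref_path = git_dir / head[len("ref: "):]
        return ref_path.read_text().strip()
    # A detached HEAD stores the commit hash directly.
    return head


# Demo: build a fake .git layout in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp)
    (git_dir / "refs" / "heads").mkdir(parents=True)
    (git_dir / "HEAD").write_text("ref: refs/heads/main\n")
    (git_dir / "refs" / "heads" / "main").write_text(
        "cac0cab538b970a37ea1e769cbbde608743bc96d\n"
    )
    resolved = resolve_head(git_dir)
    print(resolved)
```

In a real repository with at least one commit you could try resolve_head(Path(".git")), keeping in mind that Git may have moved the branch into .git/packed-refs.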
Git Objects: The Building Blocks
At the heart of Git, things are refreshingly straightforward. Everything in your repository boils down to just three main object types (a fourth, the annotated tag object, isn't covered here):
1. Blob (Binary Large Object)
A blob is as simple as it gets. It holds the content of a file—nothing else. No filenames, no permissions, no extra details. Just the raw data.
# See what type of object a hash represents
$ git cat-file -t 83baae61804e65cc73a7201a7252750c76066a30
blob
# View the content of a blob
$ git cat-file -p 83baae61804e65cc73a7201a7252750c76066a30
version 1
ℹ️ Note: Two files with identical content share the same blob object, regardless of their names. This is how Git achieves efficient storage!
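Here is a small Python sketch of why names don't matter to a blob. The helper blob_id and the file names are hypothetical; the only thing taken from Git is the "blob <size>\0" header that goes in front of the content before hashing.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Compute the object id Git would give a blob with this content."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Three paths, but two of them hold identical bytes.
files = {
    "README.md": b"test content\n",
    "docs/copy-of-readme.md": b"test content\n",
    "index.js": b"console.log('hi')\n",
}

blobs = {}
for name, content in files.items():
    # The filename never enters the hash, so duplicates collapse to one entry.
    blobs.setdefault(blob_id(content), content)

print(len(files), "files ->", len(blobs), "blobs")  # 3 files -> 2 blobs
```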
2. Tree
Think of a tree as Git’s way of organizing files and folders. A tree:
- points to blobs (which are files)
- points to other trees (which are subdirectories)
- keeps track of the filename and permissions of each entry
$ git cat-file -p main^{tree}
100644 blob a906cb2a4a904a152e80877d4088654daad0c859 README.md
100644 blob 8f94139338f9404f26296befa88755fc2598c289 index.js
040000 tree 99f1a6d12cb4b6f19c8655fca46c3ecf317074e0 src
The numbers represent file modes:
- 100644: Normal file
- 100755: Executable file
- 040000: Directory (tree)
- 120000: Symbolic link
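If you're curious how those entries turn into a tree hash, here is a rough Python sketch. The function name tree_id is hypothetical, and the entry hashes are just the ones from the listing above. Two details worth knowing: inside the raw object the hash is stored as 20 binary bytes, and the directory mode is written as 40000 (cat-file -p pads it to 040000 for display).

```python
import hashlib


def tree_id(entries):
    """Hash a tree from (mode, name, hex_object_id) tuples."""

    def sort_key(entry):
        mode, name, _ = entry
        # Git orders directory entries as if their names ended with "/".
        return name.encode() + (b"/" if mode == "40000" else b"")

    body = b""
    for mode, name, oid in sorted(entries, key=sort_key):
        # Raw entry: "<mode> <name>\0" followed by the 20 raw bytes of the hash.
        body += mode.encode() + b" " + name.encode() + b"\0" + bytes.fromhex(oid)
    header = b"tree " + str(len(body)).encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


entries = [
    ("100644", "README.md", "a906cb2a4a904a152e80877d4088654daad0c859"),
    ("100644", "index.js", "8f94139338f9404f26296befa88755fc2598c289"),
    ("40000", "src", "99f1a6d12cb4b6f19c8655fca46c3ecf317074e0"),
]
print(tree_id(entries))
```

Because a tree's hash covers the hashes of its children, changing one file deep inside src/ changes every tree on the path up to the root.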
3. Commit
This is where it all comes together. A commit:
- points to a specific tree (your project at that moment)
- links back to its parent commit or commits (so you can trace the history)
- records who made the change and when it happened
- carries the message that explains why
$ git cat-file -p HEAD
tree d8329fc1cc938780ffdd9f94e0d364e0ea74f579
parent cac0cab538b970a37ea1e769cbbde608743bc96d
author John Doe <john.doe@example.com> 1609459200 +0000
committer John Doe <john.doe@example.com> 1609459200 +0000

Add new feature to the project
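A commit object is just that text with a small header in front, hashed like everything else. The Python sketch below rebuilds it under a few assumptions: commit_id is a hypothetical helper, the identity string and timestamp are copied from the example above, and extras like GPG signatures are ignored.

```python
import hashlib


def commit_id(tree, parents, author, committer, message):
    """Hash a commit built from its tree, parents, identities, and message."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    # A blank line separates the headers from the commit message.
    lines += [f"author {author}", f"committer {committer}", "", message]
    body = ("\n".join(lines) + "\n").encode()
    header = b"commit " + str(len(body)).encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


ident = "John Doe <john.doe@example.com> 1609459200 +0000"
tree = "d8329fc1cc938780ffdd9f94e0d364e0ea74f579"
parent = "cac0cab538b970a37ea1e769cbbde608743bc96d"

original = commit_id(tree, [parent], ident, ident, "Add new feature to the project")
reworded = commit_id(tree, [parent], ident, ident, "Add a new feature")
print(original)
print(reworded)
```

Since the parent hash is part of the hashed text, rewording an old commit gives it a new hash, and every commit after it gets a new hash too.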

The Object Relationship Model
Here's how these objects connect:
Commit
 ├── tree: d8329fc...  ──────►  Tree
 ├── parent: cac0cab...          ├── blob: README.md
 ├── author: John Doe            ├── blob: index.js
 ├── committer: John Doe         └── tree: src/
 └── message: "Add feature"           └── blob: app.js
How Git Tracks Changes
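Git doesn’t store your history as a chain of diffs between file versions. Instead, each commit records a full snapshot of the project. Sounds like it’d eat up a ton of space, right? It doesn’t: a file that hasn’t changed hashes to the same blob as before, so the new snapshot just points at the existing object, and Git can compress objects further into packfiles later on.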
The Staging Area (Index)
The staging area lives in a file called .git/index. It keeps track of what you’re about to commit next. Picture it like a draft of your upcoming snapshot, not the final version, but pretty close.
Working Directory          Staging Area            Repository
       │                        │                       │
       │    git add file.txt    │                       │
       │ ─────────────────────► │                       │
       │                        │      git commit       │
       │                        │ ────────────────────► │
       │                        │                       │
How Git Stores Objects
Every object starts out as a zlib-compressed file in .git/objects/, named after its SHA-1 hash (Git may later move objects into packfiles under .git/objects/pack/):
$ find .git/objects -type f
.git/objects/83/baae61804e65cc73a7201a7252750c76066a30
.git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4
Notice the directory structure:
- First 2 characters of the hash → directory name
- Remaining 38 characters → filename
This prevents any single directory from having too many files.
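The layout above is simple enough to reproduce. Here is a Python sketch that writes and reads a loose object the same way, using a temporary directory instead of a real .git/objects/. The function names write_loose_object and read_loose_object are hypothetical, and packfiles are out of scope.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path


def write_loose_object(objects_dir: Path, obj_type: str, content: bytes) -> str:
    """Store content as '<type> <size>\\0<content>', zlib-compressed."""
    raw = obj_type.encode() + b" " + str(len(content)).encode() + b"\0" + content
    oid = hashlib.sha1(raw).hexdigest()
    # First 2 hex characters name the directory, the other 38 the file.
    path = objects_dir / oid[:2] / oid[2:]
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(zlib.compress(raw))
    return oid


def read_loose_object(objects_dir: Path, oid: str):
    """Return (type, content) for a stored object."""
    raw = zlib.decompress((objects_dir / oid[:2] / oid[2:]).read_bytes())
    header, _, content = raw.partition(b"\0")
    obj_type, _size = header.decode().split(" ")
    return obj_type, content


with tempfile.TemporaryDirectory() as tmp:
    objects_dir = Path(tmp) / "objects"
    oid = write_loose_object(objects_dir, "blob", b"test content\n")
    relative_path = f"{oid[:2]}/{oid[2:]}"
    obj_type, content = read_loose_object(objects_dir, oid)
    print(relative_path, obj_type, content)
```

The path it prints for "test content" plus a newline is the second file in the find listing above.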
SHA-1 Hashes and Data Integrity
Git uses a 40-character SHA-1 hash to identify every object. This isn’t just some random string—it’s a cryptographic fingerprint of the object’s actual content.
How Git Calculates Hashes
Git hashes the content together with a short header that records the object type and the content size in bytes:
header = "blob " + <content size in bytes> + "\0"
hash = SHA1(header + content)
For example, hashing "Hello, World!" (plus the trailing newline that echo adds, which counts as content):
$ echo "Hello, World!" | git hash-object --stdin
8ab686eafeb1f44702738c8b0f24f2567c36da6d
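You can reproduce the calculation without Git at all. This is a minimal Python sketch; git_blob_hash is a name chosen for this example, and it assumes the default SHA-1 object format. The one thing that trips people up is that the size in the header counts bytes, not characters.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    # Header: object type, a space, the size in bytes, then a NUL byte.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Same result as: echo "test content" | git hash-object --stdin
print(git_blob_hash(b"test content\n"))  # d670460b4b4aece5915caf5c68d12f560a9fe3e4

# Characters and bytes differ as soon as the text is not plain ASCII.
text = "h\u00e9llo"
encoded = text.encode("utf-8")
print(len(text), len(encoded))  # 5 6
print(git_blob_hash(encoded))
```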
Why This Matters
- Data Integrity: If even a single bit in an object changes, the hash changes completely. That way, Git detects the corruption whenever it reads or verifies the object.
- Deduplication: When two files have the same content, they get the same hash. Git only stores that content once.
- Immutability: You can’t change an object in place. Changed content produces a new object with a new hash, and everything that referenced the old hash (trees, commits) has to be rewritten to point to the new one.
# Verify object integrity
$ git fsck
Checking object directories: 100% (256/256), done.
Warning: Never manually edit files in .git/objects/. You'll corrupt your repository because the filename (hash) won't match the content.
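The core of that integrity check fits in a few lines. The Python sketch below is a simplified stand-in, not how git fsck is actually implemented: is_intact is a hypothetical helper that decompresses a loose object and checks that its hash still matches the name it is stored under.

```python
import hashlib
import zlib


def is_intact(oid: str, compressed: bytes) -> bool:
    """True if the stored bytes still hash to the name they are filed under."""
    try:
        raw = zlib.decompress(compressed)
    except zlib.error:
        return False  # not even valid zlib data any more
    return hashlib.sha1(raw).hexdigest() == oid


raw = b"blob 13\0test content\n"
oid = hashlib.sha1(raw).hexdigest()

good = zlib.compress(raw)
# Simulate someone editing the object by hand: the content changes, the name doesn't.
tampered = zlib.compress(raw.replace(b"test", b"TEST"))

print(is_intact(oid, good))      # True
print(is_intact(oid, tampered))  # False
```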
What Actually Happens When You Run git add and git commit
Let’s break down what’s really going on under the hood with these commands.
What git add Does
So, you hit git add file.txt. Here’s what happens:
1. Git takes the contents of file.txt and hashes it. This creates what’s called a “blob” object.
2. Git stores that blob inside the .git/objects/ directory.
3. Git updates the index (that’s .git/index) to include your file. That way, Git knows you want this exact version in your next commit.
# Before git add
$ git status
Untracked files:
file.txt
# Run git add
$ git add file.txt
# A new blob is created
$ find .git/objects -type f
.git/objects/83/baae61804e65cc73a7201a7252750c76066a30
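The three steps of git add can be modeled with two dictionaries. This is a toy Python sketch, not Git's real index format (which is a binary file with more metadata): objects stands in for .git/objects/, index stands in for .git/index, and git_add is a hypothetical function named after the command it imitates.

```python
import hashlib

objects = {}  # stands in for .git/objects: object id -> content
index = {}    # stands in for .git/index: path -> (mode, blob id)


def git_add(path: str, content: bytes, mode: str = "100644") -> str:
    # Step 1: hash the content together with the blob header.
    header = b"blob " + str(len(content)).encode() + b"\0"
    oid = hashlib.sha1(header + content).hexdigest()
    # Step 2: store the blob in the object database.
    objects[oid] = content
    # Step 3: record in the index which blob this path should have in the next commit.
    index[path] = (mode, oid)
    return oid


first = git_add("file.txt", b"version 1\n")
second = git_add("file.txt", b"version 2\n")

print(index["file.txt"])
print(len(objects), "blobs stored")
```

Staging the file a second time points the index at the new blob, but the first blob stays in the object database. That is why content you staged once can often still be recovered.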
What git commit Does
When you run git commit -m "message", here’s what goes on:
1. Git takes a snapshot of everything you’ve staged and writes it as tree objects (one per directory).
2. Then it builds a commit object. This one points to the top-level tree, links to the parent commit if there is one, and adds info like who made the commit and when.
3. Finally, Git moves the branch pointer forward, so it now points to your new commit.
# Create a commit
$ git commit -m "Initial commit"
# Inspect the new commit object
$ git cat-file -p HEAD
tree d8329fc1cc938780ffdd9f94e0d364e0ea74f579
author You <you@email.com> 1609459200 +0000
committer You <you@email.com> 1609459200 +0000

Initial commit
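Here is the same sequence as a toy Python model. It makes several simplifying assumptions: the index is flat (no subdirectories, so there is only one tree), the identity string is fixed to the one from the example above, and the names store, git_commit, objects, and refs are all invented for this sketch.

```python
import hashlib

objects = {}  # object id -> raw object bytes
refs = {}     # branch name -> commit id


def store(obj_type: str, body: bytes) -> str:
    raw = obj_type.encode() + b" " + str(len(body)).encode() + b"\0" + body
    oid = hashlib.sha1(raw).hexdigest()
    objects[oid] = raw
    return oid


def git_commit(index, message, branch="refs/heads/main",
               ident="You <you@email.com> 1609459200 +0000"):
    # Step 1: snapshot the index as a tree object.
    tree_body = b"".join(
        mode.encode() + b" " + path.encode() + b"\0" + bytes.fromhex(oid)
        for path, (mode, oid) in sorted(index.items(), key=lambda item: item[0].encode())
    )
    tree = store("tree", tree_body)
    # Step 2: build the commit, linking to the current branch tip if there is one.
    parent = refs.get(branch)
    lines = [f"tree {tree}"]
    if parent:
        lines.append(f"parent {parent}")
    lines += [f"author {ident}", f"committer {ident}", "", message]
    commit = store("commit", ("\n".join(lines) + "\n").encode())
    # Step 3: move the branch pointer to the new commit.
    refs[branch] = commit
    return commit


index = {"file.txt": ("100644", "83baae61804e65cc73a7201a7252750c76066a30")}
first = git_commit(index, "Initial commit")
second = git_commit(index, "Second commit")
print(refs["refs/heads/main"] == second)  # True
```

The first commit has no parent line, just like the output above. The second one links back to the first and reuses the same tree, because nothing in the index changed.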

The Complete Picture
             Working Directory
                     │
               git add file
                     │
                     ▼
┌─────────────────────────────────────────┐
│          Staging Area (Index)           │
│  ┌───────────────────────────────────┐  │
│  │ file.txt → blob 83baae61...       │  │
│  └───────────────────────────────────┘  │
└─────────────────────────────────────────┘
                     │
            git commit -m "msg"
                     │
                     ▼
┌─────────────────────────────────────────┐
│             Object Database             │
│  ┌───────────────────────────────────┐  │
│  │ Commit: abc123...                 │  │
│  │ └── Tree: d8329f...               │  │
│  │     └── Blob: 83baae...           │  │
│  └───────────────────────────────────┘  │
└─────────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────┐
│             refs/heads/main             │
│          Points to: abc123...           │
└─────────────────────────────────────────┘
Exploring Git Internals Yourself
Here are some commands to explore your own repositories:
Plumbing Commands (Low-Level)
# Hash content without storing
$ echo "test content" | git hash-object --stdin
# Hash and store content
$ echo "test content" | git hash-object -w --stdin
# View object type
$ git cat-file -t <hash>
# View object content
$ git cat-file -p <hash>
# View object size
$ git cat-file -s <hash>
Examine Your Repository
# List all objects
$ find .git/objects -type f
# View the staging area
$ git ls-files --stage
# Check HEAD reference
$ cat .git/HEAD
# Check branch reference (if the file is missing, the ref has been packed; use git rev-parse main instead)
$ cat .git/refs/heads/main
# Verify repository integrity
$ git fsck --full
Best Practices
- Don’t mess with files inside .git/ by hand. Always use Git commands.
- Run git fsck once in a while. It checks if your repo is healthy.
- Try to understand what each command does instead of just memorizing them.
- Want to poke around? Set up a test repo and experiment there. It’s safer.
Common Mistakes to Avoid
- Never delete files from .git/objects/. You’ll break your repo for good.
- Don’t edit files in .git/objects/pack/ either. These are compressed packfiles, so leave them alone.
- If git fsck throws warnings, don’t ignore them. They’re usually a sign something’s wrong with your data.
Wrapping Up
Git’s design is actually pretty straightforward when you break it down:
Blobs hold your file data.
Trees keep track of directory layouts.
Commits save snapshots and remember your history.
SHA-1 hashes glue everything together, making sure nothing gets lost or duplicated.
The staging area lines up your next changes.
References—like branches and tags—just point to specific commits.
Once you get how these pieces fit, Git stops being this black box and starts making sense. You can actually reason about what’s going on under the hood.
Want to dig deeper?
- Try using git cat-file and git hash-object on your own projects
- Explore how git pack-objects compresses your repository
- Learn about git reflog for recovering lost commits
If this helped, stick around for more hands-on guides and deep dives into the tools developers use every day.






