How Git Works Internally: Understanding the Magic Behind Version Control

Explore Git's internal architecture — blobs, trees, commits, and SHA-1 hashes. Learn what really happens when you run git add and git commit.

BCA student and developer who loves learning in public. I build web and mobile projects, explore databases and backend systems, and document my journey through blogs. Currently focused on writing clean code and growing one commit at a time.

Ever stop and think about what really happens when you run git add or git commit? A lot of developers use Git every day, but honestly, most don’t know much about what’s going on behind the scenes. Let’s change that. Let’s take a peek under the hood and see what makes Git tick.

Introduction

So, here’s the thing: Git isn’t just a version control tool. At its core, it’s more like a content-addressable filesystem, with a version control system layered on top. Basically, you give Git some content, and it hands you back a unique key you can use to get that content again. It’s a pretty clever system.

Once you get how Git actually works inside, you stop just memorizing commands. You start to really understand it. Suddenly, debugging weird issues gets easier. You can fix mistakes without panicking. And those advanced features? They don’t seem so intimidating anymore.

Prerequisites

  • Basic familiarity with Git commands (init, add, commit)

  • Access to a terminal/command line

  • A code editor


The .git Folder Explained

When you run git init, Git creates a hidden .git directory. This folder is your repository — everything Git needs lives here.

$ git init my-project
$ cd my-project
$ ls -la .git/

Here's what you'll find:

.git/
├── HEAD           # Points to current branch
├── config         # Repository-specific configuration
├── description    # Used by GitWeb (rarely needed)
├── hooks/         # Client/server-side scripts
├── info/          # Repository-local exclude patterns (info/exclude)
├── objects/       # All content (blobs, trees, commits)
└── refs/          # Pointers to commits (branches, tags)

The Four Critical Components

| Component | Purpose |
|---|---|
| HEAD | A symbolic reference pointing to the current branch |
| index | The staging area (created by the first git add) |
| objects/ | The object database: stores all your content |
| refs/ | References to commit objects (branches, tags, remotes) |
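You can watch two of these components in a throwaway repository. The shell sketch below assumes git is installed and uses a temporary directory plus an example file, `file.txt`, both purely illustrative. It checks that HEAD names a branch rather than a commit, and that `.git/index` only appears after the first `git add`.

```shell
# Scratch repository in a temporary directory, removed when the script exits.
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
git init -q "$repo"
cd "$repo" || exit 1

# HEAD is a symbolic reference: it names a branch, not a commit.
cat .git/HEAD                       # e.g. ref: refs/heads/main
head_target=$(git symbolic-ref HEAD)

# A fresh repository has no index file yet...
if [ -f .git/index ]; then index_before=yes; else index_before=no; fi

# ...and the first git add creates it.
echo 'version 1' > file.txt
git add file.txt
if [ -f .git/index ]; then index_after=yes; else index_after=no; fi

echo "HEAD -> $head_target"
echo "index before add: $index_before, after add: $index_after"
```

The branch name HEAD reports depends on your `init.defaultBranch` setting (commonly `main` or `master`).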

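💡 Tip: Want to back up your entire repository? Copying the .git folder captures all committed history, branches, and configuration. Edits you haven't staged or committed live only in your working directory, so they are not included.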

Git Objects: The Building Blocks

At the heart of Git, things are refreshingly straightforward. Almost everything in your repository boils down to three main object types (a fourth, the annotated tag, just attaches a name and message to another object):

1. Blob (Binary Large Object)

A blob is as simple as it gets. It holds the content of a file—nothing else. No filenames, no permissions, no extra details. Just the raw data.

# See what type of object a hash represents
$ git cat-file -t 83baae61804e65cc73a7201a7252750c76066a30
blob

# View the content of a blob
$ git cat-file -p 83baae61804e65cc73a7201a7252750c76066a30
version 1

ℹ️ Note: Two files with identical content share the same blob object, regardless of their names. This is how Git achieves efficient storage!
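A quick way to check the note above is to stage two differently named files with the same content and compare the blob IDs the index records for them. This is a sketch in a scratch repository; `a.txt`, `b.txt`, and their content are made-up examples.

```shell
# Scratch repository in a temporary directory, removed when the script exits.
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
git init -q "$repo"
cd "$repo" || exit 1

# Two differently named files with identical content (example names).
echo 'version 1' > a.txt
echo 'version 1' > b.txt
git add a.txt b.txt

# ":<path>" resolves a path in the index to the blob it points at.
hash_a=$(git rev-parse :a.txt)
hash_b=$(git rev-parse :b.txt)
echo "a.txt -> $hash_a"
echo "b.txt -> $hash_b"

# Only one loose object file was written for both names.
object_count=$(find .git/objects -type f | wc -l | tr -d ' ')
echo "object files: $object_count"
```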

2. Tree

Think of a tree as Git’s way of organizing files and folders. A tree describes one directory, and each entry in it records:

  • a file mode (normal, executable, symlink, or directory)

  • an object type and hash, pointing to a blob (a file) or to another tree (a subdirectory)

  • a filename

$ git cat-file -p HEAD^{tree}
100644 blob a906cb2a4a904a152e80877d4088654daad0c859    README.md
100644 blob 8f94139338f9404f26296befa88755fc2598c289    index.js
040000 tree 99f1a6d12cb4b6f19c8655fca46c3ecf317074e0    src

The numbers represent file modes:

  • 100644 — Normal file

  • 100755 — Executable file

  • 040000 — Directory (tree)

  • 120000 — Symbolic link
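Trees are ordinary objects, and you can build one by hand with plumbing commands: write a blob, register it in the index with a mode and a name, then let `git write-tree` turn the index into a tree. The sketch below is illustrative only; `test.txt` and `run.sh` are hypothetical names. The second entry reuses the same blob under mode 100755, which shows that the mode lives in the tree entry, not in the blob.

```shell
# Scratch repository in a temporary directory, removed when the script exits.
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
git init -q "$repo"
cd "$repo" || exit 1

# 1. Store a blob directly; -w writes it into .git/objects.
blob=$(echo 'version 1' | git hash-object -w --stdin)

# 2. Add an index entry by hand: mode, blob hash, file name.
git update-index --add --cacheinfo 100644 "$blob" test.txt

# 3. Turn the current index into a tree object.
tree=$(git write-tree)
echo "tree $tree"
git cat-file -p "$tree"             # one entry: mode 100644, type blob, the blob hash, test.txt

# Same blob, executable mode, different name: only the tree entry differs.
git update-index --add --cacheinfo 100755 "$blob" run.sh
tree2=$(git write-tree)
git cat-file -p "$tree2"
```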

3. Commit

This is where it all comes together. A commit records:

  • the top-level tree (your project at that moment)

  • zero or more parent commits (none for the very first commit, two or more for a merge), so you can trace the history

  • the author (who wrote the change) and the committer (who recorded it), each with a timestamp

  • the message that explains why

$ git cat-file -p HEAD
tree d8329fc1cc938780ffdd9f94e0d364e0ea74f579
parent cac0cab538b970a37ea1e769cbbde608743bc96d
author John Doe <john@example.com> 1609459200 +0000
committer John Doe <john@example.com> 1609459200 +0000

Add new feature to the project
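To see where each of those fields comes from, the sketch below creates two commits with the plumbing command `git commit-tree`. It assumes git is installed and sets a placeholder identity through git's standard `GIT_AUTHOR_*` and `GIT_COMMITTER_*` environment variables; file names and messages are examples. The first commit has no `parent` line, and the second records the first as its parent.

```shell
# Scratch repository in a temporary directory, removed when the script exits.
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
git init -q "$repo"
cd "$repo" || exit 1
# Placeholder identity via git's standard environment variables.
export GIT_AUTHOR_NAME='John Doe' GIT_AUTHOR_EMAIL='john@example.com'
export GIT_COMMITTER_NAME='John Doe' GIT_COMMITTER_EMAIL='john@example.com'

# Snapshot one file into a tree.
echo 'version 1' > test.txt
git add test.txt
tree1=$(git write-tree)

# Root commit: a tree, author/committer lines, a message, and no parent.
first=$(echo 'First commit' | git commit-tree "$tree1")

# Second commit: -p records the first commit as its parent.
echo 'version 2' > test.txt
git add test.txt
tree2=$(git write-tree)
second=$(echo 'Second commit' | git commit-tree "$tree2" -p "$first")

git cat-file -p "$second"           # tree, parent, author, committer, blank line, message
```

Note that `commit-tree` only writes objects; no branch points at these commits until a ref is updated.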

The Object Relationship Model

Here's how these objects connect:

Commit
├── tree: d8329fc... ──────► Tree
├── parent: cac0cab...       ├── blob: README.md
├── author: John Doe         ├── blob: index.js
├── committer: John Doe      └── tree: src/
└── message: "Add feature"       └── blob: app.js
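The arrows in the diagram are just hashes, and git's revision syntax lets you follow them one hop at a time: `HEAD^{tree}` is the commit's top-level tree, and `HEAD:path` resolves a path inside it. The sketch rebuilds the diagram's layout in a scratch repository; the file contents, identity, and commit message are placeholders.

```shell
# Scratch repository in a temporary directory, removed when the script exits.
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
git init -q "$repo"
cd "$repo" || exit 1
export GIT_AUTHOR_NAME='John Doe' GIT_AUTHOR_EMAIL='john@example.com'
export GIT_COMMITTER_NAME='John Doe' GIT_COMMITTER_EMAIL='john@example.com'

# Recreate the layout from the diagram (file contents are placeholders).
mkdir src
echo '# Demo' > README.md
echo 'console.log("index");' > index.js
echo 'console.log("app");' > src/app.js
git add README.md index.js src/app.js
git commit -q -m 'Add feature'

# Follow the pointers one hop at a time.
commit=$(git rev-parse HEAD)
root_tree=$(git rev-parse 'HEAD^{tree}')   # commit -> top-level tree
src_tree=$(git rev-parse HEAD:src)         # tree -> subtree for src/
app_blob=$(git rev-parse HEAD:src/app.js)  # subtree -> blob

echo "commit  $commit"
echo "tree    $root_tree"
echo "src/    $src_tree"
echo "app.js  $app_blob"
git ls-tree -r HEAD                        # every blob reachable from the commit, with paths
```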

How Git Tracks Changes

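Git doesn’t store the changes between versions of a file. Instead, every commit records a full snapshot of the project. Sounds like it’d eat up a ton of space, right? Don’t stress. A file that hasn’t changed keeps the same blob hash, so the new snapshot simply points at the existing blob instead of storing a second copy, and Git later bundles objects into packfiles that compress similar content using delta compression.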
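Here is a small sketch of that reuse, assuming a scratch repository with two hypothetical files and a placeholder identity. Between the two commits only `change.txt` is edited, so `keep.txt` resolves to the same blob in both snapshots while the top-level trees differ.

```shell
# Scratch repository in a temporary directory, removed when the script exits.
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
git init -q "$repo"
cd "$repo" || exit 1
export GIT_AUTHOR_NAME='John Doe' GIT_AUTHOR_EMAIL='john@example.com'
export GIT_COMMITTER_NAME='John Doe' GIT_COMMITTER_EMAIL='john@example.com'

echo 'stays the same' > keep.txt
echo 'version 1' > change.txt
git add keep.txt change.txt
git commit -q -m 'First snapshot'

echo 'version 2' > change.txt
git commit -q -a -m 'Second snapshot'    # -a stages edits to tracked files

# <commit>:<path> names the blob stored for that path in that snapshot.
keep_old=$(git rev-parse HEAD~1:keep.txt)
keep_new=$(git rev-parse HEAD:keep.txt)
change_old=$(git rev-parse HEAD~1:change.txt)
change_new=$(git rev-parse HEAD:change.txt)

echo "keep.txt:   $keep_old -> $keep_new"      # same blob, stored once
echo "change.txt: $change_old -> $change_new"  # new blob
```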

The Staging Area (Index)

The staging area lives in a binary file called .git/index. It holds the complete list of tracked files (path, mode, and blob hash) that will make up your next commit, not just the files you changed. Picture it like a draft of your upcoming snapshot: not the final version, but pretty close.

Working Directory          Staging Area           Repository
     │                          │                      │
     │    git add file.txt      │                      │
     │ ─────────────────────►   │                      │
     │                          │    git commit        │
     │                          │ ─────────────────►   │
     │                          │                      │
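The "full draft, not a list of changes" nature of the index is easy to check: right after a commit, with nothing staged, `git ls-files --stage` still lists every tracked file. In this sketch (scratch repository, hypothetical files `a.txt` and `b.txt`, placeholder identity), staging an edit to one file only swaps that entry's blob hash.

```shell
# Scratch repository in a temporary directory, removed when the script exits.
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
git init -q "$repo"
cd "$repo" || exit 1
export GIT_AUTHOR_NAME='John Doe' GIT_AUTHOR_EMAIL='john@example.com'
export GIT_COMMITTER_NAME='John Doe' GIT_COMMITTER_EMAIL='john@example.com'

echo 'a' > a.txt
echo 'b' > b.txt
git add a.txt b.txt
git commit -q -m 'Two files'

# Nothing is staged, yet the index still lists every tracked file
# (mode, blob hash, stage number, path).
git ls-files --stage
entries_after_commit=$(git ls-files --stage | wc -l | tr -d ' ')

# Staging an edit swaps one entry's blob hash; the other entry is untouched.
echo 'a, edited' > a.txt
git add a.txt
git ls-files --stage
```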

How Git Stores Objects

Every loose object is zlib-compressed and stored in .git/objects/ using its SHA-1 hash as the filename (objects that have been packed live in .git/objects/pack/ instead):

$ find .git/objects -type f
.git/objects/83/baae61804e65cc73a7201a7252750c76066a30
.git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4

Notice the directory structure:

  • First 2 characters of the hash → directory name

  • Remaining 38 characters → filename

This prevents any single directory from having too many files.
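Because the layout is a pure function of the hash, you can compute an object's path with plain string slicing. `object_path` below is a hypothetical helper, not a git command; the sketch then writes one blob into a scratch repository and checks that the file sits where the helper says. The file on disk is zlib-compressed, so read it with `git cat-file` rather than `cat`.

```shell
# Hypothetical helper: map a 40-character object ID to its loose-object path.
# ${1#??} drops the first two characters; ${1%"${1#??}"} keeps only them.
object_path() {
  printf '.git/objects/%s/%s\n' "${1%"${1#??}"}" "${1#??}"
}

object_path 83baae61804e65cc73a7201a7252750c76066a30
# .git/objects/83/baae61804e65cc73a7201a7252750c76066a30

# Check it against a real repository (scratch directory, removed on exit).
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
git init -q "$repo"
cd "$repo" || exit 1
id=$(echo 'version 1' | git hash-object -w --stdin)
ls -l "$(object_path "$id")"        # the zlib-compressed object file
```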


SHA-1 Hashes and Data Integrity

Git uses a 40-character SHA-1 hash to identify every object. This isn’t just some random string—it’s a cryptographic fingerprint of the object’s actual content.

How Git Calculates Hashes

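Git hashes the content together with a header that names the object type and gives the content's size in bytes:

header = "blob " + <size of content in bytes, written as decimal digits> + "\0"
hash = SHA1(header + content)

Trees and commits are hashed the same way, with "tree " or "commit " in place of "blob ".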

For example, hashing the text "test content" (echo appends a newline, and that newline is part of the hashed content):

$ echo "test content" | git hash-object --stdin
d670460b4b4aece5915caf5c68d12f560a9fe3e4
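To confirm that the header really is part of the hash, you can recompute a blob ID without git at all. This sketch assumes GNU coreutils `sha1sum` is available (on macOS, `shasum` is the usual substitute), and `blob_sha1` is a hypothetical helper name.

```shell
# Hypothetical helper: recompute a blob ID without git.
# Hashes the header "blob <size>\0" followed by the file's bytes.
blob_sha1() {
  size=$(wc -c < "$1" | tr -d ' ')                 # content length in bytes
  { printf 'blob %s\0' "$size"; cat "$1"; } | sha1sum | cut -d' ' -f1
}

tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
echo 'test content' > "$tmp"        # 13 bytes, including the trailing newline
blob_sha1 "$tmp"                    # d670460b4b4aece5915caf5c68d12f560a9fe3e4
```

Running `git hash-object` on the same file should print the same ID.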

Why This Matters

  1. Data Integrity: If even a single bit in an object changes, its hash changes completely. That way, Git spots corruption right away.

  2. Deduplication: When two files have the same content, they get the same hash. Git only stores them once

  3. Immutability: You can’t tweak an object without changing its hash. An edit really creates a new object, so every tree and commit that points to it, directly or indirectly, must be rewritten with a new hash as well.

# Verify object integrity
$ git fsck
Checking object directories: 100% (256/256), done.

Warning: Never manually edit files in .git/objects/. You'll corrupt your repository because the filename (hash) won't match the content.
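The immutability point is easy to observe with `git commit --amend`, which looks like it edits a commit but actually writes a new one. This sketch assumes a scratch repository and a placeholder identity: rewording only the message yields a different commit hash, the tree hash stays the same, and the original commit remains readable by its old hash.

```shell
# Scratch repository in a temporary directory, removed when the script exits.
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
git init -q "$repo"
cd "$repo" || exit 1
export GIT_AUTHOR_NAME='John Doe' GIT_AUTHOR_EMAIL='john@example.com'
export GIT_COMMITTER_NAME='John Doe' GIT_COMMITTER_EMAIL='john@example.com'

echo 'version 1' > file.txt
git add file.txt
git commit -q -m 'Initial commit'
before=$(git rev-parse HEAD)

# "Editing" only the message still writes a brand-new commit object.
git commit -q --amend -m 'Initial commit (reworded)'
after=$(git rev-parse HEAD)

echo "before: $before"
echo "after:  $after"
# The original commit is untouched and still readable by its old hash.
git cat-file -p "$before" | tail -n 1    # Initial commit
```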


What Actually Happens When You Run git add and git commit

Let’s break down what’s really going on under the hood with these commands.

What git add Does

So, you hit git add file.txt. Here’s what happens:

  1. Git takes the contents of file.txt and hashes it. This creates what’s called a “blob” object.

  2. Git stores that blob inside the .git/objects/ directory.

  3. It updates the index (.git/index) with an entry for file.txt that records its path, file mode, and the new blob’s hash. That way, Git knows you want this exact version in your next commit.

# Before git add
$ git status
Untracked files:
    file.txt

# Run git add
$ git add file.txt

# A new blob is created
$ find .git/objects -type f
.git/objects/83/baae61804e65cc73a7201a7252750c76066a30
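The three steps can also be checked one at a time. In this sketch (scratch repository, `file.txt` holding the example text `version 1`), `git hash-object` predicts the blob ID before anything is written, `git cat-file -e` confirms the object only exists after `git add`, and `git ls-files --stage` shows the index entry that `git add` created.

```shell
# Scratch repository in a temporary directory, removed when the script exits.
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
git init -q "$repo"
cd "$repo" || exit 1
echo 'version 1' > file.txt

# Step 1: compute the blob ID git add will use (without -w, nothing is written).
expected=$(git hash-object file.txt)
if git cat-file -e "$expected" 2>/dev/null; then before_add=present; else before_add=missing; fi

git add file.txt

# Step 2: the blob is now in the object database.
git cat-file -t "$expected"                 # blob

# Step 3: the index entry records mode, blob ID, stage number, and path.
git ls-files --stage file.txt
staged=$(git ls-files --stage file.txt | cut -d' ' -f2)
echo "expected $expected, staged $staged"
```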

What git commit Does

When you run git commit -m "message", here’s what goes on:

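  1. Git turns the contents of the index into tree objects, one per directory, giving a snapshot of everything you’ve staged.

  2. Then it builds a commit object. This points to the top-level tree and to the previous commit (if there is one) as its parent, and records who made the commit, when, and your message.

  3. Finally, Git moves the current branch (the ref HEAD points to, such as refs/heads/main) forward, so it now points to your new commit.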

# Create a commit
$ git commit -m "Initial commit"

# Inspect the new commit object (a first commit has no parent line)
$ git cat-file -p HEAD
tree d8329fc1cc938780ffdd9f94e0d364e0ea74f579
author You <you@email.com> 1609459200 +0000
committer You <you@email.com> 1609459200 +0000

Initial commit
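Roughly the same result can be produced with one plumbing command per step. This is a simplified sketch, not what `git commit` literally runs: the real command also runs hooks, opens an editor when no `-m` is given, writes a descriptive reflog entry, and adds the current commit as a parent when there is one. The identity and file name are placeholders.

```shell
# Scratch repository in a temporary directory, removed when the script exits.
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
git init -q "$repo"
cd "$repo" || exit 1
export GIT_AUTHOR_NAME='John Doe' GIT_AUTHOR_EMAIL='john@example.com'
export GIT_COMMITTER_NAME='John Doe' GIT_COMMITTER_EMAIL='john@example.com'

echo 'version 1' > file.txt
git add file.txt

# 1. Turn the index into tree objects (one per directory); prints the top tree.
tree=$(git write-tree)

# 2. Create a commit object for that tree (no -p: this is the first commit).
commit=$(echo 'Initial commit' | git commit-tree "$tree")

# 3. Point the branch HEAD refers to (e.g. refs/heads/main) at the new commit.
branch=$(git symbolic-ref HEAD)
git update-ref "$branch" "$commit"

git log --oneline                   # the commit now shows up on the branch
```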

The Complete Picture

                    Working Directory
                           │
                     git add file
                           │
                           ▼
    ┌─────────────────────────────────────────┐
    │           Staging Area (Index)          │
    │  ┌────────────────────────────────────┐ │
    │  │ file.txt → blob 83baae61...        │ │
    │  └────────────────────────────────────┘ │
    └─────────────────────────────────────────┘
                           │
                    git commit -m "msg"
                           │
                           ▼
    ┌─────────────────────────────────────────┐
    │           Object Database               │
    │  ┌──────────────────────────────────┐   │
    │  │ Commit: abc123...                │   │
    │  │   └── Tree: d8329f...            │   │
    │  │         └── Blob: 83baae...      │   │
    │  └──────────────────────────────────┘   │
    └─────────────────────────────────────────┘
                           │
                           ▼
    ┌─────────────────────────────────────────┐
    │           refs/heads/main               │
    │           Points to: abc123...          │
    └─────────────────────────────────────────┘

Exploring Git Internals Yourself

Here are some commands to explore your own repositories:

Plumbing Commands (Low-Level)

# Hash content without storing
$ echo "test content" | git hash-object --stdin

# Hash and store content
$ echo "test content" | git hash-object -w --stdin

# View object type
$ git cat-file -t <hash>

# View object content
$ git cat-file -p <hash>

# View object size
$ git cat-file -s <hash>

Examine Your Repository

# List all objects
$ find .git/objects -type f

# View the staging area
$ git ls-files --stage

# Check HEAD reference
$ cat .git/HEAD

# Check branch reference (it may live in .git/packed-refs instead; git rev-parse main works either way)
$ cat .git/refs/heads/main

# Verify repository integrity
$ git fsck --full
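`find .git/objects` only sees loose objects. A sketch that also covers packed objects uses `git cat-file --batch-all-objects`, and it resolves refs through git rather than by reading files, since refs can end up in `.git/packed-refs`. It runs in a scratch repository with a placeholder identity and file name; `git gc` is included only to show that the listing survives packing.

```shell
# Scratch repository in a temporary directory, removed when the script exits.
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
git init -q "$repo"
cd "$repo" || exit 1
export GIT_AUTHOR_NAME='John Doe' GIT_AUTHOR_EMAIL='john@example.com'
export GIT_COMMITTER_NAME='John Doe' GIT_COMMITTER_EMAIL='john@example.com'

echo 'version 1' > file.txt
git add file.txt
git commit -q -m 'Initial commit'

# Every object, loose or packed: "<hash> <type> <size>" per line.
git cat-file --batch-check --batch-all-objects
object_count=$(git cat-file --batch-check --batch-all-objects | wc -l | tr -d ' ')

# Resolve refs through git instead of reading files under .git/refs/.
git symbolic-ref HEAD               # which branch HEAD points to
git rev-parse HEAD                  # the commit that branch points to

# After packing, find no longer sees loose objects, but cat-file still lists them.
git gc -q
find .git/objects -type f
```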

Best Practices

  • Don’t mess with files inside .git/ by hand. Always use Git commands.

  • Run git fsck once in a while. It checks if your repo is healthy.

  • Try to understand what each command does instead of just memorizing them.

  • Want to poke around? Set up a test repo and experiment there. It’s safer.

Common Mistakes to Avoid

  1. Never delete files from .git/objects/. Any commit, tree, or file that depends on a deleted object becomes unreadable.

  2. Don’t edit files in .git/objects/pack/ either. These are compressed—leave them alone.

  3. If git fsck throws warnings, don’t ignore them. They’re usually a sign something’s wrong with your data.


Wrapping Up

Git’s design is actually pretty straightforward when you break it down:

  • Blobs hold your file data.

  • Trees keep track of directory layouts.

  • Commits save snapshots and remember your history.

  • SHA-1 hashes glue everything together: they make corruption detectable and let identical content be stored only once.

  • The staging area lines up your next changes.

  • References—like branches and tags—just point to specific commits.

Once you get how these pieces fit, Git stops being this black box and starts making sense. You can actually reason about what’s going on under the hood.

Want to dig deeper?

  • Git Internals - Official Documentation

  • Try using git cat-file and git hash-object on your own projects

  • Explore how git pack-objects compresses your repository

  • Learn about git reflog for recovering lost commits


If this helped, stick around for more hands-on guides and deep dives into the tools developers use every day.