Part 4: Memory Functionality Of Git

Henry Steinhauer
Published in Analytics Vidhya
5 min read · Jan 6, 2021

Today we delve into the depths of the git storage system.

The storage model of git is not as difficult as you might think, but it can still be hard to grasp at first. In my opinion, the best way to understand it is to take the stairs instead of the lift. This means that we will be creating hashes and commits manually. But first things first. Grab a cup of coffee ☕️ and let’s get started.

Today we will create all kinds of hashes manually. I explained them all in my last article, so here I will just examine how git stores changes in the git database. If you are not sure what a blob, tree, or commit is, I strongly recommend you read my article about them first -> git-part-3-discover-the-git-folder.

Agenda

  • Blob
  • Tree
  • Commit
  • Conclusion

Blob

To start, we need a new git repository (for example, one created with git init in an empty directory). Next, we have to create some changes. To do so, type

echo 'text' >> text.txt; echo 'text' >> text2.txt; echo 'text2' >> text3.txt

in your terminal. After that, we have something to stage. Now, the exciting part begins. To convert our changes into blobs we have to write git hash-object <filename>. In our case, we need to replace <filename> with text.txt, text2.txt and text3.txt, or we can use the wildcard *. For each file, the command computes a 160-bit SHA-1 hash and displays it in the console as a 40-digit hexadecimal number. But actually, we don’t just want to see the hashes. What we want is to store the blobs in the database. For this, we need to insert -w into our command between hash-object and the filename.

git hash-object -w *
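To get a feeling for what this command does under the hood, here is a minimal Python sketch. It assumes the default SHA-1 object format and the loose-object layout (one zlib-compressed file per object). The names frame_blob, blob_hash and write_loose_object are made up for this example and are not part of git, and a temporary directory stands in for .git/objects.

```python
import hashlib
import os
import tempfile
import zlib


def frame_blob(content: bytes) -> bytes:
    """Prefix the content with the header git uses for blobs: 'blob <size>\\0'."""
    return b"blob " + str(len(content)).encode() + b"\0" + content


def blob_hash(content: bytes) -> str:
    """Return the 40-digit hex SHA-1 of the framed blob (what hash-object prints)."""
    return hashlib.sha1(frame_blob(content)).hexdigest()


def write_loose_object(objects_dir: str, content: bytes) -> str:
    """Sketch of the -w step: compress the framed blob and store it by its hash."""
    digest = blob_hash(content)
    # The first two hex digits name the directory, the remaining 38 the file.
    directory = os.path.join(objects_dir, digest[:2])
    os.makedirs(directory, exist_ok=True)
    with open(os.path.join(directory, digest[2:]), "wb") as handle:
        handle.write(zlib.compress(frame_blob(content)))
    return digest


if __name__ == "__main__":
    # Assumption: a temporary directory plays the role of .git/objects.
    with tempfile.TemporaryDirectory() as objects_dir:
        digest = write_loose_object(objects_dir, b"text\n")
        print(digest[:2], digest[2:])
```

Note that the file name never enters the calculation. Only the content and its length are hashed.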

After that, our changes are saved persistently, but only their content. A blob doesn’t care about file or folder names or other additional information. The generated blobs end up as files under .git/objects. You are probably wondering why there are just two blobs. This is because two of our files have the same content. Do you remember that blobs only store the content? Therefore we have one blob that stores the content for two different files.
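The deduplication can be reproduced with a few lines of Python. This is only a sketch, assuming the SHA-1 object format and that each echo wrote its text followed by a newline into a fresh file. The helper blob_hash is a hypothetical name, not a git API.

```python
import hashlib


def blob_hash(content: bytes) -> str:
    """Hash the content the way git hashes a blob: 'blob <size>\\0' + content."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# File contents as produced by the echo commands above (echo appends a newline).
files = {
    "text.txt": b"text\n",
    "text2.txt": b"text\n",
    "text3.txt": b"text2\n",
}

# The file name is not part of the hashed data, so equal content means equal hash.
hashes = {name: blob_hash(content) for name, content in files.items()}
unique_blobs = set(hashes.values())

for name, digest in hashes.items():
    print(name, digest)
print("distinct blobs:", len(unique_blobs))  # prints "distinct blobs: 2"
```

Three files go in, but only two distinct hashes come out, which is why the object database holds two blobs.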

The next step is to create our commit. It stores more information about our changes. We’ve already learned that when we create a commit the normal way, git creates a tree and a commit hash. Let’s do that manually. However, we need to do something first. A few steps ago we stored our blobs in the local git database. Now we need to add the changes to the index, also called the staging area. Normally this is done automatically by the git add command, but this time we saved our changes manually. Thus we have to add our changes to the index with the command git update-index --add *. We need to append --add because, by default, update-index ignores new files. Afterward, we are ready to create the tree.

Tree

git write-tree

The above command creates a tree object from the current index and prints its hash. Every entry in our index becomes an entry in the tree.

To inspect our tree, type git cat-file -p 9c... with the hash that git write-tree printed. In our case we have three entries.

Each entry consists of a mode, an object type, a hash, and a file name. 100644 is the mode and indicates the kind of file. In our case, it’s a normal, non-executable file. Other possibilities are 100755, which means it’s an executable file, or 120000, which marks the entry as a symbolic link. I think blob should be self-explanatory. The hexadecimal number is a reference to the blob with our saved changes, and at the end, text.txt, text2.txt and text3.txt are the corresponding file names. You are probably wondering why two of the three entries have the same reference. As explained above, this is the case because both files have the same content, so the same blob is referenced. Now you should know what a git tree stores. In a nutshell, a git tree records which file or directory name is associated with which blob or subtree.
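To make the structure of a tree more tangible, here is a small Python sketch that builds the raw tree data for our three files and prints it in a form similar to git cat-file -p. It assumes the SHA-1 object format and a flat tree of regular files, where sorting by name is sufficient. All function names are invented for this example.

```python
import hashlib


def blob_hash(content: bytes) -> str:
    """Hash the content the way git hashes a blob."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def sorted_entries(entries):
    """Order entries by file name (enough for a flat tree of plain files)."""
    return sorted(entries, key=lambda entry: entry[1].encode())


def tree_body(entries) -> bytes:
    """Serialize (mode, name, hex_sha) tuples into the raw tree format."""
    body = b""
    for mode, name, hex_sha in sorted_entries(entries):
        # Each entry is "<mode> <name>\0" followed by the 20 raw hash bytes.
        body += mode.encode() + b" " + name.encode() + b"\0" + bytes.fromhex(hex_sha)
    return body


def tree_hash(entries) -> str:
    """Hash the tree like any other object: 'tree <size>\\0' + body."""
    body = tree_body(entries)
    header = b"tree " + str(len(body)).encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def pretty(entries) -> str:
    """Render the entries as '<mode> blob <hash><TAB><name>' lines."""
    return "\n".join(
        f"{mode} blob {hex_sha}\t{name}" for mode, name, hex_sha in sorted_entries(entries)
    )


# Assumption: file contents as written by the echo commands in the Blob section.
entries = [
    ("100644", "text.txt", blob_hash(b"text\n")),
    ("100644", "text2.txt", blob_hash(b"text\n")),
    ("100644", "text3.txt", blob_hash(b"text2\n")),
]
print(pretty(entries))
print("tree", tree_hash(entries))
```

The tree only holds modes, names and references. The file content itself lives in the blobs.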

Commit

Finally, the tree allows us to create a commit object. For this, you need to enter the following:

echo 'feat: Add new test.txt file with content test' | git commit-tree <tree-hash>

Let us now examine the commit object with git cat-file -p <commit-hash>, using the hash that commit-tree printed.

At the top, we see a reference to our tree hash, followed by the author and the committer, each with a date, and finally the commit message. But the first commit is a special one. Normally we would also see a reference to the previous commit (the parent), but our current commit is the first, so it has none. It’s also possible for a commit to have more than one parent commit, for example after a merge.
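The layout of a commit object can be sketched in Python as well. This is a simplified illustration, assuming the SHA-1 object format. The identity Jane Doe, the fixed timestamp and the function names are hypothetical, and the empty-tree hash is used only as a stand-in for your own tree hash.

```python
import hashlib


def commit_payload(tree, message, identity, timestamp, tz="+0000", parents=()):
    """Build the text of a commit object (sketch; one identity for both roles)."""
    lines = [f"tree {tree}"]
    # A root commit has no parent line; a merge commit has several.
    lines += [f"parent {parent}" for parent in parents]
    lines.append(f"author {identity} {timestamp} {tz}")
    lines.append(f"committer {identity} {timestamp} {tz}")
    lines.append("")  # a blank line separates the headers from the message
    lines.append(message)
    return ("\n".join(lines) + "\n").encode()


def commit_hash(payload: bytes) -> str:
    """Hash the commit like any other object: 'commit <size>\\0' + payload."""
    header = b"commit " + str(len(payload)).encode() + b"\0"
    return hashlib.sha1(header + payload).hexdigest()


# Placeholder values for illustration only.
STAND_IN_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # hash of an empty tree
IDENTITY = "Jane Doe <jane@example.com>"  # hypothetical author and committer
TIMESTAMP = 1609891200  # arbitrary fixed Unix timestamp

root = commit_payload(
    STAND_IN_TREE,
    "feat: Add new test.txt file with content test",
    IDENTITY,
    TIMESTAMP,
)
print(root.decode(), end="")
print("commit", commit_hash(root))
```

Because the author, committer and dates are part of the hashed data, the same tree and message committed at a different time result in a different commit hash.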

Conclusion

Today you got a short recap on how git stores content. In addition, you should now be aware of how git’s database (file system) works. I hope the excursion was interesting and helpful. If you have any remarks or questions, it would be great if you left them in the comment section. See you soon 😃.

Henry Steinhauer
Analytics Vidhya

Passionate software developer who enjoys exploring new programming languages, design patterns and frameworks.