October 12, 2020

Large File Storage on Git

Have you ever committed a large file to Git?
Did you know it can hurt your repository's performance? Unfortunately, storing large files in Git is not good practice. Many developers advise against it and propose alternatives.

Putting a binary under version control in Git is of little use

Git is a powerful version control system. Thanks to commits and patches, Git has several advantages:

  • Changes are tracked as “patches”, so you do not have to manage whole copies of files by hand.
  • Powerful comparison.
  • Branch management and conflict management.

Those advantages hold for text files: sources, headers, configuration files...
But what about binary files? Git cannot produce a meaningful diff for them, nor merge them.
Consequence: every new version of a binary is usually stored and transferred as a complete file, and all of those versions come along when you clone. Fetching, pulling, and switching branches all have to handle the full file again.
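
To tell text from binary content, a common heuristic is to look for a NUL byte near the start of the data, and Git itself relies on a check of this kind. The Python sketch below only illustrates the idea: the function name `looks_binary` and the 8000-byte sample size are assumptions made for this example, not a copy of Git's implementation.

```python
def looks_binary(data: bytes, sample_size: int = 8000) -> bool:
    """Guess whether `data` is binary.

    Heuristic: text files almost never contain a NUL byte, so finding one
    in the first `sample_size` bytes is treated as a sign of binary content.
    """
    return b"\0" in data[:sample_size]
```

A file flagged this way is a good candidate for one of the alternatives described later in this article.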

I think you get the point. This kind of storage has the following drawbacks:

  1. You cannot easily tell which version of the binary is in use. You need a workaround, such as a text file describing the version or a dedicated commit for each binary update.
  2. A binary file can be large, often more than 1 MB. It costs bandwidth on every clone and on every fetch or pull that brings in a new version.
  3. Conflicts are hard to handle: how do you resolve one when two developers push two different versions of the same binary?

Even with unlimited bandwidth, binaries would still slow down Git itself. Git makes a poor basic binary manager!

One or two binary files will not noticeably affect your repository's performance. However, what about 50 files?
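
Before choosing an alternative, it can help to see how many large files a working tree already contains. The following Python sketch walks a directory and lists files above a size threshold. The function name `find_large_files` and the 1 MB default threshold are assumptions chosen for illustration. It only inspects the files currently checked out, not the repository history.

```python
import os


def find_large_files(root, threshold_bytes=1_000_000):
    """Return (path, size) pairs for files larger than threshold_bytes,
    sorted from largest to smallest."""
    large = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into Git's own storage directory.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # Ignore symbolic links (they may be dangling).
            size = os.path.getsize(path)
            if size > threshold_bytes:
                large.append((path, size))
    return sorted(large, key=lambda item: item[1], reverse=True)
```

Calling `find_large_files(".")` from the root of a checkout returns the candidates to move out of plain Git.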

Fortunately, there are good alternatives for managing those files.

Git LFS

Git LFS (Git “Large File Storage”) is a Git extension for storing your binaries, images, or archives.

Git LFS stores the actual file content on a remote server, outside the regular Git history. This reduces the amount of data transferred and avoids the problems that arise when large binaries are committed directly to Git.

In the Git repository itself, Git LFS only stores a small text “pointer file”, which is used to download the right file from the remote server. Thanks to that, your Git repository does not get polluted by large binaries and images.
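
To give an idea of how small such a pointer is, here is a Python sketch that builds pointer text in the layout described by the Git LFS specification: a version line, the SHA-256 of the content, and its size in bytes. The function name `make_lfs_pointer` is hypothetical. In practice the Git LFS client generates these files for you, so this is only meant to show what ends up in the repository.

```python
import hashlib


def make_lfs_pointer(content: bytes) -> str:
    """Build the text of a Git LFS style pointer for `content`.

    The pointer records which object to fetch (its SHA-256) and how big
    it is, instead of holding the content itself.
    """
    oid = hashlib.sha256(content).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(content)}\n"
    )
```

Whatever the size of the original file, the pointer stays at three short lines of text, which Git can diff and merge like any other text file.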

Use a package manager

To manage your external libraries or modules, there is a neat solution: package managers.
Some programming languages come with their own package manager. For instance:

  • Cargo for Rust.
  • NuGet for C#.
  • pip for Python.
  • And others...

Sadly, C++ does not have such an obvious choice. This is due to how practices have diverged between developers over the last 30 years, in library usage, include conventions, and build systems. Still, package managers for C++ do exist, such as Conan and vcpkg, and they work pretty well!

The big advantage of those package managers is that they maintain a dependency tree. This makes them one of the most suitable solutions for managing third-party libraries and binaries.
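
To illustrate what maintaining a dependency tree makes possible, the Python sketch below computes an order in which packages can be installed so that each one comes after its dependencies. The package names are invented for the example, and `install_order` is a hypothetical helper. Real package managers also handle version constraints, which this sketch leaves out. It relies on `graphlib`, available in the standard library since Python 3.9.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph: each package maps to the set of
# packages it depends on.
DEPENDENCIES = {
    "my_app": {"http_client", "json_parser"},
    "http_client": {"tls", "zlib"},
    "tls": {"zlib"},
    "json_parser": set(),
    "zlib": set(),
}


def install_order(dependencies):
    """Return the packages in an order where every package appears
    after all of its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())
```

With this graph, `zlib` is always placed before `tls`, and `my_app` always comes last.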

As you saw, there are several tools to handle those big files. Git LFS is great for storing large files such as images and binaries. A package manager is a suitable solution for third-party libraries and their binaries.

About the author 

Axel Fortun

Developer specialized in Linux environments and embedded systems.
Knowledge of multiple languages and frameworks such as C/C++, Java, Python, and AngularJS.
Working as a software developer since 2014, starting in the car industry and then moving to medical systems.