A vulnerability in Git

By Jake Edge
March 10, 2021

A potentially nasty vulnerability in the Git distributed revision-control system was disclosed on March 9. There are enough qualifiers in the description of the vulnerability that it may appear to be fairly narrowly focused, and it is. That may make it less worrisome, though just how much less is not entirely clear. As with most vulnerabilities, it all depends on how the software is being used and the environment in which it is running.

The vulnerability (CVE-2021-21300) could lead to code execution on the local system when cloning from a repository crafted to exploit it. It requires that some kind of Git filter be installed. Filters are used to manipulate files in between the filesystem and the Git repository; "smudge" filters are used when pulling blobs (binary objects) out of the repository to store in the working directory, while "clean" filters can change files as they are being committed into the repository. Which of those types is needed will depend on the type of transformation being performed. Git Large File Storage (LFS) is a commonly used extension (with both smudge and clean filters), which is installed by default with Git on Windows.
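
As a rough illustration of the clean/smudge pairing, the Python sketch below models a hypothetical line-ending filter: clean() is what would run on the way into the repository and smudge() on the way out to the working directory. The function names and the CRLF/LF transformation are assumptions chosen for illustration; they are not taken from Git or Git LFS, which do considerably more.

```python
def clean(working_tree_bytes: bytes) -> bytes:
    """Working directory -> repository: store text with LF line endings."""
    return working_tree_bytes.replace(b"\r\n", b"\n")


def smudge(repo_bytes: bytes) -> bytes:
    """Repository -> working directory: check text out with CRLF endings."""
    # Normalize to LF first so input that already has CRLF is not doubled up.
    return clean(repo_bytes).replace(b"\n", b"\r\n")


stored = clean(b"line one\r\nline two\r\n")   # what the repository would hold
checked_out = smudge(stored)                  # what lands in the working directory
```

A real filter would be wired up as a command that reads the file content on standard input and writes the transformed content to standard output; the two functions here only show that the pair is expected to be inverse-like.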

Filters are able to delay the normal processing of Git operations so that long-running filtering can be completed in the background. For example, Git LFS may need to copy a large file across the network in order to satisfy a checkout operation. But the delay feature changes the normal order in which files and directories are processed by Git. That, in turn, means that information cached by the tool may no longer be valid when it is relied upon, which is exactly where the vulnerability lies.
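
The reordering can be pictured with a small model. The sketch below is not Git's implementation; checkout_order() and the example paths are hypothetical, and it assumes only that entries whose filter asks for a delay are written after everything else.

```python
def checkout_order(index_entries, delayed):
    """Return the order in which paths would actually be written.

    index_entries: paths in their normal (index) order.
    delayed: set of paths whose filter asked for the checkout to be postponed.
    """
    immediate = [path for path in index_entries if path not in delayed]
    postponed = [path for path in index_entries if path in delayed]
    return immediate + postponed


entries = ["a/big.bin", "a/readme.txt", "b"]
order = checkout_order(entries, delayed={"a/big.bin"})
# "a/big.bin" is now written last, after "b", even though it sorts first.
```

Anything Git learned about the directory "a" while writing "a/readme.txt" may be out of date by the time "a/big.bin" is finally written.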

In order to reduce the number of lstat() calls that are made, Git maintains a cache called the "lstat cache". If a path collision (i.e. two files with the same path and name) occurs as the files are being checked out, for example if two files with names that differ only in their case are being checked out into a case-insensitive filesystem, that cache could be left in an invalid state. That does not typically lead to a problem because the checkouts proceed in a known order so the cache is not actually needed at the point where it is invalid.
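
A minimal sketch of what a case-based path collision looks like, assuming Python's str.casefold() as a stand-in for the filesystem's own folding rules (real filesystems differ in the details). The helper name and the example paths are hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def find_case_collisions(paths):
    """Group path prefixes that differ only by case.

    Every leading component is considered, since a directory "A" and a
    file or symlink "a" collide just as two regular files would.
    """
    spellings = defaultdict(set)
    for path in paths:
        parts = PurePosixPath(path).parts
        for depth in range(1, len(parts) + 1):
            prefix = "/".join(parts[:depth])
            spellings[prefix.casefold()].add(prefix)
    return sorted(sorted(group) for group in spellings.values() if len(group) > 1)


collisions = find_case_collisions(["A/post-checkout", "a"])
```

On a case-sensitive filesystem both entries coexist; on a case-insensitive one, whichever is checked out second replaces the first.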

However, if certain parts of the checkout are delayed by the filters, all bets are off. When the cache is consulted, the type of the files in the cached path may have changed; if that change was crafted by an attacker, unpleasantness is sure to occur. The patch fixing the vulnerability described the problem this way:

> But, there are some users of the checkout machinery that do not always follow the index order. In particular: checkout-index writes the paths in the same order that they appear on the CLI (or stdin); and the delayed checkout feature -- used when a long-running filter process replies with "status=delayed" -- postpones the checkout of some entries, thus modifying the checkout order.
>
> When we have to check out an out-of-order entry and the lstat() cache is invalid (due to a previous path collision), checkout_entry() may end up using the invalid data and [trusting] that the leading components are real directories when, in reality, they are not. In the best case scenario, where the directory was replaced by a regular file, the user will get an error: "fatal: unable to create file 'foo/bar': Not a directory". But if the directory was replaced by a symlink [symbolic link], checkout could actually end up following the symlink and writing the file at a wrong place, even outside the repository. Since delayed checkout is affected by this bug, it could be used by an attacker to write arbitrary files during the clone of a maliciously crafted repository.
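
The failure mode described in the patch can be reproduced in miniature. The following is a simplified model, not Git's code: write_trusting_cache() and its set-based cache are hypothetical stand-ins for checkout_entry() and the lstat cache, and the sketch assumes a system where unprivileged symlink creation works (Linux or macOS, for example).

```python
import os
import tempfile


def write_trusting_cache(worktree, relpath, data, dir_cache):
    """Write worktree/relpath, skipping the directory check on a cache hit."""
    parent = os.path.dirname(relpath)
    if parent not in dir_cache:
        # Slow path: verify the leading component is a real directory.
        full_parent = os.path.join(worktree, parent)
        if os.path.islink(full_parent) or not os.path.isdir(full_parent):
            raise NotADirectoryError(parent)
        dir_cache.add(parent)
    with open(os.path.join(worktree, relpath), "wb") as f:
        f.write(data)


def demo_stale_cache():
    with tempfile.TemporaryDirectory() as top:
        worktree = os.path.join(top, "worktree")
        outside = os.path.join(top, "outside")
        os.makedirs(os.path.join(worktree, "A"))
        os.mkdir(outside)
        cache = set()
        write_trusting_cache(worktree, "A/first", b"ok", cache)  # caches "A"
        # A colliding entry replaces the directory with a symlink,
        # and nothing tells the cache about it.
        os.remove(os.path.join(worktree, "A", "first"))
        os.rmdir(os.path.join(worktree, "A"))
        os.symlink(outside, os.path.join(worktree, "A"))
        # A delayed entry is written late and trusts the stale cache entry.
        write_trusting_cache(worktree, "A/delayed", b"payload", cache)
        return os.path.exists(os.path.join(outside, "delayed"))


escaped = demo_stale_cache()  # True: the file landed outside the worktree
```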

Several paths to a fix were considered, including disabling the cache for unordered checkouts or sorting the file names so that they are always processed in the same order. Both of those had performance impacts and there was a concern that other code paths could someday lead to unordered processing, thus reviving the bug. Instead, the cache is simply invalidated whenever a remove-directory operation is performed.
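
The chosen approach can be sketched as follows, again as a toy model rather than the actual patch. LeadingDirCache, remove_dir(), and demo_fix() are hypothetical names; the invalidate_on_rmdir switch exists only so the old and new behavior can be compared side by side, and symlink creation is assumed to be permitted.

```python
import os
import stat
import tempfile


class LeadingDirCache:
    """Remembers paths already verified, via lstat(), to be real directories."""

    def __init__(self):
        self._known = set()

    def is_real_dir(self, path):
        if path in self._known:
            return True  # cache hit: no lstat() call is made
        if stat.S_ISDIR(os.lstat(path).st_mode):
            self._known.add(path)
            return True
        return False

    def invalidate(self):
        self._known.clear()


def remove_dir(path, cache, invalidate_on_rmdir=True):
    """Remove a directory and, in the fixed variant, drop all cached answers."""
    os.rmdir(path)
    if invalidate_on_rmdir:
        cache.invalidate()


def demo_fix(invalidate_on_rmdir):
    with tempfile.TemporaryDirectory() as top:
        target = os.path.join(top, "outside")
        leading = os.path.join(top, "A")
        os.mkdir(target)
        os.mkdir(leading)
        cache = LeadingDirCache()
        cache.is_real_dir(leading)                  # primes the cache
        remove_dir(leading, cache, invalidate_on_rmdir)
        os.symlink(target, leading)                 # directory becomes a symlink
        return cache.is_real_dir(leading)           # is "A" still trusted?
```

With invalidation, the final check goes back to lstat() and sees the symlink; without it, the stale answer is returned. Tying the invalidation to directory removal means that no caller has to remember to do it, whatever order entries are processed in.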

As noted, symbolic links play a role in the ability to exploit the vulnerability. While highly useful, symbolic links have also historically been used to wreak havoc in various ways. They often feature in race condition exploits (e.g. for temporary files) and the like. Not all systems support symbolic links, though Unix-derived systems (Linux, macOS) generally do; these days, Windows administrators can also create symbolic links.

So it is a combination of several different features and situations that leads to an exploitable system, including the existence of an attacker-crafted repository that users need to be persuaded to clone. Even for Windows systems, where both Git LFS and case-insensitive filesystems are the norm, exploits are seemingly not at all common and are perhaps even non-existent. This has the look of a problem discovered via code inspection or testing that was subsequently reported and fixed quickly, without even time for a catchy name, logo, and web site. If any systems have been exploited, it seems most probable that the attacks were highly targeted and may not have been discovered (yet).

While Linux-native filesystems are not usually case-insensitive, they can be. Beyond that, though, Linux can make use of native filesystem formats for Windows and macOS that have such functionality. In addition, the test cases provided with the fix show another way to cause the problem: via Unicode normalization. The test case uses two different Unicode representations for "ä" (U+0061 U+0308, "\141\314\210", and U+00e4, "\303\244") to ensure that no files are written to the wrong place. So it may be less likely that Linux systems are affected by the bug, but they are not immune.
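
The two representations of "ä" can be checked directly with Python's standard unicodedata module. This is only a sketch of why the names collide; collide_when_normalized() is a hypothetical helper, and which normalization form (if any) a given filesystem applies is an assumption here.

```python
import unicodedata

decomposed = "a\u0308"    # U+0061 U+0308: "a" followed by a combining diaeresis
precomposed = "\u00e4"    # U+00E4: the single code point "ä"

# The byte sequences match the octal escapes used in the test case.
decomposed_bytes = decomposed.encode("utf-8")     # b"\141\314\210"
precomposed_bytes = precomposed.encode("utf-8")   # b"\303\244"


def collide_when_normalized(name_a, name_b, form="NFC"):
    """True if two distinct names become identical under the given form."""
    return (name_a != name_b
            and unicodedata.normalize(form, name_a)
            == unicodedata.normalize(form, name_b))


collides = collide_when_normalized(decomposed, precomposed)
```

The two names are different strings and different byte sequences, so Git treats them as distinct entries, but a filesystem that normalizes names would map both to the same file.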


A vulnerability in Git

Posted Mar 10, 2021 23:51 UTC (Wed) by rvolgers (guest, #63218)

My first thought was that it would be nice if git could use RESOLVE_BENEATH to avoid this kind of problem, at least on recent Linux versions.

But I guess that's not really feasible since it has to work on various platforms, so RESOLVE_BENEATH would be at best an optimization or an additional safety net. Considering that, it probably makes more sense to try to solve it in a generic way and avoid the added complexity of abstracting over multiple implementations, especially in a C program.

There is an additional problem with that idea as well: you probably also want to prevent unexpected symbolic links from redirecting writes to inside the .git directory, as that can make git do all kinds of unexpected things.

Just as a thought experiment, I think this could be expressed by creating a new mount namespace and masking the .git directory with an empty read only mount. That'd be kind of crazy though and raises other concerns.

It seems using capability based APIs to solve this kind of problem for common directory layout patterns on unix requires either manual workarounds like the mount trick I mentioned (similar to the nonsense Docker has to do to secure /proc mounts in containers) or a new kind of API which has nearly the kind of expressiveness seen in .gitignore files. I guess there is even some precedent for that in the form of AppArmor.

A vulnerability in Git

Posted Mar 10, 2021 23:55 UTC (Wed) by rvolgers (guest, #63218)

Thinking on it a bit more, the problem seems to be *unexpected* symlinks. In that case, I suspect it could use RESOLVE_NO_SYMLINKS (again, on recent Linux) to avoid the problem. The objection that this might be more trouble than it's worth considering the need to support other platforms remains though.

A vulnerability in Git

Posted Mar 11, 2021 7:03 UTC (Thu) by jra (subscriber, #55261)

symlinks, the security-nightmare gift that keeps on giving. Symlinks really are a blight on POSIX file systems and the applications that have to run within them. I hate to say this, but Windows got things (partially) right with reparse points, that can only be created by Administrator users. Then of course they ruined everything by allowing them to be placed on top of directories (although I need to do more experiments on the exact semantics here) :-(. File systems are really hard.

A vulnerability in Git

Posted Mar 11, 2021 9:53 UTC (Thu) by johannbg (guest, #65743)

> I hate to say this, but Windows got things (partially) right with reparse points

Why do you have to hate to say this?
Is there something wrong with Microsoft getting this (partially) right?

A vulnerability in Git

Posted Mar 11, 2021 16:41 UTC (Thu) by jra (subscriber, #55261)

No terrible reason, just history and sadness really as most of my adult work life has been working on POSIX not Windows :-). Good on you for pointing out my bias here :-).

A vulnerability in Git

Posted Mar 11, 2021 11:25 UTC (Thu) by dgm (subscriber, #49227)

I'm sure reparse points are "good" from a security perspective. But so is a powered-down system (or a network-disconnected one, if you know what I mean). The question is whether they are good for anything else.

Symlinks, on the other hand, are incredibly useful. This is the reason they still exist, along with lots of other security-dirty stuff like, god forgive me, the C language.

A vulnerability in Git

Posted Mar 12, 2021 0:26 UTC (Fri) by NYKevin (subscriber, #129325)

There are at least four different ways you can have a symlink-like-thing on Windows:

1. Shortcuts: Shortcuts are a very weird case, because they're not actually a kind of file at all. They are a data structure that (usually) lives in a (regular) file. This data structure essentially consists of a path and some metadata, but if you ask the operating system to resolve the shortcut, it will also do some magic where it tries to fix dangling shortcuts if it can figure out where the target got moved to. The operating system also has some funny special cases for things like "I installed program X, the installer put a shortcut on my desktop, and then I uninstalled X and now the shortcut is still there" (solution: prompt the user to delete the shortcut because X is gone), as well as the infamous "You have unused shortcuts on your desktop" (which has been removed with extreme prejudice from recent versions of Windows, thankfully). In general, they are intended as a DWIM solution for non-technical users (who cannot understand a "you moved it, now the shortcut can't find it" explanation).
2. Symbolic links: Pretty much the same as POSIX. Relatively new to Windows, having been introduced in Vista. I believe these are still restricted to administrators, unless you enable "developer mode."
3. Hard links: Again, pretty much the same as POSIX. However, NTFS made the it-made-sense-at-the-time decision to put some metadata in the directory entry, and provide an API for getting said metadata out of the Windows equivalent of readdir(), so if you make multiple hard links, that metadata can get out of date because the filesystem doesn't try very hard to keep it up-to-date. In practice, these are just about as unpopular in the Windows world as they are in the POSIX world.
4. Junctions: These are weird directory-only links that are in some respects similar to symlinks, but have been around for longer. Both symlinks and junctions are implemented using reparse points (which are basically a way of grafting custom parsing code into the standard path resolution algorithm).

Perhaps surprisingly, you can also mount whole volumes over (empty) directories on other volumes, in much the same way as a bind mount works on Linux. In fact, drive letters themselves are considered mount points and can be created or reassigned using exactly the same API.