
The risks of embedded bare repositories in Git

By Jake Edge
April 27, 2022

Running code from inside a cloned Git repository is potentially risky, but normally just inspecting such a repository is considered to be safe. As a recent posting to the Git mailing list shows, however, there are still risks lurking inside these repositories; code that lives in them can be triggered in unexpected ways. In particular, malicious "bare" repositories can be added as a subdirectory of a repository; they can be configured to run code whenever Git commands are executed there, which is something that can happen in surprising ways. There is now an effort underway to try to address the problem in Git without breaking the legitimate uses of bare repositories inside a Git tree.

In early April, Glen Choo posted to the list about the security risk of bare repositories in a Git working tree. He linked to an admirably detailed advisory from Justin Steven that documents the problem and how it can be triggered by a wide variety of tools, including shells, integrated development environments (IDEs), editors, and more. The advisory has proof-of-concept (PoC) code for a whole slew of different scenarios, including ones that can be used to reproduce the problem locally, if desired.

The risks from automatically running code that comes from a remote (possibly untrusted) repository are well-known, so git clone does not copy the configuration file (normally .git/config) to the local system. Git can be configured via the file (or by using the git config command) in a wide variety of ways, including such things as changing the meaning of certain git subcommands. That is clearly dangerous, which is why the configuration file is excluded from the clone operation.

Bare repositories inside regular ones

But bare repositories are different from regular repositories; instead of storing all of the housekeeping information (including config) in the .git subdirectory of the repository, a bare repository stores all of those files directly in the directory where the repository is created. The difference can be easily seen by comparing the contents of the two directories created by the following:

    $ mkdir tmp1 tmp2
    $ cd tmp1; git init
    Initialized empty Git repository in .../tmp1/.git/
    $ cd ../tmp2; git init --bare
    Initialized empty Git repository in .../tmp2/

The tmp1/.git and tmp2 directories will have much the same content, including a config file.

A bare repository does not have a "work tree", however, so many Git commands run in tmp2 will fail; that is easily rectified. In tmp2:

    $ git status
    fatal: this operation must be run in a work tree
    $ mkdir worktree
    $ echo $'\tworktree = "worktree"' >> config
    $ git status
    warning: core.bare and core.worktree do not make sense
    fatal: unable to set up work tree using invalid config

That error message points to another difference between a regular repository and a bare one; the config file has a different setting for core.bare. For a bare repository, as one might guess, it is set to "true", but that is easily fixed with an editor or other tool:

    $ $EDITOR config  # change bare = true to bare = false
    $ git status
    On branch master

    No commits yet

    nothing to commit (create/copy files and use "git add" to track)

Now any Git command that is run in tmp2 will refer to the bare repository there; it will consult the config file there and use whatever options have been set for it. If we move that directory (and rename it to better describe its nature), we might have the following:

    $ cd ../tmp1
    $ mv ../tmp2 mal
    $ git status
    # shows untracked file mal/
    $ cd mal
    $ git status
    # shows the same empty repository as above

We can, of course, add and commit mal/ and then we have a repository with a bare repository in it. Anyone who clones the tmp1 repository will get mal/ and any malicious configuration that comes along for the ride. Triggering it is only a matter of somehow causing a Git command to be executed in mal/, which might happen as easily as simply trying to set the shell prompt. For example, Steven cites the git-prompt.sh file, which is included with Git; users of the script who cd into a malicious bare repository will (perhaps unknowingly) run git and fall into this hole.

The perils of fsmonitor

So far, though, mal/ is lacking in the malicious department. As mentioned, there are a number of Git configuration directives and hooks that can be used to potentially do malicious things, but for the most part those require that the victim execute specific Git commands in the bare repository. The core.fsmonitor directive is used more widely by Git, though, making it a useful primitive for code execution.

The idea behind fsmonitor is to reduce the search space for commands like git status by returning a list of files that may have changed since a given date and time. The directive can be set to a command to run that should return the list; if it returns a failure exit code, Git assumes all files could have changed and acts accordingly. Steven listed five fairly common Git commands that invoke the fsmonitor program (e.g. git status, git add).

So, using Steven's PoC as a guide, we can do the following (in mal/):

    $ echo $'\tfsmonitor = "echo \\"Pwned as $(id)\\">&2; false"' >> config
    $ git status
    Pwned as uid=1000(jake) gid=1000(jake) groups=...
    ...
    $ cd ..   # to tmp1/
    $ git add mal/
    $ git commit -m "adding mal"
    ...

Note that the commit does not output the "Pwned..." line, since it is done in the top-level repository. But that config now lurks in mal/, waiting for any Git command that uses fsmonitor to be executed from mal/.
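
For readers who want to inspect a suspect directory before running anything in it, one option is to read its config file as plain text from outside that directory. The following is a minimal sketch, not a real Git config parser; the function name show_fsmonitor_setting is made up for this example, and it only looks for the fsmonitor key discussed above.

```shell
# Print any fsmonitor setting found in a Git config file without invoking
# git inside the suspect directory. This is a plain-text heuristic: it does
# not follow include directives or check which section the key is in.
# The function name is hypothetical.
show_fsmonitor_setting() {
    config_file=$1
    # Exit status 2 means there was no such file to inspect.
    [ -f "$config_file" ] || return 2
    # Git config keys are case-insensitive, hence -i. grep exits with 0
    # when a matching line is printed and 1 when nothing matches.
    grep -i -E '^[[:space:]]*fsmonitor[[:space:]]*=' "$config_file"
}
```

Run from tmp1/ as "show_fsmonitor_setting mal/config", this should print the fsmonitor line added above. A clean result only says that this one key is absent; other configuration directives and hooks can run code too.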

This seems clearly to be a security hole, though whether it can or will be addressed in Git is not entirely clear. Embedded bare repositories are apparently used in benign ways, especially for testing purposes, and the Git project does not want to prohibit them. Choo's message was seeking a way to reduce the danger, which he described this way:

Many `git` commands can be affected by malicious config files, and many users have tools that will run `git` in the current directory or the subdirectories of a repo. Once the malicious repo has been cloned, very little social engineering is needed; the user might only need to open the repo in an editor or `cd` into the correct subdirectory.

He lists several possible fixes ranging from preventing bare repositories from being added to work trees (or ignoring them in favor of their parent repository), through checking for them with git fsck, to educating users but not changing Git. The fsck check seems like it will be pursued; Choo posted a patch that will test a tree to see if it contains a bare repository and warn if it does. "This will help hosting sites detect and prevent transmission of such malicious repos."
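
A rough local approximation of that kind of check can be run against a checked-out tree. The sketch below assumes that a directory holding a HEAD file alongside objects/ and refs/ directories is what Git would treat as a bare repository; it is not the logic of Choo's patch, and the function name find_embedded_bare_repos is hypothetical.

```shell
# List directories under a checked-out tree that look like bare
# repositories: a HEAD file next to objects/ and refs/ directories.
# Anything named .git is pruned, so the tree's own repository metadata is
# not reported. Paths containing newlines are not handled.
# The function name is hypothetical.
find_embedded_bare_repos() {
    root=${1:-.}
    find "$root" -name .git -prune -o -type f -name HEAD -print |
    while IFS= read -r head; do
        dir=$(dirname "$head")
        if [ -d "$dir/objects" ] && [ -d "$dir/refs" ]; then
            printf '%s\n' "$dir"
        fi
    done
}
```

Running "find_embedded_bare_repos ." from the top of a fresh clone, before changing into any subdirectory, prints one candidate path per line. An empty result means only that nothing matched this particular heuristic.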

There is some interest in putting further guardrails on Git's behavior with respect to these bare repositories, but it is important to ensure that projects can still use embedded bare repositories, especially given that some have them in their Git commit history, which will never go away even if a different solution is found. Johannes Schindelin pointed to the libgit2 repository as one example of a project that has embedded bare repositories.

There was some discussion of various possibilities, which Choo summarized, noting that he believed: "We all agree that something needs to be done about embedded bare repos." He listed some options (beyond the fsck change, which he will be working on), but Taylor Blau was not entirely sure that something needed to be done since "there is significant social engineering required in order to meaningfully exploit this". However, little more than cloning a malicious repository and poking around in it a bit while using Git-aware tools is needed to trigger the problem, so it is not clear why Blau thinks that is a significant hurdle. In any case, Blau did think it was worth exploring options to "prevent this type of attack or make it substantially less likely to have a user run git commands that execute parts of the config opportunistically".

Blau thought that the most promising option was one that Choo described as: "Detect if the bare repo is embedded and do not read its config/hooks, but everything else still 'works'." Blau extended that idea to allow users to explicitly opt into reading the configuration from the embedded bare repositories with a configuration option that would need to be set by the user, since it would live in the main repository. As noted, git clone will not copy the configuration from the remote repository, since doing so has long been identified as a security hole.

To opt-out (i.e., to allow legitimate use-cases to start reading embedded bare repository config again), the embedding repository would have to set a multi-valued `safe.embeddedRepo` configuration. This would specify a list of paths relative to the embedding repository's root of known-safe bare repositories.

The advantage of that approach is that it would likely disrupt few projects or workflows, since the number of (legitimate) embedded bare repositories with useful or necessary configuration is probably low. Those projects could provide instructions to their users on how to set up the configuration option, which might be a little annoying, but perhaps not all that disruptive. It looks like work is underway down that path, though there is no huge rush since the problem has been known for quite some time.
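
As a rough illustration of the opt-in idea, and not of anything Git implemented at the time of writing, the check could amount to comparing an embedded repository's path against a user-maintained list. The function name and the sample paths below are hypothetical; safe.embeddedRepo is only the name proposed in the thread.

```shell
# Hypothetical sketch of the allowlist lookup behind the proposed
# safe.embeddedRepo option. Usage:
#   is_allowed_embedded_repo CANDIDATE [ALLOWED...]
# CANDIDATE and each ALLOWED entry are paths relative to the root of the
# embedding repository. Returns 0 if CANDIDATE is listed, 1 otherwise.
is_allowed_embedded_repo() {
    candidate=${1%/}    # ignore a single trailing slash
    shift
    for allowed in "$@"; do
        if [ "${allowed%/}" = "$candidate" ]; then
            return 0
        fi
    done
    return 1
}
```

With an empty list the function always returns 1, which matches the default-deny behavior that Blau described: an embedded bare repository's config is ignored unless the user has explicitly listed that repository.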

There are plenty of other pitfalls when using untrusted Git repositories, but those are already well-known; simply using make or the build script for an untrusted project is a leap of faith unless the repository is carefully scrutinized, for example. Grabbing a tar file of a repository can also bring with it unwanted baggage in the form of .git/config, hooks, or embedded bare repositories.

While Choo and Steven pointed to earlier occurrences of similar or related problems, from as early as 2017, the existence of those types of problems is not really too surprising. Git is a powerful tool, with a lot of configuration knobs that may interact in surprising or unexpected ways. Meanwhile, a bunch of tooling has grown up around it, which also may be doing somewhat unexpected, seemingly harmless, things (like setting a shell prompt) that can lead to unpleasant outcomes. Any tool that tries to helpfully display repository information in its interface, such as an editor or IDE, may fall prey to attacks of this nature. Users should be alert to the presence of these bare repositories in any projects that they clone.




The risks of embedded bare repositories in Git

Posted Apr 28, 2022 2:46 UTC (Thu) by pabs (subscriber, #43278) [Link] (8 responses)

I feel like the right solution is to have a list of trustworthy repositories configured in your global git config and only allow running commands from them.

The risks of embedded bare repositories in Git

Posted Apr 28, 2022 9:27 UTC (Thu) by MrWim (subscriber, #47432) [Link] (2 responses)

Or a central list of hooks that are deemed "safe" that could run in any git repo. A malicious git repo might not be embedded inside another one after all. It might come embedded inside a hg repo or tarball. Just because it's on your filesystem doesn't mean it can be trusted.

Generally speaking actions that feel safe should be made safe. Extracting a tarball, cloning a git repo, `cd`ing to a directory, `cat`ing a file all feel rather pedestrian - and if there are subtle security issues with them it's the software that needs to be fixed.

The risks of embedded bare repositories in Git

Posted Apr 28, 2022 10:04 UTC (Thu) by geert (subscriber, #98403) [Link] (1 responses)

The git repository might be inside a tarball.

The risks of embedded bare repositories in Git

Posted Apr 28, 2022 12:20 UTC (Thu) by MrWim (subscriber, #47432) [Link]

Exactly, that's what I meant by:

> It might come embedded inside a [...] tarball.

The risks of embedded bare repositories in Git

Posted Apr 28, 2022 11:13 UTC (Thu) by k3ninho (subscriber, #50375) [Link] (4 responses)

>I feel like the right solution is to have a list of trustworthy repositories configured in your global git config and only allow running commands from them.
Sure, give me the web address of the shell script to update the whitelist and I'll curl-pipe-sudo-bash it right away.

Oops.

K3n.

The risks of embedded bare repositories in Git

Posted Apr 28, 2022 14:19 UTC (Thu) by MrWim (subscriber, #47432) [Link] (3 responses)

I believe the suggestion was that you have a *local* list of repositories on your computer that *you* trust. It could be `~/.gittrusted` for example. It might look like:

Projects/linux
Projects/foo
Projects/bar

So then when you run `git status` in Projects/linux the hooks will be run, while if you run it in ~/Downloads/my-dodgy-project no hooks will be run.

The risks of embedded bare repositories in Git

Posted Apr 28, 2022 14:47 UTC (Thu) by mathstuf (subscriber, #69389) [Link] (1 responses)

For prior art along these lines, see myrepos' `.mrtrust` file. https://myrepos.branchable.com/

The risks of embedded bare repositories in Git

Posted Apr 29, 2022 2:25 UTC (Fri) by pabs (subscriber, #43278) [Link]

Yep, that is where the idea came from; I'm one of the upstream maintainers of myrepos, and use it regularly, although not the mrtrust feature.

The risks of embedded bare repositories in Git

Posted Apr 29, 2022 2:24 UTC (Fri) by pabs (subscriber, #43278) [Link]

Right, although that doesn't solve the issue that k3ninho mentions; running arbitrary unreviewed code (which developers do a lot) could update the list of trusted directories. You would need to use bubblewrap or another container solution to prevent random code from touching the list of trusted dirs.

The risks of embedded bare repositories in Git

Posted Apr 28, 2022 7:48 UTC (Thu) by marcH (subscriber, #57642) [Link] (9 responses)

> There are plenty of other pitfalls when using untrusted Git repositories, but those are already well-known; simply using make or the build script for an untrusted project is a leap of faith unless the repository is carefully scrutinized, for example.

This type of problem happens every single time some "convenience" feature violates the "but I'm just looking" assumption.

Blurring the line between "just looking" and running code is EXACTLY the same problem as https://docs.microsoft.com/en-us/deployoffice/security/in... which Linux people have been using for decades to laugh at Microsoft's approach to security.

We never learn.

The risks of embedded bare repositories in Git

Posted Apr 28, 2022 8:49 UTC (Thu) by taladar (subscriber, #68407) [Link] (2 responses)

I wouldn't say it is the exact same problem since code as content is essential for source code repositories but not essential for office documents.

The risks of embedded bare repositories in Git

Posted Apr 28, 2022 11:34 UTC (Thu) by pbonzini (subscriber, #60935) [Link]

It is not essential for git repositories. Git repositories need not be code repositories.

The risks of embedded bare repositories in Git

Posted Apr 28, 2022 14:18 UTC (Thu) by jwarnica (subscriber, #27492) [Link]

The Word document could be prose instructions on how to format a drive.

The embedded macro might actually do it, automatically.

A git repository might be of, say, `shred`. Or a bunch of .md files instructing you how to use shred. And have a git-whatever that formats your drive.

It's not different, at all.

The risks of embedded bare repositories in Git

Posted Apr 28, 2022 9:56 UTC (Thu) by kleptog (subscriber, #1183) [Link] (3 responses)

The big difference being that when you look at a Git repository you're looking at something that many other people have already used and has possibly even been scanned by various public scanners and built using buildbots.

This is quite different from Word documents received via email. If I received a Git repository by email I wouldn't trust running make in it either.

The risks of embedded bare repositories in Git

Posted Apr 28, 2022 10:25 UTC (Thu) by Karellen (subscriber, #67644) [Link] (2 responses)

What? Not all git repos are hosted on github.

But also, we're not talking about running `make`. We're talking about running something only a bit more complex than `git status`, or even just `cd`ing into a subdirectory for the purposes of running `ls` or `cat readme.md` - except your fancy shell prompt runs some `git` commands to figure out the current branch and whether any changes have been made, in the background, and suddenly you've run attacker-controlled code.

The risks of embedded bare repositories in Git

Posted Apr 28, 2022 15:44 UTC (Thu) by kleptog (subscriber, #1183) [Link] (1 responses)

I think I didn't make myself clear. I wasn't saying that an untrusted Git repository is any safer than an untrusted Word document; it's obviously not.

What I'm saying is that the social and technical mechanisms around how Git repositories are usually managed mean that the default level of trust for many Git repositories is much, much higher than that of a random Word document. "git clone" can actually verify that what it received is consistent, whereas for a Word document via email I can't even easily check it's the same thing my neighbour got.

The risks of embedded bare repositories in Git

Posted Apr 29, 2022 5:45 UTC (Fri) by marcH (subscriber, #57642) [Link]

> "git clone" can actually verify that what it received is consistent,

You can but do you actually verify every time?

When my "neighbour" tells me "take a quick _look_ at this project" she won't give me a SHA1 to make sure I'm looking at the exact same, audited and "safe" version of the project clone that she has right now. Because I'm _just looking_ so that would be overkill and overhead. If she wants me to _use_ the project then it's a totally different story and then yes she will probably point me at a specific git tag, maybe even a signed one.

> whereas for a Word document via the email I can't even easily check it's the same thing my neighbour got.

For a Word document you must indeed trust that the server you're downloading it from has not been hacked in the meantime, which is in theory not required for git, but is that really a huge security difference in practice? Also, your neighbor will likely have sent you the Word doc by email directly :-)

I think all these "secure transport" differences are fairly minor and TBH mostly off-topic compared to the very sneaky loss of a safe "read only" mode. Even non-technical users tend to understand the security difference between merely reading media and running code. I think anything blurring that conceptually simple yet very useful line is doing users and security a great disservice - as Office macros did for many years.

The risks of embedded bare repositories in Git

Posted Apr 28, 2022 19:30 UTC (Thu) by NYKevin (subscriber, #129325) [Link] (1 responses)

See also: JavaScript.

"The by-design purpose of JavaScript was to make the monkey dance when you moused over it." - Eric Lippert

The risks of embedded bare repositories in Git

Posted Apr 29, 2022 6:09 UTC (Fri) by marcH (subscriber, #57642) [Link]

True - except web sandboxing (Javascript / WebAssembly etc.) has stood the test of time.

Javascript did really blur that line and created something new and intermediate between "just looking" and "running" but it produced something useful and incredibly popular and... it worked.

The risks of embedded bare repositories in Git

Posted Apr 29, 2022 2:32 UTC (Fri) by Alan.Stern (subscriber, #12437) [Link] (2 responses)

Maybe I'm dumb, but I don't see why the article concentrates on the dangers of embedded bare repositories. Isn't any embedded repository just as potentially dangerous, whether it is bare or not?

The risks of embedded bare repositories in Git

Posted Apr 29, 2022 12:13 UTC (Fri) by mathstuf (subscriber, #69389) [Link] (1 responses)

Full repos show up as submodules which go through `clone` and therefore do not pull a config file. Maybe one could craft a tree to commit a non-bare repository, but the tooling would likely barf as it would expect a submodule or try to convert it to one at some point.

The risks of embedded bare repositories in Git

Posted Apr 29, 2022 14:42 UTC (Fri) by johill (subscriber, #25196) [Link]

It does in fact not even like to check out such a repo, saying

"error: invalid path 'inner/.git/config'"

or such things.

The risks of embedded bare repositories in Git

Posted Apr 29, 2022 18:39 UTC (Fri) by glasserc (subscriber, #108472) [Link] (3 responses)

It's a little surprising that Git still considers it a bare repository even after changing the core.bare setting to false! Glen Choo writes in the original post that apparently Git considers a directory a bare repository if it has a file called "HEAD" and subdirectories called "refs/" and "objects/". TIL!

The risks of embedded bare repositories in Git

Posted Apr 29, 2022 22:39 UTC (Fri) by NYKevin (subscriber, #129325) [Link] (2 responses)

Now I'm wondering if a malicious user could construct an invalid bare repo in such a way that it causes Git to segfault and/or execute arbitrary code. If a bare repo is literally just a directory with magic contents, that seems like a possible outcome...

The risks of embedded bare repositories in Git

Posted Apr 30, 2022 13:31 UTC (Sat) by timon (subscriber, #152974) [Link] (1 responses)

> If a bare repo is literally just a directory with magic contents

Well, any git repo is just a directory with magic contents. Instead of `git init foo` you could do a minimal git init by hand and get a working repository:

mkdir -p foo/.git/objects
mkdir -p foo/.git/refs
echo 'ref: refs/heads/main' > foo/.git/HEAD

The risks of embedded bare repositories in Git

Posted May 1, 2022 18:31 UTC (Sun) by NYKevin (subscriber, #129325) [Link]

Yes, I'm aware of that.

I guess my concern is that a user might have a setup like this:

1. The user regularly clones untrusted Git repositories, for whatever reason.
2. If a repository contains a .git directory (actually checked in, not in the root of the repo), then the user (or some software acting on behalf of the user) will avoid cloning that repo, because they don't want to deal with the possibility of corrupt/malicious sub-repositories.
3. Bare repositories don't contain a .git directory, so this doesn't work.


Copyright © 2022, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds