Hardening fixes lead to hard questions
Linus Torvalds reacted
strongly to Cook's pull request after noticing that many of the commits
found within it had been modified in strange ways. Git tracks both the
author of a commit (the person who wrote the code), and the committer (the
person who put that code into the repository). In this case, there were
changes that claimed to have been committed by Torvalds, but they were
actually rewritten (but unmodified beyond the metadata) versions of his
commits with different SHA IDs. Torvalds said: "You seem to have
actively maliciously modified your tree completely
", implying that some
sort of deliberate, underhanded change had been made. He copied kernel.org
maintainer Konstantin Ryabitsev, asking that Cook's account there be
disabled; Ryabitsev duly
complied. News quickly spread around the Internet, along with a lot of
speculation about possible supply-chain attacks or other malicious
activity.
While use of kernel.org is not mandatory, most kernel maintainers do keep their repositories there. Banishment therefrom will, thus, leave a maintainer unable to do their work; unable, in this case, to even fix the problems that caused that banishment in the first place. It has never been explicitly said that a request from Torvalds is enough to cause a kernel.org account to be disabled, but it is not surprising in retrospect. Still, it must have come as a shock, even without the suggestions of possible malicious activity.
Cook, though, reacted calmly to his banishment, saying that he had not
created the problematic repository intentionally; "I think I have an
established track record of asking you first before I intentionally do
stupid things with git
". He went through the exercise
of recreating that repository, showing all the steps along with data from
the Git reflog. In the end, he was able to reproduce the problem with an
invocation of the b4 tool's
trailers subcommand.
B4 is a tool that has made life far easier for kernel developers and (especially) maintainers. It handles many of the tasks of applying patches, ensuring that all offered tags ("Reviewed-by" and such) are applied, and more. The b4 trailers command, in particular, will look for replies to a set of already-committed patches containing new tags, then rewrite the commit history to include those tags in the changelogs. It is, at its core, a rebasing operation. Those should always be approached with care, but they do not ordinarily lead to this kind of problem.
In this case, b4 trailers advised Cook that it was about to modify 39 commits. By his own admission, Cook missed that warning and told it to proceed, then used a forced push to upload the resulting repository to kernel.org. Ryabitsev, who is the b4 maintainer, was willing to share the blame:
Well, that's the point where the user, in theory, goes "this is weird, why is it 39 commits," and does Ctrl-C, but I'm happy to accept blame here -- we should be more careful with this operation and bail out whenever we recognize that something has gone wrong.
He added that he was "100% convinced
" that there was no malicious
activity involved. Cook's account was reactivated; he promptly put
together a new pull request for the hardening fixes, which was pulled
by Torvalds on June 1.
There will be some changes to b4 to try to prevent this kind of mistake from happening again. Torvalds has asked that it refuse to rewrite commits that were committed by anybody other than the user running the command; Ryabitsev has agreed to make that change. There will probably be others as well, once the developers involved understand why b4 decided to modify so many commits in this case.
So this episode appears to have run its course. The real moral of the
story, perhaps, is that powerful tools can sometimes have powerfully
adverse effects. It can take time — and hard experience with those effects
— to learn where the pitfalls are and what types of guard rails need to be
installed. We have just seen an example of how that experience is gained.
Index entries for this article | |
---|---|
Kernel | Development tools/Git |
Posted Jun 2, 2025 17:39 UTC (Mon)
by pwfxq (subscriber, #84695)
[Link]
Posted Jun 2, 2025 17:42 UTC (Mon)
by jgg (subscriber, #55211)
[Link] (5 responses)
But how on earth did the pull request get built with this:
The following changes since commit f8b59a0f90a2adfce5a9206ce5589ed0dc19543c:
??
All my scripts for building PRs work directly, and only, on the tip of Linus's branch. So if b4 trailers corrupted my local branch then the PR scripts would compute a merge base with Linu's branch and generate a PR that shows the mess - exactly how Linus would have noticed this when his tooling would have showed it against the merge-base?
Posted Jun 2, 2025 18:00 UTC (Mon)
by koverstreet (✭ supporter ✭, #4296)
[Link] (4 responses)
Posted Jun 2, 2025 18:03 UTC (Mon)
by jgg (subscriber, #55211)
[Link] (3 responses)
<start>
That start is effectively the "upstream repo". My scripts use 'git merge-base HEAD linus/master' to compute that automatically.
So maybe that is the answer, without a wrapper script the request-pull command is hard to use..
Posted Jun 2, 2025 18:11 UTC (Mon)
by koverstreet (✭ supporter ✭, #4296)
[Link]
Posted Jun 2, 2025 23:36 UTC (Mon)
by kees (subscriber, #27264)
[Link] (1 responses)
Posted Jun 3, 2025 5:22 UTC (Tue)
by pbonzini (subscriber, #60935)
[Link]
Posted Jun 3, 2025 13:13 UTC (Tue)
by SLi (subscriber, #53131)
[Link] (6 responses)
Although, here of course the issue ran deeper into things that b4 did using inherently dangerous Git operations.
What I'm trying to say is this: It doesn't shock me that someone managed to accidentally mess up a Git repository in a way that leaves the original architect of Git scratching their head and thinking there's no benign way to end up like this.
Posted Jun 4, 2025 3:11 UTC (Wed)
by NYKevin (subscriber, #129325)
[Link] (5 responses)
* Commits have a field called "phase," which can take on one of three values: secret, draft, and public. The tooling applies various sanity checks to make sure that this field is respected (e.g. you cannot rewrite a public commit, you cannot push a secret commit, you cannot have a public descendant of a secret or draft commit, etc.), and it does not contribute to the hash (so that the tooling can automatically update it when you push to a public repository, without invalidating your entire DAG). If it has the wrong value, you can force-overwrite it, but this is rarely needed in practice because the tooling will usually DTRT.
Posted Jun 4, 2025 14:48 UTC (Wed)
by notriddle (subscriber, #130608)
[Link]
Instead of "phases," it lets you define a set of immutable commits based on a query. Usually, you define every commit in the `master` branch as immutable, which would have prevented this from happening. This should also make force-pushes a lot safer. https://github.com/jj-vcs/jj/blob/main/docs/config.md#set...
Finding commits that are "the same, but rewritten" would use change-id. Change IDs are not currently pushed, but that change is planned. https://lore.kernel.org/git/CAESOdVAspxUJKGAA58i0tvks4ZOf...
Posted Jun 7, 2025 20:19 UTC (Sat)
by mathstuf (subscriber, #69389)
[Link] (3 responses)
Is there a way to mark a remote as not "public"? For example, just because I push to my fork (or directly to another development machine), commits should not be marked as "public".
Posted Jun 9, 2025 2:47 UTC (Mon)
by NYKevin (subscriber, #129325)
[Link] (2 responses)
Again, you do have the option of forcing the phase back into draft if public is wrong. The problem is, the commit will still be marked public on the remote side, and will be re-marked as public if you ever pull it. You can only fix that by logging into the remote and forcing it into draft locally... and then logging into every repo that has pulled it and doing the same in each of them, recursively until you've draftified every instance of the commit that exists anywhere. Which is probably intractable in the general case, but that's the idea - once a commit has been marked as public, it is meant to indicate that "this commit has been finalized and is no longer subject to history revision." Undoing that finalization is intentionally unsupported, because the whole point of finalization is that it is final.
Posted Jun 9, 2025 14:30 UTC (Mon)
by mathstuf (subscriber, #69389)
[Link] (1 responses)
Is this something that Bitbucket (or other Mercurial forges) allowed? It sounds to me like it effectively forces the only-contribute-from-a-fork pattern (that I prefer anyways) unless you're dead set against rebasing (IMO, far too valuable to give up). But maybe the bookmark/branch distinction allows for some kind of "mark integration branches as public, but development work as draft" in a single repository?
Posted Jun 9, 2025 14:55 UTC (Mon)
by farnz (subscriber, #17727)
[Link]
And because Changeset Evolution was designed for this scenario, you'd be able to use hg evolve to help you keep one branch in sync between multiple places, rebasing where a clean rebase is possible, and handling content divergence in a cleanish fashion.
Most of what blocked Changeset Evolution from becoming Mercurial's killer feature that'd have made it worth migrating from GitHub to Bitbucket was the challenges involved in fixing up unstable commits when there was divergence between the different lines of development; there were also (solvable) issues with changeset exchange in the face of obsolescence markers, but it's not worth fixing that when the core functionality is still problematic.
Posted Jun 12, 2025 2:48 UTC (Thu)
by raven667 (subscriber, #5198)
[Link]
To err....
Where did the PR message come from?
Where did the PR message come from?
Where did the PR message come from?
Commit to start at. This names a commit that is already in the upstream history.
Where did the PR message come from?
Where did the PR message come from?
Where did the PR message come from?
Git be like
Git be like
* When a commit is rewritten or deleted, a special object called an "obsolescence marker" is created. It records the hash(es) of the commit(s) we deleted, the hash(es) of the replacement commit(s), if any, and some misc. metadata. These objects are pushable between repositories, which has the effect of reapplying the same rewrite operation at the remote (public repositories can refuse to accept such rewrites if they are so inclined, and such repositories are usually also configured to automatically promote anything pushed to them into the public phase, so that the tooling "reminds" you not to do it in the first place). Then you can push obsolescence markers instead of using push --force, which is considerably safer because the tooling can refuse a push that doesn't make sense.
* There is a command called "evolve" that finds commits whose ancestors have been rewritten, or that exist in more than one place in the DAG (e.g. because two repositories tried to amend the same commit in different ways), and does its level best to fix them by reading obsolescence markers and rebasing things until the DAG makes sense again (while preserving as much of your rewrite's intent as possible, e.g. if you rebase A on top of B, it will not arbitrarily put B on top of A, or anything dumb like that). It can also be directed to only fix one commit, or to only fix commits relevant to the current HEAD. It can't solve all problems, but it can solve a surprisingly wide range of problems. If it's not sure what it's doing, it will fail rather than guessing, but you can always use the regular rebase command to fix things by hand.
Git be like
Git be like
Git be like
Git be like
Bitbucket (when it supported Mercurial) definitely let you set a repo to non-publishing. So, you could fork a repo, mark your fork as non-publishing, and share draft work through the non-publishing repo.
Publishing versus non-publishing repos in Mercurial
Good catch