
A couple small clarifications

Posted Oct 21, 2025 23:23 UTC (Tue) by newren (guest, #5160)
Parent article: Git considers SHA-256, Rust, LLMs, and more

Thanks for the article. Overall, it's a nice high-level overview of some recent happenings in Git. There are two small things I wanted to call out:

> Git, since the beginning, has used the SHA-1 hash algorithm

Actually, it has used sha1dc (https://github.com/cr-marcstevens/sha1collisiondetection) by default since about Git 2.13 (2.40 if on macOS). sha1dc returns the same result as SHA-1 on almost all inputs, but on inputs that exhibit certain already-published SHA-1 weaknesses it yields a different result. That provides a little bit of extra protection, though the need to move on to a better hash still stands.
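Since sha1dc only diverges on crafted collision inputs, a plain SHA-1 library reproduces Git's object IDs for ordinary content. As a minimal sketch (using Python's hashlib, which is plain SHA-1, not sha1dc), here is how Git hashes a blob:

```python
import hashlib

def git_blob_sha1(data: bytes) -> str:
    """Hash bytes the way `git hash-object --stdin` does for a blob.

    Git hashes the header "blob <size>\\0" followed by the content,
    not the raw bytes alone.
    """
    header = f"blob {len(data)}\0".encode()
    return hashlib.sha1(header + data).hexdigest()

# Matches `printf 'hello\n' | git hash-object --stdin`
print(git_blob_sha1(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a
```

On a collision-attack input such as the published "shattered" PDFs, stock SHA-1 and Git's sha1dc would disagree; that divergence is exactly the protection described above.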

> Elijah Newren said that he has already contributed some LLM-generated documentation

I think that summary is prone to mislead, or at least to be misunderstood; perhaps if you said "LLM-edited" rather than "LLM-generated"? To me, the latter implies the LLM was generating lots of new prose, which it wasn't. I was feeding the LLM existing documentation and telling it to look for typos, grammatical errors, and awkward wordings, and to provide suggested corrections. I then (heavily) filtered (and perhaps even further edited) the output, divided it up into logical commits, and submitted it all a few years ago, and was fully up-front about what I was doing at the time. I think that's a good use of an LLM that an open source project shouldn't ban (and I suggested a couple of others), but it looks like I might lose on that point.



A couple small clarifications

Posted Oct 22, 2025 3:10 UTC (Wed) by bronson (subscriber, #4806) (1 response)

If the LLM merely fixed up your writing, then you should be in the clear? Junio's proposed guidelines allow for AI tool use, as long as the output is considered tainted:
> This policy does not apply to other uses of AI, such as researching APIs or algorithms, static analysis, or debugging, provided their output is not to be included in contributions.
(That makes sense, because useful LLMs are pretty much all trained on incompatibly licensed data. The entire industry has rampant piracy at its foundation.)

If you wrote and understood everything, and the LLM made only occasional minor fixes, then there's little chance of it being able to generate infringing content. IIUC.

A couple small clarifications

Posted Oct 22, 2025 15:47 UTC (Wed) by newren (guest, #5160)

> If the LLM merely fixed up your writing, then you should be in the clear? Junio's proposed guidelines allow for AI tool use, as long as the output is considered tainted:

> This policy does not apply to other uses of AI, such as researching APIs or algorithms, static analysis, or debugging, provided their output is not to be included in contributions.

To me, the wording of that exception does not cover my case, because I didn't just have it notify me of problematic text that I then went and manually fixed up myself. I had it suggest corrected text, then looked at the diffs, picked the pieces I liked, split them up into different commits (and possibly modified them further). So the whole "provided their output is not to be included in contributions" clause rules that out.

Also, note that I was feeding all the git manpages to the LLM, one at a time, so it is not just my own documentation I was having it edit. Those manpages are the combined work of many people over a long time. It is true that the LLM only suggested minor occasional fixes to the manpages, and I don't think that would be enough to trigger any problems, but to me the latest proposal is extra strict and conservative and rules out use cases like mine, which I think would be regrettable.

A couple small clarifications

Posted Oct 22, 2025 3:30 UTC (Wed) by WolfWings (subscriber, #56790) (1 response)

There were a lot of early LLM-related usages that were still "fancy grammar checker" stuff, or generating colorful blobs just to use as textures, like starry skies or a field of boulders, that (at the time) were innocent. But there's no way now to differentiate between those and today's abusive public uses of the tech.

A bunch of artists I know re-drew those portions and/or just removed those pieces from their online galleries years later, once it became clear how bad LLMs were for creatives as a whole.

In this case, though, what's the difference/improvement between what you did and just running the same documentation through hunspell, for example?

A couple small clarifications

Posted Oct 22, 2025 15:56 UTC (Wed) by newren (guest, #5160)

> In this case, though, what's the difference/improvement between what you did and just running the same documentation through hunspell, for example?

Logically, I was using the LLM kind of like a glorified spell checker, so that is a very good comparison. It's certainly very similar, and I've run the git documentation through command-line spell checkers before. However, I found that spell checkers tend to turn up far more false positives than an LLM does, making the process much more labor intensive (which meant I got through a smaller subset of the Documentation and didn't repeat the exercise again later). Further, spell checkers (at least whatever one I used -- aspell? It's been long enough that I don't recall which one I ended up using) tend to catch only typos and spelling errors, while missing grammatical errors and awkward wordings/phrasings. LLMs catch and fix a wider variety of problems while having fewer false positives. (Though there were still quite a few, mostly because it kept attempting to standardize American vs. British English spellings, and the manpages were so inconsistent on that point that I didn't want to subject the list to the noise of standardizing all of those.)


Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds