
Debian dismisses AI-contributions policy

Posted May 10, 2024 19:33 UTC (Fri) by james (subscriber, #1325)
In reply to: Debian dismisses AI-contributions policy by hkario
Parent article: Debian dismisses AI-contributions policy

Of course, one could use AI to generate documentation on "rigorously undocumented" code, post it to the Internet, and wait for the "Someone is wrong on the Internet" effect to clean it up.

It's tempting to argue this would be a faster and/or more reliable way of getting good documentation.



Debian dismisses AI-contributions policy

Posted May 10, 2024 22:43 UTC (Fri) by taladar (subscriber, #68407)

It might be, but only by forcing the people who have other productive things to do to focus on that documentation instead. It is never random people who are new to the project who can fix that kind of thing.

Debian dismisses AI-contributions policy

Posted May 11, 2024 1:38 UTC (Sat) by mathstuf (subscriber, #69389)

It depends. I often ask coworkers who are inexperienced in a field to read the docs I write because… well, I know I'm bad at it, but it still needs to be done. Not that these tasks are anything LLMs are good at anyway ("observe what is done for procedure X and write instructions"). For example, our release process is finally captured in a viable issue template that walks anyone through the steps of cutting a release, after a few rounds of someone other than me using it and making fixes to it.
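
The shape is roughly this (a simplified sketch, assuming a GitHub-style template under .github/ISSUE_TEMPLATE/; the path, names, and steps here are illustrative, not our actual checklist):

    ---
    name: Release checklist
    about: Walks a contributor through cutting a release
    title: "Release vX.Y.Z"
    ---
    <!-- illustrative sketch only; not the real template -->
    - [ ] Update the version number and changelog
    - [ ] Run the full test suite on a clean checkout
    - [ ] Tag the release and push the tag
    - [ ] Build and upload the release artifacts
    - [ ] Send the announcement and close this issue

The point is that each pass through the checklist surfaces a missing or wrong step, and the fix lands in the template itself rather than in someone's memory.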

Debian dismisses AI-contributions policy

Posted May 11, 2024 13:47 UTC (Sat) by Paf (subscriber, #91811)

They actually are pretty good at the explanatory parts. They can’t get the detailed commands right unless it’s very common software, but they can absolutely guess some rather good background and help text. Not, like, better than me, but often as good and in much less time.

Debian dismisses AI-contributions policy

Posted May 12, 2024 14:41 UTC (Sun) by farnz (subscriber, #17727)

That leads into the other problem with a "no AI contributions" policy. Imagine I take my code, feed it to an LLM, and ask the LLM to write documentation for my code. I then take the LLM's output, and rewrite it myself to be accurate to my code, checking for mistakes; I used it as a prompt since I struggle to get going on documenting things, but I find it easy to fix bad documentation. The result is something that reflects considerable effort on my part, and that has very little trace of the original training data that was fed into the LLM; the LLM's role was to show me what good documentation would look like for my project, so I had a goal to aim for, even though I could not reuse the LLM's output.

Is this a contribution that we should reject? If so, why?

Debian dismisses AI-contributions policy

Posted May 14, 2024 1:06 UTC (Tue) by mirabilos (subscriber, #84359)

Yes, as it’s still a derivative of something that isn’t legal to use.

And for all the other reasons: model training is done in unethical ways that exploit people in the global south, and LLM use requires too much power (and other natural resources, like clean water, for reasons). You should not be using them _at all_, period.

Debian dismisses AI-contributions policy

Posted May 14, 2024 2:16 UTC (Tue) by Heretic_Blacksheep (guest, #169992)

There are two ways of generating technical documentation: fast & good.

Fast is how a lot of people tend to want to do technical documentation, mostly because they don't consider proper communication essential. It's an afterthought. You end up with a lot of mistakes, misunderstandings, and complaints about the quality of a project's information.

Then there's good: it's written by skilled communicators, reviewed and annotated by multiple stakeholders, including people new to the project, old hands, participants savvy in technical writing and communications, and, if necessary, skilled translators. This is the best documentation, and it only gets better over time as the teams that write it gain feedback. It takes time to generate, time to proofread and edit, and time to manage corrections. But arguably it actually saves time all around for those who produce it, and particularly for those who use it.

Many people using language models are going for the first level, but the result is little better in quality, because they discount that the point of documentation (and of message passing like email) isn't just telling people like themselves how to use whatever they're producing. The point of documentation is not only to tell intellectual peers how to use the tool; it's to communicate why it exists, why it's designed and implemented the way it is, how it's used, and how it may potentially be extended or repaired.

The first (fast) is a simple how-to, like a terse set of reminders on a machine's control panel, and it may not even be accurate. The latter is the highly accurate, full documentation manual that accompanies the machine and tells operators what to do, what not to do, when, why, and how to repair or extend it. You can't reliably use a complex tool without a training/documentation manual. It's also why open source never made huge strides into corporate/industrial markets until people started writing _manuals_, not just the terse how-tos that many hobby-level open-source projects generate. Training and certification is a big part of the computer industry.

But back to the topic: AI can help do both, but fast/unskilled is still going to be crap. Merely witness the flood of LLM-generated fluff and low-quality articles proliferating across the internet as formerly reliable media outlets switch from good human journalism, to ad-driven human fluff-journalism, to LLM-generated "pink slime" or, in one coinage I saw recently, "slop" (building on bot-generated spam). Good documentation can use language-model tools, but not without the same human communication skills that good documentation and translation require... skills many coders discount, to their ultimate detriment. Tone matters. Nuance matters. Definitions matter. Common usage matters. _Intent_ matters. LLM tools can help with these things, but they can't completely substitute for either the native speaker or the project originator. They definitely can't intuit a writer's intent.

However, right now any project that is concerned about the provenance of its code base should be wary of the unanswered legal questions around the output of LLM code generators. This could end up being a tremendous _gotcha_ in some very important legal jurisdictions where copyright provenance matters, which is why legally cautious companies are looking at fully audited SLMs instead.

