
Debian dismisses AI-contributions policy


Posted May 14, 2024 2:16 UTC (Tue) by Heretic_Blacksheep (guest, #169992)
In reply to: Debian dismisses AI-contributions policy by james
Parent article: Debian dismisses AI-contributions policy

There are two ways of generating technical documentation: fast and good.

Fast is how a lot of people tend to want to do technical documentation, mostly because they don't consider proper communication essential. It's an afterthought. You end up with a lot of mistakes, misunderstandings, and complaints about the quality of a project's information.

Then there's good: it's written by skilled communicators; reviewed and annotated by multiple stakeholders, including people new to the project, old hands, and participants savvy in technical writing and communications; and, if necessary, worked over by skilled translators. This is the best documentation, and it only becomes better over time as the teams that write it gain feedback. It takes time to generate, time to proofread and edit, and time to manage corrections. But, arguably, it actually saves time all around for those who produce it, and particularly for those who use it.

Many people using language models are going for the first level, but the result is little better in quality because they discount the fact that the point of documentation (and message passing like email) isn't just telling people like themselves how to use whatever they're producing. The point of documentation is not only to tell intellectual peers how to use a tool; it's to communicate why it exists, why it's designed and implemented the way it is, how it's used, and how it may potentially be extended or repaired.

The first (fast) is a simple how-to, like a terse set of reminders on a machine's control panel, and may not even be accurate. The latter is the highly accurate, full documentation manual that accompanies the machine and tells operators what to do, what not to do, when, why, and how to repair or extend it. You can't reliably use a complex tool without a training/documentation manual. It's also why open source never made huge strides into corporate/industrial markets until people started writing _manuals_, not just the terse how-tos many hobby-level open source projects generate. Training and certification is a big part of the computer industry.

But back to the topic: AI can help do both, but fast/unskilled is still going to be crap. Merely witness the flood of LLM-generated fluff/low-quality articles proliferating across the internet as formerly reliable media outlets switch from good human-generated journalism, to ad-driven human fluff-journalism, to LLM-generated "pink slime" or, in one person's terminology I recently saw, "slop" (building on bot-generated spam). Good documentation can use language model tools, but not without the same human communication skills that good documentation and translation require... and that many coders discount to their ultimate detriment. Tone matters. Nuance matters. Definitions matter. Common usage matters. _Intent_ matters. LLM tools can help with these things, but they can't completely substitute for either the native speaker or the project originator. They definitely can't intuit a writer's intent.

However, right now any project that is concerned about the provenance of its code base should be wary of the unanswered legal questions around the output of LLM code generators. This could end up being a tremendous _gotcha_ in some very important legal jurisdictions where copyright provenance matters, and it is why legally wise companies are looking at fully audited SLMs instead.




Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds