
The inimitable Jonathan Corbet

Posted Jul 12, 2024 11:29 UTC (Fri) by sdalley (subscriber, #18550)
Parent article: An empirical study of Rust for Linux

> As a bonus, it includes a ChatGPT analysis of LWN and Hacker News comments.

The dry, ironic tone of this remark made me laugh out loud!



The inimitable Jonathan Corbet

Posted Jul 12, 2024 14:21 UTC (Fri) by Zildj1an (subscriber, #152565) (5 responses)

An analysis that, by the way, classifies opinions into only positive and negative categories, implying that neutral views do not exist.

The inimitable Jonathan Corbet

Posted Jul 12, 2024 18:12 UTC (Fri) by NYKevin (subscriber, #129325) (4 responses)

Sentiment analysis is still at a rather primitive stage of development. ChatGPT is probably better at it than prior methods (which were usually based on picking out individual words, scoring them individually, and trying to sort-of average the scores out over a whole message), but it's still a very difficult and inherently subjective problem. That subjectivity gets significantly worse if you add a middle ground, because now you have to decide what's "close enough" to the middle to qualify.

In other words: We're barely capable of classifying things into "positive" and "negative" as it is, so adding "neutral" is probably not happening any time soon. Especially since people will quibble over the dividing lines between the three categories, and there's no obvious way to figure out where they should be drawn.
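For illustration only, a minimal sketch of that older word-scoring approach; the lexicon, the scores, and the positive/negative cutoff are all made up, which is exactly where the subjectivity creeps in:

    # Toy lexicon-based sentiment scorer, in the pre-LLM style described
    # above: score individual words, then average over the whole message.
    LEXICON = {"great": 1.0, "good": 0.5, "bad": -0.5, "awful": -1.0}

    def classify(text: str) -> str:
        scores = [LEXICON[w] for w in text.lower().split() if w in LEXICON]
        if not scores:
            return "no signal"          # already an awkward third bucket
        average = sum(scores) / len(scores)
        # The dividing line is arbitrary; adding a "neutral" band around
        # zero just means arguing about where that band starts and ends.
        return "positive" if average > 0 else "negative"

    print(classify("a good idea with awful timing"))   # -> "negative"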

The inimitable Jonathan Corbet

Posted Jul 14, 2024 3:38 UTC (Sun) by tialaramex (subscriber, #21167) (3 responses)

Also it's just incredibly hard. The ideal is that we can somehow measure the view that was held by an individual when they communicated - but even a trained human, set this task on a much smaller scale, will sometimes fail miserably. And that's assuming, which is not given, that the person who made the communication we're analysing was *trying to help us to know their view* and didn't have some other reason (or none at all) for what they did.

The inimitable Jonathan Corbet

Posted Jul 14, 2024 7:15 UTC (Sun) by Wol (subscriber, #4433)

> And that's assuming, which is not given, that the person who made the communication we're analysing was *trying to help us to know their view* and didn't have some other reason (or none at all) for what they did.

Another reason: the person who made the communication may have been trying to be neutral. And a confounding factor: did the person making the communication understand their own views well enough to communicate them clearly?

Cheers,
Wol

The inimitable Jonathan Corbet

Posted Jul 14, 2024 8:32 UTC (Sun) by atnot (subscriber, #124910)

> that's assuming, which is not given, that the person who made the communication we're analysing was *trying to help us to know their view*

This is especially relevant for forums like HN or Reddit, where moderation is often based more on tone than on content, which makes long-term residents of those sites extremely adept at expressing whatever opinions they hold in the words expected of them. So (to take a more extreme recent real-world example) someone who elsewhere complains about "woke mobs" may find that framing it as "concerns about moderation fairness" gets their comments deleted less often. A computer system lacks the context needed to avoid taking that at face value and to judge whether those concerns are actually relevant to the technology at hand.

This would not be a problem if people used things like sentiment analysis properly in a way that evens out these effects. That is: to measure only relative historical trends within a fixed population, over huge amounts of unrelated conversations. But instead we get people using it for absolute analysis and even ridiculous things like moderating individual comments.

The inimitable Jonathan Corbet

Posted Jul 14, 2024 20:52 UTC (Sun) by flussence (guest, #85566)

Online literacy at this point in time is a cryptographic arms race against the advertising industry, who have forever been on the losing side and holding the map upside down.

Using one of their siege machines to try to understand conversations with any amount of nuance in them was a doomed idea from the start. It's a morbidly fascinating exercise in self-hypnosis, but the results have no value or accuracy for their stated purpose.

