Almost always copyrighted ???
Posted May 1, 2025 8:20 UTC (Thu) by Wol (subscriber, #4433)
In reply to: Almost always copyrighted ??? by NYKevin
Parent article: Distributions quote of the week
(Okay, with computing maybe, but even then mechanical computers (Babbage et al) go back many centuries ...)
And while the vernacular may change over a span of 20 years or so (1940s English has a recognisable style, I'm sure other eras do too), the last MAJOR shift in the English language came with Shakespeare - he is easily understood by modern ears, while anything pre-Elizabethan feels noticeably strange.
Cheers,
Wol
Posted May 1, 2025 10:33 UTC (Thu)
by NYKevin (subscriber, #129325)
Imagine asking such an LLM about women's role in the workplace, the political structure of the British Commonwealth, or literally any celebrity who became famous after the 1930s or thereabouts. At best, its answers will be hilariously out of date, if they are even coherent.
Taking the Commonwealth as an example, the (relevant) Balfour Declaration was in 1926, the US copyright cutoff year is currently 1930, and the UK does not have a relevant cutoff date because it moved to life+50 in 1911. So if we restrict ourselves to public domain training materials, we have at most four years' worth of source materials, mostly written by Americans in the late 1920s, when the British Empire was still a thing and World War II was a decade away. That is not a recipe for accurate information about the state of the British Commonwealth in 2025, or for that matter 1945.
Posted May 1, 2025 11:20 UTC (Thu)
by Wol (subscriber, #4433)
And?
The initial comment was along the lines of "all the material is copyrighted". Yes, most of the modern material is, but there is a HUGE corpus that isn't. And any decent LLM needs both.
(And it was you that said "why would we want old material?" Because, just as you said that without modern material answers to questions about today would be hilariously inaccurate because of missing information, without the old material answers to questions about today are likely to be hilariously inaccurate because of missing information and historical context.)
"Unread comments" is a great tool in many ways, but when the original post disappears, the thread can go off the rails because context is lost, as appears to be the case here ...
Cheers,
Wol
Posted May 2, 2025 1:18 UTC (Fri)
by NYKevin (subscriber, #129325)
Posted May 2, 2025 8:58 UTC (Fri)
by Wol (subscriber, #4433)
Russ's quote basically assumed that the two were the same.
Cheers,
Wol
