
Leaving python-dev behind

Posted Jul 20, 2022 17:15 UTC (Wed) by NYKevin (subscriber, #129325)
In reply to: Leaving python-dev behind by amacater
Parent article: Leaving python-dev behind

Is there a reason that https://discuss.python.org/search?expanded=true does not meet your requirements? Or is this specifically about the ability to find everything that you posted, across all mailing lists and projects, from a specific time period?


Leaving python-dev behind

Posted Jul 20, 2022 17:53 UTC (Wed) by amacater (subscriber, #790) (7 responses)

There are about 7M emails across all Debian mailing lists - pretty much all in plain text and going back to 1994 or so. That's a fair corpus, and much of it is also archived by Google and various other archives ... that's the sort of thing I mean. There's also a bunch of corporate memory in there.

IRC logs are good and useful as long as they're kept - and can also be grepped, of course.
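
As a rough illustration of how little tooling a plain-text archive needs, here is a minimal Python sketch that regex-searches a local mbox file using only the standard library. It assumes the list archive has been downloaded in mbox format; the function names `body_text` and `search_mbox` are hypothetical, and only text/plain parts are searched.

```python
import mailbox
import re
from email.message import Message


def body_text(msg: Message) -> str:
    """Return the concatenated, decoded text/plain parts of a message."""
    parts = msg.walk() if msg.is_multipart() else [msg]
    chunks = []
    for part in parts:
        if part.get_content_type() != "text/plain":
            continue  # skip multipart containers, HTML parts, binaries
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            chunks.append(payload.decode(charset, errors="replace"))
        except LookupError:  # unknown charset name in the header
            chunks.append(payload.decode("latin-1"))
    return "\n".join(chunks)


def search_mbox(path: str, pattern: str):
    """Yield (Date, Subject) for each message whose body matches the regex."""
    regex = re.compile(pattern)
    box = mailbox.mbox(path, create=False)
    try:
        for msg in box:
            if regex.search(body_text(msg)):
                yield msg.get("Date", ""), msg.get("Subject", "")
    finally:
        box.close()
```

Usage would be something like `for date, subject in search_mbox("debian-devel.mbox", r"\bdpkg\b"): print(date, subject)`, with the file name being whatever your archive is called.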

Fora and discussion tools are significantly less long-lived: Slack/Mattermost/Matrix/Discord?? And that's not to mention various other services built on XMPP that were commercialised or have just vanished. Digital dark ages, anybody?

Anything unthreaded - Heaven forbid. And no, there are still areas where HTML and top-posting haven't caught on: supercomputing, for one, where the Beowulf list remains one of my favourites for focus and technical excellence.

Disadvantage of modern forum/chat software

Posted Jul 20, 2022 19:06 UTC (Wed) by dskoll (subscriber, #1630) (6 responses)

This is an important point. Mailing list messages and IRC logs are really easy for people to archive on their own workstations, and they don't take up a ton of space by modern disk drive standards.

My workplace, for instance, uses Slack, but I use it via an IRC gateway that logs everything. If I want to search for something, it is significantly faster for me to grep the logs on my IRC gateway box than to use Slack's search feature. I can also look for regexes or even do more complicated searches by whipping up a script.
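
The kind of "whipped-up script" meant here might look like this minimal Python sketch. The log line format (`[timestamp] <nick> text`) is an assumption; real IRC gateways and clients each log differently, so `LINE_RE` would need adjusting. The function names are hypothetical.

```python
import re
from pathlib import Path
from typing import Iterable, Iterator, Optional, Tuple

# Assumed (hypothetical) log line format; adjust to your gateway/client:
#   [2022-07-20 19:06] <nick> message text
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\] <(?P<nick>[^>]+)> (?P<text>.*)$")


def grep_log(lines: Iterable[str], pattern: str,
             nick: Optional[str] = None) -> Iterator[Tuple[str, str, str]]:
    """Yield (timestamp, nick, text) for message lines whose text matches
    the case-insensitive regex, optionally restricted to one speaker."""
    regex = re.compile(pattern, re.IGNORECASE)
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if not m:
            continue  # joins, parts, topic changes and other non-messages
        if nick is not None and m["nick"] != nick:
            continue
        if regex.search(m["text"]):
            yield m["ts"], m["nick"], m["text"]


def grep_log_dir(root: str, pattern: str, nick: Optional[str] = None):
    """Search every *.log file under root, yielding (path, ts, nick, text)."""
    for path in sorted(Path(root).rglob("*.log")):
        with path.open(encoding="utf-8", errors="replace") as f:
            for ts, who, text in grep_log(f, pattern, nick):
                yield str(path), ts, who, text
```

Filtering by speaker is exactly the sort of structured query that is awkward with plain `grep` but trivial once the line format is parsed.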

Disadvantage of modern forum/chat software

Posted Jul 20, 2022 20:19 UTC (Wed) by mattdm (subscriber, #18) (5 responses)

Discourse — which is 100% open source, with a support/hosting/consulting SaaS business model rather than the "open core" squeeze play — has a pretty nice open API. If you attach `.json` to the end of a topic URL, you get a pretty comprehensive machine-parsable representation.

It wouldn't be terribly hard to build archiving tools meant to preserve conversations in a way that is useful offline and does not depend on the forum software itself.
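
A minimal sketch of such an archiving tool in Python, standard library only. It assumes the topic JSON carries a `post_stream.posts` list whose entries have `post_number`, `username`, `created_at` and `cooked` (rendered HTML) fields; that matches my understanding of Discourse's output but should be checked against a real instance. For long topics the JSON may include only the first batch of posts, so a complete archiver would have to fetch the rest separately, which is left out here. The function names and User-Agent string are made up.

```python
import json
import urllib.request
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Crude HTML-to-text: keep character data, drop tags."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True, so &amp; becomes &
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).strip()


def fetch_topic(topic_url: str) -> dict:
    """Fetch a topic's JSON representation by appending .json to its URL."""
    req = urllib.request.Request(topic_url.rstrip("/") + ".json",
                                 headers={"User-Agent": "topic-archiver/0.1"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def topic_to_text(topic: dict) -> str:
    """Render a topic as plain text, one block per post, for offline use."""
    out = [topic.get("title", "(untitled)"), ""]
    for post in topic["post_stream"]["posts"]:
        out.append(f"#{post['post_number']} {post['username']} "
                   f"({post['created_at']})")
        out.append(html_to_text(post["cooked"]))
        out.append("")
    return "\n".join(out)
```

Writing `topic_to_text(fetch_topic(url))` to a file gives a greppable copy that no longer depends on the forum software being around.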

Discourse discoverability and archivability

Posted Jul 21, 2022 2:07 UTC (Thu) by michaelkjohnson (subscriber, #41438) (1 responses)

Yes and...

Discourse makes its content easily searchable; non-javascript users get a readable, full-text dump of all the content instead of dynamic scrolling, and that's the view that is presented to search engines and script-blocking users. It's fine!

When I ported lots of Google+ communities into a new Discourse instance, search engines found the content very quickly and gave relevant search results. Google's bot has a relatively light impact on site load compared to the other bots (although Bingbot has improved recently), while still indexing new content quickly.

Besides the ".json" trick, just crawling with a non-browser User-Agent will produce the full, non-incremental-scrolling view, which makes it even easier to create an archive. The sitemap (just add sitemap.xml to the base forum URL for the complete sitemap) would be an easy place to start, other than XML being the standard for sitemaps...
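
Starting from the sitemap could look something like this Python sketch. It assumes `sitemap.xml` follows the standard sitemaps.org schema and may be either a sitemap index pointing at further sitemaps or a plain list of page URLs, so it handles both. The helper names and the User-Agent string are hypothetical.

```python
from __future__ import annotations

import urllib.request
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_data: str | bytes) -> tuple[list[str], list[str]]:
    """Return (child_sitemaps, page_urls) from a sitemap or sitemap index."""
    root = ET.fromstring(xml_data)
    children = [(loc.text or "").strip()
                for loc in root.findall("sm:sitemap/sm:loc", SITEMAP_NS)]
    pages = [(loc.text or "").strip()
             for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]
    return children, pages


def fetch(url: str) -> bytes:
    # A plain, non-browser User-Agent, so the forum serves the
    # full, non-incremental-scrolling view.
    req = urllib.request.Request(url, headers={"User-Agent": "forum-archiver/0.1"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read()


def all_page_urls(base_url: str) -> list[str]:
    """Walk sitemap.xml (following any sitemap index) and collect page URLs."""
    pending = [base_url.rstrip("/") + "/sitemap.xml"]
    seen: set[str] = set()
    pages: list[str] = []
    while pending:
        url = pending.pop()
        if url in seen:
            continue  # guard against an index listing the same sitemap twice
        seen.add(url)
        children, found = parse_sitemap(fetch(url))
        pending.extend(children)
        pages.extend(found)
    return pages
```

Each URL returned by `all_page_urls` could then be fetched with the same `fetch` helper and saved to disk.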

Discourse discoverability and archivability

Posted Jul 21, 2022 12:58 UTC (Thu) by dskoll (subscriber, #1630)

OK, that's cool. I've never used Discourse, but it sounds like decent software.

Disadvantage of modern forum/chat software

Posted Jul 22, 2022 1:24 UTC (Fri) by mm7323 (subscriber, #87386)

Such a tool already exists: it's called ArchiveDiscourse. Unfortunately the output is a bit basic and needs some manual styling effort, but it does create flat HTML pages that can simply be served.

Disadvantage of modern forum/chat software

Posted Jul 22, 2022 5:28 UTC (Fri) by comex (subscriber, #71521) (1 responses)

Or if you're lazy, just subscribe in mailing list mode and archive it along with all the rest of your email. That's what I do.

Discourse's mailing list mode is not perfect – the biggest issue is that it doesn't notify you when someone edits their post – but it's good enough for most purposes.

(I wish there was a standard for editable email, for Discourse and similar software to use. It could just consist of a header that means "this email is a revision to the email with such-and-such Message-ID". Email clients supporting the standard would default to showing only the latest version of each email, but would have an option to show old versions. Old clients would just see all the revisions as separate messages.)

Disadvantage of modern forum/chat software

Posted Jul 23, 2022 14:08 UTC (Sat) by anton (subscriber, #25547)

> (I wish there was a standard for editable email, for Discourse and similar software to use. It could just consist of a header that means "this email is a revision to the email with such-and-such Message-ID".

For Usenet there is "Supersedes: <old-message-id>". However, AFAIK NNTP servers honoring Supersedes: don't give the old message to the user, and I am not aware of any NNTP clients with the functionality you describe.
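
Here is a hypothetical sketch, in Python, of the client-side behaviour comex describes, keyed on the real Supersedes: header: superseded messages are hidden by default but kept as a per-message history, oldest first. The function name and return shape are invented for illustration; the same logic would work for a new email header with the semantics proposed above.

```python
from __future__ import annotations

from email import message_from_string
from email.message import Message


def collapse_superseded(messages: list[Message]) -> list[tuple[Message, list[Message]]]:
    """Return (latest, older_versions) pairs; older_versions is oldest first.

    A message whose Message-ID appears in another message's Supersedes:
    header is hidden from the top level but kept in its successor's history.
    """
    by_id = {m["Message-ID"].strip(): m for m in messages if m["Message-ID"]}
    superseded_by = {}
    for m in messages:
        old = m["Supersedes"]
        if old and m["Message-ID"]:
            superseded_by[old.strip()] = m["Message-ID"].strip()

    results = []
    for m in messages:
        mid = (m["Message-ID"] or "").strip()
        if mid in superseded_by:
            continue  # an older revision; reachable via its successor
        history = []
        seen = {mid}
        prev = (m["Supersedes"] or "").strip()
        # Follow the Supersedes: chain backwards, guarding against loops.
        while prev and prev in by_id and prev not in seen:
            seen.add(prev)
            history.append(by_id[prev])
            prev = (by_id[prev]["Supersedes"] or "").strip()
        history.reverse()
        results.append((m, history))
    return results


if __name__ == "__main__":
    raw = [
        "Message-ID: <a@example>\nSubject: v1\n\nfirst draft\n",
        "Message-ID: <b@example>\nSupersedes: <a@example>\nSubject: v2\n\nfixed typo\n",
    ]
    for latest, history in collapse_superseded([message_from_string(r) for r in raw]):
        print(latest["Subject"], "| older:", [h["Subject"] for h in history])
```

An old client that ignores the header would simply show every revision as its own message, which is the graceful degradation comex asks for.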


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds