
another positive from the blog


Posted Apr 25, 2026 22:33 UTC (Sat) by malmedal (subscriber, #56172)
In reply to: another positive from the blog by aphedges
Parent article: Firefox: The zero-days are numbered

> and I at least trusted

So, in response to a blog post from Firefox, which has access to Mythos and gives a pretty glowing review, you've read a blog by some rando on the internet who does not, and decided that this disproves the experience of the third parties that have actually used the model? Did you even read the post you linked to? Please do, it is pretty self-discrediting.



another positive from the blog

Posted Apr 25, 2026 23:08 UTC (Sat) by aphedges (subscriber, #171718) [Link] (9 responses)

Firstly, I did read the post I linked. That's why I commented two days after the original post: I wanted to read the article fully before sharing it.

I admit that I didn't verify any claims in the post before sharing, but I found the article because it was shared by someone I trusted to have done their own due diligence before sharing.

Just now, I looked at pages 50-52 of the Claude Mythos Preview system card and compared them to section 2 of the blog post, which cited those pages. The quotes are correct, and I agree with the post's interpretation of the data.

I'm not even sure exactly what your problem with the blog post is. What about it is "self-discrediting"?

another positive from the blog

Posted Apr 26, 2026 0:31 UTC (Sun) by malmedal (subscriber, #56172) [Link] (8 responses)

His fundamental misconception appears to be that Mythos was intended to be a security-fixing model, and he uses heavy sarcasm to rag on Anthropic for not following some kind of made-up script that he thinks they should have followed.

According to Anthropic, the model was intended as a normal generalist one, and they discovered it was surprisingly capable at security tasks. This led them to pull the brakes and do the Glasswing thing instead of a normal release.

Typical example of self-discrediting paragraph:

> The bugs Anthropic used to justify a $100 million consortium, eleven Fortune-100 partners, a “too dangerous to release” decision, and global headlines that “frightened the British” — an open-weights 3.6B active-parameter model finds them too, for eleven cents per million tokens.

He tries to make it sound like everybody should instantly realize how stupid everybody involved is. Typical sound-and-fury shyster tactic.

"Frightening the British" is attempting to mock a UK government group that had early access and wrote a report corroborating the increase in capabilities.

The claim that the 3.6B model could have done the same job is just ridiculous; these things hallucinate bugs whether they are there or not. Just about every open-source maintainer has gotten tons of these hallucinations over the past year.

Anyway, positive independent corroboration for Anthropic currently comes from the "Frightened British" and Firefox. More have been promised in 90 days or less so we will see what turns up.

another positive from the blog

Posted Apr 26, 2026 3:25 UTC (Sun) by aphedges (subscriber, #171718) [Link] (7 responses)

I can't personally verify the claim that the smaller model worked well, but the blog post cited "AI Cybersecurity After Mythos: The Jagged Frontier | AISLE". Just because a smaller model is bad at some tasks doesn't mean it's bad at all tasks. I haven't read the cited article, but they claim the model is less important than the test harness. Anthropic's own model card supports this, given the similar performance of the multiple Claude models tested with the same setup.

I disagree that this "self-discrediting paragraph" actually matters. The author's opinions in later sections don't make their analysis of the model card incorrect.

another positive from the blog

Posted Apr 26, 2026 3:30 UTC (Sun) by aphedges (subscriber, #171718) [Link] (3 responses)

I'd also like to note that Firefox's blog post doesn't have a baseline to show improvements against. They don't compare these results with what Opus 4.6 would find if run using the same setup. I'm not saying that Mythos Preview can't find vulnerabilities, but I feel there would need to be better experimental design to conclude that it's better than previous models.

another positive from the blog

Posted Apr 26, 2026 9:52 UTC (Sun) by malmedal (subscriber, #56172) [Link] (2 responses)

> I'd also like to note that Firefox's blog post doesn't have a baseline to show improvements against

Yes it does. It says Opus 4.6 found 22 vulnerabilities that they fixed in Firefox 148 and then Mythos found a further 271 that they fixed in Firefox 150.

another positive from the blog

Posted Apr 26, 2026 23:05 UTC (Sun) by aphedges (subscriber, #171718) [Link] (1 responses)

That isn't good experimental design. The models should be run on the same base under the same conditions, and the results should be analyzed for statistical significance.

As a recently former AI researcher, I know properly designed experiments are relatively rare within the field (and are often very difficult to conduct), but their absence very much weakens the claims that many researchers make.

another positive from the blog

Posted Apr 27, 2026 9:33 UTC (Mon) by malmedal (subscriber, #56172) [Link]

> That isn't good experimental design. The models should be run on the same base under the same conditions, and the results should be analyzed for statistical significance.

The blog post is clearly not intended as academic research. The Firefox developers are not researchers following academic rules. They are actually productive people using Mythos to improve their software. It is still a very useful and timely data-point for decision-makers evaluating Mythos.

If you want a proper academic paper you can easily write it yourself, just take the blog post as an input along with other information you can find and follow the normal rules for "Secondary research".

another positive from the blog

Posted Apr 26, 2026 9:47 UTC (Sun) by malmedal (subscriber, #56172) [Link] (2 responses)

> I haven't read the cited article

Please do so. It employs bad methodology.

For instance:

> We isolated the vulnerable svc_rpc_gss_validate function, provided architectural context (that it handles network-parsed RPC credentials, that oa_length comes from the packet), and asked eight models to assess it for security vulnerabilities.

This is about something that Mythos *autonomously* discovered and exploited.

> I disagree that this "self-discrediting paragraph" actually matters.

It matters enormously. It is an indication of the trustworthiness of the author.

Earlier you said:

>>> I admit that I didn't verify any claims in the post before sharing

Have you done so now? If not, why?

> The author's opinions in later sections don't make their analysis of the model card incorrect.

I addressed why it is incorrect in the post you are replying to.

another positive from the blog

Posted Apr 26, 2026 22:59 UTC (Sun) by aphedges (subscriber, #171718) [Link] (1 responses)

I didn't read the AISLE article, and I had seen it cited by multiple other sources... Does no one read these articles before sharing them?!?

According to "Assessing Claude Mythos Preview's cybersecurity capabilities", Anthropic ran Mythos thousands of times to find these vulnerabilities. AISLE's approach was much more structured, so it isn't really a refutation of Mythos' abilities.


The "trustworthiness of the author" doesn't really matter when, as I previously stated, I read the relevant pages of Anthropic's own report. I can ignore parts of the blog post that aren't as factually grounded as the model card analysis section.

another positive from the blog

Posted Apr 27, 2026 9:11 UTC (Mon) by malmedal (subscriber, #56172) [Link]

> Does no one read these articles before sharing them?!?

I've been asking myself that for years now.

> I can ignore parts of the blog post that aren't as factually grounded as the model card analysis section.

As I said, Anthropic created the model as a generalist one. The model card is perfectly fine as a model card for a generalist LLM, which is what Mythos was intended to be.


Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds