https://swelljoe.com/post/will-it-mythos/: "Poor performer here, only found the one bug that almost every model found, despite its performance on other benchmarks being excellent for its size. […] It also performs poorly in a chat without tools, exhibiting an enthusiasm for hallucination. I’m currently working on a replication of this with full tool access, including bash/Python, which may allow this model to be competitive."
Ornith-1.0: self-improving open-source models for agentic coding
https://github.com/deepreinforce-ai/Ornith-1