Hacker News — vinext + Cloudflare Workers

▲Ornith-1.0: Self-Scaffolding LLMs for Agentic Coding (deep-reinforce.com)

29 points by kordlessagain 19 hours ago | 5 comments

nzach 1 hour ago [-]

Instead of training the model to answer questions directly, we trained the model to always write and execute the code that would solve the question?

If that is the case, isn't this just a fancy way to perform prompt optimization?

SwellJoe 17 hours ago [-]

I added this to a benchmark I've been doing of how well agents find security bugs, specifically security bugs originally found by Mythos. It performs poorly with only read/grep/ls tools, but in a follow-up test with a full shell and Python, it doubled its findings (still a poor showing, but it does at least indicate it is doing what it says on the tin: making tools to help it solve problems). It also did worse than Qwen AgentWorld, another recent post-train of Qwen 3.6 MoE intended for agentic use.

https://swelljoe.com/post/will-it-mythos/

kordlessagain 9 hours ago [-]

Good to know. Thanks for the research!

Balinares 1 hour ago [-]

I'd have expected this to get more HN attention. Qwen 3.6 35B capability in a 9B model is a bonkers claim.

chid 11 minutes ago [-]

I thought so too when I read the headline, but I expect it's basically Qwen3.5-9B.
