The Neuron
😺 Anthropic: AI Is Building AI now

PLUS: TSMC's supply warning, ChatGPT memory, and Gemma.

Matthew Robinson & Grant Harvey
June 05, 2026

Welcome, humans.

AI companies have spent the last two years asking enterprise buyers to believe in productivity based on demos, screenshots, and token bills.

Well, Cognition is adding AI’s first money-back guarantee:

The company introduced an AI Productivity Guarantee for enterprise Devin customers (if you don’t know, Devin is an AI coding agent).

Here’s the promise: if Devin delivers less engineering value than you pay for, Cognition says it will fund your usage until it does, up to $10M. Cognition says its measurement system checks whether Devin’s work was useful, then estimates how long a human engineer would have taken to do the same job… or your money back!

That matters because corporate AI to date has been graded by the weirdest report card imaginable: number of tokens burned, or messages sent, or seats assigned, and charts that mostly prove someone opened the app.

Cognition is asking the expensive question: did using AI actually save time? Did it actually create value? If not, we’ll keep rolling the dough until it does.

Somewhere, a CFO just whispered “finally” into next quarter’s financial projections.

Here’s what happened in AI today:

🙀 Anthropic says Claude now writes 80% of its code
📰 TSMC warned AI chip demand will outstrip supply for years
📰 OpenAI upgraded ChatGPT memory with reviewable summaries
🍪 Google brought Gemma 4 12B to laptop apps
💡 Ethan Mollick argued humans must negotiate with agents

Hey: Want to reach 700,000+ AI-hungry readers? Advertise with us!

P.S: Love robots? We’re starting a new robotics newsletter! Sign up early here.

🙀 Anthropic says Claude is already helping build Claude.

Closing the loop, aka the machines are starting to build themselves

Most people use AI to write emails, fix spreadsheets, or debug the kind of code that makes them briefly reconsider their career.

Anthropic is using Claude on a more consequential loop: building AI itself.

The company published a new Anthropic Institute post arguing that Claude is already accelerating AI development, and may be an early step toward recursive self-improvement, which means AI systems helping design and build more capable future AI systems.

Here’s what happened:

Anthropic says more than 80% of production code merged into its codebase in May 2026 was authored by Claude.
The average Anthropic engineer now merges 8x as much code per day as they did in 2024.
On Anthropic’s most open-ended coding tasks, Claude’s success rate reached 76%, up 50 percentage points in six months.
In one internal research test, Claude Mythos Preview sped up model-training code by ~52x, compared with ~3x for Claude Opus 4 in May 2025.
In selected research sessions where humans took a wrong turn, Mythos suggested the better next step 64% of the time.

Why this matters: Anthropic’s argument is careful. The loop remains incomplete, and the company says recursive self-improvement is not inevitable.

But the evidence shows a clear change inside frontier AI labs. AI development has two big parts: execution and judgment. Execution is writing code, running experiments, fixing bugs, and testing results. Judgment is deciding which problems matter, which results to trust, and when an idea is a dead end.

Claude is getting much better at the execution layer. Anthropic says it can now write production code, debug live incidents, run experiments toward a fixed goal, and review human code for bugs before it ships.

Humans still have the lead on direction-setting. That is the important boundary. If models keep improving at execution, each researcher can steer more work. If models also improve at research judgment, the feedback loop gets much tighter.

Anthropic lays out three possible futures: progress stalls, AI labs keep compounding efficiency gains while humans stay in charge, or AI systems become capable of fully building their successors.

Our take: The precise claim here is more interesting than “Claude wrote itself.”

Anthropic is showing the human role narrowing in real time. First, humans typed the code. Then they directed and reviewed it. Next, they may spend most of their time choosing goals, checking outputs, and deciding which machine-run experiments deserve trust.

That still leaves humans in the loop. It also makes the loop smaller.

FROM OUR PARTNERS

The Neuron Exclusive: Invest in High-Potential AI Startups Like These

The Neuron and Alumni Ventures are giving readers early access to high-growth startup opportunities, including some of today’s most exciting AI, Deep Tech, Quantum Computing, and Cybersecurity companies co-invested alongside top VC firms like Andreessen Horowitz (a16z), Bessemer, & Y Combinator.

You get:

Curated deal flow of high-potential AI First startups
AV is already investing alongside elite lead venture firms in these deals
No cost to see deals
No obligation to invest

Don’t miss your chance before access closes.

→ Join Alumni Ventures AI First Syndicate Today

🎓 AI Skill of the Day: Make AI Show Its Work Receipt

Using AI more often is easy. Proving it helped is the hard part.

Today’s skill is to make your chatbot produce a “work receipt” after any important task. The goal is simple: measure finished output, review required, time saved, and risk. That keeps you from confusing activity with value, which is the same trap Cognition is trying to solve with its Devin guarantee.

After a project, paste this prompt into ChatGPT, Claude, or Gemini. Then compare the AI’s claims against what you actually shipped.

Review the work we just completed and create an AI work receipt.

Include:
1. Finished output: What was actually completed?
2. Human baseline: How long would this likely take me manually?
3. AI-assisted time: How long did this take with you?
4. Review required: What did I still need to check, rewrite, or fix?
5. Risk: What could be wrong, incomplete, or misleading?
6. Final value estimate: Was this a small assist, a major time saver, or not worth using AI for?

Be conservative. Do not count drafts, ideas, or unused output as completed work.

The key line is “be conservative.” AI is great at sounding productive. The receipt makes it prove the work survived contact with reality.
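
Want to keep score across several receipts? Here is a minimal Python sketch of the math behind the prompt. Everything in it is our own assumption for illustration: the WorkReceipt class, its field names, the sample numbers, and the "saved at least half the baseline" cutoff for a major time saver. It is one conservative way to tally a receipt, not a standard formula and not how Cognition measures Devin.

```python
from dataclasses import dataclass


@dataclass
class WorkReceipt:
    """One AI-assisted task. All names here are hypothetical, for illustration."""
    task: str
    human_baseline_min: float  # your estimate of doing it manually
    ai_assisted_min: float     # time spent prompting and waiting
    review_min: float          # time spent checking, rewriting, fixing
    shipped: bool              # did the output actually get used?

    def net_minutes_saved(self) -> float:
        # Conservative rule from the prompt: unused output counts as zero
        # value, so any time spent on it is a pure loss.
        baseline = self.human_baseline_min if self.shipped else 0.0
        return baseline - (self.ai_assisted_min + self.review_min)

    def verdict(self) -> str:
        saved = self.net_minutes_saved()
        if saved <= 0:
            return "not worth using AI for"
        # Assumed cutoff: saving at least half the baseline is a major win.
        if saved >= 0.5 * self.human_baseline_min:
            return "major time saver"
        return "small assist"


# Made-up sample receipts (minutes), not real measurements.
receipts = [
    WorkReceipt("quarterly summary draft", 90, 10, 20, shipped=True),
    WorkReceipt("slide outline", 30, 5, 20, shipped=True),
    WorkReceipt("blog ideas", 45, 15, 5, shipped=False),
]

for r in receipts:
    print(f"{r.task}: {r.net_minutes_saved():+.0f} min, {r.verdict()}")
```

Notice that review time counts against the AI, and the unused "blog ideas" session scores as a loss even though it felt productive. That is the whole point of the receipt.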

Total AI beginner? Start here (goes with this video).

Have a specific skill you want to learn? Request it here.

🍪 Treats to Try

*Asterisk = from our partners (only the first one!). Advertise to 700K+ readers here!

*Move beyond chatbots. Join Google Cloud’s Startup School (June 9-18) to build production-ready, autonomous multimodal AI agents. Save your spot
Google AI Edge runs Gemma 4 12B locally on your laptop so you can analyze data, generate scripts, and build on-device workflows without sending data to the cloud - free to try.
Raindrop 2.0 monitors production agents, detects silent failures, traces what went wrong, and verifies whether your fix worked on live traffic - no pricing details.
Locally puts LM Studio’s local models on your iPhone or iPad through an end-to-end encrypted link to your desktop setup - no pricing details.
Stemdeck splits songs into vocals, drums, bass, guitar, and piano stems from a YouTube link or MP3, then lets you edit them locally in a multitrack waveform view - free to try.
Tasklet for Teams turns personal agent workflows into shared company infrastructure with team workspaces, shared tools, shared knowledge, shared agents, and spend controls - no pricing details.
Higgsfield MCP helps you build a company from inside Claude by generating brand identity, app screens, motion videos, founder posts, ads, and viral creative tests - no pricing details.
Spiral 4.0 learns your writing style from examples and gives teams, agents, CLI, API, and MCP access to that voice system - starts at $15/mo.

NEW Podcast: Tudor Achim says the bigger shift is that AI may finally be able to prove what it says.

Did you know we have a podcast (The Neuron: AI Explained) where we talk to fascinating people in the industry who teach us how it actually works? Check it out:

New episodes air every week on: Spotify | Apple Podcasts | YouTube

📰 Around the Horn

TSMC CEO C.C. Wei told shareholders it will take a long time to meet AI-fueled chip demand, while saying the company plans to keep prices stable.
OpenAI rolled out a new ChatGPT memory system that updates useful context over time and gives users a reviewable summary to steer what gets remembered.
NVIDIA released Nemotron 3 Ultra, a 550B open model built for long-running agents with 1M context, faster inference, and lower complex-task costs. Corey got early access; read his story on it here.
Supabase raised $500M at a $10.5B valuation as vibe-coding demand pushed its open-source database deeper into agent infrastructure.
U.K. regulators required Google to give publishers an opt-out tool for generative AI search features, with U.K. testing first and global rollout planned.
1X launched a World Model Lab for humanoid robots, arguing general-purpose robots need dedicated world models instead of fine-tuning alone.

FROM OUR PARTNERS

Build for global growth with language datasets that help you go to new markets faster. Mozilla Data Collective offers 600+ documented datasets across 300+ languages, helping companies reach new customers and strengthen multilingual AI capabilities with consented, traceable datasets.

Browse and download free datasets

💡 Intelligent Insights:

Co-Existence and the End of Co-Intelligence (Ethan Mollick) argues that humans now need to decide when to hand work to agents, when to refuse help, and when to keep judgment in charge.
To Boldly Go: The Case for Space Datacenters (SemiAnalysis) argues that orbital compute only starts making sense under severe earthbound power and chip constraints.
A Compute Tax Is a REALLY Dumb Idea (Brian Albrecht) argues that taxing compute targets a manipulable input while punishing capital and intermediate goods.
Games between Programs (Stephen Wolfram) explores why competition between programs can reward simple hacks, opaque strategies, and outcomes that are hard to predict.
GSM-Symbolic (Apple researchers) shows how adding irrelevant details to grade-school math problems can sharply reduce model performance, a reminder that fluent reasoning can still be brittle.
No, Artificial Intelligence Is Not Conscious (The Atlantic) argues that treating current AI systems as conscious creates moral confusion before the evidence is there.

That’s all for now.

P.S: Before you go… have you subscribed to our YouTube Channel? If not, can you?

P.P.S: Love the newsletter, but only want to get it once per week? Don’t unsubscribe—update your preferences here.