
😺 Anthropic: AI Is Building AI Now

PLUS: TSMC's supply warning, ChatGPT memory, and Gemma.

AI companies have spent the last two years asking enterprise buyers to believe in productivity based on demos, screenshots, and token bills. Cognition is adding a receipt.

The company introduced an AI Productivity Guarantee for enterprise Devin customers: if Devin delivers less engineering value than you pay for, Cognition says it will fund your usage until it does, up to $10M. Cognition says its measurement system checks whether Devin’s work was useful, then estimates how long a human engineer would have taken to do the same job.

That matters because corporate AI has been graded by the weirdest report card imaginable: tokens burned, messages sent, seats assigned, and charts that mostly prove someone opened the app. Cognition is asking the expensive question: did this save time? Somewhere, a CFO just whispered “finally” into a spreadsheet.

Here’s what happened in AI today:

  • 🙀 Anthropic says Claude now writes more than 80% of its production code

  • 📰 TSMC warned AI chip demand will outstrip supply for years

  • 📰 OpenAI upgraded ChatGPT memory with reviewable summaries

  • 🍪 Google brought Gemma 4 12B to laptop apps

  • 💡 Ethan Mollick argued humans must negotiate with agents

🙀 Anthropic says Claude is already helping build Claude.

Closing the loop, aka the machines are starting to build themselves

Most people use AI to write emails, fix spreadsheets, or debug the kind of code that makes them briefly reconsider their career.

Anthropic is using Claude on a more consequential loop: building AI itself.

The company published a new Anthropic Institute post arguing that Claude is already accelerating AI development and may be an early step toward recursive self-improvement, in which AI systems help design and build more capable future AI systems.

Here’s what happened:

  • Anthropic says more than 80% of production code merged into its codebase in May 2026 was authored by Claude.

  • The average Anthropic engineer now merges 8x as much code per day as they did in 2024.

  • On Anthropic’s most open-ended coding tasks, Claude’s success rate reached 76%, up 50 percentage points in six months.

  • In one internal research test, Claude Mythos Preview sped up model-training code by ~52x, compared with ~3x for Claude Opus 4 in May 2025.

  • In selected research sessions where humans took a wrong turn, Mythos suggested the better next step 64% of the time.

Why this matters:

Anthropic’s argument is careful. The loop remains incomplete, and the company says recursive self-improvement is not inevitable.

But the evidence shows a clear change inside frontier AI labs. AI development has two big parts: execution and judgment. Execution is writing code, running experiments, fixing bugs, and testing results. Judgment is deciding which problems matter, which results to trust, and when an idea is a dead end.

Claude is getting much better at the execution layer. Anthropic says it can now write production code, debug live incidents, run experiments toward a fixed goal, and review human code for bugs before it ships.

Humans still have the lead on direction-setting. That is the important boundary. If models keep improving at execution, each researcher can steer more work. If models also improve at research judgment, the feedback loop gets much tighter.

Anthropic lays out three possible futures: progress stalls, AI labs keep compounding efficiency gains while humans stay in charge, or AI systems become capable of fully building their successors.

Our take:

The precise claim here is more interesting than “Claude wrote itself.”

Anthropic is showing the human role narrowing in real time. First, humans typed the code. Then they directed and reviewed it. Next, they may spend most of their time choosing goals, checking outputs, and deciding which machine-run experiments deserve trust.

That still leaves humans in the loop. It also makes the loop smaller.

Using AI more often is easy. Proving it helped is the hard part.

Today’s skill is to make your chatbot produce a “work receipt” after any important task. The goal is simple: measure finished output, review required, time saved, and risk. That keeps you from confusing activity with value, the same trap Cognition is targeting with its Devin guarantee.

After a project, paste this prompt into ChatGPT, Claude, or Gemini. Then compare the AI’s claims against what you actually shipped.

Review the work we just completed and create an AI work receipt.

Include:
1. Finished output: What was actually completed?
2. Human baseline: How long would this likely take me manually?
3. AI-assisted time: How long did this take with you?
4. Review required: What did I still need to check, rewrite, or fix?
5. Risk: What could be wrong, incomplete, or misleading?
6. Final value estimate: Was this a small assist, a major time saver, or not worth using AI for?

Be conservative. Do not count drafts, ideas, or unused output as completed work.

The key line is “be conservative.” AI is great at sounding productive. The receipt makes it prove the work survived contact with reality.
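If you want to track receipts across several tasks, the arithmetic behind one can be sketched in a few lines of Python. This is a rough illustration and not a standard metric: the `WorkReceipt` class, its field names, and the "half the baseline" threshold for a major time saver are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass
class WorkReceipt:
    """One AI-assisted task, all times in minutes (hypothetical fields)."""
    task: str
    human_baseline_min: float  # your estimate of doing it manually
    ai_assisted_min: float     # time spent prompting and waiting
    review_min: float          # time spent checking, rewriting, fixing
    shipped: bool              # did the output actually get used?

    def net_minutes_saved(self) -> float:
        # Conservative rule from the prompt: unused output is not
        # completed work, so time spent on it counts as a pure loss.
        spent = self.ai_assisted_min + self.review_min
        if not self.shipped:
            return -spent
        return self.human_baseline_min - spent

    def verdict(self) -> str:
        saved = self.net_minutes_saved()
        if saved <= 0:
            return "not worth using AI for"
        # Assumed threshold: saving at least half the baseline is "major".
        if saved >= 0.5 * self.human_baseline_min:
            return "major time saver"
        return "small assist"


# Example: a 2-hour task done in 15 minutes of AI time plus 30 of review.
receipt = WorkReceipt("quarterly summary", 120, 15, 30, shipped=True)
print(receipt.net_minutes_saved(), receipt.verdict())
```

With those example numbers, the receipt shows 75 minutes saved. Set `shipped=False` and the same task becomes a 45-minute loss, which is the point of counting review time and unused output.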

📰 Around the Horn

  • TSMC CEO C.C. Wei told shareholders it will take a long time to meet AI-fueled chip demand, while saying the company plans to keep prices stable.

  • OpenAI rolled out a new ChatGPT memory system that updates useful context over time and gives users a reviewable summary to steer what gets remembered.

  • NVIDIA released Nemotron 3 Ultra, a 550B open model built for long-running agents with 1M context, faster inference, and lower complex-task costs. Corey got early access and wrote a story on it.

  • Supabase raised $500M at a $10.5B valuation as vibe-coding demand pushed its open-source database deeper into agent infrastructure.

  • U.K. regulators required Google to give publishers an opt-out tool for generative AI search features, with U.K. testing first and global rollout planned.

  • 1X launched a World Model Lab for humanoid robots, arguing general-purpose robots need dedicated world models instead of fine-tuning alone.

💡 Intelligent Insights:

  • Co-Existence and the End of Co-Intelligence (Ethan Mollick) argues that humans now need to decide when to hand work to agents, when to refuse help, and when to keep judgment in charge.

  • To Boldly Go: The Case for Space Datacenters (SemiAnalysis) argues that orbital compute only starts making sense under severe earthbound power and chip constraints.

  • A Compute Tax Is a REALLY Dumb Idea (Brian Albrecht) argues that taxing compute targets a manipulable input while punishing capital and intermediate goods.

  • Games between Programs (Stephen Wolfram) explores why competition between programs can reward simple hacks, opaque strategies, and outcomes that are hard to predict.

  • GSM-Symbolic (Apple researchers) shows how adding irrelevant details to grade-school math problems can sharply reduce model performance, a reminder that fluent reasoning can still be brittle.

  • No, Artificial Intelligence Is Not Conscious (The Atlantic) argues that treating current AI systems as conscious creates moral confusion before the evidence is there.

That’s all for now.
