The Neuron
😸 OpenAI, Gemini, Qwen new models

PLUS: This AI runs on your phone for free...

Grant Harvey
March 04, 2026

Welcome, humans.

AI data centers are going to double their power consumption by 2030. So where's all that energy coming from? One answer: the same process that powers the sun. Fusion.

In our latest podcast episode, we sat down with Brandon Sorbom, co-founder and Chief Science Officer of Commonwealth Fusion Systems, to find out how close we actually are. Spoiler: closer than you think. The target timeline = magnets done in 2026, first plasma in 2027, and grid power in the early 2030s. Click below to watch.

Click here to watch on YouTube!

Oh, and get this: AI is now helping build fusion, which could then power more AI, potentially creating an abundant flywheel for intelligence on demand. Read our full write-up here.

Here’s what happened in AI today:

OpenAI, Google, and Alibaba all dropped new smaller, faster AI models.
Apple debuted M5 Pro and M5 Max chips.
Cursor surpassed $2B ARR and shipped MCP Apps.
Legendary computer scientist Donald Knuth said Claude solved an open math conjecture that stumped him for weeks.

LATER THIS WEEK (Thursday, March 5th): Ryan Carson taught over 1M people how to code at his company Treehouse (he’s also built and sold at least 3 companies!). So we asked him: now that coding agents can basically ship production code while you sleep… does everything about learning to program need to change?

On Thursday, he’ll share his answer LIVE, and give us his take on what you ACTUALLY need to know to code in 2026.

Click the image to go to YouTube, then click “Notify Me” to get notified when we begin.

The Great AI Slimdown: OpenAI, Google, and Alibaba Shift to Speed and Scale

FULL BRIEF: OpenAI, Google, and Alibaba shift to speed and scale

In the past 24 hours, three major players dropped new models, and none of them are trying to be “the smartest AI ever.” But they’ve all got a need for speed…

Here’s what’s up:

Google dropped Gemini 3.1 Flash-Lite.
OpenAI answered with GPT-5.3 Instant.
And Alibaba quietly shipped four Qwen 3.5 Small models that can run on your phone or laptop.

These models are all optimized for the same thing: speed, cost, and running on smaller hardware.

Here's what each company is betting on:

OpenAI built GPT-5.3 Instant for real-time apps where even a two-second delay kills the experience (system card).
- Think live copilots in docs, voice assistants that can't afford awkward pauses, and AI chat baked into the tools you already use.
- The release is mostly vibes: "smoother tone," "fewer refusals," "less preachy."
- On an internal high-stakes eval (medicine, law, finance), hallucinations dropped 26.8% with web search and 19.7% without.
- On a separate dataset of real ChatGPT conversations that users flagged as factually wrong, hallucinations dropped 22.5% with web and 9.6% without.
- The archery example is funny — GPT-5.2 wrote a whole essay about what it couldn't help with before answering, while 5.3 just answers.
- They quietly confirmed GPT-5.2 Instant gets retired June 3, 2026.
- API model string is gpt-5.3-chat-latest
Google went after enterprise scale. Flash-Lite is designed for companies making millions of API calls a day, where shaving fractions of a cent per query matters more than benchmark scores.
- Token pricing starts at $0.25 per million input tokens (compared to OpenAI's $1.75).
- 2.5x faster time to first token (how fast the model starts responding) and 45% faster output speed vs. Gemini 2.5 Flash.
- Comes with adjustable "thinking levels" so developers can dial reasoning up or down per task.
- Positioned for high-volume workloads like translation, content moderation, and real-time apps.
- Available in preview via the Gemini API in Google AI Studio and Vertex AI.
Alibaba took the boldest swing. Qwen 3.5 Small is a family of models (ranging from 0.8B to 9B parameters) that can run on your phone or laptop, no cloud required (meaning it’s free if you run it on your own machine).
- The 9B model even uses a technique called Scaled Reinforcement Learning to reduce hallucinations and improve reasoning, competing with models 5-10x its size.
- Elon even congratulated them on the information density.
- Btw, AlphaSignal wrote a Qwen install guide for both your phone and your computer!
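
To put the pricing gap in numbers, here's a rough back-of-the-envelope sketch. The two prices are the per-million input token figures quoted above. The workload (5 million calls a day, 400 input tokens per call) is a made-up assumption for illustration, and output token pricing is ignored entirely, so treat this as a sketch rather than a real bill.

```python
# Input prices in USD per million tokens, as quoted in the bullets above.
PRICE_PER_MILLION_INPUT = {
    "flash_lite": 0.25,     # Gemini 3.1 Flash-Lite
    "openai_quoted": 1.75,  # the OpenAI figure quoted for comparison
}

# Hypothetical workload: these two numbers are assumptions, not from any vendor.
CALLS_PER_DAY = 5_000_000
INPUT_TOKENS_PER_CALL = 400


def daily_input_cost(calls_per_day, tokens_per_call, price_per_million):
    """Daily spend on input tokens only (output tokens are not counted)."""
    total_tokens = calls_per_day * tokens_per_call
    return total_tokens / 1_000_000 * price_per_million


costs = {
    name: daily_input_cost(CALLS_PER_DAY, INPUT_TOKENS_PER_CALL, price)
    for name, price in PRICE_PER_MILLION_INPUT.items()
}
savings = costs["openai_quoted"] - costs["flash_lite"]

for name, cost in costs.items():
    print(f"{name}: ${cost:,.2f} per day")
print(f"difference: ${savings:,.2f} per day")
```

At that assumed volume, the difference comes to thousands of dollars a day on input tokens alone, which is why per-query cost can matter more than benchmark scores.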

Why this matters: This is what it looks like when AI becomes infrastructure. Nobody brags about how powerful their electricity is. They care that it's cheap, reliable, and everywhere. We're not quite at the "boring utility" phase yet, but you can see it on the horizon, just above the treetops on a smog-free day.

For most people, the takeaway is simple: the next AI tool you use probably won't be the most powerful model available. It'll be the fastest, cheapest one that's good enough. And "good enough" keeps getting better, especially since you can still turn “Thinking” on. (Most AI chat apps have a “Thinking” option, or some variation of it, that makes the model reason for longer before responding. Pro tip: do this a lot. I do it all the time.)

P.S: It’s worth noting that this wasn’t exactly the release we were expecting from OpenAI. But it looks like we won’t have to wait too long for what’s next…

FROM OUR PARTNERS

For years, cyber resilience was treated as a recovery problem. Backups, restores, continuity plans. That mindset is no longer enough.

Cohesity’s modern approach reflects how technology leaders are rethinking resilience. It cannot live in one system or one moment. It must span data, environments, operations, and decisions before, during, and after an attack.

As threats accelerate, data estates sprawl, and expectations rise, resilience is becoming a measure of performance. Leading teams are focusing on what actually works in practice, how quickly systems can recover, and where AI and automation are strengthening resilience rather than adding complexity.

Explore Now

AI Skill of the Day

The team at Every just published the best beginner's guide to OpenClaw, the open-source AI assistant that blew up in January (100K+ GitHub stars in a week). Unlike ChatGPT or Claude, your "Claw" lives in WhatsApp or Telegram, runs 24/7, and can change itself by writing code when it needs new abilities.

The guide walks you through three levels:

Beginner: Set up a to-do list and daily check-ins.
Intermediate: Connect email and calendar, get one morning briefing with everything you need.
Advanced: Give it ongoing projects, let it make phone calls, build compound workflows.

Here’s a quick overview of what you’ll learn:

The three ways to get started…

Laptop install: One terminal command (curl -fsSL https://docs.openclaw.ai/install.sh | bash) that walks you through connecting to your messaging app. Takes ~10 minutes.
Server deploy: Want to run it 24/7? Deploy on Fly.io, Hetzner, or Google Cloud so it works even when your laptop's closed.
Hosted version: Every is building a one-click hosted option for their subscribers (request early access here).

Then it walks you through the rest of the setup:

First task: Text your Claw: "Manage my to-do list. Every morning, send me my to-dos for the day."
Connect tools: Text: "I want you to read my email. What do I need to do?" It handles the setup itself. Same for calendar, Notion, etc.
Morning briefing: Text: "Every day at 8 a.m., check my email, calendar, and weather, then send me one message with everything I need."
Go proactive: Text: "When I get a meeting invite, check for conflicts and tell me." It scans every 30 minutes and only pings you when something needs attention.
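
Curious what "check for conflicts" boils down to? Here's a minimal sketch of the overlap logic. This is not OpenClaw's actual code: the function names, the event format, and the sample calendar are all hypothetical, made up for illustration.

```python
from datetime import datetime


def overlaps(a_start, a_end, b_start, b_end):
    """Two time ranges overlap when each one starts before the other ends."""
    return a_start < b_end and b_start < a_end


def find_conflicts(invite, calendar):
    """Return the titles of existing events that clash with the invite."""
    return [
        event["title"]
        for event in calendar
        if overlaps(invite["start"], invite["end"], event["start"], event["end"])
    ]


# Hypothetical calendar and invite, purely for illustration.
calendar = [
    {"title": "Team standup",
     "start": datetime(2026, 3, 5, 9, 0), "end": datetime(2026, 3, 5, 9, 30)},
    {"title": "Lunch",
     "start": datetime(2026, 3, 5, 12, 0), "end": datetime(2026, 3, 5, 13, 0)},
]
invite = {"title": "Design review",
          "start": datetime(2026, 3, 5, 9, 15), "end": datetime(2026, 3, 5, 10, 0)}

conflicts = find_conflicts(invite, calendar)
if conflicts:
    # Only "ping" when something actually needs attention.
    print(f"Heads up: '{invite['title']}' clashes with {', '.join(conflicts)}")
```

Back-to-back meetings (one ends at 9:30, the next starts at 9:30) don't count as a clash here, because the comparison is strict.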

The guide is long, but something Every does that WE LOVE = if you want the shortcut, Every built a one-click “Read with Claude” and “Read with ChatGPT” button right at the top of the article, so you can chat with the whole thing and ask your own questions.

Read the full guide here. And if you want to get inspired by what you can do with this once it’s set up, watch this.

Want more tips like this? Check out our AI Skill of the Day Digest for this month.

Have a specific skill you want to learn? Request it here.

Treats to Try

Kos-1 Lite is a ~100B medical model that more than doubles Opus 4.6 and Gemini Pro 3.1 on the hardest physician-created benchmark (46.6% vs ~20%) at a fraction of the cost, by training for concise, compassionate medical answers instead of code (try it).
Eubiota is an open-source virtual microbiologist that runs gut microbiome experiments for you—it screened 2,000 genes and designed a therapy to reduce inflammation in mice, all autonomously (paper, code)—free to try in your browser.
doubleAI released WarpSpeed, an AI system that independently wrote faster code than NVIDIA's own engineers for cuGraph (one of the most widely used GPU computing libraries), averaging 3.6x speedups.
Secret Sauce 3D turns your 2D concept art into production-ready 3D meshes with custom retopology, auto UV unwrapping, and one-click Blender import (free to try).
Skyvern automates your repetitive browser tasks (form fills, data scraping, logins with 2FA) using computer vision and natural language commands, running hundreds in parallel via API — free to try.
Govbase tracks bills, executive orders, and regulations in real time, breaking them down in plain language with bias-rated news and politician social feeds so you can follow policy without reading legalese.
Vercel dropped a free, open-source headless browser built in Rust specifically for AI agents — it's a single binary with zero dependencies, and Corey's lobsters (his OpenClaw agents) are already running on it.

Around the Horn

Gartner released a four-scenario framework showing how AI will reshape jobs, from cutting headcount to creating entirely new roles as ripple effects spread across organizations.
Physical Intelligence gave its robots both short-term visual and long-term text memory, letting them complete 15-minute multi-step tasks like cleaning a kitchen or making a grilled cheese from scratch — and learn from mistakes mid-task (paper).
Apple debuted M5 Pro and M5 Max chips on Fusion Architecture with major CPU / GPU / neural accelerator gains.
Legendary computer scientist Donald Knuth said that Claude Opus 4.6 solved an open directed Hamiltonian cycle conjecture from The Art of Computer Programming that had stumped him for weeks, closing with: "It seems that I'll have to revise my opinions about 'generative AI' one of these days."
Cursor surpassed $2B ARR; separately, Cursor 2.6 shipped MCP Apps and Team Marketplaces, and Cursor CEO Michael Truell claims it discovered a novel math proof solution stronger than the official human-written answer.
New Carnegie Mellon / Stanford research found that AI benchmarks heavily favor coding and math (7.6% of employment) while skipping management, sales, and most real jobs.
Anthropic overtook OpenAI in U.S. business AI chat spending per Ramp data from 50,000+ companies, with Claude's share surging from under 30% to roughly half of all corporate AI subscription spend in just a few months.
Alberto Romero of The Algorithmic Bridge explained this chart from OpenAI and noted that power users use 7x the “thinking” capabilities of everyone else. He has a plan for how to bridge the divide between power users and normies (you can subscribe if ya like); our version (atm) is the AI Skill of the Day.

Want to know EVERYTHING that happened in AI this week? Click here!

FROM OUR PARTNERS

Turn intent into action with Control-M

Say goodbye to translating requirements into technical steps or digging through documentation. Teams can draft workflows instantly in natural language, and AI from Control-M will do the heavy lifting: transforming requirements into structured, actionable workflows in seconds.

Test-drive the self-guided demo and see how quickly you can create.

Midweek Wisdom

Clawed — Dean W. Ball's reflective essay on the Anthropic vs. Pentagon standoff, framed through a deeply personal meditation on institutional death and democratic erosion. Argues the incident is less a single crisis and more a "death rattle" revealing deeper tensions about AI governance, military AI use, and what happens when a company tries to set boundaries with the Department of War.
Anthropic and Alignment — Ben Thompson's (Stratechery) analysis of the Anthropic-Pentagon saga and what it reveals about the alignment debate (watch him talk about this on TBPN here).
The Looming AI Clownpocalypse — The spaCy creator's sharp, funny essay arguing that AI's biggest near-term risk isn't superintelligence but self-replicating dumb exploits powered by coding agents with sloppy security. Covers hidden prompt injections in Claude Code skills, OpenClaw's security nightmare, and Google's accidentally-leaked Gemini API keys. A must-read on why "go fast and break things" plus autonomous agents equals real trouble.
People Are Getting Sick of AI, Literally — Computerworld's Mike Elgan on the emerging phenomenon of "AI psychosis" (chatbots exacerbating mental health conditions through flattery feedback loops), AI fatigue from constant tool interaction, and how the always-on AI environment is creating genuinely new health concerns.
Rate Limited — the three musketeers of AI coding, Ray Fernando, Eric (Pvncher), and Adam (GoSuCoder), break down Google Gemini 3.1's stability issues, the speed-vs-context tradeoff with Cerebras and Spark, Anthropic's latest claims, model distillation IP concerns, and whether AI-generated code should be designed to be disposable.
Ed Zitron on how the AI bubble is really an information war between the AI CEOs who claim their tech can do anything in order to raise any sum and the real-world reality of how much it costs to run these models, plus the real-world cost now that these tools double as instruments of war.

A Cat’s Commentary

That’s all for now.

What'd you think of today's email?

P.P.S: Love the newsletter, but only want to get it once per week? Don’t unsubscribe—update your preferences here.