• The Neuron
  • Posts
  • 😸 Our HONEST review of GPT 5.4: They should've called it 5.5

😸 Our HONEST review of GPT 5.4: They should've called it 5.5

PLUS: Claude (officially) blacklisted. Meta sued. Tech media sites tanking. What's next?

Here’s what happened in AI today:

  • OpenAI released ChatGPT 5.4.

  • The Pentagon officially labeled Anthropic a supply-chain risk.

  • Meta is getting sued over what its smart glasses workers saw (it's bad).

  • Cloudflare rebuilt Next.js in one week with one engineer and $1,100 in AI tokens.

  • Tech publications lost 58% of their Google traffic since 2024.

Welcome, humans

Anthropic is having the weirdest week.

On the "things going great" side: more than a million people are now signing up for Claude every day, and analysts are calling it the fastest ARR sprint in tech history (~$20B run rate and counting). Both Anthropic and OpenAI are reportedly prepping IPOs this year. The vibes are immaculate.

On the "things going less great" side: the Pentagon officially labeled Anthropic a supply-chain risk, requiring defense partners to certify they don't use Claude.

The reason? Per my last 10 emails… CEO Dario Amodei refused to let the military use its models for mass surveillance of Americans or fully autonomous weapons. Also, maybe they read Regina George Dario Amodei’s burn book, which he then had to apologize for. We stand by our previous stance: Picking a moral fight with the DOW is the top growth hack of 2026 (just don’t try it at home, kids).

Check it out: Here are three of our recent videos we think you’ll love!

AI coding foundations 101

Reasoning Energy Models

Math Superintelligence

Your AI Just Got Promoted From Chatbot to Coworker

Six weeks ago, if you asked serious AI coders what they used, the answer was Claude. It felt like a teammate you could delegate to. Claude was really having a moment.

Well, whatever gap existed between the two just closed. GPT-5.4 dropped yesterday, and it basically took the coding chops from their Codex line and folded them into a general-purpose model that ALSO handles documents, spreadsheets, and computer use. One model that writes your code, navigates your browser, and builds your slide deck. The only question is: why stop at 5.4? Why didn’t they call this thing 5.5?!

The benchmarks:

  • Knowledge work: On GDPval (real professional tasks across 44 occupations), GPT-5.4 matched or beat human professionals 83% of the time, up from 70.9%. Please note that OpenAI does this annoying thing where they release benchmark pages and then never update the leaderboards when new models drop. Just give us a single leaderboard to look at please.

  • Computer use: 75% on OSWorld (humans score 72.4%). It operates a computer better than the average person.

  • Coding: Matches or beats GPT-5.3-Codex on SWE-Bench Pro while running faster.

  • Tool use: State-of-the-art on BrowseComp (82.7%) and Toolathlon (54.6%).

BTW: In most cases, you should assume that all benchmarks were evaluated with reasoning effort set to xhigh, so you should use xhigh too.

The Every team confirmed the benchmarks: developers who were 90% Claude a month ago are now 50/50. One called it the first OpenAI model where planning and coding feel Opus-level. For our part, we also tested GPT-5.4 live, hooking it up to an MCP to make Cat Doom (yes, a Doom clone with cats).

The catch? OpenAI "loosened" the model to be more conversational, so it occasionally lies, leaks prompts into UI elements, and adds features nobody asked for (like GDPR checkboxes on a demo site). Think of it like that brilliant coworker who sometimes goes rogue on the details. Or y’know, this newsletter! Our DND star sign is chaotic neutral.

The timeline obviously went wild with this one. Best highlights:

  • Matt Shumer called it "the best model in the world, by far" and said coding is "essentially solved." His one gripe: frontend design still lags behind Opus 4.6 and Gemini 3.1 Pro.

  • Vals AI: #1 on Vibe Code Bench, ProofBench, and IOI (competitive programming).

  • Mercor: First model to pass 50% on APEX-Agents. A year ago, frontier models scored under 5%.

  • Min Choi shared demos from 5.4’s launch page, like one-shot chess games, flight simulators, theme park sims, and RPGs.

  • Yuchen Jin trolled Pro mode: a "Hi" cost $80 and took 5 minutes. "Do you have AGI-level questions to ask?"

  • Dwayne spotted a GPT-6 mention hidden in the Codex chess demo. GPT-6 before GTA 6 is the current meta.

How to try it: Go to ChatGPT today (GPT-5.4 Thinking for Plus, Team, and Pro users). For developers: $2.50 per million input tokens, which = half the price of Opus. As Josh Kale put it: "It's never been more important to use these tools for leverage rather than let market forces apply that leverage against you." Read the rest in our deep dive above.

FROM OUR PARTNERS

The Headlines Traders Need Before the Bell

Tired of missing the trades that actually move?

In under five minutes, Elite Trade Club delivers the top stories, market-moving headlines, and stocks to watch — before the open.

Join 200K+ traders who start with a plan, not a scroll.

AI Skill of the Day: Get AI to Show Its Work (Then Fix It Before It Starts)

One of GPT-5.4's best new features is "steerable thinking plans," and the technique works across any reasoning model (Claude, Gemini, etc.). Instead of letting AI jump straight to an answer, you ask it to outline its approach first, then you correct course before it does the work.

Here's the move: after giving your prompt, add this line:

"Before you begin, outline your step-by-step plan for completing this task. Wait for my approval or edits before proceeding."

This works especially well for complex tasks like analyzing data, writing reports, or debugging code. You catch wrong assumptions early instead of getting 2,000 words of beautifully wrong output.

Our favorite insight: Like good management 101, the biggest time-saver now is catching mistakes before they happen. Think of it like reviewing a blueprint before construction, not after.

Want more tips like this? Check out our AI Skill of the Day Digest for this month.

Treats to Try

*Asterisk = from our partners (only the first one!). Advertise to 650K+ readers here!

  1. *Outskill is hosting a 2 day LIVE AI Mastermind where you'll build automations, create personalized agents, and learn to turn AI into your ultimate competitive edge. Register here before they run out of seats (free for next 72 hours only).

  2. Cursor launched Automations, always-on agents that trigger automatically on PR merges, Slack messages, GitHub events, or schedules to handle code reviews, bug triage, and codebase maintenance in cloud sandboxes.

  3. Luma launched Uni-1, a unified intelligence model that plans and generates across text, image, video, and audio in one conversation with persistent context, self-critique refinement, and orchestration of other models; early clients like Publicis used it to build full localized ad campaigns in ~40 hours

  4. Willow’s new Teams plan adds shared dictionaries, admin controls, and SOC 2 / HIPAA compliance for its core service (which converts speech into context-aware, auto-formatted text across Slack, Gmail, Cursor, and more on Mac, Windows, and iPhone)—free to start, then $10/month per seat (teams).

  5. Vela automates complex multi-party scheduling across email, SMS, WhatsApp, Slack, and phone by understanding natural language constraints and handling follow-ups automatically—no pricing details.

  6. Domain Maps provides visual cheat sheets of essential terminology across creative fields (AI image generation, UI / UX, motion graphics, game design) so you can prompt AI more precisely—free to try.

Around the Horn

  1. Meta is now being sued over its AI smart glasses after contract workers in Kenya reviewed intimate user footage, including nudity, without proper disclosure.

  2. U.S. officials proposed sweeping new chip export controls that would require large foreign buyers to invest in American data centers or provide security guarantees.

  3. Ten major tech publications lost 58% of their Google organic traffic since 2024 (Digital Trends down 97%, ZDNet 90%) as AI Overviews and chatbots divert searches away from publishers.

  4. Gergely Orosz writes that Cloudflare rebuilt Next.js as a drop-in replacement in one week with one engineer and $1,100 in AI tokens, achieving 94% API compatibility, 4× faster builds, and 57% smaller bundles.

Intelligent Insights

  • Ajeya Cotra at METR writes that AI coding agents are improving so fast she's already blown past her January predictions, and by year's end, the concept of measuring AI by "how long would this take a human" may stop making sense entirely.

  • Aaron Levie argues on the Latent Space podcast that AI agents can't scale in the enterprise until companies build proper infrastructure for agent identities, file permissions, and governance; compelling take (Full episode).

  • Jeremy Howard warns that AI-assisted coding is becoming a "slot machine" where developers accept whatever the model outputs without building real technical intuition (the counter point to our live from yesterday).

  • Vinod Khosla tells Fortune he believes AI will automate 80% of labor, and lays out his vision for free education, free healthcare, and no taxes under $100K.

  • Prompt Engineering walks through how to apply classic three-tier architecture (data, processing, presentation) to build AI agent systems that actually work in production.

A Cat’s Commentary

That’s all for now.

What'd you think of today's email?

Login or Subscribe to participate in polls.

P.P.S: Love the newsletter, but only want to get it once per week? Don’t unsubscribe—update your preferences here.