The Neuron

😺 Watch: AI can do your taxes now

OpenAI's John de Wasseige and Arthur Fernandes Araujo share how to build world-class AI products

Matthew Robinson
July 01, 2026


Welcome, humans.

Most AI agents are still graded like interns: impressive in the demo, scary when the stakes are real.

Tax prep is where that framing gets useful. In our latest podcast episode, Grant and Corey sat down with OpenAI's John de Wasseige and Arthur Fernandes Araujo, who helped build Tax AI with Thrive Holdings.

Tax AI is a Codex-powered agent for complex accounting workflows. It can parse messy PDFs, spreadsheets, images, client notes, tax forms, and source evidence, then prepare returns for accountant review.

The wild part is not that the agent helps with taxes. It is that the improvement loop is built around expert corrections: practitioners review the work, corrections become structured signals, Codex investigates the relevant traces and evals, and engineers review scoped fixes before anything ships.

In plain English: the agent gets better because the experts correct it, and the system turns those corrections into measurable product improvements. That is a much more realistic future for agents than the usual magic demo.

Click to watch on YouTube!

Here are our favorite parts:

(5:23) The eight-hour tax pile: Arthur explains why complicated returns can mean raw PDFs, spreadsheets, handwritten notes, missing client info, and a deadline that does not care.
(8:18) Evidence before trust: John walks through how Tax AI shows the source behind an extracted value, down to the document, page, cell, or reconciled file.
(10:40) The harness is the product: Arthur explains that the self-improving part is the instruction and workflow layer around Codex, not a vague promise that the model magically fixes itself.
(11:11) When corrections become signal: A few accountants overriding the same field can reveal a repeatable edge case that the team can measure, test, and improve.
(13:43) Why the model ceiling moved: New Codex capabilities made orchestration, independent review, and multi-threaded work easier to build than they were just months earlier.
(36:37) Where tax is math, and where it is art: Arthur separates bounded data entry from the strategic, human parts of tax advice that still depend on judgment and client context.
(40:31) Start with evals: For any company building a domain agent, Arthur says the first step is knowing exactly what you can measure.
(44:15) The AI found the human mistake: The team found cases where the model extracted the right value, but the old ground truth was wrong.
(46:42) Review beats retyping: Arthur explains why reviewing a mostly correct return can take far less effort than entering every field yourself.

Why watch this? Because this is one of the clearest examples we have seen of how agents can work inside expert workflows without asking people to blindly trust them. If you use Codex, Claude Code, ChatGPT, or Gemini for real work, the lesson is the same: the loop matters as much as the model.

Watch and/or Listen now: YouTube | Spotify | Apple Podcasts

P.S. Jump to (40:31) if you only have a few minutes. The answer is basically: before you build the agent, build the scoreboard.

Keep scrolling for the practical agent-building lesson from Tax AI, tomorrow's live discussion on AI bans and model access, and four recent videos worth catching up on.

Real quick: Want to see your AI-adjacent product or service show up right here, below these podcast promos? Click the button below to advertise to our 700K+ readers!

How Tax AI Actually Gets Better

The lesson from Tax AI is not "replace the accountant." The lesson is "make expert review useful to the system."

Most failed agent projects break in the same place: the model produces an answer, the human fixes the answer, and the correction disappears into the void. Tax AI turns that correction into product fuel.

The loop looks like this:

1. Start with a bounded task: Tax AI focuses on preparing and mapping return data, not every possible tax judgment.
2. Preserve evidence: The practitioner can see where a value came from before accepting it.
3. Capture corrections: When experts override the same field or document type, the system treats that as a pattern.
4. Turn patterns into evals: The team can test whether a fix improves the edge case without regressing older behavior.
5. Keep engineers in the loop: Scoped fixes still get reviewed before they ship.
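
For the builders in the room, here is a minimal sketch of the "capture corrections" step, assuming corrections are logged as simple records. This is not how Tax AI is implemented: the `Correction` schema, the `find_patterns` helper, the threshold of three, and every value in the sample log are hypothetical and exist only to illustrate the idea.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Correction:
    """One expert override of a value the agent proposed (hypothetical schema)."""
    field_name: str     # which return field was overridden
    doc_type: str       # kind of source document the value came from
    agent_value: str    # what the agent proposed
    expert_value: str   # what the practitioner changed it to
    source: str         # evidence pointer, e.g. a file name plus page


def find_patterns(corrections, min_count=3):
    """Return {(field_name, doc_type): count} for pairs overridden at least min_count times."""
    counts = Counter((c.field_name, c.doc_type) for c in corrections)
    return {key: n for key, n in counts.items() if n >= min_count}


# Made-up review log: three overrides share a field and document type, one does not.
corrections = [
    Correction("cost_basis", "brokerage_statement", "1200", "1250", "acct_a.pdf#page=2"),
    Correction("cost_basis", "brokerage_statement", "800", "845", "acct_b.pdf#page=4"),
    Correction("cost_basis", "brokerage_statement", "310", "300", "acct_c.pdf#page=1"),
    Correction("wages", "wage_statement", "5200", "52000", "wages_d.pdf#page=1"),
]

patterns = find_patterns(corrections)
print(patterns)  # {('cost_basis', 'brokerage_statement'): 3}
```

The single wage override stays below the threshold, so it reads as a one-off. The repeated cost-basis override is the kind of cluster a team could turn into an eval case.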

That is why harness engineering matters. The model is one part. The workflow, traces, review interface, evals, and deployment discipline are what make it usable in a job where "close enough" can get expensive.
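
Deployment discipline can be sketched as a small regression gate: a fix ships only if it solves the targeted case and breaks nothing that used to pass. This is an illustration under loud assumptions, not the team's actual eval setup. The `run_evals` and `review_fix` helpers, the toy "agents" (plain string cleaners), and the two eval cases are all hypothetical.

```python
def run_evals(agent, cases):
    """Return the set of case ids where the agent's output matches the expected value."""
    return {
        case_id
        for case_id, (raw_input, expected) in cases.items()
        if agent(raw_input) == expected
    }


def review_fix(baseline, candidate, cases, target_ids):
    """Compare a candidate against the baseline on the same eval cases."""
    before = run_evals(baseline, cases)
    after = run_evals(candidate, cases)
    fixed = (after - before) & set(target_ids)   # targeted cases that now pass
    regressions = before - after                 # cases that used to pass and now fail
    return {
        "fixed": sorted(fixed),
        "regressions": sorted(regressions),
        "ship": bool(fixed) and not regressions,
    }


# Hypothetical eval cases: case id -> (raw extracted text, expected clean value).
cases = {
    "plain_number": (" 100 ", "100"),
    "comma_number": ("1,000", "1000"),
}

baseline = lambda text: text.strip()                    # current behavior
good_fix = lambda text: text.strip().replace(",", "")   # handles commas, keeps trimming
sloppy_fix = lambda text: text.replace(",", "")         # handles commas, drops trimming

print(review_fix(baseline, good_fix, cases, ["comma_number"]))
# {'fixed': ['comma_number'], 'regressions': [], 'ship': True}
print(review_fix(baseline, sloppy_fix, cases, ["comma_number"]))
# {'fixed': ['comma_number'], 'regressions': ['plain_number'], 'ship': False}
```

The sloppy fix solves the new edge case and still gets blocked, because it breaks a case that used to pass. A human engineer would still review anything the gate lets through.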

🔑 The bottom line: agents get safer when their mistakes are visible, measurable, and useful. The future of expert AI may look less like a chatbot taking over and more like a review system that learns every time the expert says, "Nope, this field goes over here."

Learn more: Tax AI case study | Codex | Harness engineering | Thrive Holdings

🔴 LIVE TOMORROW: The Fallout From AI Bans

Head to YouTube, then click “notify me” to get an alert when the stream starts.

Thursday, July 2. Watch on The Neuron YouTube channel.

Tomorrow, Grant and Corey are going live to unpack the AI stop-start era: Anthropic's Fable 5 relaunch and instant takedown, OpenAI's GPT-5.6 limited rollout, and the bigger question underneath both stories.

What happens when governments can slow, restrict, or pause frontier AI systems right as companies, developers, and the economy start depending on them?

We will talk through how access restrictions could reshape the U.S. vs. China AI race, why every company needs an open-source backup strategy, and how AI uncertainty could ripple into hiring, budgets, product roadmaps, and the broader economy.

No financial advice, no panic theater. Just the strategic read on what is changing, what is fragile, and how to make smarter decisions while model access gets rocky.

Click to watch live: The Neuron on YouTube

🎙️ In Case You Missed It…

Four recent videos worth checking out next:

1. Want IT that fixes problems before employees complain? Watch: HP Built an AI That Fixes Your Computer Before It Breaks

TL;DW: Larry Meadows from HP shows how Workforce Experience Platform uses AI to predict device problems, recommend fixes, and help IT teams avoid unnecessary refreshes.

Why you should watch: If your company is trying to cut IT costs while adding more AI tools, this episode explains why memory, device health, and employee experience are now the same conversation.

YouTube: Watch Here
Spotify: Listen Here
Apple Podcasts: Listen Here

This is the practical enterprise AI episode: fewer dashboards, fewer mystery crashes, and fewer expensive guesses.

2. Worried AI still cannot really see? Watch: AI Still Sees Like a Toddler

TL;DW: Andrew Dai, co-founder and CEO of Elorian, explains why today's AI can describe images but still struggles with visual reasoning, from diagrams and tangled cords to floor plans and robots.

Why you should watch: If text agents feel powerful but visual agents still feel strangely brittle, this is the missing explanation. Better AI vision could change engineering, robotics, satellite analysis, and product design.

YouTube: Watch Here
Spotify: Listen Here
Apple Podcasts: Listen Here

It is a clean reset on why "multimodal" does not automatically mean "understands the world."

3. Not sure whether to use Skills, Projects, GPTs, or Agents? Watch: AI Skills vs Agents vs GPTs

TL;DW: Grant and Corey break down the confusing assistant stack: Projects for ongoing work, Custom GPTs and Gems for reusable assistants, Skills for repeatable workflows, and Agents for systems that can take actions.

Why you should watch: If the product names are starting to blur together, this gives you a simple decision tree. The most useful rule: if you do something more than twice, make it a Skill.

YouTube: Watch Here
Companion guide: Read Here

This one is especially good for forwarding to the person who keeps asking, "Wait, is this an agent or a GPT?"

4. Want to turn a messy spreadsheet into software? Watch: We Turned a Spreadsheet Into a Business App

TL;DW: Corey and Grant test Pave by QuickBase by turning a messy spreadsheet into a lightweight CRM and project tracker.

Why you should watch: This is a useful benchmark for AI app builders: can the tool understand messy starting data, build tables, add dashboards, support roles, and publish something usable without turning it into a full engineering project?

YouTube: Watch Here

Spotify and Apple links were not included for this one, so we kept it as a YouTube-only video.

One more before you go:

If this Tax AI episode clicked for you, subscribe to The Neuron on YouTube. We are trying to do more conversations we think you’ll love, with the builders, researchers, and operators turning AI from demos into actual workflows.

We have a goal to hit 50K subscribers by the end of the year (if not 100K), and we are less than 30K away. If you like learning about AI and already watch some of our videos, do us a favor and click here to subscribe today.

Stay curious,

The Neuron Team

That is all for today. For more AI treats, check out our website.


P.P.S. Love the newsletter, but do not want to receive these podcast announcement emails? Do not unsubscribe. Adjust your preferences here to opt out of them instead.