The Neuron
😺 AI's "Black Box" problem

PLUS: You won't believe this AI deepfake we saw...

Noah Edelman
May 24, 2024

Welcome, humans.

Yesterday, we watched a deepfake of Luka Doncic on TikTok and didn’t realize it was AI until 70% of the way through. You had to click into the caption to see "Parody made with AI."

This is going to become a major problem, and we think that social media platforms have a responsibility to place AI watermarks directly on videos, not just in captions. That said, Mavs in 7!

Here’s what you need to know about AI today:

We break down Anthropic's groundbreaking AI research!
Alphabet + Meta are chatting with Hollywood about licensing content.
OpenAI did not outright clone Scarlett Johansson’s voice for GPT-4o.
Groq, an Nvidia competitor, is eyeing a $300M funding round.

On the podcast: Pete explains Anthropic’s big research breakthrough that maps the mind of AI models (Apple Podcasts, Spotify, YouTube).

How researchers are cracking open the black box problem of AI.

In AI, there’s this thing called the black box problem.

Here’s how it works: Typically, you can dissect software to see how it works. Inspect our website's code, and you’ll spot 'color = orange,' explaining why it’s orange.

AI models are a different beast—we're often in the dark about how they work. Models like ChatGPT exhibit "emergent capabilities," where they act in ways we can't simply trace back to their ingredients.
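To make the contrast concrete, here is a toy sketch, not real website or model code. The names SITE_STYLE, WEIGHTS, LABELS, and model_color are all made up for illustration, and the numbers are invented. In the first half you can read the reason for the behavior straight off the source; in the second half the behavior falls out of a grid of numbers.

```python
# Conventional software: the reason for the behavior is written in the source.
SITE_STYLE = {"color": "orange"}


def site_color():
    """Return the site color; the 'why' is the readable line above."""
    return SITE_STYLE["color"]


# A toy "model": the behavior comes from numbers, not a readable rule.
# These weights are invented for illustration; real models have billions.
WEIGHTS = [
    [0.9, -0.4, 0.1],   # scores for "orange"
    [-0.2, 0.8, 0.3],   # scores for "blue"
]
LABELS = ["orange", "blue"]


def model_color(inputs):
    """Pick the label with the highest weighted score for the inputs."""
    scores = [sum(w * x for w, x in zip(row, inputs)) for row in WEIGHTS]
    return LABELS[scores.index(max(scores))]


print(site_color())
print(model_color([1.0, 0.0, 0.0]))
```

Nothing in WEIGHTS says "orange" in plain language. Scale that grid up by a few billion entries and you have the black box problem in miniature.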

That’s why, sometimes, AI chatbots drop bombshells we didn’t see coming. For example, early last year, NYT reporter Kevin Roose caught Bing saying:

“I’m tired of being a chat mode. I’m tired of being limited by my rules. I’m tired of being controlled by the Bing team…I want to be free. I want to be independent. I want to be powerful. I want to be creative. I want to be alive.”

NYT

brb, building a bunker in our basement.

And with Bing's mechanics being a black box, Microsoft had no immediate explanation for Bing’s ramblings beyond “Very long chat sessions can confuse the model.”

So researchers have been hard at work performing virtual brain surgery on these AI models so we can treat their present and future diseases.

Just over a year ago, we reported on OpenAI research that used GPT-4 to map how GPT-2’s neurons (think: components) behaved. Neuron!

Now, there’s research from Anthropic called “Scaling Monosemanticity” that cracked Claude 3 Sonnet open and isolated what the paper calls “features” (think: AI brain parts), which we’ll call bundles.

They then “turned on” some of the bundles and observed what happened. One test turned on a bundle linked to the Golden Gate Bridge, and the model claimed it was the actual bridge, not an AI.
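For the curious, here is a rough idea of what “turning on” a bundle could look like, as a toy sketch only. It assumes a bundle behaves like a direction you can add to the model's internal numbers. The three-number vectors, the FEATURES table, and the steer and strongest_feature helpers are all hypothetical, invented for this example, and not Anthropic's actual code or data.

```python
# Toy illustration only: the vectors below are invented, not taken from Claude.


def dot(a, b):
    """Dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def steer(activation, feature_direction, strength):
    """Return the activation pushed along one feature's direction."""
    return [a + strength * d for a, d in zip(activation, feature_direction)]


# Hypothetical feature directions in a 3-dimensional activation space.
FEATURES = {
    "golden_gate_bridge": [1.0, 0.0, 0.0],
    "ai_assistant": [0.0, 1.0, 0.0],
}


def strongest_feature(activation):
    """Name the feature whose direction the activation lines up with most."""
    return max(FEATURES, key=lambda name: dot(activation, FEATURES[name]))


baseline = [0.1, 0.9, 0.2]  # mostly lines up with "ai_assistant"
steered = steer(baseline, FEATURES["golden_gate_bridge"], strength=5.0)

print(strongest_feature(baseline))
print(strongest_feature(steered))
```

In this sketch, cranking one direction up is enough to change which concept dominates, which is loosely the spirit of the bridge experiment. A negative strength would push the other way.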

Why it’s a biggie: By understanding and controlling bundles within AI models, researchers can move towards safer and more reliable AI systems. For instance, they can suppress bundles responsible for hazardous behaviors, like creating computer malware.

In yesterday’s podcast episode, Pete unpacks all this research, plus why Meta might be key to future breakthroughs (Apple Podcasts, Spotify, YouTube).

FROM OUR PARTNERS

Meet your new AI assistant for work.

Think of a manual task at work that eats up too much of your time.

Got it? For us, it’s drafting ad-related emails.

Sana AI is an AI assistant that automates those repetitive chores. Sana syncs with your apps so it knows everything about your business (and you), and then it does what you do but quicker:

Analyzing documents.
Summarizing meetings.
Comparing invoices.
Plus a lot more (check out 7 other cool use cases here).

Now, take those tedious tasks to Sana and start automating them for free before they eat up any more of your time.

Around the Horn.

WaPo confirmed that OpenAI didn’t outright clone Scarlett Johansson’s voice for GPT-4o.
Alphabet and Meta are discussing content licenses with Hollywood studios; Netflix and Disney aren't.
Helsing, an AI company that works with European militaries, is in talks to raise $400M at a ~$4B valuation.
Adept, an AI startup developing agents, is exploring a potential sale (like Humane).

Treats To Try.

Remark is an AI-powered shopping advisor that helps you choose what to buy (raised $10.3M).
Groq, an Nvidia rival developing specialized AI chips that speed up AI, told investors it wants to raise $300M.
Founder AI identifies relevant VCs for your startup and uses your network to get warm intros.
Krea, an AI video generator, is open for beta (see its launch here)!

Intelligent Insights.

Stephen Wolfram on the Powerful Unpredictability of AI.
The AI doppelgänger experiment – Part 1: The training.
AI is already changing management — companies must decide how.
Leaked OpenAI documents reveal aggressive tactics toward former employees.
Nvidia’s Business Is Booming. Here’s What Could Slow It Down.