The Neuron
😺Claude spills its own secret

PLUS: A Chinese robot that can BMX down mountains...

Grant Harvey
November 13, 2024

Welcome, humans.

Heads up! We need more submissions for #Where Do You Neuron (if you want to keep this part of the newsletter alive, that is!). Here are the rules to get featured:

Share your monitors / phones with The Neuron on them in a unique location.
If you include your face, we’ll protect your identity.
Cats = heavily encouraged.
Dogs = case by case basis.

Sound fun? Submit here for a chance to be featured!

Now, here’s a robot that LITERALLY can BMX bike down mountainsides:

Here’s what you need to know about AI today:

We break down how Claude’s AI guardrails work.
Apple plans AI-powered home display for March launch.
Microsoft shared 200+ real life business AI case studies.
Slack survey: 17K workers report cooling AI adoption.

Claude's brain just showed us how it really works…

Ever say hello to a friend, and then all of a sudden, they start lecturing you about copyright law? No? Well, that's exactly what happened this week when a Reddit user greeted the chatbot Claude:

Instead of just saying “YO!” back, like a normal person, Claude went on a hallucinatory rant. This wasn't a bug, though—it was a sophisticated safety system accidentally showing its cards.

Here's what's happening under the hood:

When you type a message to Claude, it doesn't read your text like humans do—it breaks your message into “tokens” (smaller pieces of text) and analyzes them for patterns.
Sometimes, Claude spots patterns that could lead to trouble—even if they seem totally innocent to us.
These patterns trigger a pre-filter alert system that checks messages before they even reach Claude's main language model.
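
For a rough feel of the steps above, here's a toy Python sketch. Everything in it is an assumption made for illustration: the simple word-level tokenizer, the `RISKY_PATTERNS` list, and the `prefilter` function are hypothetical, and none of it reflects how Anthropic's real tokenizer or safety checks are built. It mostly shows why a pattern check can flag a message that looks innocent to a human.

```python
import re

# Hypothetical trigger phrases; the real system's signals are not public.
RISKY_PATTERNS = [
    ("lyrics", "copyright"),
    ("full text of", "copyright"),
]

def tokenize(message: str) -> list[str]:
    """Toy tokenizer: lowercase words and punctuation marks.
    Real models use learned subword tokenizers instead."""
    return re.findall(r"\w+|[^\w\s]", message.lower())

def prefilter(message: str) -> set[str]:
    """Return the set of flags raised by the toy pattern check."""
    normalized = " ".join(tokenize(message))
    return {flag for phrase, flag in RISKY_PATTERNS if phrase in normalized}

# A harmless message still gets flagged because it contains a "risky" word.
print(prefilter("I love writing my own lyrics"))
```

A check this crude has no idea that you wrote the lyrics yourself, which is the kind of false positive the Reddit thread ran into.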

That’s because there are actually two types of hidden prompts working together:

System prompts: Always present, these guide Claude's overall behavior.
Injections: Added only when the safety system spots potential issues.
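
One way to picture the difference is the short Python sketch below. It's a hedged illustration, not Anthropic's implementation: `SYSTEM_PROMPT`, `INJECTIONS`, and `build_prompt` are hypothetical names, the bracketed strings are placeholders rather than real prompt text, and where the injection gets placed is an assumption.

```python
SYSTEM_PROMPT = "[system prompt: general behavior guidelines]"  # placeholder text

# Hypothetical injections, keyed by the flag that triggers them.
INJECTIONS = {
    "copyright": "[injection: reminder about reproducing copyrighted material]",
}

def build_prompt(user_message: str, flags: set[str]) -> list[str]:
    """Assemble what the model sees: the system prompt is always included,
    and an injection is appended only for each raised flag that has one."""
    parts = [SYSTEM_PROMPT, user_message]
    parts += [INJECTIONS[flag] for flag in sorted(flags) if flag in INJECTIONS]
    return parts

# No flags: just the system prompt and the message.
print(build_prompt("Yo!", set()))
# A raised flag adds a third, hidden piece of text.
print(build_prompt("Yo!", {"copyright"}))
```

The user only ever typed "Yo!", but in the second case the model is also reading an extra instruction the user never saw.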

…so when the Redditor said “Yo!”, they triggered an injection flagging potential copyright infringement. Does that mean someone owns the rights to “YO!”?? Jealous.

Savvy Redditors investigated this behavior, and they found Claude was actually willing to discuss its own limitations. One user got Claude to analyze its own responses by printing out their conversation history, revealing the hidden prompts that get injected before every response. They even got Claude to critique its own prompt.

Why should YOU care? Because these safety features sometimes conflict with legitimate tasks. Redditors said they’ve gotten increased errors when using Claude for proofreading, and the system can be especially tricky when you need it to make small changes to text (since that might look like trying to dodge copyright).

Here are some pro tips for when your AI is “getting triggered”:

Break a complex task into smaller pieces.
Use specific command patterns to make responses more consistent.
Switch languages to avoid certain English triggers.
Ask Claude to show you what's causing weird responses (like above).
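
If you want to try the first tip on a long proofreading job, here's a minimal Python sketch that splits a document into smaller pieces by paragraph. The `split_into_chunks` helper and the 500-character default are assumptions picked for illustration, not a recommended setting from Anthropic or the Reddit threads.

```python
def split_into_chunks(text: str, max_chars: int = 500) -> list[str]:
    """Group paragraphs into chunks of roughly max_chars characters each.
    A single paragraph longer than max_chars is kept whole."""
    chunks, current = [], ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue  # skip empty paragraphs
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            chunks.append(current)  # current chunk is full, start a new one
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

draft = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph."
for i, chunk in enumerate(split_into_chunks(draft, max_chars=40), start=1):
    print(f"--- chunk {i} ---\n{chunk}")
```

You'd then paste (or send) each chunk as its own request instead of asking for edits to the whole document at once.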

For more, here’s another thread that explains all about Claude’s inner workings.

FROM OUR PARTNERS

Want to Know Where AI-Driven Business Intelligence Is Heading?

Join Zebra AI's free event The New Era of Business Intelligence: AI-Powered Business Decision-Making, on November 27th, 2024, to explore how AI is revolutionizing Business Intelligence.

Their expert panel features:

Nicholas Boucher, AI Finance Club founder, former advisor at Mercedes-Benz, Chanel, and KPMG.
Andrej Lapajne, Zebra BI CEO.
Benjamin Džubur, Zebra AI Team Lead.

At the event, you’ll uncover the truth about the current state of Artificial Intelligence in BI:

How AI can really help your data analysis efforts.
The current limitations of AI in Business Intelligence.
Which AI Business Intelligence tools you need.

Learn to truly empower your teams by making data analysis accessible to everyone, regardless of experience. Network with peers and explore real-world AI-driven BI implementation success stories.

Treats To Try.

*OpenAI, Character.ai, and Anthropic build with Statsig to consolidate performance, latency, and cost metrics in one platform. Say goodbye to scattered data and guessing what works and hello to smarter tracking. Try the same tool trusted by the biggest names in AI - get started today with 2M events free.
Lamatic builds and deploys custom apps using a visual builder, without you writing code, handling everything from the database to hosting.
Check out Google’s Machine Learning Crash Course, a free course to learn AI with interactive exercises and video lectures in ~15 hours.
Writer creates and customizes enterprise content using AI models that understand your company's data and workflows (just raised $200M).
Particle helps you better understand news stories through summaries, explanations, and interactive Q&As—and supposedly drives traffic back to publishers, as some accuse Perplexity of NOT doing (raised $15.3M).
Segwise monitors your gaming app campaigns and alerts you when performance metrics drop below expected levels.
Agree lets you send, sign, and collect payments for any business agreement.

See our top 51 AI Tools for Business here!

*This is sponsored content. Advertise in The Neuron here.

Around the Horn.

Slack released a new survey of 17,000+ office workers: workplace AI adoption is plateauing and enthusiasm is cooling (dropping 6% globally) as workers hide their AI use from managers, fear increased workloads, and lack training (61% have spent < 5 hours learning AI tools).
Microsoft released 210 use-cases of organizations “large and small” and how they use AI in their businesses today.
Apple developed a 6-inch wall-mounted smart display that functions as a home control center with FaceTime, Siri integration, and AI features; it is set to debut next March, with future versions planned to feature a robotic arm.
Want to see what bias in AI output looks like in real life? Check out this data dump of what happens when you ask open-source AI to “imagine a person.”