😸 AI's recency bias problem

PLUS: There's a new top math AI (again)...


Welcome, humans.

It’s mind-blowing how rapidly AI has evolved in just a few short years. Just compare Midjourney's image quality in March 2022 with what it produces today.

We've reached a point where creators are generating full-on sitcom episodes with AI. Sure, the main character is mysteriously recast in almost every scene, but hey, it’s happening!

Some Reddit visionaries have peered into their crystal balls to predict where things go from here. Let's just say their forecast is vivid. And if they’re correct, we’re in for a “wild ride”…

Here’s what you need to know about AI today:

  • AI summarization has a "lost in the middle" problem.

  • Alibaba's new Qwen2-Math model outperformed other top models.

  • YouTube announced it will test new AI tools for video brainstorming.

  • Hugging Face acquired a platform to make AI training easier.

AI researchers found why AI can’t summarize well… and we found out how to improve it.

Have you ever tried to get ChatGPT or Claude to summarize something, and it totally missed the mark?

Often, it’s because AI has a "lost in the middle" problem: AI models tend to focus on the start and end of a document or prompt while overlooking the content in between.

It's kind of like recency bias in humans: we remember the last thing we heard (and often the first) better than everything in between, even if it's not the most important.
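One common way to expose this pattern is a "needle in a haystack" style test. You hide a single key fact at different depths inside filler text and ask the model to retrieve it. The sketch below only builds the test prompts. The function name `build_probe`, the needle, and the filler sentence are our own hypothetical choices. You would send each prompt to whichever chatbot you're testing.

```python
def build_probe(needle: str, filler: str, n_filler: int, depth: float) -> str:
    """Return a document with `needle` inserted at a relative depth.

    depth=0.0 puts the fact at the very start, 0.5 in the middle,
    and 1.0 at the very end.
    """
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    sentences = [filler] * n_filler
    # Convert the relative depth into a sentence index.
    position = round(depth * n_filler)
    sentences.insert(position, needle)
    return " ".join(sentences)


NEEDLE = "The magic number is 7481."  # hypothetical key fact to retrieve
FILLER = "Pension funds invest contributions for the long term."  # hypothetical padding

prompts = {}
for depth in (0.0, 0.25, 0.5, 0.75, 1.0):
    document = build_probe(NEEDLE, FILLER, n_filler=2000, depth=depth)
    prompts[depth] = document + "\n\nQuestion: What is the magic number?"

# Suppose a model answers correctly at depths 0.0 and 1.0 but misses at 0.5.
# That is the "lost in the middle" pattern.
for depth, prompt in prompts.items():
    print(depth, len(prompt.split()), "words")
```

Running the same question at several depths matters more than the exact filler. The position of the fact is the only thing that changes between prompts.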

A recent paper looked into how well big AI models can process and remember large amounts of text.

The results showed that even AI models designed to remember a lot of information struggle.

It turns out that claims of models processing “a million tokens” (roughly 750K words) in context don't hold up; sometimes, these models can't even connect ideas across more than a few thousand tokens.
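The "million tokens ≈ 750K words" figure comes from a common rule of thumb for English text: one token is roughly 0.75 words. Here's a quick back-of-the-envelope sketch. The 0.75 ratio is only an approximation, and it varies by tokenizer and language. The 500 words per page in the example is our own hypothetical assumption.

```python
WORDS_PER_TOKEN = 0.75  # rough rule of thumb for English text; varies by tokenizer


def tokens_to_words(tokens: int) -> int:
    """Estimate how many English words fit in a given number of tokens."""
    return round(tokens * WORDS_PER_TOKEN)


def words_to_tokens(words: int) -> int:
    """Estimate how many tokens a given number of English words will use."""
    return round(words / WORDS_PER_TOKEN)


print(tokens_to_words(1_000_000))  # → 750000 (an advertised context window)

# Hypothetical 50-page document at an assumed ~500 words per page.
doc_tokens = words_to_tokens(50 * 500)
print(doc_tokens)  # → 33333
```

Even with these rough numbers, a 50-page document lands around 33K tokens. That is well past the "few thousand tokens" mark where some models already lose track.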

One observer saw this firsthand when using ChatGPT to summarize a 50-page paper on Dutch pension funds. I mean, can you blame the AI? PENSION FUNDS?! How boring is that?!

Oh, and multimodal models (text, image, and audio) have the same issue.

After comparing ChatGPT's summary with the paper's own summary (and running some tests of their own), the observer concluded that ChatGPT doesn't truly summarize text; it just shortens it.

This means that if you use ChatGPT or Claude to summarize text:

  • You could (and often will) miss crucial information.

  • The AI could introduce errors or statements that contradict the original text.


So you have 4 hours to summarize a 50-page doc—what do you do?

Here are our tips for better results when “shortening” with AI:

  1. Break the text into smaller chunks and summarize each chunk in a separate prompt.

  2. Use specific prompts asking for “all concrete facts, figures, and insights” related to the topic you need, “in a bullet point list” (a list makes the output easier to verify).

  3. Ask follow-up questions, like "Did you leave anything out?" or "Is all of this information accurate?"
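If you'd rather script tips 1 through 3 than paste by hand, here's a minimal Python sketch. It splits text into word-based chunks with a small overlap, so text near a chunk boundary shows up in both neighboring chunks. It then wraps each chunk in a prompt using the wording from tip 2 and keeps the tip 3 follow-up questions ready. The chunk size, the overlap, and the names `chunk_words` and `build_prompts` are our own hypothetical choices. The code only builds prompts; you'd paste each one into ChatGPT or Claude, or send it through whatever API you use.

```python
def chunk_words(text: str, chunk_size: int = 1500, overlap: int = 100) -> list[str]:
    """Split text into chunks of at most `chunk_size` words.

    The last `overlap` words of each chunk are repeated at the start of the
    next one, so text near a boundary appears in both neighboring chunks.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks


def build_prompts(text: str, topic: str) -> list[str]:
    """Build one summarization prompt per chunk, using the tip 2 wording."""
    chunks = chunk_words(text)
    prompts = []
    for i, chunk in enumerate(chunks, start=1):
        prompts.append(
            f"Part {i} of {len(chunks)}. List all concrete facts, figures, "
            f"and insights related to {topic}, in a bullet point list.\n\n{chunk}"
        )
    return prompts


# Tip 3: ask these after each answer.
FOLLOW_UPS = [
    "Did you leave anything out?",
    "Is all of this information accurate?",
]
```

Splitting on words keeps the sketch dependency-free. A real pipeline might split on paragraphs or count tokens instead, so chunks line up with the model's actual context limits.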

And don’t forget: always verify key points in the original source to check for hallucinations.
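One cheap, partial check for hallucinations is to make sure every number in the AI's summary actually appears somewhere in the source. The sketch below does just that with a regular expression. The names `numbers_in` and `unsupported_numbers` and the sample sentences are our own made-up illustrations. It only catches invented or altered figures, not wrong claims built from real ones. It will also flag numbers the AI legitimately rewrote (say, "1.2 billion" for "1,200 million"), so treat the output as a to-check list rather than a verdict.

```python
import re

# Matches integers and decimals with optional thousands separators: 42, 3.5, 1,200
NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?")


def numbers_in(text: str) -> set[str]:
    """Return the set of numbers in text, with thousands separators removed."""
    return {match.replace(",", "") for match in NUMBER.findall(text)}


def unsupported_numbers(summary: str, source: str) -> list[str]:
    """List numbers that appear in the summary but nowhere in the source."""
    return sorted(numbers_in(summary) - numbers_in(source))


# Made-up example: the summary misquotes one figure.
source = "The fund returned 4.2% in 2023 and manages 1,200 million euros."
summary = "The fund returned 4.5% in 2023 and manages 1200 million euros."
print(unsupported_numbers(summary, source))  # → ['4.5']
```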

FROM OUR PARTNERS

The fastest way to build AI apps

  • Writer Framework: build Python apps with drag-and-drop UI

  • API and SDKs to integrate into your codebase

  • Intuitive no-code tools for business users

Around the Horn.

  • Alibaba released an open-source model called Qwen2-Math, which outperformed Claude 3.5 Sonnet, GPT-4o, and Gemini’s Math model on some benchmarks.

  • YouTubers will soon be able to “brainstorm with Gemini”, alongside other AI tools coming to the platform, like an AI music generator, a deepfake flagger, and celebrity chatbots.

  • Rabbit r1, an AI assistant gadget, launched a new “beta rabbit” mode to improve the device’s conversational skills.

  • Hugging Face, the open-source AI platform, acquired XetHub, a platform that helps AI teams work more efficiently with large datasets and models.

Treats To Try.

Feeling Great helps you transform negative emotions into positive ones using cognitive therapy techniques.

  1. *Frank AI lets you use multiple AI models in one iOS and web app (GPT-4 Turbo + more). It supports 130+ languages and has 140,000+ users worldwide. Grab unlimited access here.

  2. NVIDIA has a new “interactive digital human” named James who you can talk to about the company’s products (freaky realistic!!). 

  3. Zoom will now let you generate custom AI virtual backgrounds for meetings using simple text prompts.

  4. Inkeep creates a knowledge base from your content, providing your users with instant answers to their questions.

  5. Silvia lets you dictate multilingual messages across chat apps, adapting to your speech patterns.

  6. Salesify coaches you on sales, offering real-time insights and follow-ups to improve your deal conversion rates.

  7. Text2Infographic generates infographics from text prompts or blog posts.

  8. Bardeen automates repetitive work tasks using simple language commands and turns them into single-click actions across various apps and websites (raised $22M). 

*This is sponsored content. Advertise in The Neuron here.

Intelligent Insights.

  • A new safety analysis from OpenAI warned that ChatGPT users could get emotionally attached to GPT-4o’s new human-like voice.

  • Check out this deep dive (17 min video) on advanced prompting techniques with a real-world demo.

  • Check out this piece in the WSJ on the rise of the billion-dollar AI company acqui-hire.

  • This rant argues that human reviewers might be messing up AI’s reinforcement learning algorithms.

  • Great long read that dissects the AI boom through the lens of the dot-com bubble of the late 1990s and early 2000s.

  • Good read from MIT about why the biggest risk from AI might be “addictive intelligence”.


That’s all for today. For more AI treats, check out our website.

The best way to support us is by checking out our sponsors. Today’s are Writer and Frank AI.

See you cool cats on Twitter: @nonmayorpete & @noahedelman02
