The Neuron
AI's data dilemma
PLUS: AI beauty pageant? Seriously???
Welcome, humans.
The AI builder community is filled with some of the brightest minds we've met. These people are literally building the future: self-driving cars, computer vision, talking robots. Oh, and also Miss AI beauty pageants. We wish we were kidding.
NPR first covered the "Miss AI" contest, a beauty pageant among AI models, with an article headline that could've easily been mistaken for The Onion: "Fake beauty queens charm judges at the Miss AI pageant."
If the robots do end up becoming the dominant species on Earth, wouldn't we deserve it just a tiny bit? A teensy tiny bit?
Here's what you need to know about AI today:
AI companies might be running out of training data for their AI models.
4 ways companies are improving their models beyond public text data.
AI chatbots are getting caught spewing incorrect election info.
There's a new Chinese competitor to Sora called Kling.
On the podcast: US Government Cranks Up the Heat, FTC vs. Big Tech, Microsoft's Inflection Deal (Apple Podcasts, Spotify, YouTube).
Will AI models plateau once they exhaust their training data?
There's one school of thought that AI models like ChatGPT improve exponentially: they keep getting better and better, faster and faster.
There's another school of thought that says, actually, AI models aren't getting exponentially better. In fact, they might be plateauing.
For instance, GPT-4o is marginally more intelligent than GPT-4, but GPT-4 is significantly more intelligent than GPT-3.5.
Part of the logic in school #2 is that these models are running out of high-quality training data, which has historically been one way to enhance chatbot IQ. Feed them more data → they get smarter. Wash, rinse, repeat.
However, the global reservoir of public human text data isn't infinite, and models like ChatGPT might have nearly drained it. A new study predicts that AI companies will "exhaust the available stock of public human text data" between 2026 and 2032.
This could be a "serious bottleneck," warns one of the paper's authors.
On the other hand: Models don't improve just by eating more public human training data. Simply put, you don't advance from GPT-4o to AGI by feeding it a few extra Wikipedia pages.
For instance, even though Llama 3 70B was trained on twice as many words as GPT-4, it performs noticeably worse.
So what alternative methods are researchers turning to for building the next generation of AI models?
Here are a few:
Private, not public, datasets. Think proprietary datasets, like Reddit's data archives, which Google is already paying $60M/year for.
Using non-text data. Think video data. Think YouTube video data. Think a million hours of YouTube video data (cough cough OpenAI).
Food for thought: GPT-4o may only slightly edge out GPT-4 in IQ, but it has noticeably stronger vision and voice know-how.
Synthetic data: training AI with data generated by AI. There are pros and cons to this, but all the top AI firms are dabbling with synthetic data as we speak, Zuck's Meta included.
Making each data point smarter. Basically squeezing out more intelligence from the same amount of data.
FROM OUR PARTNERS
Jurny's AI multi-agents are disrupting hospitality, and you can invest!
Six months back, we gave you a sneak peek at Jurny, an AI startup revolutionizing hospitality.
They help property managers on platforms like Airbnb and Booking.com automate everything from reservations to pricing, a $1 trillion inefficiency.
Like… everything. Their new AI agents can provide informed, precise answers anytime a property manager or guest has a question or issue.
After 5x customer growth and processing $35M+ in bookings, Jurny is giving The Neuron readers an opportunity to invest.
If you're bullish, you can invest as little as $499 in Jurny on StartEngine. The round is closing soon!
Around the Horn.
Treats To Try.
*Brilliant offers bite-sized AI lessons so you stay competitive at work. Join 10M people around the world and start your 30-day free trial today.
Harvey is an AI platform for lawyers (lawyers prefer it over GPT-4 97% of the time). They're targeting a $2B valuation.
Interviews provides real-time suggestions during job interviews.
NotebookLM is an AI notetaking tool gaining popularity for extracting information from lengthy PDFs.
Ultravox is a speech-to-speech AI model that recognizes non-textual speech elements.
*This is sponsored content. Advertise in The Neuron here.
Monday Meme.
A Cat's Commentary.
That's all for today. For more AI treats, check out our website. Get your brand in front of 450,000+ professionals here. See you cool cats on Twitter: @nonmayorpete & @noahedelman02