12 mind-blowing o1 demos
PLUS: Why language models will "always" hallucinate
Welcome, humans.
Exciting news for anyone who wants to get their AI product or service (or AI-adjacent product/service) in front of 450,000+ daily Neuron readers: We have sponsorship slots available in Q4!
The world's leading AI companies advertise with The Neuron (including today's sponsor, Adobe) because of the trust we've built with our audience.
Just fill out this form with a few deets, and we'll reach out to follow up.
Here's what you need to know about AI today:
We share 12 wild GPT o1 demos and 4 prompts that beat Claude 3.5.
Microsoft created a new playground to test AI agents on Windows OS.
Alibaba will release its new Qwen-2.5 language model this Thursday.
World Labs raised $230M to build a 3D AI model of the world.
12 Wild ChatGPT-o1 demos (and some cool prompts to try, too).
Anyone else spend all weekend messing around with GPT-o1? Nerds! We're guilty too…
So many people hit the rate limits by Friday that OpenAI had to reset them to keep up the momentum.
Here are some WILD o1 demos we discovered:
Generated an animated solar system (in 5 prompts).
Wrote a poem with incredibly strict rules.
Solved complex crossword clues and intelligence tests.
Created fractal art using JavaScript and Adobe Firefly.
Completed white-collar tasks, such as estimating how many Chinese people have an annual disposable income over 100K Yuan.
Ranked first among all the AI models tested on the Norway Mensa IQ test. And if you're wondering, it scored an IQ of ~120.
It's also very good at making games, like…
This Pocket Tanks-style game.
This asteroids game.
This very basic Galaga clone.
This Flappy Bird knockoff.
Whatever this game is.
This "2048" game with a playable demo (be warned: very addictive!!).
Here's a list of prompts people use on a daily basis (where o1 specifically outperformed Claude 3.5 Sonnet):
A reusable template for building Mac applications (example in GPT, example in Perplexity).
Identify security flaws in your code (prompt).
Explain a topic with examples (prompt).
Analyze historical stock variances between the S&P 500 and Berkshire Hathaway (prompt).
The expert opinion: While o1 can do amazing stuff, its improvement over 4o is being compared to the difference between a "completely incompetent" grad student and a "mediocre, but not completely incompetent" grad student.
Ethan Mollick says that to really know how useful o1 is compared with other chatbots, we'll need lots of analysis from experts in fields that require deep expertise. For example, here's an expert analysis comparing o1 to 4o for medical admin work.
Two things you shouldn't do with o1:
Ask it to help solve NYT Connections (contrary to earlier reports).
Ask it for the meaning of life.
In all seriousness, o1 isn't for most things; it's a model for complicated tasks. For simpler things, a simpler model will suffice, and o1 will just overthink it.
FROM OUR PARTNERS
Adobe Firefly Video Model is your new AI sidekick for video editing awesomeness.
Adobe just offered an awesome sneak peek at its upcoming Adobe Firefly Video Model!
Available later this year, Adobe's new Firefly model will help editors:
Ideate and explore their creative vision.
Fill gaps in their timeline.
Add new elements to existing footage.
It includes Text to Video AND Image to Video capabilities, so you can use reference images to generate B-roll.
There are other cool tools too, like Generative Extend, which lets you:
Cover gaps in footage.
Smooth out transitions.
Hold on shots longer (for that perfectly timed edit).
And just like other Firefly generative AI models, Adobe Firefly Video will be commercially safe: it's only trained on content Adobe has permission to use (a.k.a. NOT user data).
Around the Horn.
Podcast-style recap of why language models will "always" hallucinate and will never be able to guarantee 100% accuracy (full paper here).
Qwen 2.5, a new version of Alibaba's Qwen large language models, will be released this Thursday, September 19th.
World Labs, the spatial intelligence company from Fei-Fei Li (the "godmother of AI"), raised $230M to build a "large world model" that can perceive, generate, and interact with the 3D world.
Microsoft created a new testing ground, called the Windows Agent Arena, to test AI agents in realistic Windows OS environments (check it out here).
Apple will use a new framework called UI-JEPA to understand user intentions and process AI on-device without requiring a ton of computation.
Treats To Try.
*Dell's Precision AI-ready workstations powered by NVIDIA RTX™ GPUs are the most efficient way to handle demanding AI workloads without costing a fortune. Check them out for yourself here.
Google Illuminate turns any content into AI-generated audio discussions (in a podcast format, waitlist only right now).
DepthCrafter estimates depth for "long open-world videos" (code here).
Mneme AI is a personal assistant that lets you chat with your stored notes, documents, and books on your phone.
AIPhone is a cross-language calling app that translates calls, in audio and text, across 91 languages and dialects.
Cavela connects brands with global manufacturers to streamline product sourcing and production.
AFFiNE AI helps you write, draw, and create presentations.
*This is sponsored content. Advertise in The Neuron here.
Sunday Special
We're testing out a new section called the Sunday Special. It'll have a rotating theme, featuring cool stuff we found during the week that didn't fit the usual Monday-Friday format.
Today, we're featuring more intelligent insights: we found a ton of timely ones this week that were worth highlighting but couldn't all fit in Friday's letter.
Great insight from Andrej Karpathy on why "large language model" isn't the right name for LLMs.
Gartner's chief of research for AI thinks we're in the "brute-force" era of AI, and once it ends, generative AI will be needed for only 5% of use cases. Instead, he thinks the future is "composite AI" that combines genAI with machine learning, knowledge graphs, or rule-based systems.
o1's performance has led many people to argue that general practice doctors are the most at risk of being replaced by AI.
Great thread with theories on why, if language models are so useful, we haven't seen any spike in productivity.
Good insight into why AI's best use case is improving the products we already own.
Read this amazing breakdown of how o1 performs on the ARC Prize, a competition to test for AGI.
A Cat's Commentary.
That's all for today! For more AI treats, check out our website. See you cool cats on Twitter: @nonmayorpete & @noahedelman02