😸 12 mind-blowing o1 demos

PLUS: Why language models will "always" hallucinate

Welcome, humans.

Exciting news for anyone wanting to get your AI product or service (or AI-adjacent product / service) in front of 450,000+ daily Neuron readers: We have sponsorship slots available in Q4!

The world’s leading AI companies advertise with The Neuron (including today’s: Adobe) because of the trust we’ve built with our audience.

Just fill out this form with a few deets, and we’ll reach out to follow up.

Here’s what you need to know about AI today:

  • We share 12 wild GPT o1 demos and 4 prompts that beat Claude 3.5.

  • Microsoft created a new playground to test AI agents on Windows OS.

  • Alibaba will release its new Qwen-2.5 language model this Thursday.

  • World Labs raised $230M to build a 3D AI model of the world.

12 Wild ChatGPT-o1 demos (and some cool prompts to try, too).

Anyone else spend all weekend messing around with GPT-o1? Nerds! We’re guilty too…

So many people hit the rate limits by Friday, OpenAI had to reset them to keep up the momentum.

Here are some WILD o1 demos we discovered:

  1. Generated an animated solar system (in 5 prompts).

  2. Wrote a poem with incredibly strict rules.

  3. Solved complex crossword clues and intelligence tests.

  4. Created fractal art using Javascript and Adobe Firefly.

  5. Completed white-collar tasks, such as estimating how many Chinese people have an annual disposable income over 100K Yuan.

  6. Achieved first place on the Norway Mensa IQ test (out of all the other models tested). And if you’re wondering, it earned an IQ of ~120.

It’s also very good at making games, like…

Here’s a list of prompts people use on a daily basis (where o1 specifically outperformed Claude 3.5 Sonnet):

  • Reusable template for building Mac applications (example in GPT, example in Perplexity).

  • Identify security flaws in your code (prompt).

  • Explain a topic with examples (prompt).

  • Analyze historical stock variances between the S&P 500 and Berkshire Hathaway (prompt).

The expert opinion: While o1 can do amazing stuff, its performance improvement over 4o is being compared to the difference between a “completely incompetent” grad student and a “mediocre, but not completely incompetent” grad student.

Ethan Mollick says that for us to really know how useful o1 is versus other chatbots, it will take lots of analysis from experts in areas that require deep expertise. For example, here’s an expert analysis comparing o1 to 4o for medical admin work.

Two things you shouldn’t do with o1:

In all seriousness, o1 is not for most things. o1 is a model you use for complicated stuff. For simpler things, a simpler model will suffice; o1 will just overthink it.

FROM OUR PARTNERS

Adobe Firefly Video Model is your new AI sidekick for video editing awesomeness.

Adobe just offered an awesome sneak peek at its upcoming Adobe Firefly Video Model!

Available later this year, Adobe’s new Firefly model will help editors:

  • Ideate and explore their creative vision. 

  • Fill gaps in their timeline.

  • Add new elements to existing footage. 

It includes Text to Video AND Image to Video capabilities, so you can use reference images to generate B-Roll. 

There are other cool tools too, like Generative Extend, which lets you: 

  • Cover gaps in footage. 

  • Smooth out transitions. 

  • Hold on shots longer (for that perfectly timed edit). 

And just like other Firefly generative AI models, Adobe Firefly Video will be commercially safe: it’s only trained on content they have permission to use (a.k.a. NOT user data).

Around the Horn.

Podcast-style recap of why language models will “always” hallucinate, and will never be able to guarantee 100% accuracy (full paper here).

  • A new version of Alibaba’s Qwen large language models, Qwen 2.5, will be released this Thursday, September 19th.

  • The spatial intelligence company from Fei-Fei Li (the “godmother of AI”), World Labs, is building a “large world model” to perceive, generate, and interact with the 3D world (and they raised $230M).

  • Microsoft created a new testing ground, called the Windows Agent Arena, to test AI agents in realistic Windows OS environments (check it out here).

  • Apple will use a new framework called UI-JEPA to understand user intentions and process AI on device without requiring a ton of computation.

Treats To Try.

  1. *Dell’s Precision AI-ready workstations powered by NVIDIA RTX™ GPUs are the most efficient way to handle demanding AI workloads without costing a fortune. Check them out for yourself here.

  2. Google Illuminate turns any content into AI-generated audio discussions (in a podcast format, waitlist only right now).

  3. DepthCrafter captures video depth for “long open-world videos” (code here).

  4. Mneme AI is a personal assistant to chat with your stored notes, documents, and books on your phone.

  5. AIPhone is a cross-language calling app that translates calls in audio and text across 91 languages and dialects.

  6. Cavela connects brands with global manufacturers to streamline product sourcing and production.

  7. AFFiNE AI helps you write, draw, and create presentations.

*This is sponsored content. Advertise in The Neuron here.

Sunday Special

We’re testing out a new section called the Sunday Special. It’ll have a rotating theme, featuring whatever cool stuff we found during the week that didn’t fit in the usual Monday-Friday format.

Today, we’re featuring more intelligent insights: we found a ton of timely ones this week that were worth highlighting but couldn’t all fit in Friday’s letter.

  1. Great insight from Andrej Karpathy on why “large language model” isn’t the right name for LLMs.

  2. Gartner’s chief of research for AI thinks we’re in the “brute-force” era of AI, and once it ends, generative AI will be needed for only 5% of use-cases. Instead, he thinks the future is “composite AI” that uses genAI along with machine learning, knowledge graphs, or rule-based systems.

  3. o1’s performance has caused many people to make the case that general practice doctors are the most at risk of being replaced by AI.

  4. Great thread with theories on why, if language models are so useful, we haven’t seen any spike in productivity.

  5. Good insight into why AI’s best use-case is improving the products we already own.

  6. Read this amazing breakdown of how o1 performs on the ARC Prize, a competition to test for AGI.

A Cat's Commentary.

That’s all for today. For more AI treats, check out our website.

The best way to support us is by checking out our sponsors; today’s are Adobe and Dell.

See you cool cats on Twitter: @nonmayorpete & @noahedelman02

What'd you think of today's email?
