😺 Gemini's new AI, tested
PLUS: ChatGPT's a better doc than 90% of doctors...
Welcome, humans.
If you want to advertise your product or service in front of the BEST readers in AI (that's y'all, Neuron readers!), fill out this partnership form ASAP.
With 475,000+ readers, a 42% open rate, and a new standalone secondary placement, it's a great time to advertise in The Neuron... and spots for Q1 are filling up fast!
The partnership form takes less than 60 seconds to fill out, so you're < 1 minute away from launching a killer ad campaign in 2025!
Here's what you need to know about AI today:
We tried Gemini's new (experimental) top model.
Anthropic has a new prompt fixer.
Researchers found smaller models may beat shrunken big ones.
ChatGPT beat doctors at diagnosing patients.
We tried Gemini's new top model so you don't have to... (unless you want to!)
The AI world has a new champion (for now): Gemini-Exp-1114 just dethroned GPT-4o on multiple leaderboards, marking a win for Google (however fleeting it might be).
If you're wondering why we sound so ominous about Google's time to shine, keep reading.
According to the latest Chatbot Arena stats, Gemini-Exp-1114 is crushing it in areas like math, coding, and creative writing. It's currently ranked #1 in:
Hard prompts.
Math problems.
Creative writing.
Instruction following.
Multi-turn conversations.
It currently ranks #3 in coding tasks (behind o1-preview and o1-mini), though.
Early feedback is in:
Exp 1114 is good at reasoning, and gives impressively accurate responses.
It's def smart, but that's not always a good thing. Example? It gaslights you.
Here's a demo asking Gemini to "Create a viral thread" (still waiting...).
Some people even say 1114 could be considered "Gemini 2." If you're curious, you can try it in the AI Studio here (fun fact: it looks like AI Studio is getting a revamp soon!). Exp 1114 is also available via API, so developers can work with it inside other apps.
We tried it, and here's what we think:
Exp 1114 is very capable, and it explains itself well. When we gave it specific tasks, like coding a particular Chrome extension or analyzing a script Claude wrote about Sam Altman vs. Elon Musk, it handled them easily.
It doesn't have the same magic as chatting w/ Claude, but there's no doubt this will integrate well inside Google's other applications.
Here's the thing, though: This is a big moment for Google, but timing is everything. There's a new ChatGPT-4o version already in testing (because of course OpenAI couldn't let Google have its moment), and it's looking like the new 4o will retake the #1 spot.
Plus, there's the recent controversy over Gemini's safety issues... and some ppl have pointed out Exp 1114's scores are lower on LiveBench (another industry benchmark).
Gemini's success from here will depend on how 1114 (and any future models) gets integrated into Google's other applications, and on whether these capabilities significantly improve the experience of using Workspace, Search, or Chrome.
FROM OUR PARTNERS
Here's why we like Attention for sales and meeting call summaries, and why you will, too.
We've been testing Attention for our sales + internal calls lately, and wow, we're actually obsessed.
Here's how it works:
Why do we love it? No more scrambling to remember key talking points or frantically taking notes mid-call. We can "be present" (as the gurus say) without worrying about recording every detail, because we know Attention's got our back.
And Attention can do much more, too.
Treats To Try.
*Incogni removes your personal data from the open internet so scammers and identity thieves can't access it. Stay safe online: use code NEURON to get 58% off Incogni's annual plans with their Black Friday offer now.
Stripe launched an agent toolkit that gives AI agents access to the Stripe API to handle invoices or purchase goods on your behalf (for the techies!).
GenSpark Finance attempts to make financial reports easier to read with graphics and charts, and it's powered by Claude (raised $60M).
Reforged Labs creates ads for mobile games using templates generated from a custom model trained on successful video game ads (raised $3.9M).
Mikrotakt splits audio tracks and isolates certain elements like guitar, vocals, bass, and drums. It's free to try out, and the demo they provide is fun.
AI Game Master is a text-based RPG tool inspired by D&D: just describe your actions, and the AI game master does the rest.
Parafact fact-checks sources (human or AI) with citations and sources. There's also Factiverse, which is a more B2B version (more about it here).
*This is sponsored content. Advertise in The Neuron here.
Around the Horn.
Claude now lets you improve your prompts in the Anthropic Console: it applies the company's best practices to your existing prompt to generate a new one.
The U.S. Department of Homeland Security released a new framework for how to safely integrate AI into critical sectors like energy and telecom.
Carnegie Mellon, Harvard, MIT, and Stanford researchers released a new paper that found it might be more efficient to train smaller models instead of shrinking down larger ones.
A new study found ChatGPT outperformed 50 physicians at diagnosing medical cases 90% to 74% (but doctors scored 76% when using ChatGPT).
Monday Meme.
FROM OUR PARTNERS
The fastest way to build AI apps
Writer is the full-stack generative AI platform for enterprises. Quickly and easily build and deploy AI apps with Writer AI Studio, a suite of developer tools fully integrated with our LLMs, graph-based RAG, AI guardrails, and more.
Use Writer Framework to build Python AI apps with drag-and-drop UI creation, our API and SDKs to integrate AI into your existing codebase, or intuitive no-code tools for business users.
A Cat's Commentary.
We stan a backhanded compliment; the more the merrier!
That's all for today. For more AI treats, check out our website. The best way to support us is by checking out our sponsors: today's are Attention, Incogni, and Writer. See you cool cats on Twitter: @noahedelman02