The Neuron
😺 Google vs OpenAI smackdown

PLUS: AI housing tool busted for discrimination...

Grant Harvey
November 22, 2024

Welcome, humans.

Here’s a fun little rumor to start your weekend: Apparently, many of the frontier model experiments are “failing” because “the models are fighting back and refusing instruction tuning”…

According to Aidan McLau, more than one technician at major AI labs has taken this to mean that models of this size are becoming “sentient.” Yup. Some AI researchers think these AIs are developing free will… No big deal!

Okay, so there are plenty of OTHER explanations for why this might be happening besides AI gaining free will. Plus, you could look at the pace of the model race between Google and OpenAI below as a counterpoint against this claim.

Then again, the current models can roleplay as sentient a little TOO well…

Here’s what you need to know about AI today:

Google and OpenAI traded blows with even newer AI models.
OpenAI plans a Samsung deal and Chrome-rival browser.
Apple plans a more ChatGPT-like Siri for 2026.
An AI company had to pay millions for rental discrimination.

Google and OpenAI are dueling each other with MULTIPLE new AI models right now…

Remember earlier this week when we tried Google’s new Gemini model? We warned at the time that OpenAI was already planning to counter with a new GPT-4o that would retake the top spot…and it did.

This new version of 4o is supposed to be much better at creative writing. And so far, it seems like it is (here’s the new GPT-4o writing an Eminem-style rap to prove it).

Well, the plot has thickened—before we could even test the new 4o, it turned out there is ANOTHER new Gemini model (Exp-1121), and it’s already overtaken the new 4o as the #1 model overall:

To quote Logan Kilpatrick, head of product for Google AI Studio: “Yeah, gemini-exp-1121 is pretty good :)”

According to LM Arena, this new Gemini is now leading “across almost all domains” with particularly strong performance in coding and vision tasks.

On Livebench, a rival benchmark platform, it’s a much different story, with Exp-1121 lagging behind 4o and Claude’s latest models.

It’s “a tale of two benchmarks”… It was the best of models, it was the worst of models…

To us, the only benchmark that matters is the Minecraft Benchmark: how well can your AI bot follow a prompt to create something mind-blowing in Minecraft? For example, here’s Gemini Exp 1121 versus the new 4o as they compete to build a church of themselves:

You can also compare GPT-4o to Claude Sonnet 3.6 here and here.

Also, for those who deploy AI in your applications, Artificial Analysis is recommending developers don’t switch to the new Nov 20th 4o model yet because it’s likely smaller and less powerful than the current 4o (and not cheaper).

However, it’s the timing of the releases that has everyone talking:

X users are debating whether Google was playing “4D chess” by deliberately releasing a weaker initial model to bait OpenAI into showing their hand with the new GPT-4o.
Others countered that maybe OpenAI anticipated this all along, playing “n+1 dimensional 5D chess,” waiting to drop “gpt-4o-latest-fr-bro-trust-me-this-is-the-latest-one.”
And then there’s this guy, who said maybe Google was playing “6D underwater mahjong” all along, and perhaps there’s ANOTHER ultra Gemini model ready to be released next week…

Google and OpenAI rn…

Public service announcement: if you’re planning on dropping a state-of-the-art AI model anytime soon, just assume OpenAI will be right around the corner with its own drop to beat you, and act accordingly.

Why don’t we put both models to the test, head to head? Pick your fave prompts across multiple topics (creative, reasoning, math, etc), and try it out:

New Gemini: Go to Google AI Studio and under model, select “Gemini Experimental 1121”
New 4o: If you don’t have a paid ChatGPT plan, go to AnyChat and select “ChatGPT” at the top (GPT-4o-2024-11-20 will be the default). If you do have a paid plan, just use 4o in ChatGPT.

Which is better: OpenAI or Gemini?

After you test both models, share which you prefer (and leave a comment as to why!)

FROM OUR PARTNERS

Your free guide to AI trends and best practices from webAI

Want to understand how businesses are succeeding with AI? Featuring insights from 300 industry leaders, this data-driven report covers:

Who’s leading the pack: What early adopters are doing differently to maximize outcomes
Local AI advantages: The rising preference for local deployments over cloud-based options
AI privacy and security: What’s working—and what’s not—in privacy protection
Scaling success: How company-wide integration unlocks AI’s full potential

Explore stats, facts, and insights to shape your strategy and improve AI’s impact within your organization.

Download the free report

Treats To Try.

*Here’s the answer to all your customer service problems: with JustAnswer's API, your customers can get their questions answered directly within your platform via a 24/7 network of human experts. Boost conversions, save costs, and enhance customer experiences. Sign up for a free demo.
Lovable turns your words into complete web apps: just describe what you want and it builds everything from login screens to databases instantly.
NotClass helps you search for specific topics within YouTube videos and podcasts to find exact answers without watching entire videos.
Taurin is an “AI native” email tool that organizes your inbox, auto-summarizes messages, suggests responses, and turns emails into trackable tasks.
Papira helps you write better by correcting grammar, improving flow, and suggesting changes while letting you keep your writing style.
Voilà lets you highlight text on any website to get instant summaries, translations, and replies across all your browser tabs.
HumanLayer lets your AI agents pause and ask humans for approval before taking important actions (GitHub here).
YouTube just released Dream Screen in the U.S., Canada, Australia, and New Zealand; it lets you generate new backgrounds for your YouTube Shorts.

See our top 51 AI Tools for Business here!

*This is sponsored content. Advertise in The Neuron here.

Around the Horn.

OpenAI might make a deal with Samsung to offer ChatGPT on Samsung devices (like its deal w/ Apple) and may soon launch a browser to compete with Google (The Information says it just hired 2 O.G. Chrome devs).
Google launched AI Agent Space, a marketplace to help you discover and deploy pre-built AI agents to help you automate tasks like customer support or sales analysis—and it’ll probably promote any new agents you build, too.
Apple plans to launch a new, LLM-powered version of Siri (similar to ChatGPT’s Advanced Voice Mode) in spring 2026; it’ll have the same access to info + apps that today’s Siri has, potentially solving the “context” problem of today’s AI (let’s hope they figure privacy out tho!).
SafeRent, the company behind an AI tool for scoring potential renters, had to pay a $2.3M settlement after those AI scores were found to be discriminatory.

Intelligent Insights

Here’s a fun little deep dive into how Zuckerberg transformed Meta through open-source AI.
Cool piece on robots and how they can now learn complex tasks like flipping pancakes and tying shoelaces in hours instead of weeks.
There’s been a lot of hype over using AI to design chips—well, Intel researchers discovered that traditional algorithms actually work better for complex chip layout problems, leading them to develop a hybrid approach that's dramatically faster than either AI or human designers working alone.
Here’s a good story on how, after the last AI winter of the 1990s, AI research became more pluralistic and probabilistic, with the field's shift away from rigid rule-based systems to statistical methods and neural networks enabling today’s tech (so a scaling winter might actually be good for the industry).
This economic analysis maps out how the transition to AGI could unfold: it shows that wages follow an “inverse U-shape” pattern during automation (first rising, then falling), but the long-term outcome depends on whether there's an upper limit to how complex human work can get—if human work can always get more complex, wages can grow forever as we tackle harder tasks; but if there's a limit to human capabilities that AI can reach, wages will eventually collapse.