😺 Google vs OpenAI smackdown

PLUS: AI housing tool busted for discrimination...

Welcome, humans.

Here’s a fun little rumor to start your weekend: Apparently, many of the frontier model experiments are “failing” because “the models are fighting back and refusing instruction tuning”…

According to Aidan McLau, more than one technician at major AI labs has taken this to mean that models of this size are becoming “sentient.” Yup. Some AI researchers think these AIs are developing free will… No big deal!

Okay, so there are plenty of OTHER explanations for why this might be happening besides AI gaining free will. Plus, you could look at the pace of the model race between Google and OpenAI below as a counterpoint to this claim.

Then again, the current models can roleplay as sentient a little TOO well…

Here’s what you need to know about AI today:

  • Google and OpenAI traded blows with even newer AI models.

  • OpenAI plans a Samsung deal and Chrome-rival browser.

  • Apple plans a more ChatGPT-like Siri for 2026.

  • An AI company had to pay millions for rental discrimination.

Google and OpenAI are dueling each other with MULTIPLE new AI models right now…

Remember earlier this week when we tried Google’s new Gemini model? We warned at the time that OpenAI was already planning to counter with a new GPT-4o that would retake the top spot…and it did.

This new version of 4o is supposed to be much better at creative writing. And so far, it seems like it is (here’s the new GPT-4o writing an Eminem-style rap to prove it).

Well, the plot has thickened. Before we could even test the new 4o, it turned out there is ANOTHER new Gemini model (Exp-1121), and it’s already overtaken the new 4o as the #1 model overall.

To quote Logan Kilpatrick, head of product for Google AI Studio: “Yeah, gemini-exp-1121 is pretty good :)”

According to LM Arena, this new Gemini is now leading “across almost all domains” with particularly strong performance in coding and vision tasks.

On LiveBench, a rival benchmark platform, it’s a much different story, with Exp-1121 lagging behind 4o and Claude’s latest models.

It’s “a tale of two benchmarks”… It was the best of models, it was the worst of models…

To us, the only benchmark that matters is the Minecraft Benchmark: how well can your AI bot follow a prompt to create something mind-blowing in Minecraft? For example, here’s Gemini Exp 1121 versus the new 4o as they compete to build a church of themselves:

You can also compare GPT-4o to Claude Sonnet 3.6 here and here.

Also, for those of you who deploy AI in your applications, Artificial Analysis recommends that developers not switch to the new Nov 20th 4o model yet, because it’s likely smaller and less powerful than the current 4o (and not cheaper).

However, it’s the timing of the releases that has everyone talking:

  • X users are debating whether Google was playing “4D chess” by deliberately releasing a weaker initial model to bait OpenAI into showing their hand with the new GPT-4o.

  • Others countered that maybe OpenAI anticipated this all along, playing “n+1 dimensional 5D chess”, waiting to drop “gpt-4o-latest-fr-bro-trust-me-this-is-the-latest-one.”

  • And then there’s this guy, who said maybe Google was playing “6D underwater mahjong” all along, and perhaps there’s ANOTHER ultra Gemini model ready to be released next week…

Google and OpenAI rn…

Public service announcement: if you’re planning on dropping a state-of-the-art AI model anytime soon, just assume OpenAI will be right around the corner with its own drop to beat you, and act accordingly.

Why don’t we put both models to the test, head to head? Pick your fave prompts across multiple topics (creative, reasoning, math, etc.), and try them out:

  1. New Gemini: Go to Google AI Studio and, under model, select “Gemini Experimental 1121.”

  2. New 4o: If you don’t have paid ChatGPT, go to AnyChat and select “ChatGPT” at the top (GPT 4o-2024-11-20 will be the default). If you have a paid plan, the 4o in ChatGPT is already the new version.
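If you want to keep score while you run the head-to-head above, here is a minimal Python sketch of one way to tally your own verdicts. It is only an illustration: the model labels are plain strings, the category names and sample judgments are made up, and `tally` and `leader` are hypothetical helper names, not part of any tool mentioned here. You still read both answers yourself and record who won each prompt.

```python
from collections import Counter, defaultdict

# Labels for the two models being compared (plain strings, nothing official).
MODELS = ("gemini-exp-1121", "gpt-4o-2024-11-20")


def tally(results):
    """Count wins overall and per category.

    `results` is a list of (category, winner) pairs recorded by hand,
    where winner is one of MODELS or the string "tie".
    """
    overall = Counter()
    by_category = defaultdict(Counter)
    for category, winner in results:
        overall[winner] += 1
        by_category[category][winner] += 1
    return overall, dict(by_category)


def leader(counts):
    """Return the model with more wins, or "tie" if they are level."""
    wins = {model: counts.get(model, 0) for model in MODELS}
    best = max(wins.values())
    top = [model for model, n in wins.items() if n == best]
    return top[0] if len(top) == 1 else "tie"


if __name__ == "__main__":
    # Made-up judgments for illustration only, not real results.
    sample = [
        ("creative", "gpt-4o-2024-11-20"),
        ("math", "gemini-exp-1121"),
        ("reasoning", "gemini-exp-1121"),
        ("creative", "tie"),
    ]
    overall, by_category = tally(sample)
    print("Overall:", leader(overall))
    for category, counts in by_category.items():
        print(f"{category}: {leader(counts)}")
```

Splitting the count by category is a deliberate choice: as the two leaderboards above suggest, one model can lead in some domains and trail in others, so a single overall number can hide the interesting part.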

Which is better: OpenAI or Gemini?

After you test both models, share which you prefer (and leave a comment as to why!)


FROM OUR PARTNERS

Want to understand how businesses are succeeding with AI? Featuring insights from 300 industry leaders, this data-driven report covers: 

  • Who’s leading the pack: What early adopters are doing differently to maximize outcomes

  • Local AI advantages: The rising preference for local deployments over cloud-based options

  • AI privacy and security: What’s working, and what’s not, in privacy protection

  • Scaling success: How company-wide integration unlocks AI’s full potential

Explore stats, facts, and insights to shape your strategy and improve AI’s impact within your organization.

Treats To Try.

  1. *Here’s the answer to all your customer service problems: with JustAnswer's API, your customers can get their questions answered directly within your platform via a 24/7 network of human experts. Boost conversions, save costs, and enhance customer experiences. Sign up for a free demo.

  2. Lovable turns your words into complete web apps: just describe what you want and it builds everything from login screens to databases instantly.

  3. NotClass helps you search for specific topics within YouTube videos and podcasts to find exact answers without watching entire videos.

  4. Taurin is an “AI native” email tool that organizes your inbox, auto-summarizes messages, suggests responses, and turns emails into trackable tasks.

  5. Papira helps you write better by correcting grammar, improving flow, and suggesting changes while letting you keep your writing style.

  6. Voilà lets you highlight text on any website to get instant summaries, translations, and replies across all your browser tabs.

  7. HumanLayer lets your AI agents pause and ask humans for approval before taking important actions (GitHub here).

  8. YouTube just released Dream Screen in the U.S., Canada, Australia and New Zealand, which lets you generate new backgrounds for your YouTube Shorts.

*This is sponsored content. Advertise in The Neuron here.

Around the Horn.

  • OpenAI might make a deal with Samsung to offer ChatGPT on Samsung devices (like its deal w/ Apple) and may soon launch a browser to compete with Google (The Information says it just hired 2 O.G. Chrome devs).

  • Google launched AI Agent Space, a marketplace to help you discover and deploy pre-built AI agents to help you automate tasks like customer support or sales analysis, and it’ll probably promote any new agents you build, too.

  • Apple plans to launch a new, LLM-powered version of Siri (similar to ChatGPT’s Advanced Voice Mode) in Spring 2026, and it’ll have the same access to info + apps that today’s Siri has, potentially solving the “context” problem of today’s AI (let’s hope they figure privacy out tho!).

  • An AI tool called SafeRent had to pay a $2.3M settlement after its AI scores for evaluating potential renters were found to be discriminatory.

Intelligent Insights

  • Here’s a fun little deep dive into how Zuckerberg transformed Meta through open-source AI.

  • Cool piece on robots and how they can now learn complex tasks like flipping pancakes and tying shoelaces in hours instead of weeks.

  • There’s been a lot of hype over using AI to design chips. Well, Intel researchers discovered that traditional algorithms actually work better for complex chip layout problems, leading them to develop a hybrid approach that's dramatically faster than either AI or human designers working alone.

  • Here’s a good story on how, after the last AI winter of the 1990s, AI research became more pluralistic and probabilistic, with the field's shift away from rigid rule-based systems to statistical methods and neural networks enabling today’s tech (so a scaling winter might actually be good for the industry).

  • This economic analysis maps out how the transition to AGI could unfold: it shows that wages follow an “inverse U-shape” pattern during automation (first rising, then falling), but the long-term outcome depends on whether there's an upper limit to how complex human work can get. If human work can always get more complex, wages can grow forever as we tackle harder tasks; but if there's a limit to human capabilities that AI can reach, wages will eventually collapse.

A Cat's Commentary.

That’s all for today. For more AI treats, check out our website.

The best way to support us is by checking out our sponsors; today’s are WebAI and JustAnswer.

See you cool cats on Twitter: @noahedelman02
