  • We’re spotlighting some incredible demos of ChatGPT-4o with vision.

  • Runway is in talks to raise $450M at a $4B valuation.

  • Google’s carbon footprint has surged due to more data centers needed to power AI.

  • Perplexity rolled out a feature that provides more thorough and better-researched responses.

What happens when ChatGPT can see everything you can…

Some folks daydream about the Lambos they'll buy when they hit it big. Me? I’m daydreaming about the epic stuff I’ll pull off with GPT-4o when its vision and Voice Mode features are finally here.

The release date on these is TBD. It was supposed to be June, but OpenAI's doing more safety testing. Probably this month, but who knows with these AI magicians.

Anyway, we're drooling over this model even more this week, thanks to some sick demos the team's been dropping.

First up, the voice stuff from OpenAI's YouTube channel:

  • Our favorite: creating different character voices with GPT-4o voice (link).

  • Roleplaying an interview with GPT-4o that can 'see' the interviewer (link).

  • Using ChatGPT as a language tutor to practice your Portuguese (link).

These demos are unlike anything we've seen before.

But the vision stuff is what's really got us buzzing. OpenAI's developer advocate Romain Huet showed off some of these capabilities at the AI Engineer World's Fair last week, and holy moly:

  • It summarized a page from a physical book (yeah, those still exist) in a second flat just by glancing at it (link).

  • It understood sketches of images—your kindergarten art skills might finally be useful! (link)

  • Most importantly: it helped Romain fix a wonky design by looking at his screen and tweaking the code on the fly (link).

That last demo is why this matters. Soon, we might have an AI that sees what we see on our computer screen 24/7. No more struggling to describe what you’re seeing or sending a million screenshots. It'll be a real-time AI assistant for all your visual tasks.

Around the Horn.

  • Runway, an AI video generator, is in talks to raise $450M at a $4B valuation.

  • Figma hit pause on a feature that creates designs from text prompts after it churned out a weather app eerily similar to Apple’s.

  • Google’s greenhouse gas emissions have risen 48% in the last 5 years thanks to the AI-fueled data center boom.

Treats To Try.

  2. Meta’s 3D Gen is new research that can generate 3D assets from text prompts in less than one minute.

  3. Resemble AI lets you clone your voice AND has a tool that sniffs out audio deepfakes with 94% accuracy.

  4. Proteus breathes life into still images, making snapshots of you or others laugh, rap, sing, and more.

