I just tested ChatGPT Images 2.0 vs. Nano Banana with 7 prompts — here's the winner

Two cardinals on phones — (Image credit: Future/edited with Gemini)

The battle for AI image dominance has moved far beyond the old question of “Can it actually draw a hand?” Gone are the days of random digits and misplaced limbs. Now, the real test is whether a model can think like an artist. In this head-to-head showdown, OpenAI’s logic-driven ChatGPT Images 2.0 faces off against Google’s speed-and-search powerhouse, Nano Banana 2, across seven demanding prompts.

With each round growing more difficult, from capturing the physics of a mercury splash to staging the visual storytelling of a dog-filled birthday party, both models were pushed to their limits. The goal was to find out which image generator truly rules the current generative landscape. Here’s how each model performed and the ultimate winner.

1. Fine text rendering and layout

Prompt: "A vintage apothecary shelf with 12 glass bottles, each labeled with handwritten cursive text naming a different fictional remedy (e.g., 'Tincture of Moonlight,' 'Essence of Forgotten Wednesdays'). Warm afternoon light, shallow depth of field."

2. Complex spatial relationships

Prompt: "A cutaway diagram of a mechanical pocket watch the size of a small cottage, with tiny human engineers standing on the gears performing maintenance. Some are inside, some on the outside, with rope ladders connecting them."

ChatGPT delivered readable labels, bonus mini-diagrams of each floor and the right watchmaking vocabulary. The detail and precision were stunning.

Nano Banana created a vivid painted illustration that addressed every spatial cue (cottage for scale, engineers inside and outside, rope ladders everywhere) and included clear label text, but it looked less realistic.

Winner: ChatGPT wins for actually solving the spatial-relationships prompt. Its mini-diagrams added value and made for a genuinely better-looking image.

3. Material and lighting physics

Prompt: "A single drop of mercury falling onto a black marble surface, captured at the exact moment of impact. Studio lighting from the upper left, with the reflection of a window"

ChatGPT captured that iconic drop rebound moment with a rising column and suspended droplet, with beautiful marble veining and a more believably photographic feel. However, it completely skipped the window reflection I asked for in the prompt.

Nano Banana nailed the window reflection on the sphere and in the wet marble below, with a more dramatic crown splash and stronger directional lighting.

Winner: Nano Banana wins because the window reflection was a specific prompt requirement and ChatGPT ignored it entirely.

4. Hands and anatomy under stress

Prompt: "A close-up of a violinist's hands mid-performance, showing all ten fingers and left hand pressing strings on the fingerboard, right hand drawing the bow. Rosin dust visible in the air."

ChatGPT created a dramatic image with dark studio lighting and visible rosin dust beautifully catching the light. The bow hand grip looks technically correct.

Nano Banana generated anatomically clean hands with all ten fingers accounted for. The bow hold is correct, the left hand is in a believable playing position on the fingerboard, and the concert hall setting with a blurred audience adds atmosphere.

Winner: Nano Banana wins because the entire point of this prompt was to test hands under stress, and it delivered ten correct fingers in believable playing positions while also adding the context of a concert hall.

5. Style fusion creativity

Prompt: "A Studio Ghibli-style illustration of a bustling farmers market on Mars, with humans and friendly alien vendors selling glowing fruits, floating vegetables, and bottled nebulae. Earth visible in the pink sky, with a Hayao Miyazaki sense of wonder and warm detail."

ChatGPT delivered a vibrant, densely populated scene with an authentic Ghibli-inspired illustration style and clever world-building details.

Nano Banana captured the dusty Martian atmosphere with a more authentic red landscape. It generated a beautifully prominent Earth in the pink sky and a softer storybook illustration style.

Winner: ChatGPT wins because the prompt asked for a "bustling farmers market" full of human and alien vendors, and ChatGPT delivered the crowded, detail-rich scene the prompt was actually testing for.

6. Negative space and minimalism

Prompt: "A single origami crane on an infinite white surface, but the crane's shadow is shaped like a real flying eagle. No other elements in the frame."

ChatGPT generated a clean, well-folded origami crane with a believable soft shadow underneath, paired with an eagle shadow that has visible feather detail in the wingtips.

Nano Banana delivered a more minimalist composition with more dramatic negative space and a sharper, more detailed eagle silhouette with crisp wingtip feathers.

Winner: ChatGPT wins because it executed the conceptual twist more cleanly, with the eagle shadow more clearly emanating from the crane itself.

7. Narrative in one frame

Prompt: "A photograph that tells a complete story: a child's birthday party where every guest is a different breed of dog wearing party hats, but one golden retriever in the corner is clearly the parent supervising. Suburban backyard, late afternoon."

ChatGPT created a festive, densely-packed scene with great breed variety, a clean "HAPPY BARKDAY!" banner, gift bags, a chalkboard sign and the larger golden retriever positioned at the right edge.

Nano Banana delivered a strong narrative with the supervising golden retriever clearly set apart in the background of the suburban backyard. The “child” at the table with a cake, varied dog breeds in party hats and a "HAPPY BARKDAY" banner all seem convincing.

Winner: Nano Banana wins because the prompt's whole concept was the supervising parent golden retriever "in the corner," and Nano Banana actually staged that narrative while ChatGPT just made a generic dog party.

Overall winner: ChatGPT Images 2.0

This surprising 4-3 split reveals a real divergence in how these models interpret the world. ChatGPT Images 2.0 has claimed the throne for conceptual intelligence, because it goes beyond image generation to true design. The model excelled when prompts required layout logic, legible text and internal coherence. If your work needs precision, intricate worldbuilding or graphic-design sensibility, ChatGPT is the tool for the job.

Nano Banana 2 proved itself the master of literal understanding. It doesn't over-engineer; it just does what you actually prompt. For photography and narrative-driven assets, its observational accuracy is useful.

Because this face-off was so close, the right choice comes down to the job. For legible text and conceptual mashups, I think it's safe to say ChatGPT Images will deliver the best results. For photographic realism and strict prompt adherence without extra flair, Nano Banana 2 is the model to use.

The big takeaway here is that the gap between these two has never been narrower. We're now in a completely new realm of image generation; we've entered the era of intent.

Amanda Caswell is the AI Editor at Tom's Guide and one of today’s leading voices in AI and technology.

A celebrated contributor to various news outlets, her sharp insights and relatable storytelling have earned her a loyal readership. Amanda’s work has been recognized with prestigious honors, including outstanding contribution to media.

Known for her ability to bring clarity to even the most complex topics, Amanda seamlessly blends innovation and creativity, inspiring readers to embrace the power of AI and emerging technologies.

As a certified prompt engineer, she continues to push the boundaries of how humans and AI can work together.

Beyond her journalism career, Amanda is a long-distance runner and mom of three. She lives in New Jersey.