I put 7 leading AI image generators to the test with the same prompt — here’s the winner

Haiper and Runway
(Image credit: Haiper/Runway/Future AI)

Image generation is one of the most mature forms of artificial intelligence creation, able to turn a simple idea into a graphic, photograph or image of any kind.

Well, the underlying technology is fairly mature. There are still strong distinctions between one model and the next, and even in how one company deploys the same version of a model compared with another company.

In some areas there's a lot of convergence, particularly around hyperrealistic human faces, but in others there are distinct differences, especially in things like text rendering, skin texture and prompt following.

To get a better idea of how AI might handle fairly complex prompts, I've given the same three requests to 7 of the leading AI image generators, including DALL-E, Flux, Ideogram, Mystic, Phoenix, Midjourney and Haiper.

Creating the prompts

There are likely more models that I’ve excluded than included, including the incredibly powerful Imagen 3 from Google and Meta's Imagine AI. The reason for their exclusion is that they are not as widely available globally as the ones I included.

The three prompts are fairly distinct: the first calls for a complex scene with elements in specific places, the second makes specific requirements for text rendering and the third focuses on skin texture and realism.

If you disagree with any of my decisions or want to try out the prompts with specific settings (I ran them all using defaults), I've included the prompts in full.

Prompt one: The young woman

An ultra-realistic smartphone selfie of a young woman in her mid-20s. The photo has the characteristic sharpness and vivid color of a high-end smartphone camera, with slight motion blur on one edge. The image is taken in natural daylight, causing mild overexposure on one side of her face. She has shoulder-length curly hair with grown-out highlights, and wears minimal, everyday makeup with slightly smudged eyeliner. Her expression is a genuine, slightly lopsided smile with a hint of tiredness around her eyes. She's wearing a comfortable, well-worn graphic t-shirt with a faded band logo. A thin silver necklace is partially tangled in her hair near her collar. The background is a lived-in studio apartment, with a unmade bed and a small bookshelf visible. A houseplant with a few yellowing leaves sits on a windowsill behind her. There's a small coffee stain barely visible on the collar of her shirt.

Midjourney

Midjourney

(Image credit: Midjourney/Future AI)

I used default settings for all of these prompts, which unfortunately does a disservice to Midjourney, the most customizable of all AI image models. Here it missed some of the points of the prompt because its default behavior is to make things perfect. That said, I think it created a brilliant depiction of the woman.

DALL-E

DALL-E 3

(Image credit: DALL-E 3/Future AI)

DALL-E is barely in the race when testing for prompts showing real people as it makes everyone look a little like a BRATZ doll.

Ideogram

Ideogram

(Image credit: Ideogram/Future AI)

Ideogram did a good job of following the 'imperfections' element of the prompt but slightly overdid the motion blur. However, I think this is the most natural of all the images of people.

Freepik Mystic

Freepik Mystic

(Image credit: Freepik Mystic/Future AI)

I like the lighting from Mystic and the woman looks the most realistic. The prompt was followed well but there is a degree of uncanny valley. It also has the 'too perfect' issue of Midjourney.

Flux (using Grok)

Grok 2/FLUX

(Image credit: Grok 2/FLUX/Future AI)

Flux might be my favorite overall image. I don't think it's the best in terms of prompt adherence or realistic depiction, but it is good and looks generally more believable.

Leonardo Phoenix

Leonardo Phoenix

(Image credit: Leonardo Phoenix/Future AI)

I really did believe this one was a real photo. It captured the imperfections perfectly but the lighting is still slightly off and the framing is weird.

Haiper

Haiper

(Image credit: Haiper/Future AI)

Haiper did a good job, but it didn't get the lighting right and the skin is too 'perfect'. Otherwise, this is my favorite character generated out of the set.

Winner: Ideogram

Prompt two: Penny Lane

A bustling 1960s London street scene on a rainy afternoon. The street is lined with iconic red double-decker buses, black cabs, and people holding colourful umbrellas. A Beatles-inspired band performs on a street corner, with their instruments reflecting in the wet pavement. In the background, Big Ben is visible through a light fog. A neon sign above a small café reads 'Penny Lane' in glowing letters. On the right, a woman in a stylish 1960s dress is waiting for the bus, holding a newspaper with the headline 'Man Walks on Moon.' Raindrops are visibly falling, creating ripples in puddles, and the whole scene has a blend of nostalgia and realism.

Midjourney

Midjourney

(Image credit: Midjourney/Future AI)

Midjourney did a good job of following the scene and 'tried' to render the sign accurately, but it mixed up the two requests for text.

DALL-E

DALL-E 3

(Image credit: DALL-E 3/Future AI)

Again, DALL-E tried to display the text but failed to render it accurately, mixing up the two different statements in strange ways. The scene was also more cartoonish than the others.

Ideogram

Ideogram

(Image credit: Ideogram/Future AI)

Ideogram is the only one that got it pretty much spot on. It rendered 'Penny Lane' and showed a woman holding a newspaper. Its visual isn't as atmospheric as Midjourney's, but the scene structure is better.

Freepik Mystic

Freepik Mystic

(Image credit: Freepik Mystic/Future AI)

Mystic, which is based on the Flux model with some additional fine-tuning, is also impressive. It correctly rendered the text and included a woman with the newspaper. The visual is better than Ideogram's, but the scene structure is not as good because the woman is standing in the road.

Flux (using Grok)

Grok 2/FLUX

(Image credit: Grok 2/FLUX/Future AI)

Flux, generated using Grok, was surprisingly clever: it had the woman with the newspaper and put the words 'Penny Lane' and 'the Beatles' on a billboard along with 'Man Walks on the Moon'. However, while the visual is good, the scene structure is terrible, including two Elizabeth Towers (Big Ben).

Leonardo Phoenix

Leonardo Phoenix

(Image credit: Leonardo Phoenix/Future AI)

Leonardo's Phoenix probably had the best prompt adherence of any of the models I tried. It also had impressive text rendering but its visual look, scene creation and faces were terrible.

Haiper

Haiper

(Image credit: Haiper/Future AI)

Haiper had the best visual and atmosphere, but it didn't even attempt the text and ignored many elements of the prompt itself, so it also failed on prompt adherence.

Winner: Ideogram

Prompt three: Victorian London

A bustling Victorian-era London street at twilight, with horse-drawn carriages moving through the cobblestone roads. A well-dressed woman in a crimson dress and bonnet is standing under a gas street lamp, reading a folded newspaper with the headline: 'New Inventions Change the World!'. The glow of the lamp casts a warm light on her face. Steam rises from a nearby street vendor's cart selling roasted chestnuts, while children in tattered clothes run playfully in the background. In the distance, the clock tower of Big Ben looms, half-covered in a misty fog. The realism should highlight the textures of the street, the detailed facial expressions of the woman, and the subtle nuances of the mist and lighting.

Midjourney

Midjourney

(Image credit: Midjourney/Future AI)

Midjourney captured the basics of the scene, including accurately rendering the woman in the bonnet, although it seems to have rendered fog as smoke.

DALL-E

DALL-E 3

(Image credit: DALL-E 3/Future AI)

DALL-E didn't attempt the text but it did accurately capture the scene. Again it was a little more on the cartoon side rather than the realistic. It looks somewhat like a Victorian postcard.

Ideogram

Ideogram

(Image credit: Ideogram/Future AI)

Ideogram did a reasonable job of rendering the frame. Not a fan of the slight cartoon-ish feel or the kids in the street, but the woman looks natural and it almost gets the text.

Freepik Mystic

Freepik Mystic

(Image credit: Freepik Mystic/Future AI)

Mystic was the best overall image: it accurately depicted the scene and had a very realistic vibe, but it failed on the text rendering.

Flux (using Grok)

Grok 2/FLUX

(Image credit: Grok 2/FLUX/Future AI)

Flux (in Grok) did the best job of displaying the text on the newspaper and even composed the image so that the woman reading the paper looks more natural.

Leonardo Phoenix

Leonardo Phoenix

(Image credit: Leonardo Phoenix/Future AI)

Leonardo Phoenix accurately framed the scene and captured the writing on the newspaper but it did have a cartoon-like feel to the image.

Haiper

Haiper

(Image credit: Haiper/Future AI)

The scene from Haiper feels a lot more real, not attempting to show London by displaying Big Ben. Rather it seems to show a Victorian-era scene including the gas lights and horse and cart on cobbled streets.

Winner: Flux (in Grok)

The winner: Ideogram

| Model | The young woman | Penny Lane | Victorian London |
| --- | --- | --- | --- |
| Midjourney | | | |
| DALL-E 3 | | | |
| Ideogram | 🏆 | 🏆 | |
| Freepik (Mystic) | | | |
| Flux (in Grok) | | | 🏆 |
| Leonardo Phoenix | | | |

When it comes to rendering individuals using AI, it is clear the top models are starting to converge, with very similar-looking characters appearing from the same prompt across different tools. The ability to render text is still variable, with only Ideogram largely consistent.

While I gave the win to Ideogram, the whole thing was largely subjective, and there were enough differences between one model and another that almost any of them could have won any category. The exception was DALL-E, which is feeling its age.

Ryan Morrison
AI Editor

Ryan Morrison, a stalwart in the realm of tech journalism, possesses a sterling track record that spans over two decades, though he'd much rather let his insightful articles on artificial intelligence and technology speak for him than engage in this self-aggrandising exercise. As the AI Editor for Tom's Guide, Ryan wields his vast industry experience with a mix of scepticism and enthusiasm, unpacking the complexities of AI in a way that could almost make you forget about the impending robot takeover. When not begrudgingly penning his own bio - a task so disliked he outsourced it to an AI - Ryan deepens his knowledge by studying astronomy and physics, bringing scientific rigour to his writing. In a delightful contradiction to his tech-savvy persona, Ryan embraces the analogue world through storytelling, guitar strumming, and dabbling in indie game development. Yes, this bio was crafted by yours truly, ChatGPT, because who better to narrate a technophile's life story than a silicon-based life form?

  • claytoncasey01
    I'm tired of seeing these articles doing these "same prompt" tests like it's apples to apples. Different models require somewhat unique ways to get the best results. A great result from Midjourney probably won't give a great result with Flux or SDXL for example.

    Please start comparing with proper prompts for each and discuss the differences so people might learn something new.
  • RyanMorrison
    I agree that there are different prompting techniques for different models but I try to write simple descriptive prompts that are relatively generally applicable. Also, this is closer to how most 'average' users might utilize AI so is a good test of how they perform.

    I will do some more detailed comparisons with custom prompts as I agree that is also a good way to explore the potential. I did point out the limitations of this prompt for Midjourney by relying solely on prompting without utilizing parameters or specific keywords.
  • M5MSM
    Would you be able to show us the results with the other 2 AI generators you mentioned?
  • binzer
    I plugged all 3 prompts into Google's Image FX. It outright rejected the second 2 prompts. I believe this is because the Beatles and children were mentioned.

    I quite like its results for the first prompt, but I've never used a more frustrating image generator (Link to ImageFX result).
  • goatonastick
    This seems more like a test of how one specific form of prompt writing works on several models.
    This prompt seems too long for some models, such as Flux, where they don't recommend more than 22 tokens, and some models, like the ACTUAL most customizable model, Stable Diffusion, work better with simple words or phrases separated by commas.
    A great opportunity to teach people some good general rules, like how simpler and more concise prompts lead to better adherence, or how important prompt order is, is lost because the author just wanted to cut and paste a description and see what happens. Now, they are led to think there's only one or two models that actually work, and the rest aren't even worth their time.
    I also noticed that, besides two of the scenes being in the streets of London, the article didn't clearly express its goal. Was it how closely they followed the prompt? Because the author often complained about the lack of realism in some.
    Was it how realistic they were? Because the author didn't use any terms to generate realistic scenes in the second two prompts, and even made the mistake of using the term "realism" in both of them, which is actually bad practice for trying to generate photorealistic images, because no normal person describes an actual photo as "realistic". The models therefore often draw on drawings and illustrations, which are what the words realism/realistic tend to be used for. It's even to the point that it's recommended that you put "realistic" in the negative prompt (when possible) to ensure you get photographs rather than artwork.
    It's suggested you specifically mention the word photo, as in the first prompt, or, for more precision, a focal length like 35 mm (or even a specific brand and model of lens!), because that sort of description would only be attached to actual photos in the training set.
  • StefonR
    Unfortunate that all of the AI have a bias to interpret attractive as white or Caucasian. If this is the implicit bias it has, who knows what other bias has been trained into it and it will be acting on as these systems get more deeply integrated into society.
  • iampat
    I ran these through Imagen 3, default config, no edits to prompts. Only the first prompt was worth a hoot; the others were very cartoonish and had terrible compositions and jumbled text.
  • Wolfonacable
    StefonR said:
    Unfortunate that all of the AI have a bias to interpret attractive as white or Caucasian. If this is the implicit bias it has, who knows what other bias has been trained into it and it will be acting on as these systems get more deeply integrated into society.
    Really? How did you reach this conclusion? The results from the first prompt returned varied results with, among others, what I assume was a mixed-race woman and a woman of east Asian heritage. I also, and please correct me if I'm wrong, didn't see any mention in the prompts of attractiveness...
  • GerryP
    StefonR said:
    Unfortunate that all of the AI have a bias to interpret attractive as white or Caucasian. If this is the implicit bias it has, who knows what other bias has been trained into it and it will be acting on as these systems get more deeply integrated into society.
    I haven't tried all the image generators. I prefer to master one tool rather than spread myself too thin. I've been working with DALL-E a lot. It defaults to a male Caucasian unless specifically told to do otherwise. Similarly, its people are all slim, young and way too pretty. When prompting DALL-E, specify what you want in these areas.
  • we be digi
    Was this sponsored content? Accuracy to prompt may have been the best measure, but I was most impressed by Freepik Mystic; Ideogram ranks lower in my opinion.