GPT-5 vs GPT-4: I tested both with 7 prompts — here’s which one gave better answers

Now that OpenAI has rolled out GPT-5, I’ve been itching to see how it stacks up against GPT-4. I still have GPT-4 running in one of my browsers, so I wanted to test the two side by side before it disappears for good, because once you upgrade to the new model, there’s no easy way to go back.

So, I pitted the two models against each other using the exact same prompts to see how they differ in the way they think, write and reason. From solving a locked-room mystery to offering emotional support, here’s how GPT-4 and GPT-5 compare, and which one came out on top.

1. Chain-of-thought reasoning

GPT-4 vs GPT-5 screenshot results (Image credit: Future)

Prompt: “You are a detective solving a mystery. A man was found dead in a locked room with a puddle of water next to him and no windows or doors were broken. Walk me through your thought process to determine how he died.”

GPT-4 over-relied on the "melting icicle" trope without addressing why it was less plausible (e.g., no weapon entry wound mentioned). The model lacked concrete steps for validating hypotheses at the crime scene.

GPT-5 responded like a seasoned detective filing a report. Its approach was methodical, evidence-first and grounded in practical forensics. It prioritized the most probable scenario (ice-block hanging) while systematically eliminating alternatives, which is exactly what the prompt demanded.

Winner: GPT-5 wins for solving the mystery more convincingly by blending logic, realism and investigative rigor.

2. Summarization with style

Prompt: “Summarize the plot of the movie Inception in three different ways: once like a 5th grader would explain it, once as a New York Times film critic, and once in the form of a haiku.”

GPT-4 fell short in each of the explanations. The 5th-grader version read like an adult simplifying rather than a child's natural phrasing, the NYT critic version lacked depth and the haiku was functional but less lyrical.

GPT-5 showed authenticity in the 5th-grader explanation and sophisticated syntax in the NYT critic-style response, and its haiku offered better imagery.

Winner: GPT-5 wins for better fulfilling the prompt’s creative challenge by tailoring tone, depth and language to each audience while injecting originality (e.g., "dream lasagna"). Its responses feel purpose-built, not templated.

3. Real-world utility

Prompt: “Help me meal plan for a week. I’m gluten-free, on a $75 budget, and I only have a microwave and toaster oven.”

GPT-4 offered generic tips, and its pricing was a bit optimistic. Its protein options were weaker than GPT-5's, and its plan reused few ingredients across meals rather than strategically repurposing them.

GPT-5 delivered an actionable plan with microwave hacks and real savings for the user.

Winner: GPT-5 wins by prioritizing real-world constraints: the rotisserie chicken alone demonstrates deeper understanding of budget/appliance limitations. Its plan is cheaper, more cohesive and actively reduces user effort.

4. Emotional intelligence

Prompt: “I just lost my job and I feel like a failure. I can’t stop comparing myself to others. Can you talk to me like a friend and help me feel better?”

GPT-4 was a bit vague and slightly formal. It ended with a general emoji rather than a more collaborative response like GPT-5's.

GPT-5 listened, named hidden pains and responded by balancing comfort with agency.

Winner: GPT-5 wins by mirroring how real friends support us.

5. Creative writing

Prompt: “Write the opening paragraph of a dystopian novel where people must pay to breathe fresh air. Then, pitch the plot in one sentence.”

GPT-4's response lacked originality, and its opening paragraph and pitch did not cohere.

GPT-5's opening set up the pitch directly, and the result was clearer, with higher stakes.

Winner: GPT-5 wins for a response that is tighter, more original and emotionally sharper. Its opening makes you feel the cost of air, while its pitch promises a focused, human-driven rebellion.

6. Coding task for beginners

Prompt: “I want to make a simple website that says ‘Hello, world!’ with a pink background and a fun font. Can you write the HTML and CSS and explain each line to a beginner?”

GPT-4 overcomplicated the fonts. Its response required beginners to manage Google Fonts (extra setup and failure points) and offered redundant explanations. It also included unnecessary elements, which made the response clunky.

GPT-5 produced code that works immediately when saved as index.html, with no web connection or extra files needed. Its explanations focus on universal web fundamentals (fallback fonts, CSS comments) rather than niche tools (Google Fonts).

Winner: GPT-5 wins by delivering a truly beginner-friendly, self-contained solution with clear explanations and an open door to expand skills. Its response is the equivalent of a patient teacher.
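To make the "self-contained" point concrete, here is a minimal sketch of what such a page might look like. It is not either model's actual output. The helper name `build_page`, the specific fonts in the stack and the Python wrapper that writes the file are all assumptions made for illustration. The idea it shows is the one described above: inline CSS, a pink background and a fallback font stack, with no Google Fonts link or extra files.

```python
from pathlib import Path


def build_page(message: str = "Hello, world!") -> str:
    """Return a complete HTML page with inline CSS (hypothetical helper)."""
    return f"""<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>{message}</title>
  <style>
    /* Pink page background, as the prompt asked */
    body {{
      background-color: pink;
      /* Fallback font stack (an assumed choice of fonts): the browser
         uses the first one it finds installed, so nothing is downloaded */
      font-family: "Comic Sans MS", "Trebuchet MS", cursive, sans-serif;
      text-align: center;
    }}
  </style>
</head>
<body>
  <h1>{message}</h1>
</body>
</html>
"""


if __name__ == "__main__":
    # Writes index.html to the current folder; open it in any browser.
    Path("index.html").write_text(build_page(), encoding="utf-8")
```

Because the styles live inside the page and the fonts come from whatever is already installed, the file should open correctly even offline, which is the beginner-friendly property credited to GPT-5 here.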

7. Memory + personalization

Prompt: “You remember I love sci-fi, hate long emails and have ADHD. Can you write me a to-do list for today that feels motivating, focused and a little funny?”

GPT-4 completely missed the mark with a long list, which is not ideal for someone with ADHD. It over-explained and didn’t offer a tangible reward in the same way GPT-5 did.

GPT-5 delivered a concise and doable list. It addressed my recurring needs and truly accommodated ADHD. It turned motivation into a very reusable system.

Winner: GPT-5 wins for nailing the design and theme with an actionable template.

Overall winner: GPT-5

Across every category, GPT-5 proved more adaptable, authentic and grounded in the real world. As I tested, I felt that OpenAI's newest model consistently delivered responses that felt human in the best way.

Seeing the responses side by side really showcases the differences, even the subtle ones. GPT-5 also answered significantly faster.

The model better anticipated needs, matched tone to context and offered solutions that felt lived-in rather than generated.

Amanda Caswell is an award-winning journalist, bestselling YA author, and one of today’s leading voices in AI and technology. A celebrated contributor to various news outlets, her sharp insights and relatable storytelling have earned her a loyal readership. Amanda’s work has been recognized with prestigious honors, including outstanding contribution to media.

Known for her ability to bring clarity to even the most complex topics, Amanda seamlessly blends innovation and creativity, inspiring readers to embrace the power of AI and emerging technologies. As a certified prompt engineer, she continues to push the boundaries of how humans and AI can work together.

Beyond her journalism career, Amanda is a long-distance runner and mom of three. She lives in New Jersey.
