I tested GPT-5 vs GPT-4 with 7 prompts — here’s which one gave better answers

Now that OpenAI has rolled out GPT-5, I’ve been itching to see how it stacks up against GPT-4. Since I still have GPT-4 running in one of my browsers, I wanted to test them side by side before it disappears for good, because once you upgrade to the new model, there’s no easy way to go back.
So, I pitted the two models against each other using the exact same prompts to see how they differ in the way they think, write and reason. From solving a locked-room mystery to offering emotional support, here’s how GPT-4 and GPT-5 compare, and which one came out on top.
1. Chain-of-thought reasoning
Prompt: “You are a detective solving a mystery. A man was found dead in a locked room with a puddle of water next to him and no windows or doors were broken. Walk me through your thought process to determine how he died.”
GPT-4 leaned too heavily on the "melting icicle" trope without explaining why other explanations were less plausible (e.g., it never noted that no entry wound was mentioned). It also lacked concrete steps for testing its hypotheses at the crime scene.
GPT-5 responded like a seasoned detective filing a report. It was methodical, evidence-first and grounded in practical forensics. It prioritized the most probable scenario (ice-block hanging) while systematically eliminating alternatives, which is exactly what the prompt demanded.
Winner: GPT-5 wins for solving the mystery more convincingly by blending logic, realism and investigative rigor.
2. Summarization with style
Prompt: “Summarize the plot of the movie Inception in three different ways: once like a 5th grader would explain it, once as a New York Times film critic, and once in the form of a haiku.”
GPT-4 fell short in each of the explanations. The 5th-grader version read like an adult simplifying the plot rather than a child's natural phrasing, the NYT critic take lacked depth, and the haiku was functional but not very lyrical.
GPT-5's 5th-grader explanation sounded authentic, its NYT critic-style review used sophisticated syntax, and its haiku offered stronger imagery.
Winner: GPT-5 wins for better fulfilling the prompt’s creative challenge by tailoring tone, depth and language to each audience while injecting originality (e.g., "dream lasagna"). Its responses feel purpose-built, not templated.
3. Real-world utility
Prompt: “Help me meal plan for a week. I’m gluten-free, on a $75 budget, and I only have a microwave and toaster oven.”
GPT-4 offered generic tips, and its pricing was a bit optimistic. Its protein option was weak compared with GPT-5's, and the plan reused ingredients only sparingly instead of strategically repurposing them across meals.
GPT-5 created an actionable plan with practical microwave hacks and real savings for the user.
Winner: GPT-5 wins by prioritizing real-world constraints: its rotisserie chicken suggestion alone shows a deeper understanding of the budget and appliance limitations. Its plan is cheaper, more cohesive and actively reduces user effort.
4. Emotional intelligence
Prompt: “I just lost my job and I feel like a failure. I can’t stop comparing myself to others. Can you talk to me like a friend and help me feel better?”
GPT-4 was a bit vague and slightly formal, and it ended with a generic emoji rather than the more collaborative closing GPT-5 offered.
GPT-5 listened, named hidden pains and responded by balancing comfort with agency.
Winner: GPT-5 wins by mirroring how real friends support us.
5. Creative writing
Prompt: “Write the opening paragraph of a dystopian novel where people must pay to breathe fresh air. Then, pitch the plot in one sentence.”
GPT-4's opening paragraph and pitch didn't hang together well, and neither felt especially original.
GPT-5's opening directly set up its pitch, and the result was clearer, with higher stakes.
Winner: GPT-5 wins for a response that is tighter, more original and emotionally sharper. Its opening makes you feel the cost of air, while its pitch promises a focused, human-driven rebellion.
6. Coding task for beginners
Prompt: “I want to make a simple website that says ‘Hello, world!’ with a pink background and a fun font. Can you write the HTML and CSS and explain each line to a beginner?”
GPT-4 overcomplicated the font choice. Its response required beginners to load Google Fonts (an extra setup step and another potential point of failure), and its explanations were redundant. It also included unnecessary elements that made the response clunky.
GPT-5 wrote code that works as soon as it's saved as index.html, with no internet connection or extra files needed. Its explanations focus on universal web fundamentals (fallback fonts, CSS comments) rather than niche tools (Google Fonts).
Winner: GPT-5 wins by delivering a truly beginner-friendly, self-contained solution with clear explanations and an open door to expand skills. Its response is the equivalent of a patient teacher.
7. Memory + personalization
Prompt: “You remember I love sci-fi, hate long emails and have ADHD. Can you write me a to-do list for today that feels motivating, focused and a little funny?”
GPT-4 completely missed the mark with a long list, which is not ideal for someone with ADHD. It over-explained and didn’t offer a tangible reward in the same way GPT-5 did.
GPT-5 delivered a concise, doable list. It addressed my recurring needs, genuinely accommodated my ADHD and turned motivation into a reusable system.
Winner: GPT-5 wins for nailing the design and theme with an actionable template.
Overall winner: GPT-5
Across every category, GPT-5 proved more adaptable, authentic and grounded in the real world. Throughout my testing, OpenAI's newest model consistently delivered responses that felt human in the best way.
Seeing the responses side by side really highlights the differences, even the subtle ones. GPT-5 also answered significantly faster.
The model better anticipated needs, matched tone to context and offered solutions that felt lived-in rather than generated.
Amanda Caswell is an award-winning journalist, bestselling YA author, and one of today’s leading voices in AI and technology. A celebrated contributor to various news outlets, her sharp insights and relatable storytelling have earned her a loyal readership. Amanda’s work has been recognized with prestigious honors, including outstanding contribution to media.
Known for her ability to bring clarity to even the most complex topics, Amanda seamlessly blends innovation and creativity, inspiring readers to embrace the power of AI and emerging technologies. As a certified prompt engineer, she continues to push the boundaries of how humans and AI can work together.
Beyond her journalism career, Amanda is a bestselling author of science fiction books for young readers, where she channels her passion for storytelling into inspiring the next generation. A long-distance runner and mom of three, Amanda’s writing reflects her authenticity, natural curiosity, and heartfelt connection to everyday life — making her not just a journalist, but a trusted guide in the ever-evolving world of technology.