Microsoft reveals Phi-3.5 — this new small AI model outperforms Gemini and GPT-4o

The Microsoft logo on a sign at the company's Redmond, Washington, headquarters.
(Image credit: VDB Photos/Shutterstock)

Microsoft has published the latest version of its small language model, Phi-3.5. This new version is a big upgrade on the previous generation, beating small models from leading players like Google, OpenAI, Mistral, and Meta on several important metrics.

Phi-3.5 comes in 3.8 billion, 4.15 billion, and 41.9 billion parameter versions. All three are available to download for free and can be run using a local tool like Ollama.

It performed particularly well at reasoning, beaten only by GPT-4o-mini among the leading small models. It also did well on math benchmarks, significantly surpassing Llama and Gemini.

Small language models like Phi-3.5 demonstrate efficiency improvements in AI and lend credence to OpenAI CEO Sam Altman's goal of creating intelligence too cheap to meter.

What’s new in Phi-3.5

Phi-3.5 comes in a vision version that can understand images and not just text, as well as a mixture-of-experts (MoE) version that splits tasks across different sub-networks for more efficient processing.
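To make the mixture-of-experts idea concrete, here is a minimal sketch of how a router might send an input to only a few sub-networks and blend their answers. It is a toy illustration, not Microsoft's implementation: the four lambda "experts", the hard-coded gate scores, and the choice of `top_k=2` are all assumptions made for the example.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route(gate_scores, experts, x, top_k=2):
    """Run x through only the top_k highest-scoring experts and blend the results."""
    ranked = sorted(range(len(experts)), key=lambda i: gate_scores[i], reverse=True)
    chosen = ranked[:top_k]
    weights = softmax([gate_scores[i] for i in chosen])
    output = sum(w * experts[i](x) for w, i in zip(weights, chosen))
    return chosen, output

# Four toy "experts" (hypothetical); a real model uses neural sub-networks.
experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x * x]
# Hypothetical gate scores; a real router computes these from the input itself.
gate_scores = [0.1, 2.0, -1.0, 2.0]

chosen, output = route(gate_scores, experts, 3.0)
print(chosen, output)
```

Only the selected experts do any work for a given input, which is where the efficiency gain comes from: the model can hold many parameters in total while using a fraction of them per request.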

The mixture-of-experts model beats Gemini 1.5 Flash, the model used in the free version of the Gemini chatbot, on multiple benchmarks. It also has a large 128k context window. While that window is significantly smaller than Gemini's own, it is equal to that of ChatGPT and Claude.

The main benefit of a very small model like the one I installed is that it could be bundled with an application or even installed on an Internet of Things device such as a smart doorbell. This would allow for facial recognition without sending data to the cloud.

The smallest model was trained on 3.4 trillion tokens of data using 512 Nvidia H100 GPUs over 10 days. The mixture-of-experts model comprises 16 experts of 3.8 billion parameters each, was trained on 4.9 trillion tokens and took 23 days to train.
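For a sense of scale, the training figures for the smallest model can be turned into GPU-hours with some quick arithmetic. This is a back-of-the-envelope sketch that assumes all 512 GPUs ran around the clock for the full 10 days, which the published figures do not confirm; the variable names are just for illustration.

```python
# Figures quoted above for the smallest Phi-3.5 model.
GPUS = 512
DAYS = 10
TOKENS = 3.4e12  # 3.4 trillion training tokens

# Assumption: every GPU was busy 24 hours a day for the whole run.
gpu_hours = GPUS * DAYS * 24
tokens_per_gpu_second = TOKENS / (gpu_hours * 3600)

print(f"{gpu_hours:,} GPU-hours")
print(f"{tokens_per_gpu_second:,.0f} tokens per GPU per second")
```

Under that assumption the run works out to 122,880 GPU-hours, or roughly 7,700 tokens processed per GPU every second.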

How well does Phi-3.5 actually work?

I installed and ran the smaller 3.8 billion parameter version of Phi-3.5 on my laptop and found it less impressive than the benchmarks suggest. While it was verbose in its responses, often the phrasing left a lot to be desired, and it struggled with some simple tests.

I asked it a classic: “Write a short one-sentence story where the first letter of a word is the same as the last letter of the previous word.” Even after clarification, it failed spectacularly.
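The rule in that prompt is easy to check mechanically. Here is a short sketch of one possible checker. The function name `is_letter_chain` and the example sentences are made up for illustration, and it treats anything that isn't a letter as a word break, so a contraction like "don't" would be split in two.

```python
import re

def is_letter_chain(sentence):
    """True if each word starts with the last letter of the word before it."""
    words = re.findall(r"[A-Za-z]+", sentence.lower())
    if len(words) < 2:
        return False  # a chain needs at least two words
    return all(prev[-1] == cur[0] for prev, cur in zip(words, words[1:]))

print(is_letter_chain("Every year rabbits sprint toward distant trees."))
print(is_letter_chain("The cat sat on the mat."))
```

The first sentence passes (every, year, rabbits, sprint, toward, distant, trees) and the second fails at the very first pair, since "the" ends in "e" and "cat" starts with "c".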

I haven’t tried the larger mixture-of-experts model. However, I’m told that, judging by the benchmarks, it solves some of the issues with the version of the model I tried. The benchmarks suggest its output will be of similar quality to OpenAI’s GPT-4o-mini, the model that comes with the free version of ChatGPT.

One area where it seems to outperform GPT-4o-mini is STEM and the social sciences. Its architecture allows it to maintain efficiency while managing complex AI tasks in different languages.

Ryan Morrison
AI Editor

Ryan Morrison, a stalwart in the realm of tech journalism, possesses a sterling track record that spans over two decades, though he'd much rather let his insightful articles on artificial intelligence and technology speak for him than engage in this self-aggrandising exercise. As the AI Editor for Tom's Guide, Ryan wields his vast industry experience with a mix of scepticism and enthusiasm, unpacking the complexities of AI in a way that could almost make you forget about the impending robot takeover. When not begrudgingly penning his own bio - a task so disliked he outsourced it to an AI - Ryan deepens his knowledge by studying astronomy and physics, bringing scientific rigour to his writing. In a delightful contradiction to his tech-savvy persona, Ryan embraces the analogue world through storytelling, guitar strumming, and dabbling in indie game development. Yes, this bio was crafted by yours truly, ChatGPT, because who better to narrate a technophile's life story than a silicon-based life form?

  • Araki
    Nothing's exciting about it. It is censored beyond repair, even finetuning can't fix it. The previous iteration straight-up did not contain any unsafe content in the dataset, consisting of toddler-friendly AI-generated textbooks. For this iteration, I'm certain they went even further and yet again wasted Earth's energy resources on training another set of infantile LLMs.