StabilityAI launches new AI chatbot — this one runs on your laptop and generates text faster than you can read

StabilityAI released Zephyr with this AI-generated promotional image
(Image credit: StabilityAI)

The latest AI model from Stable Diffusion maker StabilityAI is a lightweight chatbot that can run locally and takes a few minutes to get up and running. 

Built on StableLM, a compact large language model that specializes in sentence auto-completion, Zephyr has just three billion parameters, which lets it give accurate responses without high-end hardware.
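To put the three-billion-parameter figure in context, here is a rough back-of-the-envelope sketch of how much memory the model weights alone might need at different numeric precisions. The precisions listed are assumptions for illustration; the article does not say which one Zephyr ships with, and the estimate ignores activations, caches and runtime overhead.

```python
# Parameter count quoted in the article.
PARAMETER_COUNT = 3_000_000_000

# Assumed storage sizes per parameter (illustrative, not from the article).
BYTES_PER_PARAMETER = {
    "float32": 4.0,
    "float16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gb(parameter_count, bytes_per_parameter):
    """Approximate size of the weights in decimal gigabytes (10^9 bytes)."""
    return parameter_count * bytes_per_parameter / 1e9


estimates = {
    name: weight_memory_gb(PARAMETER_COUNT, size)
    for name, size in BYTES_PER_PARAMETER.items()
}

for name, gigabytes in estimates.items():
    print(f"{name}: about {gigabytes:.1f} GB for weights alone")
```

Under these assumptions, a 16-bit copy of the weights would need roughly 6 GB and a 4-bit copy roughly 1.5 GB, which is why a model of this size can plausibly fit on a consumer laptop.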

This is part of a growing trend to move away from massive AI models requiring large data centers and the fastest GPUs. Running those models is expensive, and the performance of smaller models, thanks to fine-tuning and efficiencies, is becoming “good enough”.

I tried it on a MacBook Air M2 and it was able to respond faster than I could read the responses. It runs at 40 tokens per second, which is about five times faster than the average reading speed.
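As a rough sanity check on those numbers, the sketch below works out the reading speed implied by "40 tokens per second" being "about five times faster" than reading. The conversion of 0.75 words per token is a common rule of thumb for English text and is an assumption here, not a figure from StabilityAI.

```python
# Figures quoted in the article.
GENERATION_TOKENS_PER_SECOND = 40.0
SPEEDUP_OVER_READING = 5.0

# Assumption (not from the article): roughly 0.75 English words per token.
WORDS_PER_TOKEN = 0.75


def implied_reading_tokens_per_second(generation_tps, speedup):
    """Reading speed implied by the generation rate and the quoted speedup."""
    return generation_tps / speedup


def tokens_per_second_to_words_per_minute(tps, words_per_token=WORDS_PER_TOKEN):
    """Convert a token rate to words per minute under the assumed ratio."""
    return tps * words_per_token * 60


reading_tps = implied_reading_tokens_per_second(
    GENERATION_TOKENS_PER_SECOND, SPEEDUP_OVER_READING
)
reading_wpm = tokens_per_second_to_words_per_minute(reading_tps)
generation_wpm = tokens_per_second_to_words_per_minute(GENERATION_TOKENS_PER_SECOND)

print(f"Implied reading speed: {reading_tps:.0f} tokens/s (~{reading_wpm:.0f} words/min)")
print(f"Generation speed: ~{generation_wpm:.0f} words/min")
```

With these assumptions, the quoted comparison implies a reading speed of about 8 tokens per second, or roughly 360 words per minute. A different words-per-token ratio would shift the words-per-minute figures but not the fivefold gap.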

What makes Zephyr different?

The latest small model from StabilityAI is a chatbot, tuned for instruction following and question-and-answer tasks. Despite having fewer than half the parameters of many small models, the company says it efficiently caters to a wide range of text generation needs.

In evaluations, it performs particularly well at writing, roleplay, and responding to queries about humanities subjects. It has some reasoning and extraction capabilities, but not at a level close to that of similar small models.

However, it can generate contextually relevant, coherent, and linguistically accurate text. I found that it was able to respond quickly, in natural language, and with answers not dissimilar to those of Bard or the free version of ChatGPT.

Why does this matter?

In a conversation with me on X, Emad Mostaque, founder and CEO of StabilityAI, said he was surprised at the capabilities of Zephyr. He explained that the model is open source, that these are early days, and that better models are coming that will only improve on it.

He believes that for the vast majority of tasks that require generative AI, locally run, smaller models will be enough to get the job done.

“The pace of innovation will increase with edge and once we hit GPT-4 level quality [in small models] and have consistent dataset improvement, why would you need more for 80 to 90% of tasks,” he explained. “Low energy reasoning engine with retrieval is super powerful.”

What happens next?

Running Zephyr on a MacBook Air M2

(Image credit: Future/Ryan Morrison)

The current licence for Zephyr is for non-commercial and research purposes, but StabilityAI also offers commercial licences for companies wanting to integrate the AI into products.

The smaller these models become, the easier it will be to bundle them with an app or even, in the future, an operating system. Google released a version of its Gemini AI called Nano that runs on Android devices and can be used by developers in applications on the Pixel 8 Pro.

Apple recently released MLX, an open-source framework that makes installing AI models on Apple Silicon devices easier. If this becomes native to macOS in a future version, or even in iOS, then companies will be able to deploy AI tools without paying cloud computing fees.

Ryan Morrison
AI Editor
