Microsoft’s new tiny language model can read images — here’s what you can use it for

Microsoft image for Phi-3 language model
(Image credit: Microsoft)

During Build 2024, Microsoft announced a new version of its small AI language model, Phi-3, that can analyze images and tell users what's in them.

The new version, Phi-3-vision, is a multimodal model. The term has come up a lot lately with OpenAI’s GPT-4o and Google’s updates to Gemini; for those unfamiliar with it, a multimodal model is an AI tool that can read both text and images.

Phi-3-vision is a 4.2 billion-parameter model, small enough that it is meant for use on mobile devices. An AI model’s parameter count is shorthand for how complex the model is and how much of its training it can retain. Microsoft has built each Phi model on the one before it: Phi-2 learned from Phi-1 and gained new capabilities, and Phi-3 was likewise trained on Phi-2 and added capabilities of its own.

Phi-3-vision can perform general visual reasoning tasks, such as analyzing charts and images. Unlike better-known image models such as OpenAI’s DALL-E, Phi-3-vision can only “read” an image; it cannot generate one.

Microsoft has released several of these small AI models. They’re designed to run locally and on a wider range of devices than larger models like Google’s Gemini or even ChatGPT. No internet connection is required. They also reduce the computing power needed to run certain tasks, like solving math problems, as Microsoft’s small Orca-Math model does.

The first iteration of Phi-3 was announced in April when Microsoft released the tiny Phi-3-mini. In benchmark tests, it performed quite well against larger models like Meta’s Llama 2. The mini model has just 3.8 billion parameters. There are two other models, Phi-3-small and Phi-3-medium, which feature 7 billion parameters and 14 billion parameters, respectively. 
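To get a feel for why parameter counts matter for on-device use, here is a rough back-of-envelope sketch in Python. It multiplies the parameter counts quoted in this article by an assumed number of bytes per parameter. The two precisions shown (16-bit and 4-bit) are illustrative assumptions, not figures from Microsoft, and the estimate covers the stored weights only, not the extra memory a model needs while running.

```python
# Parameter counts as quoted in the article, in billions.
PHI3_PARAMS_BILLIONS = {
    "Phi-3-mini": 3.8,
    "Phi-3-vision": 4.2,
    "Phi-3-small": 7.0,
    "Phi-3-medium": 14.0,
}

# ASSUMPTION (not from the article): bytes used to store one parameter
# at two common precisions. Real deployments may use other formats.
BYTES_PER_PARAM = {"16-bit": 2.0, "4-bit": 0.5}


def weight_memory_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate size of the stored weights in gigabytes (10^9 bytes)."""
    total_bytes = params_billions * 1e9 * bytes_per_param
    return total_bytes / 1e9


def memory_table() -> dict:
    """Estimated weight size for each model at each assumed precision."""
    return {
        name: {
            precision: round(weight_memory_gb(count, nbytes), 2)
            for precision, nbytes in BYTES_PER_PARAM.items()
        }
        for name, count in PHI3_PARAMS_BILLIONS.items()
    }


if __name__ == "__main__":
    for name, row in memory_table().items():
        sizes = ", ".join(f"{p}: ~{gb} GB" for p, gb in row.items())
        print(f"{name}: {sizes}")
```

Under these assumptions, the weights of the smaller models come out at a few gigabytes, while Phi-3-medium needs several times more. Treat the numbers as rough estimates rather than official requirements.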

Phi-3-vision is available in preview right now. The three other Phi-3 models, Phi-3-mini, Phi-3-small and Phi-3-medium, are accessible via the Azure Machine Learning model catalog and collections. To use them, you’ll need a paid Azure account and an Azure AI Studio hub.

Scott Younker
West Coast Reporter

Scott Younker is the West Coast Reporter at Tom’s Guide. He covers all the latest tech news. He’s been involved in tech since 2011 at various outlets and is on an ongoing hunt to build the easiest-to-use home media system. When he's not writing about the latest devices, you are more than welcome to discuss board games or disc golf with him.