Microsoft’s new tiny language model can read images — here’s what you can use it for

(Image credit: Microsoft)

During Build 2024, Microsoft announced a new version of its small language model, Phi-3, that can analyze images and tell users what's in them.

The new version, Phi-3-vision, is a multimodal model, meaning it can interpret both text and images. Multimodality has been a major theme in recent AI releases, notably OpenAI's GPT-4o and Google's updates to Gemini.

Scott Younker
West Coast Reporter

Scott Younker is the West Coast Reporter at Tom's Guide. He covers all the latest tech news. He's been involved in tech since 2011 at various outlets and is on an ongoing hunt to build the easiest-to-use home media system. When not writing about the latest devices, you are more than welcome to discuss board games or disc golf with him. He also handles all the Connections coverage on Tom's Guide and has been playing the addictive NYT game since its release.