Microsoft Unveils Phi-3-Vision AI Model with 4.2 Billion Parameters

Discover Microsoft’s new Phi-3-Vision AI model, featuring advanced visual and text recognition with 4.2 billion parameters for seamless mobile platform performance.

On May 26th, Microsoft announced the latest addition to its small language model (SLM) family, the Phi-3-Vision. This model emphasizes “visual capabilities,” enabling it to understand and interpret both images and text, while being optimized for efficient performance on mobile platforms.

The Phi-3-Vision is the first multimodal model in Microsoft’s Phi-3 lineup. It incorporates the text-understanding capabilities of the Phi-3-mini, maintaining its lightweight characteristics suitable for mobile and embedded platforms.

phi 3 vision model

The model has 4.2 billion parameters, making it larger than the Phi-3-mini (3.8B) but smaller than the Phi-3-small (7B), with a context length of 128k tokens. Training took place from February to April 2024.

A notable feature of the Phi-3-Vision model, as its name suggests, is its support for “image-text recognition capabilities.” It is claimed to understand the meanings of real-world images and quickly extract text from them.

Microsoft highlights the model’s suitability for office environments, with developers enhancing its ability to understand charts and block diagrams. The model can infer and draw conclusions from user inputs, offering strategic advice for businesses, with performance reportedly on par with larger models.

During training, the Phi-3-Vision was exposed to diverse data, including educational materials, code, annotated images, real-world knowledge, chart images, and chat formats, ensuring a variety of inputs. Microsoft assures that the training data is traceable and contains no personal information.

phi 3 vision 128k benchmarks

Microsoft compared Phi-3-Vision’s performance against ByteDance’s Llama3-Llava-Next (8B), the Microsoft Research and University of Wisconsin, Columbia University’s LlaVA-1.6 (7B), and Alibaba’s QWEN-VL-Chat models, demonstrating superior performance in several areas.

The Phi-3-Vision model is now available on Hugging Face for interested users.

Learn more about  Xiaomi POCO F5 Indian Variant Appears on Geekbench With Snapdragon 7+ Gen 2 Chip

Keep visiting for more such awesome posts, internet tips, lifestyle tips, and remember we cover,
“Everything under the Sun!”

inspire2rise 2024 refresh

Follow Inspire2rise on Twitter. | Follow Inspire2rise on Facebook. | Follow Inspire2rise on YouTube

A high school student deeply passionate about digital marketing, an adventurous trekker, and a dedicated explorer of specialized internet topics.


Microsoft Unveils Phi-3-Vision AI Model with 4.2 Billion Parameters

Leave a Comment

Discover more from Inspire2Rise

Subscribe now to keep reading and get access to the full archive.

Continue reading