Model Wars · March 4, 2026 · via Microsoft Research

Phi-4-reasoning-vision and the lessons of training a multimodal reasoning model

Why it matters

Microsoft's release of a 15B open-weight multimodal reasoning model signals a major shift toward reasoning-capable vision models at accessible scale, challenging the assumption that advanced multimodal reasoning requires massive closed models.

Key signals

  • Phi-4-reasoning-vision-15B: 15 billion parameters
  • Multimodal reasoning model (vision + language)
  • Open-weight distribution via Microsoft Foundry, HuggingFace, GitHub
  • Positioned for image captioning and vision-language reasoning tasks
  • Published March 4, 2026 from Microsoft Research

The hook

Microsoft just shipped Phi-4-reasoning-vision-15B, a 15B multimodal reasoning model that expands what's possible at that scale.

We are pleased to announce Phi-4-reasoning-vision-15B, a 15 billion parameter open‑weight multimodal reasoning model, available through Microsoft Foundry, HuggingFace, and GitHub. Phi-4-reasoning-vision-15B is a broadly capable model that can be used for a wide array of vision-language tasks such as image captioning, asking […]

