Model Wars · March 31, 2026 · via The Decoder

Qwen3.5-Omni learned to write code from spoken instructions and video without anyone training it to

Why it matters

This represents a significant leap in multimodal AI capabilities; the emergent coding ability suggests we're approaching more general systems that could reshape how developers interact with code.

Key signals

  • Beats Gemini 3.1 Pro on audio tasks
  • Processes 4 modalities: text, images, audio, and video
  • Emerged coding ability from speech/video without specific training

The hook

Alibaba's Qwen3.5-Omni just beat Gemini 3.1 Pro on audio tasks — and learned to code from speech without being trained for it.

Alibaba has released Qwen3.5-Omni, an omnimodal AI model that processes text, images, audio, and video. It claims to beat Gemini 3.1 Pro on audio tasks and picked up an unexpected trick along the way: writing code from spoken instructions and video input.
Relevance score: 78/100
