Model Wars · March 31, 2026 · via The Decoder

Qwen3.5-Omni learned to write code from spoken instructions and video without anyone training it to

Why it matters

This represents a significant leap in multimodal AI capabilities; the emergent coding ability suggests we're approaching more general systems that could reshape how developers interact with code.

Key signals

  • Beats Gemini 3.1 Pro on audio tasks
  • Processes 4 modalities: text, images, audio, and video
  • Emerged coding ability from speech/video without specific training

The hook

Alibaba's Qwen3.5-Omni just beat Gemini 3.1 Pro on audio tasks — and learned to code from speech without being trained for it.

Alibaba has released Qwen3.5-Omni, an omnimodal AI model that processes text, images, audio, and video. It claims to beat Gemini 3.1 Pro on audio tasks and picked up an unexpected trick along the way: writing code from spoken instructions and video input.
Relevance score: 78/100
