Model Wars · March 31, 2026 · via The Decoder
Qwen3.5-Omni learned to write code from spoken instructions and video without anyone training it to
Why it matters
Coding ability that emerged from speech and video input without targeted training suggests multimodal models are becoming more general, with the potential to reshape how developers interact with code.
Key signals
- Beats Gemini 3.1 Pro on audio tasks
- Processes 4 modalities: text, images, audio, and video
- Emerged coding ability from speech/video without specific training
The hook
Alibaba's Qwen3.5-Omni just beat Gemini 3.1 Pro on audio tasks — and learned to code from speech without being trained for it.
Alibaba has released Qwen3.5-Omni, an omnimodal AI model that processes text, images, audio, and video. It claims to beat Gemini 3.1 Pro on audio tasks and picked up an unexpected trick along the way: writing code from spoken instructions and video input.
Relevance score: 78/100