May 19, 2020 via Amazon Science
When unsupervised training pays off in natural-language processing
Why it matters
Critical insight for AI teams building language models: at smaller vocabulary sizes, unsupervised tokenizers outperform their supervised counterparts, potentially reducing training costs and complexity for certain use cases.
Key signals
- Unsupervised tokenizers outperform supervised ones at smaller vocabulary sizes
- Research conducted by the Amazon Science team
- Findings apply specifically to natural language processing applications
The hook
Amazon's research pinpoints when unsupervised training actually beats supervised methods in NLP: at smaller vocabulary sizes, tokenizers trained on unannotated data come out ahead.
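For context, "unsupervised" here means the tokenizer learns its subword vocabulary directly from raw text, with no annotated segmentation data. Below is a minimal sketch of that setup using the open-source SentencePiece library; the corpus path, model prefix, and vocabulary size are illustrative placeholders, not values from the study.

```python
# Minimal sketch: train an unsupervised subword tokenizer at a small
# vocabulary size. File names and vocab_size are illustrative
# placeholders, not details from the Amazon study.
import sentencepiece as spm

# Learn a unigram-LM tokenizer from raw, unannotated text -- no labeled
# segmentation data is required, which is what makes it "unsupervised".
spm.SentencePieceTrainer.train(
    input="corpus.txt",        # plain text, one sentence per line (placeholder)
    model_prefix="unsup_tok",  # writes unsup_tok.model and unsup_tok.vocab
    vocab_size=4000,           # deliberately small: the regime the finding concerns
    model_type="unigram",      # unsupervised unigram LM; "bpe" is another option
)

# Load the trained model and segment a sentence into subword pieces.
sp = spm.SentencePieceProcessor(model_file="unsup_tok.model")
print(sp.encode("Unsupervised tokenizers shine at small vocabularies.", out_type=str))
```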
Relevance score: 75/100