Validation and test accuracy for 11 pretrained speech models, each fine-tuned per language. Filter by language family or model to compare.
The table below shows the highest test accuracy reached for each language, across 29 benchmarked languages, 11 pretrained speech models, and 363 fine-tuning runs.
| Language (code) | Top model | Dataset | Best test accuracy |
|---|---|---|---|
| Afrikaans (af) | whisper-small | AfrikaansEmotionalSpeechCorpus | 71.0% |
| Amharic (am) | whisper-small | ASED | 90.0% |
| Arabic (ar) | whisper-large | ANAD | 82.0% |
| Bengali (bn) | whisper-large | SUBESCO | 89.0% |
| Chinese (zh) | whisper-large | ESD | 98.0% |
| English (en) | hubert-base-ls960 | IESC | 82.0% |
| French (fr) | wavlm-large | CaFE | 70.0% |
| German (de) | whisper-large | EmoDB | 86.0% |
| Greek (el) | hubert-base-ls960 | AESDD | 75.0% |
| Hindi (hi) | whisper-small | Hindi-Dataset | 71.0% |
| Hungarian (hu) | hubert-base-ls960 | HungarianEmotionalSpeechCorpus | 52.0% |
| Indonesian (id) | hubert-base-ls960 | IndoWaveSentiment | 99.0% |
| Italian (it) | whisper-large | Emozionalmente | 80.0% |
| Japanese (ja) | whisper-large | JVNV | 87.0% |
| Kannada (kn) | wavlm-large | Kannada-Dataset | 48.0% |
| Kazakh (kk) | whisper-large | KazakhEmotionalTTS | 65.0% |
| Korean (ko) | whisper-small | KESDy18 | 81.0% |
| Odia (or) | wavlm-large | SITB-OSED | 83.0% |
| Persian (fa) | whisper-small | SHEMO | 83.0% |
| Polish (pl) | whisper-large | nEMO | 87.0% |
| Portuguese (pt) | whisper-small | emoUERJ | 91.0% |
| Quechua (qu) | whisper-large | Quechua-Collao-Corpus | 83.0% |
| Russian (ru) | hubert-base-ls960 | RESD | 51.0% |
| Spanish (es) | whisper-small | MESD | 68.0% |
| Swahili (sw) | wavlm-base-plus | Swahili-Dataset | 83.0% |
| Tamil (ta) | whisper-small | EmoTa | 44.0% |
| Telugu (te) | hubert-base-ls960 | Telugu-Dataset | 53.0% |
| Turkish (tr) | whisper-large | TurEV-DB | 84.0% |
| Urdu (ur) | wav2vec2-base | Urdu-Dataset | 88.0% |
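
The per-language selection behind the table above (keep the run with the highest test accuracy for each language) can be sketched as a simple reduction over run records. This is a minimal illustration, not the benchmark's actual pipeline: the `Run` record, its field names, and `best_per_language` are hypothetical, and the sample records are made-up placeholders rather than benchmark results.

```python
from typing import Iterable, NamedTuple


class Run(NamedTuple):
    """One fine-tuning run (hypothetical record layout)."""
    language: str
    model: str
    test_accuracy: float  # fraction in [0, 1]


def best_per_language(runs: Iterable[Run]) -> dict[str, Run]:
    """Return, for each language, the run with the highest test accuracy.

    On ties, the run seen first is kept.
    """
    best: dict[str, Run] = {}
    for run in runs:
        current = best.get(run.language)
        if current is None or run.test_accuracy > current.test_accuracy:
            best[run.language] = run
    return best


# Placeholder records for illustration only; NOT results from the benchmark.
runs = [
    Run("lang_a", "model_x", 0.50),
    Run("lang_a", "model_y", 0.75),
    Run("lang_b", "model_x", 0.25),
]

for language, run in sorted(best_per_language(runs).items()):
    print(f"{language}: {run.model} {run.test_accuracy:.1%}")
```

Selecting on test accuracy mirrors what the table reports. Selecting the run on validation accuracy and then reporting its test accuracy would be a stricter protocol, and the source does not say which was used.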
The full results cover every model on every benchmarked dataset and can be filtered by language family or model.