Benchmark

Speech Emotion Recognition results

Validation and test accuracy for 11 pretrained speech models, each fine-tuned per language. Filter by language family or model to compare.

Accuracy bands: ≥ 70%, 50–70%, < 50%

Best model per language

The highest test accuracy reached for each language, across 29 benchmarked languages, 11 pretrained speech models, and 363 fine-tuning runs.

| Language | Top model | Dataset | Best test accuracy |
|---|---|---|---|
| Afrikaans (af) | whisper-small | AfrikaansEmotionalSpeechCorpus | 71.0% |
| Amharic (am) | whisper-small | ASED | 90.0% |
| Arabic (ar) | whisper-large | ANAD | 82.0% |
| Bengali (bn) | whisper-large | SUBESCO | 89.0% |
| Chinese (zh) | whisper-large | ESD | 98.0% |
| English (en) | hubert-base-ls960 | IESC | 82.0% |
| French (fr) | wavlm-large | CaFE | 70.0% |
| German (de) | whisper-large | EmoDB | 86.0% |
| Greek (el) | hubert-base-ls960 | AESDD | 75.0% |
| Hindi (hi) | whisper-small | Hindi-Dataset | 71.0% |
| Hungarian (hu) | hubert-base-ls960 | HungarianEmotionalSpeechCorpus | 52.0% |
| Indonesian (id) | hubert-base-ls960 | IndoWaveSentiment | 99.0% |
| Italian (it) | whisper-large | Emozionalmente | 80.0% |
| Japanese (ja) | whisper-large | JVNV | 87.0% |
| Kannada (kn) | wavlm-large | Kannada-Dataset | 48.0% |
| Kazakh (kk) | whisper-large | KazakhEmotionalTTS | 65.0% |
| Korean (ko) | whisper-small | KESDy18 | 81.0% |
| Odia (or) | wavlm-large | SITB-OSED | 83.0% |
| Persian (fa) | whisper-small | SHEMO | 83.0% |
| Polish (pl) | whisper-large | nEMO | 87.0% |
| Portuguese (pt) | whisper-small | emoUERJ | 91.0% |
| Quechua (qu) | whisper-large | Quechua-Collao-Corpus | 83.0% |
| Russian (ru) | hubert-base-ls960 | RESD | 51.0% |
| Spanish (es) | whisper-small | MESD | 68.0% |
| Swahili (sw) | wavlm-base-plus | Swahili-Dataset | 83.0% |
| Tamil (ta) | whisper-small | EmoTa | 44.0% |
| Telugu (te) | hubert-base-ls960 | Telugu-Dataset | 53.0% |
| Turkish (tr) | whisper-large | TurEV-DB | 84.0% |
| Urdu (ur) | wav2vec2-base | Urdu-Dataset | 88.0% |
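
As a rough way to read the table, the sketch below tallies how many languages fall into each of the three accuracy bands listed near the top of the page (≥ 70%, 50–70%, < 50%) and how often each model is the top model. The rows are copied from the table above. The names `BEST` and `band` are hypothetical, and the band boundaries are an assumption: 50–70% is treated as 50 ≤ accuracy < 70, so exactly 70.0% counts as ≥ 70%.

```python
from collections import Counter

# (language, top model, best test accuracy in %), copied from the table above.
BEST = [
    ("Afrikaans", "whisper-small", 71.0), ("Amharic", "whisper-small", 90.0),
    ("Arabic", "whisper-large", 82.0), ("Bengali", "whisper-large", 89.0),
    ("Chinese", "whisper-large", 98.0), ("English", "hubert-base-ls960", 82.0),
    ("French", "wavlm-large", 70.0), ("German", "whisper-large", 86.0),
    ("Greek", "hubert-base-ls960", 75.0), ("Hindi", "whisper-small", 71.0),
    ("Hungarian", "hubert-base-ls960", 52.0), ("Indonesian", "hubert-base-ls960", 99.0),
    ("Italian", "whisper-large", 80.0), ("Japanese", "whisper-large", 87.0),
    ("Kannada", "wavlm-large", 48.0), ("Kazakh", "whisper-large", 65.0),
    ("Korean", "whisper-small", 81.0), ("Odia", "wavlm-large", 83.0),
    ("Persian", "whisper-small", 83.0), ("Polish", "whisper-large", 87.0),
    ("Portuguese", "whisper-small", 91.0), ("Quechua", "whisper-large", 83.0),
    ("Russian", "hubert-base-ls960", 51.0), ("Spanish", "whisper-small", 68.0),
    ("Swahili", "wavlm-base-plus", 83.0), ("Tamil", "whisper-small", 44.0),
    ("Telugu", "hubert-base-ls960", 53.0), ("Turkish", "whisper-large", 84.0),
    ("Urdu", "wav2vec2-base", 88.0),
]


def band(accuracy: float) -> str:
    """Map an accuracy (in %) to one of the page's three bands.

    Assumption: the middle band is half-open, 50 <= accuracy < 70.
    """
    if accuracy >= 70.0:
        return ">= 70%"
    if accuracy >= 50.0:
        return "50-70%"
    return "< 50%"


# Number of languages per accuracy band.
band_counts = Counter(band(acc) for _, _, acc in BEST)
# Number of languages for which each model is the top model.
model_wins = Counter(model for _, model, _ in BEST)

if __name__ == "__main__":
    for name in (">= 70%", "50-70%", "< 50%"):
        print(f"{name:>7}: {band_counts[name]} languages")
    for model, wins in model_wins.most_common():
        print(f"{model}: top model for {wins} languages")
```

On these 29 rows the tally gives 22 languages at or above 70%, 5 in the 50–70% band, and 2 below 50%. whisper-large is the top model for 10 languages, whisper-small for 8, hubert-base-ls960 for 6, wavlm-large for 3, and wavlm-base-plus and wav2vec2-base for 1 each.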

Full results

Every model on every benchmarked dataset. Use the filters above to narrow by language family or model.