Speech Data
Selected open-source speech data
MLCommons/peoples_speech • Updated Nov 20, 2024 • 8.05M • 44.8k • 166
speechcolab/gigaspeech • Updated Nov 23, 2023 • 364k • 19.4k • 138
keithito/lj_speech • Updated Aug 14, 2024 • 1.1k • 57
legacy-datasets/common_voice • Updated Aug 22, 2024 • 4.47k • 141

text-pretrain-data
Some pretraining datasets for LLMs
allenai/MADLAD-400 • Updated Sep 9, 2024 • 37.4k • 152
CASIA-LM/ChineseWebText • Updated Nov 13, 2023 • 1k • 23.8k • 42
allenai/dolma • Updated Apr 17, 2024 • 1.64k • 962
allenai/peS2o • Updated Oct 13, 2024 • 2.53k • 184