I'm not working on low-resource African languages, but this method sounds interesting.
So you put the orthography, grammar, and vocabulary in the prompt and then get the LLM to translate a language that it doesn't know. Clever!
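If I understand it right, the prompt assembly could look roughly like the sketch below. This is only my guess at the shape, not the actual method: the function name `build_translation_prompt`, the section headings, and the Sango-to-French direction are all assumptions, and the reference material is left as placeholders for whatever notes you really have.

```python
def build_translation_prompt(orthography, grammar, vocabulary, text,
                             source_lang="Sango", target_lang="French"):
    """Pack language reference material and the text to translate into one prompt.

    All names and the prompt layout are hypothetical; adapt them to your own setup.
    """
    # Render the glossary as one "source: target" entry per line, sorted for stable output.
    vocab_lines = "\n".join(
        f"- {src}: {tgt}" for src, tgt in sorted(vocabulary.items())
    )
    return (
        f"Translate the following text from {source_lang} to {target_lang}.\n"
        f"Rely only on the reference material below.\n\n"
        f"Orthography:\n{orthography}\n\n"
        f"Grammar:\n{grammar}\n\n"
        f"Vocabulary:\n{vocab_lines}\n\n"
        f"Text to translate:\n{text}\n"
    )


# Placeholder inputs; replace with real orthography, grammar, and glossary notes.
prompt = build_translation_prompt(
    orthography="<orthography notes>",
    grammar="<grammar notes>",
    vocabulary={"<source word>": "<target word>"},
    text="<sentence to translate>",
)
```

The resulting string would then go to whichever LLM you use; I left the model call out since it depends on the provider.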
Then once you have enough native speaker-verified Sango-French translations, you can bootstrap it to a full-fledged dataset...
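One way I could imagine the first step of that bootstrapping, as a rough sketch: keep only the pairs a native speaker has signed off on and write them out as JSON Lines. The record fields (`source`, `target`, `verified`) and the function name are my own assumptions, not something from the original post.

```python
import json


def verified_pairs_to_jsonl(pairs):
    """Serialize only the native-speaker-verified pairs as JSON Lines.

    Each pair is a dict with hypothetical keys: "source", "target", "verified".
    """
    lines = []
    for pair in pairs:
        # Skip anything a native speaker has not confirmed yet.
        if pair.get("verified"):
            record = {"source": pair["source"], "target": pair["target"]}
            # ensure_ascii=False keeps accented characters readable in the output.
            lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines)


# Placeholder records; real data would hold actual sentences.
sample = [
    {"source": "<sentence 1>", "target": "<translation 1>", "verified": True},
    {"source": "<sentence 2>", "target": "<translation 2>", "verified": False},
]
jsonl = verified_pairs_to_jsonl(sample)
```

The verified lines could then serve as few-shot examples in the prompt or as seed data for training.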