I am proud to be one of 3,000 humans who built Aya – a new massively multilingual, generative LLM that outperforms existing open-source models and covers 101 different languages. Check out Aya here, including the dataset, model, two new papers on arXiv and a nice documentary on the entire process.

https://cohere.com/research/aya

 

The paper here details exactly how we put together the dataset and relied on communities of speakers in 101 different languages around the world. Submitted to Association of Computational Linguistics, 2024.