A collection of datasets containing multilingual data resources, used as part of the BabyBabelLM initiative.