Pretrain Datasets (Collection)
Datasets we use for pretraining large language models • 12 items • Updated Oct 2