ICONN 1 Training Data

Community Article Published June 19, 2025

ICONN 1 was trained on a diverse collection of datasets; without them, this model would not have been possible. Key sources include:

nkandpa2/cccc_all_domains: processed into question-answer (QA) pairs for effective training.
open-thoughts/OpenThoughts3-1.2M: a comprehensive collection of open-source datasets.
Snippets from HuggingFaceFW/fineweb: curated content licensed under Creative Commons.

...and many more!

We extend our sincere gratitude to all dataset creators who either developed these datasets or formatted them into QA pairs.

Note: All of ICONN 1's training data is fully open-source.
If you believe any included dataset does not comply with open-source licensing standards, please contact us immediately.
