ICONN 1 Training Data

Community Article Published June 19, 2025

ICONN 1 was trained on a diverse collection of datasets; the model could not have been built without them. Key sources include:

  • nkandpa2/cccc_all_domains
    Processed into question-answer (QA) pairs for effective training.

  • open-thoughts/OpenThoughts3-1.2M
    A comprehensive collection of open-source datasets.

  • Snippets from HuggingFaceFW/fineweb
    Curated content licensed under Creative Commons.

...and many more!

We extend our sincere gratitude to everyone who created these datasets or formatted them into QA pairs.
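
For readers unfamiliar with the QA-pair format mentioned above, here is a minimal sketch of what such records could look like when stored as JSON Lines. This is an illustration only: the `QAPair` class, its field names (`question`, `answer`, `source`), and the helper functions are hypothetical and are not taken from the ICONN 1 pipeline, whose actual processing steps are not described in this article.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class QAPair:
    """Hypothetical record layout for one question-answer pair."""
    question: str
    answer: str
    source: str  # identifier of the dataset the pair was derived from


def to_jsonl(pairs):
    """Serialize QA pairs as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(asdict(p), ensure_ascii=False) for p in pairs)


def from_jsonl(text):
    """Parse JSON Lines back into QAPair objects, skipping blank lines."""
    return [QAPair(**json.loads(line)) for line in text.splitlines() if line.strip()]
```

A record could then be written and read back with `from_jsonl(to_jsonl([QAPair("What is 2 + 2?", "4", "example/source")]))`. Keeping a `source` field alongside each pair is one way to trace a record back to its originating dataset, which helps when checking licensing.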


Note: All of ICONN 1's training data is fully open-source.
If you believe any included dataset does not comply with open-source standards, please contact us immediately.
