Inquiry about Future Plans for Releasing Ming-Omni Training Data

#3
by Vegetaking - opened

Hello Ming community,

First of all, thank you for open-sourcing the Ming-Omni model and sharing the model weights and inference code. The unified multimodal capabilities across images, text, audio, and video are truly impressive and represent a significant advancement in open-source AI.

I noticed from the official announcements and repositories that while the model weights and inference code for Ming-lite-omni have been made publicly available, the training code and datasets are planned to be released in subsequent stages. Could you please share if there is a concrete timeline or roadmap for when the training data for Ming-Omni will be made public?

Having access to the training datasets would be invaluable for researchers and developers aiming to better understand the model’s training process, reproduce results, and further innovate on top of Ming-Omni.

Thank you in advance for any updates or insights you can provide!

Best regards,
vegeta

inclusionAI org

Hi vegeta,

The training code is undergoing significant changes for performance optimization, and some internal toolchain dependencies still need to be resolved. Once the code is stabilized and decoupled from those dependencies, we will release it.

Releasing the training data is complicated by its large volume and by legal constraints, which necessitate a staggered release strategy. To address this, we will incrementally release the non-controversial components of the dataset.

Currently, we are unable to provide a specific timeline for the release due to these complexities, but we are actively working to resolve them. We appreciate your understanding and will share regular updates on our progress.
