Source Control and Data Validation for Web-Driven Dataset Creation

#9
by SlitherCode - opened

It would be great to be able to restrict web searches to specific sites or URLs when generating datasets with the LLM, so that the data comes from trusted, relevant sources. In addition, prompt-aware validation with strict typing (similar to Pydantic’s support for character limits, value ranges, and data formats) would help maintain data quality and consistency before rows are added to the dataset.
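
To make the request concrete, here is a rough sketch of the two checks, using only the Python standard library as a stand-in for Pydantic-style constraints. Everything in it is an assumption for illustration: `ALLOWED_DOMAINS`, `is_allowed`, `Row`, its fields and limits, and `validate_rows` are hypothetical names and are not part of Sheets or of any existing API.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical allowlist of trusted sites (an assumption, not an existing setting).
ALLOWED_DOMAINS = {"huggingface.co", "arxiv.org"}


def is_allowed(url: str) -> bool:
    """Return True if the URL's host is an allowed domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in ALLOWED_DOMAINS)


@dataclass
class Row:
    """Hypothetical dataset row with strict types and simple constraints."""
    source_url: str
    summary: str
    rating: int

    def __post_init__(self) -> None:
        # Source control: only accept rows whose source is on the allowlist.
        if not isinstance(self.source_url, str) or not is_allowed(self.source_url):
            raise ValueError(f"untrusted source: {self.source_url!r}")
        # Character limit, comparable to a max-length constraint.
        if not isinstance(self.summary, str) or not 1 <= len(self.summary) <= 280:
            raise ValueError("summary must be 1-280 characters")
        # Value range with strict typing; bool is a subclass of int, so reject it.
        if (
            isinstance(self.rating, bool)
            or not isinstance(self.rating, int)
            or not 1 <= self.rating <= 5
        ):
            raise ValueError("rating must be an integer from 1 to 5")


def validate_rows(candidates):
    """Split candidate dicts into accepted rows and (candidate, error) rejections."""
    accepted, rejected = [], []
    for candidate in candidates:
        try:
            accepted.append(Row(**candidate))
        except (TypeError, ValueError) as exc:
            rejected.append((candidate, str(exc)))
    return accepted, rejected
```

The idea would be to run something like `validate_rows` on the LLM's output before anything is written to the dataset, so rejected rows can be retried or shown to the user along with the reason.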

Hugging Face Sheets org

Hi, yes, we've been discussing adding a layer of user control and freedom to the search feature. Thank you for sharing.
