---
license: cc-by-4.0
language:
- he
base_model:
- GiliGold/Knesset-DictaBERT
pipeline_tag: text-classification
tags:
- checkworthiness
- factuality
- worth
- checking
- worth-checking
- checkable
---

This model is based on [Knesset-DictaBERT](https://huggingface.co/GiliGold/Knesset-DictaBERT) and was trained to classify a Hebrew sentence for checkworthiness. The possible labels are *worth checking*, *not worth checking*, and *not a factual proposition*.

It was trained on a training set of ~5,000 manually annotated sentences from the [Knesset Corpus](https://huggingface.co/datasets/HaifaCLGroup/KnessetCorpus). The training set is available [here](https://github.com/HaifaCLG/Factuality).

The Knesset Corpus, automatically annotated for checkworthiness by [knesset-dicta-checkworthiness](https://huggingface.co/GiliGold/knesset-dicta-checkworthiness), is available [here](https://huggingface.co/datasets/GiliGold/Knesset_check_worthiness).

Paper: [arXiv paper](https://arxiv.org/abs/2509.26406)

Citation:

@InProceedings{goldin-EtAl:2025:RANLP,
  author = {Goldin, Gili and Wigderson, Shira and Rabinovich, Ella and Wintner, Shuly},
  title = {An Annotation Scheme for Factuality and Its Application to Parliamentary Proceedings},
  booktitle = {Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI era},
  month = {September},
  year = {2025},
  address = {Varna, Bulgaria},
  publisher = {INCOMA Ltd., Shoumen, Bulgaria},
  pages = {403--412},
  abstract = {Factuality assesses the extent to which a language utterance relates to real-world information; it determines whether utterances correspond to facts, possibilities, or imaginary situations, and as such, it is instrumental for fact checking. Factuality is a complex notion that relies on multiple linguistic signals, and has been studied in various disciplines.
We present a complex, multi-faceted annotation scheme of factuality that combines concepts from a variety of previous works. We developed the scheme for Hebrew, but we trust that it can be adapted to other languages. We also present a set of almost 5,000 sentences in the domain of parliamentary discourse that we manually annotated according to this scheme. We report on inter-annotator agreement, and experiment with various approaches to automatically predict (some features of) the scheme, in order to extend the annotation to a large corpus.},
  url = {https://aclanthology.org/2025.ranlp-1.49}
}
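
Usage sketch: the classifier described above can probably be loaded with the `transformers` text-classification pipeline. This is a minimal, untested sketch rather than an official recipe. It assumes the model is published on the Hub as `GiliGold/knesset-dicta-checkworthiness` (the ID linked above) and that it loads as a standard sequence-classification checkpoint. The helper names `top_label` and `classify` are hypothetical, and the label strings the model actually returns depend on its configuration; they may be generic names such as `LABEL_0` rather than the three class names listed above.

```python
from typing import List

# Assumption: the Hub ID of this classifier, taken from the link in the card.
MODEL_ID = "GiliGold/knesset-dicta-checkworthiness"


def top_label(scores: List[dict]) -> str:
    """Return the label with the highest score from one sentence's score list."""
    return max(scores, key=lambda item: item["score"])["label"]


def classify(sentences: List[str], model_id: str = MODEL_ID) -> List[str]:
    """Classify Hebrew sentences and return one predicted label per sentence."""
    # Imported lazily so top_label() can be used without transformers installed.
    from transformers import pipeline

    classifier = pipeline("text-classification", model=model_id)
    # top_k=None asks the pipeline for the scores of all classes,
    # giving one list of {"label", "score"} dicts per input sentence.
    results = classifier(sentences, top_k=None)
    return [top_label(scores) for scores in results]
```

Calling `classify([...])` with a list of Hebrew sentences downloads the checkpoint on first use and returns one label per sentence. Requesting all class scores instead of only the top one makes it easy to apply a custom decision threshold later.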