# toxicity_results
This model is a fine-tuned version of readerbench/RoBERT-base on a custom dataset of Romanian political comments (see Training and evaluation data below). It achieves the following results on the evaluation set:
- Loss: 0.4871
- F1 Score: 0.7890
## Model description
This model is a binary toxicity classifier (Toxic vs. Non-Toxic) for the Romanian language, developed as a Master's project in Data Science. It specializes in detecting abusive language (insults, harassment, denigration) commonly found in Romanian political comments online.
The model is built on the RoBERT-base architecture (readerbench/RoBERT-base), a Transformer encoder that captures the semantic context of the text.
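A minimal inference sketch follows. It assumes the checkpoint is published under olimpia20/toxicity_results (the repository name shown on this card) and that label 1 corresponds to Toxic, per the annotation guideline below; verify both against the model's config.json.

```python
# Minimal inference sketch. The repo id and the 0 = Non-Toxic / 1 = Toxic
# label mapping are assumptions; verify against the model's config.json.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "olimpia20/toxicity_results"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

text = "Un comentariu de test in limba romana."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    logits = model(**inputs).logits

pred = logits.argmax(dim=-1).item()
print("Toxic" if pred == 1 else "Non-Toxic")
```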
## Intended uses & limitations
### Intended Uses
The model was developed for academic research and is intended as a practical tool for measuring and reducing online toxicity.
- Quantification of Social Harm (Primary Value): The RoBERT model acts as a diagnostic tool to measure and quantify the level of abusive language and hostility in online environments. This supports data-driven intervention against toxicity.
- Digital Detox & Education: It is a core technical component for educational projects and awareness campaigns (e.g., an NGO or blog platform) focused on promoting ethical technology use and combating digital addiction by creating safer online spaces.
- Real-Time Application: It provides a reliable solution for real-time content moderation in Romanian comment sections (a minimal moderation sketch follows this list).
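As a hedged illustration of the moderation use case, the sketch below wraps the model in a text-classification pipeline and holds comments whose Toxic score exceeds a threshold. The threshold value and the assumption that LABEL_1 (or "Toxic") names the toxic class are illustrative, not confirmed by this card.

```python
# Hypothetical moderation filter. The threshold and the assumption that
# LABEL_1 (or "Toxic") denotes the toxic class are illustrative only.
from transformers import pipeline

clf = pipeline("text-classification", model="olimpia20/toxicity_results")

THRESHOLD = 0.8  # hypothetical operating point; tune on held-out data

def should_hold_for_review(comment: str) -> bool:
    """Return True if a comment looks toxic enough to hold for review."""
    result = clf(comment, truncation=True)[0]
    is_toxic = result["label"] in ("LABEL_1", "Toxic")
    return is_toxic and result["score"] >= THRESHOLD
```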
### Limitations and Ethical Considerations
- Labeling Bias: The dataset was manually labeled by a single expert, introducing an inherent subjectivity into the exact definition of toxicity.
- Generalization: Performance is optimized for informal political comments in Romanian; accuracy may decrease when applied to entirely different domains (e.g., professional documents or technical forums).
- Ethics: The solution is not intended for political censorship but strictly for identifying actionable abusive language (harassment, insults). Its final purpose is to promote higher standards of civic discourse and support educational interventions.
## Training and evaluation data
### 1. Context and Sourcing
- Platform: YouTube (Comment Sections).
- Topic: 2024 Presidential Candidacy (Romanian Political Comments).
- Source Channels: Pro TV, Digi24, Recorder, Antena 3 CNN, etc.
### 2. Data Details
- Total Volume: 500 custom-labeled texts (the full curated dataset).
- Structure: The dataset is perfectly balanced (50% Toxic / 50% Non-Toxic).
- Splitting: 80% Training (400 texts) / 20% Testing (100 texts), stratified by label (a split sketch follows below).
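A minimal sketch of this stratified split using scikit-learn; the CSV filename and the text/label column names are assumptions, and the seed matches the training seed reported below.

```python
# Stratified 80/20 split as described above. The file and column names are
# assumptions; random_state matches the training seed (42).
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("comments_labeled.csv")  # hypothetical: 500 rows, columns text/label

train_df, test_df = train_test_split(
    df,
    test_size=0.2,         # 100 of the 500 texts
    stratify=df["label"],  # preserves the 50/50 class balance
    random_state=42,
)
```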
### 3. Annotation Guideline
Toxic (Label 1) covers clear profanity, threats, harassment (bullying), and aggressive insults (attacks on character, competence, or physical appearance). Non-Toxic (Label 0) covers any other criticism or commentary that does not contain abusive language.
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: AdamW (adamw_torch_fused) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 3
- mixed_precision_training: Native AMP
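A hedged sketch of how these hyperparameters map onto the Hugging Face Trainer API, continuing from the split sketch above; the tokenization details (padding strategy, max length) are assumptions, and F1 is computed here with scikit-learn.

```python
# Sketch of the training setup implied by the hyperparameters above.
# Continues from train_df / test_df in the split sketch; details such as
# the padding strategy are assumptions.
import numpy as np
from datasets import Dataset
from sklearn.metrics import f1_score
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("readerbench/RoBERT-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "readerbench/RoBERT-base", num_labels=2)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length")

train_ds = Dataset.from_pandas(train_df).map(tokenize, batched=True)
eval_ds = Dataset.from_pandas(test_df).map(tokenize, batched=True)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"f1": f1_score(labels, preds)}

args = TrainingArguments(
    output_dir="toxicity_results",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    seed=42,
    lr_scheduler_type="linear",
    optim="adamw_torch_fused",
    fp16=True,  # Native AMP mixed precision
    eval_strategy="epoch",
)

trainer = Trainer(model=model, args=args,
                  train_dataset=train_ds, eval_dataset=eval_ds,
                  compute_metrics=compute_metrics)
trainer.train()
```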
### Training results
| Training Loss | Epoch | Step | Validation Loss | F1 Score |
|---|---|---|---|---|
| No log | 1.0 | 25 | 0.5849 | 0.6990 |
| 0.5722 | 2.0 | 50 | 0.5072 | 0.7473 |
| 0.5722 | 3.0 | 75 | 0.4871 | 0.7890 |
### Framework versions
- Transformers 4.57.1
- PyTorch 2.9.0+cu126
- Datasets 4.4.1
- Tokenizers 0.22.1