# 🇹🇷 Turkish Gibberish Sentence Detection (Fine-Tuned)
This model detects whether a given Turkish text is clean or gibberish.
## How to Get Started with the Model

```python
from transformers import pipeline

pipe = pipeline("text-classification", model="yeniguno/turkish-gibberish-detection-ft")

examples = [
    "bugün hava çok güzel, dışarı çıkalım mı?",
    "asdfghjk qwe!!! 🙃🙃🙃",
    "bgn asdqwe güzel qqqqqqqqqq",
]

for text in examples:
    print(text, "->", pipe(text)[0])
```
## Model Details

### Model Description

- Base model: TURKCELL/gibberish-sentence-detection-model-tr
- Language: Turkish
- Task: Binary Text Classification (Gibberish Detection)
- Labels (see the sketch below for how they map onto raw model outputs):
  - `0` → `ok`: meaningful Turkish text
  - `1` → `gibberish`: meaningless or noisy text (nonsense, random keyboard input, malformed words)
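For a minimal sketch of how these labels map onto raw model outputs, the classifier can also be called without the pipeline wrapper. This assumes the checkpoint's `id2label` config follows the table above:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "yeniguno/turkish-gibberish-detection-ft"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

text = "bugün hava çok güzel, dışarı çıkalım mı?"
inputs = tokenizer(text, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**inputs).logits

probs = torch.softmax(logits, dim=-1)[0]
pred_id = int(probs.argmax())
# id2label is read from the model config; expected to map 0 -> ok, 1 -> gibberish
print(model.config.id2label[pred_id], float(probs[pred_id]))
```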
## Uses
This model is designed to be used in LLM guardrail systems as an input quality scanner.
Since LLM inference is computationally and financially expensive, it is inefficient to process meaningless or malformed text.
By running this model before sending user input to an LLM, you can automatically detect and filter gibberish or nonsensical text — preventing unnecessary API calls and improving overall system efficiency.
Typical use cases include:
- Pre-filtering user messages in chatbots or virtual assistants
- Guardrail modules in enterprise LLM applications
- Quality control for large-scale text ingestion pipelines
- Spam and noise detection in user-generated content
If the input is classified as gibberish, it can be safely discarded or handled separately without invoking the LLM.
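A minimal sketch of such a pre-filter is shown below. Here `call_llm` is a placeholder for your own LLM client, the 0.5 confidence threshold is illustrative, and the human-readable `ok`/`gibberish` labels are assumed to be what the pipeline returns:

```python
from transformers import pipeline

pipe = pipeline("text-classification", model="yeniguno/turkish-gibberish-detection-ft")

def is_gibberish(text: str, threshold: float = 0.5) -> bool:
    """Return True if the classifier labels the text as gibberish with enough confidence."""
    result = pipe(text)[0]
    return result["label"] == "gibberish" and result["score"] >= threshold

def guarded_llm_call(user_input: str) -> str:
    # Skip the expensive LLM call entirely for gibberish input.
    if is_gibberish(user_input):
        return "Input rejected: text looks like gibberish."
    return call_llm(user_input)  # call_llm is a placeholder for your LLM client
```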
## Training Details

### Training Data

Dataset: `yeniguno/turkish-gibberish-detection`
| Label | Count | Description |
|---|---|---|
| 0 (ok) | 651,431 | valid, meaningful Turkish text |
| 1 (gibberish) | 699,999 | random keyboard strings, misspelled or malformed text |
All samples are lowercased and cleaned, with no newline or tab characters.
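Inference inputs can be normalized the same way before classification. The sketch below is an approximation, since the exact cleaning procedure is not documented beyond lowercasing and removing newlines and tabs:

```python
import re

def normalize(text: str) -> str:
    """Approximate the dataset normalization: lowercase and collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[\n\t]+", " ", text)      # drop newline and tab characters
    return re.sub(r"\s+", " ", text).strip()  # collapse remaining whitespace

print(normalize("Bugün\tHava çok güzel,\ndışarı çıkalım mı?"))
# -> "bugün hava çok güzel, dışarı çıkalım mı?"
```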
## Evaluation

| Model | Accuracy | Macro-F1 | F1 (ok) | F1 (gibberish) |
|---|---|---|---|---|
| Base model | 0.6257 | 0.6254 | 0.61 | 0.64 |
| Fine-tuned model | 0.7369 | 0.7340 | 0.76 | 0.71 |
Test set size: 202,669 sentences
Evaluation metrics: Accuracy, Macro-F1, per-class Precision/Recall/F1
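The table above can be reproduced with a loop like the following sketch; the `test` split name, the `text`/`label` column names, and the `ok`/`gibberish` label strings are assumptions:

```python
from datasets import load_dataset
from sklearn.metrics import accuracy_score, classification_report
from transformers import pipeline

ds = load_dataset("yeniguno/turkish-gibberish-detection", split="test")  # split name is an assumption
pipe = pipeline("text-classification", model="yeniguno/turkish-gibberish-detection-ft")

label2id = {"ok": 0, "gibberish": 1}
preds = [label2id[out["label"]] for out in pipe(ds["text"], batch_size=64, truncation=True)]

print("Accuracy:", accuracy_score(ds["label"], preds))
print(classification_report(ds["label"], preds, target_names=["ok", "gibberish"], digits=4))
```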