Try It Live (Google Colab Demo)
You can test the Truth-Zeeker AI model directly in Google Colab using the link below. This demo notebook automatically loads the model from Hugging Face and runs inference on a small pseudonymized Zeek dataset.
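The load-and-score step the notebook performs could look roughly like the sketch below. The repository ID and file names are taken from this README. Everything else is an assumption: that `huggingface_hub` is installed, that the `.joblib` file holds a single scikit-learn estimator or Pipeline exposing `decision_function`, and that the model was fit on the CSV's numeric columns. The helper names `fetch_demo_files` and `score_csv` are hypothetical.

```python
import joblib
import pandas as pd

REPO_ID = "dr-rakshith-truth-zeeker/truth-zeeker-ai-demo"  # from this README
MODEL_FILE = "model_20251020.joblib"
CSV_FILE = "zeek_features_for_training_pseudo.csv"


def fetch_demo_files(repo_id=REPO_ID):
    """Download the model and demo CSV from the Hub; return their local paths."""
    # Imported lazily so score_csv() below can be used offline.
    from huggingface_hub import hf_hub_download

    model_path = hf_hub_download(repo_id=repo_id, filename=MODEL_FILE)
    csv_path = hf_hub_download(repo_id=repo_id, filename=CSV_FILE)
    return model_path, csv_path


def score_csv(model_path, csv_path):
    """Load a joblib model and return the CSV rows with an anomaly score column."""
    # joblib.load unpickles arbitrary objects, so only load files you trust.
    model = joblib.load(model_path)
    df = pd.read_csv(csv_path)
    # Assumption: the model expects exactly the numeric columns, in file order.
    features = df.select_dtypes("number")
    # Lower decision_function values mean "more anomalous" in scikit-learn.
    df["anomaly_score"] = model.decision_function(features)
    return df.sort_values("anomaly_score")
```

With network access, `score_csv(*fetch_demo_files())` would return the demo rows ordered from most to least anomalous, provided the assumptions above hold for the uploaded files.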
Demo Outputs
Sample visualization:
This chart shows the top anomalous hosts detected by Truth-Zeeker AI on a
pseudonymized VLAN dataset (for demonstration only).
Model and Data
- Model: model_20251020.joblib
- Demo CSV: zeek_features_for_training_pseudo.csv
These files are hosted on Hugging Face under the repository dr-rakshith-truth-zeeker/truth-zeeker-ai-demo.
Recent Update (2025-10-28)
A new trained variant, isoforest_and_scaler_20251029TXXXXXXZ.joblib, has been uploaded.
This version was generated from the latest VLAN DocNet-sanitized captures, using the unified Zeek-to-ML pipeline under controlled offline conditions.
It extends the baseline model (model_20251020.joblib) by:
- Incorporating richer Zeek features extracted from real benign network flows
- Maintaining strict anonymization (RFC 5737 DocNet addresses)
- Improving consistency across future SageMaker and Security Onion Lite training experiments
Note: All datasets and captures used remain fully sanitized and pseudonymized for educational and research purposes only.
Latest Update: October 29, 2025
Model: model_docnet_20251029T131457Z.joblib
Dataset: vlan_docnet_outputs/host_features_with_scores.csv (DocNet-anonymized Zeek capture)
Platform: Google Colab (Free Tier)
Frameworks: pandas · scikit-learn · joblib
Pipeline: StandardScaler + IsolationForest
This version was trained directly on DocNet-anonymized Zeek outputs, improving feature diversity and better representing realistic VLAN traffic patterns.
All files have been validated as sanitized before upload.
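A minimal sketch of a StandardScaler + IsolationForest training run follows. It illustrates the pipeline shape named above, not the project's actual training script. The synthetic feature table, the hyperparameters, the helper name `train_pipeline`, and the output file name `model_demo.joblib` are all assumptions.

```python
import os
import tempfile

import joblib
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def train_pipeline(features, seed=0):
    """Fit StandardScaler followed by IsolationForest on a feature table."""
    pipeline = make_pipeline(
        StandardScaler(),
        IsolationForest(n_estimators=100, random_state=seed),
    )
    pipeline.fit(features)
    return pipeline


# Synthetic stand-in for Zeek-derived host features (not real traffic).
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "duration": rng.exponential(2.0, size=200),
    "orig_bytes": rng.integers(100, 5000, size=200),
    "resp_bytes": rng.integers(100, 5000, size=200),
})

pipeline = train_pipeline(demo)

# Persist and reload with joblib; the file name here is hypothetical.
out_path = os.path.join(tempfile.gettempdir(), "model_demo.joblib")
joblib.dump(pipeline, out_path)
reloaded = joblib.load(out_path)

# One score per row; lower values indicate more anomalous rows.
scores = reloaded.decision_function(demo)
```

Keeping the scaler and the forest in one Pipeline object means a single `.joblib` file carries both, so inference applies the same scaling that was learned at training time.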
This update completes the first open, reproducible model training cycle for Truth-Zeeker AI.
Future training (v1.0.3 and beyond) will explore longer runs on open-source compute environments or cloud frameworks such as SageMaker or Kaggle.
VLAN DocNet Model Visualization
Visualization of anomaly scores generated by the model_docnet_20251029T131457Z.joblib pipeline.
This vertical plot represents results from the DocNet-anonymized VLAN capture dataset.
Differences in orientation and scaling are intentional; they highlight the updated feature distribution and processing flow introduced in the new training pipeline.
Changelog:
This update marks the first DocNet-trained version of Truth-Zeeker AI, introducing VLAN-level anonymized datasets and a refined Isolation Forest pipeline for cleaner feature scaling and anomaly visualization.
The model (model_docnet_20251029T131457Z.joblib) and corresponding outputs were generated via the latest Colab training workflow and uploaded directly to this repository.
Truth-Zeeker AI: Model Card (demo)
Overview
Small demonstration model for the Truth-Zeeker AI pipeline.
This repo contains a tiny synthetic dataset and a demo script that trains/loads a minimal model and shows predictions.
Intended use
Educational / research demo only. Not for production. Use only with sanitized or synthetic data.
Model details
- Algorithm (demo): IsolationForest (scikit-learn) for anomaly scoring
- Input features: duration, orig_bytes, resp_bytes
- Output: anomaly score / binary flag
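The input and output described above can be sketched as follows. The three feature names come from this model card. The sample values, the `contamination` setting, and the helper name `score_flows` are made up for illustration.

```python
import pandas as pd
from sklearn.ensemble import IsolationForest

FEATURES = ["duration", "orig_bytes", "resp_bytes"]  # from the model card


def score_flows(model, flows):
    """Return a copy of flows with an anomaly score and a binary flag."""
    out = flows.copy()
    # decision_function: negative values fall on the outlier side.
    out["anomaly_score"] = model.decision_function(flows[FEATURES])
    # predict returns -1 for outliers and 1 for inliers.
    out["is_anomaly"] = model.predict(flows[FEATURES]) == -1
    return out


# Tiny synthetic sample; the last row is deliberately extreme.
flows = pd.DataFrame({
    "duration": [0.5, 0.7, 0.6, 0.4, 0.8, 0.5, 0.6, 0.7, 0.5, 120.0],
    "orig_bytes": [300, 320, 310, 290, 305, 315, 300, 310, 295, 900000],
    "resp_bytes": [1200, 1100, 1150, 1250, 1180, 1210, 1190, 1170, 1230, 50],
})

model = IsolationForest(contamination=0.1, random_state=0).fit(flows[FEATURES])
scored = score_flows(model, flows)
print(scored.sort_values("anomaly_score").head(3))
```

The flag is simply the sign of the score: a row is marked anomalous exactly when its `decision_function` value is below zero, and `contamination` controls where that zero point sits.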
Limitations
- Demo model is trained on synthetic data and is not validated on real traffic.
- Do not use with real PHI/PII or production network environments.
License
MIT

