🚀 Try It Live (Google Colab Demo)

You can test the Truth-Zeeker AI model directly in Google Colab using the link below. This demo notebook automatically loads the model from Hugging Face and runs inference on a small pseudonymized Zeek dataset.

📊 Demo Outputs

Sample visualization:
This chart shows the top anomalous hosts detected by Truth-Zeeker AI on a pseudonymized VLAN dataset (for demonstration only).
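
The ranking behind a chart like this can be sketched with pandas. The column names `host` and `anomaly_score` are hypothetical, since the schema of the demo CSV is not shown here. The sketch also assumes the scikit-learn convention that lower scores mean more anomalous.

```python
import pandas as pd


def top_anomalous_hosts(scores: pd.DataFrame, n: int = 10,
                        host_col: str = "host",
                        score_col: str = "anomaly_score") -> pd.DataFrame:
    """Return the n hosts with the lowest (most anomalous) score.

    Column names are illustrative assumptions, not the demo's real schema.
    """
    # Keep the worst (minimum) score seen for each host.
    per_host = scores.groupby(host_col, as_index=False)[score_col].min()
    # Lowest scores first, so the most anomalous hosts lead the table.
    return per_host.nsmallest(n, score_col).reset_index(drop=True)


# Tiny hand-made example using RFC 5737 documentation addresses.
example = pd.DataFrame({
    "host": ["192.0.2.1", "192.0.2.2", "192.0.2.1", "192.0.2.3"],
    "anomaly_score": [0.10, -0.30, -0.20, 0.05],
})
print(top_anomalous_hosts(example, n=2))
```

The resulting frame can be passed to any plotting library to draw the bar chart.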

🧠 Model and Data

Model: model_20251020.joblib
Demo CSV: zeek_features_for_training_pseudo.csv

These files are hosted on Hugging Face under the repository
dr-rakshith-truth-zeeker/truth-zeeker-ai-demo
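
A minimal sketch of fetching these two files outside Colab, assuming the `huggingface_hub`, `joblib`, and `pandas` packages are installed. The helper name `fetch_demo_files` is hypothetical, and the structure of the object stored in the `.joblib` file is not documented here, so inspect it before use. Because joblib files are pickle-based, load only artifacts you trust.

```python
REPO_ID = "dr-rakshith-truth-zeeker/truth-zeeker-ai-demo"
MODEL_FILE = "model_20251020.joblib"
CSV_FILE = "zeek_features_for_training_pseudo.csv"


def fetch_demo_files(repo_id: str = REPO_ID):
    """Download the demo model and CSV from the Hub and load both.

    Hypothetical helper: imports are local so that merely defining the
    function needs neither the packages nor network access.
    """
    from huggingface_hub import hf_hub_download
    import joblib
    import pandas as pd

    # hf_hub_download caches each file locally and returns its path.
    model_path = hf_hub_download(repo_id=repo_id, filename=MODEL_FILE)
    csv_path = hf_hub_download(repo_id=repo_id, filename=CSV_FILE)

    model = joblib.load(model_path)  # unpickles: trusted sources only
    features = pd.read_csv(csv_path)
    return model, features
```

Calling `model, features = fetch_demo_files()` followed by `print(type(model))` and `print(features.columns.tolist())` is a reasonable first step to see what the artifact and the CSV actually contain.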

🧩 Recent Update (2025-10-28)

A new trained variant — isoforest_and_scaler_20251029TXXXXXXZ.joblib — has been uploaded.
This version was generated from the latest VLAN DocNet-sanitized captures, using the unified Zeek → ML pipeline under controlled offline conditions.

It extends the baseline model (model_20251020.joblib) by:

Incorporating richer Zeek features extracted from real benign network flows
Maintaining strict anonymization (RFC 5737 DocNet addresses)
Improving consistency across future SageMaker and Security Onion Lite training experiments

📘 Note: All datasets and captures used remain fully sanitized and pseudonymized for educational and research purposes only.
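
RFC 5737 reserves three IPv4 blocks for documentation: 192.0.2.0/24, 198.51.100.0/24, and 203.0.113.0/24. The project's own sanitization tooling is not shown in this README, so the following is only an illustrative sketch of how real addresses could be mapped consistently into those blocks. The class name `DocNetPseudonymizer` is hypothetical.

```python
import ipaddress
from itertools import chain

# The three documentation blocks defined by RFC 5737.
DOCNETS = [ipaddress.ip_network(n)
           for n in ("192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24")]


class DocNetPseudonymizer:
    """Map each distinct real IPv4 address to a stable DocNet address."""

    def __init__(self):
        # Usable host addresses (.1 to .254) from each block, in order.
        self._pool = chain.from_iterable(net.hosts() for net in DOCNETS)
        self._mapping = {}

    def pseudonymize(self, real_ip: str) -> str:
        # Validates the input and normalizes its textual form.
        key = str(ipaddress.IPv4Address(real_ip))
        if key not in self._mapping:
            try:
                self._mapping[key] = str(next(self._pool))
            except StopIteration:
                raise RuntimeError(
                    "DocNet pool exhausted (762 host addresses)") from None
        return self._mapping[key]


pseudo = DocNetPseudonymizer()
for ip in ["10.20.30.40", "10.20.30.41", "10.20.30.40"]:
    # The repeated address maps to the same pseudonym both times.
    print(ip, "->", pseudo.pseudonymize(ip))
```

Assigning pseudonyms in order of first appearance keeps the mapping consistent within one run, which preserves per-host behavior in the features. The pool is small, so a capture with more than 762 distinct hosts would need a different scheme.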

🔹 Latest Update — October 29, 2025

Model: model_docnet_20251029T131457Z.joblib
Dataset: vlan_docnet_outputs/host_features_with_scores.csv (DocNet-anonymized Zeek capture)
Platform: Google Colab (Free Tier)
Frameworks: pandas · scikit-learn · joblib
Pipeline: StandardScaler + IsolationForest
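
The StandardScaler + IsolationForest pipeline listed above can be sketched as follows. The training data is synthetic, and the hyperparameters and file name are assumptions rather than the values used for the published model. The feature names are borrowed from the model card's Model details section.

```python
import os
import tempfile

import joblib
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def build_pipeline(random_state: int = 42) -> Pipeline:
    """Scale features, then fit an Isolation Forest on the scaled values."""
    return Pipeline([
        ("scaler", StandardScaler()),
        ("isoforest", IsolationForest(n_estimators=100,
                                      contamination="auto",
                                      random_state=random_state)),
    ])


# Synthetic stand-in for per-flow Zeek features (assumed distributions).
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "duration": rng.exponential(1.0, 500),
    "orig_bytes": rng.lognormal(6.0, 1.0, 500),
    "resp_bytes": rng.lognormal(7.0, 1.0, 500),
})

pipeline = build_pipeline().fit(X)

# Round-trip through joblib, as a training notebook might before upload.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_pipeline.joblib")  # hypothetical name
    joblib.dump(pipeline, path)
    restored = joblib.load(path)

print(restored.decision_function(X.head()))
```

Saving the scaler and the forest as one pipeline object means the same scaling is applied automatically at inference time.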

This version was trained directly on DocNet-anonymized Zeek outputs, improving feature diversity and better representing realistic VLAN traffic patterns.
All files have been validated as sanitized before upload.

➡️ This update completes the first open, reproducible model training cycle for Truth-Zeeker AI.
Future training (v1.0.3 and beyond) will explore longer runs on open-source compute environments or cloud platforms such as SageMaker or Kaggle.

🧠 VLAN DocNet Model Visualization

📊 Visualization of anomaly scores generated by the model_docnet_20251029T131457Z.joblib pipeline.
This vertical plot represents results from the DocNet-anonymized VLAN capture dataset.
Its orientation and scaling differ intentionally from the earlier demo chart, reflecting the updated feature distribution and processing flow introduced in the new training pipeline.

🗓️ Changelog:
This update marks the first DocNet-trained version of Truth-Zeeker AI, introducing VLAN-level anonymized datasets and a refined Isolation Forest pipeline for cleaner feature scaling and anomaly visualization.
The model (model_docnet_20251029T131457Z.joblib) and corresponding outputs were generated via the latest Colab training workflow and uploaded directly to this repository.

Truth-Zeeker AI — Model Card (demo)

Overview

Small demonstration model for the Truth-Zeeker AI pipeline.
This repo contains a tiny synthetic dataset and a demo script that trains/loads a minimal model and shows predictions.

Intended use

Educational / research demo only. Not for production. Use only with sanitized or synthetic data.

Model details

Algorithm (demo): IsolationForest (scikit-learn) for anomaly scoring
Input features: duration, orig_bytes, resp_bytes
Output: anomaly score / binary flag
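
As a hedged illustration of the input and output described above, the sketch below fits an IsolationForest on synthetic values for the three features and attaches both outputs to each row. The helper `score_flows` and the output column names are hypothetical. In scikit-learn, `decision_function` returns lower values for more anomalous rows, and `predict` returns -1 for rows it flags.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

FEATURES = ["duration", "orig_bytes", "resp_bytes"]


def score_flows(model, flows: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of flows with an anomaly score and a binary flag."""
    X = flows[FEATURES]
    scored = flows.copy()
    scored["anomaly_score"] = model.decision_function(X)  # lower = stranger
    scored["is_anomaly"] = model.predict(X) == -1         # True if flagged
    return scored


# Synthetic training data; the distributions are assumptions.
rng = np.random.default_rng(1)
train = pd.DataFrame({
    "duration": rng.exponential(1.0, 300),
    "orig_bytes": rng.lognormal(6.0, 1.0, 300),
    "resp_bytes": rng.lognormal(7.0, 1.0, 300),
})
model = IsolationForest(random_state=0).fit(train[FEATURES])

# Five ordinary rows plus one deliberately extreme flow.
extreme = pd.DataFrame([{"duration": 3600.0,
                         "orig_bytes": 5e8,
                         "resp_bytes": 10.0}])
flows = pd.concat([train.head(5), extreme], ignore_index=True)
scored = score_flows(model, flows)
print(scored)
```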

Limitations

Demo model is trained on synthetic data and is not validated on real traffic.
Do not use with real PHI/PII or production network environments.

License

MIT
