---
base_model: google/gemma-2b-it
library_name: peft
---

# Model Card for SQL Injection Classifier

This model is a classifier that detects SQL injection attacks in SQL queries. It is based on `google/gemma-2b-it`, uses the `peft` library for training and evaluation, and was trained on a dataset of SQL queries with and without SQL injection attacks.

## Model Details

### Model Description

This SQL injection classifier is a fine-tuned version of `google/gemma-2b-it` that labels a SQL query as either secure or vulnerable to injection. It was adapted with the PEFT (Parameter-Efficient Fine-Tuning) library, which trains a small set of additional parameters instead of updating the full base model.

On an evaluation set of 11,125 queries, the model scored as follows:

```
Accuracy: 0.9984
Precision: 0.9974
Recall: 0.9993
F1-score: 0.9984

Classification Report:

              precision    recall  f1-score   support

      Secure       1.00      1.00      1.00      5658
  Vulnerable       1.00      1.00      1.00      5467

    accuracy                           1.00     11125
   macro avg       1.00      1.00      1.00     11125
weighted avg       1.00      1.00      1.00     11125
```

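The four summary figures follow from the counts of true and false positives and negatives. This card does not publish a confusion matrix, so the counts in the sketch below are an assumption: they were chosen only because they agree with the reported class supports (5,658 secure, 5,467 vulnerable) and reproduce the reported metrics after rounding, with "Vulnerable" treated as the positive class. The function name `binary_metrics` is likewise illustrative.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F1 for the positive class."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# ASSUMED counts, not published in this card. They are one confusion matrix
# that is consistent with the supports and the rounded metrics reported above.
counts = {"tp": 5463, "fp": 14, "fn": 4, "tn": 5644}

metrics = binary_metrics(**counts)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Under these assumed counts the four values round to 0.9984, 0.9974, 0.9993 and 0.9984. The per-class rows of the report show 1.00 only because they are rounded to two decimals.
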
- **Developed by:** Mahesh Jamdade
- **Model type:** Text Classification
- **Language(s) (NLP):** SQL, English
- **License:** [More Information Needed]
- **Finetuned from model:** google/gemma-2b-it

### Model Sources

- **Repository:** https://huggingface.co/maheshmnj/sql-injection-classifier

## Uses

### Direct Use

This model can be used directly to classify SQL queries as either secure or vulnerable to SQL injection. It can be integrated into security tools, database management systems, or web application firewalls as an additional layer of protection against SQL injection.

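One way to place the classifier in front of a database is a thin guard that rejects a query before it is executed. The sketch below is an assumption about how such an integration might look, not part of this model's tooling: `guard_query`, `QueryRejected` and the stand-in `fake_classify` are hypothetical names. In practice, `classify` would be a function that calls the model and returns "Secure" or "Vulnerable".

```python
from typing import Callable


class QueryRejected(Exception):
    """Raised when the classifier flags a query (hypothetical helper)."""


def guard_query(query: str, classify: Callable[[str], str]) -> str:
    """Return the query unchanged if it is labeled "Secure", otherwise raise."""
    label = classify(query)
    if label != "Secure":
        raise QueryRejected(f"query blocked, classifier label: {label}")
    return query


def fake_classify(query: str) -> str:
    # Stand-in for the real model so the sketch runs without downloading weights.
    # It is NOT a detection method; it only flags one literal tautology pattern.
    return "Vulnerable" if "'1'='1'" in query else "Secure"


# A benign query passes through unchanged.
safe = guard_query("SELECT id FROM users WHERE id = 7", fake_classify)

# A flagged query raises before it could reach the database.
try:
    guard_query("SELECT * FROM users WHERE username = 'admin' OR '1'='1'", fake_classify)
    blocked = False
except QueryRejected:
    blocked = True
```

Passing the classifier in as a callable keeps the guard independent of how the model is loaded, which also makes it easy to test.
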
### Downstream Use

The model can be further fine-tuned or integrated into larger security ecosystems. It could be used as a component in:

- Code review tools
- Automated security testing suites
- Real-time query analysis systems in database applications

### Out-of-Scope Use

This model is specifically trained for SQL injection detection and should not be used for:

- Detecting other types of security vulnerabilities
- Generating or correcting SQL queries
- Analyzing queries in languages other than SQL

## Bias, Risks, and Limitations

- The model's performance may vary on SQL dialects or patterns that are not well represented in the training data.
- False positives and false negatives are rare on the evaluation set but can still occur, and both should be accounted for in critical applications.
- The model may not catch highly sophisticated or novel SQL injection techniques.

### Recommendations

- Always use this model as part of a comprehensive security strategy, not as the sole defense against SQL injection.
- Regularly update and retrain the model with new, real-world SQL injection patterns.
- Implement additional security measures such as parameterized queries and input sanitization.

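To illustrate the parameterized-query recommendation, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table, rows and input string are made up for the example. It contrasts splicing user input into the SQL text with binding it through a `?` placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("admin",), ("alice",)])

# Attacker-controlled input (illustrative)
user_input = "nobody' OR '1'='1"

# Unsafe: the input becomes part of the SQL text, so the quote closes the
# string literal and the OR clause makes the condition true for every row.
unsafe_rows = conn.execute(
    f"SELECT username FROM users WHERE username = '{user_input}'"
).fetchall()

# Safe: the input is bound as a value and is only ever compared as a string.
safe_rows = conn.execute(
    "SELECT username FROM users WHERE username = ?", (user_input,)
).fetchall()

print(len(unsafe_rows), len(safe_rows))
conn.close()
```

The spliced query returns every row in the table, while the parameterized query returns none because no user has that literal name.
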
## How to Get Started with the Model

Use the following code to get started with the model:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_path = "maheshj01/sql-injection-classifier"
model = AutoModelForSequenceClassification.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)


def classify_query(query):
    """Classify a SQL query as "Vulnerable" or "Secure"."""
    inputs = tokenizer(query, return_tensors="pt", truncation=True, padding=True)
    with torch.no_grad():  # gradients are not needed for inference
        outputs = model(**inputs)
    # Label index 1 is treated as "Vulnerable", index 0 as "Secure"
    prediction = outputs.logits.argmax(-1).item()
    return "Vulnerable" if prediction == 1 else "Secure"


# Example usage
query = "SELECT * FROM users WHERE username = 'admin' OR '1'='1'"
result = classify_query(query)
print(f"The query is classified as: {result}")
```

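The function above returns only the argmax label. Where a tunable trade-off between false positives and false negatives is needed, the two logits can be converted into a probability and compared against a threshold. The following is a minimal sketch in plain Python, assuming label index 1 means "Vulnerable" as in the code above. The function names, the example logits and the 0.5 default threshold are illustrative assumptions, not values taken from this model.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_from_logits(logits, threshold=0.5):
    """Map [secure_logit, vulnerable_logit] to a label and P(vulnerable)."""
    p_vulnerable = softmax(logits)[1]
    label = "Vulnerable" if p_vulnerable >= threshold else "Secure"
    return label, p_vulnerable


# Made-up logits for illustration only
label, prob = label_from_logits([-1.0, 2.0])

# A lower threshold flags borderline queries that the default would let through
default_label, _ = label_from_logits([0.5, 0.0])
strict_label, _ = label_from_logits([0.5, 0.0], threshold=0.3)
```

With the real model, the logits for a single query could be taken from `outputs.logits[0].tolist()`. Lowering the threshold trades more false positives for fewer missed injections.
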
## Training Details

### Training Data

The model was trained on a dataset of SQL queries, including both secure queries and queries containing SQL injection vulnerabilities. [More specific information about the dataset is needed]

### Training Procedure

The model was fine-tuned using the PEFT library, which allows for efficient adaptation of the pre-trained Gemma 2B model to the SQL injection classification task.

#### Training Hyperparameters

- **Training regime:** [More Information Needed]

## Evaluation

The model was evaluated on a held-out test set of 11,125 SQL queries (5,658 secure and 5,467 vulnerable). The resulting accuracy, precision, recall and F1-score are listed in the classification report under Model Description.

## Environmental Impact

[More Information Needed]

## Technical Specifications

### Model Architecture and Objective

The model is based on the `google/gemma-2b-it` architecture, fine-tuned for binary classification of SQL queries.

### Compute Infrastructure

#### Software

- PEFT 0.8.2
- Transformers [version needed]
- PyTorch [version needed]

## Model Card Contact

For questions or concerns about this model, please contact Mahesh Jamdade through the [Hugging Face repository](https://huggingface.co/maheshmnj/sql-injection-classifier).