Model Card for SIQRIT/DAIS-Qwen3-8B-qdora

Model Details

Model Description

This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card has been automatically generated.

  • Developed by: SIQRIT
  • Model type: Causal language model (Qwen3-8B fine-tuned with a Q-DoRA adapter)
  • Language(s) (NLP): Korean
  • License: apache-2.0
  • Finetuned from model: Qwen/Qwen3-8B

Model Sources [optional]

Uses

Direct Use

Downstream Use [optional]

[More Information Needed]

Out-of-Scope Use

[More Information Needed]

Bias, Risks, and Limitations

The vector DB used in training this model was built from YouTube scripts, and those scripts were obtained through YouTube's automatic caption translation feature. As a result, sentences generated from vector DB references are generally sound, but individual words can occasionally be incomplete.

Recommendations

Special tokens have been added to support prompt engineering. The hyperparameters, which reflect recent trends in the fine-tuning literature, are described in detail under Training Hyperparameters below.
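
The exact prompt layout is not documented in this card. As a purely hypothetical sketch, the special tokens listed under Training Hyperparameters might be assembled into a prompt as follows; the section ordering, the build_prompt helper, and the example contents are assumptions, not the confirmed DAIS format.

    # Hypothetical prompt assembly using the DAIS special tokens.
    # Ordering and section contents are assumptions, not the documented format.
    def build_prompt(instruction: str, context: str, history: str, user_input: str) -> str:
        return (
            f"[DAIS_INSTRUCTION]{instruction}"
            f"[CONTEXT]{context}"    # e.g. passages retrieved from the vector DB
            f"[HISTORY]{history}"    # prior conversation turns, if any
            f"[INPUT]{user_input}"
            f"[OUTPUT]"              # the model continues from here
        )

    prompt = build_prompt(
        instruction="You are DAIS, a Korean-speaking science AI influencer.",
        context="(retrieved YouTube-script passages)",
        history="",
        user_input="블랙홀은 어떻게 만들어지나요?",  # "How are black holes formed?"
    )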

How to Get Started with the Model

Use the code below to get started with the model.

[More Information Needed]
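
An official snippet has not been provided yet. The following is a minimal sketch using 🤗 Transformers, assuming the repository SIQRIT/DAIS-Qwen3-8B-qdora contains the merged model weights; if it instead holds only the Q-DoRA adapter, the adapter would need to be loaded on top of Qwen/Qwen3-8B with PEFT.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "SIQRIT/DAIS-Qwen3-8B-qdora"

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # assumption: bf16 inference, matching the bf16 training setup
        device_map="auto",
    )

    # "How are black holes formed?"
    messages = [{"role": "user", "content": "블랙홀은 어떻게 만들어지나요?"}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    output_ids = model.generate(input_ids, max_new_tokens=512)
    print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))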

Training Details

Training Data

Korean-language YouTube scripts on science topics.

Training Procedure

Preprocessing [optional]

[More Information Needed]

Training Hyperparameters

  • Special Tokens

    special_tokens_dict = {
        "additional_special_tokens": [
            "[DAIS_INSTRUCTION]", "[DAIS_STYLE]", "[DAIS_RULE]", "[DAIS_EXAMPLE]",
            "[HISTORY]", "[INPUT]", "[OUTPUT]", "[CONTEXT]"
        ]
    }

  • DoRA Adapter Config

    lora_config = LoraConfig(
        r=64,
        lora_alpha=32,
        target_modules=[
            "model.embed_tokens",
            "q_proj", "k_proj", "v_proj", "o_proj",
            "gate_proj", "up_proj", "down_proj"
        ],
        lora_dropout=0.05,
        bias="none",
        task_type="CAUSAL_LM",
        use_dora=True
    )

  • Training Arguments

    training_args = TrainingArguments(
        output_dir=OUTPUT_DIR,
        per_device_train_batch_size=4,
        per_device_eval_batch_size=4,
        gradient_accumulation_steps=8,
        optim="paged_adamw_32bit",
        gradient_checkpointing=True,
        num_train_epochs=20,
        learning_rate=3e-5,
        lr_scheduler_type="cosine",
        warmup_ratio=0.1,
        eval_strategy="epoch",
        save_strategy="epoch",
        save_total_limit=5,
        load_best_model_at_end=True,
        metric_for_best_model="eval_loss",
        greater_is_better=False,
        logging_steps=10,
        weight_decay=0.01,
        max_grad_norm=1.0,
        bf16=True,
        fp16=False,
        group_by_length=True,
        remove_unused_columns=True,
        push_to_hub=False,
        report_to="none"
    )

  • Supervised Fine-Tuning

    trainer = SFTTrainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        peft_config=lora_config,
        callbacks=[EarlyStoppingCallback(early_stopping_patience=5)]
    )
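
The snippets above omit how the base model and tokenizer are prepared before the SFTTrainer call. The sketch below fills that gap under stated assumptions: the base checkpoint Qwen/Qwen3-8B, and an NF4 4-bit quantized load (the "Q" in Q-DoRA) inferred from the repository name rather than documented here.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    BASE_MODEL = "Qwen/Qwen3-8B"

    # Assumption: NF4 4-bit quantization; the exact settings are not documented in this card.
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
    model = AutoModelForCausalLM.from_pretrained(
        BASE_MODEL,
        quantization_config=bnb_config,
        device_map="auto",
    )

    # Register the DAIS special tokens and grow the embedding matrix to match.
    # model.embed_tokens is also a DoRA target module, so the new rows are trainable.
    tokenizer.add_special_tokens(special_tokens_dict)
    model.resize_token_embeddings(len(tokenizer))

    # lora_config, training_args and the SFTTrainer above are then used unchanged,
    # followed by trainer.train().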

Speeds, Sizes, Times [optional]

[More Information Needed]

Evaluation

Testing Data, Factors & Metrics

Testing Data

[More Information Needed]

Factors

[More Information Needed]

Metrics

[More Information Needed]

Results

[More Information Needed]

Summary

Model Examination [optional]

[More Information Needed]

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

  • Hardware Type: [More Information Needed]
  • Hours used: [More Information Needed]
  • Cloud Provider: [More Information Needed]
  • Compute Region: [More Information Needed]
  • Carbon Emitted: [More Information Needed]

Technical Specifications [optional]

Model Architecture and Objective

This model is named DAIS, short for Divergent AI with Science. It is trained on Korean-language data and is intended to act as a science-focused AI influencer.

Compute Infrastructure

[More Information Needed]

Hardware

RunPod A100 (100 GB disk / 100 GB container)

Software

runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04

Citation [optional]

BibTeX:

[More Information Needed]

APA:

[More Information Needed]

Glossary [optional]

[More Information Needed]

More Information [optional]

[More Information Needed]

Model Card Authors [optional]

SIQRIT

Model Card Contact

siqrit09@gmail.com

Model size: 8.19B params (Safetensors, tensor type F32)

Model tree for SIQRIT/DAIS-Qwen3-8B-qdora

  • Base model: Qwen/Qwen3-8B-Base
  • Finetuned: Qwen/Qwen3-8B
  • Finetuned: this model
  • Quantizations: 1 model