Model Card for DAIS
Model Details
Model Description
This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card has been automatically generated.
- Developed by: SIQRIT
- Model type: Causal language model fine-tuned with Q-DoRA
- Language(s) (NLP): Korean
- License: apache-2.0
- Finetuned from model: Qwen/Qwen3-8B
Model Sources [optional]
- Repository : [GitHub]
Uses
Direct Use
Downstream Use [optional]
[More Information Needed]
Out-of-Scope Use
[More Information Needed]
Bias, Risks, and Limitations
The vector DB used to train this model was built from YouTube scripts, and those scripts were produced with YouTube's automatic translation feature. As a result, sentences generated from vector DB references are generally well formed, but individual words can occasionally be incomplete.
Recommendations
Special tokens have been added to support prompt engineering. The hyperparameters, chosen to follow recent papers, are described in detail below.
How to Get Started with the Model
Use the code below to get started with the model.
[More Information Needed]
Training Details
Training Data
Korean-language YouTube scripts on science topics
Training Procedure
Preprocessing [optional]
[More Information Needed]
Training Hyperparameters
Special Tokens
special_tokens_dict = {
    "additional_special_tokens": [
        "[DAIS_INSTRUCTION]",
        "[DAIS_STYLE]",
        "[DAIS_RULE]",
        "[DAIS_EXAMPLE]",
        "[HISTORY]",
        "[INPUT]",
        "[OUTPUT]",
        "[CONTEXT]",
    ]
}
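The special tokens above can serve as section delimiters when assembling a prompt. The card does not document the actual prompt template, so the following is only a minimal sketch: the `build_prompt` helper, the "token, newline, text" layout, and the example section contents are all assumptions made for illustration.

```python
# Special tokens copied from the model card.
SPECIAL_TOKENS = [
    "[DAIS_INSTRUCTION]", "[DAIS_STYLE]", "[DAIS_RULE]", "[DAIS_EXAMPLE]",
    "[HISTORY]", "[INPUT]", "[OUTPUT]", "[CONTEXT]",
]


def build_prompt(sections: dict, user_input: str) -> str:
    """Assemble a prompt (hypothetical layout: token, newline, section text).

    Sections are emitted in the order given. Unknown tokens are rejected so
    that a typo does not silently become ordinary text.
    """
    parts = []
    for token, text in sections.items():
        if token not in SPECIAL_TOKENS:
            raise ValueError(f"unknown special token: {token}")
        parts.append(f"{token}\n{text.strip()}")
    parts.append(f"[INPUT]\n{user_input.strip()}")
    # End with the bare [OUTPUT] token so the model continues from there.
    parts.append("[OUTPUT]\n")
    return "\n".join(parts)


prompt = build_prompt(
    {
        "[DAIS_INSTRUCTION]": "Answer as a science communicator.",
        "[CONTEXT]": "Retrieved passage from the vector DB.",
    },
    "Why is the sky blue?",
)
print(prompt)
```

If the training data used a different ordering or separator, the helper should be adjusted to match it, since the model only learned the layout it saw during fine-tuning.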
DoRA Adapter Config
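from peft import LoraConfig

lora_config = LoraConfig(
    r=64,                      # adapter rank
    lora_alpha=32,             # LoRA scaling numerator (scaling = lora_alpha / r)
    target_modules=[
        "model.embed_tokens",
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
    ],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    use_dora=True,             # enable weight-decomposed adaptation (DoRA)
)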
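To make the effect of `use_dora=True` more concrete, here is a small pure-Python sketch of the DoRA idea: the adapted weight is a learned magnitude times the normalized direction of `W0 + scaling * B @ A`, with `scaling = lora_alpha / r`. This is a toy illustration and not the PEFT implementation; the helper names and the choice of taking one norm per output row are assumptions.

```python
import math

R, LORA_ALPHA = 64, 32          # values from the config above
SCALING = LORA_ALPHA / R        # standard LoRA scaling, alpha / r


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def row_norms(w):
    """L2 norm of each row (one norm per output feature; an assumed convention)."""
    return [math.sqrt(sum(x * x for x in row)) for row in w]


def dora_weight(w0, lora_b, lora_a, magnitude, scaling=SCALING):
    """Return magnitude * V / ||V|| with V = W0 + scaling * (B @ A)."""
    delta = matmul(lora_b, lora_a)
    directed = [[w + scaling * d for w, d in zip(w_row, d_row)]
                for w_row, d_row in zip(w0, delta)]
    norms = row_norms(directed)
    return [[m * x / n for x in row]
            for row, m, n in zip(directed, magnitude, norms)]


# Toy 2x3 weight with a rank-1 adapter (the real adapter uses r=64).
w0 = [[3.0, 0.0, 4.0], [0.0, 5.0, 12.0]]
lora_a = [[0.1, 0.2, 0.3]]      # shape (rank, in_features)
lora_b = [[0.0], [0.0]]         # shape (out_features, rank), zero at init
magnitude = row_norms(w0)       # initialised to the norm of the base weight

merged = dora_weight(w0, lora_b, lora_a, magnitude)
print(merged)
```

With `lora_b` at zero and the magnitude initialised from the base weight, the merged weight reproduces `w0`, so the adapter starts as a no-op and only departs from the base model as training updates the direction and magnitude.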
Training Arguments
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir=OUTPUT_DIR,
    # Batching: 4 samples per device x 8 accumulation steps
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=8,
    optim="paged_adamw_32bit",
    gradient_checkpointing=True,
    # Schedule
    num_train_epochs=20,
    learning_rate=3e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    # Evaluate and checkpoint once per epoch, keep the best model by eval loss
    eval_strategy="epoch",
    save_strategy="epoch",
    save_total_limit=5,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
    logging_steps=10,
    # Regularization and precision
    weight_decay=0.01,
    max_grad_norm=1.0,
    bf16=True,
    fp16=False,
    group_by_length=True,
    remove_unused_columns=True,
    push_to_hub=False,
    report_to="none",
)
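The batching and schedule settings above interact: the effective batch size is the per-device batch times the accumulation steps, and the warmup length follows from the total number of optimizer steps. The sketch below works through that arithmetic and a linear-warmup, cosine-decay learning rate curve. The dataset size of 3200 examples and the single-device setup are hypothetical (the card states neither), and the Trainer's exact step rounding may differ slightly.

```python
import math

PER_DEVICE_BATCH = 4        # per_device_train_batch_size
GRAD_ACCUM_STEPS = 8        # gradient_accumulation_steps
NUM_EPOCHS = 20             # num_train_epochs
WARMUP_RATIO = 0.1          # warmup_ratio
PEAK_LR = 3e-5              # learning_rate


def schedule_summary(num_examples: int, num_devices: int = 1) -> dict:
    """Derive step counts from the training arguments (approximate rounding)."""
    effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * num_devices
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    total_steps = steps_per_epoch * NUM_EPOCHS
    warmup_steps = math.ceil(total_steps * WARMUP_RATIO)
    return {
        "effective_batch": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }


def lr_at(step: int, total_steps: int, warmup_steps: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to zero."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# Hypothetical dataset size, for illustration only.
summary = schedule_summary(num_examples=3200)
print(summary)
for step in (0, summary["warmup_steps"], summary["total_steps"]):
    print(step, lr_at(step, summary["total_steps"], summary["warmup_steps"]))
```

Because early stopping is also configured, a run may end well before the full schedule completes, in which case the learning rate never reaches the tail of the cosine curve.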
Supervised Fine-Tuning
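from transformers import EarlyStoppingCallback
from trl import SFTTrainer

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    peft_config=lora_config,
    # Stop if eval_loss fails to improve for 5 consecutive evaluations
    callbacks=[EarlyStoppingCallback(early_stopping_patience=5)],
)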
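With evaluation once per epoch and `early_stopping_patience=5`, training halts after five consecutive epochs without a new best `eval_loss`. The following is a simplified sketch of that rule, not the library's callback code; the `epochs_until_stop` helper and the loss values are invented for illustration.

```python
def epochs_until_stop(eval_losses, patience=5):
    """Return (epoch at which training stops, best loss seen so far).

    A simplified patience rule: the counter resets whenever the loss
    improves on the best value and training stops once it reaches `patience`.
    """
    best = float("inf")
    bad_evals = 0
    for epoch, loss in enumerate(eval_losses, start=1):
        if loss < best:
            best, bad_evals = loss, 0
        else:
            bad_evals += 1
            if bad_evals >= patience:
                return epoch, best
    return len(eval_losses), best


# Made-up per-epoch eval losses: the best value occurs at epoch 3.
losses = [2.0, 1.5, 1.2, 1.3, 1.25, 1.4, 1.35, 1.3, 1.1]
stopped_at, best_loss = epochs_until_stop(losses)
print(stopped_at, best_loss)
```

In this example the improvement at epoch 9 is never reached, which is the trade-off of a patience rule; `load_best_model_at_end=True` then restores the checkpoint with the lowest evaluation loss, provided it is still among the saved checkpoints.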
Speeds, Sizes, Times [optional]
[More Information Needed]
Evaluation
Testing Data, Factors & Metrics
Testing Data
[More Information Needed]
Factors
[More Information Needed]
Metrics
[More Information Needed]
Results
[More Information Needed]
Summary
Model Examination [optional]
[More Information Needed]
Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- Hardware Type: [More Information Needed]
- Hours used: [More Information Needed]
- Cloud Provider: [More Information Needed]
- Compute Region: [More Information Needed]
- Carbon Emitted: [More Information Needed]
Technical Specifications [optional]
Model Architecture and Objective
The model is named DAIS, short for Divergent AI with Science. It is trained on Korean data and is intended to act as a science-focused AI influencer.
Compute Infrastructure
[More Information Needed]
Hardware
RunPod A100, 100 GB disk / 100 GB container storage
Software
runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04
Citation [optional]
BibTeX:
[More Information Needed]
APA:
[More Information Needed]
Glossary [optional]
[More Information Needed]
More Information [optional]
[More Information Needed]
Model Card Authors [optional]
SIQRIT
Model Card Contact