Model Card for SIQRIT/DAIS-Qwen3-8B-qdora

Model Details

Model Description

This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card has been automatically generated.

  • Developed by: SIQRIT
  • Model type: Causal language model (Qwen3-8B fine-tuned with a Q-DoRA adapter)
  • Language(s) (NLP): Korean
  • License: apache-2.0
  • Finetuned from model: Qwen/Qwen3-8B

Model Sources [optional]

Uses

Direct Use

Downstream Use [optional]

[More Information Needed]

Out-of-Scope Use

[More Information Needed]

Bias, Risks, and Limitations

The vector DB used in training this model was built from YouTube scripts, and those scripts were obtained through YouTube's automatic caption translation feature. As a result, sentences generated from vector DB references are generally sound, but individual words can occasionally be incomplete.

Recommendations

Special tokens have been added to support prompt engineering. The hyperparameters, which reflect recent trends in the fine-tuning literature, are described in detail under Training Hyperparameters below.
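
The exact prompt layout is not documented in this card. As a purely hypothetical sketch, the special tokens listed under Training Hyperparameters might be assembled into a prompt as follows; the section ordering, the build_prompt helper, and the example contents are assumptions, not the confirmed DAIS format.

    # Hypothetical prompt assembly using the DAIS special tokens.
    # Ordering and section contents are assumptions, not the documented format.
    def build_prompt(instruction: str, context: str, history: str, user_input: str) -> str:
        return (
            f"[DAIS_INSTRUCTION]{instruction}"
            f"[CONTEXT]{context}"    # e.g. passages retrieved from the vector DB
            f"[HISTORY]{history}"    # prior conversation turns, if any
            f"[INPUT]{user_input}"
            f"[OUTPUT]"              # the model continues from here
        )

    prompt = build_prompt(
        instruction="You are DAIS, a Korean-speaking science AI influencer.",
        context="(retrieved YouTube-script passages)",
        history="",
        user_input="블랙홀은 어떻게 만들어지나요?",  # "How are black holes formed?"
    )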

How to Get Started with the Model

Use the code below to get started with the model.

[More Information Needed]
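
An official snippet has not been provided yet. The following is a minimal sketch using 🤗 Transformers, assuming the repository SIQRIT/DAIS-Qwen3-8B-qdora contains the merged model weights; if it instead holds only the Q-DoRA adapter, the adapter would need to be loaded on top of Qwen/Qwen3-8B with PEFT.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "SIQRIT/DAIS-Qwen3-8B-qdora"

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # assumption: bf16 inference, matching the bf16 training setup
        device_map="auto",
    )

    # "How are black holes formed?"
    messages = [{"role": "user", "content": "블랙홀은 어떻게 만들어지나요?"}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    output_ids = model.generate(input_ids, max_new_tokens=512)
    print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))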

Training Details

Training Data

Korean-language YouTube scripts on science topics.

Training Procedure

Preprocessing [optional]

[More Information Needed]

Training Hyperparameters

  • Special Tokens

    special_tokens_dict = {
        "additional_special_tokens": [
            "[DAIS_INSTRUCTION]", "[DAIS_STYLE]", "[DAIS_RULE]", "[DAIS_EXAMPLE]",
            "[HISTORY]", "[INPUT]", "[OUTPUT]", "[CONTEXT]"
        ]
    }

  • DoRA Adapter Config

    lora_config = LoraConfig(
        r=64,
        lora_alpha=32,
        target_modules=[
            "model.embed_tokens",
            "q_proj", "k_proj", "v_proj", "o_proj",
            "gate_proj", "up_proj", "down_proj"
        ],
        lora_dropout=0.05,
        bias="none",
        task_type="CAUSAL_LM",
        use_dora=True
    )

  • Training Arguments

    training_args = TrainingArguments(
        output_dir=OUTPUT_DIR,
        per_device_train_batch_size=4,
        per_device_eval_batch_size=4,
        gradient_accumulation_steps=8,
        optim="paged_adamw_32bit",
        gradient_checkpointing=True,
        num_train_epochs=20,
        learning_rate=3e-5,
        lr_scheduler_type="cosine",
        warmup_ratio=0.1,
        eval_strategy="epoch",
        save_strategy="epoch",
        save_total_limit=5,
        load_best_model_at_end=True,
        metric_for_best_model="eval_loss",
        greater_is_better=False,
        logging_steps=10,
        weight_decay=0.01,
        max_grad_norm=1.0,
        bf16=True,
        fp16=False,
        group_by_length=True,
        remove_unused_columns=True,
        push_to_hub=False,
        report_to="none"
    )

  • Supervised Fine-Tuning

    trainer = SFTTrainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        peft_config=lora_config,
        callbacks=[EarlyStoppingCallback(early_stopping_patience=5)]
    )
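
The snippets above omit how the base model and tokenizer are prepared before the SFTTrainer call. The sketch below fills that gap under stated assumptions: the base checkpoint Qwen/Qwen3-8B, and an NF4 4-bit quantized load (the "Q" in Q-DoRA) inferred from the repository name rather than documented here.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    BASE_MODEL = "Qwen/Qwen3-8B"

    # Assumption: NF4 4-bit quantization; the exact settings are not documented in this card.
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
    model = AutoModelForCausalLM.from_pretrained(
        BASE_MODEL,
        quantization_config=bnb_config,
        device_map="auto",
    )

    # Register the DAIS special tokens and grow the embedding matrix to match.
    # model.embed_tokens is also a DoRA target module, so the new rows are trainable.
    tokenizer.add_special_tokens(special_tokens_dict)
    model.resize_token_embeddings(len(tokenizer))

    # lora_config, training_args and the SFTTrainer above are then used unchanged,
    # followed by trainer.train().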

Speeds, Sizes, Times [optional]

[More Information Needed]

Evaluation

Testing Data, Factors & Metrics

Testing Data

[More Information Needed]

Factors

[More Information Needed]

Metrics

[More Information Needed]

Results

[More Information Needed]

Summary

Model Examination [optional]

[More Information Needed]

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

  • Hardware Type: [More Information Needed]
  • Hours used: [More Information Needed]
  • Cloud Provider: [More Information Needed]
  • Compute Region: [More Information Needed]
  • Carbon Emitted: [More Information Needed]

Technical Specifications [optional]

Model Architecture and Objective

This model is named DAIS, short for Divergent AI with Science. It is trained on Korean-language data and is intended to act as a science-focused AI influencer.

Compute Infrastructure

[More Information Needed]

Hardware

RunPod A100 (100 GB disk / 100 GB container)

Software

runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04

Citation [optional]

BibTeX:

[More Information Needed]

APA:

[More Information Needed]

Glossary [optional]

[More Information Needed]

More Information [optional]

[More Information Needed]

Model Card Authors [optional]

SIQRIT

Model Card Contact

siqrit09@gmail.com

Model size: 8.19B params (Safetensors, tensor type F32)

Model tree for SIQRIT/DAIS-Qwen3-8B-qdora

  • Base model: Qwen/Qwen3-8B-Base
  • Finetuned: Qwen/Qwen3-8B
  • Finetuned: this model
  • Quantizations: 1 model