# CalMate-20B-KO (LoRA Adapter)
A LoRA adapter tuned for the Korean calendar/scheduling domain.

Base model: `unsloth/gpt-oss-20b-unsloth-bnb-4bit` (a 4-bit bitsandbytes quantization of `openai/gpt-oss-20b`)

## Usage example
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

BASE_ID = "unsloth/gpt-oss-20b-unsloth-bnb-4bit"
ADAPTER = "Seonghaa/CalMate-20B-KO-LoRA"

# 4-bit NF4 quantization with double quantization; computation runs in fp16.
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tok = AutoTokenizer.from_pretrained(ADAPTER, use_fast=True, trust_remote_code=True)
if tok.pad_token is None:
    tok.pad_token = tok.eos_token  # reuse EOS for padding if none is defined

base = AutoModelForCausalLM.from_pretrained(
    BASE_ID, quantization_config=bnb, device_map="auto",
    trust_remote_code=True, low_cpu_mem_usage=True,
)
model = PeftModel.from_pretrained(base, ADAPTER)  # attach the LoRA adapter
model.eval()

def chat(system, user, max_new_tokens=160, temperature=0.0):
    # Gradient checkpointing is a training-time feature; make sure it is
    # off and the KV cache is on for inference.
    try:
        model.gradient_checkpointing_disable()
    except Exception:
        pass
    model.config.use_cache = True

    # Prefer the tokenizer's chat template; fall back to a plain tag format.
    try:
        prompt = tok.apply_chat_template(
            [{"role": "system", "content": system},
             {"role": "user", "content": user}],
            tokenize=False, add_generation_prompt=True,
        )
    except Exception:
        prompt = f"<|system|>\n{system}\n<|user|>\n{user}\n<|assistant|>\n"

    ins = tok(prompt, return_tensors="pt").to(model.device)
    gen_kwargs = dict(
        max_new_tokens=max_new_tokens,
        do_sample=temperature > 0.0,  # greedy decoding when temperature == 0.0
        eos_token_id=tok.eos_token_id, pad_token_id=tok.pad_token_id,
    )
    if temperature > 0.0:
        gen_kwargs["temperature"] = temperature
    with torch.inference_mode():
        out = model.generate(**ins, **gen_kwargs)
    # Best-effort extraction of the assistant turn from the decoded text.
    print(tok.decode(out[0], skip_special_tokens=True).split("<|assistant|>")[-1].strip())

# System prompt: "You are a Korean scheduling assistant. Answer concisely and clearly."
# User message: "Tomorrow 3 PM, 2-hour meeting in Gangnam, high priority. Register the event."
chat("너는 한국어 일정 비서다. 간결하고 또렷하게 답한다.",
     "내일 오후 3시에 강남 미팅 2시간, 중요도 높음. 일정 등록해줘.")
```
## Included files

- LoRA weights (.safetensors)
- adapter_config.json
- tokenizer.* (optional)
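Because the tokenizer files are optional, loading code may want a fallback. A minimal sketch, assuming the base model's tokenizer was not changed by fine-tuning:

```python
from transformers import AutoTokenizer

# Prefer the adapter repo's tokenizer; fall back to the base model's if
# the adapter ships none (assumes fine-tuning added no new tokens).
try:
    tok = AutoTokenizer.from_pretrained("Seonghaa/CalMate-20B-KO-LoRA", use_fast=True)
except OSError:
    tok = AutoTokenizer.from_pretrained("unsloth/gpt-oss-20b-unsloth-bnb-4bit", use_fast=True)
```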
## Training details

- QLoRA: r=16, alpha=32, dropout=0.0
- fp16 compute, sequence packing
- Pilot run: 400 steps
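For reference, the hyperparameters above map onto a `peft.LoraConfig` roughly as follows; `target_modules` here is an assumption (common attention projections), not taken from the actual training script:

```python
from peft import LoraConfig

# Sketch matching the stated QLoRA hyperparameters (r=16, alpha=32, dropout=0.0).
lora_cfg = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.0,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
)
```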