Experimental release.
This is an uncensored creative model intended to excel at character-driven RP / ERP.
It is designed to produce longer, narrative-heavy responses in which characters are portrayed accurately and proactively.
Chat template: Mistral V7 Tekken
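For frontends that need the format spelled out, this is roughly the layout that template produces (a sketch from memory, not pulled from the tokenizer; verify against the model's actual chat template before relying on it):

```
<s>[SYSTEM_PROMPT]{system}[/SYSTEM_PROMPT][INST]{user}[/INST]{assistant}</s>[INST]{user}[/INST]
```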
Training process: Pretrain > SFT > DPO > DPO 2
As a test, I did a small continued pretrain on some light novels and Frieren wiki data. It doesn't seem to have hurt the model, and the model shows some small improvements in the lore of the series that were included.
The model then went through standard SFT on a dataset of approximately 3.6 million tokens: 700 RP conversations, 1,000 creative writing / instruct samples, and about 100 summaries. The bulk of this data has been made public.
Finally, DPO was used to make the model a little more consistent. The first stage of DPO focused on instruction following; the second tried to burn out some Mistral-isms.
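For anyone replicating the pipeline, here is a minimal sketch of what a DPO stage looks like in an Axolotl config, assuming the `rl: dpo` path with a hypothetical preference-pair file (the actual DPO configs and data were not published here):

```yaml
# Minimal DPO-stage sketch (paths and dataset type are illustrative assumptions)
rl: dpo
base_model: ./MS3-2-SFT-2/merged   # assumption: DPO runs on top of the merged SFT model
adapter: qlora
load_in_4bit: true
datasets:
  - path: ./dpo_pairs.jsonl        # hypothetical file of prompt + chosen/rejected pairs
    type: chatml.intel             # one of Axolotl's built-in DPO pair formats
learning_rate: 5e-6                # preference tuning typically uses a lower LR than SFT
```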
Not optimized for cost / performance efficiency; YMMV.
# ====================
# MODEL CONFIGURATION
# ====================
base_model: ./MS3-2-Pretrain/merged
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
# ====================
# DATASET CONFIGURATION
# ====================
datasets:
  - path: ./dataset.jsonl
    type: chat_template
    split: train
    chat_template_strategy: tokenizer
    field_messages: messages
    message_property_mappings:
      role: role
      content: content
    roles:
      user: ["user"]
      assistant: ["assistant"]
      system: ["system"]
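# Example of what a row in ./dataset.jsonl is expected to look like, given the
# field/role mappings above (content is illustrative, not from the real data):
# {"messages": [{"role": "system", "content": "..."},
#               {"role": "user", "content": "..."},
#               {"role": "assistant", "content": "..."}]}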
dataset_prepared_path:
train_on_inputs: false # Only train on assistant responses
# ====================
# QLORA CONFIGURATION
# ====================
adapter: qlora
load_in_4bit: true
lora_r: 128
lora_alpha: 128
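# alpha equal to rank gives a LoRA scaling factor (alpha / r) of 1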
lora_dropout: 0.1
lora_target_linear: true
# lora_modules_to_save: # Uncomment only if you added NEW tokens
# ====================
# TRAINING PARAMETERS
# ====================
num_epochs: 3
micro_batch_size: 4
gradient_accumulation_steps: 2
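# effective batch size: 4 * 2 = 8 sequences per step (per GPU)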
learning_rate: 1e-5
optimizer: paged_adamw_8bit
lr_scheduler: rex
warmup_ratio: 0.05
weight_decay: 0.01
max_grad_norm: 1.0
# ====================
# SEQUENCE & PACKING
# ====================
sequence_len: 8192
sample_packing: true
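# packs multiple short conversations into each 8192-token window to cut padding waste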
eval_sample_packing: false
pad_to_sequence_len: true
# ====================
# HARDWARE OPTIMIZATIONS
# ====================
bf16: auto
flash_attention: true
gradient_checkpointing: true
# ====================
# EVALUATION & CHECKPOINTING
# ====================
save_strategy: steps
save_steps: 5
save_total_limit: 5 # Keep best + last few checkpoints
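# note: load_best_model_at_end only takes effect if an eval split is configured (e.g. val_set_size)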
load_best_model_at_end: true
greater_is_better: false
# ====================
# LOGGING & OUTPUT
# ====================
output_dir: ./MS3-2-SFT-2
logging_steps: 2
save_safetensors: true
# ====================
# WANDB TRACKING
# ====================
wandb_project: MS3-2-SFT
wandb_entity: your_entity
wandb_name: run_name
Base model: mistralai/Mistral-Small-3.1-24B-Base-2503