TinyPi-Chat-V1

TinyPi-Chat-V1 is a fine-tuned version of the TinyLlama/TinyLlama-1.1B-Chat-v1.0 model. This project's goal was not to create a simple instruction-following assistant, but to cultivate an AI with a distinct, friendly, and engaging personality, mirroring the natural, witty, and sometimes quirky style of general-purpose Discord conversations. It was trained on a large dataset of chat logs, resulting in a model that excels at open-ended conversation, offers playful and sometimes evasive humor, and can maintain a consistent character.

This version (v1) is the initial, highly specialized fine-tune and serves as the foundation for further alignment using techniques like reinforcement learning from AI feedback (RLAIF).

How to Use

This model is a merged, standalone model and can be used directly for text generation. It follows a specific chat template that must be used to get the best results.

Installation

pip install transformers torch accelerate

from transformers import pipeline
import torch

model_path = "Kittykat924/TinyPi-chat-V1"
pipe = pipeline(
    "text-generation",
    model=model_path,
    torch_dtype=torch.float16,
    device_map="auto"
)

prompt = "What do you think of today?"

messages = [
    {"role": "user", "content": prompt},
]

prompt_formatted = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

outputs = pipe(
    prompt_formatted,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.95
)

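# The pipeline returns the prompt followed by the completion,
# so keep only the text that comes after the assistant tag.
response = outputs[0]["generated_text"]
assistant_response = response.split("<|assistant|>")[1].strip()
print(assistant_response)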
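To make the chat template requirement more concrete, here is a minimal sketch of what the formatted prompt probably looks like and how the reply can be pulled out defensively. It assumes this model keeps the Zephyr-style layout of its TinyLlama base (`<|user|>` and `<|assistant|>` tags, `</s>` closing each turn). `build_prompt` and `extract_reply` are hypothetical helpers written for illustration, and `pipe.tokenizer.apply_chat_template` remains the authoritative way to format prompts.

```python
# Assumed markers, taken from the TinyLlama base model's template;
# check the tokenizer of this model before relying on them.
EOS = "</s>"
ASSISTANT_TAG = "<|assistant|>"


def build_prompt(messages):
    """Render chat messages in the assumed Zephyr-style layout."""
    parts = []
    for message in messages:
        parts.append(f"<|{message['role']}|>\n{message['content']}{EOS}\n")
    # Open an assistant turn so the model continues from here.
    parts.append(f"{ASSISTANT_TAG}\n")
    return "".join(parts)


def extract_reply(generated_text):
    """Return the first assistant turn, or the whole text if no tag is found."""
    pieces = generated_text.split(ASSISTANT_TAG)
    if len(pieces) < 2:
        return generated_text.strip()
    return pieces[1].strip()


prompt = build_prompt([{"role": "user", "content": "What do you think of today?"}])
print(repr(prompt))
```

Unlike indexing the split result directly, `extract_reply` does not raise an `IndexError` when the assistant tag is missing from the output.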

Training Procedure

This model was trained using a custom script built on the Hugging Face accelerate, peft, and datasets libraries.

v1 Fine-tuning Details

Base Model: TinyLlama/TinyLlama-1.1B-Chat-v1.0

Dataset: A large, private dataset of over 2 million general-purpose Discord chat messages.

Training Method: Parameter-Efficient Fine-Tuning (PEFT) using the LoRA technique.

Hardware: 2x NVIDIA T4 GPUs

Framework: accelerate for distributed training.

Key Hyperparameters:

Learning Rate: 2e-4

LoRA r (rank): 64

LoRA alpha: 16

Batch Size: 4 per device

Gradient Accumulation: 4 steps

Optimizer: AdamW
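Two derived quantities follow from the hyperparameters listed above: the number of sequences per optimizer step and the scaling applied to the LoRA update. The sketch below works them out, assuming both T4 GPUs ran as data-parallel replicas (the card says accelerate was used for distributed training but does not spell out the parallelism mode) and assuming standard LoRA scaling of alpha / r. The variable names are illustrative, not taken from the training script.

```python
# Values copied from the hyperparameter list.
per_device_batch_size = 4
gradient_accumulation_steps = 4
lora_rank = 64
lora_alpha = 16

# Assumption: both GPUs hold a full replica and split each batch between them.
num_gpus = 2

# Sequences contributing to each optimizer update.
effective_batch_size = per_device_batch_size * gradient_accumulation_steps * num_gpus

# Standard LoRA multiplies the low-rank update by alpha / r.
lora_scaling = lora_alpha / lora_rank

print(f"sequences per optimizer step: {effective_batch_size}")  # 32
print(f"LoRA update scaling (alpha / r): {lora_scaling}")       # 0.25
```

With alpha smaller than the rank, the adapter's contribution is scaled down to a quarter, which is a fairly conservative setting for a rank of 64.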

The model was trained for approximately 2500 steps. The final adapter was chosen by lowest validation loss, which occurred very early in training (around step 200) and indicates rapid specialization on the dataset. The final merged model uses the weights from this checkpoint.
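The checkpoint selection described above can be sketched as a simple minimum over an evaluation log. This is an illustration of the idea rather than the project's actual script: `select_best_checkpoint` is a hypothetical helper, and the loss values are made-up placeholders, not the real training logs.

```python
def select_best_checkpoint(eval_log):
    """Return the (step, validation_loss) record with the lowest loss."""
    if not eval_log:
        raise ValueError("eval_log is empty")
    return min(eval_log, key=lambda record: record[1])


# Placeholder numbers for illustration only; these are NOT the project's logs.
example_log = [(100, 2.10), (200, 1.95), (300, 1.98), (2500, 2.30)]

best_step, best_loss = select_best_checkpoint(example_log)
print(f"best checkpoint: step {best_step} (validation loss {best_loss})")
```

When validation loss bottoms out early and then rises, later checkpoints are fitting the training data more closely at the expense of held-out chat, which is why the early adapter is the one worth merging.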

Project Goals

The primary goal of this project was to explore the emergence of personality in language models. Instead of optimizing for factual accuracy or instruction-following, the training was designed to capture the nuances of human-to-human digital interaction. The success of this v1 model lies in its ability to generate responses that are not just correct but believable and in-character.

The "weirdness" and occasional abstract responses are not viewed as bugs, but as features of a model that has learned a rich but ungrounded set of conversational styles.

Limitations and Bias

This model was trained on a large corpus of public internet chat data. As such, it may have inherited biases, opinions, and language styles present in that data. It is not designed to be a source of factual information and may produce incorrect or nonsensical statements, especially on topics outside its training domain. It is intended for research and entertainment purposes. User discretion is advised.

-kittykat924