malayalam_Llama-3.2-3B-Instruct-bnb-4bit

This repository contains a LoRA (Low-Rank Adaptation) adapter fine-tuned from meta-llama/Llama-3.2-3B-Instruct, specifically adapted for instruction-following tasks in the Malayalam language.

The model was fine-tuned using QLoRA, which trains low-rank adapters on top of a 4-bit-quantized base model, keeping memory requirements modest during both training and inference. The training was performed on the excellent VishnuPJ/Alpaca_Instruct_Malayalam dataset.
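
For reference, the dataset can be pulled directly from the Hub with the datasets library. This is a minimal sketch; the column names are an assumption based on the standard Alpaca layout, so check dataset.column_names against the actual dataset.

from datasets import load_dataset

# Load the Malayalam Alpaca instruction dataset from the Hugging Face Hub
dataset = load_dataset("VishnuPJ/Alpaca_Instruct_Malayalam", split="train")

# Inspect the schema and one example. The instruction/input/output column names
# are assumed from the standard Alpaca layout - verify with column_names.
print(dataset.column_names)
print(dataset[0])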

This model is designed to understand and respond to a wide variety of instructions in Malayalam, from simple questions to more complex tasks like summarization, translation, and creative writing.

💡 How to Use

This is a LoRA adapter, not a full model. To use it, you must first load the base model and then apply this adapter on top. The following code shows how to do this correctly.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

# [Action Required] - Please verify the base model identifier.
# Llama 3.2's text models come in 1B and 3B sizes; the adapter name suggests the
# 3B variant was used. Update this path if you fine-tuned from a different base
# (for example, an Unsloth bnb-4bit checkpoint).
base_model_id = "meta-llama/Llama-3.2-3B-Instruct"
adapter_id = "mangalathkedar/malayalam_Llama-3.2-3B-Instruct-bnb-4bit"

# 1. Configure 4-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

# 2. Load the base model with quantization
print(f"Loading base model: {base_model_id}")
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
    # Make sure you are logged in to Hugging Face and have access to the base model
    # token="YOUR_HF_TOKEN" 
)

# 3. Load the tokenizer
# The tokenizer is saved with the adapter, so we load it from the adapter repo
tokenizer = AutoTokenizer.from_pretrained(adapter_id)
# Ensure the pad token is set for batching (if not already set)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# 4. Apply the LoRA adapter to the base model
print(f"Applying LoRA adapter: {adapter_id}")
model = PeftModel.from_pretrained(base_model, adapter_id)

# 5. Prepare and run inference
alpaca_prompt_template = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

instruction = "കേരളത്തെക്കുറിച്ച് ഒരു വാക്യം എഴുതുക." # "Write a sentence about Kerala."
inputs = tokenizer(
    [
        alpaca_prompt_template.format(instruction, "", "")
    ], return_tensors="pt"
).to("cuda")

print("Generating response...")
outputs = model.generate(**inputs, max_new_tokens=100, use_cache=True)
response_text = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]

# Clean up the output to only show the response part
response_only = response_text.split("### Response:")[1].strip()
print(response_only)
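
If you would rather ship a standalone checkpoint than load the adapter at runtime, the LoRA weights can be folded into the base model with peft's merge_and_unload. The sketch below is an optional extra: it reuses base_model_id, adapter_id, and tokenizer from the example above, assumes the base model is reloaded in 16-bit precision (merging into 4-bit-quantized weights is not supported), and the output directory name is only an example.

import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Reload the base model in bf16 - merging requires unquantized weights
base_model_fp16 = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Apply the adapter, then fold its weights into the base model
merged_model = PeftModel.from_pretrained(base_model_fp16, adapter_id).merge_and_unload()

# Save the merged model and tokenizer (the directory name is illustrative)
merged_model.save_pretrained("malayalam-llama-3.2-merged")
tokenizer.save_pretrained("malayalam-llama-3.2-merged")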

📚 Training Details

Base Model: meta-llama/Llama-3.2-3B-Instruct (or a compatible bnb-4bit variant), loaded in 4-bit for QLoRA training.
Dataset: VishnuPJ/Alpaca_Instruct_Malayalam - a high-quality instruction dataset translated into Malayalam.
Fine-tuning Method: QLoRA (4-bit NormalFloat quantization with LoRA adapters).
Frameworks: Hugging Face transformers, peft, bitsandbytes, and accelerate. A minimal sketch of this kind of setup is shown below.
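
For readers who want to reproduce a similar adapter, here is a minimal sketch of a QLoRA run with peft and trl's SFTTrainer. The LoRA rank, target modules, dataset column names, and training hyperparameters are illustrative assumptions rather than the exact values used for this adapter, and the SFTTrainer/SFTConfig API varies slightly between trl versions.

import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, prepare_model_for_kbit_training
from trl import SFTConfig, SFTTrainer

base_model_id = "meta-llama/Llama-3.2-3B-Instruct"  # assumed base, see the note above

# Load the base model in 4-bit NF4, as in the inference example
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    base_model_id, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)
tokenizer = AutoTokenizer.from_pretrained(base_model_id)

# Illustrative LoRA hyperparameters - not the exact values used for this adapter
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.0,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# Render each record into an Alpaca-style prompt; adjust the template to match
# the one shown in the usage example above. Column names are assumed.
alpaca_template = """### Instruction:
{}

### Input:
{}

### Response:
{}"""

def to_text(example):
    return {"text": alpaca_template.format(
        example["instruction"], example.get("input", ""), example["output"]
    ) + tokenizer.eos_token}

dataset = load_dataset("VishnuPJ/Alpaca_Instruct_Malayalam", split="train").map(to_text)

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=lora_config,
    args=SFTConfig(
        output_dir="outputs",
        dataset_text_field="text",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        num_train_epochs=1,
        learning_rate=2e-4,
        bf16=True,
        logging_steps=10,
    ),
)
trainer.train()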

⚠️ Limitations and Ethical Considerations

Language Focus: This model is specialized for Malayalam. Its performance on other languages, including English, will be suboptimal.
Hallucinations: Like all LLMs, this model may generate factually incorrect or nonsensical information. Always verify critical information.
Bias: The model's responses are influenced by the data it was trained on. It may reflect biases present in the Alpaca_Instruct_Malayalam dataset or the underlying Llama 3.2 model.
Not for Critical Advice: Do not use this model for medical, legal, or financial advice.

Citation

If you use this model in your work, please consider citing its creators, the dataset author, and the base model authors.

BibTeX
@misc{mangalathkedar_malayalam_llama32_3b,
  author = {Kedar Mangalath},
  title = {malayalam_Llama-3.2-3B-Instruct-bnb-4bit: A LoRA fine-tuned model for Malayalam},
  year = {2024},
  publisher = {Hugging Face},
  journal = {Hugging Face repository},
  howpublished = {\url{https://huggingface.co/mangalathkedar/malayalam_Llama-3.2-3B-Instruct-bnb-4bit}},
}

BibTeX
@misc{vishnupj_2023_alpaca_instruct_malayalam,
    author       = {Vishnu P J},
    title        = {{Alpaca Instruct Malayalam Dataset}},
    month        = nov,
    year         = 2023,
    publisher    = {Hugging Face},
    version      = {1.0.0},
    doi          = {10.57967/hf/1149},
    url          = {https://huggingface.co/datasets/VishnuPJ/Alpaca_Instruct_Malayalam}
}