CLaRa-7B-Instruct (Compression-16 & 128)

The CLaRa-7B-Instruct model is our instruction-tuned unified retrieval-augmented generation (RAG) model with built-in semantic document compression (16× and 128×).
It supports instruction-following QA directly from compressed document representations.
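
As a rough illustration of what the two compression ratios mean in practice, the sketch below estimates how many compressed representations a document would occupy. It assumes that "16×" means roughly one compressed vector per 16 input tokens and that any remainder is rounded up; the helper name `compressed_length` is hypothetical, and the model's actual rounding or padding may differ.

```python
import math


def compressed_length(num_tokens: int, ratio: int) -> int:
    """Estimate the number of compressed vectors for a document.

    Assumption: ceiling division of token count by compression ratio.
    This is an illustration, not the model's documented behavior.
    """
    if num_tokens < 0 or ratio <= 0:
        raise ValueError("num_tokens must be >= 0 and ratio must be > 0")
    return math.ceil(num_tokens / ratio)


# Compare the two published ratios on a 256-token document.
for ratio in (16, 128):
    print(f"{ratio}x: 256 tokens -> {compressed_length(256, ratio)} vectors")
```

Under these assumptions, the 128× variant hands the generator far fewer vectors per document than the 16× variant, which is the trade-off between context length and retained detail.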

Training recipe: Instruction tuning on QA-style tasks built on top of the base semantic compression model.
Benchmarks: Strong instruction-following performance under 16× compression.

More details and usage examples:

Paper: CLaRa: Bridging Retrieval and Generation with Continuous Latent Reasoning
GitHub: https://github.com/apple/ml-clara

Example Usage (Instruction-Tuned Inference)

from transformers import AutoModel

# Load the compression-16 checkpoint from a local path (replace with your own).
# trust_remote_code=True lets transformers run the model code shipped with the checkpoint.
unirag = AutoModel.from_pretrained(
    "/mnt/ceph_rbd/model/CLaRa-7B-Instruct/compression-16",
    trust_remote_code=True
).to("cuda")

# One inner list of candidate documents for each question.
documents = [
    [
        "Weldenia is a monotypic genus of flowering plant in the family Commelinaceae...",
        "Hagsatera is a genus of flowering plants from the orchid family...",
        "Alsobia is a genus of flowering plants in the family Gesneriaceae..."
    ]
]

questions = [
    "Which genus of plant grows originally in Mexico and Guatemala, Phylica or Weldenia?"
]

# Instruction-tuned usage: answer each question from its compressed documents.
out = unirag.generate_from_text(
    questions=questions,
    documents=documents,
    max_new_tokens=64
)

print("Generated answer:", out)
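
The example passes `questions` and `documents` as parallel lists. A minimal sketch of a helper that builds both lists from (question, documents) pairs is shown below. It assumes the i-th inner document list is paired with the i-th question, which the single-example layout suggests but the card does not state; `build_batch` is a hypothetical name and not part of the CLaRa code.

```python
from typing import List, Sequence, Tuple


def build_batch(
    examples: Sequence[Tuple[str, Sequence[str]]],
) -> Tuple[List[str], List[List[str]]]:
    """Split (question, documents) pairs into two index-aligned lists.

    Assumption: the model pairs questions[i] with documents[i].
    """
    questions: List[str] = []
    documents: List[List[str]] = []
    for question, docs in examples:
        if not question.strip():
            raise ValueError("each example needs a non-empty question")
        if not docs:
            raise ValueError("each question needs at least one document")
        questions.append(question)
        documents.append(list(docs))
    return questions, documents


examples = [
    (
        "Which genus of plant grows originally in Mexico and Guatemala, Phylica or Weldenia?",
        ["Weldenia is a monotypic genus of flowering plant in the family Commelinaceae..."],
    ),
    (
        "Which family does the genus Alsobia belong to?",
        ["Alsobia is a genus of flowering plants in the family Gesneriaceae..."],
    ),
]

questions, documents = build_batch(examples)
print(len(questions), "questions,", len(documents), "document lists")
```

The resulting `questions` and `documents` can then be passed to `unirag.generate_from_text(...)` in the same way as in the example above.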


Base model: mistralai/Mistral-7B-Instruct-v0.2