CLaRa-7B-Instruct (Compression-16 & 128)
The CLaRa-7B-Instruct model is our instruction-tuned unified RAG model with built-in semantic document compression (16× and 128×).
It supports instruction-following QA directly from compressed document representations.
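As a rough illustration of what the two compression rates mean for the generator's context, the sketch below assumes that a rate of r maps each document to about one continuous vector per r input tokens, rounded up. This is only a back-of-the-envelope model. The exact number of vectors CLaRa produces per document is not stated here, and `compressed_slots` and `context_budget` are hypothetical helpers.

```python
import math
from typing import Iterable


def compressed_slots(num_tokens: int, rate: int) -> int:
    """Continuous vectors for one document, ASSUMING one vector per
    `rate` input tokens, rounded up (a hypothetical model of compression)."""
    if num_tokens < 0 or rate <= 0:
        raise ValueError("num_tokens must be >= 0 and rate must be > 0")
    return math.ceil(num_tokens / rate)


def context_budget(doc_token_counts: Iterable[int], rate: int) -> int:
    """Total compressed vectors the generator would read for all documents."""
    return sum(compressed_slots(n, rate) for n in doc_token_counts)


# Five retrieved passages of 256 tokens each (1,280 raw tokens in total).
passages = [256] * 5
for rate in (16, 128):
    print(f"{rate}x compression -> {context_budget(passages, rate)} vectors")
# prints:
# 16x compression -> 80 vectors
# 128x compression -> 10 vectors
```

Under that assumption, five 256-token passages shrink from 1,280 tokens to 80 vectors at 16× and to 10 vectors at 128×.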
Training recipe: Instruction tuning on QA-style tasks, starting from the base semantic compression model.
Benchmarks: Strong instruction-following performance under 16× compression.
More details and usage examples:
Paper: CLaRa: Bridging Retrieval and Generation with Continuous Latent Reasoning
GitHub: https://github.com/apple/ml-clara
Example Usage (Instruction-Tuned Inference)
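from transformers import AutoModel
# Load the 16x checkpoint. The path below is a local mount point;
# replace it with the location of your copy of the compression-16 weights.
unirag = AutoModel.from_pretrained(
    "/mnt/ceph_rbd/model/CLaRa-7B-Instruct/compression-16",
    trust_remote_code=True,  # the model relies on its own custom modeling code
).to("cuda")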
# One inner list of retrieved passages per question in `questions`
documents = [
    [
        "Weldenia is a monotypic genus of flowering plant in the family Commelinaceae...",
        "Hagsatera is a genus of flowering plants from the orchid family...",
        "Alsobia is a genus of flowering plants in the family Gesneriaceae..."
    ]
]
questions = [
    "Which genus of plant grows originally in Mexico and Guatemala, Phylica or Weldenia?"
]
# Instruction-tuned usage
out = unirag.generate_from_text(
    questions=questions,
    documents=documents,
    max_new_tokens=64,
)
print("Generated answer:", out)
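In the example, `documents` is nested: each entry of `questions` is paired by position with its own inner list of passages. If your retriever instead returns flat (question, score, passage) hits, a small helper can regroup them into that shape before calling `generate_from_text`. This is a sketch assuming that positional pairing; `group_documents`, the hit tuple format, and `top_k` are hypothetical and not part of the CLaRa API.

```python
from collections import defaultdict


def group_documents(questions, hits, top_k=3):
    """Regroup flat retriever hits into the nested `documents` shape.

    Hypothetical helper: `hits` is assumed to be an iterable of
    (question_index, score, passage_text) tuples from your own retriever.
    Returns one inner list per question, best-scoring passages first.
    """
    buckets = defaultdict(list)
    for q_idx, score, text in hits:
        if not 0 <= q_idx < len(questions):
            raise IndexError(f"hit refers to unknown question index {q_idx}")
        buckets[q_idx].append((score, text))

    documents = []
    for q_idx in range(len(questions)):
        # Sort this question's passages by score (descending) and keep top_k.
        ranked = sorted(buckets[q_idx], key=lambda pair: pair[0], reverse=True)
        documents.append([text for _, text in ranked[:top_k]])
    return documents


questions = ["Question one?", "Question two?"]
hits = [
    (0, 0.90, "passage A"),
    (1, 0.50, "passage B"),
    (0, 0.95, "passage C"),
    (0, 0.10, "passage D"),
]
documents = group_documents(questions, hits, top_k=2)
print(documents)  # [['passage C', 'passage A'], ['passage B']]
```

A question with no hits still gets an empty inner list, so `documents` always stays aligned with `questions` by index.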
Model tree for apple/CLaRa-7B-Instruct
Base model: mistralai/Mistral-7B-Instruct-v0.2