# LFM2.5-1.2B-Base
LFM2.5 is a new family of hybrid models designed for on-device deployment. It builds on the LFM2 architecture with extended pre-training and reinforcement learning.
Find more information about LFM2.5 in our blog post.
## Model Details
| Model | Parameters | Description |
|---|---|---|
| LFM2.5-1.2B-Base | 1.2B | Pre-trained base model for fine-tuning |
| LFM2.5-1.2B-Instruct | 1.2B | General-purpose instruction-tuned model |
| LFM2.5-1.2B-JP | 1.2B | Japanese-optimized chat model |
| LFM2.5-VL-1.6B | 1.6B | Vision-language model with fast inference |
| LFM2.5-Audio-1.5B | 1.5B | Audio-language model for speech and text I/O |
LFM2.5-1.2B-Base is the pre-trained text-only checkpoint, used to create all the LFM2.5-1.2B variants. It has the following features:
- Number of parameters: 1.17B
- Number of layers: 16 (10 double-gated LIV convolution blocks + 6 GQA blocks)
- Training budget: 28T tokens
- Context length: 32,768 tokens
- Vocabulary size: 65,536
- Languages: English, Arabic, Chinese, French, German, Japanese, Korean, Spanish
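These architecture details can be read directly from the model configuration. Below is a minimal sketch using the standard Transformers `AutoConfig` API; the attribute names (`num_hidden_layers`, `vocab_size`, `max_position_embeddings`) are generic Transformers config fields and are assumed here, since the LFM2.5 config class may expose different names.

```python
from transformers import AutoConfig

# Download only the configuration (no weights) to inspect the architecture.
config = AutoConfig.from_pretrained("LiquidAI/LFM2.5-1.2B-Base")

# Generic Transformers config attributes; the LFM2.5 config class may use
# different names, so treat these as illustrative checks.
print(config.num_hidden_layers)        # expected: 16
print(config.vocab_size)               # expected: 65536
print(config.max_position_embeddings)  # expected: 32768
```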
LFM2.5-1.2B-Base is distributed in the following formats:

| Model | Description |
|---|---|
| LFM2.5-1.2B-Base | Original model checkpoint in native format. Best for fine-tuning or inference with Transformers and vLLM. |
| LFM2.5-1.2B-Base-GGUF | Quantized format for llama.cpp and compatible tools. Optimized for CPU inference and local deployment with reduced memory usage. |
| LFM2.5-1.2B-Base-ONNX | ONNX Runtime format for cross-platform deployment. Enables hardware-accelerated inference across diverse environments (cloud, edge, mobile). |
This pre-trained checkpoint is only recommended for tasks that require heavy fine-tuning, like language-specific (e.g., Japanese) or domain-specific (e.g., medical) assistants, training on proprietary data, or experimenting with novel post-training approaches.
## Inference
LFM2.5 is supported by many inference frameworks. See the Inference documentation for the full list.
| Name | Description | Docs |
|---|---|---|
| Transformers | Simple inference with direct access to model internals. | Link |
| vLLM | High-throughput production deployments on GPUs. | Link |
| llama.cpp | Cross-platform inference with CPU offloading. | Link |
Here's a quick start example with transformers:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

# Load the model and tokenizer
model_id = "LiquidAI/LFM2.5-1.2B-Base"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    dtype="bfloat16",
    # attn_implementation="flash_attention_2" <- uncomment on compatible GPU
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Stream decoded tokens to stdout as they are generated
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

# Build the input ids with the chat template
prompt = "What is C. elegans?"
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
    tokenize=True,
).to(model.device)

# Generate the response
output = model.generate(
    input_ids,
    do_sample=True,
    temperature=0.3,
    min_p=0.15,
    repetition_penalty=1.05,
    max_new_tokens=512,
    streamer=streamer,
)
```
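For higher-throughput serving, a minimal vLLM sketch is shown below. It assumes a vLLM build that already supports the LFM2.5 architecture; the sampling values mirror the Transformers example above, and the raw-completion call (no chat template) matches the base-checkpoint use case.

```python
from vllm import LLM, SamplingParams

# Load the model (assumes a vLLM version with LFM2.5 support).
llm = LLM(model="LiquidAI/LFM2.5-1.2B-Base", max_model_len=32768)

# Mirror the sampling settings from the Transformers example above.
params = SamplingParams(
    temperature=0.3,
    min_p=0.15,
    repetition_penalty=1.05,
    max_tokens=512,
)

# Plain completion on the base checkpoint (no chat template applied).
outputs = llm.generate(["What is C. elegans?"], params)
print(outputs[0].outputs[0].text)
```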
## Fine-tuning
We recommend fine-tuning LFM2.5 for your specific use case to achieve the best results.
| Name | Description | Docs |
|---|---|---|
| SFT (Unsloth) | Supervised Fine-Tuning with LoRA using Unsloth. | Link |
| SFT (TRL) | Supervised Fine-Tuning with LoRA using TRL. | Link |
| DPO (TRL) | Direct Preference Optimization with LoRA using TRL. | Link |
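As a starting point before working through the linked docs, here is a minimal, hedged sketch of SFT with LoRA using TRL and PEFT. The dataset (`trl-lib/Capybara`), hyperparameters, and `target_modules="all-linear"` choice are illustrative assumptions rather than settings taken from the official recipes.

```python
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Illustrative dataset; replace with your own conversational or domain data.
dataset = load_dataset("trl-lib/Capybara", split="train")

# LoRA on all linear layers as a generic default for this hybrid architecture.
peft_config = LoraConfig(r=16, lora_alpha=32, target_modules="all-linear")

trainer = SFTTrainer(
    model="LiquidAI/LFM2.5-1.2B-Base",
    train_dataset=dataset,
    peft_config=peft_config,
    args=SFTConfig(
        output_dir="lfm2.5-1.2b-sft-lora",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        num_train_epochs=1,
        learning_rate=2e-4,
        bf16=True,
    ),
)
trainer.train()
```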
## Contact
For enterprise solutions and edge deployment, contact sales@liquid.ai.
## Citation
```bibtex
@article{liquidai2025lfm2,
  title={LFM2 Technical Report},
  author={Liquid AI},
  journal={arXiv preprint arXiv:2511.23404},
  year={2025}
}
```