chandra-bf16-mlx

This model is available for a limited time for testing and will be removed in a week.

Chandra is an OCR model that outputs markdown, HTML, and JSON. It is highly accurate at extracting text from images and PDFs, while preserving layout information.

You can try Chandra in a free playground or through a hosted API.

Features

Converts documents to markdown, HTML, or JSON with detailed layout information
Good handwriting support
Reconstructs forms accurately, including checkboxes
Good support for tables, math, and complex layouts
Extracts images and diagrams, with captions and structured data
Support for 40+ languages

See the original datalab-to/chandra model card for details on command-line usage.

This model, chandra-bf16-mlx, was converted to MLX format from datalab-to/chandra using mlx-vlm version 0.3.4.

Using mlx tools

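```bash
pip install mlx-lm
```

```python
from mlx_lm import load, generate

# Load the converted weights (local directory or Hugging Face repo id)
model, tokenizer = load("chandra-bf16-mlx")

# mlx-lm handles text only, so this prompt carries no image
prompt = "hello"

# Wrap the prompt in the model's chat template when one is defined
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

# Generate the response, printing tokens as they stream
response = generate(model, tokenizer, prompt=prompt, verbose=True)
```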
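The mlx-lm snippet above only sends text, so it cannot give the model a page image to OCR. Because this checkpoint was converted with mlx-vlm, a minimal sketch of image input through mlx-vlm's Python API might look like the following. It assumes a recent mlx-vlm install on Apple silicon and that "chandra-bf16-mlx" resolves to this model (a local path or the full repo id). The `ocr_image` helper and `DEFAULT_PROMPT` are hypothetical names, and the prompt text is a placeholder, not Chandra's official OCR prompt; check the original datalab-to/chandra model card for the prompts it was trained with.

```python
# Minimal sketch: OCR one page image with mlx-vlm.
# Assumptions: mlx-vlm is installed (pip install -U mlx-vlm) on Apple silicon,
# and model_path points at this converted model.

# Hypothetical placeholder prompt, NOT Chandra's official OCR prompt.
DEFAULT_PROMPT = "Convert this page to markdown."


def ocr_image(model_path: str, image_path: str,
              prompt: str = DEFAULT_PROMPT, max_tokens: int = 2048) -> str:
    """Hypothetical helper: run one image through the model and return its text."""
    # Imported inside the function so this file can be loaded without mlx-vlm.
    from mlx_vlm import generate, load
    from mlx_vlm.prompt_utils import apply_chat_template
    from mlx_vlm.utils import load_config

    model, processor = load(model_path)
    config = load_config(model_path)
    images = [image_path]

    # Wrap the text prompt in the model's chat template, reserving image slots.
    formatted = apply_chat_template(
        processor, config, prompt, num_images=len(images)
    )

    # max_tokens is raised because a full page of output can be long.
    result = generate(
        model, processor, formatted, image=images,
        max_tokens=max_tokens, verbose=False,
    )
    # Some mlx-vlm versions return a result object with a .text field,
    # others a plain string; handle both.
    return getattr(result, "text", result)
```

Calling `ocr_image("chandra-bf16-mlx", "page.png")` (where `page.png` is a placeholder path) returns the generated text. Loading the model inside the helper keeps the sketch short; for many pages you would load once and reuse `model` and `processor`.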

Format: Safetensors
Model size: 9B params
Tensor type: BF16