chandra-bf16-mlx
I am making this available for a limited time for testing; it will be removed in a week.
Chandra is an OCR model that outputs markdown, HTML, and JSON. It is highly accurate at extracting text from images and PDFs, while preserving layout information.
You can try Chandra in the free playground here, or at a hosted API here.
Features
- Convert documents to markdown, HTML, or JSON with detailed layout information
- Good handwriting support
- Reconstructs forms accurately, including checkboxes
- Good support for tables, math, and complex layouts
- Extracts images and diagrams, with captions and structured data
- Support for 40+ languages
See the original model card for details on usage from the command line.
This model chandra-bf16-mlx was converted to MLX format from datalab-to/chandra using mlx-vlm version 0.3.4.
Using mlx tools
pip install mlx-lm
from mlx_lm import load, generate
model, tokenizer = load("chandra-bf16-mlx")
prompt = "hello"
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )
response = generate(model, tokenizer, prompt=prompt, verbose=True)
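The snippet above sends a text-only prompt through mlx-lm. Since this model was converted with mlx-vlm and OCR needs an image as input, a minimal sketch of an image-based call through mlx-vlm might look like the following. It assumes `pip install mlx-vlm` and that the installed mlx-vlm version can load this model. The helper name `run_ocr`, the `MODEL_PATH` constant, the `max_tokens` default, and the instruction text are illustrative assumptions, not prompts or settings documented for Chandra; see the original model card for the intended prompts.

```python
from pathlib import Path

# Assumption: same local path / repo id as used in the mlx-lm snippet above.
MODEL_PATH = "chandra-bf16-mlx"


def run_ocr(image_path: str, instruction: str, max_tokens: int = 2048) -> str:
    """Hypothetical helper: run one page image through the model with mlx-vlm."""
    # Fail early on a bad path, before any model weights are loaded.
    if not Path(image_path).is_file():
        raise FileNotFoundError(f"No image found at {image_path}")

    # Imported lazily so the path check above works without MLX installed.
    from mlx_vlm import load, generate
    from mlx_vlm.prompt_utils import apply_chat_template
    from mlx_vlm.utils import load_config

    model, processor = load(MODEL_PATH)
    config = load_config(MODEL_PATH)

    # Build a chat-formatted prompt that reserves a slot for one image.
    prompt = apply_chat_template(processor, config, instruction, num_images=1)

    result = generate(
        model,
        processor,
        prompt,
        image=[image_path],
        max_tokens=max_tokens,
        verbose=False,
    )
    # Recent mlx-vlm releases return a result object with a .text field;
    # older releases return a plain string.
    return getattr(result, "text", result)


if __name__ == "__main__":
    # "page.png" and the instruction are placeholders for illustration only.
    print(run_ocr("page.png", "Convert this page to markdown."))
```

Loading the model inside the helper keeps the sketch short; for more than a few pages it would be cheaper to call `load` once and reuse the model and processor across images.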