---
title: Ollama Generate API
emoji: 🦙
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
app_port: 7860
---

# Ollama Generate API

A simple REST API for text generation using Ollama models on Hugging Face Spaces.

## Features

- 🦙 Generate text using Ollama models
- 🎛️ Configurable parameters (temperature, top_p, max_tokens)
- 📊 Health monitoring
- 🚀 Simple and lightweight API

## API Endpoints

### Health Check

- `GET /health` - Check whether the Ollama service is running
- `GET /` - API information and usage examples

### Text Generation

- `POST /generate` - Generate a text completion

## Usage Examples

### Check Health

```bash
curl "https://your-space.hf.space/health"
```

### Generate Text

```bash
curl -X POST "https://your-space.hf.space/generate" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tinyllama",
    "prompt": "The future of AI is",
    "temperature": 0.7,
    "max_tokens": 100
  }'
```

### API Information

```bash
curl "https://your-space.hf.space/"
```

## Request Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `model` | string | required | Model name (e.g., `"tinyllama"`) |
| `prompt` | string | required | Input text prompt |
| `temperature` | float | 0.7 | Sampling temperature (0.0-2.0) |
| `top_p` | float | 0.9 | Top-p (nucleus) sampling (0.0-1.0) |
| `max_tokens` | integer | 512 | Maximum tokens to generate (1-4096) |
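
For reference, a `/generate` payload that sets every documented parameter might look like the sketch below; any field left out is assumed to fall back to the default listed in the table.

```python
# A /generate payload that sets every documented parameter explicitly.
# Omitted optional fields are assumed to use the defaults above.
payload = {
    "model": "tinyllama",                     # required
    "prompt": "Write a haiku about llamas.",  # required
    "temperature": 0.2,                       # 0.0-2.0
    "top_p": 0.9,                             # 0.0-1.0
    "max_tokens": 256,                        # 1-4096
}
```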

## Supported Models

This API works with any Ollama model that has been pulled on the Space. Recommended lightweight models for Hugging Face Spaces:

- `tinyllama` - Very small and fast (~600MB)
- `phi` - Small but capable (~1.6GB)
- `llama2:7b` - Larger but more capable (~3.8GB)
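
Switching models only requires changing the `model` field, with the caveat that the model must already be pulled on the Space (the default startup script only pulls `tinyllama`; see the setup notes below). A rough comparison sketch:

```python
import requests

BASE_URL = "https://your-space.hf.space"  # placeholder

# Compare two models on the same prompt; "phi" must be pulled on the Space first.
for model in ["tinyllama", "phi"]:
    r = requests.post(
        f"{BASE_URL}/generate",
        json={"model": model, "prompt": "Explain recursion in one sentence.", "max_tokens": 60},
        timeout=180,
    )
    r.raise_for_status()
    print(model, "->", r.json()["response"])
```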

## Interactive Documentation

Once deployed, visit `/docs` for interactive API documentation powered by FastAPI.

## Setup Notes

- The startup script automatically pulls the `tinyllama` model
- The first generation may be slower while the model loads
- Lightweight models are recommended for better performance on limited resources
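
Given the cold-start behavior above, a client can poll `/health` until the service responds and allow a generous timeout for the first request. A minimal sketch (assuming `/health` returns HTTP 200 once Ollama is reachable):

```python
import time
import requests

BASE_URL = "https://your-space.hf.space"  # placeholder

# Poll /health until the service responds, then allow a generous timeout
# for the first generation while the model loads into memory.
for _ in range(30):
    try:
        if requests.get(f"{BASE_URL}/health", timeout=5).ok:
            break
    except requests.RequestException:
        pass
    time.sleep(10)

first = requests.post(
    f"{BASE_URL}/generate",
    json={"model": "tinyllama", "prompt": "Hello", "max_tokens": 16},
    timeout=300,  # the first call can be slow while the model loads
)
print(first.json())
```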

## Example Response

```json
{
  "model": "tinyllama",
  "response": "The future of AI is bright and full of possibilities...",
  "done": true,
  "total_duration": 1234567890,
  "load_duration": 123456789,
  "prompt_eval_count": 10,
  "eval_count": 25
}
```
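
The duration fields appear to be nanosecond counters (an assumption based on the usual Ollama API convention and the magnitudes shown), so a rough throughput figure can be derived from the response:

```python
# Parse a /generate response and derive rough throughput.
# Assumes durations are reported in nanoseconds, as in the Ollama API.
data = {
    "model": "tinyllama",
    "response": "The future of AI is bright and full of possibilities...",
    "done": True,
    "total_duration": 1234567890,
    "load_duration": 123456789,
    "prompt_eval_count": 10,
    "eval_count": 25,
}

gen_seconds = (data["total_duration"] - data["load_duration"]) / 1e9
print(f"generated {data['eval_count']} tokens in ~{gen_seconds:.2f}s "
      f"(~{data['eval_count'] / gen_seconds:.1f} tok/s)")
```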

## Resource Requirements

- TinyLlama: ~1GB RAM, very fast
- Phi models: ~2GB RAM, good balance
- Llama2 7B: ~8GB RAM, high quality

For the Hugging Face Spaces free tier, stick with TinyLlama or Phi models for the best performance.