---
title: Ollama Generate API
emoji: 🦙
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
app_port: 7860
---

# Ollama Generate API

A simple REST API for text generation using Ollama models on Hugging Face Spaces.

## Features

- 🦙 Generate text using Ollama models
- 🎛️ Configurable parameters (temperature, top_p, max_tokens)
- 📊 Health monitoring
- 🚀 Simple and lightweight API

## API Endpoints

### Health Check

- `GET /health` - Check whether the Ollama service is running
- `GET /` - API information and usage examples

### Text Generation

- `POST /generate` - Generate a text completion

## Usage Examples

### Check Health

```bash
curl "https://your-space.hf.space/health"
```

### Generate Text

```bash
curl -X POST "https://your-space.hf.space/generate" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tinyllama",
    "prompt": "The future of AI is",
    "temperature": 0.7,
    "max_tokens": 100
  }'
```

### API Information

```bash
curl "https://your-space.hf.space/"
```

## Request Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `model` | string | required | Model name (e.g., `"tinyllama"`) |
| `prompt` | string | required | Input text prompt |
| `temperature` | float | 0.7 | Sampling temperature (0.0-2.0) |
| `top_p` | float | 0.9 | Top-p (nucleus) sampling (0.0-1.0) |
| `max_tokens` | integer | 512 | Maximum tokens to generate (1-4096) |
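
For reference, a `/generate` payload that sets every documented parameter might look like the sketch below; any field left out is assumed to fall back to the default listed in the table.

```python
# A /generate payload that sets every documented parameter explicitly.
# Omitted optional fields are assumed to use the defaults above.
payload = {
    "model": "tinyllama",                     # required
    "prompt": "Write a haiku about llamas.",  # required
    "temperature": 0.2,                       # 0.0-2.0
    "top_p": 0.9,                             # 0.0-1.0
    "max_tokens": 256,                        # 1-4096
}
```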

## Supported Models

This API works with any Ollama model that has been pulled on the Space. Recommended lightweight models for Hugging Face Spaces:

- `tinyllama` - Very small and fast (~600MB)
- `phi` - Small but capable (~1.6GB)
- `llama2:7b` - Larger but more capable (~3.8GB)
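
Switching models only requires changing the `model` field, with the caveat that the model must already be pulled on the Space (the default startup script only pulls `tinyllama`; see the setup notes below). A rough comparison sketch:

```python
import requests

BASE_URL = "https://your-space.hf.space"  # placeholder

# Compare two models on the same prompt; "phi" must be pulled on the Space first.
for model in ["tinyllama", "phi"]:
    r = requests.post(
        f"{BASE_URL}/generate",
        json={"model": model, "prompt": "Explain recursion in one sentence.", "max_tokens": 60},
        timeout=180,
    )
    r.raise_for_status()
    print(model, "->", r.json()["response"])
```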

## Interactive Documentation

Once deployed, visit `/docs` for interactive API documentation powered by FastAPI.

## Setup Notes

- The startup script automatically pulls the `tinyllama` model
- The first generation may be slower while the model loads
- Lightweight models are recommended for better performance on limited resources
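
Given the cold-start behavior above, a client can poll `/health` until the service responds and allow a generous timeout for the first request. A minimal sketch (assuming `/health` returns HTTP 200 once Ollama is reachable):

```python
import time
import requests

BASE_URL = "https://your-space.hf.space"  # placeholder

# Poll /health until the service responds, then allow a generous timeout
# for the first generation while the model loads into memory.
for _ in range(30):
    try:
        if requests.get(f"{BASE_URL}/health", timeout=5).ok:
            break
    except requests.RequestException:
        pass
    time.sleep(10)

first = requests.post(
    f"{BASE_URL}/generate",
    json={"model": "tinyllama", "prompt": "Hello", "max_tokens": 16},
    timeout=300,  # the first call can be slow while the model loads
)
print(first.json())
```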

## Example Response

```json
{
  "model": "tinyllama",
  "response": "The future of AI is bright and full of possibilities...",
  "done": true,
  "total_duration": 1234567890,
  "load_duration": 123456789,
  "prompt_eval_count": 10,
  "eval_count": 25
}
```
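
The duration fields appear to be nanosecond counters (an assumption based on the usual Ollama API convention and the magnitudes shown), so a rough throughput figure can be derived from the response:

```python
# Parse a /generate response and derive rough throughput.
# Assumes durations are reported in nanoseconds, as in the Ollama API.
data = {
    "model": "tinyllama",
    "response": "The future of AI is bright and full of possibilities...",
    "done": True,
    "total_duration": 1234567890,
    "load_duration": 123456789,
    "prompt_eval_count": 10,
    "eval_count": 25,
}

gen_seconds = (data["total_duration"] - data["load_duration"]) / 1e9
print(f"generated {data['eval_count']} tokens in ~{gen_seconds:.2f}s "
      f"(~{data['eval_count'] / gen_seconds:.1f} tok/s)")
```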

## Resource Requirements

- TinyLlama: ~1GB RAM, very fast
- Phi models: ~2GB RAM, good balance
- Llama2 7B: ~8GB RAM, high quality

For the Hugging Face Spaces free tier, stick with TinyLlama or Phi models for the best performance.