---
title: Ollama Generate API
emoji: πŸ¦™
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
app_port: 7860
---

# Ollama Generate API

A simple REST API for text generation using Ollama models on Hugging Face Spaces.

## Features

- πŸ¦™ Generate text using Ollama models
- πŸŽ›οΈ Configurable parameters (temperature, top_p, max_tokens)
- πŸ“Š Health monitoring
- πŸš€ Simple and lightweight API

## API Endpoints

### Health Check
- `GET /health` - Check if Ollama service is running
- `GET /` - API information and usage examples

### Text Generation
- `POST /generate` - Generate text completion

## Usage Examples

### Check Health
```bash
curl "https://your-space.hf.space/health"
```

### Generate Text
```bash
curl -X POST "https://your-space.hf.space/generate" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tinyllama",
    "prompt": "The future of AI is",
    "temperature": 0.7,
    "max_tokens": 100
  }'
```

### API Information
```bash
curl "https://your-space.hf.space/"
```

## Request Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `model` | string | required | Model name (e.g., "tinyllama") |
| `prompt` | string | required | Input text prompt |
| `temperature` | float | 0.7 | Sampling temperature (0.0-2.0) |
| `top_p` | float | 0.9 | Top-p sampling (0.0-1.0) |
| `max_tokens` | integer | 512 | Maximum tokens to generate (1-4096) |
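
A small client-side helper can apply these defaults and check the documented ranges before sending a request. This is an illustrative sketch only — the helper name and the client-side validation are not part of the API (the server performs its own validation):

```python
def build_generate_request(model: str, prompt: str,
                           temperature: float = 0.7,
                           top_p: float = 0.9,
                           max_tokens: int = 512) -> dict:
    """Build a /generate payload using the documented defaults and ranges."""
    if not model or not prompt:
        raise ValueError("model and prompt are required")
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be in [0.0, 2.0]")
    if not 0.0 <= top_p <= 1.0:
        raise ValueError("top_p must be in [0.0, 1.0]")
    if not 1 <= max_tokens <= 4096:
        raise ValueError("max_tokens must be in [1, 4096]")
    return {
        "model": model,
        "prompt": prompt,
        "temperature": temperature,
        "top_p": top_p,
        "max_tokens": max_tokens,
    }
```

The resulting dict can be serialized with `json.dumps` and sent as the request body, exactly as in the `curl` example above.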

## Supported Models

This API works with any Ollama model. Recommended lightweight models for Hugging Face Spaces:

- `tinyllama` - Very small and fast (~600MB)
- `phi` - Small but capable (~1.6GB)
- `llama2:7b` - Larger but more capable (~3.8GB)

## Interactive Documentation

Once deployed, visit `/docs` for interactive API documentation powered by FastAPI.

## Setup Notes

- The startup script automatically pulls the `tinyllama` model
- First generation may be slower as the model loads
- Lightweight models are recommended for better performance on limited resources
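
The startup sequence described above might look roughly like this. This is a hypothetical sketch — the actual script, app module name, and wait logic in this Space may differ:

```shell
#!/bin/sh
# Start the Ollama server in the background.
ollama serve &

# Give the server a moment to come up before pulling the model.
sleep 5

# Pre-pull the default model so the first request doesn't time out.
ollama pull tinyllama

# Start the FastAPI app on the Space's port (7860, per app_port above).
uvicorn app:app --host 0.0.0.0 --port 7860
```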

## Example Response

```json
{
  "model": "tinyllama",
  "response": "The future of AI is bright and full of possibilities...",
  "done": true,
  "total_duration": 1234567890,
  "load_duration": 123456789,
  "prompt_eval_count": 10,
  "eval_count": 25
}
```
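
The duration fields follow Ollama's convention of reporting times in nanoseconds. As a rough illustration (assuming that convention), tokens per second can be derived from `eval_count` and `total_duration` — a minimal sketch using the sample response above:

```python
import json

# Sample /generate response (copied from the Example Response above).
sample = json.loads("""
{
  "model": "tinyllama",
  "response": "The future of AI is bright and full of possibilities...",
  "done": true,
  "total_duration": 1234567890,
  "load_duration": 123456789,
  "prompt_eval_count": 10,
  "eval_count": 25
}
""")

def tokens_per_second(resp: dict) -> float:
    """Rough throughput: generated tokens over total wall time.

    Durations are assumed to be nanoseconds; for a tighter number you
    could subtract load_duration (model load time) from the total first.
    """
    seconds = resp["total_duration"] / 1e9
    return resp["eval_count"] / seconds

print(f"{tokens_per_second(sample):.2f} tokens/s")  # ~20.25 tokens/s
```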

## Resource Requirements

- **TinyLlama**: ~1GB RAM, very fast
- **Phi models**: ~2GB RAM, good balance
- **Llama2 7B**: ~8GB RAM, high quality

For Hugging Face Spaces free tier, stick with TinyLlama or Phi models for best performance.