---
title: Ollama Generate API
emoji: 🦙
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
app_port: 7860
---
# Ollama Generate API

A simple REST API for text generation using Ollama models on Hugging Face Spaces.
## Features

- 🦙 Generate text using Ollama models
- 🎛️ Configurable parameters (temperature, top_p, max_tokens)
- 📊 Health monitoring
- 🚀 Simple and lightweight API
## API Endpoints

### Health Check

- `GET /health` - Check whether the Ollama service is running
- `GET /` - API information and usage examples

### Text Generation

- `POST /generate` - Generate a text completion
## Usage Examples

### Check Health

```bash
curl "https://your-space.hf.space/health"
```

### Generate Text

```bash
curl -X POST "https://your-space.hf.space/generate" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tinyllama",
    "prompt": "The future of AI is",
    "temperature": 0.7,
    "max_tokens": 100
  }'
```

### API Information

```bash
curl "https://your-space.hf.space/"
```
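The same request can be issued from Python's standard library. A minimal sketch, assuming the placeholder URL used above; the `opener` argument is injectable only so the function can be exercised without a live deployment:

```python
import json
from urllib import request

# Placeholder: replace with your Space's actual hostname.
BASE_URL = "https://your-space.hf.space"

def generate(prompt, model="tinyllama", temperature=0.7,
             top_p=0.9, max_tokens=100, opener=request.urlopen):
    """POST a generation request and return the decoded JSON response."""
    body = json.dumps({
        "model": model,
        "prompt": prompt,
        "temperature": temperature,
        "top_p": top_p,
        "max_tokens": max_tokens,
    }).encode("utf-8")
    req = request.Request(
        BASE_URL + "/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with opener(req) as resp:
        return json.load(resp)
```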
## Request Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `model` | string | required | Model name (e.g., "tinyllama") |
| `prompt` | string | required | Input text prompt |
| `temperature` | float | 0.7 | Sampling temperature (0.0-2.0) |
| `top_p` | float | 0.9 | Top-p sampling (0.0-1.0) |
| `max_tokens` | integer | 512 | Maximum tokens to generate (1-4096) |
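A client can pre-validate a request body against the ranges in the table before sending it. A sketch; the helper below is hypothetical, not part of the API:

```python
def validate_params(params):
    """Fill in documented defaults and check the documented ranges."""
    if "model" not in params or "prompt" not in params:
        raise ValueError("model and prompt are required")
    out = {"temperature": 0.7, "top_p": 0.9, "max_tokens": 512}
    out.update(params)
    if not 0.0 <= out["temperature"] <= 2.0:
        raise ValueError("temperature must be in [0.0, 2.0]")
    if not 0.0 <= out["top_p"] <= 1.0:
        raise ValueError("top_p must be in [0.0, 1.0]")
    if not 1 <= out["max_tokens"] <= 4096:
        raise ValueError("max_tokens must be in [1, 4096]")
    return out
```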
## Supported Models

This API works with any Ollama model. Recommended lightweight models for Hugging Face Spaces:

- `tinyllama` - Very small and fast (~600MB)
- `phi` - Small but capable (~1.6GB)
- `llama2:7b` - Larger but more capable (~3.8GB)
## Interactive Documentation

Once deployed, visit `/docs` for interactive API documentation powered by FastAPI.
## Setup Notes

- The startup script automatically pulls the `tinyllama` model
- The first generation may be slower while the model loads
- Lightweight models are recommended for better performance on limited resources
## Example Response

```json
{
  "model": "tinyllama",
  "response": "The future of AI is bright and full of possibilities...",
  "done": true,
  "total_duration": 1234567890,
  "load_duration": 123456789,
  "prompt_eval_count": 10,
  "eval_count": 25
}
```
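Ollama reports the duration fields in nanoseconds, so a rough tokens-per-second figure can be derived from `eval_count` (generated tokens) and `total_duration`. A minimal sketch:

```python
def summarize(resp):
    """Summarize a /generate response: token count, wall time, throughput.

    Ollama durations are in nanoseconds, so divide by 1e9 for seconds.
    """
    seconds = resp["total_duration"] / 1e9
    return {
        "tokens": resp["eval_count"],
        "seconds": round(seconds, 3),
        "tokens_per_sec": round(resp["eval_count"] / seconds, 1),
    }
```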
## Resource Requirements

- **TinyLlama**: ~1GB RAM, very fast
- **Phi models**: ~2GB RAM, good balance
- **Llama2 7B**: ~8GB RAM, high quality

On the Hugging Face Spaces free tier, stick with TinyLlama or Phi models for the best performance.
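Those figures can be turned into a simple model picker. A hypothetical helper that chooses the largest model fitting a RAM budget (sizes are the approximate values listed above, purely illustrative):

```python
# Approximate RAM needs in GB, taken from the list above.
MODEL_RAM_GB = {"tinyllama": 1, "phi": 2, "llama2:7b": 8}

def pick_model(available_gb):
    """Return the most capable model that fits the given RAM budget."""
    fitting = [m for m, gb in MODEL_RAM_GB.items() if gb <= available_gb]
    if not fitting:
        raise ValueError("no model fits in %.1f GB" % available_gb)
    return max(fitting, key=MODEL_RAM_GB.get)
```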