---
title: Ollama Generate API
emoji: πŸ¦™
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
app_port: 7860
---
# Ollama Generate API
A simple REST API for text generation using Ollama models on Hugging Face Spaces.
## Features
- πŸ¦™ Generate text using Ollama models
- πŸŽ›οΈ Configurable parameters (temperature, top_p, max_tokens)
- πŸ“Š Health monitoring
- πŸš€ Simple and lightweight API
## API Endpoints
### Health Check
- `GET /health` - Check if Ollama service is running
- `GET /` - API information and usage examples
### Text Generation
- `POST /generate` - Generate text completion
## Usage Examples
### Check Health
```bash
curl "https://your-space.hf.space/health"
```
### Generate Text
```bash
curl -X POST "https://your-space.hf.space/generate" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tinyllama",
    "prompt": "The future of AI is",
    "temperature": 0.7,
    "max_tokens": 100
  }'
```
### API Information
```bash
curl "https://your-space.hf.space/"
```
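The same call can be made programmatically from Python's standard library. A minimal sketch (the base URL is the placeholder used above; `build_generate_request` is an illustrative helper name, not part of the API):

```python
import json
import urllib.request

def build_generate_request(base_url, model, prompt, **options):
    """Build a POST request for the /generate endpoint.

    Extra keyword arguments (temperature, top_p, max_tokens, ...)
    are merged into the JSON payload as-is.
    """
    payload = {"model": model, "prompt": prompt, **options}
    return urllib.request.Request(
        f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_generate_request(
    "https://your-space.hf.space", "tinyllama",
    "The future of AI is", temperature=0.7, max_tokens=100,
)
# Send it (requires the Space to be live):
#   with urllib.request.urlopen(req) as r:
#       result = json.loads(r.read())
```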
## Request Parameters
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `model` | string | required | Model name (e.g., "tinyllama") |
| `prompt` | string | required | Input text prompt |
| `temperature` | float | 0.7 | Sampling temperature (0.0-2.0) |
| `top_p` | float | 0.9 | Top-p sampling (0.0-1.0) |
| `max_tokens` | integer | 512 | Maximum tokens to generate (1-4096) |
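Validating these ranges on the client side can catch bad requests before they reach the API. A sketch whose defaults mirror the table above (the function name is illustrative):

```python
def validate_options(temperature=0.7, top_p=0.9, max_tokens=512):
    """Check generation options against the documented parameter ranges."""
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be in [0.0, 2.0]")
    if not 0.0 <= top_p <= 1.0:
        raise ValueError("top_p must be in [0.0, 1.0]")
    if not 1 <= max_tokens <= 4096:
        raise ValueError("max_tokens must be in [1, 4096]")
    return {"temperature": temperature, "top_p": top_p, "max_tokens": max_tokens}
```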
## Supported Models
This API works with any Ollama model. Recommended lightweight models for Hugging Face Spaces:
- `tinyllama` - Very small and fast (~600MB)
- `phi` - Small but capable (~1.6GB)
- `llama2:7b` - Larger but more capable (~3.8GB)
## Interactive Documentation
Once deployed, visit `/docs` for interactive API documentation powered by FastAPI.
## Setup Notes
- The startup script automatically pulls the `tinyllama` model
- The first generation request may be slow while the model is loaded into memory
- Lightweight models are recommended for better performance on limited resources
## Example Response
```json
{
  "model": "tinyllama",
  "response": "The future of AI is bright and full of possibilities...",
  "done": true,
  "total_duration": 1234567890,
  "load_duration": 123456789,
  "prompt_eval_count": 10,
  "eval_count": 25
}
```
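Ollama reports the duration fields in nanoseconds, so a small conversion is needed for human-readable timings. A sketch that parses the example response above (the tokens-per-second figure is approximate, since it divides by total duration rather than the pure evaluation time):

```python
import json

# The example response from this README, as a raw JSON string.
raw = '''{
  "model": "tinyllama",
  "response": "The future of AI is bright and full of possibilities...",
  "done": true,
  "total_duration": 1234567890,
  "load_duration": 123456789,
  "prompt_eval_count": 10,
  "eval_count": 25
}'''

resp = json.loads(raw)
total_s = resp["total_duration"] / 1e9   # nanoseconds -> seconds
tokens_per_s = resp["eval_count"] / total_s
print(f"{total_s:.2f}s total, ~{tokens_per_s:.1f} tokens/s")
```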
## Resource Requirements
- **TinyLlama**: ~1GB RAM, very fast
- **Phi models**: ~2GB RAM, good balance
- **Llama2 7B**: ~8GB RAM, high quality
For Hugging Face Spaces free tier, stick with TinyLlama or Phi models for best performance.