---
title: Ollama API
emoji: 🦙
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
app_port: 7860
---
# Ollama Model API

A REST API for running Ollama models on Hugging Face Spaces.
## Features

- Run Ollama models via REST API
- Model management (pull, list, delete)
- Chat completions
- Configurable parameters (`temperature`, `top_p`, etc.)
- Health monitoring
## API Endpoints

### Health Check

- `GET /health` - Check if the service is running
- `GET /models` - List available models

### Model Management

- `POST /models/pull` - Pull a model from the Ollama registry
- `DELETE /models/{model_name}` - Delete a model

### Chat & Completions

- `POST /chat` - Chat with a model
- `POST /generate` - Generate text completion
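Before using the other endpoints, it can be useful to verify the service is up. A minimal Python sketch of the health check above; the base URL is a placeholder for your Space's hostname, and `service_healthy` is a hypothetical helper name:

```python
import urllib.request


def service_healthy(base_url):
    """Return True if GET /health responds with HTTP 200, False otherwise."""
    try:
        url = base_url.rstrip("/") + "/health"
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection errors, timeouts, and HTTP errors (URLError subclasses OSError).
        return False
```

For example, `service_healthy("https://your-space.hf.space")` should return `True` once the Space has finished building.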
## Usage Examples

### Pull a Model

```bash
curl -X POST "https://your-space.hf.space/models/pull" \
  -H "Content-Type: application/json" \
  -d '{"model": "llama2:7b"}'
```

### Chat with Model

```bash
curl -X POST "https://your-space.hf.space/chat" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama2:7b",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ]
  }'
```

### Generate Text

```bash
curl -X POST "https://your-space.hf.space/generate" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama2:7b",
    "prompt": "The future of AI is",
    "max_tokens": 100
  }'
```
## Supported Models

This setup supports any model available in the Ollama registry:

- `llama2:7b`
- `llama2:13b`
- `mistral:7b`
- `codellama:7b`
- `vicuna:7b`
- And many more...
## Interactive Documentation

Once deployed, visit `/docs` for interactive API documentation.
## Notes

- Model pulling may take several minutes depending on model size
- Larger models require more memory and may not work on the free tier
- First inference may be slower as the model loads into memory
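Because a pull can take minutes, a caller may want to poll `GET /models` until the new model appears. A sketch under the assumption that `/models` returns a JSON object with a `"models"` list of model names (the exact response shape depends on the app); `wait_for_model` is a hypothetical helper:

```python
import json
import time
import urllib.request


def wait_for_model(base_url, model, timeout=600, interval=10, fetch=None):
    """Poll GET /models until `model` is listed, or raise TimeoutError.

    `fetch` can be injected for testing; by default it performs the HTTP GET.
    """
    if fetch is None:
        def fetch():
            url = base_url.rstrip("/") + "/models"
            with urllib.request.urlopen(url) as resp:
                return json.loads(resp.read())

    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if model in fetch().get("models", []):
            return True
        time.sleep(interval)
    raise TimeoutError(f"{model} not available after {timeout}s")
```

This would typically run right after the `POST /models/pull` request, before the first chat or generate call.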
## Resource Requirements

- Small models (7B): 8GB+ RAM recommended
- Medium models (13B): 16GB+ RAM recommended
- Large models (70B+): 32GB+ RAM required

Consider using smaller models like `llama2:7b` or `mistral:7b` for better performance on limited resources.