---
title: Ollama API
emoji: 🦙
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
app_port: 7860
---

# Ollama Model API

A REST API for running Ollama models on Hugging Face Spaces.

## Features

- 🦙 Run Ollama models via REST API
- 🚀 Model management (pull, list, delete)
- 💬 Chat completions
- 🎛️ Configurable parameters (temperature, top_p, etc.)
- 📊 Health monitoring

## API Endpoints

### Health Check

- `GET /health` - Check if the service is running
- `GET /models` - List available models
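
Both endpoints are plain GET requests, so they are easy to use in scripts and uptime checks. A minimal sketch (the exact JSON fields in the responses depend on the app's implementation):

```bash
# Check that the service is up
curl "https://your-space.hf.space/health"

# List the models currently available on the server
curl "https://your-space.hf.space/models"
```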

### Model Management

- `POST /models/pull` - Pull a model from the Ollama registry
- `DELETE /models/{model_name}` - Delete a model
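
A pull example appears under Usage Examples below; deletion follows the documented path parameter. A sketch (the tag suffix, e.g. `:7b`, is part of the model name):

```bash
# Remove a previously pulled model to free disk space
curl -X DELETE "https://your-space.hf.space/models/llama2:7b"
```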

### Chat & Completions

- `POST /chat` - Chat with a model
- `POST /generate` - Generate text completion

## Usage Examples

### Pull a Model

```bash
curl -X POST "https://your-space.hf.space/models/pull" \
  -H "Content-Type: application/json" \
  -d '{"model": "llama2:7b"}'
```

### Chat with Model

```bash
curl -X POST "https://your-space.hf.space/chat" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama2:7b",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ]
  }'
```

### Generate Text

```bash
curl -X POST "https://your-space.hf.space/generate" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama2:7b",
    "prompt": "The future of AI is",
    "max_tokens": 100
  }'
```

## Supported Models

This setup supports any model available in the Ollama registry, including:

- `llama2:7b`, `llama2:13b`
- `mistral:7b`
- `codellama:7b`
- `vicuna:7b`
- ...and many more

## Interactive Documentation

Once deployed, visit `/docs` for interactive API documentation.
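
If the app is built with FastAPI (a common choice for Docker Spaces, and suggested by the `/docs` path), the machine-readable OpenAPI schema is typically served as well; treat this as an assumption to verify on your deployment:

```bash
# Fetch the OpenAPI schema (FastAPI convention; this path is an assumption)
curl "https://your-space.hf.space/openapi.json"
```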

## Notes

- Model pulling may take several minutes, depending on model size
- Larger models require more memory and may not work on the free tier
- The first inference may be slower while the model loads into memory

## Resource Requirements

- **Small models (7B)**: 8GB+ RAM recommended
- **Medium models (13B)**: 16GB+ RAM recommended
- **Large models (70B+)**: 32GB+ RAM required

Consider using smaller models like `llama2:7b` or `mistral:7b` for better performance on limited resources.