---
title: Ollama API
emoji: 🦙
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
app_port: 7860
---
# Ollama Model API
A REST API for running Ollama models on Hugging Face Spaces.
## Features
- 🦙 Run Ollama models via REST API
- 📦 Model management (pull, list, delete)
- 💬 Chat completions
- 🎛️ Configurable parameters (temperature, top_p, etc.)
- 📊 Health monitoring
## API Endpoints
### Health Check
- `GET /health` - Check if the service is running
- `GET /models` - List available models
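Both are plain GET requests, using the same placeholder base URL as the examples further down:
```bash
# Verify the service is up
curl "https://your-space.hf.space/health"

# List the models currently available
curl "https://your-space.hf.space/models"
```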
### Model Management
- `POST /models/pull` - Pull a model from Ollama registry
- `DELETE /models/{model_name}` - Delete a model
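For example, deleting a previously pulled model by name:
```bash
# Remove a model to free up disk space
curl -X DELETE "https://your-space.hf.space/models/llama2:7b"
```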
### Chat & Completions
- `POST /chat` - Chat with a model
- `POST /generate` - Generate text completion
## Usage Examples
### Pull a Model
```bash
curl -X POST "https://your-space.hf.space/models/pull" \
-H "Content-Type: application/json" \
-d '{"model": "llama2:7b"}'
```
### Chat with Model
```bash
curl -X POST "https://your-space.hf.space/chat" \
-H "Content-Type: application/json" \
-d '{
"model": "llama2:7b",
"messages": [
{"role": "user", "content": "Hello, how are you?"}
]
}'
```
### Generate Text
```bash
curl -X POST "https://your-space.hf.space/generate" \
-H "Content-Type: application/json" \
-d '{
"model": "llama2:7b",
"prompt": "The future of AI is",
"max_tokens": 100
}'
```
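### End-to-End Script
The calls above can be combined into a single script. This is a sketch, not a definitive client: it assumes the `/models` response text contains the names of pulled models, which this README does not specify.
```bash
#!/usr/bin/env bash
BASE="https://your-space.hf.space"
MODEL="mistral:7b"

# Start the pull (may take several minutes for large models)
curl -X POST "$BASE/models/pull" \
  -H "Content-Type: application/json" \
  -d "{\"model\": \"$MODEL\"}"

# Poll until the model appears in the list
# (assumes the /models response text mentions the model name)
until curl -s "$BASE/models" | grep -q "$MODEL"; do
  sleep 10
done

# Chat once the model is available
curl -X POST "$BASE/chat" \
  -H "Content-Type: application/json" \
  -d "{\"model\": \"$MODEL\", \"messages\": [{\"role\": \"user\", \"content\": \"Hello!\"}]}"
```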
## Supported Models
This setup supports any model available in the Ollama registry:
- `llama2:7b`, `llama2:13b`
- `mistral:7b`
- `codellama:7b`
- `vicuna:7b`
- And many more...
## Interactive Documentation
Once deployed, visit `/docs` for interactive API documentation.
## Notes
- Model pulling may take several minutes depending on model size
- Larger models require more memory and may not work on free tier
- First inference may be slower as the model loads into memory; a short warm-up request (see the sketch after this list) keeps that latency off your first real call
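A minimal warm-up sketch using the documented `/generate` endpoint (the tiny prompt and one-token budget are arbitrary; any small request loads the model):
```bash
# Warm-up: trigger model loading with a tiny request
curl -X POST "https://your-space.hf.space/generate" \
  -H "Content-Type: application/json" \
  -d '{"model": "llama2:7b", "prompt": "Hi", "max_tokens": 1}'
```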
## Resource Requirements
- **Small models (7B)**: 8GB+ RAM recommended
- **Medium models (13B)**: 16GB+ RAM recommended
- **Large models (70B+)**: 32GB+ RAM required
Consider using smaller models like `llama2:7b` or `mistral:7b` for better performance on limited resources.