---
title: Ollama API
emoji: πŸ¦™
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
app_port: 7860
---

# Ollama Model API

A REST API for running Ollama models on Hugging Face Spaces.

## Features

- πŸ¦™ Run Ollama models via REST API
- πŸ”„ Model management (pull, list, delete)
- πŸ’¬ Chat completions
- πŸŽ›οΈ Configurable parameters (temperature, top_p, etc.)
- πŸ“Š Health monitoring

## API Endpoints

### Health Check
- `GET /health` - Check if the service is running
- `GET /models` - List available models

### Model Management
- `POST /models/pull` - Pull a model from Ollama registry
- `DELETE /models/{model_name}` - Delete a model

### Chat & Completions
- `POST /chat` - Chat with a model
- `POST /generate` - Generate text completion
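The endpoints above can be wrapped in a small client. The sketch below is illustrative, not part of this project: `BASE_URL`, `build_chat_payload`, and `post_json` are hypothetical names, and the exact response shape depends on how the Space implements each route. Extra keyword arguments (e.g. `temperature`, `top_p`) are passed through in the request body, matching the configurable parameters listed under Features.

```python
# Minimal client sketch for the endpoints above (illustrative names).
# Replace BASE_URL with your deployed Space URL.
import json
from urllib import request

BASE_URL = "https://your-space.hf.space"  # placeholder

def build_chat_payload(model, messages, **options):
    """Build the JSON body for POST /chat; extra keyword arguments
    (temperature, top_p, ...) are passed through unchanged."""
    payload = {"model": model, "messages": messages}
    payload.update(options)
    return payload

def post_json(path, payload):
    """POST a JSON payload and return the decoded JSON response."""
    req = request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())
```

For example, `post_json("/chat", build_chat_payload("llama2:7b", [{"role": "user", "content": "Hello"}], temperature=0.7))` mirrors the curl example below, with sampling parameters added.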

## Usage Examples

### Pull a Model
```bash
curl -X POST "https://your-space.hf.space/models/pull" \
  -H "Content-Type: application/json" \
  -d '{"model": "llama2:7b"}'
```

### Chat with Model
```bash
curl -X POST "https://your-space.hf.space/chat" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama2:7b",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ]
  }'
```

### Generate Text
```bash
curl -X POST "https://your-space.hf.space/generate" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama2:7b",
    "prompt": "The future of AI is",
    "max_tokens": 100
  }'
```

## Supported Models

This setup supports any model available in the Ollama registry:
- `llama2:7b`, `llama2:13b`
- `mistral:7b`
- `codellama:7b`
- `vicuna:7b`
- And many more...

## Interactive Documentation

Once deployed, visit `/docs` for interactive API documentation.

## Notes

- Pulling a model may take several minutes, depending on its size
- Larger models require more memory and may not work on the free tier
- The first inference may be slower while the model loads into memory
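Because pulls can take minutes, client code should poll rather than assume the model is ready immediately. A generic polling helper, sketched below, works for this: pass it a callable that checks `GET /models` for the model name. The helper itself is an assumption, not part of this API.

```python
import time

def wait_until_ready(check, timeout=600, interval=5):
    """Poll `check()` until it returns True or `timeout` seconds pass.
    Useful after POST /models/pull: pass a callable that queries
    GET /models and returns True once the model appears."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(interval)
    return False
```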

## Resource Requirements

- **Small models (7B)**: 8GB+ RAM recommended
- **Medium models (13B)**: 16GB+ RAM recommended  
- **Large models (70B+)**: 32GB+ RAM required

Consider using smaller models like `llama2:7b` or `mistral:7b` for better performance on limited resources.
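The figures above follow a rough rule of thumb: parameter count times bytes per parameter, plus runtime overhead. The helper below is a back-of-the-envelope sketch under assumed constants, not an official Ollama formula; Ollama's default GGUF quantizations use roughly 0.5 to 1 byte per parameter, while fp16 weights use about 2.

```python
def estimate_ram_gb(params_billion, bytes_per_param=2.0, overhead_gb=1.0):
    """Rough RAM estimate (GB): parameters x bytes per parameter
    plus a fixed runtime overhead. Constants are assumptions:
    ~2 bytes/param for fp16, ~0.5-1 for 4- to 8-bit quantized weights."""
    return params_billion * bytes_per_param + overhead_gb
```

For a 7B model this gives about 15 GB at fp16 and about 4.5 GB at 4-bit quantization, which is consistent with the 8GB+ recommendation above once quantization is taken into account.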