---
title: Ollama Generate API
emoji: πŸ¦™
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
app_port: 7860
---

# Ollama Generate API

A simple REST API for text generation using Ollama models on Hugging Face Spaces.

## Features

- πŸ¦™ Generate text using Ollama models
- πŸŽ›οΈ Configurable parameters (temperature, top_p, max_tokens)
- πŸ“Š Health monitoring
- πŸš€ Simple and lightweight API

## API Endpoints

### Health Check
- `GET /health` - Check if Ollama service is running
- `GET /` - API information and usage examples

### Text Generation
- `POST /generate` - Generate text completion

## Usage Examples

### Check Health
```bash
curl "https://your-space.hf.space/health"
```

### Generate Text
```bash
curl -X POST "https://your-space.hf.space/generate" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tinyllama",
    "prompt": "The future of AI is",
    "temperature": 0.7,
    "max_tokens": 100
  }'
```

### API Information
```bash
curl "https://your-space.hf.space/"
```

## Request Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `model` | string | required | Model name (e.g., "tinyllama") |
| `prompt` | string | required | Input text prompt |
| `temperature` | float | 0.7 | Sampling temperature (0.0-2.0) |
| `top_p` | float | 0.9 | Top-p sampling (0.0-1.0) |
| `max_tokens` | integer | 512 | Maximum tokens to generate (1-4096) |
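
A small client-side helper can apply these defaults and check the documented ranges before sending a request. This is an illustrative sketch only — the helper name and the client-side validation are not part of the API (the server performs its own validation):

```python
def build_generate_request(model: str, prompt: str,
                           temperature: float = 0.7,
                           top_p: float = 0.9,
                           max_tokens: int = 512) -> dict:
    """Build a /generate payload using the documented defaults and ranges."""
    if not model or not prompt:
        raise ValueError("model and prompt are required")
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be in [0.0, 2.0]")
    if not 0.0 <= top_p <= 1.0:
        raise ValueError("top_p must be in [0.0, 1.0]")
    if not 1 <= max_tokens <= 4096:
        raise ValueError("max_tokens must be in [1, 4096]")
    return {
        "model": model,
        "prompt": prompt,
        "temperature": temperature,
        "top_p": top_p,
        "max_tokens": max_tokens,
    }
```

The resulting dict can be serialized with `json.dumps` and sent as the request body, exactly as in the `curl` example above.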

## Supported Models

This API works with any Ollama model. Recommended lightweight models for Hugging Face Spaces:

- `tinyllama` - Very small and fast (~600MB)
- `phi` - Small but capable (~1.6GB)
- `llama2:7b` - Larger but more capable (~3.8GB)

## Interactive Documentation

Once deployed, visit `/docs` for interactive API documentation powered by FastAPI.

## Setup Notes

- The startup script automatically pulls the `tinyllama` model
- First generation may be slower as the model loads
- Lightweight models are recommended for better performance on limited resources
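
The startup sequence described above might look roughly like this. This is a hypothetical sketch — the actual script, app module name, and wait logic in this Space may differ:

```shell
#!/bin/sh
# Start the Ollama server in the background.
ollama serve &

# Give the server a moment to come up before pulling the model.
sleep 5

# Pre-pull the default model so the first request doesn't time out.
ollama pull tinyllama

# Start the FastAPI app on the Space's port (7860, per app_port above).
uvicorn app:app --host 0.0.0.0 --port 7860
```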

## Example Response

```json
{
  "model": "tinyllama",
  "response": "The future of AI is bright and full of possibilities...",
  "done": true,
  "total_duration": 1234567890,
  "load_duration": 123456789,
  "prompt_eval_count": 10,
  "eval_count": 25
}
```
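
The duration fields follow Ollama's convention of reporting times in nanoseconds. As a rough illustration (assuming that convention), tokens per second can be derived from `eval_count` and `total_duration` — a minimal sketch using the sample response above:

```python
import json

# Sample /generate response (copied from the Example Response above).
sample = json.loads("""
{
  "model": "tinyllama",
  "response": "The future of AI is bright and full of possibilities...",
  "done": true,
  "total_duration": 1234567890,
  "load_duration": 123456789,
  "prompt_eval_count": 10,
  "eval_count": 25
}
""")

def tokens_per_second(resp: dict) -> float:
    """Rough throughput: generated tokens over total wall time.

    Durations are assumed to be nanoseconds; for a tighter number you
    could subtract load_duration (model load time) from the total first.
    """
    seconds = resp["total_duration"] / 1e9
    return resp["eval_count"] / seconds

print(f"{tokens_per_second(sample):.2f} tokens/s")  # ~20.25 tokens/s
```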

## Resource Requirements

- **TinyLlama**: ~1GB RAM, very fast
- **Phi models**: ~2GB RAM, good balance
- **Llama2 7B**: ~8GB RAM, high quality

For Hugging Face Spaces free tier, stick with TinyLlama or Phi models for best performance.