brendon-ai committed on
Commit 69678d3 · verified · 1 parent: c0fd7e0

Update README.md

Files changed (1)
  1. README.md +56 -47
README.md CHANGED
@@ -1,5 +1,5 @@
  ---
- title: Ollama API
  emoji: 🦙
  colorFrom: blue
  colorTo: purple
@@ -8,51 +8,31 @@ pinned: false
  app_port: 7860
  ---

- # Ollama Model API

- A REST API for running Ollama models on Hugging Face Spaces.

  ## Features

- - 🦙 Run Ollama models via REST API
- - 🔄 Model management (pull, list, delete)
- - 💬 Chat completions
- - 🎛️ Configurable parameters (temperature, top_p, etc.)
  - 📊 Health monitoring

  ## API Endpoints

  ### Health Check
- - `GET /health` - Check if the service is running
- - `GET /models` - List available models

- ### Model Management
- - `POST /models/pull` - Pull a model from Ollama registry
- - `DELETE /models/{model_name}` - Delete a model
-
- ### Chat & Completions
- - `POST /chat` - Chat with a model
  - `POST /generate` - Generate text completion

  ## Usage Examples

- ### Pull a Model
  ```bash
- curl -X POST "https://your-space.hf.space/models/pull" \
- -H "Content-Type: application/json" \
- -d '{"model": "llama2:7b"}'
- ```
-
- ### Chat with Model
- ```bash
- curl -X POST "https://your-space.hf.space/chat" \
- -H "Content-Type: application/json" \
- -d '{
- "model": "llama2:7b",
- "messages": [
- {"role": "user", "content": "Hello, how are you?"}
- ]
- }'
  ```

  ### Generate Text
@@ -60,35 +40,64 @@ curl -X POST "https://your-space.hf.space/chat" \
  curl -X POST "https://your-space.hf.space/generate" \
  -H "Content-Type: application/json" \
  -d '{
- "model": "llama2:7b",
  "prompt": "The future of AI is",
  "max_tokens": 100
  }'
  ```

  ## Supported Models

- This setup supports any model available in the Ollama registry:
- - `llama2:7b`, `llama2:13b`
- - `mistral:7b`
- - `codellama:7b`
- - `vicuna:7b`
- - And many more...

  ## Interactive Documentation

- Once deployed, visit `/docs` for interactive API documentation.

- ## Notes

- - Model pulling may take several minutes depending on model size
- - Larger models require more memory and may not work on free tier
- - First inference may be slower as the model loads into memory

  ## Resource Requirements

- - **Small models (7B)**: 8GB+ RAM recommended
- - **Medium models (13B)**: 16GB+ RAM recommended
- - **Large models (70B+)**: 32GB+ RAM required

- Consider using smaller models like `llama2:7b` or `mistral:7b` for better performance on limited resources.
 
  ---
+ title: Ollama Generate API
  emoji: 🦙
  colorFrom: blue
  colorTo: purple

  app_port: 7860
  ---

+ # Ollama Generate API

+ A simple REST API for text generation using Ollama models on Hugging Face Spaces.

  ## Features

+ - 🦙 Generate text using Ollama models
+ - 🎛️ Configurable parameters (temperature, top_p, max_tokens)
  - 📊 Health monitoring
+ - 🚀 Simple and lightweight API

  ## API Endpoints

  ### Health Check
+ - `GET /health` - Check if the Ollama service is running
+ - `GET /` - API information and usage examples

+ ### Text Generation
  - `POST /generate` - Generate text completion

  ## Usage Examples

+ ### Check Health
  ```bash
+ curl "https://your-space.hf.space/health"
  ```

  ### Generate Text
  curl -X POST "https://your-space.hf.space/generate" \
  -H "Content-Type: application/json" \
  -d '{
+ "model": "tinyllama",
  "prompt": "The future of AI is",
+ "temperature": 0.7,
  "max_tokens": 100
  }'
  ```

+ ### API Information
+ ```bash
+ curl "https://your-space.hf.space/"
+ ```
+
+ ## Request Parameters
+
+ | Parameter | Type | Default | Description |
+ |-----------|------|---------|-------------|
+ | `model` | string | required | Model name (e.g., "tinyllama") |
+ | `prompt` | string | required | Input text prompt |
+ | `temperature` | float | 0.7 | Sampling temperature (0.0-2.0) |
+ | `top_p` | float | 0.9 | Top-p sampling (0.0-1.0) |
+ | `max_tokens` | integer | 512 | Maximum tokens to generate (1-4096) |
+
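The new README's `/generate` schema can also be driven from Python instead of curl. A minimal client sketch, assuming only what the parameters table above states; `build_payload` and `generate` are hypothetical helper names, and the base URL is the README's placeholder:

```python
import json
import urllib.request

BASE_URL = "https://your-space.hf.space"  # placeholder from the README, not a real Space

def build_payload(model, prompt, temperature=0.7, top_p=0.9, max_tokens=512):
    """Hypothetical helper: assemble a /generate body using the documented
    defaults and ranges from the Request Parameters table."""
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be in [0.0, 2.0]")
    if not 0.0 <= top_p <= 1.0:
        raise ValueError("top_p must be in [0.0, 1.0]")
    if not 1 <= max_tokens <= 4096:
        raise ValueError("max_tokens must be in [1, 4096]")
    return {
        "model": model,
        "prompt": prompt,
        "temperature": temperature,
        "top_p": top_p,
        "max_tokens": max_tokens,
    }

def generate(model, prompt, **options):
    """POST the payload to /generate and return the parsed JSON response."""
    body = json.dumps(build_payload(model, prompt, **options)).encode("utf-8")
    req = urllib.request.Request(
        f"{BASE_URL}/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Called as `generate("tinyllama", "The future of AI is", max_tokens=100)`, this mirrors the curl example above.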
  ## Supported Models

+ This API works with any Ollama model. Recommended lightweight models for Hugging Face Spaces:
+
+ - `tinyllama` - Very small and fast (~600MB)
+ - `phi` - Small but capable (~1.6GB)
+ - `llama2:7b` - Larger but more capable (~3.8GB)

  ## Interactive Documentation

+ Once deployed, visit `/docs` for interactive API documentation powered by FastAPI.
+
+ ## Setup Notes

+ - The startup script automatically pulls the `tinyllama` model
+ - First generation may be slower as the model loads
+ - Lightweight models are recommended for better performance on limited resources
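Since the startup script pulls `tinyllama` and the first generation is slower while the model loads, a client may want to poll `/health` before sending work. A sketch under the same placeholder URL; the timeout and interval are arbitrary choices, not values from this repo:

```python
import time
import urllib.error
import urllib.request

def wait_until_healthy(base_url, timeout=300.0, interval=5.0):
    """Poll GET /health until it returns HTTP 200 or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/health") as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # service not reachable yet; retry after the interval
        time.sleep(interval)
    return False
```

Model pulls can take minutes on Spaces, so a generous `timeout` is sensible.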

+ ## Example Response
+
+ ```json
+ {
+   "model": "tinyllama",
+   "response": "The future of AI is bright and full of possibilities...",
+   "done": true,
+   "total_duration": 1234567890,
+   "load_duration": 123456789,
+   "prompt_eval_count": 10,
+   "eval_count": 25
+ }
+ ```

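If the wrapper passes Ollama's native metrics through unchanged, the `*_duration` fields are nanoseconds; that is an assumption about this API, though it matches Ollama's own. Under it, a rough throughput number falls out of the example response:

```python
def eval_rate(response):
    """Tokens generated per second, assuming total_duration is in nanoseconds
    (Ollama's native convention). Returns None if the fields are absent."""
    tokens = response.get("eval_count")
    total_ns = response.get("total_duration")
    if not tokens or not total_ns:
        return None
    return tokens / (total_ns / 1e9)

# The example response from the README above.
example = {
    "model": "tinyllama",
    "response": "The future of AI is bright and full of possibilities...",
    "done": True,
    "total_duration": 1234567890,
    "load_duration": 123456789,
    "prompt_eval_count": 10,
    "eval_count": 25,
}
rate = eval_rate(example)  # 25 tokens over ~1.23 s, about 20.25 tokens/s
```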
  ## Resource Requirements

+ - **TinyLlama**: ~1GB RAM, very fast
+ - **Phi models**: ~2GB RAM, good balance
+ - **Llama2 7B**: ~8GB RAM, high quality

+ For Hugging Face Spaces free tier, stick with TinyLlama or Phi models for best performance.