brendon-ai committed on
Commit 69678d3 · verified · 1 parent: c0fd7e0

Update README.md

Files changed (1)
  1. README.md +56 -47
README.md CHANGED
@@ -1,5 +1,5 @@
  ---
- title: Ollama API
  emoji: 🦙
  colorFrom: blue
  colorTo: purple
@@ -8,51 +8,31 @@ pinned: false
  app_port: 7860
  ---

- # Ollama Model API

- A REST API for running Ollama models on Hugging Face Spaces.

  ## Features

- - 🦙 Run Ollama models via REST API
- - 🔄 Model management (pull, list, delete)
- - 💬 Chat completions
- - 🎛️ Configurable parameters (temperature, top_p, etc.)
  - 📊 Health monitoring

  ## API Endpoints

  ### Health Check
- - `GET /health` - Check if the service is running
- - `GET /models` - List available models

- ### Model Management
- - `POST /models/pull` - Pull a model from Ollama registry
- - `DELETE /models/{model_name}` - Delete a model
-
- ### Chat & Completions
- - `POST /chat` - Chat with a model
  - `POST /generate` - Generate text completion

  ## Usage Examples

- ### Pull a Model
  ```bash
- curl -X POST "https://your-space.hf.space/models/pull" \
- -H "Content-Type: application/json" \
- -d '{"model": "llama2:7b"}'
- ```
-
- ### Chat with Model
- ```bash
- curl -X POST "https://your-space.hf.space/chat" \
- -H "Content-Type: application/json" \
- -d '{
- "model": "llama2:7b",
- "messages": [
- {"role": "user", "content": "Hello, how are you?"}
- ]
- }'
  ```

  ### Generate Text
@@ -60,35 +40,64 @@ curl -X POST "https://your-space.hf.space/chat" \
  curl -X POST "https://your-space.hf.space/generate" \
  -H "Content-Type: application/json" \
  -d '{
- "model": "llama2:7b",
  "prompt": "The future of AI is",
  "max_tokens": 100
  }'
  ```

  ## Supported Models

- This setup supports any model available in the Ollama registry:
- - `llama2:7b`, `llama2:13b`
- - `mistral:7b`
- - `codellama:7b`
- - `vicuna:7b`
- - And many more...

  ## Interactive Documentation

- Once deployed, visit `/docs` for interactive API documentation.

- ## Notes

- - Model pulling may take several minutes depending on model size
- - Larger models require more memory and may not work on free tier
- - First inference may be slower as the model loads into memory

  ## Resource Requirements

- - **Small models (7B)**: 8GB+ RAM recommended
- - **Medium models (13B)**: 16GB+ RAM recommended
- - **Large models (70B+)**: 32GB+ RAM required

- Consider using smaller models like `llama2:7b` or `mistral:7b` for better performance on limited resources.
 
  ---
+ title: Ollama Generate API
  emoji: 🦙
  colorFrom: blue
  colorTo: purple

  app_port: 7860
  ---

+ # Ollama Generate API

+ A simple REST API for text generation using Ollama models on Hugging Face Spaces.

  ## Features

+ - 🦙 Generate text using Ollama models
+ - 🎛️ Configurable parameters (temperature, top_p, max_tokens)
  - 📊 Health monitoring
+ - 🚀 Simple and lightweight API

  ## API Endpoints

  ### Health Check
+ - `GET /health` - Check if the Ollama service is running
+ - `GET /` - API information and usage examples

+ ### Text Generation
  - `POST /generate` - Generate text completion

  ## Usage Examples

+ ### Check Health
  ```bash
+ curl "https://your-space.hf.space/health"
  ```

  ### Generate Text
  curl -X POST "https://your-space.hf.space/generate" \
  -H "Content-Type: application/json" \
  -d '{
+ "model": "tinyllama",
  "prompt": "The future of AI is",
+ "temperature": 0.7,
  "max_tokens": 100
  }'
  ```

+ ### API Information
+ ```bash
+ curl "https://your-space.hf.space/"
+ ```
+
+ ## Request Parameters
+
+ | Parameter | Type | Default | Description |
+ |-----------|------|---------|-------------|
+ | `model` | string | required | Model name (e.g., "tinyllama") |
+ | `prompt` | string | required | Input text prompt |
+ | `temperature` | float | 0.7 | Sampling temperature (0.0-2.0) |
+ | `top_p` | float | 0.9 | Top-p sampling (0.0-1.0) |
+ | `max_tokens` | integer | 512 | Maximum tokens to generate (1-4096) |
+
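The new README's `/generate` schema can also be driven from Python instead of curl. A minimal client sketch, assuming only what the parameters table above states; `build_payload` and `generate` are hypothetical helper names, and the base URL is the README's placeholder:

```python
import json
import urllib.request

BASE_URL = "https://your-space.hf.space"  # placeholder from the README, not a real Space

def build_payload(model, prompt, temperature=0.7, top_p=0.9, max_tokens=512):
    """Hypothetical helper: assemble a /generate body using the documented
    defaults and ranges from the Request Parameters table."""
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be in [0.0, 2.0]")
    if not 0.0 <= top_p <= 1.0:
        raise ValueError("top_p must be in [0.0, 1.0]")
    if not 1 <= max_tokens <= 4096:
        raise ValueError("max_tokens must be in [1, 4096]")
    return {
        "model": model,
        "prompt": prompt,
        "temperature": temperature,
        "top_p": top_p,
        "max_tokens": max_tokens,
    }

def generate(model, prompt, **options):
    """POST the payload to /generate and return the parsed JSON response."""
    body = json.dumps(build_payload(model, prompt, **options)).encode("utf-8")
    req = urllib.request.Request(
        f"{BASE_URL}/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Called as `generate("tinyllama", "The future of AI is", max_tokens=100)`, this mirrors the curl example above.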
  ## Supported Models

+ This API works with any Ollama model. Recommended lightweight models for Hugging Face Spaces:
+
+ - `tinyllama` - Very small and fast (~600MB)
+ - `phi` - Small but capable (~1.6GB)
+ - `llama2:7b` - Larger but more capable (~3.8GB)

  ## Interactive Documentation

+ Once deployed, visit `/docs` for interactive API documentation powered by FastAPI.
+
+ ## Setup Notes

+ - The startup script automatically pulls the `tinyllama` model
+ - First generation may be slower as the model loads
+ - Lightweight models are recommended for better performance on limited resources
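Since the startup script pulls `tinyllama` and the first generation is slower while the model loads, a client may want to poll `/health` before sending work. A sketch under the same placeholder URL; the timeout and interval are arbitrary choices, not values from this repo:

```python
import time
import urllib.error
import urllib.request

def wait_until_healthy(base_url, timeout=300.0, interval=5.0):
    """Poll GET /health until it returns HTTP 200 or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/health") as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # service not reachable yet; retry after the interval
        time.sleep(interval)
    return False
```

Model pulls can take minutes on Spaces, so a generous `timeout` is sensible.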

+ ## Example Response
+
+ ```json
+ {
+   "model": "tinyllama",
+   "response": "The future of AI is bright and full of possibilities...",
+   "done": true,
+   "total_duration": 1234567890,
+   "load_duration": 123456789,
+   "prompt_eval_count": 10,
+   "eval_count": 25
+ }
+ ```

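If the wrapper passes Ollama's native metrics through unchanged, the `*_duration` fields are nanoseconds; that is an assumption about this API, though it matches Ollama's own. Under it, a rough throughput number falls out of the example response:

```python
def eval_rate(response):
    """Tokens generated per second, assuming total_duration is in nanoseconds
    (Ollama's native convention). Returns None if the fields are absent."""
    tokens = response.get("eval_count")
    total_ns = response.get("total_duration")
    if not tokens or not total_ns:
        return None
    return tokens / (total_ns / 1e9)

# The example response from the README above.
example = {
    "model": "tinyllama",
    "response": "The future of AI is bright and full of possibilities...",
    "done": True,
    "total_duration": 1234567890,
    "load_duration": 123456789,
    "prompt_eval_count": 10,
    "eval_count": 25,
}
rate = eval_rate(example)  # 25 tokens over ~1.23 s, about 20.25 tokens/s
```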
  ## Resource Requirements

+ - **TinyLlama**: ~1GB RAM, very fast
+ - **Phi models**: ~2GB RAM, good balance
+ - **Llama2 7B**: ~8GB RAM, high quality

+ For Hugging Face Spaces free tier, stick with TinyLlama or Phi models for best performance.