---
title: Ollama API
emoji: 🦙
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
app_port: 7860
---
# Ollama Model API

A REST API for running Ollama models on Hugging Face Spaces.
## Features

- Run Ollama models via REST API
- Model management (pull, list, delete)
- Chat completions
- Configurable parameters (`temperature`, `top_p`, etc.)
- Health monitoring
## API Endpoints

### Health Check

- `GET /health` - Check if the service is running
- `GET /models` - List available models

### Model Management

- `POST /models/pull` - Pull a model from the Ollama registry
- `DELETE /models/{model_name}` - Delete a model

### Chat & Completions

- `POST /chat` - Chat with a model
- `POST /generate` - Generate text completion
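Before using the other endpoints, it can be useful to verify the service is up. A minimal Python sketch of the health check above; the base URL is a placeholder for your Space's hostname, and `service_healthy` is a hypothetical helper name:

```python
import urllib.request


def service_healthy(base_url):
    """Return True if GET /health responds with HTTP 200, False otherwise."""
    try:
        url = base_url.rstrip("/") + "/health"
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection errors, timeouts, and HTTP errors (URLError subclasses OSError).
        return False
```

For example, `service_healthy("https://your-space.hf.space")` should return `True` once the Space has finished building.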
## Usage Examples

### Pull a Model

```bash
curl -X POST "https://your-space.hf.space/models/pull" \
  -H "Content-Type: application/json" \
  -d '{"model": "llama2:7b"}'
```

### Chat with Model

```bash
curl -X POST "https://your-space.hf.space/chat" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama2:7b",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ]
  }'
```

### Generate Text

```bash
curl -X POST "https://your-space.hf.space/generate" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama2:7b",
    "prompt": "The future of AI is",
    "max_tokens": 100
  }'
```
## Supported Models

This setup supports any model available in the Ollama registry:

- `llama2:7b`
- `llama2:13b`
- `mistral:7b`
- `codellama:7b`
- `vicuna:7b`
- And many more...
## Interactive Documentation

Once deployed, visit `/docs` for interactive API documentation.
## Notes

- Model pulling may take several minutes depending on model size
- Larger models require more memory and may not work on the free tier
- First inference may be slower as the model loads into memory
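Because a pull can take minutes, a caller may want to poll `GET /models` until the new model appears. A sketch under the assumption that `/models` returns a JSON object with a `"models"` list of model names (the exact response shape depends on the app); `wait_for_model` is a hypothetical helper:

```python
import json
import time
import urllib.request


def wait_for_model(base_url, model, timeout=600, interval=10, fetch=None):
    """Poll GET /models until `model` is listed, or raise TimeoutError.

    `fetch` can be injected for testing; by default it performs the HTTP GET.
    """
    if fetch is None:
        def fetch():
            url = base_url.rstrip("/") + "/models"
            with urllib.request.urlopen(url) as resp:
                return json.loads(resp.read())

    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if model in fetch().get("models", []):
            return True
        time.sleep(interval)
    raise TimeoutError(f"{model} not available after {timeout}s")
```

This would typically run right after the `POST /models/pull` request, before the first chat or generate call.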
## Resource Requirements

- Small models (7B): 8GB+ RAM recommended
- Medium models (13B): 16GB+ RAM recommended
- Large models (70B+): 32GB+ RAM required

Consider using smaller models like `llama2:7b` or `mistral:7b` for better performance on limited resources.