---
title: Ollama API
emoji: πŸ¦™
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
app_port: 7860
---

# Ollama Model API

A REST API for running Ollama models on Hugging Face Spaces.

## Features

- πŸ¦™ Run Ollama models via REST API
- πŸ”„ Model management (pull, list, delete)
- πŸ’¬ Chat completions
- πŸŽ›οΈ Configurable parameters (temperature, top_p, etc.)
- πŸ“Š Health monitoring

## API Endpoints

### Health Check
- `GET /health` - Check if the service is running
- `GET /models` - List available models

### Model Management
- `POST /models/pull` - Pull a model from Ollama registry
- `DELETE /models/{model_name}` - Delete a model

### Chat & Completions
- `POST /chat` - Chat with a model
- `POST /generate` - Generate text completion
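The endpoints above can be wrapped in a small client. The sketch below is illustrative, not part of this project: `BASE_URL`, `build_chat_payload`, and `post_json` are hypothetical names, and the exact response shape depends on how the Space implements each route. Extra keyword arguments (e.g. `temperature`, `top_p`) are passed through in the request body, matching the configurable parameters listed under Features.

```python
# Minimal client sketch for the endpoints above (illustrative names).
# Replace BASE_URL with your deployed Space URL.
import json
from urllib import request

BASE_URL = "https://your-space.hf.space"  # placeholder

def build_chat_payload(model, messages, **options):
    """Build the JSON body for POST /chat; extra keyword arguments
    (temperature, top_p, ...) are passed through unchanged."""
    payload = {"model": model, "messages": messages}
    payload.update(options)
    return payload

def post_json(path, payload):
    """POST a JSON payload and return the decoded JSON response."""
    req = request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())
```

For example, `post_json("/chat", build_chat_payload("llama2:7b", [{"role": "user", "content": "Hello"}], temperature=0.7))` mirrors the curl example below, with sampling parameters added.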

## Usage Examples

### Pull a Model
```bash
curl -X POST "https://your-space.hf.space/models/pull" \
  -H "Content-Type: application/json" \
  -d '{"model": "llama2:7b"}'
```

### Chat with Model
```bash
curl -X POST "https://your-space.hf.space/chat" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama2:7b",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ]
  }'
```

### Generate Text
```bash
curl -X POST "https://your-space.hf.space/generate" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama2:7b",
    "prompt": "The future of AI is",
    "max_tokens": 100
  }'
```

## Supported Models

This setup supports any model available in the Ollama registry:
- `llama2:7b`, `llama2:13b`
- `mistral:7b`
- `codellama:7b`
- `vicuna:7b`
- And many more...

## Interactive Documentation

Once deployed, visit `/docs` for interactive API documentation.

## Notes

- Pulling a model may take several minutes, depending on its size
- Larger models require more memory and may not work on the free tier
- The first inference may be slower while the model loads into memory
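Because pulls can take minutes, client code should poll rather than assume the model is ready immediately. A generic polling helper, sketched below, works for this: pass it a callable that checks `GET /models` for the model name. The helper itself is an assumption, not part of this API.

```python
import time

def wait_until_ready(check, timeout=600, interval=5):
    """Poll `check()` until it returns True or `timeout` seconds pass.
    Useful after POST /models/pull: pass a callable that queries
    GET /models and returns True once the model appears."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(interval)
    return False
```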

## Resource Requirements

- **Small models (7B)**: 8GB+ RAM recommended
- **Medium models (13B)**: 16GB+ RAM recommended  
- **Large models (70B+)**: 32GB+ RAM required

Consider using smaller models like `llama2:7b` or `mistral:7b` for better performance on limited resources.
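The figures above follow a rough rule of thumb: parameter count times bytes per parameter, plus runtime overhead. The helper below is a back-of-the-envelope sketch under assumed constants, not an official Ollama formula; Ollama's default GGUF quantizations use roughly 0.5 to 1 byte per parameter, while fp16 weights use about 2.

```python
def estimate_ram_gb(params_billion, bytes_per_param=2.0, overhead_gb=1.0):
    """Rough RAM estimate (GB): parameters x bytes per parameter
    plus a fixed runtime overhead. Constants are assumptions:
    ~2 bytes/param for fp16, ~0.5-1 for 4- to 8-bit quantized weights."""
    return params_billion * bytes_per_param + overhead_gb
```

For a 7B model this gives about 15 GB at fp16 and about 4.5 GB at 4-bit quantization, which is consistent with the 8GB+ recommendation above once quantization is taken into account.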