---
title: Ollama Generate API
emoji: 🦙
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
app_port: 7860
---
# Ollama Generate API

A simple REST API for text generation using Ollama models on Hugging Face Spaces.
## Features

- 🦙 Generate text using Ollama models
- 🎛️ Configurable parameters (temperature, top_p, max_tokens)
- 📊 Health monitoring
- 🚀 Simple and lightweight API
## API Endpoints

### Health Check

- `GET /health` - Check whether the Ollama service is running
- `GET /` - API information and usage examples

### Text Generation

- `POST /generate` - Generate a text completion
## Usage Examples

### Check Health

```bash
curl "https://your-space.hf.space/health"
```

### Generate Text

```bash
curl -X POST "https://your-space.hf.space/generate" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tinyllama",
    "prompt": "The future of AI is",
    "temperature": 0.7,
    "max_tokens": 100
  }'
```

### API Information

```bash
curl "https://your-space.hf.space/"
```
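The same request can be issued from Python's standard library. A minimal sketch, assuming the placeholder URL used above; the `opener` argument is injectable only so the function can be exercised without a live deployment:

```python
import json
from urllib import request

# Placeholder: replace with your Space's actual hostname.
BASE_URL = "https://your-space.hf.space"

def generate(prompt, model="tinyllama", temperature=0.7,
             top_p=0.9, max_tokens=100, opener=request.urlopen):
    """POST a generation request and return the decoded JSON response."""
    body = json.dumps({
        "model": model,
        "prompt": prompt,
        "temperature": temperature,
        "top_p": top_p,
        "max_tokens": max_tokens,
    }).encode("utf-8")
    req = request.Request(
        BASE_URL + "/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with opener(req) as resp:
        return json.load(resp)
```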
## Request Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `model` | string | required | Model name (e.g., "tinyllama") |
| `prompt` | string | required | Input text prompt |
| `temperature` | float | 0.7 | Sampling temperature (0.0-2.0) |
| `top_p` | float | 0.9 | Top-p sampling (0.0-1.0) |
| `max_tokens` | integer | 512 | Maximum tokens to generate (1-4096) |
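A client can pre-validate a request body against the ranges in the table before sending it. A sketch; the helper below is hypothetical, not part of the API:

```python
def validate_params(params):
    """Fill in documented defaults and check the documented ranges."""
    if "model" not in params or "prompt" not in params:
        raise ValueError("model and prompt are required")
    out = {"temperature": 0.7, "top_p": 0.9, "max_tokens": 512}
    out.update(params)
    if not 0.0 <= out["temperature"] <= 2.0:
        raise ValueError("temperature must be in [0.0, 2.0]")
    if not 0.0 <= out["top_p"] <= 1.0:
        raise ValueError("top_p must be in [0.0, 1.0]")
    if not 1 <= out["max_tokens"] <= 4096:
        raise ValueError("max_tokens must be in [1, 4096]")
    return out
```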
## Supported Models

This API works with any Ollama model. Recommended lightweight models for Hugging Face Spaces:

- `tinyllama` - Very small and fast (~600MB)
- `phi` - Small but capable (~1.6GB)
- `llama2:7b` - Larger but more capable (~3.8GB)
## Interactive Documentation

Once deployed, visit `/docs` for interactive API documentation powered by FastAPI.
## Setup Notes

- The startup script automatically pulls the `tinyllama` model
- The first generation may be slower while the model loads
- Lightweight models are recommended for better performance on limited resources
## Example Response

```json
{
  "model": "tinyllama",
  "response": "The future of AI is bright and full of possibilities...",
  "done": true,
  "total_duration": 1234567890,
  "load_duration": 123456789,
  "prompt_eval_count": 10,
  "eval_count": 25
}
```
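Ollama reports the duration fields in nanoseconds, so a rough tokens-per-second figure can be derived from `eval_count` (generated tokens) and `total_duration`. A minimal sketch:

```python
def summarize(resp):
    """Summarize a /generate response: token count, wall time, throughput.

    Ollama durations are in nanoseconds, so divide by 1e9 for seconds.
    """
    seconds = resp["total_duration"] / 1e9
    return {
        "tokens": resp["eval_count"],
        "seconds": round(seconds, 3),
        "tokens_per_sec": round(resp["eval_count"] / seconds, 1),
    }
```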
## Resource Requirements

- **TinyLlama**: ~1GB RAM, very fast
- **Phi models**: ~2GB RAM, good balance
- **Llama2 7B**: ~8GB RAM, high quality

On the Hugging Face Spaces free tier, stick with TinyLlama or Phi models for the best performance.
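Those figures can be turned into a simple model picker. A hypothetical helper that chooses the largest model fitting a RAM budget (sizes are the approximate values listed above, purely illustrative):

```python
# Approximate RAM needs in GB, taken from the list above.
MODEL_RAM_GB = {"tinyllama": 1, "phi": 2, "llama2:7b": 8}

def pick_model(available_gb):
    """Return the most capable model that fits the given RAM budget."""
    fitting = [m for m, gb in MODEL_RAM_GB.items() if gb <= available_gb]
    if not fitting:
        raise ValueError("no model fits in %.1f GB" % available_gb)
    return max(fitting, key=MODEL_RAM_GB.get)
```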