brendon-ai committed on
Commit
c71461d
·
verified ·
1 Parent(s): 0165548

Update README.md

Files changed (1)
  1. README.md +82 -23
README.md CHANGED
@@ -1,35 +1,94 @@
  ---
- title: NeuroBERT-Tiny API
- emoji: 🤖
  colorFrom: blue
- colorTo: green
  sdk: docker
  app_port: 7860
  ---

- # NeuroBERT-Tiny Masked Language Model API
-
- This Space hosts a FastAPI application that performs Masked Language Modeling using the [boltuix/NeuroBERT-Tiny](https://huggingface.co/boltuix/NeuroBERT-Tiny) model.
-
- ## Endpoints
-
- * **Health Check (GET /health):**
-   Returns a simple message to confirm the API is running.
-   Example: `curl https://brendon-ai-faq.hf.space/health`
-
- * **Predict (POST /predict):**
-   Accepts a JSON payload with a `text` field containing a sentence with `[MASK]` tokens.
-   Returns a list of the top 5 predictions for each masked position.
-
-   Example `curl` request:
-   ```bash
-   curl -X POST \
-     -H "Content-Type: application/json" \
-     -d '{"text": "The quick brown fox jumps over the [MASK] dog."}' \
-     https://brendon-ai-faq.hf.space/predict
-   ```
-
- ## Interactive API Documentation
- You can find the full interactive API documentation at:
- * [Swagger UI](https://brendon-ai-faq.hf.space/docs)
- * [ReDoc](https://brendon-ai-faq.hf.space/redoc)
  ---
+ title: Ollama API
+ emoji: 🦙
  colorFrom: blue
+ colorTo: purple
  sdk: docker
+ pinned: false
  app_port: 7860
  ---

+ # Ollama Model API
+
+ A REST API for running Ollama models on Hugging Face Spaces.
+
+ ## Features
+
+ - 🦙 Run Ollama models via REST API
+ - 🔄 Model management (pull, list, delete)
+ - 💬 Chat completions
+ - 🎛️ Configurable parameters (temperature, top_p, etc.)
+ - 📊 Health monitoring
+
+ ## API Endpoints
+
+ ### Health Check
+ - `GET /health` - Check if the service is running
+ - `GET /models` - List available models
+
+ ### Model Management
+ - `POST /models/pull` - Pull a model from the Ollama registry
+ - `DELETE /models/{model_name}` - Delete a model
+
+ ### Chat & Completions
+ - `POST /chat` - Chat with a model
+ - `POST /generate` - Generate a text completion
+
+ Example `curl` calls for all of these endpoints are shown below.
+
+ ## Usage Examples
+
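+ ### Check Health and List Models
+ A quick sanity check before pulling anything. Both routes come straight from the endpoint list above; the shape of the JSON responses depends on this Space's FastAPI app, so none is shown here:
+ ```bash
+ # Confirm the service is running
+ curl "https://your-space.hf.space/health"
+
+ # List the models currently available
+ curl "https://your-space.hf.space/models"
+ ```
+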
+ ### Pull a Model
+ ```bash
+ curl -X POST "https://your-space.hf.space/models/pull" \
+   -H "Content-Type: application/json" \
+   -d '{"model": "llama2:7b"}'
+ ```
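+
+ ### Delete a Model
+ A minimal sketch built from the `DELETE /models/{model_name}` route above. Whether a tag such as `llama2:7b` must be URL-encoded (`llama2%3A7b`) depends on how this Space's FastAPI app parses the path:
+ ```bash
+ # Remove a previously pulled model to free disk space
+ curl -X DELETE "https://your-space.hf.space/models/llama2:7b"
+ ```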
+
+ ### Chat with Model
+ ```bash
+ curl -X POST "https://your-space.hf.space/chat" \
+   -H "Content-Type: application/json" \
+   -d '{
+     "model": "llama2:7b",
+     "messages": [
+       {"role": "user", "content": "Hello, how are you?"}
+     ]
+   }'
+ ```
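+
+ ### Chat with a System Prompt
+ Multi-turn conversations reuse the same `messages` array. The `system` and `assistant` roles below follow standard Ollama chat conventions and assume this wrapper forwards them unchanged:
+ ```bash
+ curl -X POST "https://your-space.hf.space/chat" \
+   -H "Content-Type: application/json" \
+   -d '{
+     "model": "llama2:7b",
+     "messages": [
+       {"role": "system", "content": "You are a concise assistant."},
+       {"role": "user", "content": "What is Ollama?"},
+       {"role": "assistant", "content": "Ollama runs large language models locally."},
+       {"role": "user", "content": "How do I pull a model?"}
+     ]
+   }'
+ ```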
+
+ ### Generate Text
+ ```bash
+ curl -X POST "https://your-space.hf.space/generate" \
+   -H "Content-Type: application/json" \
+   -d '{
+     "model": "llama2:7b",
+     "prompt": "The future of AI is",
+     "max_tokens": 100
+   }'
+ ```
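+
+ ### Generate with Sampling Parameters
+ The Features list mentions configurable `temperature` and `top_p`. Passing them as top-level fields of the same `/generate` payload, as sketched here, is an assumption about this wrapper's request schema:
+ ```bash
+ curl -X POST "https://your-space.hf.space/generate" \
+   -H "Content-Type: application/json" \
+   -d '{
+     "model": "llama2:7b",
+     "prompt": "The future of AI is",
+     "max_tokens": 100,
+     "temperature": 0.7,
+     "top_p": 0.9
+   }'
+ ```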
+
+ ## Supported Models
+
+ This setup supports any model available in the Ollama registry, for example:
+ - `llama2:7b`, `llama2:13b`
+ - `mistral:7b`
+ - `codellama:7b`
+ - `vicuna:7b`
+ - And many more...
+
+ ## Interactive Documentation
+
+ Once deployed, visit `/docs` for interactive API documentation.
+
+ ## Notes
+
+ - Model pulling may take several minutes, depending on model size
+ - Larger models require more memory and may not work on the free tier
+ - The first inference may be slower while the model loads into memory
+
+ ## Resource Requirements
+
+ - **Small models (7B)**: 8GB+ RAM recommended
+ - **Medium models (13B)**: 16GB+ RAM recommended
+ - **Large models (70B+)**: 32GB+ RAM required
+
+ Consider using smaller models such as `llama2:7b` or `mistral:7b` for better performance on limited resources.