sema-api / docs /DEPLOYMENT_ARCHITECTURE.md
kamau1's picture
Added documentation for using custom models
0745795
# Deployment Architecture & Infrastructure
## πŸ—οΈ Current Architecture
### HuggingFace Spaces Deployment
**Platform:** HuggingFace Spaces
**Runtime:** Python 3.9+ with FastAPI
**URL:** `https://sematech-sema-api.hf.space`
**Auto-deployment:** Connected to Git repository
### System Components
```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Sema Translation API β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ FastAPI Application Server β”‚
β”‚ β”œβ”€β”€ API Endpoints (v1) β”‚
β”‚ β”œβ”€β”€ Request Middleware (Rate Limiting, Logging) β”‚
β”‚ β”œβ”€β”€ Authentication (Future) β”‚
β”‚ └── Response Middleware (CORS, Headers) β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Translation Services β”‚
β”‚ β”œβ”€β”€ CTranslate2 Translation Engine β”‚
β”‚ β”œβ”€β”€ SentencePiece Tokenizer β”‚
β”‚ β”œβ”€β”€ FastText Language Detection β”‚
β”‚ └── Language Database (FLORES-200) β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Custom HuggingFace Models β”‚
β”‚ β”œβ”€β”€ sematech/sema-utils Repository β”‚
β”‚ β”œβ”€β”€ NLLB-200 3.3B (CTranslate2 Optimized) β”‚
β”‚ β”œβ”€β”€ FastText LID.176 Model β”‚
β”‚ └── SentencePiece Tokenizer β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Monitoring & Observability β”‚
β”‚ β”œβ”€β”€ Prometheus Metrics β”‚
β”‚ β”œβ”€β”€ Structured Logging (JSON) β”‚
β”‚ β”œβ”€β”€ Request Tracking (UUID) β”‚
β”‚ └── Performance Timing β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```
### Model Storage & Caching
**HuggingFace Hub Integration:**
```python
# Model loading from unified repository
model_path = snapshot_download(
repo_id="sematech/sema-utils",
cache_dir="/app/models",
local_files_only=False
)
# Local caching strategy
CACHE_STRUCTURE = {
"/app/models/": {
"sematech--sema-utils/": {
"translation/": {
"nllb-200-3.3B-ct2/": "CTranslate2 model files",
"tokenizer/": "SentencePiece tokenizer"
},
"language_detection/": {
"lid.176.bin": "FastText model"
}
}
}
}
```
## πŸš€ Deployment Process
### 1. HuggingFace Spaces Configuration
**Space Configuration (`README.md`):**
```yaml
---
title: Sema Translation API
emoji: 🌍
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
license: mit
app_port: 8000
---
```
**Dockerfile:**
```dockerfile
FROM python:3.9-slim
WORKDIR /app
# Install system dependencies
RUN apt-get update && apt-get install -y \
build-essential \
&& rm -rf /var/lib/apt/lists/*
# Copy requirements and install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application code
COPY . .
# Expose port
EXPOSE 8000
# Start application
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
```
### 2. Environment Configuration
**Environment Variables:**
```bash
# Application settings
APP_NAME="Sema Translation API"
APP_VERSION="2.0.0"
ENVIRONMENT="production"
# Model settings
MODEL_CACHE_DIR="/app/models"
HF_HOME="/app/models"
# API settings
MAX_CHARACTERS=5000
RATE_LIMIT_PER_MINUTE=60
# Monitoring
ENABLE_METRICS=true
LOG_LEVEL="INFO"
# HuggingFace Hub
HF_TOKEN="your_token_here" # Optional for private models
```
### 3. Startup Process
**Application Initialization:**
```python
@app.on_event("startup")
async def startup_event():
"""Initialize application on startup"""
print("[INFO] Starting Sema Translation API v2.0.0")
print("[INFO] Loading translation models...")
try:
# Load models from HuggingFace Hub
load_models()
# Initialize metrics
if settings.enable_metrics:
setup_prometheus_metrics()
# Setup logging
configure_structured_logging()
print("[SUCCESS] API started successfully")
print(f"[CONFIG] Environment: {settings.environment}")
print(f"[ENDPOINT] Documentation: / (Swagger UI)")
print(f"[ENDPOINT] API v1: /api/v1/")
except Exception as e:
print(f"[ERROR] Startup failed: {e}")
raise
```
## πŸ“Š Performance Characteristics
### Resource Requirements
**Memory Usage:**
- **Model Loading**: ~3.2GB RAM
- **Per Request**: 50-100MB additional
- **Concurrent Requests**: Linear scaling
- **Peak Usage**: ~4-5GB with multiple concurrent requests
**CPU Usage:**
- **Model Inference**: CPU-intensive (CTranslate2 optimized)
- **Language Detection**: Minimal CPU usage
- **Request Processing**: Low overhead
- **Recommended**: 4+ CPU cores for production
**Storage:**
- **Model Files**: ~2.8GB total
- **Application Code**: ~50MB
- **Logs**: Variable (recommend log rotation)
- **Cache**: Automatic HuggingFace Hub caching
### Performance Benchmarks
**Translation Speed:**
```
Text Length | Inference Time | Total Response Time
----------------|----------------|--------------------
< 50 chars | 0.2-0.5s | 0.3-0.7s
50-200 chars | 0.5-1.2s | 0.7-1.5s
200-500 chars | 1.2-2.5s | 1.5-3.0s
500+ chars | 2.5-5.0s | 3.0-6.0s
```
**Language Detection Speed:**
```
Text Length | Detection Time
----------------|---------------
Any length | 0.01-0.05s
```
**Concurrent Request Handling:**
```
Concurrent Users | Response Time (95th percentile)
-----------------|--------------------------------
1-5 users | < 2 seconds
5-10 users | < 3 seconds
10-20 users | < 5 seconds
20+ users | May require scaling
```
## πŸ”§ Monitoring & Observability
### Prometheus Metrics
**Available Metrics:**
```python
# Request metrics
sema_requests_total{endpoint, status}
sema_request_duration_seconds{endpoint}
# Translation metrics
sema_translations_total{source_lang, target_lang}
sema_characters_translated_total
sema_translation_duration_seconds{source_lang, target_lang}
# Language detection metrics
sema_language_detections_total{detected_lang}
sema_detection_duration_seconds
# Error metrics
sema_errors_total{error_type, endpoint}
# System metrics
sema_model_load_time_seconds
sema_memory_usage_bytes
```
**Metrics Endpoint:**
```bash
curl https://sematech-sema-api.hf.space/metrics
```
### Structured Logging
**Log Format:**
```json
{
"timestamp": "2024-06-21T14:30:25.123Z",
"level": "INFO",
"event": "translation_request",
"request_id": "550e8400-e29b-41d4-a716-446655440000",
"source_language": "swh_Latn",
"target_language": "eng_Latn",
"character_count": 17,
"inference_time": 0.234,
"total_time": 1.234,
"client_ip": "192.168.1.1"
}
```
### Health Monitoring
**Health Check Endpoints:**
```bash
# Basic status
curl https://sematech-sema-api.hf.space/status
# Detailed health
curl https://sematech-sema-api.hf.space/health
# Model validation
curl https://sematech-sema-api.hf.space/health | jq '.models_loaded'
```
## πŸ”„ CI/CD Pipeline
### Automated Deployment
**Git Integration:**
1. **Code Push**: Push to main branch
2. **Auto-Build**: HuggingFace Spaces builds Docker image
3. **Model Download**: Automatic model download from `sematech/sema-utils`
4. **Health Check**: Automatic health validation
5. **Live Deployment**: Zero-downtime deployment
**Deployment Validation:**
```bash
# Automated health check after deployment
curl -f https://sematech-sema-api.hf.space/health || exit 1
# Test translation functionality
curl -X POST https://sematech-sema-api.hf.space/api/v1/translate \
-H "Content-Type: application/json" \
-d '{"text": "Hello", "target_language": "swh_Latn"}' || exit 1
```
### Model Updates
**Model Versioning Strategy:**
```python
# Check for model updates
def check_model_updates():
"""Check if models need updating"""
try:
repo_info = api.repo_info("sematech/sema-utils")
local_commit = get_local_commit_hash()
if local_commit != repo_info.sha:
logger.info("model_update_available")
return True
return False
except Exception as e:
logger.error("update_check_failed", error=str(e))
return False
# Graceful model reloading
async def reload_models():
"""Reload models without downtime"""
global translator, tokenizer, language_detector
# Download updated models
new_model_path = download_models()
# Load new models
new_translator = load_translation_model(new_model_path)
new_tokenizer = load_tokenizer(new_model_path)
new_detector = load_detection_model(new_model_path)
# Atomic swap
translator = new_translator
tokenizer = new_tokenizer
language_detector = new_detector
logger.info("models_reloaded_successfully")
```
## πŸ”’ Security Considerations
### Current Security Measures
**Input Validation:**
- Pydantic schema validation
- Character length limits
- Content type validation
- Request size limits
**Rate Limiting:**
- IP-based rate limiting (60 req/min)
- Sliding window implementation
- Graceful degradation
**CORS Configuration:**
```python
app.add_middleware(
CORSMiddleware,
allow_origins=["*"], # Configure for production
allow_credentials=True,
allow_methods=["GET", "POST"],
allow_headers=["*"],
)
```
### Future Security Enhancements
**Authentication & Authorization:**
- API key management
- JWT token validation
- Role-based access control
- Usage quotas per user
**Enhanced Security:**
- Request signing
- IP whitelisting
- DDoS protection
- Input sanitization
## πŸš€ Scaling Considerations
### Horizontal Scaling
**Load Balancing Strategy:**
```nginx
upstream sema_api {
server sema-api-1.hf.space;
server sema-api-2.hf.space;
server sema-api-3.hf.space;
}
server {
listen 80;
location / {
proxy_pass http://sema_api;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
}
}
```
**Auto-scaling Triggers:**
- CPU usage > 80%
- Memory usage > 85%
- Response time > 5 seconds
- Queue length > 10 requests
### Performance Optimization
**Caching Strategy:**
- Redis for translation caching
- CDN for static content
- Model result caching
- Language metadata caching
**Database Integration:**
- PostgreSQL for user data
- Analytics database for metrics
- Read replicas for scaling
- Connection pooling
This architecture provides a solid foundation for scaling the Sema API to handle enterprise-level traffic while maintaining high performance and reliability.