# Deployment Architecture & Infrastructure

## Current Architecture

### HuggingFace Spaces Deployment

- **Platform:** HuggingFace Spaces
- **Runtime:** Python 3.9+ with FastAPI
- **URL:** `https://sematech-sema-api.hf.space`
- **Auto-deployment:** Connected to Git repository

### System Components

```
┌─────────────────────────────────────────────────────────────┐
│                    Sema Translation API                     │
├─────────────────────────────────────────────────────────────┤
│  FastAPI Application Server                                 │
│  ├── API Endpoints (v1)                                     │
│  ├── Request Middleware (Rate Limiting, Logging)            │
│  ├── Authentication (Future)                                │
│  └── Response Middleware (CORS, Headers)                    │
├─────────────────────────────────────────────────────────────┤
│  Translation Services                                       │
│  ├── CTranslate2 Translation Engine                         │
│  ├── SentencePiece Tokenizer                                │
│  ├── FastText Language Detection                            │
│  └── Language Database (FLORES-200)                         │
├─────────────────────────────────────────────────────────────┤
│  Custom HuggingFace Models                                  │
│  ├── sematech/sema-utils Repository                         │
│  ├── NLLB-200 3.3B (CTranslate2 Optimized)                  │
│  ├── FastText LID.176 Model                                 │
│  └── SentencePiece Tokenizer                                │
├─────────────────────────────────────────────────────────────┤
│  Monitoring & Observability                                 │
│  ├── Prometheus Metrics                                     │
│  ├── Structured Logging (JSON)                              │
│  ├── Request Tracking (UUID)                                │
│  └── Performance Timing                                     │
└─────────────────────────────────────────────────────────────┘
```

### Model Storage & Caching

**HuggingFace Hub Integration:**

```python
from huggingface_hub import snapshot_download

# Model loading from unified repository
model_path = snapshot_download(
    repo_id="sematech/sema-utils",
    cache_dir="/app/models",
    local_files_only=False,
)

# Local caching strategy (simplified view; huggingface_hub stores the files
# under models--sematech--sema-utils/snapshots/<commit>/ inside cache_dir)
CACHE_STRUCTURE = {
    "/app/models/": {
        "sematech--sema-utils/": {
            "translation/": {
                "nllb-200-3.3B-ct2/": "CTranslate2 model files",
                "tokenizer/": "SentencePiece tokenizer",
            },
            "language_detection/": {
                "lid.176.bin": "FastText model",
            },
        }
    }
}
```

## Deployment Process

### 1. HuggingFace Spaces Configuration

**Space Configuration (`README.md`):**

```yaml
---
title: Sema Translation API
emoji: 🌍  # any single emoji
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
license: mit
app_port: 8000
---
```

**Dockerfile:**

```dockerfile
FROM python:3.9-slim

WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y \
    build-essential \
    && rm -rf /var/lib/apt/lists/*

# Copy requirements and install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY . .

# Expose port
EXPOSE 8000

# Start application
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
```

### 2. Environment Configuration

**Environment Variables:**

```bash
# Application settings
APP_NAME="Sema Translation API"
APP_VERSION="2.0.0"
ENVIRONMENT="production"

# Model settings
MODEL_CACHE_DIR="/app/models"
HF_HOME="/app/models"

# API settings
MAX_CHARACTERS=5000
RATE_LIMIT_PER_MINUTE=60

# Monitoring
ENABLE_METRICS=true
LOG_LEVEL="INFO"

# HuggingFace Hub
HF_TOKEN="your_token_here"  # Optional for private models
```
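
One way the application could read these variables is sketched below using only the standard library. The `Settings` class, its field names, and the `load_settings` helper are assumptions made for illustration; the real service may load its configuration differently (for example through a Pydantic settings class). The defaults mirror the values listed above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical typed view over the environment variables listed above."""

    app_name: str
    app_version: str
    environment: str
    model_cache_dir: str
    max_characters: int
    rate_limit_per_minute: int
    enable_metrics: bool
    log_level: str
    hf_token: Optional[str]


def _as_bool(value: str) -> bool:
    """Interpret common truthy spellings such as 'true' or '1'."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    return Settings(
        app_name=env.get("APP_NAME", "Sema Translation API"),
        app_version=env.get("APP_VERSION", "2.0.0"),
        environment=env.get("ENVIRONMENT", "production"),
        model_cache_dir=env.get("MODEL_CACHE_DIR", "/app/models"),
        max_characters=int(env.get("MAX_CHARACTERS", "5000")),
        rate_limit_per_minute=int(env.get("RATE_LIMIT_PER_MINUTE", "60")),
        enable_metrics=_as_bool(env.get("ENABLE_METRICS", "true")),
        log_level=env.get("LOG_LEVEL", "INFO"),
        # An empty or missing token means "public models only"
        hf_token=env.get("HF_TOKEN") or None,
    )
```

Passing a plain dictionary instead of `os.environ` keeps the loader easy to exercise in tests.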

### 3. Startup Process

**Application Initialization:**

```python
@app.on_event("startup")
async def startup_event():
    """Initialize application on startup"""
    print("[INFO] Starting Sema Translation API v2.0.0")
    print("[INFO] Loading translation models...")

    try:
        # Load models from HuggingFace Hub
        load_models()

        # Initialize metrics
        if settings.enable_metrics:
            setup_prometheus_metrics()

        # Setup logging
        configure_structured_logging()

        print("[SUCCESS] API started successfully")
        print(f"[CONFIG] Environment: {settings.environment}")
        print("[ENDPOINT] Documentation: / (Swagger UI)")
        print("[ENDPOINT] API v1: /api/v1/")

    except Exception as e:
        print(f"[ERROR] Startup failed: {e}")
        raise
```

## Performance Characteristics

### Resource Requirements

**Memory Usage:**

- **Model Loading**: ~3.2GB RAM
- **Per Request**: 50-100MB additional
- **Concurrent Requests**: Memory grows roughly linearly with the number of in-flight requests
- **Peak Usage**: ~4-5GB with multiple concurrent requests
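
As a rough worked example of the memory figures above, peak usage can be estimated as the model footprint plus a fixed allowance per in-flight request. The function names are hypothetical, and the sketch assumes every request costs the upper bound of 100MB and that 1GB is 1000MB.

```python
MODEL_MEMORY_MB = 3200    # "Model Loading" figure above (~3.2GB)
PER_REQUEST_MB = 100      # upper end of the 50-100MB per-request range


def estimate_peak_memory_mb(concurrent_requests: int) -> int:
    """Model footprint plus a linear per-request allowance."""
    if concurrent_requests < 0:
        raise ValueError("concurrent_requests must be non-negative")
    return MODEL_MEMORY_MB + concurrent_requests * PER_REQUEST_MB


def max_concurrent_requests(memory_budget_mb: int) -> int:
    """Largest request count that still fits inside the memory budget."""
    headroom = memory_budget_mb - MODEL_MEMORY_MB
    return max(0, headroom // PER_REQUEST_MB)
```

Under these assumptions, a 4GB budget leaves room for about 8 in-flight requests and a 5GB budget for about 18, which lines up with the ~4-5GB peak figure.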

**CPU Usage:**

- **Model Inference**: CPU-intensive (CTranslate2 optimized)
- **Language Detection**: Minimal CPU usage
- **Request Processing**: Low overhead
- **Recommended**: 4+ CPU cores for production

**Storage:**

- **Model Files**: ~2.8GB total
- **Application Code**: ~50MB
- **Logs**: Variable (recommend log rotation)
- **Cache**: Automatic HuggingFace Hub caching

### Performance Benchmarks

**Translation Speed:**

```
Text Length     | Inference Time | Total Response Time
----------------|----------------|--------------------
< 50 chars      | 0.2-0.5s       | 0.3-0.7s
50-200 chars    | 0.5-1.2s       | 0.7-1.5s
200-500 chars   | 1.2-2.5s       | 1.5-3.0s
500+ chars      | 2.5-5.0s       | 3.0-6.0s
```

**Language Detection Speed:**

```
Text Length     | Detection Time
----------------|---------------
Any length      | 0.01-0.05s
```

**Concurrent Request Handling:**

```
Concurrent Users | Response Time (95th percentile)
-----------------|--------------------------------
1-5 users        | < 2 seconds
5-10 users       | < 3 seconds
10-20 users      | < 5 seconds
20+ users        | May require scaling
```
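
A 95th-percentile figure like those in the table can be derived from raw latency samples with the nearest-rank method. This is only a sketch: the function name is hypothetical, and the benchmark above may have used a different percentile definition.

```python
import math
from typing import Sequence


def percentile_nearest_rank(samples: Sequence[float], percentile: float) -> float:
    """Return the smallest sample with at least `percentile` % of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < percentile <= 100:
        raise ValueError("percentile must be in (0, 100]")
    ordered = sorted(samples)
    # 1-based rank; multiply before dividing to limit rounding error
    rank = math.ceil(percentile * len(ordered) / 100)
    return ordered[rank - 1]
```

For example, with response times of 0.4, 0.9, 1.1, 1.3 and 4.8 seconds, the rank is `ceil(95 * 5 / 100) = 5`, so the 95th percentile is the slowest sample.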

## Monitoring & Observability

### Prometheus Metrics

**Available Metrics:**

```
# Request metrics
sema_requests_total{endpoint, status}
sema_request_duration_seconds{endpoint}

# Translation metrics
sema_translations_total{source_lang, target_lang}
sema_characters_translated_total
sema_translation_duration_seconds{source_lang, target_lang}

# Language detection metrics
sema_language_detections_total{detected_lang}
sema_detection_duration_seconds

# Error metrics
sema_errors_total{error_type, endpoint}

# System metrics
sema_model_load_time_seconds
sema_memory_usage_bytes
```

**Metrics Endpoint:**

```bash
curl https://sematech-sema-api.hf.space/metrics
```

### Structured Logging

**Log Format:**

```json
{
  "timestamp": "2024-06-21T14:30:25.123Z",
  "level": "INFO",
  "event": "translation_request",
  "request_id": "550e8400-e29b-41d4-a716-446655440000",
  "source_language": "swh_Latn",
  "target_language": "eng_Latn",
  "character_count": 17,
  "inference_time": 0.234,
  "total_time": 1.234,
  "client_ip": "192.168.1.1"
}
```
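
A log line in this shape could be produced with the standard `logging` module alone, as sketched below. The `JsonFormatter` class, the `fields` convention for extra data, and the `new_request_id` helper are assumptions for illustration; the service itself may rely on a dedicated structured-logging library.

```python
import json
import logging
import uuid
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        timestamp = datetime.fromtimestamp(record.created, tz=timezone.utc)
        payload = {
            "timestamp": timestamp.isoformat(timespec="milliseconds").replace("+00:00", "Z"),
            "level": record.levelname,
            "event": record.getMessage(),
        }
        # Extra data passed as extra={"fields": {...}} is merged into the object
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)


def new_request_id() -> str:
    """Random UUID used to correlate all log lines of one request."""
    return str(uuid.uuid4())


if __name__ == "__main__":
    logger = logging.getLogger("sema")
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.info(
        "translation_request",
        extra={"fields": {"request_id": new_request_id(), "character_count": 17}},
    )
```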

### Health Monitoring

**Health Check Endpoints:**

```bash
# Basic status
curl https://sematech-sema-api.hf.space/status

# Detailed health
curl https://sematech-sema-api.hf.space/health

# Model validation
curl https://sematech-sema-api.hf.space/health | jq '.models_loaded'
```

## CI/CD Pipeline

### Automated Deployment

**Git Integration:**

1. **Code Push**: Push to main branch
2. **Auto-Build**: HuggingFace Spaces builds Docker image
3. **Model Download**: Automatic model download from `sematech/sema-utils`
4. **Health Check**: Automatic health validation
5. **Live Deployment**: Zero-downtime deployment

**Deployment Validation:**

```bash
# Automated health check after deployment
curl -f https://sematech-sema-api.hf.space/health || exit 1

# Test translation functionality (-f makes curl fail on HTTP error responses)
curl -f -X POST https://sematech-sema-api.hf.space/api/v1/translate \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello", "target_language": "swh_Latn"}' || exit 1
```
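
Because a freshly built Space may need some time to download and load its models, a single health request right after a push can fail even when the deployment is fine. The sketch below polls the health endpoint with retries. The function names, retry counts, and the assumption that `models_loaded` is truthy once the service is ready are illustrative and not confirmed by the API.

```python
import json
import time
import urllib.request
from typing import Callable

HEALTH_URL = "https://sematech-sema-api.hf.space/health"


def fetch_health(url: str = HEALTH_URL, timeout: float = 10.0) -> dict:
    """GET the health endpoint and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def wait_until_healthy(
    fetch: Callable[[], dict] = fetch_health,
    attempts: int = 10,
    delay_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the service reports loaded models or attempts run out."""
    for attempt in range(1, attempts + 1):
        try:
            # Assumption: the health payload exposes a truthy `models_loaded`
            if fetch().get("models_loaded"):
                return True
        except (OSError, ValueError):
            pass  # not reachable yet, or the body was not valid JSON
        if attempt < attempts:
            sleep(delay_seconds)
    return False
```

The `fetch` and `sleep` parameters are injectable so the retry logic can be tested without network access or real delays.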

### Model Updates

**Model Versioning Strategy:**

```python
from huggingface_hub import HfApi

api = HfApi()

# Check for model updates
def check_model_updates():
    """Check if models need updating"""
    try:
        repo_info = api.repo_info("sematech/sema-utils")
        local_commit = get_local_commit_hash()

        if local_commit != repo_info.sha:
            logger.info("model_update_available")
            return True
        return False
    except Exception as e:
        # Keyword fields assume a structlog-style logger
        logger.error("update_check_failed", error=str(e))
        return False

# Graceful model reloading
async def reload_models():
    """Reload models without downtime"""
    global translator, tokenizer, language_detector

    # Download updated models
    new_model_path = download_models()

    # Load new models
    new_translator = load_translation_model(new_model_path)
    new_tokenizer = load_tokenizer(new_model_path)
    new_detector = load_detection_model(new_model_path)

    # Swap all references in one statement with no await in between, so no
    # other coroutine on the event loop sees a mix of old and new models
    translator, tokenizer, language_detector = (
        new_translator, new_tokenizer, new_detector
    )

    logger.info("models_reloaded_successfully")
```
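
Swapping three module-level globals is only as atomic as the code between the assignments. An alternative, sketched here under the assumption that request handlers can be changed to read models through a registry, is to bundle the models into one immutable object and publish it with a single reference assignment. `ModelBundle` and `ModelRegistry` are hypothetical names, not part of the existing code.

```python
import threading
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class ModelBundle:
    """Models that must always be used together."""

    translator: Any
    tokenizer: Any
    language_detector: Any


class ModelRegistry:
    """Holds the active bundle and replaces it as a whole."""

    def __init__(self, initial: ModelBundle) -> None:
        self._bundle = initial
        self._reload_lock = threading.Lock()

    def current(self) -> ModelBundle:
        # One attribute read returns a bundle whose parts belong together
        return self._bundle

    def reload(self, loader: Callable[[], ModelBundle]) -> ModelBundle:
        # The lock only serializes concurrent reloads; readers never block
        with self._reload_lock:
            new_bundle = loader()      # build everything before publishing
            self._bundle = new_bundle  # single reference assignment
            return new_bundle
```

A handler would call `registry.current()` once per request and use that bundle throughout, so a reload that happens mid-request never mixes an old tokenizer with a new translator.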

## Security Considerations

### Current Security Measures

**Input Validation:**

- Pydantic schema validation
- Character length limits
- Content type validation
- Request size limits

**Rate Limiting:**

- IP-based rate limiting (60 requests per minute, matching `RATE_LIMIT_PER_MINUTE`)
- Sliding window implementation
- Graceful degradation
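
The sliding-window approach can be sketched with a per-IP queue of request timestamps. This is an in-memory illustration only: the class name and the injectable `clock` parameter are assumptions, and the deployed middleware may be implemented differently.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowRateLimiter:
    """Allow at most `limit` requests per client within the last `window_seconds`."""

    def __init__(
        self,
        limit: int = 60,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._limit = limit
        self._window = window_seconds
        self._clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_ip: str) -> bool:
        now = self._clock()
        hits = self._hits[client_ip]
        # Discard timestamps that have slid out of the window
        while hits and now - hits[0] >= self._window:
            hits.popleft()
        if len(hits) >= self._limit:
            return False
        hits.append(now)
        return True
```

Unlike a fixed one-minute counter, this window never resets all at once, so a client cannot send two full bursts back to back across a minute boundary.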
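
**CORS Configuration:**

```python
from fastapi.middleware.cors import CORSMiddleware

app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],  # Configure for production
    # Credentials combined with a wildcard origin are unsafe; list explicit
    # origins before relying on cookies or auth headers
    allow_credentials=True,
    allow_methods=["GET", "POST"],
    allow_headers=["*"],
)
```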

### Future Security Enhancements

**Authentication & Authorization:**

- API key management
- JWT token validation
- Role-based access control
- Usage quotas per user

**Enhanced Security:**

- Request signing
- IP whitelisting
- DDoS protection
- Input sanitization

## Scaling Considerations

### Horizontal Scaling

**Load Balancing Strategy:**

```nginx
upstream sema_api {
    server sema-api-1.hf.space;
    server sema-api-2.hf.space;
    server sema-api-3.hf.space;
}

server {
    listen 80;

    location / {
        proxy_pass http://sema_api;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```

**Auto-scaling Triggers:**

- CPU usage > 80%
- Memory usage > 85%
- Response time > 5 seconds
- Queue length > 10 requests
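
The triggers above translate directly into a small decision function. The sketch assumes that any single exceeded threshold is enough to scale up and that the comparisons are strict; the `LoadSnapshot` type and the reason labels are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LoadSnapshot:
    """One observation of the service's load."""

    cpu_percent: float
    memory_percent: float
    response_time_seconds: float
    queue_length: int


def scale_up_reasons(snapshot: LoadSnapshot) -> List[str]:
    """Return the names of all triggers the snapshot exceeds."""
    reasons = []
    if snapshot.cpu_percent > 80:
        reasons.append("cpu")
    if snapshot.memory_percent > 85:
        reasons.append("memory")
    if snapshot.response_time_seconds > 5:
        reasons.append("response_time")
    if snapshot.queue_length > 10:
        reasons.append("queue_length")
    return reasons


def should_scale_up(snapshot: LoadSnapshot) -> bool:
    """Scale up as soon as at least one trigger fires."""
    return bool(scale_up_reasons(snapshot))
```

Returning the list of reasons rather than a bare boolean makes it easier to log why a scaling decision was taken.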

### Performance Optimization

**Caching Strategy:**

- Redis for translation caching
- CDN for static content
- Model result caching
- Language metadata caching
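
To illustrate translation caching, the sketch below derives a cache key from the text and language pair and keeps results in a small in-process LRU cache. The in-memory cache is a stand-in for Redis so the example stays self-contained; the key format, class name, and default size are assumptions.

```python
import hashlib
from collections import OrderedDict
from typing import Optional


def translation_cache_key(text: str, source_lang: str, target_lang: str) -> str:
    """Stable key: hashing the text keeps keys short regardless of input length."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return f"translation:{source_lang}:{target_lang}:{digest}"


class LRUTranslationCache:
    """Bounded cache that evicts the least recently used entry."""

    def __init__(self, max_entries: int = 1024) -> None:
        self._max_entries = max_entries
        self._entries: "OrderedDict[str, str]" = OrderedDict()

    def get(self, key: str) -> Optional[str]:
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def put(self, key: str, translation: str) -> None:
        self._entries[key] = translation
        self._entries.move_to_end(key)
        if len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # drop the oldest entry
```

Including both language codes in the key matters because the same source text translates differently for each target language.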

**Database Integration:**

- PostgreSQL for user data
- Analytics database for metrics
- Read replicas for scaling
- Connection pooling

This architecture gives the Sema API a foundation for scaling beyond a single Space to higher traffic volumes while keeping response times and reliability within the targets described above.