# Sema Translation API - Complete Documentation
Welcome to the comprehensive documentation for the Sema Translation API - an enterprise-grade translation service supporting 200+ languages with custom HuggingFace models and a focus on African languages.
## 📚 Documentation Overview
This documentation covers all aspects of the Sema Translation API, from custom model implementation to advanced deployment scenarios and future application ideas.
### 🚀 Core Documentation
#### **[Custom Models Implementation](CUSTOM_MODELS_IMPLEMENTATION.md)**
**Essential Reading** - Detailed documentation of how we implemented custom HuggingFace models:
- Unified `sematech/sema-utils` repository structure
- CTranslate2 optimization for 2-4x faster inference
- Model loading pipeline and caching strategy
- Performance benchmarks and monitoring
- Model update and versioning process
#### **[API Capabilities](API_CAPABILITIES.md)**
Complete overview of enhanced API features:
- 55+ African languages (expanded from 23)
- Server-side performance timing
- Language detection with confidence scores
- Comprehensive language metadata system
#### **[Future Considerations](FUTURE_CONSIDERATIONS.md)**
Roadmap and application ideas:
- Authentication & user management with Supabase
- Database integration and caching strategies
- Document translation and real-time streaming
- Innovative application ideas (chatbots, education, government services)
#### **[Deployment Architecture](DEPLOYMENT_ARCHITECTURE.md)**
Infrastructure and deployment details:
- HuggingFace Spaces deployment process
- Performance characteristics and resource requirements
- Monitoring with Prometheus and structured logging
- CI/CD pipeline and scaling considerations
### 📖 Additional Documentation
#### **[Project Overview](PROJECT_OVERVIEW.md)**
High-level project introduction and goals
#### **[API Reference](API_REFERENCE.md)**
Complete endpoint documentation with examples
## 🌟 Key Achievements & Features
### Custom HuggingFace Models Integration
- **Unified Repository**: `sematech/sema-utils` containing all models
- **Optimized Performance**: CTranslate2 INT8 quantization (75% size reduction)
- **Automatic Updates**: HuggingFace Hub integration with version management
- **Enterprise Caching**: Intelligent model caching and loading strategies
### Enhanced African Language Support
- **55+ African Languages**: Complete FLORES-200 African language coverage
- **Regional Distribution**: West, East, Southern, Central, and North Africa
- **Multiple Scripts**: Latin, Arabic, Ethiopic, Tifinagh support
- **Cultural Context**: Native names and regional information
### Performance & Monitoring
- **Server-Side Timing**: Request performance tracking in headers and responses
- **Prometheus Metrics**: Comprehensive monitoring and analytics
- **Request Tracking**: Unique request IDs for debugging
- **Health Monitoring**: System status and model availability checks
## 🔧 Technical Implementation Highlights
### Model Architecture
```
Custom HuggingFace Models (sematech/sema-utils)
├── Translation: NLLB-200 3.3B (CTranslate2 optimized)
├── Language Detection: FastText LID.176
├── Tokenization: SentencePiece
└── Language Database: FLORES-200 complete
```
### Performance Metrics
- **Model Size**: 2.5GB (optimized from 6.6GB)
- **Inference Speed**: 0.2-2.5 seconds depending on text length
- **Memory Usage**: ~3.2GB for models, 50-100MB per request
- **Language Detection**: 0.01-0.05 seconds with 99%+ accuracy
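As a rough worked example of the memory figures above, the sketch below estimates total memory for a given number of in-flight requests. It assumes per-request memory simply adds on top of the ~3.2GB model footprint, which this document does not state explicitly; the function name and the 1GB = 1024MB conversion are illustrative choices.

```python
MODEL_MEMORY_GB = 3.2        # "~3.2GB for models" from the list above
PER_REQUEST_MB = (50, 100)   # "50-100MB per request" from the list above


def estimate_memory_gb(concurrent_requests: int) -> tuple[float, float]:
    """Return a (low, high) estimate in GB, assuming per-request memory adds linearly."""
    if concurrent_requests < 0:
        raise ValueError("concurrent_requests must be non-negative")
    low_mb, high_mb = PER_REQUEST_MB
    low = MODEL_MEMORY_GB + concurrent_requests * low_mb / 1024
    high = MODEL_MEMORY_GB + concurrent_requests * high_mb / 1024
    return round(low, 2), round(high, 2)
```

Under these assumptions, ten concurrent requests would need roughly 3.7 to 4.2GB; real usage depends on text length and batching.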
### API Enhancements
- **Request Timing**: Server-side performance measurement
- **Language Metadata**: Complete language information system
- **Error Handling**: Comprehensive validation and error responses
- **Rate Limiting**: 60 requests/minute with graceful degradation
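Clients that send many requests may want to stay under the 60 requests/minute limit on their own side. The following is a minimal client-side sketch of a sliding-window throttle; it is not the server's implementation, and whether the server counts requests per IP, per key, or in fixed windows is not stated here. The class and method names are hypothetical.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `limit` calls in any `window` seconds (client-side only)."""

    def __init__(self, limit=60, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self._sent = deque()  # timestamps of recently recorded requests

    def wait_time(self):
        """Seconds to wait before the next request is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have left the window.
        while self._sent and now - self._sent[0] >= self.window:
            self._sent.popleft()
        if len(self._sent) < self.limit:
            return 0.0
        return self.window - (now - self._sent[0])

    def record(self):
        """Record that a request was just sent."""
        self._sent.append(self.clock())
```

A caller would check `wait_time()`, sleep for that long if it is non-zero, send the request, and then call `record()`.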
## 🚀 Quick Start Examples
### Basic Translation with Timing
```bash
curl -v -X POST "https://sematech-sema-api.hf.space/api/v1/translate" \
  -H "Content-Type: application/json" \
  -d '{"text": "Habari ya asubuhi", "target_language": "eng_Latn"}'
# Response includes timing information:
# X-Response-Time: 1.234s
# X-Request-ID: 550e8400-e29b-41d4-a716-446655440000
```
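The same call can be made from Python using only the standard library. This is a sketch rather than an official client: the endpoint, payload, and header names come from the curl example above, the `1.234s` header format is assumed from that example, and the response body is returned as parsed JSON because its schema is not shown here. The function names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://sematech-sema-api.hf.space"


def build_translate_request(text, target_language):
    """Build (but do not send) the same POST request as the curl example."""
    payload = json.dumps({"text": text, "target_language": target_language})
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/translate",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_response_time(header_value):
    """Convert an X-Response-Time value such as '1.234s' to seconds; None if absent."""
    if header_value is None:
        return None
    return float(header_value.strip().rstrip("s"))


def translate(text, target_language, timeout=30):
    """Send the request and return (parsed JSON body, seconds, request ID)."""
    request = build_translate_request(text, target_language)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
        seconds = parse_response_time(response.headers.get("X-Response-Time"))
        return body, seconds, response.headers.get("X-Request-ID")
```

Calling `translate("Habari ya asubuhi", "eng_Latn")` performs the network request; the two helper functions can be used and tested offline.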
### African Languages Discovery
```bash
# Get all 55+ African languages
curl "https://sematech-sema-api.hf.space/api/v1/languages/african"
# Search for specific African languages
curl "https://sematech-sema-api.hf.space/api/v1/languages/search?q=Akan"
curl "https://sematech-sema-api.hf.space/api/v1/languages/search?q=Bambara"
```
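Language names that contain spaces or non-ASCII characters need to be encoded before they go into the `q` parameter. A small sketch, assuming the search endpoint accepts standard form-style query encoding (`+` for spaces); the helper name and the multi-word query are illustrative only.

```python
from urllib.parse import urlencode

BASE_URL = "https://sematech-sema-api.hf.space"


def language_search_url(query):
    """Return the search URL for a language name, percent-encoding the query."""
    return f"{BASE_URL}/api/v1/languages/search?{urlencode({'q': query})}"


if __name__ == "__main__":
    print(language_search_url("Akan"))
    print(language_search_url("Central Atlas Tamazight"))
```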
### Language Detection with Confidence
```bash
curl -X POST "https://sematech-sema-api.hf.space/api/v1/detect-language" \
  -H "Content-Type: application/json" \
  -d '{"text": "Habari ya asubuhi"}'
# Returns: detected language, confidence score, timing information
```
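Once the JSON body is decoded, it can help to wrap the result in a small typed object and gate on the confidence score. The sketch below is illustrative: the field names `detected_language` and `is_english` mirror the chatbot example in this document, while `confidence`, its 0.0 to 1.0 range, and the 0.8 threshold are assumptions to check against the API Reference.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    detected_language: str  # FLORES-200 code such as "swh_Latn"
    confidence: float       # assumed to lie between 0.0 and 1.0
    is_english: bool


def parse_detection(body: dict) -> Detection:
    """Build a Detection from a decoded JSON body (field names are assumptions)."""
    code = body["detected_language"]
    return Detection(
        detected_language=code,
        confidence=float(body.get("confidence", 0.0)),
        # Fall back to comparing the code when the flag is absent.
        is_english=bool(body.get("is_english", code == "eng_Latn")),
    )


def is_reliable(detection: Detection, threshold: float = 0.8) -> bool:
    """Treat low-confidence detections as unreliable; 0.8 is an arbitrary example."""
    return detection.confidence >= threshold
```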
## 🎯 Application Use Cases
### 1. Multilingual Chatbot Implementation
```python
# detect_language, translate, and llm_chat are application-supplied async helpers.
async def process_user_input(user_text):
    # 1. Detect the language of the incoming message
    detection = await detect_language(user_text)
    # 2. Decide processing flow
    if detection.is_english:
        response = await llm_chat(user_text)
    else:
        # Translate → Process → Translate back
        english_input = await translate(user_text, "eng_Latn")
        english_response = await llm_chat(english_input)
        response = await translate(english_response, detection.detected_language)
    return response
```
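The flow above relies on three helpers that the snippet does not define. To try the control flow offline, they can be replaced with stubs, as in the sketch below. Every stub here is hypothetical: a real implementation would call the detection and translation endpoints and an LLM of your choice.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Detection:
    detected_language: str
    is_english: bool


async def detect_language(text):
    # Stub: recognise one known Swahili phrase, treat everything else as English.
    if text == "Habari ya asubuhi":
        return Detection("swh_Latn", False)
    return Detection("eng_Latn", True)


async def translate(text, target_language):
    # Stub: tag the text with the target code instead of translating it.
    return f"[{target_language}] {text}"


async def llm_chat(text):
    # Stub: echo the prompt instead of calling a model.
    return f"echo: {text}"


async def process_user_input(user_text):
    detection = await detect_language(user_text)
    if detection.is_english:
        return await llm_chat(user_text)
    english_input = await translate(user_text, "eng_Latn")
    english_response = await llm_chat(english_input)
    return await translate(english_response, detection.detected_language)


if __name__ == "__main__":
    print(asyncio.run(process_user_input("Habari ya asubuhi")))
```

With these stubs, a non-English input passes through all three steps and the tags in the output show the order in which they ran.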
### 2. African News Platform
- Aggregate news from multiple African countries
- Translate between African languages
- Provide summaries in user's preferred language
### 3. Educational Platform
- Interactive language learning with African languages
- Cultural context and pronunciation guides
- Progress tracking across multiple languages
### 4. Government Services
- Multilingual official document translation
- Emergency notifications in local languages
- Citizen services in preferred languages
## 📊 API Statistics & Metrics
### Language Coverage
- **Total Languages**: 200+ (FLORES-200 complete)
- **African Languages**: 55+ (expanded from 23)
- **Writing Scripts**: Latin, Arabic, Ethiopic, Tifinagh, Cyrillic, Han, etc.
- **Geographic Regions**: Comprehensive global coverage
### Performance Benchmarks
- **Translation Speed**: 0.2-2.5s depending on text length
- **Language Detection**: 0.01-0.05s with 99%+ accuracy
- **Model Efficiency**: 75% size reduction with maintained quality
- **Concurrent Handling**: Linear scaling with available resources
### Quality Metrics
- **BLEU Scores**: Translation quality measured with BLEU, the industry-standard metric
- **African Languages**: Specialized cultural context preservation
- **Uptime**: 99.9% target availability
- **Error Rate**: <1% under normal load
## 🔮 Future Roadmap
### Immediate (3-6 months)
- User authentication and usage tracking
- Database integration with PostgreSQL
- Redis caching for improved performance
- Advanced monitoring dashboards
### Medium-term (6-12 months)
- Document translation with formatting preservation
- Real-time translation streaming via WebSocket
- Domain-specific models (medical, legal, technical)
- Mobile SDK development
### Long-term (1-2 years)
- AI-powered translation ecosystem
- Enterprise integration platform
- African language research contributions
- Voice-to-voice translation capabilities
## 🛠️ Development & Deployment
### Local Development
```bash
# Clone and setup
git clone https://github.com/lewiskimaru/sema.git
cd sema/backend/sema-api
# Install dependencies
pip install -r requirements.txt
# Run locally
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```
### Testing
```bash
# Run comprehensive tests
python tests/test_african_languages_update.py
python tests/test_performance_timing.py
python tests/simple_test.py
```
### Deployment
- **Platform**: HuggingFace Spaces
- **Auto-deployment**: Git integration
- **Model Updates**: Automatic from `sematech/sema-utils`
- **Monitoring**: Prometheus metrics and health checks
## 📞 Support & Resources
### Documentation Links
- **Live API**: https://sematech-sema-api.hf.space
- **Interactive Docs**: https://sematech-sema-api.hf.space/ (Swagger UI)
- **Health Status**: https://sematech-sema-api.hf.space/health
- **Metrics**: https://sematech-sema-api.hf.space/metrics
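The metrics endpoint is described as Prometheus-based, so its output should follow the Prometheus text exposition format. The sketch below parses that format into a dictionary for quick inspection. The metric names in `SAMPLE` are hypothetical placeholders, not the API's actual metric names, and the parser deliberately ignores optional timestamps.

```python
SAMPLE = """\
# HELP sema_requests_total Hypothetical request counter
# TYPE sema_requests_total counter
sema_requests_total{endpoint="/api/v1/translate"} 42
sema_up 1 1700000000
"""


def parse_metrics(text):
    """Parse Prometheus text exposition format into {series: value}."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "}" in line:
            # Label values may contain spaces, so split after the closing brace.
            end = line.rindex("}") + 1
            series, rest = line[:end], line[end:]
        else:
            series, _, rest = line.partition(" ")
        fields = rest.split()
        if fields:
            samples[series] = float(fields[0])  # any trailing timestamp is ignored
    return samples


if __name__ == "__main__":
    for series, value in parse_metrics(SAMPLE).items():
        print(series, value)
```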
### Model Repository
- **HuggingFace**: https://huggingface.co/sematech/sema-utils
- **Model Documentation**: Comprehensive model usage and optimization guides
- **Version History**: Track model updates and improvements
### Community & Support
- **GitHub Repository**: Complete source code and issue tracking
- **Model Contributions**: Community-driven improvements
- **Research Collaboration**: Academic partnerships for African language research
---
**The Sema Translation API represents a significant advancement in African language technology, combining custom HuggingFace models with enterprise-grade infrastructure to serve diverse global communities.**
*Documentation last updated: June 2024 | API Version: 2.0.0*