|
--- |
|
title: AI Content Summariser API |
|
emoji: π |
|
colorFrom: blue |
|
colorTo: indigo |
|
sdk: docker |
|
app_port: 7860 |
|
pinned: false |
|
license: mit |
|
--- |
|
|
|
# AI Content Summariser API (Backend) |
|
|
|
This is the backend API for the AI Content Summariser, a tool that automatically generates concise summaries of articles, documents, and web content using natural language processing. |
|
|
|
The frontend application is available in a separate repository: [ai-content-summariser](https://github.com/dang-w/ai-content-summariser). |
|
|
|
## Features |
|
|
|
- **Text Summarization**: Generate concise summaries using BART-large-CNN model |
|
- **URL Content Extraction**: Automatically extract and process content from web pages |
|
- **Adjustable Parameters**: Control summary length (30-500 chars) and style |
|
- **Advanced Generation Options**: Temperature control (0.7-2.0) and sampling options |
|
- **Caching System**: Store results to improve performance and reduce redundant processing |
|
- **Status Monitoring**: Track model loading and summarization progress in real-time |
|
- **Error Handling**: Robust error handling for various input scenarios |
|
- **CORS Support**: Configured for cross-origin requests from the frontend |
|
|
|
## API Endpoints |
|
|
|
- `POST /api/summarise` - Summarize text content |
|
- `POST /api/summarise-url` - Extract and summarize content from a URL |
|
- `GET /api/status` - Get the current status of the model and any running jobs |
|
- `GET /health` - Health check endpoint for monitoring |
|
|
|
## Technology Stack |
|
|
|
- **Framework**: FastAPI for efficient API development |
|
- **NLP Models**: Hugging Face Transformers (BART-large-CNN) |
|
- **Web Scraping**: BeautifulSoup4 for extracting content from URLs |
|
- **HTTP Client**: HTTPX for asynchronous web requests |
|
- **ML Framework**: PyTorch for running the NLP models |
|
- **Testing**: Pytest for unit and integration testing |
|
- **Deployment**: Docker containers on Hugging Face Spaces |
|
|
|
## Project Structure |
|
|
|
``` |
|
ai-content-summariser-api/ |
|
βββ app/ |
|
β βββ api/ |
|
β β βββ routes.py # API endpoints |
|
β βββ services/ |
|
β β βββ summariser.py # Text summarization service |
|
β β βββ url_extractor.py # URL content extraction |
|
β β βββ cache.py # Caching functionality |
|
β βββ check_transformers.py # Utility to verify model setup |
|
βββ tests/ |
|
β βββ test_api.py # API endpoint tests |
|
β βββ test_summariser.py # Summarizer service tests |
|
βββ main.py # Application entry point |
|
βββ Dockerfile # Docker configuration |
|
βββ requirements.txt # Python dependencies |
|
βββ .env # Environment variables (not in repo) |
|
``` |
|
|
|
## Getting Started |
|
|
|
### Prerequisites |
|
|
|
- Python (v3.8+) |
|
- pip |
|
- At least 4GB of RAM (8GB recommended for optimal performance) |
|
- GPU support (optional, but recommended for faster processing) |
|
|
|
### Installation |
|
|
|
```bash |
|
# Clone the repository |
|
git clone https://github.com/dang-w/ai-content-summariser-api.git |
|
cd ai-content-summariser-api |
|
|
|
# Create a virtual environment |
|
python -m venv venv |
|
source venv/bin/activate # On Windows: venv\Scripts\activate |
|
|
|
# Install dependencies |
|
pip install -r requirements.txt |
|
``` |
|
|
|
### Environment Setup |
|
|
|
Create a `.env` file in the root directory with the following variables: |
|
|
|
``` |
|
ENVIRONMENT=development |
|
CORS_ORIGINS=http://localhost:3000,https://ai-content-summariser.vercel.app |
|
TRANSFORMERS_CACHE=/path/to/cache # Optional: custom cache location |
|
``` |
|
|
|
### Running Locally |
|
|
|
```bash |
|
# Start the backend server |
|
uvicorn main:app --reload --host 0.0.0.0 --port 8000 |
|
``` |
|
|
|
The API will be available at `http://localhost:8000`. You can access the API documentation at `http://localhost:8000/docs`. |
|
|
|
## Testing |
|
|
|
The project includes a comprehensive test suite covering both unit and integration tests. |
|
|
|
### Running Tests |
|
|
|
```bash |
|
# Run all tests |
|
pytest |
|
|
|
# Run tests with verbose output |
|
pytest -v |
|
|
|
# Run tests and generate coverage report |
|
pytest --cov=app tests/ |
|
|
|
# Run tests and generate detailed coverage report |
|
pytest --cov=app --cov-report=term-missing tests/ |
|
|
|
# Run specific test file |
|
pytest tests/test_api.py |
|
|
|
# Run tests without warnings |
|
pytest -W ignore::FutureWarning -W ignore::UserWarning |
|
``` |
|
|
|
## Docker Deployment |
|
|
|
```bash |
|
# Build and run with Docker |
|
docker build -t ai-content-summariser-api . |
|
docker run -p 8000:8000 ai-content-summariser-api |
|
``` |
|
|
|
## Deployment to Hugging Face Spaces |
|
|
|
When deploying to Hugging Face Spaces: |
|
|
|
1. Fork this repository to your Hugging Face account |
|
2. Set the following environment variables in the Space settings: |
|
- `TRANSFORMERS_CACHE=/tmp/huggingface_cache` |
|
- `HF_HOME=/tmp/huggingface_cache` |
|
- `HUGGINGFACE_HUB_CACHE=/tmp/huggingface_cache` |
|
- `CORS_ORIGINS=https://ai-content-summariser.vercel.app,http://localhost:3000` |
|
3. Ensure the Space is configured to use the Docker SDK |
|
4. Your API will be available at `https://huggingface.co/spaces/your-username/ai-content-summariser-api` |
|
|
|
## Performance Optimizations |
|
|
|
The API includes several performance optimizations: |
|
|
|
1. **Model Caching**: Models are loaded once and cached for subsequent requests |
|
2. **Result Caching**: Frequently requested summaries are cached to avoid redundant processing |
|
3. **Asynchronous Processing**: Long-running tasks are processed asynchronously |
|
4. **Text Preprocessing**: Input text is cleaned and normalized before processing |
|
5. **Batched Processing**: Large texts are processed in batches for better memory management |
|
|
|
## API Request Examples |
|
|
|
### Text Summarization |
|
|
|
```bash |
|
curl -X 'POST' \ |
|
'http://localhost:8000/api/summarise' \ |
|
-H 'Content-Type: application/json' \ |
|
-d '{ |
|
"text": "Your long text to summarize goes here...", |
|
"max_length": 150, |
|
"min_length": 50, |
|
"do_sample": true, |
|
"temperature": 1.2 |
|
}' |
|
``` |
|
|
|
### URL Summarization |
|
|
|
```bash |
|
curl -X 'POST' \ |
|
'http://localhost:8000/api/summarise-url' \ |
|
-H 'Content-Type: application/json' \ |
|
-d '{ |
|
"url": "https://example.com/article", |
|
"max_length": 150, |
|
"min_length": 50, |
|
"do_sample": true, |
|
"temperature": 1.2 |
|
}' |
|
``` |
|
|
|
## License |
|
|
|
This project is licensed under the MIT License. |
|
|