---
title: AI Content Summariser API
emoji: π
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 7860
pinned: false
license: mit
---
# AI Content Summariser API (Backend)
This is the backend API for the AI Content Summariser, a tool that automatically generates concise summaries of articles, documents, and web content using natural language processing.
The frontend application is available in a separate repository: [ai-content-summariser](https://github.com/dang-w/ai-content-summariser).
## Features
- Text summarisation using state-of-the-art NLP models (BART-large-CNN)
- URL content extraction and summarization
- Adjustable parameters for summary length and style
- Efficient API endpoints with proper error handling
## API Endpoints
- `POST /api/summarise` - Summarise text content
- `POST /api/summarise-url` - Extract and summarise content from a URL
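As a quick illustration, the text endpoint can be called from any HTTP client. The sketch below uses only the standard library; the request fields (`text`, `max_length`, `min_length`) are assumptions, so confirm the real schema via the Swagger UI at `/docs`:

```python
# Hypothetical client sketch for /api/summarise (stdlib only).
# Field names in the payload are assumptions -- check /docs for the real schema.
import json
import urllib.request

API_BASE = "http://localhost:8000"

def build_payload(text: str, max_length: int = 150, min_length: int = 30) -> dict:
    """Assemble the JSON body for the summarise request."""
    return {"text": text, "max_length": max_length, "min_length": min_length}

def summarise_text(text: str, **params) -> dict:
    """POST the text to the API and return the decoded JSON response."""
    request = urllib.request.Request(
        f"{API_BASE}/api/summarise",
        data=json.dumps(build_payload(text, **params)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # First request may be slow while the model loads, hence the long timeout.
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read())
```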
## Technology Stack
- **Framework**: FastAPI for efficient API endpoints
- **NLP Models**: Transformer-based models (BART) for summarisation
- **Web Scraping**: BeautifulSoup4 for extracting content from URLs
- **HTTP Client**: HTTPX for asynchronous web requests
- **Deployment**: Hugging Face Spaces or Docker containers
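To give a feel for the scraping layer, here is a minimal content-extraction sketch with BeautifulSoup4. This is illustrative only, not the project's actual implementation, which may filter tags differently:

```python
# Illustrative sketch of HTML-to-text extraction with BeautifulSoup4.
# The real service may select different tags or apply extra cleaning.
from bs4 import BeautifulSoup

def extract_text(html: str) -> str:
    """Return the visible paragraph text of a page, sans scripts and styles."""
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup(["script", "style"]):
        tag.decompose()  # drop non-content elements entirely
    paragraphs = [p.get_text(strip=True) for p in soup.find_all("p")]
    return " ".join(p for p in paragraphs if p)
```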
## Getting Started
### Prerequisites
- Python (v3.8+)
- pip
### Installation
```bash
# Clone the repository
git clone https://github.com/dang-w/ai-content-summariser-api.git
cd ai-content-summariser-api
# Create a virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
```
### Running Locally
```bash
# Start the backend server
uvicorn main:app --reload
```
The API will be available at `http://localhost:8000`.
## Testing
The project includes a comprehensive test suite covering both unit and integration tests.
### Installing Test Dependencies
```bash
pip install pytest pytest-cov httpx
```
### Running Tests
```bash
# Run all tests
pytest
# Run tests with verbose output
pytest -v
# Run tests and generate coverage report
pytest --cov=app tests/
# Run tests and generate detailed coverage report
pytest --cov=app --cov-report=term-missing tests/
# Run specific test file
pytest tests/test_api.py
# Run tests without warnings
pytest -W ignore::FutureWarning -W ignore::UserWarning
```
### Test Structure
- **Unit Tests**: Test individual components in isolation
  - `tests/test_summariser.py`: Tests for the summarisation service
- **Integration Tests**: Test API endpoints and component interactions
  - `tests/test_api.py`: Tests for API endpoints
### Mocking Strategy
For faster and more reliable tests, we use mocking to avoid loading large ML models during testing:
```python
# Example of a mocked test
from unittest.mock import patch

def test_summariser_with_mock():
    with patch('app.services.summariser.AutoTokenizer') as mock_tokenizer_class, \
         patch('app.services.summariser.AutoModelForSeq2SeqLM') as mock_model_class:
        ...  # Test implementation
```
### Continuous Integration
Tests are automatically run on pull requests and pushes to the main branch using GitHub Actions.
## Running with Docker
```bash
# Build and run with Docker
docker build -t ai-content-summariser-api .
docker run -p 8000:8000 ai-content-summariser-api
```
## Deployment
See the deployment guide in the frontend repository for detailed instructions on deploying both the frontend and backend components.
### Deploying to Hugging Face Spaces
When deploying to Hugging Face Spaces, make sure to:
1. Set the following environment variables in the Space settings:
- `TRANSFORMERS_CACHE=/tmp/huggingface_cache`
- `HF_HOME=/tmp/huggingface_cache`
- `HUGGINGFACE_HUB_CACHE=/tmp/huggingface_cache`
2. Use the Docker SDK in your Space settings
3. If you encounter memory issues, consider using a smaller model by changing the `model_name` in `summariser.py`
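If you prefer to bake the cache variables into the image rather than setting them in the Space UI, an equivalent Dockerfile fragment (illustrative; adjust paths to match your setup) would be:

```dockerfile
# Point all Hugging Face cache locations at a directory writable on Spaces
ENV TRANSFORMERS_CACHE=/tmp/huggingface_cache \
    HF_HOME=/tmp/huggingface_cache \
    HUGGINGFACE_HUB_CACHE=/tmp/huggingface_cache
```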
## Performance Optimisations
The API includes several performance optimisations:
1. **Model Caching**: Models are loaded once and cached for subsequent requests
2. **Result Caching**: Frequently requested summaries are cached to avoid redundant processing
3. **Asynchronous Processing**: Long-running tasks are processed asynchronously
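Result caching (item 2 above) can be sketched with `functools.lru_cache`, keyed by the text and length parameters. This is a minimal stand-in, not the project's actual cache; the trivial truncation below merely takes the place of the real model call so the sketch is self-contained:

```python
# Sketch of result caching keyed by (text, max_length, min_length).
# The real service would invoke the transformer model where the
# truncation placeholder is; the caching pattern is what matters here.
from functools import lru_cache

@lru_cache(maxsize=256)
def cached_summary(text: str, max_length: int, min_length: int) -> str:
    # Placeholder for the expensive model inference step.
    return text[:max_length]
```

Repeated requests with identical inputs are then served from the cache instead of re-running inference.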
## Development
### Testing the API
You can test the API endpoints using the built-in Swagger documentation at `/docs` when running locally.
### Checking Transformers Installation
To verify that the transformers library is installed correctly:
```bash
python -m app.check_transformers
```
## License
This project is licensed under the MIT License.