---
title: AI Content Summariser API
emoji: πŸ“
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 7860
pinned: false
license: mit
---

# AI Content Summariser API (Backend)

This is the backend API for the AI Content Summariser, a tool that automatically generates concise summaries of articles, documents, and web content using natural language processing.

The frontend application is available in a separate repository: [ai-content-summariser](https://github.com/dang-w/ai-content-summariser).

## Features

- Text summarisation using state-of-the-art NLP models (BART-large-CNN)
- URL content extraction and summarization
- Adjustable parameters for summary length and style
- Efficient API endpoints with proper error handling

## API Endpoints

- `POST /api/summarise` - Summarise text content
- `POST /api/summarise-url` - Extract and summarise content from a URL
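
As a sketch, a request to the text endpoint can be built with the standard library. The JSON field names below (`text`, `max_length`) are assumptions based on the adjustable-parameters feature, not the confirmed request schema:

```python
import json
import urllib.request

API_BASE = "http://localhost:8000"  # local dev server address

def build_summarise_request(text: str, max_length: int = 130) -> urllib.request.Request:
    """Build a POST request for /api/summarise.

    The body fields are assumptions (the feature list mentions adjustable
    summary length); check the Swagger docs at /docs for the real schema.
    """
    body = json.dumps({"text": text, "max_length": max_length}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/api/summarise",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending the request with `urllib.request.urlopen(build_summarise_request("..."))` works once the server is running locally.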

## Technology Stack

- **Framework**: FastAPI for efficient API endpoints
- **NLP Models**: Transformer-based models (BART) for summarisation
- **Web Scraping**: BeautifulSoup4 for extracting content from URLs
- **HTTP Client**: HTTPX for asynchronous web requests
- **Deployment**: Hugging Face Spaces or Docker containers
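
The actual service uses BeautifulSoup4 for URL content extraction; to keep this sketch self-contained, the same idea is shown below with the standard-library `html.parser` instead. The function name is hypothetical, not taken from the codebase:

```python
from html.parser import HTMLParser

class ParagraphExtractor(HTMLParser):
    """Collects the text content of <p> tags from an HTML document."""

    def __init__(self):
        super().__init__()
        self._in_p = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self.paragraphs[-1] += data

def extract_paragraph_text(html: str) -> str:
    """Return the paragraph text of a page, one paragraph per line."""
    parser = ParagraphExtractor()
    parser.feed(html)
    return "\n".join(p.strip() for p in parser.paragraphs if p.strip())
```

In the real stack, HTTPX fetches the page asynchronously and BeautifulSoup4 does the equivalent extraction with far better handling of messy real-world HTML.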

## Getting Started

### Prerequisites

- Python (v3.8+)
- pip

### Installation

```bash
# Clone the repository
git clone https://github.com/dang-w/ai-content-summariser-api.git
cd ai-content-summariser-api

# Create a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

### Running Locally

```bash
# Start the backend server
uvicorn main:app --reload
```

The API will be available at `http://localhost:8000`.

## Testing

The project includes a comprehensive test suite covering both unit and integration tests.

### Installing Test Dependencies

```bash
pip install pytest pytest-cov httpx
```

### Running Tests

```bash
# Run all tests
pytest

# Run tests with verbose output
pytest -v

# Run tests and generate coverage report
pytest --cov=app tests/

# Run tests and generate detailed coverage report
pytest --cov=app --cov-report=term-missing tests/

# Run specific test file
pytest tests/test_api.py

# Run tests without warnings
pytest -W ignore::FutureWarning -W ignore::UserWarning
```

### Test Structure

- **Unit Tests**: Test individual components in isolation
  - `tests/test_summariser.py`: Tests for the summarisation service

- **Integration Tests**: Test API endpoints and component interactions
  - `tests/test_api.py`: Tests for API endpoints

### Mocking Strategy

For faster and more reliable tests, we use mocking to avoid loading large ML models during testing:

```python
# Example of mocked test
from unittest.mock import patch

def test_summariser_with_mock():
    with patch('app.services.summariser.AutoTokenizer') as mock_tokenizer_class, \
         patch('app.services.summariser.AutoModelForSeq2SeqLM') as mock_model_class:
        # Test implementation...
        ...
```

### Continuous Integration

Tests are automatically run on pull requests and pushes to the main branch using GitHub Actions.

## Running with Docker

```bash
# Build and run with Docker
docker build -t ai-content-summariser-api .
docker run -p 8000:8000 ai-content-summariser-api
```

## Deployment

See the deployment guide in the frontend repository for detailed instructions on deploying both the frontend and backend components.

### Deploying to Hugging Face Spaces

When deploying to Hugging Face Spaces, make sure to:

1. Set the following environment variables in the Space settings:
   - `TRANSFORMERS_CACHE=/tmp/huggingface_cache`
   - `HF_HOME=/tmp/huggingface_cache`
   - `HUGGINGFACE_HUB_CACHE=/tmp/huggingface_cache`

2. Use the Docker SDK in your Space settings

3. If you encounter memory issues, consider using a smaller model by changing the `model_name` in `summariser.py`
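
If you prefer to bake the cache configuration into the image rather than set it in the Space settings, the same variables can be declared in the Dockerfile (a minimal sketch, assuming the Docker SDK setup above):

```dockerfile
# Point all Hugging Face caches at a writable location
ENV TRANSFORMERS_CACHE=/tmp/huggingface_cache \
    HF_HOME=/tmp/huggingface_cache \
    HUGGINGFACE_HUB_CACHE=/tmp/huggingface_cache
```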

## Performance Optimizations

The API includes several performance optimizations:

1. **Model Caching**: Models are loaded once and cached for subsequent requests
2. **Result Caching**: Frequently requested summaries are cached to avoid redundant processing
3. **Asynchronous Processing**: Long-running tasks are processed asynchronously
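
The result-caching idea can be sketched with `functools.lru_cache`. The function name here is a hypothetical stand-in, and the truncation placeholder marks where the real service would run the BART model:

```python
from functools import lru_cache

@lru_cache(maxsize=256)
def cached_summarise(text: str, max_length: int = 130) -> str:
    """Return a cached summary for identical (text, max_length) requests.

    Placeholder body: the real service would run the summarisation model
    here; repeated identical calls skip that work entirely.
    """
    return text[:max_length]
```

Note that `lru_cache` keys on the exact argument values, so even a one-character difference in the input text is a cache miss; a production cache might normalise or hash the text first.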

## Development

### Testing the API

You can test the API endpoints using the built-in Swagger documentation at `/docs` when running locally.

### Checking Transformers Installation

To verify that the transformers library is installed correctly:

```bash
python -m app.check_transformers
```

## License

This project is licensed under the MIT License.