
Dan Walsh

Updating README

77a88ff 4 months ago


	---
	title: AI Content Summariser API
	emoji: 📝
	colorFrom: blue
	colorTo: indigo
	sdk: docker
	app_port: 7860
	pinned: false
	license: mit
	---

	# AI Content Summariser API (Backend)

	This is the backend API for the AI Content Summariser, a tool that automatically generates concise summaries of articles, documents, and web content using natural language processing.

	The frontend application is available in a separate repository: [ai-content-summariser](https://github.com/dang-w/ai-content-summariser).

	## Features

	- Text Summarization: Generate concise summaries using the BART-large-CNN model
	- URL Content Extraction: Automatically extract and process content from web pages
	- Adjustable Parameters: Control summary length (30-500 chars) and style
	- Advanced Generation Options: Temperature control (0.7-2.0) and sampling options
	- Caching System: Store results to improve performance and reduce redundant processing
	- Status Monitoring: Track model loading and summarization progress in real-time
	- Error Handling: Robust error handling for various input scenarios
	- CORS Support: Configured for cross-origin requests from the frontend
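
The parameter ranges listed above can be checked before a request reaches the model. The following is a minimal sketch, not the project's own validation code: `SummaryOptions` and `validate_options` are hypothetical names, the defaults are illustrative, and the real API may validate differently (for example through FastAPI request models).

```python
from dataclasses import dataclass
from typing import List

# Ranges quoted in the feature list above. The class and function names
# below are hypothetical and are not taken from the project's source.
MIN_SUMMARY_LENGTH = 30
MAX_SUMMARY_LENGTH = 500
MIN_TEMPERATURE = 0.7
MAX_TEMPERATURE = 2.0


@dataclass
class SummaryOptions:
    max_length: int = 150
    min_length: int = 50
    do_sample: bool = False
    temperature: float = 1.0


def validate_options(options: SummaryOptions) -> List[str]:
    """Return a list of problems; an empty list means the options are valid."""
    problems = []
    for name in ("min_length", "max_length"):
        value = getattr(options, name)
        if not MIN_SUMMARY_LENGTH <= value <= MAX_SUMMARY_LENGTH:
            problems.append(
                f"{name} must be between {MIN_SUMMARY_LENGTH} and {MAX_SUMMARY_LENGTH}"
            )
    if options.min_length > options.max_length:
        problems.append("min_length must not exceed max_length")
    # Temperature only matters when sampling is enabled.
    if options.do_sample and not MIN_TEMPERATURE <= options.temperature <= MAX_TEMPERATURE:
        problems.append(
            f"temperature must be between {MIN_TEMPERATURE} and {MAX_TEMPERATURE}"
        )
    return problems
```

Returning a list of problems instead of raising on the first one lets a caller report every invalid field in a single response.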

	## API Endpoints

	- `POST /api/summarise` - Summarize text content
	- `POST /api/summarise-url` - Extract and summarize content from a URL
	- `GET /api/status` - Get the current status of the model and any running jobs
	- `GET /health` - Health check endpoint for monitoring

	## Technology Stack

	- Framework: FastAPI for efficient API development
	- NLP Models: Hugging Face Transformers (BART-large-CNN)
	- Web Scraping: BeautifulSoup4 for extracting content from URLs
	- HTTP Client: HTTPX for asynchronous web requests
	- ML Framework: PyTorch for running the NLP models
	- Testing: Pytest for unit and integration testing
	- Deployment: Docker containers on Hugging Face Spaces

	## Project Structure

	```
	ai-content-summariser-api/
	├── app/
	│   ├── api/
	│   │   └── routes.py           # API endpoints
	│   ├── services/
	│   │   ├── summariser.py       # Text summarization service
	│   │   ├── url_extractor.py    # URL content extraction
	│   │   └── cache.py            # Caching functionality
	│   └── check_transformers.py   # Utility to verify model setup
	├── tests/
	│   ├── test_api.py             # API endpoint tests
	│   └── test_summariser.py      # Summarizer service tests
	├── main.py                     # Application entry point
	├── Dockerfile                  # Docker configuration
	├── requirements.txt            # Python dependencies
	└── .env                        # Environment variables (not in repo)
	```

	## Getting Started

	### Prerequisites

	- Python (v3.8+)
	- pip
	- At least 4 GB of RAM (8 GB recommended for optimal performance)
	- GPU support (optional, but recommended for faster processing)

	### Installation

	```bash
	# Clone the repository
	git clone https://github.com/dang-w/ai-content-summariser-api.git
	cd ai-content-summariser-api

	# Create a virtual environment
	python -m venv venv
	source venv/bin/activate # On Windows: venv\Scripts\activate

	# Install dependencies
	pip install -r requirements.txt
	```

	### Environment Setup

	Create a `.env` file in the root directory with the following variables:

	```
	ENVIRONMENT=development
	CORS_ORIGINS=http://localhost:3000,https://ai-content-summariser.vercel.app
	# Optional: custom cache location
	TRANSFORMERS_CACHE=/path/to/cache
	```
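
`CORS_ORIGINS` holds a comma-separated list of origins. The sketch below shows one way such a value could be split into individual entries. `parse_cors_origins` is a hypothetical helper, and the fallback to `http://localhost:3000` is an assumption; the project's actual parsing code is not shown in this README.

```python
import os
from typing import List, Optional


def parse_cors_origins(raw: Optional[str]) -> List[str]:
    """Split a comma-separated origin list, trimming whitespace and dropping blanks."""
    if not raw:
        return []
    return [origin.strip() for origin in raw.split(",") if origin.strip()]


# Read the variable from the environment, falling back to the local frontend
# (the fallback value is an assumption made for this example).
origins = parse_cors_origins(os.environ.get("CORS_ORIGINS", "http://localhost:3000"))
print(origins)
```

The resulting list of strings is the shape that FastAPI's `CORSMiddleware` accepts for its `allow_origins` argument.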

	### Running Locally

	```bash
	# Start the backend server
	uvicorn main:app --reload --host 0.0.0.0 --port 8000
	```

	The API will be available at `http://localhost:8000`. You can access the API documentation at `http://localhost:8000/docs`.

	## Testing

	The test suite covers the API endpoints (`tests/test_api.py`) and the summariser service (`tests/test_summariser.py`), with both unit and integration tests.

	### Running Tests

	```bash
	# Run all tests
	pytest

	# Run tests with verbose output
	pytest -v

	# Run tests and generate coverage report (requires the pytest-cov plugin)
	pytest --cov=app tests/

	# Run tests and generate detailed coverage report
	pytest --cov=app --cov-report=term-missing tests/

	# Run specific test file
	pytest tests/test_api.py

	# Run tests without warnings
	pytest -W ignore::FutureWarning -W ignore::UserWarning
	```

	## Docker Deployment

	```bash
	# Build and run with Docker
	docker build -t ai-content-summariser-api .
	# If the container listens on 7860 (app_port in the Space config), use -p 8000:7860
	docker run -p 8000:8000 ai-content-summariser-api
	```

	## Deployment to Hugging Face Spaces

	When deploying to Hugging Face Spaces:

	1. Fork this repository to your Hugging Face account
	2. Set the following environment variables in the Space settings:
	   - `TRANSFORMERS_CACHE=/tmp/huggingface_cache`
	   - `HF_HOME=/tmp/huggingface_cache`
	   - `HUGGINGFACE_HUB_CACHE=/tmp/huggingface_cache`
	   - `CORS_ORIGINS=https://ai-content-summariser.vercel.app,http://localhost:3000`
	3. Ensure the Space is configured to use the Docker SDK
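	4. The Space page will be available at `https://huggingface.co/spaces/your-username/ai-content-summariser-api`; API requests go to the Space's direct URL, which typically has the form `https://your-username-ai-content-summariser-api.hf.space`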

	## Performance Optimizations

	The API includes several performance optimizations:

	1. Model Caching: Models are loaded once and cached for subsequent requests
	2. Result Caching: Frequently requested summaries are cached to avoid redundant processing
	3. Asynchronous Processing: Long-running tasks are processed asynchronously
	4. Text Preprocessing: Input text is cleaned and normalized before processing
	5. Batched Processing: Large texts are processed in batches for better memory management
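
Result caching and batched processing can be sketched as follows. This is an illustration under stated assumptions, not the code in `app/services/cache.py` or `app/services/summariser.py`: the names `cache_key`, `cached_summarise`, and `split_into_batches` are hypothetical, the cache is a plain in-memory dictionary, and the word-based batch size is an arbitrary choice.

```python
import hashlib
import json
from typing import Callable, Dict, List

# In-memory cache mapping a request fingerprint to its summary (hypothetical).
_cache: Dict[str, str] = {}


def cache_key(text: str, **params) -> str:
    """Derive a stable key from the input text and the generation parameters."""
    payload = json.dumps({"text": text, "params": params}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def cached_summarise(text: str, summarise: Callable[..., str], **params) -> str:
    """Call `summarise` only when this text and parameter set is not cached yet."""
    key = cache_key(text, **params)
    if key not in _cache:
        _cache[key] = summarise(text, **params)
    return _cache[key]


def split_into_batches(text: str, max_words: int = 400) -> List[str]:
    """Split text into chunks of at most `max_words` whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]
```

Including the generation parameters in the key keeps summaries produced with different lengths or temperatures from overwriting each other. When sampling is enabled, a cached result also pins one of several possible outputs, which may or may not be the desired behaviour.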

	## API Request Examples

	### Text Summarization

	```bash
	curl -X 'POST' \
	  'http://localhost:8000/api/summarise' \
	  -H 'Content-Type: application/json' \
	  -d '{
	    "text": "Your long text to summarize goes here...",
	    "max_length": 150,
	    "min_length": 50,
	    "do_sample": true,
	    "temperature": 1.2
	  }'
	```

	### URL Summarization

	```bash
	curl -X 'POST' \
	  'http://localhost:8000/api/summarise-url' \
	  -H 'Content-Type: application/json' \
	  -d '{
	    "url": "https://example.com/article",
	    "max_length": 150,
	    "min_length": 50,
	    "do_sample": true,
	    "temperature": 1.2
	  }'
	```
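
The same requests can be issued from Python. The sketch below uses only the standard library and assumes the server is running locally on port 8000. `build_request` and `post_json` are hypothetical helper names, and because the response schema is not documented in this README, the decoded JSON is returned as-is.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed local server address


def build_request(path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for one of the summarisation endpoints."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(path: str, payload: dict) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(build_request(path, payload)) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `post_json("/api/summarise", {"text": "...", "max_length": 150, "min_length": 50})` sends the text request, and `post_json("/api/summarise-url", {"url": "https://example.com/article"})` sends the URL request.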

	## License

	This project is licensed under the MIT License.