Web Scraper Project Instructions
This is a Python Gradio application for web scraping that:
- Scrapes text content from websites
- Formats content as markdown
- Generates sitemaps from page links
- Provides MCP (Model Context Protocol) server functionality
Key Libraries
- gradio[mcp]: For the web interface and MCP server capabilities
- requests: For HTTP requests
- beautifulsoup4: For HTML parsing
- markdownify: For converting HTML to markdown
- urllib.parse: For URL handling (part of the Python standard library)
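As an illustration of the URL handling involved in sitemap generation, here is a minimal sketch that uses only urllib.parse. The function names `build_sitemap` and `sitemap_to_markdown` are hypothetical and do not come from the project. The sketch assumes the href values have already been extracted from the page (for example with beautifulsoup4) and that a "sitemap" means the same-host pages linked from one page.

```python
from urllib.parse import urldefrag, urljoin, urlparse


def build_sitemap(base_url: str, hrefs: list[str]) -> list[str]:
    """Resolve raw href values against base_url and keep same-host pages.

    Args:
        base_url: URL of the page the links were found on.
        hrefs: Raw href attribute values (relative or absolute).

    Returns:
        Sorted, de-duplicated absolute URLs on the same host as base_url.
    """
    base_host = urlparse(base_url).netloc
    pages: set[str] = set()
    for href in hrefs:
        # Resolve relative links, then drop any "#fragment" part.
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        parsed = urlparse(absolute)
        # Skip mailto:, javascript:, and links to other hosts.
        if parsed.scheme in ("http", "https") and parsed.netloc == base_host:
            pages.add(absolute)
    return sorted(pages)


def sitemap_to_markdown(pages: list[str]) -> str:
    """Render the page list as a markdown bullet list of links."""
    return "\n".join(f"- [{urlparse(page).path or '/'}]({page})" for page in pages)
```

Keeping the link filtering separate from the HTTP request and HTML parsing makes this step easy to test without network access.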
Project Structure
- app.py: Main web interface application
- mcp_server.py: MCP server that exposes tools for AI integration
MCP Tools
The MCP server exposes three main tools:
- scrape_content: Extract website content as markdown
- generate_sitemap: Create sitemap from page links
- analyze_website: Complete analysis with content and sitemap
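The following is a rough sketch of how three such tools might be wired into a Gradio app, not the project's actual mcp_server.py. The function bodies are placeholders, `build_app` is a hypothetical helper, and the single `url` parameter is an assumption about the tool signatures. It relies on Gradio's documented behaviour of exposing app functions as MCP tools when launched with `mcp_server=True`.

```python
def scrape_content(url: str) -> str:
    """Extract the content of a web page as markdown.

    Args:
        url: Address of the page to scrape, e.g. "https://example.com".

    Returns:
        The page content formatted as markdown.
    """
    # Placeholder body: the real scraping logic is not shown here.
    return f"# Content of {url}"


def generate_sitemap(url: str) -> str:
    """Create a sitemap from the links found on a web page.

    Args:
        url: Address of the page whose links should be collected.

    Returns:
        The sitemap formatted as markdown.
    """
    # Placeholder body: the real link collection is not shown here.
    return f"# Sitemap for {url}"


def analyze_website(url: str) -> str:
    """Run a complete analysis: page content plus sitemap.

    Args:
        url: Address of the page to analyze.

    Returns:
        The markdown content followed by the markdown sitemap.
    """
    return scrape_content(url) + "\n\n" + generate_sitemap(url)


def build_app():
    """Wrap the three functions in a tabbed Gradio app (one tab per tool)."""
    import gradio as gr  # assumes `pip install "gradio[mcp]"`

    tools = [scrape_content, generate_sitemap, analyze_website]
    tabs = [
        gr.Interface(fn=tool, inputs="text", outputs="text", api_name=tool.__name__)
        for tool in tools
    ]
    return gr.TabbedInterface(tabs, tab_names=[tool.__name__ for tool in tools])
```

Calling `build_app().launch(mcp_server=True)` would then start the web interface together with the MCP server. The exact tool names shown to a client may vary with the Gradio version.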
Code Style
- Use type hints where appropriate
- Include proper error handling for web requests
- Follow PEP 8 style guidelines
- Add docstrings for functions with clear parameter descriptions
- MCP functions should have descriptive docstrings, because the docstrings become the tool descriptions
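A short sketch of what these guidelines could look like in practice is given below. The name `fetch_html`, the 10-second default timeout, and the choice to return an "Error:" string instead of raising are all assumptions made for illustration, not project conventions.

```python
from urllib.parse import urlparse


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page and return its HTML, or an error message.

    Args:
        url: Absolute http(s) URL of the page to download.
        timeout: Seconds to wait for the server before giving up.

    Returns:
        The response body on success, otherwise a string starting
        with "Error:".
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return f"Error: not a valid http(s) URL: {url!r}"

    # Imported lazily in this sketch so the validation above can run
    # even where the third-party dependency is not installed.
    import requests

    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()  # turn 4xx/5xx responses into exceptions
    except requests.RequestException as exc:
        return f"Error: request to {url} failed: {exc}"
    return response.text
```

Returning a readable message rather than raising can suit tool functions, since the caller (a person or an AI client) sees the text directly. Raising a custom exception would be an equally reasonable design.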