Web Scraper Project Instructions

This is a Python Gradio application for web scraping that:

  • Scrapes text content from websites
  • Formats content as markdown
  • Generates sitemaps from page links
  • Provides MCP (Model Context Protocol) server functionality

Key Libraries

  • gradio[mcp]: For the web interface and MCP server capabilities
  • requests: For HTTP requests
  • beautifulsoup4: For HTML parsing
  • markdownify: For converting HTML to markdown
  • urllib.parse: For URL handling (part of the Python standard library, so it needs no separate install)
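
As an illustration of the URL handling role listed above, the following minimal sketch resolves a relative link against the page it came from and checks whether a link stays on the same host. The helper names `resolve_link` and `is_same_site` are hypothetical and are not taken from the project code; only `urljoin`, `urldefrag`, and `urlparse` are real `urllib.parse` functions.

```python
from urllib.parse import urldefrag, urljoin, urlparse


def resolve_link(base_url: str, href: str) -> str:
    """Resolve href against base_url and drop any #fragment."""
    absolute = urljoin(base_url, href)
    url, _fragment = urldefrag(absolute)
    return url


def is_same_site(base_url: str, candidate: str) -> bool:
    """Return True when both URLs share the same host (netloc)."""
    return urlparse(base_url).netloc == urlparse(candidate).netloc


# Hypothetical example page, used only for illustration.
base = "https://example.com/docs/index.html"
print(resolve_link(base, "../about#team"))
print(is_same_site(base, "https://other.org/page"))
```

Dropping the fragment keeps links such as `/about#team` and `/about#history` from being treated as two different pages when building a sitemap.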

Project Structure

  • app.py: Main web interface application
  • mcp_server.py: MCP server that exposes tools for AI integration

MCP Tools

The MCP server exposes three main tools:

  • scrape_content: Extract website content as markdown
  • generate_sitemap: Create sitemap from page links
  • analyze_website: Complete analysis with content and sitemap
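
To make the sitemap idea concrete, here is a rough sketch of how links on a page might be collected and listed as markdown. It is not the project's implementation: `LinkCollector` and `build_sitemap` are hypothetical names, and the sketch uses the standard library `html.parser` instead of beautifulsoup4 so that it runs without extra dependencies. The actual signatures of the three tools are not shown in these instructions.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[tuple[str, str]] = []
        self._href: Optional[str] = None
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # Anchors without an href leave _href as None and are skipped.
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def build_sitemap(html: str, base_url: str) -> str:
    """Return a markdown list of the unique absolute links found in html."""
    collector = LinkCollector()
    collector.feed(html)
    seen: set[str] = set()
    lines: list[str] = []
    for href, text in collector.links:
        url = urljoin(base_url, href)
        if url in seen:
            continue  # keep only the first occurrence of each URL
        seen.add(url)
        lines.append(f"- [{text or url}]({url})")
    return "\n".join(lines)


# Hypothetical input, used only for illustration.
sample_html = (
    '<a href="/about">About</a> '
    '<a href="docs/">Docs</a> '
    '<a href="/about">About again</a>'
)
print(build_sitemap(sample_html, "https://example.com/"))
```

A combined tool such as analyze_website could plausibly call a content scraper and a sitemap builder on the same downloaded HTML, which avoids fetching the page twice.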

Code Style

  • Use type hints where appropriate
  • Include proper error handling for web requests
  • Follow PEP 8 style guidelines
  • Add docstrings for functions with clear parameter descriptions
  • Give MCP functions descriptive docstrings, because the docstrings become the tool descriptions
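
The guidelines above can be sketched in one small function that combines type hints, a docstring with parameter descriptions, and error handling around a web request. This is an assumption-laden example, not project code: `fetch_page` is a hypothetical name, and it uses the standard library `urllib.request` as a stand-in for requests so that the sketch has no third-party dependencies.

```python
import urllib.error
import urllib.request
from urllib.parse import urlparse


def fetch_page(url: str, timeout: float = 10.0) -> str:
    """Fetch a web page and return its body as text.

    Args:
        url: Absolute http(s) URL of the page to download.
        timeout: Seconds to wait for the server before giving up.

    Returns:
        The decoded response body, or a message starting with "Error:"
        if the URL is invalid or the request fails.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return f"Error: not an absolute http(s) URL: {url!r}"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace")
    except urllib.error.HTTPError as exc:
        # HTTPError is a subclass of URLError, so it is handled first.
        return f"Error: server returned HTTP {exc.code} for {url}"
    except OSError as exc:
        # Covers URLError (DNS failure, refused connection) and timeouts.
        return f"Error: could not reach {url}: {exc}"


# Invalid input is rejected before any network access happens.
print(fetch_page("ftp://example.com/file.txt"))
```

Returning an error message instead of raising is one possible design for a tool called by an AI client, since the caller then receives readable text either way. With requests, a comparable pattern would be `requests.get(url, timeout=timeout)` followed by `response.raise_for_status()`, with `requests.RequestException` caught in the `except` clause.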