Create .github\copilot-instructions.md
.github/copilot-instructions.md
ADDED
@@ -0,0 +1,39 @@
+<!-- Use this file to provide workspace-specific custom instructions to Copilot. For more details, visit https://code.visualstudio.com/docs/copilot/copilot-customization#_use-a-githubcopilotinstructionsmd-file -->
+
+# Web Scraper Project Instructions
+
+This is a Python Gradio application for web scraping that:
+
+- Scrapes text content from websites
+- Formats content as markdown
+- Generates sitemaps from page links
+- Provides MCP (Model Context Protocol) server functionality
+
+## Key Libraries
+
+- gradio[mcp]: For the web interface and MCP server capabilities
+- requests: For HTTP requests
+- beautifulsoup4: For HTML parsing
+- markdownify: For converting HTML to markdown
+- urllib.parse: For URL handling
+
+## Project Structure
+
+- `app.py`: Main web interface application
+- `mcp_server.py`: MCP server that exposes tools for AI integration
+
+## MCP Tools
+
+The MCP server exposes three main tools:
+
+- `scrape_content`: Extract website content as markdown
+- `generate_sitemap`: Create sitemap from page links
+- `analyze_website`: Complete analysis with content and sitemap
+
+## Code Style
+
+- Use type hints where appropriate
+- Include proper error handling for web requests
+- Follow PEP 8 style guidelines
+- Add docstrings for functions with clear parameter descriptions
+- MCP functions should have descriptive docstrings as they become tool descriptions
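
As an illustration of what the Code Style rules ask for, here is a minimal sketch of a sitemap helper with type hints, a parameter-level docstring, and explicit handling of bad input. It is not the code from `mcp_server.py`, which this commit does not show. The names `links_to_sitemap` and `_LinkCollector` are hypothetical, and the standard-library `html.parser` is used only as a stand-in for beautifulsoup4 so the example runs without third-party packages or network access.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class _LinkCollector(HTMLParser):
    """Collect href values from <a> tags (stdlib stand-in for BeautifulSoup)."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def links_to_sitemap(html: str, base_url: str) -> str:
    """Build a markdown sitemap from the links found in an HTML page.

    Args:
        html: Raw HTML of the page (already downloaded by the caller).
        base_url: URL the page was fetched from; used to resolve relative links.

    Returns:
        A markdown bullet list of unique absolute http(s) URLs in page order,
        or an error message string if base_url is not a valid http(s) URL.
    """
    parsed_base = urlparse(base_url)
    if parsed_base.scheme not in ("http", "https") or not parsed_base.netloc:
        return f"Error: invalid base URL: {base_url!r}"

    collector = _LinkCollector()
    collector.feed(html)

    seen: set[str] = set()
    lines: list[str] = []
    for href in collector.hrefs:
        # Resolve relative links and drop the fragment so /about#team == /about.
        absolute = urljoin(base_url, href).split("#", 1)[0]
        if urlparse(absolute).scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and similar schemes
        if absolute not in seen:
            seen.add(absolute)
            lines.append(f"- {absolute}")
    return "\n".join(lines)
```

Returning an error string instead of raising is one possible choice for a function exposed as a tool, since the caller then always receives text. Whether the actual tools in this project behave that way is an assumption.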