CultriX committed on
Commit
80d31c8
·
1 Parent(s): 1151f26

First Commit

Files changed (1)
  1. README.md +12 -51
README.md CHANGED
@@ -1,51 +1,12 @@
- # RAGScraper
-
- RAGScraper is a simple Python package that scrapes webpages and converts them to markdown format for RAG usage.
-
- ## Installation
-
- To install RAGScraper, simply run:
-
- ```bash
- pip install ragscraper
- ```
-
- ## Usage
-
- To use RAGScraper as a command-line tool:
-
- ```bash
- rag-scraper <URL>
- ```
-
- To use RAGScraper in a Python script:
-
- ```python
- from rag_scraper.scraper import Scraper
- from rag_scraper.converter import Converter
-
- # Fetch HTML content
- url = "https://example.com"
- html_content = Scraper.fetch_html(url)
-
- # Convert to Markdown
- markdown_content = Converter.html_to_markdown(
- html=html_content,
- base_url=base_url,
- parser_features='html.parser',
- ignore_links=True
- )
- print(markdown_content)
- ```
-
- ## Development
-
- To run the tests for RAGScraper, navigate to the package directory and run:
-
- ```bash
- python -m unittest discover tests
- ```
-
- ## Contributing
-
- Contributions are welcome! Please feel free to submit a Pull Request.
 
+ title: RAG-Scraper
+ emoji: 🥳
+ colorFrom: blue
+ colorTo: gray
+ sdk: gradio
+ sdk_version: 4.44.1
+ app_file: app.py
+ pinned: false
+ license: creativeml-openrail-m
+ short_description: 'Scrape webpages for RAG purposes'
+ #thumbnail: >-
+ # https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/YQdpDtR9myOBCOzUDLaAE.png
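
The added lines are flat `key: value` metadata. As a rough sanity check, they can be read into a dictionary with the standard library alone. This is only a sketch: `parse_flat_metadata` is a hypothetical helper written for this illustration, not part of the repository or of any Hugging Face tooling, and it handles only the simple one-line pairs seen in this commit, not full YAML.

```python
# Metadata lines copied from the added side of this diff (comment lines omitted).
METADATA = """\
title: RAG-Scraper
emoji: 🥳
colorFrom: blue
colorTo: gray
sdk: gradio
sdk_version: 4.44.1
app_file: app.py
pinned: false
license: creativeml-openrail-m
short_description: 'Scrape webpages for RAG purposes'
"""


def parse_flat_metadata(text):
    """Hypothetical helper: parse one-line 'key: value' pairs into a dict.

    Assumption: values never span multiple lines. Blank lines and lines
    starting with '#' are skipped. All values are kept as strings.
    """
    result = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if not sep:
            continue
        # Drop surrounding whitespace and simple surrounding quotes.
        result[key.strip()] = value.strip().strip("'\"")
    return result


config = parse_flat_metadata(METADATA)
print(config["sdk"], config["sdk_version"], config["app_file"])
```

Because the helper keeps every value as a string, `pinned` comes back as the text `false` rather than a boolean; a real YAML parser would convert it.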