---
title: Link Lazarus Method
emoji: π
colorFrom: blue
colorTo: red
sdk: streamlit
sdk_version: "1.32.2"
app_file: wikipedia_dead_links_streamlit.py
pinned: false
---
# The Link Lazarus Method: Wikipedia Dead Link Finder by metehan.ai - Streamlit Version
A Streamlit web application for finding and logging dead (broken) external links in Wikipedia articles, identifying potentially available domains for registration, and saving them to a dedicated database.
## Features

- **Multiple Search Methods**:
  - Search by text to find Wikipedia articles
  - Search by category to find related articles
- **Dead Link Detection**: Checks external links for HTTP errors or connection issues (see the sketch after this list)
- **Domain Availability**: Identifies which domains from dead links might be available for registration
- **Restricted TLD Filtering**: Automatically identifies and excludes restricted domains (.edu, .gov, etc.)
- **Available Domains Database**: Maintains a separate database of potentially available domains
- **Real-time Logging**: Saves dead links and available domains to JSON files as they're found
- **Result Visualization**: Displays results in an interactive table with filtering options
- **Export to CSV**: Download results as a CSV file
- **Web Archive Filter**: Automatically ignores links from web.archive.org
- **Configurable**: Adjust settings via the sidebar
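As a rough sketch of how concurrent dead-link checking can work (this uses `aiohttp` with hypothetical function names, not necessarily the app's actual implementation):

```
import asyncio
import aiohttp

async def check_link(session, url, sem):
    """Return (url, reason) if the link looks dead, else None."""
    if "web.archive.org" in url:  # mirror the app's web-archive filter
        return None
    async with sem:  # cap the number of simultaneous requests
        try:
            async with session.get(url, allow_redirects=True) as resp:
                if resp.status >= 400:  # e.g. 404 Not Found, 410 Gone
                    return (url, f"HTTP {resp.status}")
        except (aiohttp.ClientError, asyncio.TimeoutError) as exc:
            return (url, f"connection error: {type(exc).__name__}")
    return None

async def find_dead_links(urls, max_concurrent=10):
    sem = asyncio.Semaphore(max_concurrent)
    timeout = aiohttp.ClientTimeout(total=15)
    async with aiohttp.ClientSession(timeout=timeout) as session:
        results = await asyncio.gather(*(check_link(session, u, sem) for u in urls))
    return [r for r in results if r]

# Example: asyncio.run(find_dead_links(["http://example.com/missing-page"]))
```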
## Requirements

- Python 3.6+
- Required packages listed in `requirements_streamlit.txt`
## Installation

```
pip install -r requirements_streamlit.txt
```
## Usage

Run the Streamlit app:

```
streamlit run wikipedia_dead_links_streamlit.py
```

The application will open in your default web browser with three main tabs:
### 1. Search by Text

- Enter search terms to find Wikipedia articles containing that text (see the API sketch below)
- View search results with snippets
- Process all found pages to check for dead links and available domains
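Under the hood, text search against Wikipedia typically goes through the MediaWiki API's `list=search` module; here is a minimal sketch (an assumption about the approach, not necessarily this app's exact code):

```
import requests

API = "https://en.wikipedia.org/w/api.php"

def search_wikipedia(query, limit=10):
    """Return (title, snippet) pairs from the MediaWiki search API."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": query,
        "srlimit": limit,
        "format": "json",
    }
    data = requests.get(API, params=params, timeout=10).json()
    return [(hit["title"], hit["snippet"]) for hit in data["query"]["search"]]

# search_wikipedia("link rot") -> list of (title, snippet) pairs
```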
### 2. Search by Category

- Enter a category name to find Wikipedia categories
- Select a category to crawl its pages (see the sketch below)
- Find dead links and available domains within those pages
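Category crawling maps to the same API via the `list=categorymembers` module (again a sketch under the same assumptions):

```
import requests

API = "https://en.wikipedia.org/w/api.php"

def category_pages(category, limit=50):
    """List page titles belonging to a Wikipedia category."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": "Category:" + category,
        "cmlimit": limit,
        "format": "json",
    }
    data = requests.get(API, params=params, timeout=10).json()
    return [m["title"] for m in data["query"]["categorymembers"]]

# category_pages("Defunct websites") -> list of page titles
```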
### 3. Available Domains

- View all potentially available domains found during searches
- Filter domains by status (potentially available, expired, etc.)
- See details about each domain, including where it was found
- Download the list as a CSV file
## How Domain Availability Works

The app uses these methods to determine if a domain might be available (sketched in code below):

1. **WHOIS Lookup**: Checks if the domain has registration information
2. **Expiration Check**: Identifies domains with expired registration dates
3. **DNS Lookup**: Verifies if the domain has active DNS records
4. **TLD Restriction Check**: Identifies restricted TLDs that cannot be freely registered

Domains are flagged as potentially available if:

- No WHOIS registration data is found
- The domain's expiration date has passed
- No DNS records exist for the domain
- The domain does NOT have a restricted TLD (.edu, .gov, .mil, etc.)
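A minimal sketch of checks 1-3, assuming the `python-whois` and `dnspython` packages (the app's actual logic may differ; the restricted-TLD check is sketched in the next section):

```
import datetime

import dns.resolver  # dnspython
import whois         # python-whois

def availability_status(domain):
    """Classify a domain as 'registered', 'expired', or 'potentially available'."""
    # 1. WHOIS lookup: a missing registration record suggests availability.
    try:
        record = whois.whois(domain)
    except Exception:  # python-whois raises for domains with no record
        record = None
    if record is None or not record.domain_name:
        return "potentially available"

    # 2. Expiration check: a past expiration date suggests the registration lapsed.
    expires = record.expiration_date
    if isinstance(expires, list):  # python-whois may return several dates
        expires = expires[0]
    if expires and expires < datetime.datetime.now():
        return "expired"

    # 3. DNS lookup: NXDOMAIN means the domain has no active DNS records.
    try:
        dns.resolver.resolve(domain, "A")
    except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer, dns.resolver.NoNameservers):
        return "potentially available"

    return "registered"
```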
### Restricted TLDs (Optional)

The following TLDs are recognized as restricted; when this filter is enabled, domains under them are never reported as available (see the sketch after this list):

- .edu - Educational institutions
- .gov - Government entities
- .mil - Military organizations
- .int - International organizations
- Country-specific restrictions like .ac.uk, .gov.uk, etc.
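The filter itself can be as simple as a suffix check against a blocklist (an illustrative subset; the app's list is presumably longer):

```
# Suffixes that cannot be freely registered (illustrative subset).
RESTRICTED_SUFFIXES = (".edu", ".gov", ".mil", ".int", ".ac.uk", ".gov.uk")

def is_restricted(domain):
    """True if the domain ends with a restricted TLD suffix."""
    return domain.lower().endswith(RESTRICTED_SUFFIXES)

# is_restricted("example.gov") -> True
# is_restricted("example.com") -> False
```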
**Note**: For definitive availability, you should verify with a domain registrar. The tool provides a starting point for identifying potential opportunities.
## Configuration Options

- **Log file path**: Where to save the dead links JSON results
- **Available domains file**: Where to save the available domains database
- **Max concurrent requests**: Number of links to check simultaneously
- **Max pages to process**: Limit the number of articles to process
## Output Files

The app generates two main JSON files (a loading sketch follows the list):

1. **wikipedia_dead_links.json**: Contains details about all dead links found
2. **available_domains.json**: Contains only the potentially available domains and where they were found
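For post-processing outside the app, the files can be loaded with the standard `json` module (a sketch; the exact record fields depend on the app's schema):

```
import json

with open("available_domains.json", "r", encoding="utf-8") as f:
    domains = json.load(f)

print("Records found:", len(domains))
```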
You can also download results as CSV files directly from the app. Make sure to follow @metehan777 on X and www.linkedin.com/in/metehanyesilyurt on LinkedIn for upcoming updates and more tips & tools.