ChatGPT Friendly Crawl

Prerequisites

Ensure Python 3.11+ is installed. Install the dependencies with:

pip install aiohttp pyppeteer

Usage

Before running the crawler, set these environment variables:

CHATGPT_CRAWL_VAR_START_URL: Starting URL for the crawl.
CHATGPT_CRAWL_VAR_DEPTH: Maximum crawl depth.
CHATGPT_CRAWL_VAR_MAX_PAGES: Maximum number of pages to fetch.

export CHATGPT_CRAWL_VAR_START_URL=$target_url && \
export CHATGPT_CRAWL_VAR_DEPTH=$depth_number && \
export CHATGPT_CRAWL_VAR_MAX_PAGES=$max_pages_number && \
python ./chatgpt_crawl.py

For example, to start at https://www.google.com with a maximum depth of 2 and at most 100 pages:

export CHATGPT_CRAWL_VAR_START_URL=https://www.google.com && \
export CHATGPT_CRAWL_VAR_DEPTH=2 && \
export CHATGPT_CRAWL_VAR_MAX_PAGES=100 && \
python ./chatgpt_crawl.py
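
The three variables above could be read and validated along the following lines. This is a minimal sketch, not the code in chatgpt_crawl.py: the `CrawlConfig` class and the `load_config` function are hypothetical names introduced for illustration, and the range checks (depth of at least 0, at least one page) are assumptions about what sensible values look like.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Prefix shared by the three variables documented above.
PREFIX = "CHATGPT_CRAWL_VAR_"


@dataclass(frozen=True)
class CrawlConfig:
    """Hypothetical container for the crawl settings."""
    start_url: str
    depth: int
    max_pages: int


def load_config(env: Mapping[str, str] = os.environ) -> CrawlConfig:
    """Build a CrawlConfig from the CHATGPT_CRAWL_VAR_* variables."""
    try:
        start_url = env[PREFIX + "START_URL"]
        raw_depth = env[PREFIX + "DEPTH"]
        raw_max_pages = env[PREFIX + "MAX_PAGES"]
    except KeyError as missing:
        raise ValueError(f"missing environment variable: {missing.args[0]}") from None

    # int() raises ValueError on non-numeric input, which is what we want.
    depth = int(raw_depth)
    max_pages = int(raw_max_pages)

    # Assumed constraints; the README does not state the valid ranges.
    if depth < 0 or max_pages < 1:
        raise ValueError("DEPTH must be >= 0 and MAX_PAGES must be >= 1")
    return CrawlConfig(start_url, depth, max_pages)
```

Calling `load_config()` with no arguments reads from the process environment. Passing a plain dict instead makes the parsing easy to check without exporting anything.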

Benefits of Using `https://r.jina.ai` API

Routing page retrieval through the https://r.jina.ai API delegates fetching to a hosted service, so the crawler can scale and stay reliable without managing its own retrieval infrastructure.
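
As a rough illustration, a page can be requested through the API by prefixing the target URL with the endpoint. The sketch below assumes that addressing scheme and uses only the standard library to stay dependency-free; the crawler itself lists aiohttp as a dependency, and its actual request code is not shown in this README. The names `reader_url` and `fetch_via_reader` are hypothetical.

```python
import urllib.request

# Endpoint named in this README; the target URL is appended to it.
READER_PREFIX = "https://r.jina.ai/"


def reader_url(target_url: str) -> str:
    """Return the URL that requests target_url through the API."""
    if not target_url.startswith(("http://", "https://")):
        raise ValueError(f"expected an absolute http(s) URL, got {target_url!r}")
    return READER_PREFIX + target_url


def fetch_via_reader(target_url: str, timeout: float = 30.0) -> str:
    """Fetch target_url through the API and return the response body as text."""
    with urllib.request.urlopen(reader_url(target_url), timeout=timeout) as response:
        # Replace undecodable bytes rather than failing on a stray character.
        return response.read().decode("utf-8", errors="replace")
```

Keeping the URL construction in its own function means it can be checked without network access, and the blocking fetch could later be swapped for an asynchronous client.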

Wrap-up

ChatGPT Friendly Crawl combines asynchronous fetching with the https://r.jina.ai API to streamline data collection, bounded by a configurable crawl depth and page limit.
