This project scrapes data from Daraz Nepal, saves the scraped data to a database, and displays it in a simple Django web app.
To run this project, you need to have the following software installed:
- Python 3.x
- Django
- Scrapy
- Database management system (e.g. SQLite)
- Clone the repository or download the source code.
git clone git@github.com:Pradip-p/daraz-py-scraper.git
- Install the required dependencies.
cd crawler/
pip install .
Note: Set up the Django project first, then run the crawler script.
To run the project locally:
- Create a .env file in the project directory and set:
MYPROJECT_ENV=dev
- Run the Django project.
python manage.py makemigrations
python manage.py migrate
python manage.py createsuperuser
python manage.py runserver
- Access the web app by visiting http://localhost:8000/ in your web browser.
- Run the daraz_com.py script to scrape data from Daraz and save it to the database.
python crawler/daraz_com.py
- After the script finishes, the scraped data is stored in the SQLite database.
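To confirm that a crawl actually wrote rows, the database file can be inspected directly. The sketch below is not part of the project; it uses only Python's standard `sqlite3` module to list each table with its row count. The file name `db.sqlite3` is an assumption (it is Django's default SQLite file name), and the helper name `summarize_tables` is hypothetical. Adjust the path to match your settings.

```python
import os
import sqlite3
from contextlib import closing


def summarize_tables(db_path):
    """Return {table_name: row_count} for every user table in a SQLite file."""
    with closing(sqlite3.connect(db_path)) as conn:
        # sqlite_master lists all schema objects; skip SQLite's internal tables.
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
            )
        ]
        # Table names come from the schema itself, so quoting them is sufficient.
        return {
            name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in names
        }


if __name__ == "__main__":
    DB_PATH = "db.sqlite3"  # assumed location; Django's default file name
    if os.path.exists(DB_PATH):
        for table, count in sorted(summarize_tables(DB_PATH).items()):
            print(f"{table}: {count} rows")
    else:
        print(f"{DB_PATH} not found; run the migrations and the crawler first.")
```

Running this before and after `python crawler/daraz_com.py` should show the row count growing in whichever table holds the scraped products.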
- Download Options: Implement a download button for each crawler run, allowing users to export scraped data in various formats such as JSON, CSV, and Excel. This feature will enhance data accessibility and usability.
- Automated Email Delivery: Integrate an automatic email delivery system that sends scraped data to clients' specified email addresses. This will provide users with a convenient way to receive updates and reports without manual intervention.
- Live Logging: Develop a live log feature that tracks the progress and status of each scraper in real-time. Users will benefit from enhanced transparency and the ability to monitor scraping activities as they happen.
- Notification System: Add a notification system to alert users about the completion of scraping jobs, errors, or other important events. Notifications can be sent via email, Slack, or other preferred communication channels, ensuring users stay informed and can take prompt actions as needed.
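As a starting point for the planned download options, the following is a minimal sketch of serializing scraped rows to JSON or CSV with the standard library only. None of this exists in the project yet: the function name `export_rows` and the `name` and `price` fields are hypothetical, and Excel output is left out because it would need a third-party package.

```python
import csv
import io
import json


def export_rows(rows, fmt):
    """Serialize a list of dicts to a JSON or CSV string."""
    if fmt == "json":
        return json.dumps(rows, ensure_ascii=False, indent=2)
    if fmt == "csv":
        buffer = io.StringIO()
        # Use the first row's keys as the column order.
        fieldnames = list(rows[0].keys()) if rows else []
        writer = csv.DictWriter(buffer, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
        return buffer.getvalue()
    raise ValueError(f"unsupported format: {fmt!r}")


if __name__ == "__main__":
    # Hypothetical sample rows; the real field names depend on the project's model.
    sample = [{"name": "Phone", "price": 100}, {"name": "Case", "price": 5}]
    print(export_rows(sample, "csv"))
```

In a Django view, the returned string could be wrapped in an `HttpResponse` with a matching `content_type` and a `Content-Disposition: attachment` header so the browser downloads it as a file.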
Contributions are welcome! If you encounter any issues or have suggestions for improvements, please submit a pull request.
This project is licensed under the MIT License.