You can install the extension from the Chrome Web Store here
This file is a script used to crawl personal information related to bilibili content creators, such as avatar links, signatures, premium membership info, etc.
- Run command: `python3 user_info_request.py 0`
- You need to pass an integer argument to determine which `mid*chunk*{x}.json` file to use.
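As a rough illustration of how an integer argument can select a chunk file, here is a minimal sketch. It is not the actual code of `user_info_request.py`: the `CHUNK_PATTERN` value, the helper names, and the assumption that each chunk holds a JSON list of creator IDs are all hypothetical and should be adjusted to the real files.

```python
import json
from pathlib import Path

# Hypothetical file name pattern; change it to match the real chunk files.
CHUNK_PATTERN = "mid_chunk_{x}.json"


def parse_index(argv: list) -> int:
    """Read the chunk index from the command line, e.g. `script.py 0`."""
    if len(argv) != 2:
        raise SystemExit(f"usage: {argv[0]} <chunk index>")
    return int(argv[1])


def chunk_path(index: int, directory: str = ".") -> Path:
    """Build the path of the chunk file selected by the integer index."""
    if index < 0:
        raise ValueError("chunk index must be non-negative")
    return Path(directory) / CHUNK_PATTERN.format(x=index)


def load_mids(index: int, directory: str = ".") -> list:
    """Load the creator IDs stored in the selected chunk.

    Assumes (hypothetically) that the chunk is a JSON list of IDs.
    """
    with chunk_path(index, directory).open(encoding="utf-8") as f:
        return json.load(f)
```

In a script, `load_mids(parse_index(sys.argv))` would then return the IDs for the chunk named on the command line.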
This file is a script used to crawl the top 1000 videos of the current month's popular rankings for each category.
- Run command: `python3 hot_videos_crawler.py`
- The result for each category will be saved separately into a CSV file.
- You need to manually modify `year=20xx` in the program to specify which year's data to crawl.
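The "one CSV file per category" output can be sketched as follows. This is only an illustration under stated assumptions, not the crawler's real code: the `save_by_category` helper, the `category` key, the output file naming, and the `YEAR` constant (standing in for the `year=20xx` setting) are hypothetical, and every row is assumed to have the same keys.

```python
import csv
from pathlib import Path

# Hypothetical stand-in for the year=20xx setting in the crawler.
YEAR = 2019


def save_by_category(videos, out_dir, year=YEAR):
    """Group video records by category and write one CSV per category.

    Each record is a dict with a "category" key (an assumed field name).
    Returns the list of files that were written.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Collect the rows belonging to each category.
    by_category = {}
    for video in videos:
        by_category.setdefault(video["category"], []).append(video)

    paths = []
    for category, rows in by_category.items():
        path = out / f"{category}_{year}.csv"
        with path.open("w", newline="", encoding="utf-8") as f:
            # Use the keys of the first row as the CSV header.
            writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)
        paths.append(path)
    return paths
```

Keeping the year as a single parameter makes it possible to crawl another year without touching the writing logic.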
The data processing scripts ultimately generate a JSON file that maps each bilibili content creator to a list of similar creators. Currently, 30 users are selected, and the result is then combined with `user_info.json` to keep only the bilibili content creators whose avatars and signatures can be displayed. This filtering is needed because the `user_info_request.py` crawler might not have retrieved account information for some creators.
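The filtering step might look roughly like the sketch below. It is an assumption-laden illustration rather than the project's actual script: the `filter_displayable` helper, the `face` and `sign` field names, the shape of both JSON structures, and the reading of "30 users" as a cap of 30 candidates per creator are all hypothetical.

```python
def filter_displayable(similar, user_info, limit=30):
    """Keep only creators whose avatar and signature are available.

    similar:   dict mapping a creator ID to a list of similar creator IDs
    user_info: dict mapping a creator ID to its crawled account info
    limit:     maximum number of candidates considered per creator
               (one possible reading of "30 users are selected")
    """

    def displayable(mid):
        # "face" (avatar link) and "sign" (signature) are assumed field names.
        info = user_info.get(mid)
        return bool(info and info.get("face") and info.get("sign"))

    result = {}
    for mid, candidates in similar.items():
        # Drop creators the crawler has no usable account info for.
        if not displayable(mid):
            continue
        result[mid] = [c for c in candidates[:limit] if displayable(c)]
    return result
```

Creators missing from `user_info`, or present with an empty avatar or signature, are dropped both as keys and as entries in the similar-creator lists.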
The `/eda` folder contains:
- Sample Dataset: `bili-hot-2019`, a representative dataset used to demonstrate the data exploration process.
- Python Script for EDA: `bili-eda.ipynb`, a Jupyter Notebook for performing Exploratory Data Analysis (EDA) and data preprocessing on the dataset.