Test if any features in the component model can be removed without impacting performance #3677
Hi @suhaibmujahid, I followed the instructions in the README.md and successfully cloned the repository onto my system. To evaluate the current performance of the model, I attempted to train the component model using the following command: python3 -m scripts.trainer component. While the repository suggests that the model should take around 30 minutes to train, in my case it has been running for over 6 hours. Unfortunately, my laptop's battery cannot sustain such a long training process. I'd like to ask if there are any specific hardware requirements I should be aware of. Additionally, do you have any recommendations for speeding up the model training? Thank you for your assistance.
Welcome @Inyrkz -- Thank you for your interest in the project!
The duration required to train a model varies with the model, the data size, and the hardware used. The readme warns that training will take more than 30 minutes ("warning this takes 30min+"). For testing purposes, you could limit the data size to speed up the process. Currently, we have an issue on file to enable training on our infrastructure instead of locally (#3688), but I do not know when that will be ready.
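To make the "limit the data size" suggestion concrete: a generic way to cap the training data for a quick smoke test is to truncate the bug stream before it reaches the trainer. This is a hypothetical sketch, not bugbug's actual code; the trainer may expose its own option for this.

```python
from itertools import islice

def limited(bug_iter, limit):
    """Yield at most `limit` bugs from a (possibly huge) iterator.

    islice avoids materializing the full dataset, which matters when
    bugs are streamed from a multi-gigabyte JSON dump.
    """
    return islice(bug_iter, limit)

# Simulated stream of bug records:
bugs = ({"id": i} for i in range(1_000_000))
sample = list(limited(bugs, 3))
print(sample)  # [{'id': 0}, {'id': 1}, {'id': 2}]
```

Training on a few hundred bugs instead of the full dump will not produce a useful model, but it verifies that the pipeline runs end to end on your machine.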
Thank you for the clarification regarding the training issue. I'll continue to monitor the training process and will be patient while the team works towards a solution. If I have any further questions or encounter any issues, I'll be sure to reach out.
Hi @suhaibmujahid, it took me a while to set it up. Finally, when I ran the trainer for the component model, I got this error on both the first and the second run:
@gothwalritu you need to have zstd installed.
Thanks Suhaib, I followed the steps and I am getting this error now:
(venv) PS C:\ritu\bugbug> python -m scripts.trainer component
I ran it a second time and got the same index-out-of-range error:
PS C:\ritu\bugbug> zstd --version
I installed popen as well.
I ran zstd manually and it is working now :).
PS C:\ritu\bugbug> zstd -df data\bugs.json.zst
data\bugs.json.zst : 2307799708 bytes
@suhaibmujahid: The training for the component model ran for a couple of hours and then threw this error. This looks like a code issue; could you please advise?
Hi, it's me again, and I ran into the zstd issue again. I noticed that the zstd binary on my system is named zstd.exe, not zstdmt, so I started the training again after changing the code in my repo accordingly. Hopefully it works this time.
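The zstdmt-vs-zstd.exe problem above is a binary-name mismatch, and one generic way to handle it is to probe the PATH for whichever compatible binary exists before shelling out. This is a sketch of that idea, not bugbug's actual code; the candidate names and function names are assumptions.

```python
import shutil
import subprocess

def find_zstd(candidates=("zstdmt", "zstd", "zstd.exe")):
    """Return the path of the first zstd-compatible binary on PATH, or None.

    zstdmt is the multi-threaded entry point; plain zstd (or zstd.exe on
    Windows) accepts the same decompression flags.
    """
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None

def decompress(archive):
    """Decompress `archive` with whatever zstd binary is available."""
    binary = find_zstd()
    if binary is None:
        raise RuntimeError("no zstd binary found; install zstd and add it to PATH")
    # -d: decompress, -f: overwrite an existing output file
    subprocess.run([binary, "-df", archive], check=True)
```

With a fallback like this, a Windows install that ships only zstd.exe would work without editing the repository's code.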
Hi @suhaibmujahid, I am working on this task, and so far I have understood that it requires an ablation study. My question is whether there is a recommended tool or framework for running it.
@gothwalritu Currently there is not. Feel free to use any tools you find useful. |
@suhaibmujahid: The ablation-study runs for all the features finished today. After analyzing the metrics.json for each run, I found that the model already outputs the averages of the metrics for each target component. I am now designing the comparison methodology and will then write a report documenting the process, results, and conclusions. I have a query: although no coding was required to run these models, should I still submit the report via GitHub, or is submitting it here more appropriate? Also, please correct me if I am not on the right path.
@gothwalritu you could submit a PR to apply the findings of your experiments (e.g., dropping a specific feature). A complete formal report is not required; a comparison between before and after should be sufficient.
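The suggested before/after comparison could be sketched like this, assuming each run's metrics.json reduces to a mapping of metric name to value (the function name and the illustrative numbers below are assumptions, not results from the actual study):

```python
def diff_metrics(baseline, ablated):
    """Return {metric: (baseline, ablated, delta)} for metrics present in both runs.

    A positive delta means the ablated model (feature removed) scored higher.
    """
    shared = baseline.keys() & ablated.keys()
    return {
        m: (baseline[m], ablated[m], round(ablated[m] - baseline[m], 4))
        for m in sorted(shared)
    }

# Illustrative numbers only:
before = {"precision": 0.71, "recall": 0.68, "f1": 0.694}
after = {"precision": 0.72, "recall": 0.68, "f1": 0.699}
for name, (b, a, d) in diff_metrics(before, after).items():
    print(f"{name}: {b} -> {a} ({d:+})")
```

A table of such deltas, one row per removed feature, is essentially the "comparison between before and after" being asked for.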
Test if any features in the component model can be removed without impacting performance mozilla#3677: based on /docs/models/component_ablation_study.md, the is_coverity_issue feature can be removed from the component model.
@suhaibmujahid: Thanks, Suhaib. I have created the pull request and also stored my findings in docs/models/component_feature_ablation.md.
mozilla#3677 **Conclusion** Based on the evaluation metrics, removing the _is_coverity_issue_ feature yields superior performance in terms of Precision, Recall, F1 Score, Geometric Mean, and IBA. Although it has a slightly lower specificity than the other runs, its higher values on the other key metrics indicate a better balance and predictive accuracy. On the other hand, removing the _severity_ feature registers the lowest performance across most metrics.
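For reference, most of the metrics named in the conclusion can be derived from a binary confusion matrix (IBA, which is typically computed by a library such as imbalanced-learn, is omitted here). A minimal sketch:

```python
import math

def rates(tp, fp, fn, tn):
    """Core metrics from a binary confusion matrix (true/false positives/negatives)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # also called sensitivity
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    gmean = math.sqrt(recall * specificity)  # geometric mean of recall and specificity
    return precision, recall, specificity, f1, gmean

# For this symmetric example every metric is approximately 0.8:
print(rates(tp=8, fp=2, fn=2, tn=8))
```

The geometric mean is the metric most sensitive to class imbalance here, since it collapses toward zero when either recall or specificity does.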
bugbug/bugbug/models/component.py
Lines 73 to 85 in 41b1372