dbl.spamhaus.org not working causing ticket/comments with several URLs to be rejected as spam #245

Open
timgraham opened this issue Dec 13, 2024 · 6 comments

Comments

@timgraham
Member

It looks to me like any ticket changes are penalized -3 karma per URL that's submitted because dbl.spamhaus.org fails for each one.

Example:

URL's blacklisted by dbl.spamhaus.org (djangopackages.org[255.255.254]), dbl.spamhaus.org (forum.djangoproject.com[255.255.254]), dbl.spamhaus.org (softwarecrafts.uk[255.255.254])

(Domains that are all okay according to https://check.spamhaus.org/.)

It may be that dbl.spamhaus.org cannot be resolved from the djangoproject.com server. Per https://stackoverflow.com/questions/64363090/how-do-you-access-the-public-spamhaus-dbl-service, someone could try

$ dig dbltest.com.dbl.spamhaus.org
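
Where dig is not installed, roughly the same lookup can be sketched in Python using only the standard library. The helper names (`dbl_query_name`, `dbl_lookup`) are hypothetical, and the resolver is injectable so the logic can be exercised without network access.

```python
import socket


def dbl_query_name(domain, zone="dbl.spamhaus.org"):
    """Build the DNS name to query: <domain>.<zone>."""
    return f"{domain.strip('.').lower()}.{zone}"


def dbl_lookup(domain, zone="dbl.spamhaus.org", resolve=socket.gethostbyname):
    """Return the A record answered for the domain, or None if it does not resolve."""
    try:
        return resolve(dbl_query_name(domain, zone))
    except socket.gaierror:
        # NXDOMAIN or another resolution failure: no answer for this name.
        return None


# Usage (needs network access), equivalent to the dig query above:
#   dbl_lookup("dbltest.com")
```

Note that `socket.gaierror` does not distinguish "not listed" from "resolver unreachable", so a `None` result alone does not prove the service is healthy.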

Proposed resolution:
Remove "dbl.spamhaus.org" from "URL Blacklists (comma separated):" at https://code.djangoproject.com/admin/spamfilter/external

I believe this is part of the Trac database because I didn't find it in tracenv.ini.

@bmispelon
Member

Oh interesting theory, that would indeed explain the posts that have been (wrongly) marked as spam.

I ran the suggested dig command on the server and it seems to work:

; <<>> DiG 9.18.28-0ubuntu0.20.04.1-Ubuntu <<>> dbltest.com.dbl.spamhaus.org
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 61837
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;dbltest.com.dbl.spamhaus.org.	IN	A

;; ANSWER SECTION:
dbltest.com.dbl.spamhaus.org. 300 IN	A	127.255.255.254

;; Query time: 0 msec
;; SERVER: 127.0.0.53#53(127.0.0.53) (UDP)
;; WHEN: Fri Dec 13 12:28:27 UTC 2024
;; MSG SIZE  rcvd: 73

I thought I should also try to run it from the container Trac runs in, but unfortunately that image doesn't come with dig, so I can't. I wonder if the containerization could be interfering with DNS resolution in a way that breaks the plugin 🤔
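
One detail in the dig output above may be worth checking: the answer is 127.255.255.254. Spamhaus appears to document the 127.255.255.0/24 range as error codes rather than listings, with .254 meaning the query arrived via a public/open resolver. If the plugin treats any A record as a hit (an assumption; its source was not checked here), every URL would look blacklisted. A minimal sketch of separating the two cases follows; the function name is hypothetical and the code table should be verified against the Spamhaus documentation.

```python
import ipaddress

# Error range and meanings as understood from Spamhaus's published return codes
# (assumption: verify against their documentation before relying on this).
SPAMHAUS_ERROR_NET = ipaddress.ip_network("127.255.255.0/24")
SPAMHAUS_ERRORS = {
    "127.255.255.252": "typing error in the DNSBL name",
    "127.255.255.254": "query via public/open resolver",
    "127.255.255.255": "excessive number of queries",
}


def classify_dnsbl_answer(answer):
    """Classify a DNSBL A record as 'not listed', an error, or 'listed'."""
    if answer is None:
        return "not listed"
    if ipaddress.ip_address(answer) in SPAMHAUS_ERROR_NET:
        return "error: " + SPAMHAUS_ERRORS.get(answer, "unknown error code")
    return "listed"
```

Under these assumptions, the answer shown for dbltest.com would classify as an error, not as a listing.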

I'll look into disabling spamhaus temporarily and will report back here.

bmispelon added a commit to bmispelon/code.djangoproject.com that referenced this issue Dec 13, 2024
The service doesn't seem to work, possibly because
of the containerization.

Refs django#245
@bmispelon
Member

I found the name of the setting in the source code and made a PR with a change to the config: #246

bmispelon added a commit that referenced this issue Dec 13, 2024
The service doesn't seem to work, possibly because
of the containerization.

Refs #245
@thibaudcolas
Member

thibaudcolas commented Dec 18, 2024

Here’s a record of Django spam monitoring logs I’ve worked on over the last few weeks, to help us assess different options to tweak the spam filtering to get better results.

With this I’ve identified:

  • 4 instances of things being flagged as spam even though they aren’t.
    • Two would have gone through if the dbl.spamhaus.org check was off
    • Two would still be flagged as spam.
  • Unfortunately, there are 2 instances of recent spam that would have gone through if the URL check had been disabled at the time (karma of 0 including a -3 from the faulty URL filter, so the karma would have been 3 otherwise).

For the two that would still be marked as spam even now, it’s because they have low "session" scores of 6, so probably first-time users whose writing is detected as spam. Their total content’s karma is 0, so it would only take a small tweak to get them through as well. But from what I can see, that would cause 4 more spam entries with spam links to get through (score was -3, would be 0 without the faulty link checker, would be more than 0 if we changed the "first time user" session score).
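
The arithmetic behind these counts can be written out as a small sketch. The function names are hypothetical, and the acceptance rule (a submission passes only when its karma is strictly above 0) is an assumption inferred from the cases described above, not taken from the plugin's source.

```python
URL_PENALTY = 3  # karma deducted by the faulty dbl.spamhaus.org check, per the logs


def karma_without_url_check(karma, url_penalty=URL_PENALTY):
    """Karma a submission would have had if the URL blacklist check had not fired."""
    return karma + url_penalty


def passes(karma):
    """Assumed rule: a submission is accepted only when karma is strictly positive."""
    return karma > 0


# Recent spam scored 0 with the faulty check: 0 + 3 = 3, so it would have passed.
# Spam with links scored -3: -3 + 3 = 0, still rejected unless another score is raised.
```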


I’m very hesitant to spend more time on this personally. It feels like there are ways to tweak the settings and get good results, but the diminishing returns are real.

Perhaps there’d be a way to make the spam message more friendly, so users who get into this situation know where to raise the issue or how to work around it?

[Screenshot: "Submission rejected as potential spam" error]

@timgraham
Member Author

Thanks, Thibaud. I also saw there could be merit in changing "min_karma" to 0. FYI, in your analysis, I think you misidentified some lines where the user id appeared in the "Quote" field. That's most often when users add themselves to CC.

I've also been wondering if tuning the spam filters is worth the effort. If we instead had a list of approved users and a moderation queue for first-time posters, I think that would eliminate all spam and false positives.
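
As a rough illustration of that idea (not an existing Trac feature; every name here is hypothetical), the routing could look like this:

```python
def route_submission(author, approved_users):
    """Publish posts from approved users; hold everyone else for moderation."""
    return "publish" if author in approved_users else "moderation_queue"


def approve(author, approved_users):
    """Called when a moderator accepts a queued post: trust the author from now on."""
    approved_users.add(author)


approved = {"alice"}
first = route_submission("bob", approved)   # held: bob is a first-time poster
approve("bob", approved)
second = route_submission("bob", approved)  # published: bob is now approved
```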

@bmispelon
Member

I haven't looked into it, but my gut feeling is that changing the error message when spam is detected should be doable.

A moderation queue sounds like it would work great for us, but unless our existing plugins support it (or there exists one that's still maintained), it's most likely a non-starter.

A 3rd option could be to do a captcha challenge when spam is detected (I have a vague memory that one of our plugins supports that, but I could be wrong). There's something similar in place for the donation page on djangoproject.com.

I will create one ticket for option 1 (the error message) and another for options 2 and 3 together (my reasoning is that what we really want is a moderation queue, but if that's not possible then a captcha could be acceptable).

@thibaudcolas
Member

thibaudcolas commented Dec 21, 2024

👍 I’m also surprised there’s so much spam as we only allow participation from authenticated accounts. Is there some honeypot we might want to put in place on the user registration flow to thwart the simplest types of botting?

I feel like any incremental improvements here would be great. Even just a help message, if it’s not too much work, would go a long way.

@timgraham thank you, not sure what I was thinking with those two! I’ve updated the numbers above accordingly.
