
Procedure for handling "cannot resolve" Sentry errors #24

Open
reginafcompton opened this issue Jul 20, 2018 · 24 comments

@reginafcompton
Contributor

Recently, we increased the level of logging to Sentry to help DataMade quickly identify data problems, before the client does.

What should we do with cannot resolve pseudo id to Bill warnings?

Potential step-by-step:
(1) check if bills are in Legistar;
(2a) if they are, then add them to this issue: opencivicdata/scrapers-us-municipal#241 Ignore the Sentry error (since the error has been recorded in a Github issue).
(2b) if they are not, then they might be private, but will become public. So, keep an eye on it? contact Metro? ignore the Sentry error? I am not sure.....

I am also not sure if we still need this level of logging, given that we’ll be aggressively scraping all bills every Friday.

@hancush
Member

hancush commented Jul 20, 2018

+1 on reducing errors; the volume makes it very hard to use both sentry and the semaphor channel

@hancush
Member

hancush commented Jul 20, 2018

i would like to propose that we devise a way of creating a digest of things we cannot resolve, and logging it in one place, i.e., opencivicdata/scrapers-us-municipal#241, rather than logging each and every one of these instances as a sentry error.

@reginafcompton
Contributor Author

I like the idea of a digest. It correlates with what @fgregg proposes in point 2 here. That is, we'd capture these unresolved bill errors and then scrape Metro for just those bills. We'd have a log of what that special scrape does, as opposed to Sentry errors.

@fgregg
Member

fgregg commented Jul 23, 2018

I think that we can move to a digest, or even reduce the level of logging, once we have an understanding of all the reasons why unresolved bills (and other things) appear. I don't think we are there yet.

@hancush
Member

hancush commented Jul 23, 2018

@fgregg I definitely agree that we need to get to the bottom of this problem, but I'm not sure that needs to happen at the expense of one of our primary channels of communication. 10+ often redundant error notifications on every scrape is a lot, especially when the scrapes are happening at an increased frequency on Fridays. Coupled with pretty crappy search functionality in Semaphor, it becomes way too easy to lose track of conversations. Is there a way we can reconcile the log level with our communication needs? What about a separate channel for pupa errors?

@fgregg
Member

fgregg commented Jul 23, 2018

@hancush

  1. I think it's a great idea to split channels between conversation and logging
  2. I'm not very concerned with redundancy of alerts if we are being alerted about things that are actual problems.

If we know that something is not a problem, is ignoring those events in sentry sufficient? If not, why not?

@reginafcompton
Contributor Author

I too like the idea of separate channels, but I don't think it's just a matter of distinguishing between conversation and logging, since the pupa "cannot resolve" errors stand to obscure other meaningful Councilmatic errors (e.g., from Miami, import_data, Solr, etc.).

I'd rather see a separate channel for the Pupa errors entirely, and then preserve the Councilmatic channel as it has been in the past.

I also think we can ignore Pupa errors once (1) we made a note of the error in a relevant Github issue (see above), or (2) we can absolutely identify the error as not a problem.

@hancush
Member

hancush commented Jul 23, 2018

It may be that we have just not stemmed the tide of this class of error just yet, but I muted at least 15 cannot resolve errors Friday and it felt like at least that many more came in the next scrape to take their place. These felt urgent to resolve, because I knew the errors would just recur 20 minutes later and further clog the channel. I would estimate I spent about an hour on this quasi-urgent task and related context switching. I'm sure @reginafcompton lost some time on it, as well.

In summary, I do not feel that muting alone addresses the problem, because it is time consuming and – so far – less effective than I would like at keeping the notifications at bay. Perhaps the number of errors will be reduced when we've spent the time to mute them all; but it seems like by that point, not being notified at all would be the same solution, except it wouldn't cost us the hours.

To your point about redundancy, I would strongly prefer that alerts not be redundant. It becomes too easy to ignore them, and potentially miss a meaningful one. Moreover, we don't learn anything from redundant alerts, apart from that the error is still happening, which we can already assume, because we know it's often not self-resolving, and we haven't made a change to fix it.

@fgregg
Member

fgregg commented Jul 23, 2018

As for the flooding issue, it seems like we can address that by changing the frequency of reporting to semaphor.

[Screenshot: Sentry alert settings, 2018-07-23]

In my opinion @evz should not move the civicpro scrapers to a separate repo, since different people have responsibility for addressing those.

We already have a councilmatic channel, where councilmatic errors should be located.

@fgregg
Member

fgregg commented Jul 23, 2018

I updated the semaphor rule so that a "warning or error" level issue will only be reported once per 24 hours. Critical errors will still be reported up to every 5 minutes.

@reginafcompton
Contributor Author

reginafcompton commented Jul 23, 2018

Right @fgregg - I meant "obscure other meaningful SCRAPER errors", not Councilmatic errors.

I think that Semaphor update will make a difference.

We also need to undo the change to LOGGING from Friday. #25 I can do that this morning.

I am not sure, however, that we have an agreed-upon step-by-step for dealing with these Pupa warnings. Does what I summarized above make sense? I think if we really want to understand the nature of these errors, then we'll need to think more about my suggested (2b).

@reginafcompton
Contributor Author

I checked today's batch of "cannot resolve" errors against Legistar: none of them were present in the API.

I propose that we make a consolidated list of these bills (we can take a look at the scraper logs to get past errors) and send it to Metro. We need their help to determine if these bills:
(1) are private and will remain private (in which case no action from us is needed);
(2) are private and will become public;
(3) are something else....

Then, we can make a plan for resolution.

I can pull together a list today and send it to Metro.
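For checking a batch of "cannot resolve" identifiers against Legistar, something like the following could work. This is a minimal sketch, assuming the public Legistar Web API and a `metro` client name; the `matter_url` and `missing_from_legistar` helpers are hypothetical, not part of our scrapers.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://webapi.legistar.com/v1/metro"  # assumed client name

def matter_url(file_number):
    """Build an OData-filtered URL for a single matter file number."""
    query = urllib.parse.urlencode(
        {"$filter": "MatterFile eq '{}'".format(file_number)}
    )
    return "{}/matters?{}".format(API_BASE, query)

def missing_from_legistar(file_numbers):
    """Return the identifiers that the Legistar API does not know about."""
    missing = []
    for number in file_numbers:
        with urllib.request.urlopen(matter_url(number)) as response:
            if not json.load(response):  # empty list: matter not in the API
                missing.append(number)
    return missing
```

Running `missing_from_legistar` over a day's batch of Sentry identifiers would give us the consolidated list to send to Metro.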

@fgregg
Member

fgregg commented Jul 25, 2018 via email

@reginafcompton
Contributor Author

I am not sure I understand your question @fgregg - can you say more?

@fgregg
Member

fgregg commented Jul 25, 2018

Is this the first time we got this alert from sentry? If so, why? We scrape the events every night, so shouldn't we have seen these before?

@hancush
Member

hancush commented Jul 25, 2018

that's actually an interesting question, @fgregg – looking at the frequency of occurrence charts in sentry (check em out!), it looks like these recur, but not every night. (it's possible the reason for this is totally obvious and i'm just not in the scraper headspace.) in any case, the ones from today aren't new.

@fgregg
Member

fgregg commented Jul 26, 2018

i have a suspicion that this is worth figuring out.

@reginafcompton
Contributor Author

reginafcompton commented Jul 26, 2018

"Why did we not see these alerts more often?"

Forest turned up the volume on Pupa logging on June 22; I turned down the scraper volume from July 20-24. Sentry thus had 29 days to alert us about unresolved bills. However, according to our Semaphor chat, we periodically and a little haphazardly ignored (several, but not all) alerts for a period of time (e.g., for a week, until Monday, etc.) on July 5, 6, 13, 19, 20. This would explain why these bills do not have consistent daily alerts, for example:
https://sentry.io/datamade/scrapers-us-municipal/issues/587748286/events/

[Screenshot: Sentry events, 2018-07-26]

A couple of inconsistencies – I see that some bills do not have alerts until later in June... why is that?
https://sentry.io/datamade/scrapers-us-municipal/issues/591071227/events/
https://sentry.io/datamade/scrapers-us-municipal/issues/591071135/events/

For many bills, we did not get alerts on July 3 or 4 - were they ignored? (@hancush do you recall?)

@reginafcompton
Contributor Author

reginafcompton commented Jul 27, 2018

Coming to terms with the Pupa errors

Shelly gave us terrific information about some of these unresolved bills. (I gave her a large sample to look into.) Given this information and what we learned in this issue, we can distinguish four types of bills that raise the "Cannot resolve error":

  1. 2015-**** bills referenced in agendas that Metro created in April and May 2015 – these are "practice" entries and do not have finalized agendas in Legistar.
  2. General Public Comments reports from May 2018 (2018-0316, 2018-0315, 2018-0312). Agendas referenced these bills while Metro was using the commenting system in beta.
  3. Newly created bills that remain private until the agenda is ready.
  4. Other bills that the scraper misses for sundry reasons to be determined. We know about two of these.

Actionable steps

  • If Metro decides to remove the early 2015 events, then we can do the same in our data sources, and be done with many of these "cannot resolve" errors. If Metro does not remove them, then we should consider instructing the scraper to skip these events. (Either way, we've already ignored them in Sentry.)
  • For the General Public Comments from May 2018, we could add a mechanism for skipping the import of these bills in Pupa. Or (more simply) we can ignore them in Sentry (as we have), and call it a day.

I am most concerned about classes (3) and (4), since these have caused issues in the past. On one hand, we've confronted this problem by aggressively scraping all bills on Fridays. However, this strategy slows the bill import time (from a maximum of 30 minutes to 45 minutes), since it takes about 22 minutes for the scraper to grab all bills. Alternative, more efficient strategies include:

In the short term, I prefer the first option (a windowed scrape of bills from the last year), since it's an easy adjustment.
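For the record, the windowed option could be as simple as bounding the full bill scrape by introduction date. This is a hypothetical sketch (the `MatterIntroDate` field name comes from the public Legistar Web API; `windowed_filter` is not an existing helper):

```python
from datetime import datetime, timedelta

def windowed_filter(days=365, now=None):
    """Build an OData $filter clause limiting a matters scrape to
    roughly the last year, instead of pulling every bill."""
    now = now or datetime.utcnow()
    cutoff = now - timedelta(days=days)
    # Legistar's OData v3 endpoints accept datetime'...' literals
    return "MatterIntroDate ge datetime'{}'".format(
        cutoff.strftime("%Y-%m-%dT%H:%M:%S")
    )
```

Appending that clause to the matters request should cut the ~22-minute full scrape down considerably while still catching recently introduced private-then-public bills.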

Ideally, I would like our scrapers to have access to private bills. Why? Then Pupa errors will carry greater meaning, whereas now, we just get a flood of errors on certain Fridays and think, "oh well, these must be private bills that will soon become public....la-te-dah."

@reginafcompton
Contributor Author

Metro tested switching bills from private to public using a few techniques. I outlined the results of those tests here.

Specifically, I learned two meaningful pieces of information:

(1) Publishing an agenda does not change the timestamp of the "Not viewable" bills, to which the agenda refers. The bills become public, but their MatterLastModifiedUtc remains unchanged. This confirms what we already suspected.

(2) Manually unchecking the "Not viewable" box for a bill does change the MatterLastModifiedUtc timestamp.


Next steps

With this knowledge, we have a few options, though one seems better than the others.

  • We can continue to aggressively scrape bills on Fridays (though that's an imperfect solution - see comments above).
  • We could also request that Metro staff manually toggle bills from private to public, whenever they publish an agenda. That seems prone to human error and tedious labor for Metro.

I think our best option is to write some code that scrapes bills related to newly published agendas, something like:

  1. find all the events with a newly published agenda (i.e., using the EventAgendaLastPublishedUTC).
  2. iterate over each event's eventitems
  3. scrape the bills referenced in the items

This logic could reside in the LAMetro bills scraper, though we could make some changes further upstream (assuming that this problem affects NYC and Chicago?).
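The three steps above could be sketched roughly like this. It's a hypothetical outline, not working scraper code: the event payload keys mirror Legistar Web API field names (`EventAgendaLastPublishedUTC`, `EventItems`, `EventItemMatterFile`), and the function only collects the matter files that the existing bill scraper would then re-scrape.

```python
def bills_to_rescrape(events, last_run):
    """Given Legistar event payloads, return the matter file numbers
    referenced by any event whose agenda was published since last_run.
    Timestamps are compared as ISO 8601 strings."""
    matters = set()
    for event in events:
        published = event.get("EventAgendaLastPublishedUTC")
        if published and published > last_run:           # step 1: newly published agendas
            for item in event.get("EventItems", []):     # step 2: iterate event items
                if item.get("EventItemMatterFile"):      # step 3: collect referenced bills
                    matters.add(item["EventItemMatterFile"])
    return matters
```

The returned set would then be fed to the bill scraper as an explicit list, so we only pay for the bills that just went public.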

@fgregg
Member

fgregg commented Aug 2, 2018

So getting access to private bills is off the table?

@hancush
Member

hancush commented Aug 2, 2018

could we check the agendalastmodified date for bills, like we do for events, in the python-legistar scraper?

edit: oh, haha, bills don't have agendas..... NEVER MIND ME.

@reginafcompton
Contributor Author

@fgregg - Omar is looking into it. Let's wait for his reply before acting on anything.

@reginafcompton
Contributor Author

From Metro:

'Unfortunately, we don’t know of a way to give the scraper access to the “Not Viewable on Insite" reports. Omar has asked Granicus about this in the past, and received back either “we’ll look into it” or no response at all.'
