
GEFS download fixes #3349

Open · wants to merge 11 commits into base: develop
Conversation

infotroph (Member)

Description

This gets download.NOAA_GEFS working again, most notably by no longer scraping two different webpages to extract dates that we can predict from the fact that forecasts are retained for four days. As best I can tell, there have been changes in the NOAA server configuration and in the format of the grib files since this function last worked.

See comments inline for my reasoning on each change, but I'd like a critical eye from someone more familiar with GEFS.

Motivation and Context

Review Time Estimate

  • Immediately
  • Within one week
  • When possible

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My change requires a change to the documentation.
  • My name is in the list of CITATION.cff
  • I have updated the CHANGELOG.md.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

#for(j in 1:31){
if(ens_index == 1){
base_filename2 <- paste0("gec00",".t",cycle,"z.pgrb2a.0p50.f")
curr_hours <- hours_char[hours <= 384]
infotroph (Member Author):

Most of my changes in this hunk just simplify the logic, but I'm flagging that I can't figure out why this line was here at all: all ensemble members from a given forecast cycle have the same number of hours available, so my unconfident best guess is that someone got confused between cycle IDs and ensemble IDs. But cycle 00 is the one with more hours, so this would have been wrong anyway.
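For concreteness, here is a sketch of what "keyed by cycle, not by ensemble member" would look like. The 384-hour cutoff is taken from the hunk above; the per-cycle values are placeholders, not verified against NOMADS:

```r
# Hypothetical sketch: maximum lead time depends on the forecast cycle,
# not on the ensemble index. All four values are set to the 384 h cutoff
# from the hunk above; adjust if some cycles retain longer horizons.
max_lead_hours <- c("00" = 384, "06" = 384, "12" = 384, "18" = 384)

curr_hours_for <- function(hours, hours_char, cycle) {
  # hours and hours_char are parallel vectors of lead times
  hours_char[hours <= max_lead_hours[[cycle]]]
}
```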

Member:

I actually think there are some ensemble members that are the full 35 days, and others that are shorter, even for cycle 00

infotroph (Member Author):

Can you point to any examples of the shorter members, or documentation that mentions them? I've been poking around and haven't found any yet, but if this is an intermittent thing it'd be easy for me to have missed it.

curr_time <- lubridate::with_tz(Sys.time(), tzone = "UTC")
curr_date <- lubridate::as_date(curr_time)

noaa_page <- readLines('https://nomads.ncep.noaa.gov/pub/data/nccf/com/gens/prod/')
infotroph (Member Author):

These pages (here and on deleted line 91 below) load fine in a browser, but inside R and via curl they throw "Stream error in the HTTP/2 framing layer". All of this scraping and extracting can be replaced by knowing that the forecasts are retained for four days -- am I missing cases where data posting isn't reliable enough to assume "today and the three days before that"?
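The retention-window replacement for the scrape can be sketched in a few lines (variable names mirror the snippet above; the four-day window is the assumption under discussion):

```r
library(lubridate)

# Sketch of the retention-window logic that replaces the page scrape:
# assuming GEFS forecasts stay on NOMADS for four days, the candidate
# date folders are today (UTC) and the three prior days.
curr_time <- with_tz(Sys.time(), tzone = "UTC")
curr_date <- as_date(curr_time)
potential_dates <- seq(curr_date - days(3), curr_date, by = "1 day")
```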

Member:

My experience is there can be delays in when files show up and occasionally gaps, which is why we didn't just assume forecasts are always there.

infotroph (Member Author), Aug 5, 2024:

That's good to know, but note that the existing code only checked whether there exists a date folder that matches the requested forecast date, not whether all forecasts (or even all cycles) are present in it. How often do delays/gaps occur at the whole-day level?

Each individual file download (line 50 above) is wrapped in a tryCatch, so for both old and new code the likeliest outcome when files are missing will be a series of "skipping" messages (possibly then followed by a failure when downscaling can't bridge the time gap). Is that acceptable for this PR or do we need to dig further in on pre-download availability checks?
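The "skip on failure" behaviour described above looks roughly like this (the real helper name and arguments in the PR differ; this is only an illustration of the tryCatch pattern):

```r
# Illustrative sketch of a per-file download wrapped in tryCatch:
# a missing file produces a "skipping" message rather than an abort.
download_one <- function(url, destfile) {
  tryCatch(
    utils::download.file(url, destfile, mode = "wb", quiet = TRUE),
    error = function(e) {
      message("skipping ", basename(destfile), ": ", conditionMessage(e))
      NULL  # caller sees NULL for a failed download
    }
  )
}
```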

modules/data.atmosphere/R/GEFS_helper_functions.R (outdated)
hours_char = hours_char,
cycle = cycle,
base_filename1 = base_filename1,
vars = vars,
infotroph (Member Author):

Just for clarity

Comment on lines -17 to -18
##' GEFS weather data isn't always posted immediately, and to compensate, this function adjusts requests made in the last two hours
##' back two hours (approximately the amount of time it takes to post the data) to make sure the most current forecast is used.
infotroph (Member Author):

I do not think this two-hour adjustment was being done in the current code, and I'm sure it isn't being done after this PR, so I deleted it from the docs.

@@ -223,8 +223,16 @@ downscale_ShortWave_to_half_hrly <- function(df,lat, lon, hr = 0.5){

for (k in 1:nrow(data.hrly)) {
if(is.na(data.hrly$surface_downwelling_shortwave_flux_in_air[k])){
SWflux <- as.matrix(subset(df, .data$day == data.hrly$day[k] & .data$hour == data.hrly$hour[k], data.hrly$surface_downwelling_shortwave_flux_in_air[k]))
infotroph (Member Author):

I'm guessing a bit at what this was supposed to be, because it could not have worked as written: not only is .data invalid inside base::subset (it's specific to tidy evaluation), but I'm also not sure what subset would do with a bare number as its third argument.

My replacement below seems reasonable if the goal of this step is "when SWflux is null in the hourly data, take it from the matching day and hour of the coarser-scale data", but please speak up if you see a better fix.
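Under that interpretation, the fill-in step could be written like this (column and variable names follow the snippet being reviewed; this is one reading of the intent, not necessarily the exact replacement in the PR):

```r
# Sketch of the intended fill-in: when the half-hourly shortwave flux is
# NA, copy it from the row of the coarser data frame `df` that has the
# same day and hour. Assumes `df` and `data.hrly` both carry `day`,
# `hour`, and `surface_downwelling_shortwave_flux_in_air` columns.
for (k in seq_len(nrow(data.hrly))) {
  if (is.na(data.hrly$surface_downwelling_shortwave_flux_in_air[k])) {
    match_row <- df$day == data.hrly$day[k] & df$hour == data.hrly$hour[k]
    data.hrly$surface_downwelling_shortwave_flux_in_air[k] <-
      df$surface_downwelling_shortwave_flux_in_air[match_row][1]
  }
}
```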

modules/data.atmosphere/R/half_hour_downscale.R (outdated)
Comment on lines 200 to 201
lats <- round(lat_list/.5)*.5
lons <- round(lon_list/.5)*.5
infotroph (Member Author), Aug 5, 2024:

Not for this PR, but: GEFS is now also available on a 0.25 degree grid. Would it be of interest to add support for that?
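If that were pursued, the hard-coded 0.5 in the lines above could become a resolution argument. A hypothetical sketch, not part of this PR:

```r
# Snap coordinates to the nearest grid point at a given resolution.
# res = 0.5 reproduces the current behaviour; res = 0.25 would support
# the finer GEFS grid mentioned above.
snap_to_grid <- function(coords, res = 0.5) {
  round(coords / res) * res
}

snap_to_grid(c(40.13, -88.22))              # 0.5-degree grid
snap_to_grid(c(40.13, -88.22), res = 0.25)  # 0.25-degree grid
```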
