Memory spikes x10 if shapes are in a network #1224

Open · 2 tasks done
Irieo opened this issue Aug 16, 2024 · 1 comment

Irieo (Contributor) commented Aug 16, 2024

Checklist

  • I am using the current master branch or the latest release. Please indicate.
  • I am running on an up-to-date pypsa-eur environment. Update via conda env update -f envs/environment.yaml.

Describe the Bug

PR #1013 introduced a new feature: shape files are now stored inside network files (in addition to the .geojson files stored in resources/). This is convenient for plotting; however, for large networks, reading a network now causes a massive memory spike compared to previous versions without n.shapes.

For example, take a workflow for the 50-node electricity-only network with an up-to-date pypsa-eur. Let's pick the build_powerplants rule from build_electricity.smk, which has a default 7 GB memory allocation:

rule build_powerplants:
    params:
        powerplants_filter=config_provider("electricity", "powerplants_filter"),
        custom_powerplants=config_provider("electricity", "custom_powerplants"),
        everywhere_powerplants=config_provider("electricity", "everywhere_powerplants"),
        countries=config_provider("countries"),
    input:
        base_network=resources("networks/base.nc"),
        custom_powerplants="data/custom_powerplants.csv",
    output:
        resources("powerplants.csv"),
    log:
        logs("build_powerplants.log"),
    threads: 1
    resources:
        mem_mb=7000,
    conda:
        "../envs/environment.yaml"
    script:
        "../scripts/build_powerplants.py"

The script build_powerplants.py requires ~10.6 GB of memory for the 50-node network, whereas profiling the same script without the line that reads the base network shows that everything else requires only ~2.2 GB. The legacy 7 GB memory setting is therefore no longer sufficient, and the workflow breaks with default settings.
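For reference, the peak memory of the network read alone can be measured along these lines (a minimal sketch, not part of the original report; it assumes memory_profiler is installed and uses the same path as below):

# Sketch: measure peak RSS while reading the base network with memory_profiler.
from memory_profiler import memory_usage

import pypsa

path = "resources/test-50/networks/base.nc"

# memory_usage takes a (callable, args, kwargs) tuple and samples RSS while it runs;
# with max_usage=True it returns the peak in MiB.
peak = memory_usage((pypsa.Network, (path,), {}), max_usage=True, interval=0.1)
print(f"Peak memory while reading {path}: {peak} MiB")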

What's causing the memory spike?

Profiling the following test script shows that most of the ~10 GB memory spike occurs in PyPSA/pypsa/io.py at the xarray call self.ds = xr.open_dataset(path):

import pypsa
n = pypsa.Network("resources/test-50/networks/base.nc")

[screenshot: memory profile of reading base.nc]

Now, if we drop n.shapes, write the network to netCDF, and read it again, the same line requires roughly 80x less memory (~120 MB):

n.mremove("Shape", n.shapes.index)
n.export_to_netcdf("resources/test-50/networks/base_noshapes.nc")
n = pypsa.Network("resources/test-50/networks/base_noshapes.nc")

[screenshot: memory profile of reading base_noshapes.nc]

What can be done?

-- increase memory requirements within PyPSA-Eur and PyPSA-x (not ideal given the size of the spikes)
-- make n.shapes optional in the config (trade-off between convenience and sanity)
-- any workaround for xr.open_dataset(..)? (see the sketch below)
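One possible workaround, sketched below and not part of the original report: skip the shape variables at read time via the drop_variables argument of xr.open_dataset, write a trimmed copy, and load that with PyPSA. The "shapes_" variable-name prefix is an assumption about how PyPSA lays out the Shape component in its netCDF files and should be checked against an actual file; the paths are illustrative.

# Sketch of a possible workaround (assumption: PyPSA prefixes Shape-component
# variables with "shapes_" in its netCDF layout).
import netCDF4
import xarray as xr
import pypsa

path = "resources/test-50/networks/base.nc"

# List variable names without loading any data.
with netCDF4.Dataset(path) as ds:
    shape_vars = [v for v in ds.variables if v.startswith("shapes_")]

# Re-open the file while skipping the shape variables entirely,
# then write a trimmed copy that PyPSA can read with far less memory.
with xr.open_dataset(path, drop_variables=shape_vars) as ds:
    ds.to_netcdf("resources/test-50/networks/base_noshapes.nc")

n = pypsa.Network("resources/test-50/networks/base_noshapes.nc")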

fneum (Member) commented Aug 27, 2024

xref #1238
