Memory spikes x10 if shapes are in a network #1224

Open · 2 tasks done
Irieo opened this issue Aug 16, 2024 · 1 comment

Irieo (Contributor) commented Aug 16, 2024

Checklist

  • I am using the current master branch or the latest release. Please indicate.
  • I am running on an up-to-date pypsa-eur environment. Update via conda env update -f envs/environment.yaml.

Describe the Bug

PR #1013 introduced a new feature: shape files are now stored inside network files (in addition to the .geojson files stored in resources/). This is convenient for plotting; however, for large networks, reading a network now causes a massive memory spike compared to previous versions without n.shapes.

For example, take a workflow for the 50-node electricity-only network with an up-to-date pypsa-eur. Let's pick the build_powerplants rule from build_electricity.smk, which has a default 7 GB memory allocation:

rule build_powerplants:
    params:
        powerplants_filter=config_provider("electricity", "powerplants_filter"),
        custom_powerplants=config_provider("electricity", "custom_powerplants"),
        everywhere_powerplants=config_provider("electricity", "everywhere_powerplants"),
        countries=config_provider("countries"),
    input:
        base_network=resources("networks/base.nc"),
        custom_powerplants="data/custom_powerplants.csv",
    output:
        resources("powerplants.csv"),
    log:
        logs("build_powerplants.log"),
    threads: 1
    resources:
        mem_mb=7000,
    conda:
        "../envs/environment.yaml"
    script:
        "../scripts/build_powerplants.py"

The script build_powerplants.py requires ~10.6 GB of memory for the 50-node network, whereas profiling the same script without the line that reads the base network shows that everything else requires only ~2.2 GB. The legacy 7 GB memory setting is therefore no longer sufficient, and the workflow breaks with default settings.
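For reference, the peak memory of the network read alone can be measured along these lines (a minimal sketch, not part of the original report; it assumes memory_profiler is installed and uses the same path as below):

# Sketch: measure peak RSS while reading the base network with memory_profiler.
from memory_profiler import memory_usage

import pypsa

path = "resources/test-50/networks/base.nc"

# memory_usage takes a (callable, args, kwargs) tuple and samples RSS while it runs;
# with max_usage=True it returns the peak in MiB.
peak = memory_usage((pypsa.Network, (path,), {}), max_usage=True, interval=0.1)
print(f"Peak memory while reading {path}: {peak} MiB")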

What's causing the memory spike?

Profiling the following test script shows that most of the ~10 GB memory spike occurs in PyPSA/pypsa/io.py at the xarray call self.ds = xr.open_dataset(path):

import pypsa
n = pypsa.Network("resources/test-50/networks/base.nc")

[screenshot: memory profile of reading base.nc]

Now, if we drop n.shapes, write the network to netCDF, and read it again, the same line requires roughly 80x less memory (~120 MB):

n.mremove("Shape", n.shapes.index)
n.export_to_netcdf("resources/test-50/networks/base_noshapes.nc")
n = pypsa.Network("resources/test-50/networks/base_noshapes.nc")

[screenshot: memory profile of reading base_noshapes.nc]

What can be done?

-- increase memory requirements within PyPSA-Eur and PyPSA-x (not ideal given the size of the spikes)
-- make n.shapes optional in the config (trade-off between convenience and sanity)
-- any workaround for xr.open_dataset(..)? (see the sketch below)
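One possible workaround, sketched below and not part of the original report: skip the shape variables at read time via the drop_variables argument of xr.open_dataset, write a trimmed copy, and load that with PyPSA. The "shapes_" variable-name prefix is an assumption about how PyPSA lays out the Shape component in its netCDF files and should be checked against an actual file; the paths are illustrative.

# Sketch of a possible workaround (assumption: PyPSA prefixes Shape-component
# variables with "shapes_" in its netCDF layout).
import netCDF4
import xarray as xr
import pypsa

path = "resources/test-50/networks/base.nc"

# List variable names without loading any data.
with netCDF4.Dataset(path) as ds:
    shape_vars = [v for v in ds.variables if v.startswith("shapes_")]

# Re-open the file while skipping the shape variables entirely,
# then write a trimmed copy that PyPSA can read with far less memory.
with xr.open_dataset(path, drop_variables=shape_vars) as ds:
    ds.to_netcdf("resources/test-50/networks/base_noshapes.nc")

n = pypsa.Network("resources/test-50/networks/base_noshapes.nc")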

fneum (Member) commented Aug 27, 2024

xref #1238
