
Serializing and deserializing linopy.Model #340

Open
tburandt opened this issue Aug 20, 2024 · 4 comments

@tburandt

tburandt commented Aug 20, 2024

Hi,

I am currently trying to set up larger models in parallel (in individual processes) and pass them back to the main process, because the individual models are fairly large but can be prepared largely independently of each other. Later on, specific instances are linked through a few additional constraints.

However, while serializing the model with pickle or dill works fine, deserializing it again throws a recursion error, so ProcessPoolExecutor cannot be used to prepare models in parallel (ProcessPoolExecutor uses serialization to hand data over from one process to another). This can easily be checked with the following example:

import dill
import pandas as pd

import linopy

m = linopy.Model()
time = pd.Index(range(10), name="time")

x = m.add_variables(
    lower=0,
    coords=[time],
    name="x",
) # to be done in parallel process
y = m.add_variables(lower=0, coords=[time], name="y") # to be done in parallel process

factor = pd.Series(time, index=time) # to be done in parallel process

con1 = m.add_constraints(3 * x + 7 * y >= 10 * factor, name="con1") # to be done in parallel process
con2 = m.add_constraints(5 * x + 2 * y >= 3 * factor, name="con2") # to be done in parallel process

m.add_objective(x + 2 * y) # to be done in parallel process

with open("test.pkl", 'wb') as f:
    dill.dump(m, f)

with open("test.pkl", 'rb') as f:
    m2 = dill.load(f)

x.lower = 1 # or add whatever additional constraint
m.solve()

This throws the following error:

Traceback (most recent call last):
  File "C:\github\test\linopy\test.py", line 29, in <module>
    m2 = dill.load(f)
         ^^^^^^^^^^^^
  File "C:\github\test\.venv\Lib\site-packages\dill\_dill.py", line 289, in load
    return Unpickler(file, ignore=ignore, **kwds).load()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\github\test\.venv\Lib\site-packages\dill\_dill.py", line 444, in load
    obj = StockUnpickler.load(self)
          ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\github\test\.venv\Lib\site-packages\linopy\variables.py", line 1149, in __getattr__
    if name in self.data:
               ^^^^^^^^^
  File "C:\github\test\.venv\Lib\site-packages\linopy\variables.py", line 1149, in __getattr__
    if name in self.data:
               ^^^^^^^^^
  File "C:\github\test\.venv\Lib\site-packages\linopy\variables.py", line 1149, in __getattr__
    if name in self.data:
               ^^^^^^^^^
  [Previous line repeated 745 more times]
RecursionError: maximum recursion depth exceeded

@FabianHofmann
Collaborator

@tburandt thanks for raising the issue, that's quite unfortunate. Pickling is not tested at the moment. How about storing the model as netCDF in the meantime? That should be about as fast as pickling.
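
A minimal sketch of that netCDF round trip (assuming linopy's Model.to_netcdf and linopy.read_netcdf helpers; the file name is arbitrary):

import pandas as pd

import linopy

# rebuild a small model, as in the reproduction above
m = linopy.Model()
time = pd.Index(range(10), name="time")
x = m.add_variables(lower=0, coords=[time], name="x")
y = m.add_variables(lower=0, coords=[time], name="y")
m.add_objective(x + 2 * y)

# store the model on disk as netCDF instead of pickling it
m.to_netcdf("model.nc")

# read it back, e.g. in another process or a later session
m2 = linopy.read_netcdf("model.nc")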

@lkstrp
Member

lkstrp commented Aug 21, 2024

This is most likely also the reason for deepcopy issues within PyPSA on some networks. I had a look into this a while ago, but this is a better starting point, so I will check again.

@FabianHofmann
Collaborator

I have the vague feeling that the __getitem__ and __getattr__ overrides could be related to this...
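
For illustration, a minimal sketch of the generic pitfall (a toy class, not linopy's actual code): a __getattr__ that itself reads an instance attribute recurses during unpickling, because the instance __dict__ is still empty when pickle probes the freshly created object for __setstate__, so every lookup falls back into __getattr__ again.

import pickle

class Container:
    # toy stand-in for a class that routes attribute access through a data dict
    def __init__(self):
        self.data = {"x": 1}

    def __getattr__(self, name):
        # during unpickling, "data" is not in __dict__ yet, so self.data
        # re-enters __getattr__ -> RecursionError
        if name in self.data:
            return self.data[name]
        raise AttributeError(name)

pickle.loads(pickle.dumps(Container()))  # raises RecursionError

A common fix is to avoid the fallback lookup inside __getattr__, e.g. data = self.__dict__.get("data", {}), so that unpickling sees a plain AttributeError instead of recursing.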

@tburandt
Author

@FabianHofmann the problem is that multiprocessing and ProcessPoolExecutor (from concurrent.futures), for example, use pickle (or dill, I am not sure) to hand objects over from one process to another or back to the main process.

For storing the model manually, I can try netCDF. I might have an idea to solve my problem with that, at least :)
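
A possible workaround along those lines: build each sub-model in a worker process, write it to netCDF there, and only pass the file path (a plain string) back through the executor. This is a sketch, assuming Model.to_netcdf and linopy.read_netcdf; the worker function and file names are made up for illustration:

from concurrent.futures import ProcessPoolExecutor

import pandas as pd

import linopy

def build_model(path):
    # build one sub-model in a worker process and persist it as netCDF
    m = linopy.Model()
    time = pd.Index(range(10), name="time")
    x = m.add_variables(lower=0, coords=[time], name="x")
    y = m.add_variables(lower=0, coords=[time], name="y")
    m.add_constraints(3 * x + 7 * y >= 10, name="con1")
    m.add_objective(x + 2 * y)
    m.to_netcdf(path)
    return path  # only the path is pickled, not the model

if __name__ == "__main__":
    paths = [f"model_{i}.nc" for i in range(4)]
    with ProcessPoolExecutor() as pool:
        for path in pool.map(build_model, paths):
            m = linopy.read_netcdf(path)  # reload in the main process

The linking constraints between the reloaded models could then be added in the main process.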
