Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revive ValueEnum.as_dict method #974

Merged
merged 2 commits into from
Apr 3, 2024

Conversation

esoteric-ephemera
Copy link
Collaborator

A bug was reported in atomate2 where during a call to monty.json.jsanitize (specifically with strict=True and enum_values = False), an error was thrown during deserialization of VaspObject because it lacks an as_dict method. I reproduced this with versions of emmet-core > 0.77.1

Since VaspObject inherits from emmet.core.utils.ValueEnum, and its as_dict method was removed in PR #944, this is the source of the error

TL;DR: This PR adds back the kludgey as_dict -> str method of ValueEnum. Explainer below, I think these points merit discussion:

Discussion / resolution (???):

  • @Andrew-S-Rosen has suggested not reviving the as_dict method because an as_dict should not return a str and because the correct deserialization of an Enum can occur with changing enum_values = True in jsanitize
    • I agree with these points, but this requires a PR in jobflow with many unknown downstream consequences
  • Removing the as_dict from ValueEnum can have many downstream consequences because this method, while a kludge, has existed for a long time. @utf has raised the issue of downstream consequences from removing as_dict and I also agree with these
    • Adding clear documentation to explain what and why this function does on ValueEnum should make it clear to users.

@codecov-commenter
Copy link

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 90.02%. Comparing base (5cc66a2) to head (17d7a86).
Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #974   +/-   ##
=======================================
  Coverage   90.01%   90.02%           
=======================================
  Files         138      138           
  Lines       13226    13228    +2     
=======================================
+ Hits        11906    11908    +2     
  Misses       1320     1320           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@Andrew-S-Rosen
Copy link
Member

Andrew-S-Rosen commented Mar 28, 2024

This is all fair.

I think it is worth us strongly considering adding a deprecation warning on this class, however, and committing to getting rid of it. We can be generous with the deprecation warning (e.g. 6 months). I'll still stand by my original point that the downstream codes relying on this hack should handle things in a better way. Namely, that means switching to Enum as the base class where appropriate and having jobflow call jsanitize with enums=True if the goal is to serialize enums to str (or, if this is not desirable, have atomate2 be able to pass a customize set of jsanitize kwargs to Jobflow somehow).

Reviving the .as_dict() method makes sense for now to prevent breaking downstream packages, but it also causes significant problems when objects containing a ValueEnum are used with various monty utilities. It is worth us doing things the right way in the long term. As an ecosystem, we should strive to avoid hacking .as_dict() methods with non-dictionary outputs, and I think that's a point we can all get behind.

Also tagging @jmmshn and @munrojm on this discussion.

@utf
Copy link
Member

utf commented Apr 3, 2024

Hi @Andrew-S-Rosen, thanks for your thoughts. I still have concerns about deprecating this class entirely. There is a desire to support both enum value and enum dict serialisation at the same time in atomate2/jobflow. I believe that means having enum_values=False and some other Enum-like object for achieving this.

A couple of thoughts going forward:

  • Is your main objection at the moment the confusing naming of the as_dict method, rather than the fact that these objects aren't serialisable by themselves? A potential solution could be to create a new object that behaves as ValueEnum but which doesn't have this method and therefore which won't confuse users.
  • Do we actually need specific enums for crystal system etc? Could we just use literal["triclinic", "monoclinic", ...] in the SymmetryData schema. E.g., here

Either way, I think merging this for now would be helpful for atomate2 and we can explore alternatives in the meantime.

@munrojm munrojm added the release:patch Patch updates label Apr 3, 2024
@munrojm munrojm merged commit 98debc2 into materialsproject:main Apr 3, 2024
10 checks passed
@utf
Copy link
Member

utf commented Apr 3, 2024

Thanks @munrojm !

@Andrew-S-Rosen
Copy link
Member

Andrew-S-Rosen commented Apr 3, 2024

@utf: Thanks for the follow-up!

Is your main objection at the moment the confusing naming of the as_dict method, rather than the fact that these objects aren't serialisable by themselves? A potential solution could be to create a new object that behaves as ValueEnum but which doesn't have this method and therefore which won't confuse users.

The misleading name for the method is certainly one reason, although not the major one. The major issue is that, by adding in this non-dictionary .as_dict() method, objects (e.g. dictionaries, Pydantic models) containing a ValueEnum will not play nicely with any monty routine that internally expects .as_dict() to return a dictionary. Of course, I understand the .as_dict() method was chosen because of the MSONable spec, but if there is a way to have a new object serialize like the existing ValueEnum without relying on a bolted on .as_dict()/.from_dict() method, that would certainly be great. This was the issue demonstrated via #914, but that is merely one of several instances. The issue is broader than CrystalSystem specifically.

Do we actually need specific enums for crystal system etc? Could we just use literal["triclinic", "monoclinic", ...] in the SymmetryData schema. E.g., here

Indeed, this is a separate but also important point. There are several times where there is no need to be using ValueEnum. This is likely one of those instances.

@esoteric-ephemera esoteric-ephemera deleted the enums branch May 21, 2024 20:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release:patch Patch updates
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants