Add a serialization option for EXIF dictionaries #129

Open: wants to merge 8 commits into base: develop

EtiennePelletier
Contributor

@EtiennePelletier EtiennePelletier commented Nov 5, 2020

This feature has been requested for a while (#59, #110). With Python 2 support being dropped in the next release, adding a serialization option becomes simpler.

Converting IfdTags to native Python types cannot currently be done purely after parsing the EXIF tags, because there is no way to know whether IfdTag.values was converted to a meaningful printable value (an enum, a make_string, etc.) during processing; that depends on the tag definitions. This pull request adds a prefer_printable attribute to the IfdTag class so that an external function can do the conversions properly.
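
To illustrate why the flag is needed, here is a minimal sketch. The Tag class and the to_builtin function are hypothetical stand-ins written for this example, not the actual exifread implementation; only the prefer_printable attribute name comes from this PR.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class Tag:
    """Hypothetical stand-in for IfdTag (not the real exifread class)."""
    printable: str          # human-readable form built during parsing
    values: List[Any]       # raw decoded values
    prefer_printable: bool  # set by the parser when `printable` carries the meaning


def to_builtin(tag: Tag) -> Any:
    """Pick the representation that the parser flagged as meaningful."""
    if tag.prefer_printable:
        # e.g. an enum lookup turned the raw value 1 into a descriptive string
        return tag.printable
    # Unwrap single-element lists so scalars stay scalars.
    return tag.values[0] if len(tag.values) == 1 else list(tag.values)


orientation = Tag("Horizontal (normal)", [1], prefer_printable=True)
iso = Tag("200", [200], prefer_printable=False)
print(to_builtin(orientation))  # Horizontal (normal)
print(to_builtin(iso))          # 200
```

Without the flag, a converter looking only at printable and values afterwards could not tell that 'Horizontal (normal)' is the meaningful form for the first tag while the integer is the meaningful form for the second.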

To use the suggested conversion function, pass the builtin_types=True keyword argument to process_file:

exifread.process_file(..., builtin_types=True, ...)

A few fields contain, or may contain, binary data, which prevents dumping the result directly to JSON (if JSON is the chosen serialization format; databases such as MongoDB can store binary natively). The main ones are JPEGThumbnail, TIFFThumbnail, EXIF MakerNote and MakerNote Tag 0x####, but there are others.
If JSON is strictly required, binary values can be handled afterwards in whatever way suits the use case, or mostly excluded from the output dictionary with exifread.process_file(..., details=False, ...).
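
As one possible way to handle binary values afterwards, the sketch below base64-encodes or drops top-level bytes values before calling json.dumps. The json_safe helper is hypothetical and not part of this PR, and the input dictionary is a made-up example of what a builtin-types result might look like.

```python
import base64
import json
from typing import Any, Dict


def json_safe(tags: Dict[str, Any], drop_binary: bool = False) -> Dict[str, Any]:
    """Return a copy of `tags` whose top-level bytes values are JSON-friendly."""
    out: Dict[str, Any] = {}
    for name, value in tags.items():
        if isinstance(value, (bytes, bytearray)):
            if drop_binary:
                continue  # leave the binary field out entirely
            # Encode binary as base64 text so json.dumps accepts it.
            value = base64.b64encode(bytes(value)).decode("ascii")
        out[name] = value
    return out


# Made-up input: a plain dict with one text tag and one binary tag.
tags = {"Image Make": "Canon", "JPEGThumbnail": b"\xff\xd8\xff"}
print(json.dumps(json_safe(tags)))
# {"Image Make": "Canon", "JPEGThumbnail": "/9j/"}
```

This only inspects top-level values; bytes nested inside lists would need a recursive variant.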

I updated the utils.get_gps_coords function to support Exif tags whose values are either IfdTag instances or native types. This also addresses #193.
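
The idea of accepting both kinds of input can be sketched roughly as follows. The unwrap and dms_to_decimal helpers are hypothetical names chosen for this example and do not reproduce the actual code in utils.get_gps_coords.

```python
from fractions import Fraction
from typing import Any, Sequence


def unwrap(tag_or_value: Any) -> Any:
    """Return `.values` for IfdTag-like objects, or the value itself if already native.

    Assumes native values are lists or strings, which have no `values` attribute.
    """
    return getattr(tag_or_value, "values", tag_or_value)


def dms_to_decimal(dms: Sequence[Any], ref: str) -> float:
    """Convert [degrees, minutes, seconds] plus a hemisphere letter to decimal degrees."""
    degrees, minutes, seconds = (float(x) for x in dms)
    decimal = degrees + minutes / 60 + seconds / 3600
    # South and West are negative by convention.
    return -decimal if ref in ("S", "W") else decimal


# Native input: plain numbers or Fractions and a plain string.
print(dms_to_decimal(unwrap([Fraction(10), Fraction(30), Fraction(0)]), unwrap("W")))  # -10.5
```

Passing every input through unwrap first means the coordinate arithmetic itself does not need to care whether the caller used builtin_types=True.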

All changes in this pull request are fully backwards compatible with the current develop branch. I am open to any feedback!

🐍 🤹‍♂️

@ianare
Owner

ianare commented Nov 6, 2020

This is great, thank you. I would like to take some time for testing and analysis but globally it looks good to me.

@EtiennePelletier
Contributor Author

Hi @ianare , have you been able to have a look? A new release with this would be awesome! Thanks!

@EtiennePelletier
Contributor Author

PR has been updated to fix conflicts from develop branch changes and account for import structure changes.

@ianare ianare force-pushed the develop branch 6 times, most recently from 05fe609 to 04efa3f on May 3, 2023 00:51

@EtiennePelletier
Contributor Author

Hi @ianare,

I’ve updated this branch to include the latest changes and resolve conflicts. I also took the opportunity to:

  • Update the documentation.
  • Simplify the function and argument names.
  • Refactor the serialization process to be more modular, maintainable and easier to test.

In addition, I've made changes to ensure the tests pass (see PR #202). I've tested the new functionality extensively on the sample images as well as additional ones. Since serialization is optional, I haven’t added automated tests for this feature yet, but I’m open to doing so once related PRs are merged.

Other improvements since opening this PR in 2020 include:

  • Returning None instead of an empty string for blank tag values.
  • Improved cleanup of values during serialization.

To help with future debugging and testing, here are some commands I used to refine the conversion functions. You’ll need to uncomment the debugging code to use them.

# Run EXIF.py with builtin type option and output to 'serialize_test'
$ find ../exif-samples -regextype posix-egrep -iregex ".*\.(bmp|gif|heic|heif|jpg|jpeg|png|tiff|webp)" -print0 | LC_COLLATE=C sort -fz | xargs -0 EXIF.py -b > serialize_test

# Shows how many exif values from each field_type were converted, grouped by the output built-in type and conversion function used
$ grep convert_ serialize_test | sort | uniq -c

# Display specific input-to-output conversions, sorted by frequency (modify first grep argument for use case)
$ grep "convert_ascii: 2 to str" -A 1 serialize_test | grep -vP "(convert_|^--)" | sort | uniq -c | sort -h

Sample output listing type conversion paths

      2 convert_ascii: 2 to bytes
    112 convert_ascii: 2 to NoneType
    968 convert_ascii: 2 to str
      5 convert_bytes: 13 to NoneType
      5 convert_bytes: 1 to bytes
     25 convert_bytes: 1 to int
     25 convert_bytes: 1 to NoneType
     15 convert_bytes: 1 to str
      6 convert_numeric: 11 to NoneType
      5 convert_numeric: 12 to NoneType
    328 convert_numeric: 3 to int
    125 convert_numeric: 3 to list
      7 convert_numeric: 3 to NoneType
    500 convert_numeric: 4 to int
     24 convert_numeric: 4 to list
     15 convert_numeric: 4 to NoneType
      7 convert_numeric: 6 to NoneType
      1 convert_numeric: 8 to int
      3 convert_numeric: 8 to list
      2 convert_numeric: 8 to NoneType
     44 convert_numeric: 9 to int
      3 convert_numeric: 9 to NoneType
    361 convert_proprietary: 0 to str
      3 convert_proprietary: 1 to NoneType
      2 convert_proprietary: 1 to str
      1 convert_proprietary: 2 to NoneType
      2 convert_proprietary: 2 to str
   1337 convert_proprietary: 3 to str
    108 convert_proprietary: 4 to str
      1 convert_proprietary: 5 to NoneType
     34 convert_proprietary: 7 to NoneType
    370 convert_proprietary: 7 to str
      2 convert_proprietary: 8 to str
      3 convert_proprietary: 9 to str
     70 convert_ratio: 10 to float
     86 convert_ratio: 10 to int
     11 convert_ratio: 10 to list
      7 convert_ratio: 10 to NoneType
    387 convert_ratio: 5 to float
    557 convert_ratio: 5 to int
     89 convert_ratio: 5 to list
      8 convert_ratio: 5 to NoneType
    104 convert_undefined: 7 to bytes
      2 convert_undefined: 7 to int
     33 convert_undefined: 7 to NoneType
     96 convert_undefined: 7 to str
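
For a coarser view, the tally above can be collapsed by output type. This is a small sketch, assuming the `uniq -c` line format shown in the sample; the totals_by_output_type helper is hypothetical, and the input below reuses only four lines of the listing.

```python
import re
from collections import Counter

# Matches lines such as "    968 convert_ascii: 2 to str".
LINE = re.compile(r"^\s*(\d+)\s+(convert_\w+): (\d+) to (\w+)\s*$")


def totals_by_output_type(text: str) -> Counter:
    """Sum the `uniq -c` counts per output built-in type."""
    totals: Counter = Counter()
    for line in text.splitlines():
        match = LINE.match(line)
        if match:
            count, _func, _field_type, out_type = match.groups()
            totals[out_type] += int(count)
    return totals


sample = """\
      2 convert_ascii: 2 to bytes
    112 convert_ascii: 2 to NoneType
    968 convert_ascii: 2 to str
      5 convert_bytes: 1 to bytes
"""
print(dict(totals_by_output_type(sample)))
# {'bytes': 7, 'NoneType': 112, 'str': 968}
```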

Sample output (truncated to last few lines only) showing Exif values that had field type 2 (ascii) and were converted to a string

      9 MakerNote ImageAdjustment --> 'NORMAL'
      9 MakerNote ImageSharpening --> 'NORMAL'
      9 MakerNote ImageStabilization --> 'VR-ON'
      9 MakerNote ISOSelection --> 'AUTO'
      9 MakerNote NikonCaptureVersion --> 'COOLPIX P6000V1.0'
      9 MakerNote NoiseReduction --> 'OFF'
      9 MakerNote Saturation --> 'NORMAL'
      9 MakerNote Tag 0x00B2 --> 'NORMAL'
      9 MakerNote ToneCompensation --> 'NORMAL'
      9 MakerNote Whitebalance --> 'AUTO'
     11 Image Make --> 'Canon'
     11 Image Make --> 'NIKON'
     11 MakerNote Quality --> 'FINE'
     14 GPS GPSLongitudeRef --> 'E'
     16 Image Software --> 'GIMP 2.4.5'
     17 GPS GPSLatitudeRef --> 'N'
     44 Interoperability InteroperabilityIndex --> 'R98'

Thank you for considering this contribution.
