Incremental reads for MachOFile and FatFile #23

woodruffw · 2016-02-25T21:52:02Z

Right now, MachOFile.new and FatFile.new read entire binaries into memory. This is efficient when
manipulating their contents, but unnecessarily expensive when only testing a file's sanity
(good magic, reasonable size, etc.). As a result, testing large numbers of Mach-O files with exception
handling is slower than it needs to be when using MachOFile or FatFile directly.
#22 circumvents this problem when using the generic MachO.open method, but the Mach-O type classes should also do this individually.
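
For illustration, a cheap magic check could look roughly like the sketch below. This is not how #22 or ruby-macho implements it; `sniff_magic` and the two constant names are hypothetical, and only the magic values themselves come from the Mach-O format.

```ruby
# Hypothetical helper, not part of ruby-macho: classify a file by reading
# only its first four bytes instead of the whole binary.

# Magic values as seen when the first four bytes are read big-endian.
THIN_MAGICS = [0xfeedface, 0xcefaedfe, 0xfeedfacf, 0xcffaedfe].freeze
FAT_MAGICS = [0xcafebabe, 0xbebafeca].freeze

def sniff_magic(path)
  raw = File.binread(path, 4)
  # Empty or truncated files cannot carry a valid magic.
  return :unknown if raw.nil? || raw.bytesize < 4

  magic = raw.unpack1("N")
  # Note: 0xcafebabe is shared with Java class files, so :fat is only a hint.
  return :fat if FAT_MAGICS.include?(magic)
  return :thin if THIN_MAGICS.include?(magic)

  :unknown
end
```

A caller could skip files that come back as `:unknown` without ever constructing a MachOFile or FatFile.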

I'm assigning this to myself, but it's not particularly high on the priority list (Homebrew uses MachO.open only).

UniqMartin · 2016-06-17T12:45:20Z

I wonder if you have pondered this a little more. I think an easy thing to do would be to read just the first few pages (4096-byte units) of a Mach-O file. You could first just read the Mach header and then inspect its sizeofcmds field, then round that to the next page boundary and just read that part.
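
A minimal sketch of that idea, assuming a thin (non-fat) Mach-O file. `read_header_region`, `HEADER_LAYOUTS`, and `PAGE_SIZE` are hypothetical names, not ruby-macho API; the header sizes (28 and 32 bytes) and the position of sizeofcmds (byte offset 20) follow the mach_header and mach_header_64 layouts.

```ruby
# Hypothetical sketch, not ruby-macho code: read only the Mach header plus
# load commands, rounded up to the next page boundary.
PAGE_SIZE = 4096

# Magic (first four bytes read big-endian) => [header size, uint32 directive].
HEADER_LAYOUTS = {
  0xfeedface => [28, "N"], # 32-bit, big-endian fields
  0xcefaedfe => [28, "V"], # 32-bit, little-endian fields
  0xfeedfacf => [32, "N"], # 64-bit, big-endian fields
  0xcffaedfe => [32, "V"], # 64-bit, little-endian fields
}.freeze

SIZEOFCMDS_OFFSET = 20 # sixth uint32 field in both header variants

def read_header_region(path)
  File.open(path, "rb") do |io|
    raw_magic = io.read(4)
    raise ArgumentError, "file too short" if raw_magic.nil? || raw_magic.bytesize < 4

    header_size, uint32 = HEADER_LAYOUTS.fetch(raw_magic.unpack1("N")) do
      raise ArgumentError, "not a thin Mach-O file"
    end

    io.seek(SIZEOFCMDS_OFFSET)
    raw_size = io.read(4)
    raise ArgumentError, "truncated header" if raw_size.nil? || raw_size.bytesize < 4

    # Round header + load commands up to a whole number of pages.
    wanted = header_size + raw_size.unpack1(uint32)
    rounded = ((wanted + PAGE_SIZE - 1) / PAGE_SIZE) * PAGE_SIZE

    io.rewind
    io.read(rounded) # returns fewer bytes if the file is shorter
  end
end
```

This costs three small reads instead of one large one, so whether it is actually faster would need to be measured.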

You could then manipulate that chunk and overwrite only that portion when syncing the changes to the same file or fetch the rest if writing to a new file. But it feels like it could make sense to optimize the read-only case a bit, as I think this is the more relevant one.

Or am I overlooking something where such an approach would massively complicate the code?

woodruffw · 2016-06-17T22:39:33Z

I have, yeah.

Incremental reads would certainly be beneficial in the read-only case (and probably wouldn't add too much complexity), but I'm not so sure about writing. I'm going to experiment a bit locally and see if I can come up with a solution that improves our read performance statistics without complicating I/O significantly - I like that ruby-macho currently only performs one read (two, counting MachO.open) per binary.

UniqMartin · 2016-06-17T23:04:09Z

I also like the simplicity of a single read. But it seems wasteful to read the whole binary (sometimes many megabytes) just to extract the first few kilobytes, though that's probably less noticeable nowadays thanks to very fast SSDs. That's just a gut feeling and not supported by any numbers, so it would be interesting to see whether you can come up with some statistics. I was just curious, though; this is really low priority.

woodruffw · 2016-06-17T23:08:31Z

> But it seems wasteful to read the whole binary (sometimes many megabytes) just to extract the first few kilobytes

Absolutely agree, and relying on hardware (SSDs) probably isn't sustainable. A good solution might be deferring the majority of the read until the first write call/other call that requires inspection beyond sizeofcmds. I'll see what I can come up with 👍
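
A rough sketch of what deferring the read could look like. `LazyBinary` and its methods are hypothetical and not ruby-macho API, and the fixed 4096-byte prefix is an assumption standing in for "everything up to sizeofcmds".

```ruby
# Hypothetical sketch, not ruby-macho code: keep a small prefix in memory and
# defer reading the rest of the binary until something actually needs it.
class LazyBinary
  attr_reader :path

  def initialize(path, prefix_size = 4096)
    @path = path
    # File.binread returns nil for an empty file when a length is given.
    @prefix = File.binread(path, prefix_size) || ""
    @full = nil
  end

  # Serve header-sized requests from the prefix; fall back to a full read
  # only when the request reaches past it.
  def header_bytes(length)
    return @prefix.byteslice(0, length) if length <= @prefix.bytesize

    full_data.byteslice(0, length)
  end

  # The first call performs the deferred read; later calls reuse the result.
  def full_data
    @full ||= File.binread(@path)
  end

  def fully_loaded?
    !@full.nil?
  end
end
```

In this shape a write path would call `full_data` before modifying anything, so read-only inspection never pays for the whole file.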

woodruffw added enhancement help wanted optimization long-term labels Feb 25, 2016

woodruffw self-assigned this Feb 25, 2016

woodruffw added this to the Release 2.0.0 milestone Jul 4, 2017

woodruffw mentioned this issue Jan 14, 2022

MachOView needs to support lazy loading for content #433

Open

woodruffw removed this from the Release 2.0.0 milestone Sep 14, 2023
