Chunked reading over HTTP from node? #54

Open
mbklein opened this issue Jan 26, 2021 · 3 comments

@mbklein

mbklein commented Jan 26, 2021

Looking through the documentation and the source, it seems that the UrlFetcher only works when exifr is running in a browser. Is there a way to make this work from node using node-fetch or axios? I realize you might not want to make either one a hard dependency, but it would be an enormous help not to have to transfer an entire 2.5 GB TIFF just to read its Exif data.

If this is not currently possible, I'd be happy to write and contribute a reader for it, but I'm not quite sure how to hook it into the main reader code. Is it just a matter of adding another conditional here to see if /^https?:/.test(arg) and delegating to the new reader?

@MikeKovarik
Owner

Hello, first of all, I'm sorry for the late response. There's never enough free time to dedicate to open source projects :D. Secondly, thank you for your interest in the library.

Now about the issue. Wow, 2.5GB, that's a hella big way to show off the chunked reading, right? Anyway, you're right. I tried very hard to keep exifr free of dependencies. XMP was especially hard, since other libraries depend on bloated XML parsing libraries (that's not strictly bad, they might be well tested, but they're enormous for use in browsers). And then there's the case of zlib, which is required for ICC in PNG. In nodejs, exifr tries to use the builtin zlib library, whereas in the browser it just doesn't parse ICC at all. But that's off topic. Though I've been thinking of an API with which you could pass your own zlib (in the browser) or XML parser (if you don't like exifr's basic implementation) for exifr to use. This could be the way to go without hardcoding node-fetch or axios.

I'd be very happy to accept a PR and I'll be happy to help you along.

Where to hook in: Yes, good find. Something like this, I think.

function readString(arg, options) {
	if (isBase64Url(arg))
		return callReaderClass(arg, options, 'base64')
	else if (platform.browser)
		return callReader(arg, options, 'url', fetchUrlAsArrayBuffer)
	// outside the browser: only treat the string as a URL if the user supplied a fetch implementation
	else if (/^https?:/.test(arg) && options.customFetch)
		return callReader(arg, options, 'url')
	// otherwise a string in node is assumed to be a file path
	else if (platform.node)
		return callReaderClass(arg, options, 'fs')
	else
		throwError(INVALID_INPUT)
}
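
The check for whether the argument is a URL could be pulled into a small helper. This is only a minimal sketch: `isHttpUrl` is a hypothetical name, not something exifr exports, and it assumes that only absolute http and https URLs should be routed to the URL reader.

```javascript
// Hypothetical helper (not part of exifr): true only for absolute http(s) URLs.
function isHttpUrl(arg) {
	if (typeof arg !== 'string') return false
	// Same regex idea as suggested earlier in the thread, made case-insensitive
	// because URL schemes are case-insensitive.
	return /^https?:\/\//i.test(arg)
}
```

File paths and base64 data URLs do not match, so they would still fall through to the other branches.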

You can ignore fetchUrlAsArrayBuffer in the reader.mjs since that is just a fallback for fetching the whole file if UrlFetcher

And then the UrlFetcher could look something like this:

export class UrlFetcher extends ChunkedReader {

	async readWhole() {
		this.chunked = false
		let arrayBuffer
		if (this.options.customFetch)
			arrayBuffer = await this.options.customFetch(this.input)
		else
			arrayBuffer = await fetchUrlAsArrayBuffer(this.input)
		this._swapArrayBuffer(arrayBuffer)
	}

	async _readChunk(offset, length) {
		// note: the end of an HTTP range is inclusive, unlike the end offsets in node APIs
		let end = length ? offset + length - 1 : undefined
		let headers = {}
		if (offset || end) headers.range = `bytes=${[offset, end].join('-')}`
		let res
		if (this.options.customFetch)
			res = await this.options.customFetch(this.input, {headers})
		else
			res = await fetch(this.input, {headers})
		// 416 (Range Not Satisfiable) means the offset is past the end of the file,
		// so bail out before reading the body
		if (res.status === 416) return undefined
		let abChunk = await res.arrayBuffer()
		let bytesRead = abChunk.byteLength
		// a chunk of a different length than requested reveals where the file ends
		if (bytesRead !== length) this.size = offset + bytesRead
		return this.set(abChunk, offset, true)
	}

}
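
The off-by-one in the range arithmetic of `_readChunk` is easy to get wrong, so here it is isolated in a standalone sketch. `buildRangeHeader` is a hypothetical helper used only for illustration; it is not part of exifr. Unlike `_readChunk`, it always returns a header value, even for offset 0 with no length.

```javascript
// Hypothetical helper (not part of exifr): builds the value of an HTTP Range header.
function buildRangeHeader(offset, length) {
	// HTTP byte ranges are inclusive on both ends, so a chunk of `length`
	// bytes starting at `offset` ends at offset + length - 1.
	// Without a length the range is open-ended ("from offset to the end").
	const end = length ? offset + length - 1 : ''
	return `bytes=${offset}-${end}`
}

console.log(buildRangeHeader(0, 65536)) // bytes=0-65535
console.log(buildRangeHeader(1000))     // bytes=1000-
```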

Perhaps let abChunk = await res.arrayBuffer() should be part of the else branch. And obviously there will be a different way to get the status code and bytesRead.
Also, I'm not that happy with customFetch :D I hope we can come up with a better name.
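
To illustrate what "a different way to get the status code and bytesRead" might mean for axios, here is a rough sketch of an adapter that gives an axios-style response the two members the `_readChunk` pseudo code relies on (`status` and `arrayBuffer()`). `toFetchLike` is a hypothetical name, and the sketch assumes the request was made with `responseType: 'arraybuffer'`, in which case axios in node hands back the body as a Buffer in `data`.

```javascript
// Hypothetical adapter (not part of exifr or axios): wraps an axios-style
// response ({status, data}) in a minimal fetch-like object.
function toFetchLike(axiosResponse) {
	const {status, data} = axiosResponse
	return {
		status,
		async arrayBuffer() {
			if (data instanceof ArrayBuffer) return data
			// A Buffer or typed array may be a view into a larger pooled
			// ArrayBuffer, so copy out only the bytes that belong to it.
			return data.buffer.slice(data.byteOffset, data.byteOffset + data.byteLength)
		}
	}
}
```

Under those assumptions, a customFetch could then be something like `(url, {headers} = {}) => axios.get(url, {headers, responseType: 'arraybuffer', validateStatus: () => true}).then(toFetchLike)`, where `validateStatus` keeps axios from throwing on a 416 so the status check in `_readChunk` still sees it.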

Anyway, this is just brainstorming and pseudo code for where to start, and there's a lot more work to do. Looking forward to collaborating with you.

@mbklein
Author

mbklein commented Mar 8, 2021

Thanks for the thorough response! I hope my teammates and I will have a chance to dive into this soon.

@mbklein
Author

mbklein commented Mar 9, 2021

Hi @MikeKovarik,

I started working with what you described above, but then I realized the potential for expanding the “bring your own reader” concept and ended up with this PR.
