
Why are torrents hashed by URL? #14

Open
deckar01 opened this issue Oct 25, 2016 · 11 comments


@deckar01

A patched version of WebTorrent is currently being used so that a torrent can be discovered using the hash of its URL instead of the hash of the resource's contents. This allows peers to seed arbitrary content, including stale copies and malicious payloads. Although the content is verified before being used, there seems to be no need to track these torrents by URL hash and download invalid content.

Theoretically, the only other piece of information the client needs to construct the standard info hash is the content length.
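For reference, a standard BitTorrent v1 info hash is the SHA-1 of the bencoded `info` dictionary, which for a single file holds `length`, `name`, `piece length`, and `pieces` (the concatenated SHA-1 digests of each fixed-size piece). The following is a minimal Node.js sketch of that construction, assuming a single-file torrent and a 16 KiB piece length; `bencode` and `infoHash` are illustrative names, and WebTorrent's own torrent creation may pick a different piece length or add fields, so this is not guaranteed to reproduce its exact value.

```javascript
const crypto = require('crypto');

// Minimal bencode encoder covering what an info dictionary needs:
// integers, byte strings (string or Buffer), and dictionaries.
function bencode(value) {
  if (typeof value === 'number') {
    return Buffer.from(`i${value}e`);
  }
  if (typeof value === 'string') {
    value = Buffer.from(value, 'utf8');
  }
  if (Buffer.isBuffer(value)) {
    return Buffer.concat([Buffer.from(`${value.length}:`), value]);
  }
  // Dictionary keys must appear sorted by their raw bytes.
  const keys = Object.keys(value).sort((a, b) =>
    Buffer.compare(Buffer.from(a), Buffer.from(b)));
  const parts = [Buffer.from('d')];
  for (const key of keys) {
    parts.push(bencode(key), bencode(value[key]));
  }
  parts.push(Buffer.from('e'));
  return Buffer.concat(parts);
}

function sha1(data) {
  return crypto.createHash('sha1').update(data).digest();
}

// Info hash of a single-file torrent: SHA-1 over the bencoded info dictionary.
// pieceLength is an assumption; it must match whatever the seeder used.
function infoHash(content, name, pieceLength = 16384) {
  const pieceHashes = [];
  for (let offset = 0; offset < content.length; offset += pieceLength) {
    pieceHashes.push(sha1(content.subarray(offset, offset + pieceLength)));
  }
  const info = {
    length: content.length,
    name: name,
    'piece length': pieceLength,
    pieces: Buffer.concat(pieceHashes),
  };
  return sha1(bencode(info)).toString('hex');
}

console.log(infoHash(Buffer.from('<html><body>Hello</body></html>'), 'index.html'));
```

The piece hashes and the name are part of the hashed dictionary too, so whoever computes the info hash has to agree with the seeder on all of these inputs, not just the bytes.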

@goatandsheep

They aren't really hashed by URL; they are hashed normally, and the hashes are associated with a URL. You can change the hash when you need to update the content.

@deckar01
Author

deckar01 commented Oct 26, 2016

The torrents are being seeded with an info hash based on the URL and not the content. As far as I am aware, the torrent protocol is not designed to be used in this way. This bypasses the security provided by the torrent hashing protocol.

The info hash is not just a way of discovering a resource, it is used by peers to verify the integrity of the torrent's content. I don't think torrent clients are designed to seed torrents if the info hash fails verification.

@goatandsheep

No, no, that's not it at all. You have a security.js file that associates URLs with different info hashes, so you can change the info hash. And it uses the default WebTorrent trackers, so there's no need for the full magnet link.

@deckar01
Author

deckar01 commented Oct 26, 2016

Have you looked at the implementation? It hashes the URL and treats the result as the info hash in the magnet URI. It only uses the content hash provided in security.js to throw away the stale or invalid files that get downloaded.

sha(page_link.href, function(result){
  ...
    var magnet = 'magnet:?xt=urn:btih:'+result+ ...

https://github.com/guerrerocarlos/CacheP2P/blob/master/index.js#L60
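To make the difference concrete, here is a hedged reconstruction of the URL-keyed scheme in Node.js. It assumes the `sha` helper in the snippet above is SHA-1 over the URL string (the snippet does not say which algorithm it uses), and `urlMagnet` is a hypothetical name. Because only the URL is hashed, every version of the page, current, stale, or tampered, maps to the same magnet link.

```javascript
const crypto = require('crypto');

// Hypothetical reconstruction of the URL-keyed scheme: the "info hash" in the
// magnet URI is a digest of the page URL, not of the page content.
// Assumes SHA-1, which also yields the 40 hex characters a btih value expects.
function urlMagnet(url) {
  const digest = crypto.createHash('sha1').update(url, 'utf8').digest('hex');
  return 'magnet:?xt=urn:btih:' + digest;
}

// The page content never enters the computation, so the link computed before
// and after the page changes is identical.
const page = 'https://example.com/article';
const magnetBeforeEdit = urlMagnet(page);
const magnetAfterEdit = urlMagnet(page);
console.log(magnetBeforeEdit === magnetAfterEdit); // true
```

Peers looking up this link therefore cannot distinguish a current copy from any other payload seeded under it; only a later content check can.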

@goatandsheep

@deckar01 No, I haven't. I was simply making an assumption, because this was the first question I asked when this came out. Oh boy, that's no good:
#2

@deckar01
Author

The torrented files do get verified against the content hash that the server provides, so although you download potentially malicious content, the browser never uses it. I am mainly concerned about normal torrent clients rejecting these files because their info hash is invalid.
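The verification described here amounts to hashing the downloaded bytes and comparing the result with the expected value before the page uses them. A minimal sketch, assuming the expected value is a hex SHA-1 digest looked up by URL; the real security.js format and hash algorithm may differ, and `expectedHashes` and `acceptDownload` are hypothetical names.

```javascript
const crypto = require('crypto');

// Hypothetical stand-in for a security.js-style table: URL -> expected content hash.
const expectedHashes = {
  'https://example.com/': crypto.createHash('sha1').update('<p>current</p>').digest('hex'),
};

// Returns the content if its digest matches the expected hash, otherwise null.
// Rejected content was still downloaded; it is just never handed to the page.
function acceptDownload(url, content, algorithm = 'sha1') {
  const expected = expectedHashes[url];
  if (!expected) return null; // URL not listed: nothing to verify against
  const actual = crypto.createHash(algorithm).update(content).digest('hex');
  return actual === expected ? content : null;
}

console.log(acceptDownload('https://example.com/', '<p>current</p>')); // prints <p>current</p>
console.log(acceptDownload('https://example.com/', '<p>stale</p>'));   // prints null
```

This protects the page, but it runs only after the transfer, which is why the bandwidth spent on a stale or malicious copy is still lost.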

@DanielSidhion
Contributor

I believe torrents were hashed by the URL initially because it was way easier for people to set CacheP2P up (just throw the script in the page and you're done). Until recently, the code didn't even check against the security hash to throw away the downloaded invalid content.

Given the recently introduced check against the security hash, I believe setting CacheP2P up would take the same amount of work if torrents went back to being hashed by content. CacheP2P can just use the security hash (which is already a hash of the content) to seed and download torrents. There's no need to hash by the URL anymore.
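Keying torrents by content could look roughly like this: the client reads the hash published for a URL and uses it directly as the info hash in the magnet URI, so only peers holding exactly that content match. This sketch assumes the published value is already a 40-character hex BitTorrent info hash computed from the page content; `publishedHashes` and `contentMagnet` are hypothetical names, and the hash shown is a placeholder.

```javascript
// Hypothetical mapping published by the site: URL -> content-derived info hash.
// The value below is a placeholder, not the info hash of any real page.
const publishedHashes = {
  'https://example.com/': 'aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d',
};

// Builds a magnet URI from the content-derived info hash, or returns null when
// the URL has no valid published hash (the page would then be fetched normally).
function contentMagnet(url) {
  const infoHash = publishedHashes[url];
  if (!infoHash || !/^[0-9a-f]{40}$/i.test(infoHash)) return null;
  return 'magnet:?xt=urn:btih:' + infoHash.toLowerCase();
}

console.log(contentMagnet('https://example.com/'));
```

When the page changes, the site republishes the hash and the magnet link changes with it, so stale copies simply stop matching instead of being downloaded and discarded.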

I'd be willing to work on a PR for this if nobody else is interested. Might take some time, though.

@deckar01
Author

@DanielSidhion I have been working on dropping the patched WebTorrent dependency for the last couple days. I also refactored it heavily with caching other types of assets in mind.

The library would probably need to provide a command line utility that generates the info hashes for the developer to ensure that it will always match the value that the client computes.

@DanielSidhion
Contributor

That's great! I'll let you continue the work then.

As for the command-line utility, I believe it would be better to have a web crawler traverse all cacheable content and build the hashes. That way, it's as independent from other frameworks as possible.

@deckar01
Author

deckar01 commented Oct 27, 2016

I renamed my fork Belafonte. https://github.com/deckar01/belafonte

I was able to get the page content to produce a deterministic info hash by supplying the "name" and "creationDate" options to the seed() function. I used the URL for the name, but the server has to supply the creation date and use the same scheme when generating the info hashes.


I am concerned that document.documentElement.innerHTML may not be deterministic between browsers though.
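A generator on the server side would have to rebuild exactly what the client builds. A minimal Node.js sketch, assuming the naming scheme described above (the page URL as the torrent name), a fixed 16 KiB piece length, and UTF-8 encoded HTML; `pageInfoHash` and `buildHashMap` are hypothetical names. It models only the v1 `info` dictionary and leaves out the creationDate option, which in the usual .torrent layout is a top-level key outside `info`. Because every byte of the content feeds the piece hashes, any cross-browser difference in `document.documentElement.innerHTML` would produce a different info hash.

```javascript
const crypto = require('crypto');

// Minimal bencode encoder for integers, byte strings, and dictionaries.
function bencode(value) {
  if (typeof value === 'number') return Buffer.from(`i${value}e`);
  if (typeof value === 'string') value = Buffer.from(value, 'utf8');
  if (Buffer.isBuffer(value)) {
    return Buffer.concat([Buffer.from(`${value.length}:`), value]);
  }
  // Dictionary keys must appear sorted by their raw bytes.
  const keys = Object.keys(value).sort((a, b) =>
    Buffer.compare(Buffer.from(a), Buffer.from(b)));
  const parts = [Buffer.from('d')];
  for (const key of keys) parts.push(bencode(key), bencode(value[key]));
  parts.push(Buffer.from('e'));
  return Buffer.concat(parts);
}

function sha1(data) {
  return crypto.createHash('sha1').update(data).digest();
}

// v1 info hash for one page, using the page URL as the torrent name.
// The piece length is an assumption and must match what the client uses.
function pageInfoHash(url, html, pieceLength = 16384) {
  const content = Buffer.from(html, 'utf8');
  const pieceHashes = [];
  for (let offset = 0; offset < content.length; offset += pieceLength) {
    pieceHashes.push(sha1(content.subarray(offset, offset + pieceLength)));
  }
  const info = {
    length: content.length,
    name: url,
    'piece length': pieceLength,
    pieces: Buffer.concat(pieceHashes),
  };
  return sha1(bencode(info)).toString('hex');
}

// Hypothetical generator: URL -> info hash for every page the site wants cached.
function buildHashMap(pages) {
  const map = {};
  for (const { url, html } of pages) {
    map[url] = pageInfoHash(url, html);
  }
  return map;
}

const hashes = buildHashMap([
  { url: 'https://example.com/', html: '<head></head><body>Hello</body>' },
]);
console.log(JSON.stringify(hashes, null, 2));
```

Adding a single space to the HTML, or changing the URL, yields an unrelated hash, so the generator and every browser must agree on the exact serialized bytes.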

@deckar01
Author

deckar01 commented Nov 2, 2016

I am working on integration tests, then I will open an MR for fixing the info hashes and the command line utility for generating them.
