Why are torrents hashed by URL? #14

deckar01 · 2016-10-25T04:01:58Z

A patched version of WebTorrent is currently being used so that the torrent can be discovered using the hash of the URL instead of the hash of the contents of the resource. This allows peers to seed arbitrary content including stale copies of content and malicious payloads. Although the content is verified before being used, it doesn't seem necessary to track these torrents by URL hash and download invalid content.

Theoretically the only other piece of information the client needs to construct the standard info hash is the content length.
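For reference, here is a minimal sketch of how a standard single-file info hash is derived under the BitTorrent specification (BEP 3): it is the SHA-1 of the bencoded info dictionary, which holds the content length, a name, the piece length, and the concatenated SHA-1 of each piece. The helper names `bencode` and `infoHash` are hypothetical, and the piece length is an assumption that both sides would have to agree on. This is not the code CacheP2P or WebTorrent uses.

```javascript
const crypto = require('crypto');

// Minimal bencoder covering only the types a single-file info dictionary needs.
function bencode(value) {
  if (Buffer.isBuffer(value)) {
    return Buffer.concat([Buffer.from(value.length + ':'), value]);
  }
  if (typeof value === 'string') {
    return bencode(Buffer.from(value, 'utf8'));
  }
  if (Number.isInteger(value)) {
    return Buffer.from('i' + value + 'e');
  }
  if (value && typeof value === 'object') {
    // Dictionary keys must be emitted in sorted order.
    const parts = Object.keys(value).sort().map(function (key) {
      return Buffer.concat([bencode(key), bencode(value[key])]);
    });
    return Buffer.concat([Buffer.from('d'), ...parts, Buffer.from('e')]);
  }
  throw new TypeError('unsupported bencode type');
}

// Hypothetical helper: info hash of a single-file torrent.
// `content` is a Buffer; `pieceLength` is an assumed value shared by all peers.
function infoHash(content, name, pieceLength) {
  const pieces = [];
  for (let offset = 0; offset < content.length; offset += pieceLength) {
    const piece = content.subarray(offset, offset + pieceLength);
    pieces.push(crypto.createHash('sha1').update(piece).digest());
  }
  const info = {
    length: content.length,
    name: name,
    'piece length': pieceLength,
    pieces: Buffer.concat(pieces)
  };
  return crypto.createHash('sha1').update(bencode(info)).digest('hex');
}
```

Because the name and piece length are part of the hashed dictionary, any two parties that want to arrive at the same info hash have to fix those values in addition to the content itself.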

goatandsheep · 2016-10-26T15:32:39Z

They aren't really hashed by URL; they are hashed regularly and hashes are associated with a URL. You can change that when you need to update the content.

deckar01 · 2016-10-26T16:26:24Z

The torrents are being seeded with an info hash based on the URL and not the content. As far as I am aware, the torrent protocol is not designed to be used in this way. This bypasses the security provided by the torrent hashing protocol.

The info hash is not just a way of discovering a resource, it is used by peers to verify the integrity of the torrent's content. I don't think torrent clients are designed to seed torrents if the info hash fails verification.

goatandsheep · 2016-10-26T16:29:50Z

No, that's not it at all. You have a security.js file that associates URLs with different info hashes. For that reason, you can change the info hash. It also uses the default WebTorrent trackers, so there is no need for the full magnet link.

deckar01 · 2016-10-26T16:34:56Z

Have you looked at the implementation? It is hashing the URL and treating it like the info hash in the magnet URI. It only uses the hash of the content provided in security.js to throw away all the stale/invalid files that get downloaded.

sha(page_link.href, function(result){
  ...
    var magnet = 'magnet:?xt=urn:btih:'+result+ ...

https://github.com/guerrerocarlos/CacheP2P/blob/master/index.js#L60
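To make the consequence concrete, here is a small sketch of a magnet link keyed on the URL alone. It assumes the `sha` helper in the snippet above is SHA-1, which the snippet does not confirm, and `magnetFromUrl` is a hypothetical name, not a function in CacheP2P.

```javascript
const crypto = require('crypto');

// Hypothetical helper: magnet link whose "info hash" is just the SHA-1 of the URL.
function magnetFromUrl(url) {
  const urlHash = crypto.createHash('sha1').update(url).digest('hex');
  return 'magnet:?xt=urn:btih:' + urlHash;
}

// The page content never enters the computation, so a fresh copy, a stale
// copy, and a tampered copy of the same page all map to the same magnet link.
console.log(magnetFromUrl('https://example.com/index.html'));
```

The magnet link only changes when the URL changes, so the swarm for a URL can contain any content at all.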

goatandsheep · 2016-10-26T16:37:27Z

@deckar01 No, I haven't. I was simply making an assumption because this was the first question I asked when this came out. Oh boy, that's no good: #2

deckar01 · 2016-10-26T17:06:40Z

The torrented files do get verified against the content hash that the server provides, so although you download potentially malicious content, the browser never uses it. I am mainly concerned about normal torrent clients rejecting these files because their info hash is invalid.
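A rough sketch of that kind of check follows. The shape of the hash table (a plain object mapping URL to a SHA-1 hex digest) and the name `isTrusted` are assumptions for illustration; the actual format of security.js may differ.

```javascript
const crypto = require('crypto');

// Hypothetical check: accept downloaded content only if its SHA-1 matches
// the digest the server published for that URL.
function isTrusted(expectedHashes, url, content) {
  if (!Object.prototype.hasOwnProperty.call(expectedHashes, url)) {
    return false; // no published hash for this URL, so nothing can be verified
  }
  const actual = crypto.createHash('sha1').update(content).digest('hex');
  return actual === expectedHashes[url];
}
```

Content that fails the check is discarded after it has already been transferred, which is the wasted download described above.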

DanielSidhion · 2016-10-26T20:34:50Z

I believe torrents were hashed by the URL initially because it was way easier for people to set CacheP2P up (just throw the script in the page and you're done). Until recently, the code didn't even check against the security hash to throw away the downloaded invalid content.

Given the introduced changes to check against the security hash, I believe the amount of work for setting CacheP2P up will be the same if torrents go back to being hashed by content. CacheP2P can just look at the security hash (which already is a hash of the content) to seed and download torrents. There's no need to hash by the URL anymore.

I'd be willing to work on a PR for this if nobody else is interested. Might take some time, though.

deckar01 · 2016-10-26T21:25:36Z

@DanielSidhion I have been working on dropping the patched WebTorrent dependency for the last couple of days. I also refactored it heavily with caching other types of assets in mind.

The library would probably need to provide a command line utility that generates the info hashes for the developer to ensure that it will always match the value that the client computes.

DanielSidhion · 2016-10-26T21:56:16Z

That's great! I'll let you continue the work then.

As for the command line utility, I believe it would be better to have a web crawler to traverse all cacheable contents and build the hashes. That way, it's as independent from other frameworks as possible.

deckar01 · 2016-10-27T13:27:13Z

I renamed my fork Belafonte. https://github.com/deckar01/belafonte

I was able to get the page content to produce a deterministic info hash by supplying the "name" and "creationDate" options to the seed() function. I used the URL for the name, but the server has to supply the creation date and use the same scheme when generating the info hashes.

I am concerned that document.documentElement.innerHTML may not be deterministic between browsers though.

deckar01 · 2016-11-02T15:05:34Z

I am working on integration tests, then I will open an MR for fixing the info hashes and the command line utility for generating them.

deckar01 mentioned this issue Oct 28, 2016

Security hashes are not verified when receiving files #7

Closed
