This repository has been archived by the owner on Apr 25, 2021. It is now read-only.

deduplicate - code for bloom filter is not enough #2

Open
kzimnicki opened this issue Jan 15, 2017 · 1 comment

Comments

@kzimnicki

Hi

I've checked your code using bloom filters:
https://github.com/janschultecom/akvokolekta/blob/master/src/main/scala/com/janschulte/akvokolekta/impl/EnhancedSource.scala#L23

Is it working correctly? You are using a Bloom filter, which only guarantees the following:

  • If the Bloom filter returns false, the element is definitely not in the set.
  • If the Bloom filter returns true, the element might be in the set.

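If I understand this correctly, then here:
https://github.com/janschultecom/akvokolekta/blob/master/src/main/scala/com/janschulte/akvokolekta/impl/EnhancedSource.scala#L27
you should also check an actual Set when the filter returns true, and decide based on whether that Set contains the element. Otherwise a false positive from the filter could be mistaken for a real duplicate.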
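
To make the suggestion concrete, here is a minimal sketch of the idea. It is not the akvokolekta code: `TinyBloomFilter` and `ExactDeduplicator` are hypothetical names, the elements are assumed to be Strings, and the filter is a toy built on `java.util.BitSet` and `MurmurHash3`. The Bloom filter is only a fast pre-check, and the exact Set makes the final decision.

```scala
import java.util.BitSet
import scala.collection.mutable
import scala.util.hashing.MurmurHash3

// Hypothetical toy Bloom filter (an assumption for illustration only).
final class TinyBloomFilter(numBits: Int, numHashes: Int) {
  private val bits = new BitSet(numBits)

  // Derive the i-th bit index by hashing the element with seed i.
  private def index(elem: String, i: Int): Int =
    Math.floorMod(MurmurHash3.stringHash(elem, i), numBits)

  def put(elem: String): Unit =
    (0 until numHashes).foreach(i => bits.set(index(elem, i)))

  // false => definitely never added; true => possibly added.
  def mightContain(elem: String): Boolean =
    (0 until numHashes).forall(i => bits.get(index(elem, i)))
}

// Hypothetical deduplicator: the filter answers "definitely new" cheaply,
// the exact Set resolves every "possibly seen" answer.
final class ExactDeduplicator(numBits: Int = 1 << 16, numHashes: Int = 3) {
  private val filter = new TinyBloomFilter(numBits, numHashes)
  private val seen = mutable.HashSet.empty[String]

  // Returns true if elem has not been seen before, and records it.
  def isNew(elem: String): Boolean = {
    // Only consult the Set when the filter says "possibly in the set".
    val duplicate = filter.mightContain(elem) && seen.contains(elem)
    if (!duplicate) {
      filter.put(elem)
      seen += elem
    }
    !duplicate
  }
}
```

With this, `List("a", "b", "a", "c", "b").filter(dedup.isNew)` keeps exactly one copy of each element for `val dedup = new ExactDeduplicator()`, even if the filter is tiny and reports false positives. The price is that the Set holds every distinct element, so memory grows with the number of unique items.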

@wmaroy

wmaroy commented Sep 23, 2017

Hi @janschultecom

I agree with this. A Bloom filter alone is not enough: some other check should be done when the filter returns "possibly in the set". If you are not memory bound, that could be an in-memory set; if you are memory bound, it could be a disk lookup, for example.
