Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Java keyExists() has an incorrect behaviour in most cases #12921

Open
alim-akbashev opened this issue Aug 8, 2024 · 8 comments · May be fixed by #12935
Open

Java keyExists() has an incorrect behaviour in most cases #12921

alim-akbashev opened this issue Aug 8, 2024 · 8 comments · May be fixed by #12935

Comments

@alim-akbashev
Copy link

alim-akbashev commented Aug 8, 2024

The method keyExists() returns correct result only when data exist in a block cache.
It seems that key_exists_helper at rocksjni.cc has a bug due to a side-effect of c++ impl of KeyMayExist which changes the state of read_options at

roptions.read_tier = kBlockCacheTier; // read from block cache only

So consequent call of db->Get(read_opts, ...) looks up only through a block cache tier too:

  const bool may_exist =
      db->KeyMayExist(read_opts, cf_handle, key_slice, &value, &value_found); // <--- here in read_opts.read_tier become kBlockCacheTier

  if (may_exist) {
    ROCKSDB_NAMESPACE::Status s;
    {
      ROCKSDB_NAMESPACE::PinnableSlice pinnable_val;
      s = db->Get(read_opts, cf_handle, key_slice, &pinnable_val);  // <---- here we reuse read_opts with read_tier=kBlockCacheTier
    }

UPDATE: probably my assumption on the root cause is wrong. It seems that keyMayExist() just does not work as expected (and advertised in docs):

... That is to say that this method is probabilistic and may return false positives, but never a false negative.

Actually it returns false negative all the time :( it only returns positive result if the data by key is in a data block cache.

Expected behavior

keyExists() should not produce false negative cases

Actual behavior

keyExists() returns false-negative result if data by a requested key is not in a block cache.

Steps to reproduce the behavior

            db.put("someKey".getBytes(), "someValue".getBytes());
            // ...
            // clear a block cache by restarting the app or any other way
            // ...
            assertTrue(db.keyExists("someKey".getBytes()));
rhubner added a commit to evolvedbinary/rocksdb that referenced this issue Aug 10, 2024
@rhubner
Copy link
Contributor

rhubner commented Aug 10, 2024

Hello @alim-akbashev and thanks for reaching our community,

I double check code and I didn't find any misbehavior. I also wrote small test to verify your concerns.

The roptions.read_tier = kBlockCacheTier; is not problem. As you can see, on line above we do a copy of ReadOptions. Even the read options passed to the function are marked as cons (const ReadOptions& read_options), so it's impossible to change them.(Unless you do some dark C++ magic).

keyMayExist - it's littble bit confusing, let explain in table :

Return value Key in database
true Yes or No
false Not

Originally this function used BloomFilter, but today it may be replaced with RibbonFilter or for small databases it may use cache.

Radek

@alim-akbashev
Copy link
Author

hi @rhubner , thanks for your reply

Yea, seems my first assumption on the reason of the behaviour was wrong.

But i still have concerns regarding the current behaviour.

keyMayExist - it's littble bit confusing, let explain in table :

Return value Key in database
true Yes or No
false Not

If it would be so, that would not be an issue for me. But unfortunately it does not behave so.

I double check code and I didn't find any misbehavior. I also wrote small test to verify your concerns.

I believe, the test will fail if we add re-opening of a db between put and keyExists/keyMayExist (to clear all kind of possible caches). At least exactly such behaviour i do see using a java code - keyExists/keyMayExist returns false when the key does exist in database. But if I call get(key) first, keyExists/keyMayExist starts returning a correct result true for this key - my guess is that happens because the data by key (or just an index block?) is in a block cache at that moment.

@rhubner
Copy link
Contributor

rhubner commented Aug 10, 2024

Hi @alim-akbashev ,

believe, the test will fail if we add re-opening of a db between put and keyExists/keyMayExist

That's what I did

Feel free to update the test. Maybe I missed something. I also try to compact DB or flush WAL, but it didn't have any effect on the test.

Radek

@alim-akbashev
Copy link
Author

yea, i'm so sorry, @rhubner , my bad i didn't double-check the steps-to-reproduce.

it turned out, that to reproduce 1. it's required to fill a db with much more data 2. it's required to restart the app, just closing and opening the db again does not shows the issue.
i'm still not sure why it happens and why these conditions are required, this time i won't make any assumptions :)

anyways, i've created a demo projects which 100% reproduces the issue (at least in my environment) https://github.com/alim-akbashev/keyExistsBugDemo

it consist of two mini-apps:

  • FillDbApp creates a db at /tmp/rocksdb-test, puts a test key + 5M other random keys
  • TryTheKeyApp calls keyExists() on a test key

hope it helps in investigation.

@rhubner
Copy link
Contributor

rhubner commented Aug 11, 2024

Hello @alim-akbashev ,

thanks for the example. I can replicate your behavior on my machine.

I think it's the key_maye_exist what return wrong result. But this happen only for TtlDB. For normal RocksDB I'm getting correct results. Maybe different implementations? Let me investigate more.

Radek

@rhubner
Copy link
Contributor

rhubner commented Aug 12, 2024

Hello @alim-akbashev ,

I did some debbuging over weeked and today and move little bit further. I will be talking primary about DBWithTTLImpl::KeyMayExist

The first call db_->KeyMayExist return true for key which is in DB, but then the second if statement fail. See bellow the output of two calls :

ret : 1
value_found : 1
value.size : 0
SanityCheckTimestamp : Corruption: Error: value's length less than timestamp's
StripTS(value) : Corruption: Bad timestamp in key-value

What's strange, that value_found is 1(true) but there are no data in value.
I still don't know why this is happening, but I will continue investigating.

Radek

@alim-akbashev
Copy link
Author

alim-akbashev commented Aug 12, 2024

hello @rhubner , thanks for your efforts.

here's a quick update from my side - I've updated the demo app, now it contains only settings which really affect keyExists/keyMayExist behaviour + now it is a single app: https://github.com/alim-akbashev/keyExistsBugDemo/blob/main/src/main/java/akbashev/rocksdb/MinimalStepsToReproduceApp.java :

  • the only option which makes keyExists fail is .setCacheIndexAndFilterBlocks(true)
  • it does not depend on amount of data in a table - now the demo app reproduces the issue with a single record in a db. Millions records were required to trigger a compaction, now it's triggered by the code instead.
  • checking key existence with another process is not required as well - re-opening with a fresh instance of LRUCache is enough

So now I more or less certain that the behaviour depends on the index and if a db is configured to place indices in a block cache instead of heap (am I right that by default a bloom filter and an index are at heap and they are preloaded during opening a db?), keyMayExist does not trigger loading the corresponding index to a cache as get() does. Probably it's understandable since keyMayExists intended to be a lightweight alternative for get() with zero IO. If so, maybe it's rather an undocumented specific limitation than a bug...

Thanks,
Alim

@rhubner
Copy link
Contributor

rhubner commented Aug 12, 2024

Hello @alim-akbashev ,

I think I found it. In DBImpl::KeyMayExist we always set *value_found = true;.

Then at the end, you can see that the method return true, even when it's status is IsIncomplete.

I think the method should have something like this at the end :

 if(value_found != nullptr && s.IsIncomplete()) {
    *value_found = false;
  }

or line *value_found = true; should be removed.

@pdillinger What do you think about it ? Unfortunately I'm not very familiar with this part of RocksDB codebase. After the proposed change, ttl_test and bloom_test can run without problem.

Radek

cc: @adamretter

rhubner added a commit to evolvedbinary/rocksdb that referenced this issue Aug 14, 2024
rhubner added a commit to evolvedbinary/rocksdb that referenced this issue Aug 29, 2024
rhubner added a commit to evolvedbinary/rocksdb that referenced this issue Sep 27, 2024
rhubner added a commit to evolvedbinary/rocksdb that referenced this issue Sep 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants