IndexOutOfBoundsException when accessing partition where the column was deleted #3731

Open: wants to merge 4 commits into trunk

sunil9977
Contributor

  1. Added a null check for foundValue before processing it.
  2. Added an appropriate test case for this.

@bbotella
Contributor

bbotella commented Dec 9, 2024

Please add a Jira ticket to the patch.

@sunil9977
Contributor Author

@@ -705,7 +705,7 @@ public boolean isSatisfiedBy(TableMetadata metadata, DecoratedKey partitionKey,
if (column.type.isCounter())
{
ByteBuffer foundValue = getValue(metadata, partitionKey, row);
-if (foundValue == null)
+if (foundValue == null || foundValue.remaining() == 0)
Contributor

This patch also misses org.apache.cassandra.db.filter.RowFilter.MapElementExpression#isSatisfiedBy.

Rather than updating every code path, why not push this logic into getValue?

In getValue, we can do:

default:
    Cell<?> cell = row.getCell(column);
    if (cell == null) return null;
    ByteBuffer bb = cell.buffer();
    return bb.hasRemaining() ? bb : null;

With that one change, all 4 impacted code paths would see null (which they already handle).
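A minimal, self-contained sketch of the suggestion above. It assumes a simplified stand-in for Cell (the nested Cell class is hypothetical and only models buffer()), and the integer comparison in isSatisfiedBy is an illustrative placeholder for the real operator logic. It shows how normalizing empty buffers to null inside getValue lets the callers' existing null checks cover the deleted-column case.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: Cell is a simplified stand-in for
// org.apache.cassandra.db.rows.Cell that only models buffer().
public class GetValueSketch
{
    static final class Cell
    {
        private final ByteBuffer value;

        Cell(ByteBuffer value)
        {
            this.value = value;
        }

        ByteBuffer buffer()
        {
            return value;
        }
    }

    // Suggested default branch of getValue: a missing cell and an empty
    // buffer (e.g. from a deleted column) are both reported as null.
    static ByteBuffer getValue(Cell cell)
    {
        if (cell == null) return null;
        ByteBuffer bb = cell.buffer();
        return bb.hasRemaining() ? bb : null;
    }

    // A caller in the style of the existing expressions: it already handles null,
    // so it never tries to decode an empty buffer.
    static boolean isSatisfiedBy(Cell cell, int expected)
    {
        ByteBuffer foundValue = getValue(cell);
        return foundValue != null && foundValue.getInt(foundValue.position()) == expected;
    }

    public static void main(String[] args)
    {
        ByteBuffer fortyTwo = ByteBuffer.allocate(4).putInt(0, 42);
        System.out.println(isSatisfiedBy(new Cell(fortyTwo), 42));               // true
        System.out.println(isSatisfiedBy(new Cell(ByteBuffer.allocate(0)), 42)); // false
        System.out.println(isSatisfiedBy(null, 42));                             // false
    }
}
```

One trade-off of this approach: a live cell whose value is legitimately empty is also reported as null.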

Contributor

I am also curious why we have empty bytes rather than null in this case... This implies that the pattern

Cell<?> cell = row.getCell(column);
return cell == null ? null : cell.buffer();

is unsafe and has always been wrong... yet this pattern is common...

$ grep -r 'cell == null ? null : cell.buffer()' src/
src//java/org/apache/cassandra/db/marshal/MapType.java:            return cell == null ? null : cell.buffer();
src//java/org/apache/cassandra/db/marshal/ListType.java:            return cell == null ? null : cell.buffer();
src//java/org/apache/cassandra/db/marshal/UserType.java:            return cell == null ? null : cell.buffer();
src//java/org/apache/cassandra/db/filter/RowFilter.java:                    return cell == null ? null : cell.buffer();
src//java/org/apache/cassandra/index/internal/CassandraIndex.java:                               cell == null ? null : cell.buffer()
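To illustrate why the pattern is risky, here is a hypothetical sketch. The Cell class is a simplified stand-in, and decodeInt is an invented caller that reads a fixed-width 4-byte value. A non-null but empty buffer passes the null check, and reading from it throws IndexOutOfBoundsException. That is consistent with the error in this PR's title, though the real failing call site may differ.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: Cell only models buffer(); decodeInt is an invented caller.
public class NaivePatternSketch
{
    static final class Cell
    {
        private final ByteBuffer value;

        Cell(ByteBuffer value)
        {
            this.value = value;
        }

        ByteBuffer buffer()
        {
            return value;
        }
    }

    // The pattern found by the grep above.
    static ByteBuffer naiveValue(Cell cell)
    {
        return cell == null ? null : cell.buffer();
    }

    // Assumes "non-null means a readable 4-byte value".
    static int decodeInt(ByteBuffer bb)
    {
        return bb.getInt(bb.position());
    }

    public static void main(String[] args)
    {
        Cell deleted = new Cell(ByteBuffer.allocate(0)); // value left by a deleted column
        ByteBuffer found = naiveValue(deleted);
        System.out.println(found != null); // true: the null check passes
        try
        {
            decodeInt(found);
        }
        catch (IndexOutOfBoundsException e)
        {
            System.out.println("IndexOutOfBoundsException");
        }
    }
}
```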

Contributor

Looking at callers of org.apache.cassandra.db.rows.Cell#buffer, there are 62+ code paths that follow a similar pattern...

Contributor

Looked into it... when we delete a column, we go through org.apache.cassandra.cql3.terms.Constants.Deleter, which does

builder.addCell(BufferCell.tombstone(column, timestamp, nowInSec, path));

which is

public static BufferCell tombstone(ColumnMetadata column, long timestamp, long nowInSec, CellPath path)
{
    return new BufferCell(column, timestamp, NO_TTL, nowInSec, ByteBufferUtil.EMPTY_BYTE_BUFFER, path);
}

So deleting a column is defined as writing empty bytes... that explains the empty bytes, at least.

@@ -529,7 +530,7 @@ protected ByteBuffer getValue(TableMetadata metadata, DecoratedKey partitionKey,
return row.clustering().bufferAt(column.position());
default:
Cell<?> cell = row.getCell(column);
-return cell == null ? null : cell.buffer();
+return Cell.getValidCellBuffer(cell, nowInSeconds());
Contributor

dcapwell commented Dec 23, 2024

Sorry, I've been bad about getting back to this... I'm not a fan of nowInSeconds() being called here, as we should really do this once per query... I'm trying to review the higher-level calling code to see if there is something we can do to improve this...

Contributor

org.apache.cassandra.db.filter.RowFilter#filter(org.apache.cassandra.schema.TableMetadata, long)

We are already passed a nowInSec there! It looks like we can refactor the callers to pass this value down, so that we don't call nowInSeconds() for every cell (this can become a problem).
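A minimal sketch of the refactoring described above. All names here (Expression, NOT_EXPIRED, filter, and modeling rows as expiry times) are hypothetical illustrations, not Cassandra's API. The clock is read once per query in filter, and the same nowInSec is handed to every isSatisfiedBy call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

// Hypothetical sketch: rows are modeled as their expiry time in seconds.
public class NowInSecPlumbingSketch
{
    // nowInSec is a parameter, not a clock read inside the expression.
    interface Expression
    {
        boolean isSatisfiedBy(long expiresAtSec, long nowInSec);
    }

    // Keeps rows that have not expired yet.
    static final Expression NOT_EXPIRED = (expiresAtSec, nowInSec) -> nowInSec < expiresAtSec;

    // Reads the clock exactly once per query and passes the value to every row.
    static List<Long> filter(List<Long> rowExpiries, Expression e, LongSupplier clock)
    {
        long nowInSec = clock.getAsLong();
        List<Long> result = new ArrayList<>();
        for (long expiresAt : rowExpiries)
            if (e.isSatisfiedBy(expiresAt, nowInSec))
                result.add(expiresAt);
        return result;
    }

    public static void main(String[] args)
    {
        int[] clockReads = {0};
        LongSupplier clock = () -> { clockReads[0]++; return 1_000L; };
        List<Long> kept = filter(List.of(900L, 1_500L, 2_000L), NOT_EXPIRED, clock);
        System.out.println(kept);          // [1500, 2000]
        System.out.println(clockReads[0]); // 1
    }
}
```

Besides avoiding a clock read per cell, passing one value down means every row in the query is evaluated against the same instant.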

Contributor

Once we plumb nowInSec from the caller through here, I am +1 on this patch.

@dcapwell
Contributor

I applied the patch, along with the feedback I gave, to the branch that found the problem, and the test that found the issue now passes!

Here is the modified version of this PR with my feedback applied:

diff --git a/src/java/org/apache/cassandra/db/filter/RowFilter.java b/src/java/org/apache/cassandra/db/filter/RowFilter.java
index 9ea44ead96..ce96e8daa0 100644
--- a/src/java/org/apache/cassandra/db/filter/RowFilter.java
+++ b/src/java/org/apache/cassandra/db/filter/RowFilter.java
@@ -224,7 +224,7 @@ public class RowFilter implements Iterable<RowFilter.Expression>
 
                 // Short-circuit all partitions that won't match based on static and partition keys
                 for (Expression e : partitionLevelExpressions)
-                    if (!e.isSatisfiedBy(metadata, partition.partitionKey(), partition.staticRow()))
+                    if (!e.isSatisfiedBy(metadata, partition.partitionKey(), partition.staticRow(), nowInSec))
                     {
                         partition.close();
                         return null;
@@ -251,7 +251,7 @@ public class RowFilter implements Iterable<RowFilter.Expression>
                     return null;
 
                 for (Expression e : rowLevelExpressions)
-                    if (!e.isSatisfiedBy(metadata, pk, purged))
+                    if (!e.isSatisfiedBy(metadata, pk, purged, nowInSec))
                         return null;
 
                 return row;
@@ -303,7 +303,7 @@ public class RowFilter implements Iterable<RowFilter.Expression>
 
         for (Expression e : expressions)
         {
-            if (!e.isSatisfiedBy(metadata, partitionKey, purged))
+            if (!e.isSatisfiedBy(metadata, partitionKey, purged, nowInSec))
                 return false;
         }
         return true;
@@ -522,9 +522,9 @@ public class RowFilter implements Iterable<RowFilter.Expression>
          * (i.e. it should come from a RowIterator).
          * @return whether the row is satisfied by this expression.
          */
-        public abstract boolean isSatisfiedBy(TableMetadata metadata, DecoratedKey partitionKey, Row row);
+        public abstract boolean isSatisfiedBy(TableMetadata metadata, DecoratedKey partitionKey, Row row, long nowInSec);
 
-        protected ByteBuffer getValue(TableMetadata metadata, DecoratedKey partitionKey, Row row)
+        protected ByteBuffer getValue(TableMetadata metadata, DecoratedKey partitionKey, Row row, long nowInSec)
         {
             switch (column.kind)
             {
@@ -536,7 +536,7 @@ public class RowFilter implements Iterable<RowFilter.Expression>
                     return row.clustering().bufferAt(column.position());
                 default:
                     Cell<?> cell = row.getCell(column);
-                    return cell == null ? null : cell.buffer();
+                    return Cell.getValidCellBuffer(cell, nowInSec);
             }
         }
 
@@ -697,7 +697,8 @@ public class RowFilter implements Iterable<RowFilter.Expression>
             super(column, operator, value);
         }
 
-        public boolean isSatisfiedBy(TableMetadata metadata, DecoratedKey partitionKey, Row row)
+        @Override
+        public boolean isSatisfiedBy(TableMetadata metadata, DecoratedKey partitionKey, Row row, long nowInSec)
         {
             // We support null conditions for LWT (in ColumnCondition) but not for RowFilter.
             // TODO: we should try to merge both code someday.
@@ -711,7 +712,7 @@ public class RowFilter implements Iterable<RowFilter.Expression>
                 // representation. See CASSANDRA-11629
                 if (column.type.isCounter())
                 {
-                    ByteBuffer foundValue = getValue(metadata, partitionKey, row);
+                    ByteBuffer foundValue = getValue(metadata, partitionKey, row, nowInSec);
                     if (foundValue == null)
                         return false;
 
@@ -721,7 +722,7 @@ public class RowFilter implements Iterable<RowFilter.Expression>
                 else
                 {
                     // Note that CQL expression are always of the form 'x < 4', i.e. the tested value is on the left.
-                    ByteBuffer foundValue = getValue(metadata, partitionKey, row);
+                    ByteBuffer foundValue = getValue(metadata, partitionKey, row, nowInSec);
                     return foundValue != null && operator.isSatisfiedBy(column.type, foundValue, value);
                 }
             }
@@ -736,7 +737,7 @@ public class RowFilter implements Iterable<RowFilter.Expression>
                 }
                 else
                 {
-                    ByteBuffer foundValue = getValue(metadata, partitionKey, row);
+                    ByteBuffer foundValue = getValue(metadata, partitionKey, row, nowInSec);
                     return foundValue != null && operator.isSatisfiedBy(column.type, foundValue, value);
                 }
             }
@@ -821,7 +822,7 @@ public class RowFilter implements Iterable<RowFilter.Expression>
             return CompositeType.build(ByteBufferAccessor.instance, key, value);
         }
 
-        public boolean isSatisfiedBy(TableMetadata metadata, DecoratedKey partitionKey, Row row)
+        public boolean isSatisfiedBy(TableMetadata metadata, DecoratedKey partitionKey, Row row, long nowInSec)
         {
             assert key != null;
             // We support null conditions for LWT (in ColumnCondition) but not for RowFilter.
@@ -839,7 +840,7 @@ public class RowFilter implements Iterable<RowFilter.Expression>
             }
             else
             {
-                ByteBuffer serializedMap = getValue(metadata, partitionKey, row);
+                ByteBuffer serializedMap = getValue(metadata, partitionKey, row, nowInSec);
                 if (serializedMap == null)
                     return false;
 
@@ -940,7 +941,7 @@ public class RowFilter implements Iterable<RowFilter.Expression>
         }
 
         // Filtering by custom expressions isn't supported yet, so just accept any row
-        public boolean isSatisfiedBy(TableMetadata metadata, DecoratedKey partitionKey, Row row)
+        public boolean isSatisfiedBy(TableMetadata metadata, DecoratedKey partitionKey, Row row, long nowInSec)
         {
             return true;
         }
diff --git a/src/java/org/apache/cassandra/db/rows/Cell.java b/src/java/org/apache/cassandra/db/rows/Cell.java
index 3ddfeae39a..f7ecb5c8fb 100644
--- a/src/java/org/apache/cassandra/db/rows/Cell.java
+++ b/src/java/org/apache/cassandra/db/rows/Cell.java
@@ -240,6 +240,26 @@ public abstract class Cell<V> extends ColumnData
                                            // timestamp on expiry.
     }
 
+    /**
+     * Validates a cell's liveness, tombstone status, and buffer contents.
+     *
+     * @param cell         The cell to validate.
+     * @param nowInSeconds The current time in seconds.
+     * @return A ByteBuffer (including valid empty buffers) if valid, or null otherwise.
+     */
+    public static ByteBuffer getValidCellBuffer(Cell<?> cell, long nowInSeconds) {
+        if (cell == null || cell.isTombstone()) {
+            return null;
+        }
+
+        if (!cell.isLive(nowInSeconds)) {
+            return null;
+        }
+
+        // Allow valid empty buffers
+        return cell.buffer();
+    }
+
     /**
      * The serialization format for cell is:
      *     [ flags ][ timestamp ][ deletion time ][    ttl    ][ path size ][ path ][ value size ][ value ]
diff --git a/test/unit/org/apache/cassandra/index/sai/plan/OperationTest.java b/test/unit/org/apache/cassandra/index/sai/plan/OperationTest.java
index 8b0acaaf2b..81292cbda0 100644
--- a/test/unit/org/apache/cassandra/index/sai/plan/OperationTest.java
+++ b/test/unit/org/apache/cassandra/index/sai/plan/OperationTest.java
@@ -515,7 +515,7 @@ public class OperationTest
         }
 
         @Override
-        public boolean isSatisfiedBy(TableMetadata metadata, DecoratedKey partitionKey, Row row)
+        public boolean isSatisfiedBy(TableMetadata metadata, DecoratedKey partitionKey, Row row, long nowInSec)
         {
             throw new UnsupportedOperationException();
         }
diff --git a/test/unit/org/apache/cassandra/index/sasi/plan/OperationTest.java b/test/unit/org/apache/cassandra/index/sasi/plan/OperationTest.java
index 79c86b977f..d8bc539c71 100644
--- a/test/unit/org/apache/cassandra/index/sasi/plan/OperationTest.java
+++ b/test/unit/org/apache/cassandra/index/sasi/plan/OperationTest.java
@@ -651,7 +651,7 @@ public class OperationTest extends SchemaLoader
         }
 
         @Override
-        public boolean isSatisfiedBy(TableMetadata metadata, DecoratedKey partitionKey, Row row)
+        public boolean isSatisfiedBy(TableMetadata metadata, DecoratedKey partitionKey, Row row, long nowInSec)
         {
             throw new UnsupportedOperationException();
         }
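The sketch below models the intended behavior of getValidCellBuffer from the diff above. The Cell class, the NO_TTL and NO_DELETION_TIME sentinel values, and the isTombstone/isLive rules are simplified assumptions based on our reading of Cassandra's cell semantics, not the real classes.

```java
import java.nio.ByteBuffer;

// Hypothetical model of getValidCellBuffer. Cell and the sentinel values are
// simplified assumptions, not Cassandra's classes.
public class ValidCellBufferSketch
{
    static final int NO_TTL = 0;                         // assumed "no TTL" sentinel
    static final long NO_DELETION_TIME = Long.MAX_VALUE; // assumed "never deleted" sentinel

    static final class Cell
    {
        final int ttl;
        final long localDeletionTime;
        final ByteBuffer value;

        Cell(int ttl, long localDeletionTime, ByteBuffer value)
        {
            this.ttl = ttl;
            this.localDeletionTime = localDeletionTime;
            this.value = value;
        }

        // A tombstone has a deletion time but no TTL.
        boolean isTombstone()
        {
            return localDeletionTime != NO_DELETION_TIME && ttl == NO_TTL;
        }

        // Live if never deleted, or if it is an expiring cell that has not expired yet.
        boolean isLive(long nowInSec)
        {
            return localDeletionTime == NO_DELETION_TIME
                   || (ttl != NO_TTL && nowInSec < localDeletionTime);
        }

        ByteBuffer buffer()
        {
            return value;
        }
    }

    static ByteBuffer getValidCellBuffer(Cell cell, long nowInSec)
    {
        if (cell == null || cell.isTombstone())
            return null;
        if (!cell.isLive(nowInSec))
            return null;
        return cell.buffer(); // may be an empty buffer for a live cell
    }

    public static void main(String[] args)
    {
        long now = 1_000L;
        ByteBuffer seven = ByteBuffer.allocate(4).putInt(0, 7);
        Cell regular = new Cell(NO_TTL, NO_DELETION_TIME, seven);
        Cell liveEmpty = new Cell(NO_TTL, NO_DELETION_TIME, ByteBuffer.allocate(0));
        Cell tombstone = new Cell(NO_TTL, now, ByteBuffer.allocate(0));
        Cell expiring = new Cell(60, now + 30, seven); // expires 30 seconds after now

        System.out.println(getValidCellBuffer(regular, now) != null);       // true
        System.out.println(getValidCellBuffer(liveEmpty, now) != null);     // true
        System.out.println(getValidCellBuffer(tombstone, now) == null);     // true
        System.out.println(getValidCellBuffer(expiring, now) != null);      // true
        System.out.println(getValidCellBuffer(expiring, now + 31) == null); // true
    }
}
```

Under these assumed rules a tombstone is never live, so the isTombstone() check is redundant with isLive() and mainly documents intent. Unlike the hasRemaining() suggestion earlier in the thread, a live cell with an empty value is returned as-is rather than as null.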


3 participants