First dumb of logic to get route index running on the journal #3743

dcapwell · 2024-12-13T21:29:29Z

No description provided.

dcapwell · 2024-12-14T00:57:43Z

src/java/org/apache/cassandra/service/accord/AccordCache.java

@@ -1190,7 +1190,7 @@ public Runnable save(AccordCommandStore commandStore, TxnId txnId, @Nullable Com
                    return null;
            }

-            return commandStore.appendToKeyspace(txnId, value);
+            return null;


To keep the diff small I didn't want to rewrite the whole cache logic; null is the only possible return value now.

dcapwell · 2024-12-14T01:02:30Z

src/java/org/apache/cassandra/service/accord/AccordSegmentCompactor.java

@@ -126,6 +126,7 @@ public Collection<StaticSegment<JournalKey, V>> compact(Collection<StaticSegment
                    while ((advanced = reader.advance()) && reader.key().equals(key));

                    if (advanced) readers.offer(reader); // there is more to this reader, but not with this key
+                    else reader.close();


This leak was caught by org.apache.cassandra.index.accord.RouteIndexTest#test. The fs would run out of space, yet when you looked it had very few files and only ~40 MB of data while 4 GB was allocated! The root cause was that each time we open a reader we create a new channel, and we never closed it, so we could never purge segments.
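A minimal sketch of the close-on-exhaustion pattern this fix applies, assuming a hypothetical `SketchReader` in place of the real `KeyOrderReader` (its `closed` flag stands in for releasing a file channel). A reader that still has keys goes back into the queue; one that has none is closed on the spot, because nothing else will close it later.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical stand-in for a segment reader that holds a file channel. */
final class SketchReader implements AutoCloseable
{
    private final Iterator<Integer> keys;
    private Integer current;
    boolean closed = false;

    SketchReader(List<Integer> sortedKeys) { this.keys = sortedKeys.iterator(); }

    /** Moves to the next key; returns false once the reader is exhausted. */
    boolean advance()
    {
        current = keys.hasNext() ? keys.next() : null;
        return current != null;
    }

    int key() { return current; }

    @Override
    public void close() { closed = true; }
}

final class MergeSketch
{
    /** Merges keys from all readers in order, closing each reader as soon as it is exhausted. */
    static List<Integer> merge(List<SketchReader> all)
    {
        PriorityQueue<SketchReader> readers = new PriorityQueue<>(Comparator.comparingInt(SketchReader::key));
        for (SketchReader reader : all)
        {
            if (reader.advance()) readers.add(reader);
            else reader.close(); // empty reader: release it right away
        }

        List<Integer> out = new ArrayList<>();
        while (!readers.isEmpty())
        {
            SketchReader next = readers.poll();
            out.add(next.key());
            if (next.advance()) readers.add(next); // more keys: back into the queue
            else next.close();                      // exhausted: nothing else will close it
        }
        return out;
    }
}
```

The same shape covers the other two places in this PR where a reader is dropped after `advance()` returns false.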

dcapwell · 2024-12-14T01:02:59Z

src/java/org/apache/cassandra/service/accord/AccordService.java

@@ -248,6 +248,8 @@ public static IVerbHandler<? extends Reply> responseHandlerOrNoop()

    public synchronized static void startup(NodeId tcmId)
    {
+        if (instance != null)


Safety check: I hit an issue and saw we didn't detect this, so I added the check to be safer.
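A rough sketch of what such a guard can look like. The hunk above only shows the `if (instance != null)` condition, so the reaction here (throwing `IllegalStateException`) is an assumption, and `ServiceSketch` is a hypothetical class, not `AccordService`.

```java
/** Hypothetical singleton service; only the null check mirrors the diff. */
final class ServiceSketch
{
    private static ServiceSketch instance;

    /** Starts the service once; a second call is rejected instead of silently replacing the instance. */
    static synchronized void startup()
    {
        if (instance != null)
            throw new IllegalStateException("already started"); // assumed reaction, not taken from the PR
        instance = new ServiceSketch();
    }

    static synchronized boolean isStarted() { return instance != null; }
}
```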

dcapwell · 2024-12-14T01:04:17Z

src/java/org/apache/cassandra/service/accord/AccordService.java

-                durableBefores.put(safeStore.commandStore().id(), safeStore.durableBefore());
-            }
-        }));
+        if (node.commandStores().all().length > 0)


In my tests we are testing the journal but don't have any tables marked for accord, so we had 0 stores. Compaction tests were also impacted: we could have data, but because we dropped accord we no longer have stores.

dcapwell · 2024-12-14T01:06:04Z

src/java/org/apache/cassandra/service/accord/IndexRange.java

+import org.apache.cassandra.utils.ByteArrayUtil;
+import org.apache.cassandra.utils.ObjectSizes;
+
+public class IndexRange implements Comparable<IndexRange>, IMeasurableMemory


This was just extracted from the previous impl and moved to the top level. With the journal and tables both touching it, a top-level class was easier than an inner class.

dcapwell · 2024-12-14T01:07:08Z

test/unit/org/apache/cassandra/concurrent/ForwardingExecutorFactory.java

+
+package org.apache.cassandra.concurrent;
+
+public class ForwardingExecutorFactory implements ExecutorFactory


Used by RouteIndexTest to make the close/release tasks run immediately rather than async; this avoids race conditions with validation.
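To illustrate the idea with plain JDK types (the real `ExecutorFactory` interface is not shown in this hunk, so this does not implement it): an executor that runs each task on the calling thread guarantees the work is finished before `execute()` returns. `Resource` and its fields are hypothetical names for the sketch.

```java
import java.util.List;
import java.util.concurrent.Executor;

final class InlineExecutorSketch
{
    /** Runs every task on the calling thread before execute() returns. */
    static final Executor INLINE = Runnable::run;

    /** Hypothetical resource whose release is handed off to an executor. */
    static final class Resource
    {
        private final String name;
        private final Executor releaser;
        private final List<String> released;

        Resource(String name, Executor releaser, List<String> released)
        {
            this.name = name;
            this.releaser = releaser;
            this.released = released;
        }

        /** With INLINE the release is recorded before close() returns; with a thread pool it may lag. */
        void close()
        {
            releaser.execute(() -> released.add(name));
        }
    }
}
```

A test that validates state right after `close()` can then assert on `released` without waiting or polling.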

dcapwell · 2024-12-14T01:11:51Z

test/unit/org/apache/cassandra/dht/IPartitionerTest.java

@@ -93,7 +94,9 @@ private boolean isTestType(Class<? extends IPartitioner> klass)
    @Test
    public void byteCompareSerde()
    {
-        qt().forAll(AccordGenerators.fromQT(CassandraGenerators.token())).check(token -> {
+        // make sure to use simplify as local partitioner can have a type that could generate data too large causing this test to be flakey
+        Gen<Token> qt = CassandraGenerators.partitioners().flatMap(p -> CassandraGenerators.token(CassandraGenerators.simplify(p)));


Flaky test during my first CI run. LocalPartitioner can have a very complex type, which leads to tokens that are too large and makes the test flaky.

dcapwell · 2024-12-14T01:12:44Z

test/unit/org/apache/cassandra/utils/CassandraGeneratorsTest.java

@@ -54,7 +54,7 @@ public void partitionerToToken()
    @Test
    public void partitionerKeys()
    {
-        qt().forAll(Gens.random(), toGen(CassandraGenerators.partitioners()))
+        qt().forAll(Gens.random(), toGen(CassandraGenerators.partitioners().map(CassandraGenerators::simplify)))


Flaky test during my first CI run. LocalPartitioner can have a very complex type, which leads to tokens that are too large and makes the test flaky.

dcapwell · 2024-12-16T23:59:44Z

src/java/org/apache/cassandra/journal/Journal.java

@@ -939,6 +985,8 @@ private StaticSegmentIterator()
                StaticSegment.KeyOrderReader<K> reader = staticSegment.keyOrderReader();
                if (reader.advance())
                    this.readers.add(reader);
+                else
+                    reader.close();


file leak found in testing

dcapwell · 2024-12-16T23:59:47Z

src/java/org/apache/cassandra/journal/Journal.java

@@ -962,6 +1010,8 @@ public void readAllForKey(K key, RecordConsumer<K> reader)
                reader.accept(next.descriptor.timestamp, next.offset, next.key(), next.record(), next.hosts(), next.descriptor.userVersion);
                if (next.advance())
                    readers.add(next);
+                else
+                    next.close();


file leak found in testing

dcapwell · 2024-12-17T00:04:13Z

test/unit/org/apache/cassandra/io/filesystem/ListenableFileSystem.java

@@ -796,18 +832,30 @@ else if (mode == MapMode.READ_WRITE)
                    long pos = position;
                    try
                    {
-                        while (local.hasRemaining())
+                        // the channel could be closed... so always create a new channel to avoid this problem
+                        try (FileChannel channel = provider().newFileChannel(path, Set.of(StandardOpenOption.WRITE)))


I am finding cases where the journal file is deleted before we finish writing to it, which is fine from a FS point of view. To handle that without failing because the channel is closed, I moved to always opening a new channel for the write. If the file doesn't exist we just no-op (which is the behavior on a real FS anyway).
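A small sketch of the "fresh channel per write, no-op if the file is gone" approach using only `java.nio`. It is not the `ListenableFileSystem` code: `FreshChannelWrite` and `writeAt` are hypothetical names, and it goes through `FileChannel.open` on the default file system rather than `provider().newFileChannel`.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class FreshChannelWrite
{
    /**
     * Writes src at the given position using a channel opened just for this call.
     * Returns the number of bytes written, or 0 if the file no longer exists.
     */
    static int writeAt(Path path, ByteBuffer src, long position) throws IOException
    {
        // WRITE without CREATE: opening fails if the file has been deleted
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.WRITE))
        {
            int written = 0;
            while (src.hasRemaining())
                written += channel.write(src, position + written);
            return written;
        }
        catch (NoSuchFileException e)
        {
            return 0; // file was deleted underneath us: treat the write as a no-op
        }
    }
}
```

Because the channel lives only for the duration of one call, there is no long-lived channel whose closed state has to be tracked.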

dcapwell · 2024-12-17T00:05:06Z

src/java/org/apache/cassandra/service/accord/AccordSegmentCompactor.java

@@ -67,6 +67,8 @@ public Collection<StaticSegment<JournalKey, V>> compact(Collection<StaticSegment
            KeyOrderReader<JournalKey> reader = segment.keyOrderReader();
            if (reader.advance())
                readers.add(reader);
+            else
+                reader.close();


another leak detected in the test

dcapwell · 2024-12-17T00:07:22Z

src/java/org/apache/cassandra/journal/Journal.java

@@ -906,8 +941,19 @@ private String maybeAddDiskSpaceContext(String message)
    @VisibleForTesting
    public void truncateForTesting()
    {
-        advanceSegment(null);
-        segments.set(Segments.none());
+        ActiveSegment<?, ?> discarding = currentSegment;


I still don't feel this is safe... but it's been stable for me...

First dumb of logic to get route index running on the journal

9e50092

dcapwell requested a review from ifesdjeen December 13, 2024 21:29

dcapwell added 7 commits December 13, 2024 13:37

import

f7d48c2

move CQLTester.prePrepareServer after the fs change as that breaks RandomSchemaTest

da965d4

fixed a flakey tests

ec48878

detect null route

4716914

fixed flakey tests

95c768d

cleanup

c9b506d

cleanup

61f2305

delete dead code

f76b0ff

mimize diff

acfff73

dcapwell added 11 commits December 16, 2024 11:11

testing out restarting accord

296ef1f

bump accord

fcbc661

fixed the bug with the test and restart. Now the seed is deteting reply isnt working

b6fa3b0

restart is now working, but there is a file leak

37355ee

fixed typo

53d7c95

when using mmap for write open a new file check when we sync, this is to avoid complicated tracking of close needed to support writing to a closed channel

9495214

fixed resource leak

4b7416e

fixed file leak

628a6d5

allow multiple restarts and added back emptyCompactionInfo

558764b

imports

2881a37

import and lower examples

6eb8341

code reuse

d825018

dcapwell added 3 commits December 16, 2024 16:00

remove debug

23858dd

remove

0e13943

cleanup

4526549

typo

ef308fc

remove seed

bbbf7f3
