Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sstable: new property collector API #3784

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

RaduBerinde
Copy link
Member

sstable: new property collector API

We simplify the property collector API: an instance now only needs to
maintain a single state, not three (data block, index, table). The
sstable writer uses multiple separate instances in parallel. The
instance for index blocks uses a new AddCollected API which adds
computed properties from the data blocks. Similarly, the table
properties are calculated in a separate instance using AddCollected
with the properties from the index blocks and the range key block.

This new interface will allow re-deriving index and table properties
for a partial file download which copies whole data blocks. In the
future, this would also allow copying entire data blocks during
compactions.

sstable: use sync.Pool for interval block collectors

@RaduBerinde RaduBerinde requested a review from jbowens July 22, 2024 19:30
@RaduBerinde RaduBerinde requested a review from a team as a code owner July 22, 2024 19:30
@cockroach-teamcity
Copy link
Member

This change is Reviewable

We simplify the property collector API: an instance now only needs to
maintain a single state, not three (data block, index, table). The
sstable writer uses multiple separate instances in parallel. The
instance for index blocks uses a new `AddCollected` API which adds
computed properties from the data blocks. Similarly, the table
properties are calculated in a separate instance using `AddCollected`
with the properties from the index blocks and the range key block.

This new interface will allow re-deriving index and table properties
for a partial file download which copies whole data blocks. In the
future, this would also allow copying entire data blocks during
compactions.
@RaduBerinde RaduBerinde force-pushed the interval-collector-new-api-and-pool branch from ada8da1 to b38f48b Compare July 22, 2024 19:47
Copy link
Collaborator

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 5 of 8 files at r1.
Reviewable status: 1 of 8 files reviewed, 3 unresolved discussions (waiting on @RaduBerinde)


sstable/block_property.go line 48 at r1 (raw file):

// derivable just from the properties of the data blocks it contains. Similarly,
// the table properties must be derivable just from the properties of the index
// blocks and range block.

I'm trying to think through whether we might regret this loss of power and expressibility. Even if we did eventually come to want to use some property that can't be hierarchical, I suppose it wouldn't be much effort to work it back in.


sstable/block_property.go line 125 at r1 (raw file):

	// error). SupportsSuffixReplacement() can be used to check if this method is
	// implemented.
	AddCollectedWithSuffixReplacement(oldProp []byte, newSuffix []byte) error

It seems we could further simplify the suffix replacement bit by making it a stateless function like

type BlockPropertySuffixTransform func(buf []byte, oldProp, newSuffix []byte) []byte

It feels a bit off to force BlockPropertyCollectors to know about suffix replacement through SupportsSuffixReplacement and AddCollectedWithSuffixReplacement. Instead, the caller initiating the suffix replacement could supply a map of collector names to transforms. We error out if a block property exists for which there is no supplied transform.


sstable/writer.go line 2047 at r1 (raw file):

				// Add range key properties.
				rangeKeyProp := w.blockPropCollectors[i].rangeKeyBlock.Finish(scratch)
				if err := tableCollector.AddCollected(rangeKeyProp); err != nil {

Is this right? My recollection is that range key properties and point key properties are separate. Eg, a table-level block property should not reflect the state of the contained range keys. The reason is because the point and range key iterators are separate, so the decision of whether or not to examine a particular file should be too. For this reason IterOptions takes separate block-property filters for point keys and range keys:

pebble/options.go

Lines 133 to 145 in 368636f

// PointKeyFilters can be used to avoid scanning tables and blocks in tables
// when iterating over point keys. This slice represents an intersection
// across all filters, i.e., all filters must indicate that the block is
// relevant.
//
// Performance note: When len(PointKeyFilters) > 0, the caller should ensure
// that cap(PointKeyFilters) is at least len(PointKeyFilters)+1. This helps
// avoid allocations in Pebble internal code that mutates the slice.
PointKeyFilters []BlockPropertyFilter
// RangeKeyFilters can be usefd to avoid scanning tables and blocks in tables
// when iterating over range keys. The same requirements that apply to
// PointKeyFilters apply here too.
RangeKeyFilters []BlockPropertyFilter

Copy link
Member Author

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 1 of 8 files reviewed, 3 unresolved discussions (waiting on @jbowens)


sstable/writer.go line 2047 at r1 (raw file):

Previously, jbowens (Jackson Owens) wrote…

Is this right? My recollection is that range key properties and point key properties are separate. Eg, a table-level block property should not reflect the state of the contained range keys. The reason is because the point and range key iterators are separate, so the decision of whether or not to examine a particular file should be too. For this reason IterOptions takes separate block-property filters for point keys and range keys:

pebble/options.go

Lines 133 to 145 in 368636f

// PointKeyFilters can be used to avoid scanning tables and blocks in tables
// when iterating over point keys. This slice represents an intersection
// across all filters, i.e., all filters must indicate that the block is
// relevant.
//
// Performance note: When len(PointKeyFilters) > 0, the caller should ensure
// that cap(PointKeyFilters) is at least len(PointKeyFilters)+1. This helps
// avoid allocations in Pebble internal code that mutates the slice.
PointKeyFilters []BlockPropertyFilter
// RangeKeyFilters can be usefd to avoid scanning tables and blocks in tables
// when iterating over range keys. The same requirements that apply to
// PointKeyFilters apply here too.
RangeKeyFilters []BlockPropertyFilter

Hm.. but there is no separate place to store the range-key derived properties, is there? We don't write out props for the range key block currently..

Copy link
Member Author

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 1 of 8 files reviewed, 3 unresolved discussions (waiting on @jbowens)


sstable/block_property.go line 125 at r1 (raw file):

Previously, jbowens (Jackson Owens) wrote…

It seems we could further simplify the suffix replacement bit by making it a stateless function like

type BlockPropertySuffixTransform func(buf []byte, oldProp, newSuffix []byte) []byte

It feels a bit off to force BlockPropertyCollectors to know about suffix replacement through SupportsSuffixReplacement and AddCollectedWithSuffixReplacement. Instead, the caller initiating the suffix replacement could supply a map of collector names to transforms. We error out if a block property exists for which there is no supplied transform.

This mapping would have to be provided through the Options, yes? Feels more clunky than requiring it innthe implementation.. but I'll experiment with it.

Copy link
Collaborator

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 1 of 8 files reviewed, 3 unresolved discussions (waiting on @RaduBerinde)


sstable/writer.go line 2047 at r1 (raw file):

Previously, RaduBerinde wrote…

Hm.. but there is no separate place to store the range-key derived properties, is there? We don't write out props for the range key block currently..

Hmmm... maybe range key block property filters have just never worked. I know we don't use them in Cockroach

Copy link
Member Author

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 1 of 8 files reviewed, 3 unresolved discussions (waiting on @jbowens)


sstable/block_property.go line 48 at r1 (raw file):

Previously, jbowens (Jackson Owens) wrote…

I'm trying to think through whether we might regret this loss of power and expressibility. Even if we did eventually come to want to use some property that can't be hierarchical, I suppose it wouldn't be much effort to work it back in.

Yes.. I could imagine a case where the property is too large to store at the data block level but is reasonable for the table level. But if we want to be able to derive properties for partially downloaded sst files, we will need this stricter framework.


sstable/writer.go line 2047 at r1 (raw file):

Previously, jbowens (Jackson Owens) wrote…

Hmmm... maybe range key block property filters have just never worked. I know we don't use them in Cockroach

The RangeKeyFilters code use the table properties (newRangeKeyIter -> checkAndIntersectFilters -> sstable.IntersectsTable), so they should work (but of course the properties will include point keys).

It looks like we stopped using them in cockroachdb/cockroach#86260 but I'm not sure I quite understand the problem.

Copy link
Collaborator

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 1 of 8 files reviewed, 3 unresolved discussions (waiting on @RaduBerinde)


sstable/writer.go line 2047 at r1 (raw file):

Previously, RaduBerinde wrote…

The RangeKeyFilters code use the table properties (newRangeKeyIter -> checkAndIntersectFilters -> sstable.IntersectsTable), so they should work (but of course the properties will include point keys).

It looks like we stopped using them in cockroachdb/cockroach#86260 but I'm not sure I quite understand the problem.

The MVCCIncrementalIterator relies on very subtle semantics to keep the time-bound iterator and non-bounded iterator in sync. I believe the issue is that the user-observed fragmentation varies (because we're hiding some range keys), which breaks the assumptions that keep the two iterators in lock step.

In any case, I don't think we want to combine point and range key properties since we'll only be reducing the effectiveness of each.

Copy link
Member Author

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 1 of 8 files reviewed, 3 unresolved discussions (waiting on @jbowens)


sstable/writer.go line 2047 at r1 (raw file):

Previously, jbowens (Jackson Owens) wrote…

The MVCCIncrementalIterator relies on very subtle semantics to keep the time-bound iterator and non-bounded iterator in sync. I believe the issue is that the user-observed fragmentation varies (because we're hiding some range keys), which breaks the assumptions that keep the two iterators in lock step.

In any case, I don't think we want to combine point and range key properties since we'll only be reducing the effectiveness of each.

This is great, thanks. So in the short term, we can just not compute range key properties. And in a follow-up I will add the ability to compute them and store them in the range key block handle.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants