Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Help users of t-digest upgrade #171

Open
tdunning opened this issue Jun 7, 2021 · 3 comments
Open

Help users of t-digest upgrade #171

tdunning opened this issue Jun 7, 2021 · 3 comments

Comments

@tdunning
Copy link
Owner

tdunning commented Jun 7, 2021

The following important projects are using older t-digest versions and should probably upgrade. An asterisk marks projects that have had a pull request or notification.

3.2:

  • elastic: server
  • atlassian bamboo core
    datavec api
  • apache drill
  • apache pinot core
    ML Dataset
    spotify metrics core
    apache servicemix
    apache nutch
    apache beam sdks java extensions sketching
    kixi stats
    apache druid extensions
    deeplearning4j modelexport solr

3.1:
apache solr
Apache Solr Content Extraction Library
Apache Solr Language Identifier
Apache Solr Prometheus Exporter Package
apache mahout math
apache kylin
Kite Morphlines Metrics Scalable
Brushfire Core (from stripe)

@dblock
Copy link

dblock commented Jun 20, 2022

I am upgrading OpenSearch, opensearch-project/OpenSearch#3634.

We have the following assertion in tests with 3.2:

        final TDigestState state = new TDigestState(100);
        Arrays.stream(values).forEach(state::add);
        assertEquals(state.centroidCount(), values.length);

This is no longer the case with 3.3. @tdunning, could you please help me understand what changed? There are more failures after this, including very different percentiles returned with this upgrade.

@tdunning
Copy link
Owner Author

I think that this test is flawed by having assumptions out of view.

The number of centroids was always defined as <= the number of samples inserted. That is the entire point of the sketch ... to not store everything.

What may have changed is that the limit on centroid count has been made a bit more strict.

I could comment more intelligently if I knew what was in values

@dblock
Copy link

dblock commented Jun 21, 2022

Thanks @tdunning. I tagged you in opensearch-project/OpenSearch#3634 that has source data from basic tests, and the diff in results from the upgrade. Could I please ask you to take a look? I certainly didn't expect such different numbers from an upgrade of a dot release of t-digest.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants