[BUG] Unable to push down predicates for NESTED data type from Athena to Lambda connector #1693

Open
shubhA941 opened this issue Jan 9, 2024 · 0 comments
Labels: bug (Something isn't working)

Describe the bug
I am using the Athena OpenSearch Lambda connector to query OpenSearch index data with SQL. The Lambda logs show that the connector does not build predicates (filters) for fields inside a nested object, so it scrolls through all of the data in the OpenSearch index.

My OpenSearch index schema looks like this:
creationTime : string
usage : struct -- contains first-class attributes and further nested attributes

The struct looks like this: struct<date:string,version:bigint,revenueList:array<struct<country:string,unit:string,cost:double,quantity:bigint,eventList:array<struct<count:bigint,type:string>>,state:string>>>

Problem query in Athena: "select * from index where usage.version = 1"
When I filter on a field of the nested object (here, usage), the Lambda does not form any predicates and scrolls through the full OpenSearch index. The Athena query then times out at 15 minutes (the Lambda maximum execution time).

Lambda logs for this query:
2024-01-09 19:35:52 595425a9-f70d-4ff2-85f9-415a2f4e075f INFO ElasticsearchQueryUtils:114 - Predicates are NOT formed.
2024-01-09 19:35:52 595425a9-f70d-4ff2-85f9-415a2f4e075f INFO GeneratedRowWriter:129 - recompile: Detected a new block, rebuilding field writers so they point to the correct Arrow vectors.
2024-01-09 19:36:03 595425a9-f70d-4ff2-85f9-415a2f4e075f INFO S3BlockSpiller:208 - writeRow: Spilling block with 33625 rows and 16000220 bytes and config 16000000 bytes

(These log lines keep repeating until the Lambda times out, i.e. it scrolls through all of the index data and spills blocks of about 16000000 bytes to the Athena spill bucket.)

Expected behavior
Ideally, the Lambda should form the correct predicates for fields of nested objects as well. Predicates matter in our case because they determine which filter clauses are executed in the OpenSearch query. If the right filters are not passed to OpenSearch, the Lambda scans all of the index data, which is costly in terms of time.
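
To illustrate the kind of filter I would expect to reach OpenSearch for the query above, here is a minimal Python sketch. It is not the connector's code: `build_equality_filter` is a hypothetical helper, and it assumes `usage` is mapped as a plain object field, so that the dotted path `usage.version` can be used directly in a `term` query. If the field were mapped with the `nested` type, the clause would need to be wrapped in a `nested` query instead.

```python
from typing import Any


def build_equality_filter(field_path: str, value: Any) -> dict:
    """Build an OpenSearch search body that filters on one exact value.

    Hypothetical helper for illustration only.
    field_path uses dot notation (for example "usage.version"), which
    assumes the parent field is mapped as an object, not as `nested`.
    """
    return {
        "query": {
            "bool": {
                # A filter clause restricts the matching documents
                # without contributing to relevance scoring.
                "filter": [
                    {"term": {field_path: value}}
                ]
            }
        }
    }


# Search body corresponding to: where usage.version = 1
body = build_equality_filter("usage.version", 1)
```

With a search body like this, only the matching documents would be scrolled, instead of the whole index.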

shubhA941 added the bug label on Jan 9, 2024