[Bug] Core Dump in mirror_replay Test Suite During Execution #782
Comments
FYI: Non debug builds produce the following:
CI seems to have lost the isolation2 tests.
@yjhjstz & @avamingli I hope to bring it and others online soon. I am able to run two of the isolation2 tests and did notice failures (output differences). They can be seen here: https://github.com/edespino/cloudberry/actions/runs/12364538041
This test is currently causing core dumps when run as part of the greenplum_schedule. To prevent this from blocking other testing while we investigate the root cause:
- Created new fixme_schedule containing only mirror_replay
- Removed mirror_replay from greenplum_schedule
- Added installcheck-fixme make target to run problematic tests in isolation

Issue: apache#782
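The schedule change in this commit can be sketched as a small shell helper. This is a hypothetical illustration, not the script used in the commit: it assumes pg_regress-style schedule files whose test lines look like `test: name1 name2 ...`, and the function name `move_test_to_schedule` is invented here.

```shell
#!/bin/sh
# Hypothetical helper (not from the Cloudberry tree): move one test out of a
# pg_regress-style schedule ("test: a b c" lines) into a separate schedule.
move_test_to_schedule() {
  test_name="$1"; src="$2"; dest="$3"

  # Give the test its own group in the destination schedule.
  printf 'test: %s\n' "$test_name" >> "$dest"

  # Drop the name from every "test:" line in the source schedule and skip
  # any line left empty. Comments and blank lines pass through unchanged;
  # whitespace inside "test:" lines is normalized to single spaces.
  tmp="$(mktemp)"
  awk -v t="$test_name" '
    /^test:/ {
      out = "test:"
      for (i = 2; i <= NF; i++) if ($i != t) out = out " " $i
      if (out != "test:") print out
      next
    }
    { print }
  ' "$src" > "$tmp" && cat "$tmp" > "$src"   # overwrite in place, keep perms
  rm -f "$tmp"
}

# Usage (paths assumed):
#   move_test_to_schedule mirror_replay greenplum_schedule fixme_schedule
```

A make target such as installcheck-fixme would then hand the new file to pg_regress through its `--schedule` option. The actual recipe in the Cloudberry Makefile is not shown in this issue.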
Hi, at a glance, that's a case we should fix. Please feel free to create the PR bringing isolation2 back if that was the only failing case. I will help you fix the diffs there. (I'm on vacation today; perhaps I will be back tomorrow.)
diff -I HINT: -I CONTEXT: -I GP_IGNORE: -U3 /__w/cloudberry/cloudberry/src/test/isolation2/expected/parallel_retrieve_cursor/explain.out /__w/cloudberry/cloudberry/src/test/isolation2/results/parallel_retrieve_cursor/explain.out
--- /__w/cloudberry/cloudberry/src/test/isolation2/expected/parallel_retrieve_cursor/explain.out 2024-12-16 17:38:39.620082360 -0800
+++ /__w/cloudberry/cloudberry/src/test/isolation2/results/parallel_retrieve_cursor/explain.out 2024-12-16 17:38:39.628082370 -0800
@@ -113,40 +113,40 @@
QUERY PLAN
___________
Seq Scan on pg_catalog.pg_class
- Output: oid, relname, relnamespace, reltype, reloftype, relowner, relam, relfilenode, reltablespace, relpages, reltuples, relallvisible, reltoastrelid, relhasindex, relisshared, relpersistence, relkind, relnatts, relchecks, relhasrules, relhastriggers, relhassubclass, relrowsecurity, relforcerowsecurity, relispopulated, relreplident, relispartition, relisivm, relrewrite, relfrozenxid, relminmxid, relacl, reloptions, relpartbound
-GP_IGNORE:(3 rows)
+ Output: oid, relname, relnamespace, reltype, reloftype, relowner, relam, relfilenode, reltablespace, relpages, reltuples, relallvisible, reltoastrelid, relhasindex, relisshared, relpersistence, relkind, relnatts, relchecks, relhasrules, relhastriggers, relhassubclass, relrowsecurity, relforcerowsecurity, relispopulated, relreplident, relispartition, relisivm, relisdynamic, relrewrite, relfrozenxid, relminmxid, relacl, reloptions, relpartbound
+GP_IGNORE:(4 rows)

Please help add this.
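Why does this hunk still appear even though the comparison passes `-I GP_IGNORE:`? GNU diff's `-I` option suppresses a hunk only when *all* of its changed lines match one of the patterns. Here the `Output:` line also changed (the results include the new `relisdynamic` column), so the whole hunk is reported. The sketch below wraps the same flags in a hypothetical helper, `show_regress_diff`, to demonstrate that behavior.

```shell
#!/bin/sh
# Hypothetical wrapper around the comparison flags quoted in the CI log.
# GNU diff's -I drops a hunk only if every changed line in it matches one
# of the regular expressions, so empty output means "only ignorable lines
# differ".
show_regress_diff() {
  diff -I 'HINT:' -I 'CONTEXT:' -I 'GP_IGNORE:' -U3 "$1" "$2"
}

# Usage:
#   show_regress_diff expected/foo.out results/foo.out
```

If the new column is the intended behavior, the fix is to update the expected file rather than to add more ignore patterns.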
* Enhance Build Pipeline with Debug and Core Analysis Support

Adds comprehensive debug build support and automated core dump analysis to the Cloudberry build pipeline. Key features:
- Debug build capability with preserved symbols and debug-specific RPMs
- Automated core dump detection and analysis during test execution
- Core file correlation with test failures
- Enhanced test result reporting with core dump status
- Improved artifact management for debug builds

The changes enable better debugging of test failures and provide more detailed information about process crashes during testing.

* test: Move mirror_replay test to separate schedule due to core dumps

This test is currently causing core dumps when run as part of the greenplum_schedule. To prevent this from blocking other testing while we investigate the root cause:
- Created new fixme_schedule containing only mirror_replay
- Removed mirror_replay from greenplum_schedule
- Added installcheck-fixme make target to run problematic tests in isolation

Issue: #782

* test: Mark mirror_replay cores as warnings

When enable_check_core is disabled, the test should proceed with a warning rather than failing. Modified the core file check and summary to mark mirror_replay with a warning status in these cases.

This complements the previous isolation of this test into fixme_schedule, allowing testing to proceed while we investigate the underlying core dump issue.
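The core detection and the warning-versus-failure rule from these commits can be sketched in shell. Everything here is an assumption rather than the pipeline's actual code: the function names, core files named `core*` in a single directory, `enable_check_core` passed as `true`/`false`, and only mirror_replay being downgraded to a warning. The backtrace helper prints `gdb -batch -ex bt <binary> <core>` commands instead of running them, and the binary path you pass it is up to you.

```shell
#!/bin/sh
# Hypothetical sketch of the core-file check described in the commits above.
# Assumed: cores are named core* and land in one directory.

find_cores() {
  # One core file path per line; nothing if the directory has none.
  find "$1" -maxdepth 1 -type f -name 'core*' 2>/dev/null
}

core_check_status() {
  test_name="$1"; enable_check_core="$2"; core_dir="$3"
  if [ -z "$(find_cores "$core_dir")" ]; then
    echo PASS
  elif [ "$enable_check_core" != "true" ] && [ "$test_name" = "mirror_replay" ]; then
    # Known crasher under investigation (#782): report, but do not fail.
    echo WARNING
  else
    echo FAIL
  fi
}

print_backtrace_cmds() {
  # Print (not run) a gdb batch command per core to dump its backtrace.
  binary="$1"; core_dir="$2"
  find_cores "$core_dir" | while IFS= read -r core; do
    printf 'gdb -batch -ex bt %s %s\n' "$binary" "$core"
  done
}
```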
@edespino can you help to bring
Yes, I will.
@yjhjstz FYI: installcheck-cbdb-parallel is now live: https://github.com/apache/cloudberry/actions/runs/12502691175
Apache Cloudberry version
main branch
What happened
The mirror_replay test suite consistently generates a core dump during execution. This test is part of the greenplum_schedule, running under the ic-good-opt-off test matrix configuration (make -C src/test/regress installcheck-good). The core dump's stack trace shows that the crash occurs during append-only segment file handling in the startup process.
Environment
Project: Apache Cloudberry
Test Suite: mirror_replay
Schedule: greenplum_schedule
Test Matrix Config: ic-good-opt-off
Build Type: Debug build with the following configuration:
Stack Trace
The core dump stack trace indicates the crash occurs during append-only segment file handling:
Impact
What you think should happen instead
Analysis
How to reproduce
Ensure your system is capable of generating core files, then run the following test command:
make -C src/test/regress installcheck-good
The issue reproduces consistently without additional steps.
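The first reproduction step (making sure core files can be written) can be checked with a small shell sketch. The helper names are hypothetical. `ulimit -c` prints the soft core-size limit, where `0` means no core files, and on Linux `/proc/sys/kernel/core_pattern` shows where the kernel writes them (it may pipe to a handler such as systemd-coredump instead of writing a file).

```shell
#!/bin/sh
# Hypothetical pre-flight check before running the failing schedule.

core_dumps_enabled() {
  # Soft core-size limit of this shell; "0" disables core files.
  [ "$(ulimit -c)" != "0" ]
}

report_core_setup() {
  if core_dumps_enabled; then
    echo "core files enabled (limit: $(ulimit -c))"
  else
    echo "core files disabled; try: ulimit -c unlimited"
  fi
  # Linux only: where (or to which handler) the kernel sends cores.
  if [ -r /proc/sys/kernel/core_pattern ]; then
    echo "core_pattern: $(cat /proc/sys/kernel/core_pattern)"
  fi
}

# Usage:
#   ulimit -c unlimited
#   report_core_setup
#   make -C src/test/regress installcheck-good
```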
Operating System
Rocky Linux 9 (should be platform independent)
Anything else
Additional Context
The error occurs during the append-only truncate replay operation (ao_truncate_replay), suggesting potential issues with either:
Are you willing to submit a PR?
Code of Conduct