
[2/2]Cherry-Pick Fix gpcheckcat false alarms for pg_default_acl etc #814

Merged
merged 13 commits into apache:main on Dec 26, 2024

Conversation

@yjhjstz (Member) commented Dec 25, 2024

Fixes #ISSUE_Number

What does this PR do?

Type of Change

  • Bug fix (non-breaking change)
  • New feature (non-breaking change)
  • Breaking change (fix or feature with breaking changes)
  • Documentation update

Breaking Changes

Test Plan

  • Unit tests added/updated
  • Integration tests added/updated
  • Passed make installcheck
  • Passed make -C src/test installcheck-cbdb-parallel

Impact

Performance:

User-facing changes:

Dependencies:

Checklist

Additional Context

CI Skip Instructions


@yjhjstz yjhjstz added the cherry-pick cherry-pick upstream commts label Dec 25, 2024
@yjhjstz yjhjstz changed the title Cherry-Pick Fix gpcheckcat false alarms for pg_default_acl etc [2/2]Cherry-Pick Fix gpcheckcat false alarms for pg_default_acl etc Dec 25, 2024
my-ship-it previously approved these changes Dec 25, 2024
@my-ship-it my-ship-it requested a review from avamingli December 25, 2024 09:49
avamingli previously approved these changes Dec 25, 2024
huansong and others added 13 commits December 26, 2024 07:39
In ATExecAddColumn(), where we find child tables to apply a flag,
we used to open the tables without a lock. That is because we had
already acquired a lock on the parent and implicitly assumed that no
one else could alter the child tables. That is still correct, but
upstream commit 4b21acf introduced the expectation that all callers
of relation_open() have acquired a lock, except in bootstrap mode.
I do not think it is important to keep using NoLock and work around
that assert in relation_open() in some way. Therefore, resolve the
FIXME as it is.
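The expectation can be sketched with a Python analogue (relation_open, the lock-mode constants, and the bootstrap flag are simplified stand-ins for the PostgreSQL internals, not real APIs):

```python
NO_LOCK = 0
ACCESS_SHARE_LOCK = 1

in_bootstrap_mode = False

def relation_open(relid, lockmode):
    # Upstream commit 4b21acf added the expectation that every caller
    # passes a real lock mode unless running in bootstrap mode.
    assert lockmode != NO_LOCK or in_bootstrap_mode, \
        "relation_open() callers must acquire a lock"
    return {"relid": relid, "lockmode": lockmode}

# The resolution of the FIXME: open child tables with a real lock mode
# instead of NoLock, even though the parent's lock already excludes
# concurrent ALTERs on the children.
child = relation_open(16384, ACCESS_SHARE_LOCK)
```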
Issue:
gpinitsystem is not working with the debug option -D and throws errors as below:
20220715:11:44:09:048391 gpinitsystem:-[INFO]:-End Function PING_HOST
/usr/local/gpdb6/bin/lib/gp_bash_functions.sh: line 1051: [: 2>/dev/null: integer expression expected
/usr/local/gpdb6/bin/lib/gp_bash_functions.sh: line 1051: [: REMOTE_EXECUTE_AND_GET_OUTPUT: integer expression expected
/usr/local/gpdb6/bin/lib/gp_bash_functions.sh: line 1055: [: too many arguments
/usr/local/gpdb6/bin/lib/gp_bash_functions.sh: line 1060: [: too many arguments
/usr/local/gpdb6/bin/lib/gp_bash_functions.sh: line 1066: [: too many arguments
/usr/local/gpdb6/bin/lib/gp_bash_functions.sh: line 1079: [: too many arguments
20220715:11:44:09:048391 gpinitsystem:-[INFO]:-End Function GET_PG_PID_ACTIVE

Cause:
When gpinitsystem is run with the debug option, all "LOG_MSG" statements log (echo) messages to both STDOUT (console) and the log file (per the LOG_MSG function definition in gpdb/bin/lib/gp_bash_functions.sh). This causes the "REMOTE_EXECUTE_AND_GET_OUTPUT" function to echo multiple messages, while only one string is expected to be returned from it, hence the errors.
When gpinitsystem is run without the debug option, there is no issue, because "LOG_MSG" then logs messages to the log file only.

Fix:
Save the initial DEBUG_LEVEL, turn off debugging at the start of the function, and reset DEBUG_LEVEL to INITIAL_DEBUG_LEVEL at the end. All the LOG_MSG statements then log messages only to the log file instead of echoing them on STDOUT.

Added a behave scenario for the above changes.
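The save/restore pattern described in the fix can be sketched with a Python analogue (the names mirror the shell code in gp_bash_functions.sh, but this is an illustration, not the actual script):

```python
DEBUG_LEVEL = 1  # 1 = debug on: log_msg also echoes to stdout

captured = []

def log_msg(msg):
    # In debug mode this would also echo to stdout, polluting any
    # output captured from the function below.
    if DEBUG_LEVEL == 1:
        captured.append(msg)

def remote_execute_and_get_output():
    # Save the caller's debug setting and silence log_msg for the
    # duration of this function, so only the real result is returned.
    global DEBUG_LEVEL
    initial_debug_level = DEBUG_LEVEL
    DEBUG_LEVEL = 0
    log_msg("Start Function REMOTE_EXECUTE_AND_GET_OUTPUT")
    result = "remote-result"  # the single expected output string
    log_msg("End Function REMOTE_EXECUTE_AND_GET_OUTPUT")
    DEBUG_LEVEL = initial_debug_level
    return result

print(remote_execute_and_get_output())  # only the result, no log noise
```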
The LSN output from pg_switch_wal() is commonly used with
pg_walfile_name(). However, the set of LSNs output by gp_switch_wal()
cannot be used with pg_walfile_name(), because the coordinator and the
segments very likely have different timeline ids. To guarantee
users/developers a correct WAL segment filename, we bake it into
gp_switch_wal() itself. A separate catalog function would create a
window in which an HA failover changes the timeline ids, and we would
have the same problem all over again.

We also convert the function into a C function to guarantee that all
the function calls in gp_switch_wal() are called on the correct
segments. Using the EXECUTE ON SEGMENT/COORDINATOR combo with a UNION
was observed to have known issues when dealing with segment-specific
values (specifically the coordinator) and redistribute motion plans.

Issue example:
```
postgres=# SELECT gp_segment_id, last_archived_wal FROM gp_stat_archiver ORDER BY gp_segment_id;
 gp_segment_id |    last_archived_wal
---------------+--------------------------
            -1 | 000000070000000300000033
             0 | 000000060000000200000039
             1 | 00000006000000020000003B
             2 | 00000006000000020000003D
(4 rows)

postgres=# create table testtable(a int);
CREATE TABLE

-- the walfilenames all have the coordinator's timeline id in them
postgres=# SELECT gp_segment_id, pg_switch_wal, pg_walfile_name(pg_switch_wal) AS walfilename FROM gp_switch_wal() ORDER BY gp_segment_id;
 gp_segment_id | pg_switch_wal |       walfilename
---------------+---------------+--------------------------
            -1 | 3/D0042DB0    | 000000070000000300000034
             0 | 2/E8042230    | 00000007000000020000003A
             1 | 2/F0042230    | 00000007000000020000003C
             2 | 2/F8042230    | 00000007000000020000003E
(4 rows)

-- these are the expected WAL segment filenames with correct timeline id
postgres=# SELECT gp_segment_id, last_archived_wal FROM gp_stat_archiver ORDER BY gp_segment_id;
 gp_segment_id |    last_archived_wal
---------------+--------------------------
            -1 | 000000070000000300000034
             0 | 00000006000000020000003A
             1 | 00000006000000020000003C
             2 | 00000006000000020000003E
(4 rows)
```
Considering that the original command of a SplitUpdate is an UPDATE,
firing INSERT and DELETE triggers may lead to the wrong action being
enforced. Also, triggers in GPDB may require cross-segment data changes,
so disallow INSERT and DELETE triggers on a SplitUpdate.
We used to always dispatch in DefineIndex(), but now we do not if
it is part of ALTER TABLE command execution, so we no longer need to
skip DefineIndex() for QEs. Confirmed, and removed this FIXME.
This issue no longer seems relevant on recent versions of clang.
I was able to build with the configure flag --with-llvm without any
issue on clang 10.0.0.
The test started failing after a change to `gp_switch_wal()` was
made, because the db_size_functions regress test was not updated
accordingly. Fix the test by simply updating the expected output
(a matchsub was added to the test as well).

GPDB commit reference:
https://github.com/greenplum-db/gpdb/commit/d3163467b018082b1b36d70f94fe95e59898b6e0
* Fix a panic case in the postgres_fdw test.

It happens in the below query:
	  select foo.f1, loct1.f1 from foo join loct1 on (foo.f1 = loct1.f1) order by foo.f2 offset 10 limit 10;

The DDL statements are:
```
CREATE EXTENSION greenplum_fdw;
CREATE SERVER loopback FOREIGN DATA WRAPPER greenplum_fdw OPTIONS (host 'localhost', port '15432', dbname 'gpadmin');
CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;

create table foo (f1 int, f2 int);
create table loct1 (f1 int, f2 int, f3 int);
create foreign table foo2 (f3 int) inherits (foo)
  server loopback options (table_name 'loct1', use_remote_estimate 'true');
```

The reason is that set_append_path_locus()->cdbpath_create_motion_path()
returns NULL (i.e. it cannot create a motion to reach the target locus
requirement), but set_append_path_locus() does not handle this bad case
and panics when accessing subpath->sameslice_relids while subpath is
NULL. This happens because in the above query, foo has a foreign-table
child (and additionally a local heap-table child), and the append node
tries to create parameterized paths (one is an index scan path and the
other is the foreign scan path; the target locus is Entry). This occurs
when use_remote_estimate is set on the foreign table: per the
postgresGetForeignPaths() code, a parameterized path for a foreign scan
is only considered when that option is set.

Fix this by returning a NULL path when set_append_path_locus() returns
false. Callers should then take care of the NULL-return cases.

Co-authored-by: Huiliang.liu <[email protected]>
Co-authored-by: Paul Guo <[email protected]>
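The control flow of the fix can be sketched in Python (cdbpath_create_motion_path, the locus names, and the path dicts are simplified stand-ins for the planner's C structures, not the real API):

```python
def cdbpath_create_motion_path(subpath, target_locus):
    # Returns None when no motion can bring the subpath to the target
    # locus -- e.g. the parameterized foreign-scan path above, whose
    # target locus is Entry.
    if subpath.get("parameterized") and target_locus == "Entry":
        return None
    return {"subpath": subpath, "locus": target_locus}

def set_append_path_locus(subpaths, target_locus):
    new_subpaths = []
    for subpath in subpaths:
        motion = cdbpath_create_motion_path(subpath, target_locus)
        if motion is None:
            # Previously the code went on to dereference the NULL path
            # (subpath->sameslice_relids) and panicked; now the failure
            # is propagated so callers can handle it.
            return None
        new_subpaths.append(motion)
    return new_subpaths

seq_scan = {"parameterized": False}
index_scan = {"parameterized": True}
ok = set_append_path_locus([seq_scan], "SingleQE")
bad = set_append_path_locus([seq_scan, index_scan], "Entry")
```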
…063)

This patch adds support for describing auxiliary tables. e.g.,

```
postgres=# create table foo(i int) with (appendonly=true);

postgres=# create index on foo(i);
CREATE INDEX

postgres=# select blkdirrelid::regclass from pg_appendonly where relid = 'foo'::regclass;
        blkdirrelid
----------------------------
 pg_aoseg.pg_aoblkdir_16411
(1 row)

postgres=# \d+ pg_aoseg.pg_aoblkdir_16411
Appendonly block directory table: "pg_aoseg.pg_aoblkdir_16411"
     Column     |  Type   | Storage
----------------+---------+---------
 segno          | integer | plain
 columngroup_no | integer | plain
 first_row_no   | bigint  | plain
 minipage       | bytea   | plain
Indexes:
    "pg_aoblkdir_16411_index" PRIMARY KEY, btree (segno, columngroup_no, first_row_no)

postgres=# \d+ pg_aoseg.pg_aovisimap_16411
Appendonly visibility map table: "pg_aoseg.pg_aovisimap_16411"
    Column    |  Type   | Storage
--------------+---------+---------
 segno        | integer | plain
 first_row_no | bigint  | plain
 visimap      | bytea   | plain
Indexes:
    "pg_aovisimap_16411_index" PRIMARY KEY, btree (segno, first_row_no)

postgres=# \d+ pg_aoseg.pg_aoseg_16411
Appendonly segment entry table: "pg_aoseg.pg_aoseg_16411"
     Column      |   Type   | Storage
-----------------+----------+---------
 segno           | integer  | plain
 eof             | bigint   | plain
 tupcount        | bigint   | plain
 varblockcount   | bigint   | plain
 eofuncompressed | bigint   | plain
 modcount        | bigint   | plain
 formatversion   | smallint | plain
 state           | smallint | plain
```
In Postgres, multiple recursive self-references are disallowed by the SQL
spec, and inlining of multiply-referenced CTEs with outer recursive refs
is prevented, so there is only one WorkTable correlated to each
RecursiveUnion. But GPDB does not support the CTE scan plan, so if a
WITH RECURSIVE query has multiply-referenced CTEs with outer recursive
refs, multiple WorkTable scans correlate to one RecursiveUnion. They
share the same read pointer of the working table and therefore lose
data. Create a read pointer for each WorkTable scan so that they do not
influence each other.
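The read-pointer problem can be illustrated with a Python analogue (WorkTable and WorkTableScan here are toy stand-ins, not the executor's real data structures):

```python
class WorkTable:
    # Stand-in for the recursive union's working table: the rows are
    # shared, but each scan can hold its own read position.
    def __init__(self, rows):
        self.rows = rows

class WorkTableScan:
    def __init__(self, table, shared_pos=None):
        self.table = table
        # A private read pointer starts at 0; passing one shared
        # mutable position reproduces the old buggy behaviour.
        self.pos = shared_pos if shared_pos is not None else [0]

    def next(self):
        if self.pos[0] >= len(self.table.rows):
            return None
        row = self.table.rows[self.pos[0]]
        self.pos[0] += 1
        return row

wt = WorkTable([1, 2, 3, 4])

# Buggy: two scans share one read pointer, so each sees only part of
# the working table.
shared = [0]
a, b = WorkTableScan(wt, shared), WorkTableScan(wt, shared)
seen_a = [a.next(), a.next()]
seen_b = [b.next(), b.next()]

# Fixed: each scan gets its own read pointer and reads everything.
c, d = WorkTableScan(wt), WorkTableScan(wt)
seen_c = [c.next() for _ in range(4)]
seen_d = [d.next() for _ in range(4)]
```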

In Postgres, CTE SubPlans can be generated for WITH subqueries; if a CTE
subplan has outer recursive refs to an outer recursive CTE, a WorkTable
will exist in the subplan. State information for subplans is initialized
before the main query tree, so there are corner cases where the
WorkTable scan's init call happens before the RecursiveUnion's. In GPDB,
we have no CTE scan node, so we do not generate CTE subplans for WITH
subqueries, and we never call ExecInitWorkTableScan() before
ExecInitRecursiveUnion(). Set rustate in the INIT step.

Since ShareInputScan with outer refs is not supported by GPDB, a CTE
that contains outer self-references also needs to be inlined.

Remove the GPDB_12_MERGE_FIXME in subselect.sql which describes the issue.
gpcheckcat currently flags default privilege objects as having missing
dependencies. This is because it searches for dependencies of
pg_default_acl oids in pg_depend instead of in pg_shdepend. The fix is
to recognize pg_default_acl as a table outside the purview of pg_depend.

Minimal repro:

postgres=# CREATE ROLE foo;
postgres=# ALTER DEFAULT PRIVILEGES FOR ROLE foo REVOKE EXECUTE ON FUNCTIONS FROM PUBLIC;
postgres=# SELECT oid, * FROM pg_default_acl;
  oid  | defaclrole | defaclnamespace | defaclobjtype |  defaclacl
-------+------------+-----------------+---------------+-------------
 17304 |      17303 |               0 | f             | {foo=X/foo}
(1 row)

postgres=# SELECT * FROM pg_shdepend WHERE objid = 17304;
 dbid  | classid | objid | objsubid | refclassid | refobjid | deptype
-------+---------+-------+----------+------------+----------+---------
 12812 |     826 | 17304 |        0 |       1260 |    17303 | o
(2 rows)

$ gpcheckcat postgres
...
Object oid: 17304
Table name: pg_default_acl

Name of test which found this issue: dependency_pg_default_acl
Table pg_default_acl has a dependency issue on oid 17304 at content -1
Table pg_default_acl has a dependency issue on oid 17304 at content 0
Table pg_default_acl has a dependency issue on oid 17304 at content 1
Table pg_default_acl has a dependency issue on oid 17304 at content 2
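The shape of the fix can be sketched as follows; the catalog names are real, but the lookup logic is a simplified Python illustration of what gpcheckcat does, not its actual code:

```python
# Toy dependency data mirroring the repro above: the pg_default_acl row
# (oid 17304) has its dependency recorded in pg_shdepend, not pg_depend.
PG_DEPEND = {}                     # oid -> referenced oid (regular deps)
PG_SHDEPEND = {17304: 17303}       # oid -> referenced role oid (shared deps)

# Catalogs whose dependencies live in pg_shdepend rather than pg_depend.
SHARED_DEP_CATALOGS = {"pg_default_acl"}

def has_dependency(catalog, oid):
    if catalog in SHARED_DEP_CATALOGS:
        return oid in PG_SHDEPEND  # fixed: consult pg_shdepend
    return oid in PG_DEPEND        # old code looked only here -> false alarm

flagged = not has_dependency("pg_default_acl", 17304)
```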
The report_progress function may take a NULL argument
for cluster, but did not check it before trying to
print cluster->major_version_str, resulting in a core
dump.

Add a check before the fprintf call, which also resolves a FIXME.

Also use --progress flag in pg_upgrade regression test.

Authored-by: Brent Doil <[email protected]>
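A minimal Python analogue of the C fix (the dict stands in for the ClusterInfo struct, and the real function prints via fprintf rather than returning a string):

```python
def report_progress(cluster, message):
    # Guard the optional cluster argument before dereferencing it; the
    # C code previously passed cluster->major_version_str to fprintf
    # unconditionally and dumped core when cluster was NULL.
    prefix = cluster["major_version_str"] if cluster is not None else ""
    return f"{prefix} {message}".strip()
```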
@my-ship-it my-ship-it merged commit 8d20831 into apache:main Dec 26, 2024
31 of 32 checks passed