
[2/2]Cherry-Pick Fix gpcheckcat false alarms for pg_default_acl etc #814

Merged
merged 13 commits into apache:main on Dec 26, 2024

Conversation

@yjhjstz (Member) commented Dec 25, 2024

Fixes #ISSUE_Number

What does this PR do?

Type of Change

  • Bug fix (non-breaking change)
  • New feature (non-breaking change)
  • Breaking change (fix or feature with breaking changes)
  • Documentation update

Breaking Changes

Test Plan

  • Unit tests added/updated
  • Integration tests added/updated
  • Passed make installcheck
  • Passed make -C src/test installcheck-cbdb-parallel

Impact

Performance:

User-facing changes:

Dependencies:

Checklist

Additional Context

CI Skip Instructions


@yjhjstz yjhjstz added the cherry-pick cherry-pick upstream commts label Dec 25, 2024
@yjhjstz yjhjstz changed the title Cherry-Pick Fix gpcheckcat false alarms for pg_default_acl etc [2/2]Cherry-Pick Fix gpcheckcat false alarms for pg_default_acl etc Dec 25, 2024
my-ship-it previously approved these changes Dec 25, 2024
@my-ship-it my-ship-it requested a review from avamingli December 25, 2024 09:49
avamingli previously approved these changes Dec 25, 2024
huansong and others added 13 commits December 26, 2024 07:39
In ATExecAddColumn(), where we find child tables to apply a flag,
we used to open the tables without a lock. That is because we had
already acquired a lock on the parent and implicitly assumed that no
one else could alter the child tables. That is still correct, but
upstream commit 4b21acf introduced the expectation that all callers
of relation_open() have acquired a lock, except in bootstrap mode.
I do not think it is important to keep using NoLock and work around
that assert in relation_open() in some way. Therefore, resolve the
FIXME as it is.
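The expectation can be sketched with a Python analogue (relation_open, the lock-mode constants, and the bootstrap flag are simplified stand-ins for the PostgreSQL internals, not real APIs):

```python
NO_LOCK = 0
ACCESS_SHARE_LOCK = 1

in_bootstrap_mode = False

def relation_open(relid, lockmode):
    # Upstream commit 4b21acf added the expectation that every caller
    # passes a real lock mode unless running in bootstrap mode.
    assert lockmode != NO_LOCK or in_bootstrap_mode, \
        "relation_open() callers must acquire a lock"
    return {"relid": relid, "lockmode": lockmode}

# The resolution of the FIXME: open child tables with a real lock mode
# instead of NoLock, even though the parent's lock already excludes
# concurrent ALTERs on the children.
child = relation_open(16384, ACCESS_SHARE_LOCK)
```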
Issue:
gpinitsystem is not working with the debug option -D and throws errors as below:
20220715:11:44:09:048391 gpinitsystem:-[INFO]:-End Function PING_HOST
/usr/local/gpdb6/bin/lib/gp_bash_functions.sh: line 1051: [: 2>/dev/null: integer expression expected
/usr/local/gpdb6/bin/lib/gp_bash_functions.sh: line 1051: [: REMOTE_EXECUTE_AND_GET_OUTPUT: integer expression expected
/usr/local/gpdb6/bin/lib/gp_bash_functions.sh: line 1055: [: too many arguments
/usr/local/gpdb6/bin/lib/gp_bash_functions.sh: line 1060: [: too many arguments
/usr/local/gpdb6/bin/lib/gp_bash_functions.sh: line 1066: [: too many arguments
/usr/local/gpdb6/bin/lib/gp_bash_functions.sh: line 1079: [: too many arguments
20220715:11:44:09:048391 gpinitsystem:-[INFO]:-End Function GET_PG_PID_ACTIVE

Cause:
When gpinitsystem is run with the debug option, all "LOG_MSG" statements log (echo) messages to both STDOUT (console) and the log file (per the LOG_MSG function definition in gpdb/bin/lib/gp_bash_functions.sh). This causes the "REMOTE_EXECUTE_AND_GET_OUTPUT" function to echo multiple messages, while only one string is expected to be returned from it, hence the errors.
When gpinitsystem is run without the debug option, there is no issue, because "LOG_MSG" then logs messages to the log file only.

Fix:
Save the initial DEBUG_LEVEL, turn off debugging at the start of the function, and reset DEBUG_LEVEL to INITIAL_DEBUG_LEVEL at the end. All the LOG_MSG statements then log messages only to the log file instead of echoing them on STDOUT.

Added a behave scenario for the above changes.
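The save/restore pattern described in the fix can be sketched with a Python analogue (the names mirror the shell code in gp_bash_functions.sh, but this is an illustration, not the actual script):

```python
DEBUG_LEVEL = 1  # 1 = debug on: log_msg also echoes to stdout

captured = []

def log_msg(msg):
    # In debug mode this would also echo to stdout, polluting any
    # output captured from the function below.
    if DEBUG_LEVEL == 1:
        captured.append(msg)

def remote_execute_and_get_output():
    # Save the caller's debug setting and silence log_msg for the
    # duration of this function, so only the real result is returned.
    global DEBUG_LEVEL
    initial_debug_level = DEBUG_LEVEL
    DEBUG_LEVEL = 0
    log_msg("Start Function REMOTE_EXECUTE_AND_GET_OUTPUT")
    result = "remote-result"  # the single expected output string
    log_msg("End Function REMOTE_EXECUTE_AND_GET_OUTPUT")
    DEBUG_LEVEL = initial_debug_level
    return result

print(remote_execute_and_get_output())  # only the result, no log noise
```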
The LSN output from pg_switch_wal() is commonly used with
pg_walfile_name(). However, the set of LSNs output by gp_switch_wal()
cannot be used with pg_walfile_name(), because the coordinator and the
segments very likely have different timeline ids. To guarantee
users/developers a correct WAL segment filename, we bake it into
gp_switch_wal() itself. A separate catalog function would create a
window in which an HA failover changes the timeline ids, and we would
have the same problem all over again.

We also convert the function into a C function to guarantee that all
the function calls in gp_switch_wal() are called on the correct
segments. Using the EXECUTE ON SEGMENT/COORDINATOR combo with a UNION
was observed to have known issues when dealing with segment-specific
values (specifically the coordinator) and redistribute motion plans.

Issue example:
```
postgres=# SELECT gp_segment_id, last_archived_wal FROM gp_stat_archiver ORDER BY gp_segment_id;
 gp_segment_id |    last_archived_wal
---------------+--------------------------
            -1 | 000000070000000300000033
             0 | 000000060000000200000039
             1 | 00000006000000020000003B
             2 | 00000006000000020000003D
(4 rows)

postgres=# create table testtable(a int);
CREATE TABLE

-- the walfilenames all have the coordinator's timeline id in them
postgres=# SELECT gp_segment_id, pg_switch_wal, pg_walfile_name(pg_switch_wal) AS walfilename FROM gp_switch_wal() ORDER BY gp_segment_id;
 gp_segment_id | pg_switch_wal |       walfilename
---------------+---------------+--------------------------
            -1 | 3/D0042DB0    | 000000070000000300000034
             0 | 2/E8042230    | 00000007000000020000003A
             1 | 2/F0042230    | 00000007000000020000003C
             2 | 2/F8042230    | 00000007000000020000003E
(4 rows)

-- these are the expected WAL segment filenames with correct timeline id
postgres=# SELECT gp_segment_id, last_archived_wal FROM gp_stat_archiver ORDER BY gp_segment_id;
 gp_segment_id |    last_archived_wal
---------------+--------------------------
            -1 | 000000070000000300000034
             0 | 00000006000000020000003A
             1 | 00000006000000020000003C
             2 | 00000006000000020000003E
(4 rows)
```
Considering that the original command of a SplitUpdate is an UPDATE,
firing INSERT and DELETE triggers may lead to the wrong action being
enforced. Also, triggers in GPDB may require cross-segment data changes,
so disallow INSERT and DELETE triggers on a SplitUpdate.
We used to always dispatch in DefineIndex(), but now we do not if
it is part of ALTER TABLE command execution, so we no longer need to
skip DefineIndex() for QEs. Confirmed, and removed this FIXME.
This issue no longer seems relevant on recent versions of clang.
I was able to build with the configure flag --with-llvm without any
issue on clang 10.0.0.
The test started failing after a change to `gp_switch_wal()` was
made, because the db_size_functions regress test was not updated
accordingly. Fix the test by simply updating the expected output
(a matchsub was added to the test as well).

GPDB commit reference:
https://github.com/greenplum-db/gpdb/commit/d3163467b018082b1b36d70f94fe95e59898b6e0
* Fix a panic case in the postgres_fdw test.

It happens in the below query:
	  select foo.f1, loct1.f1 from foo join loct1 on (foo.f1 = loct1.f1) order by foo.f2 offset 10 limit 10;

The DDL statements are:
```
CREATE EXTENSION greenplum_fdw;
CREATE SERVER loopback FOREIGN DATA WRAPPER greenplum_fdw OPTIONS (host 'localhost', port '15432', dbname 'gpadmin');
CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;

create table foo (f1 int, f2 int);
create table loct1 (f1 int, f2 int, f3 int);
create foreign table foo2 (f3 int) inherits (foo)
  server loopback options (table_name 'loct1', use_remote_estimate 'true');
```

The reason is that set_append_path_locus()->cdbpath_create_motion_path()
returns NULL (i.e. it cannot create a motion to reach the target locus
requirement), but set_append_path_locus() does not handle this bad case
and panics when accessing subpath->sameslice_relids while subpath is
NULL. This happens because in the above query, foo has a foreign-table
child (and additionally a local heap-table child), and the append node
tries to create parameterized paths (one is an index scan path and the
other is the foreign scan path; the target locus is Entry). This occurs
when use_remote_estimate is set on the foreign table: per the
postgresGetForeignPaths() code, a parameterized path for a foreign scan
is only considered when that option is set.

Fix this by returning a NULL path when set_append_path_locus() returns
false. Callers should then take care of the NULL-return cases.

Co-authored-by: Huiliang.liu <[email protected]>
Co-authored-by: Paul Guo <[email protected]>
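The control flow of the fix can be sketched in Python (cdbpath_create_motion_path, the locus names, and the path dicts are simplified stand-ins for the planner's C structures, not the real API):

```python
def cdbpath_create_motion_path(subpath, target_locus):
    # Returns None when no motion can bring the subpath to the target
    # locus -- e.g. the parameterized foreign-scan path above, whose
    # target locus is Entry.
    if subpath.get("parameterized") and target_locus == "Entry":
        return None
    return {"subpath": subpath, "locus": target_locus}

def set_append_path_locus(subpaths, target_locus):
    new_subpaths = []
    for subpath in subpaths:
        motion = cdbpath_create_motion_path(subpath, target_locus)
        if motion is None:
            # Previously the code went on to dereference the NULL path
            # (subpath->sameslice_relids) and panicked; now the failure
            # is propagated so callers can handle it.
            return None
        new_subpaths.append(motion)
    return new_subpaths

seq_scan = {"parameterized": False}
index_scan = {"parameterized": True}
ok = set_append_path_locus([seq_scan], "SingleQE")
bad = set_append_path_locus([seq_scan, index_scan], "Entry")
```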
…063)

This patch adds support for describing auxiliary tables. e.g.,

```
postgres=# create table foo(i int) with (appendonly=true);

postgres=# create index on foo(i);
CREATE INDEX

postgres=# select blkdirrelid::regclass from pg_appendonly where relid = 'foo'::regclass;
        blkdirrelid
----------------------------
 pg_aoseg.pg_aoblkdir_16411
(1 row)

postgres=# \d+ pg_aoseg.pg_aoblkdir_16411
Appendonly block directory table: "pg_aoseg.pg_aoblkdir_16411"
     Column     |  Type   | Storage
----------------+---------+---------
 segno          | integer | plain
 columngroup_no | integer | plain
 first_row_no   | bigint  | plain
 minipage       | bytea   | plain
Indexes:
    "pg_aoblkdir_16411_index" PRIMARY KEY, btree (segno, columngroup_no, first_row_no)

postgres=# \d+ pg_aoseg.pg_aovisimap_16411
Appendonly visibility map table: "pg_aoseg.pg_aovisimap_16411"
    Column    |  Type   | Storage
--------------+---------+---------
 segno        | integer | plain
 first_row_no | bigint  | plain
 visimap      | bytea   | plain
Indexes:
    "pg_aovisimap_16411_index" PRIMARY KEY, btree (segno, first_row_no)

postgres=# \d+ pg_aoseg.pg_aoseg_16411
Appendonly segment entry table: "pg_aoseg.pg_aoseg_16411"
     Column      |   Type   | Storage
-----------------+----------+---------
 segno           | integer  | plain
 eof             | bigint   | plain
 tupcount        | bigint   | plain
 varblockcount   | bigint   | plain
 eofuncompressed | bigint   | plain
 modcount        | bigint   | plain
 formatversion   | smallint | plain
 state           | smallint | plain
```
In Postgres, multiple recursive self-references are disallowed by the SQL
spec, and inlining of multiply-referenced CTEs with outer recursive refs
is prevented, so there is only one WorkTable correlated to each
RecursiveUnion. But GPDB does not support the CTE scan plan, so if a
WITH RECURSIVE query has multiply-referenced CTEs with outer recursive
refs, multiple WorkTable scans correlate to one RecursiveUnion. They
share the same read pointer of the working table and therefore lose
data. Create a read pointer for each WorkTable scan so that they do not
influence each other.
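The read-pointer problem can be illustrated with a Python analogue (WorkTable and WorkTableScan here are toy stand-ins, not the executor's real data structures):

```python
class WorkTable:
    # Stand-in for the recursive union's working table: the rows are
    # shared, but each scan can hold its own read position.
    def __init__(self, rows):
        self.rows = rows

class WorkTableScan:
    def __init__(self, table, shared_pos=None):
        self.table = table
        # A private read pointer starts at 0; passing one shared
        # mutable position reproduces the old buggy behaviour.
        self.pos = shared_pos if shared_pos is not None else [0]

    def next(self):
        if self.pos[0] >= len(self.table.rows):
            return None
        row = self.table.rows[self.pos[0]]
        self.pos[0] += 1
        return row

wt = WorkTable([1, 2, 3, 4])

# Buggy: two scans share one read pointer, so each sees only part of
# the working table.
shared = [0]
a, b = WorkTableScan(wt, shared), WorkTableScan(wt, shared)
seen_a = [a.next(), a.next()]
seen_b = [b.next(), b.next()]

# Fixed: each scan gets its own read pointer and reads everything.
c, d = WorkTableScan(wt), WorkTableScan(wt)
seen_c = [c.next() for _ in range(4)]
seen_d = [d.next() for _ in range(4)]
```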

In Postgres, CTE SubPlans can be generated for WITH subqueries; if a CTE
subplan has outer recursive refs to an outer recursive CTE, a WorkTable
will exist in the subplan. State information for subplans is initialized
before the main query tree, so there are corner cases where the
WorkTable scan's init call happens before the RecursiveUnion's. In GPDB,
we have no CTE scan node, so we do not generate CTE subplans for WITH
subqueries, and we never call ExecInitWorkTableScan() before
ExecInitRecursiveUnion(). Set rustate in the INIT step.

Since ShareInputScan with outer refs is not supported by GPDB, a CTE
that contains outer self-references also needs to be inlined.

Remove the GPDB_12_MERGE_FIXME in subselect.sql which describes the issue.
gpcheckcat currently flags default privilege objects as having missing
dependencies. This is because it searches for dependencies of
pg_default_acl oids in pg_depend instead of in pg_shdepend. The fix is
to recognize pg_default_acl as a table outside the purview of pg_depend.

Minimal repro:

postgres=# CREATE ROLE foo;
postgres=# ALTER DEFAULT PRIVILEGES FOR ROLE foo REVOKE EXECUTE ON FUNCTIONS FROM PUBLIC;
postgres=# SELECT oid, * FROM pg_default_acl;
  oid  | defaclrole | defaclnamespace | defaclobjtype |  defaclacl
-------+------------+-----------------+---------------+-------------
 17304 |      17303 |               0 | f             | {foo=X/foo}
(1 row)

postgres=# SELECT * FROM pg_shdepend WHERE objid = 17304;
 dbid  | classid | objid | objsubid | refclassid | refobjid | deptype
-------+---------+-------+----------+------------+----------+---------
 12812 |     826 | 17304 |        0 |       1260 |    17303 | o
(2 rows)

$ gpcheckcat postgres
...
Object oid: 17304
Table name: pg_default_acl

Name of test which found this issue: dependency_pg_default_acl
Table pg_default_acl has a dependency issue on oid 17304 at content -1
Table pg_default_acl has a dependency issue on oid 17304 at content 0
Table pg_default_acl has a dependency issue on oid 17304 at content 1
Table pg_default_acl has a dependency issue on oid 17304 at content 2
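The shape of the fix can be sketched as follows; the catalog names are real, but the lookup logic is a simplified Python illustration of what gpcheckcat does, not its actual code:

```python
# Toy dependency data mirroring the repro above: the pg_default_acl row
# (oid 17304) has its dependency recorded in pg_shdepend, not pg_depend.
PG_DEPEND = {}                     # oid -> referenced oid (regular deps)
PG_SHDEPEND = {17304: 17303}       # oid -> referenced role oid (shared deps)

# Catalogs whose dependencies live in pg_shdepend rather than pg_depend.
SHARED_DEP_CATALOGS = {"pg_default_acl"}

def has_dependency(catalog, oid):
    if catalog in SHARED_DEP_CATALOGS:
        return oid in PG_SHDEPEND  # fixed: consult pg_shdepend
    return oid in PG_DEPEND        # old code looked only here -> false alarm

flagged = not has_dependency("pg_default_acl", 17304)
```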
The report_progress function may take a NULL argument
for cluster, but did not check it before trying to
print cluster->major_version_str, resulting in a core
dump.

Add a check before the fprintf call, which also resolves a FIXME.

Also use --progress flag in pg_upgrade regression test.

Authored-by: Brent Doil <[email protected]>
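A minimal Python analogue of the C fix (the dict stands in for the ClusterInfo struct, and the real function prints via fprintf rather than returning a string):

```python
def report_progress(cluster, message):
    # Guard the optional cluster argument before dereferencing it; the
    # C code previously passed cluster->major_version_str to fprintf
    # unconditionally and dumped core when cluster was NULL.
    prefix = cluster["major_version_str"] if cluster is not None else ""
    return f"{prefix} {message}".strip()
```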
@my-ship-it my-ship-it merged commit 8d20831 into apache:main Dec 26, 2024
31 of 32 checks passed