Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[1-dimensional-data-modeling] pg_restore not idempotent? #133

Open
gitgithan opened this issue Nov 18, 2024 · 2 comments
Open

[1-dimensional-data-modeling] pg_restore not idempotent? #133

gitgithan opened this issue Nov 18, 2024 · 2 comments

Comments

@gitgithan
Copy link

gitgithan commented Nov 18, 2024

I noticed running pg_restore -U user -d postgres data.dump repeatedly causes some tables to grow in size.
I expected tables to be the same no matter how many restores are done.

This would be confusing users later if they don't know this can happen.
Users may restore multiple times when retrying failed attempts.

The 2 images show 1 restore apart. Top section is local, bottom section is docker.
imageimage

I'm guessing duplicated data is being inserted after multiple restores.

Is docker non-idempotent behaviour explained at Initialization scripts section of https://hub.docker.com/_/postgres?

Warning: scripts in /docker-entrypoint-initdb.d are only run if you start the container with a data directory that is empty; any pre-existing database will be left untouched on container startup. One common problem is that if one of your /docker-entrypoint-initdb.d scripts fails (which will cause the entrypoint script to exit) and your orchestrator restarts the container with the already initialized data directory, it will not continue on with your scripts.

@gitgithan
Copy link
Author

gitgithan commented Nov 19, 2024

I propose using pg_restore -c --if-exists -U user -d postgres data.dump

--if-exists caters to the 1st run of pg_restore

From man,

   -c
   --clean
       Before restoring database objects, issue commands to DROP all the objects that will be restored. This option is
       useful for overwriting an existing database. If any of the objects do not exist in the destination database,
       ignorable error messages will be reported, unless --if-exists is also specified.

From right to left,
3 pg_restore without -c --if-exists, followed by 1 pg_restore with. To show that the sizes go back to the original size instead of duplicating data.

image

This non-idempotent issue has caused errors that made me second guess whether the restore succeeded.
With a -c during pg_restore, these errors should not appear and give users more confidence the restore works.

--------TRUNCATED--------
pg_restore: from TOC entry 3227; 2606 16435 FK CONSTRAINT game_details game_details_game_id_fkey postgres
pg_restore: error: could not execute query: ERROR:  constraint "game_details_game_id_fkey" for relation "game_details" already exists
Command was: ALTER TABLE ONLY public.game_details
    ADD CONSTRAINT game_details_game_id_fkey FOREIGN KEY (game_id) REFERENCES public.games(game_id);


pg_restore: warning: errors ignored on restore: 16

This idea is more relevant to local setups of postgres.
For docker which uses initialization scripts, the 1st post in this issue cites that docker only runs initialization scripts once, so adding -c --if-exists in /docker-entrypoint-initdb.d/init-db.sh seems pointless.
However these 2 options are still useful if users exec into container to restore again if depending on /docker-entrypoint-initdb.d/init-db.sh fails


Extra investigations

I checked output_file.sql from pg_restore -U user -f output_file.sql data.dump
and was surprised to see it drops all constraints and tables, so I expect the restore to be idempotent even without -c or --if-exists. My guess is even though the commands are laid out here, what actually gets run still gets affected by flags used or missing during pg_restore, so I bet these 4 + 9 statements will not run unless -c and --if-exists is used during restore

ALTER TABLE IF EXISTS ONLY public.game_details DROP CONSTRAINT IF EXISTS game_details_game_id_fkey;
ALTER TABLE IF EXISTS ONLY public.player_seasons DROP CONSTRAINT IF EXISTS player_seasons_pkey;
ALTER TABLE IF EXISTS ONLY public.games DROP CONSTRAINT IF EXISTS games_pkey;
ALTER TABLE IF EXISTS ONLY public.actor_films DROP CONSTRAINT IF EXISTS actor_films_pkey;
DROP TABLE IF EXISTS public.users;
DROP TABLE IF EXISTS public.teams;
DROP TABLE IF EXISTS public.player_seasons;
DROP TABLE IF EXISTS public.games;
DROP TABLE IF EXISTS public.game_details;
DROP TABLE IF EXISTS public.events;
DROP TABLE IF EXISTS public.devices;
DROP TABLE IF EXISTS public.arena;
DROP TABLE IF EXISTS public.actor_films;

@ry-v1
Copy link
Collaborator

ry-v1 commented Dec 14, 2024

@gitgithan - can pls make PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants