Skip to main content

PG Replication on 10 Master / Slave Configuration using traditional cp


How to do Streaming / Replication Setup on PG :
*************************************

Stage 1 : Note down, In this Stage 1, The below example shows that Primary and Standby replication only..No fail over & Fail Back..
          For failover , refer stage 2 and fail back refer stage 3 .
Primary :

Primary Side  : 192.168.100.11
Secondary   : 192.168.100.12


1. CREATE USER replicator WITH REPLICATION ENCRYPTED PASSWORD 'replicator';

2. Edit postgresql.conf as below
ALTER SYSTEM SET wal_level TO 'replica';    Must be at least set to hot_standby  until version 9.5 or replica
ALTER SYSTEM SET archive_mode TO 'ON';
ALTER SYSTEM SET max_wal_senders TO '3';
ALTER SYSTEM SET wal_keep_segments TO '10';
ALTER SYSTEM SET listen_addresses TO '*';
ALTER SYSTEM SET hot_standby TO 'ON';
ALTER SYSTEM SET archive_command TO 'cp %p /mnt/server/archivedir/%f';

 You may not need archive_mode or archive_command or restore_command for streaming replication. But, when you setup streaming replication and due to a network lag or whatever reason, if the Slave is falling behind, also if the WALs in pg_xlog or pg_wal are recycled, without archiving the WALs the slave can never get back to sync with Master. Thus, we made sure to ensure we suggest the best practices while building streaming replication, rather than just building it. In every Production environment, these parameters are a must.

3. Add the below on pg_hba.conf file.

host replication replicator 192.168.100.12/32 md5

4. As a postgres user, export PATH=$PATH:/usr/pgsql-10/bin

pg_ctl -D /var/lib/pgsql/10/data restart -mf

Standby Server Side :
=====================

Remove existing custom ts directories or Data directory ..

pg_basebackup -h 192.168.100.11 -U replicator -p 5432 -D /var/lib/pgsql/10/data -P -Xs -R


pg_basebackup -D /var/lib/postgresql/10/replica1 -U rep_user -w -P -R # -X stream



-R : Creates recovery.conf on standby which is mandatory for recovery process.

standby_mode = 'on'
primary_conninfo = 'host=192.168.0.28 port=5432 user=replicator password=replicator'
restore_command = 'cp /path/to/archive/%f %p'
archive_cleanup_command = 'pg_archivecleanup /path/to/archive %r'

pg_ctl -D /var/lib/pgsql/10/data start

One way of checking how up to date is our replication is by checking the amount of WAL records generated in the primary, but not yet applied in the standby.

Validation :

On Master
==========
$ ps -eaf | grep sender

You can see more details by querying the master’s pg_stat_replication view.

Shell

$ psql
postgres=# \x
Expanded display is on.
postgres=# select * from pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
pid              | 1287
usesysid         | 24615
usename          | replicator
application_name | walreceiver
client_addr      | 192.168.0.28
client_hostname  |
client_port      | 36924
backend_start    | 2018-09-07 10:40:48.074496-04
backend_xmin     |
state            | streaming
sent_lsn         | 0/50000D68
write_lsn        | 0/50000D68
flush_lsn        | 0/50000D68
replay_lsn       | 0/50000D68
write_lag        |
flush_lag        |
replay_lag       |
sync_priority    | 0
sync_state       | async


On Slave
==========
$ ps -eaf | grep receiver
$ ps -eaf | grep startup


1. goto production db, create one db and check the same in secondary node.

postgres=# SELECT pg_current_wal_lsn();
pg_current_wal_lsn
--------------------
0/50002C8
(1 row)

Slave:


postgres=# SELECT pg_last_wal_receive_lsn();
pg_last_wal_receive_lsn
-------------------------
0/50002C8
(1 row)

postgres=# SELECT pg_last_wal_replay_lsn();
pg_last_wal_replay_lsn
------------------------
0/50002C8
(1 row)
Note: These functions are for PostgreSQL 10. For previous versions, you need to use: SELECT pg_last_xlog_receive_location(); and SELECT pg_last_xlog_replay_location();

We can use the following query to get the lag in seconds.

PostgreSQL 10:


SELECT CASE WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn()
THEN 0
ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())
END AS log_delay;
Previous Versions:


SELECT CASE WHEN pg_last_xlog_receive_location() = pg_last_xlog_replay_location()
THEN 0
ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())
END AS log_delay;

Ref : https://www.percona.com/blog/2018/09/07/setting-up-streaming-replication-postgresql/

STAGE 2 : The below is the continuation of Stage 1. Now standby is getting data from production.
          the below is going to show that, We are going to crash the primary and do testing for Fail Over. ( Means : Standby to Primary conversion )
 
Ref : https://blog.2ndquadrant.com/pgrewind-and-pg95/
 
  Step 1 :  List out the background process running on primary...
 
  bash-4.2$  ps -ef|grep -i post
root      1285  1268  0 19:14 pts/0    00:00:00 sudo su postgres
root      1286  1285  0 19:14 pts/0    00:00:00 su postgres
postgres  1288  1286  0 19:14 pts/0    00:00:00 bash
postgres  2358     1  0 19:35 pts/0    00:00:00 /usr/pgsql-10/bin/postgres -D /var/lib/pgsql/10/data
postgres  2360  2358  0 19:35 ?        00:00:00 postgres: logger process
postgres  2365  2358  0 19:35 ?        00:00:00 postgres: checkpointer process
postgres  2366  2358  0 19:35 ?        00:00:00 postgres: writer process
postgres  2367  2358  0 19:35 ?        00:00:00 postgres: stats collector process
postgres  3227  2358  0 19:47 ?        00:00:00 postgres: wal writer process
postgres  3228  2358  0 19:47 ?        00:00:00 postgres: autovacuum launcher process
postgres  3229  2358  0 19:47 ?        00:00:00 postgres: archiver process   last was 00000002.history
postgres  3230  2358  0 19:47 ?        00:00:00 postgres: bgworker: logical replication launcher
postgres 12267 12229  0 22:43 ?        00:00:00 postgres: wal sender process repuser 192.168.100.11(57610) streaming 0/12000140 ===> Note Sender Process here.


  step 2 :  List out the background process running on standby...
 
  root      1285  1268  0 19:14 pts/0    00:00:00 sudo su postgres
root      1286  1285  0 19:14 pts/0    00:00:00 su postgres
postgres  1288  1286  0 19:14 pts/0    00:00:00 bash
postgres  2358     1  0 19:35 pts/0    00:00:00 /usr/pgsql-10/bin/postgres -D /var/lib/pgsql/10/data
postgres  2360  2358  0 19:35 ?        00:00:00 postgres: logger process
postgres  2361  2358  0 19:35 ?        00:00:02 postgres: startup process   recovering 00000001000000000000                                                             0010
postgres  2365  2358  0 19:35 ?        00:00:00 postgres: checkpointer process
postgres  2366  2358  0 19:35 ?        00:00:00 postgres: writer process
postgres  2367  2358  0 19:35 ?        00:00:00 postgres: stats collector process
postgres 15007 14998  0 22:43 ?        00:00:06 postgres: wal receiver process   streaming 0/12000140 ===> Note Receiver  Process here.

Step 3 :   On Primary execute the command as postgres user ==> pg_ctl -D /var/lib/pgsql/10/data stop  ( assuming that Crashed )


            on Standby execute the below.

pg_ctl -D ${STANDBY1_PGDATA} promote

Note : If you check log file, now it will display to accept dbc connections. Earlier it shows accept read only.

Note : Hereafter, This stdby instance called as Promoted Standby ..Even if you startup the Production DB also, , no changes will be visible from the promoted standby since it is prod now.

Now Promoted standby is the New Master Node ( Primary )

STAGE 3 : Now, We assume that We resolved the disk issues in OLD Primary Server and We have to change the state from OLD Primary to Standby mode.

      Step 1 : Make sure that no processes running on old primary(pgnode1 )
 
pg_ctl -D ${OLD_MASTER_PGDATA} stop

  Step 2 :
 
Run the pg rewind command on OLD primary ( ie., pgnode1 Node ),

/var/lib/pgsql/10/data

bash-4.2<pgnode1>$ pg_rewind --target-pgdata=/var/lib/pgsql/10/data --source-server=" host=pgnode2 port=5432 user=postgres dbname=postgres"
servers diverged at WAL location 0/10000098 on timeline 1
rewinding from last common checkpoint at 0/10000028 on timeline 1
Done!
bash-4.2$



Note : If you get any errors like could not find previous WAL record at 0/501E680, Failure Exiting then

pg_rewind --target-pgdata=${OLD_MASTER_PGDATA} \
    --source-server="port=5433 user=postgres dbname=postgres"
servers diverged at WAL position 0/501E680 on timeline 1
could not open file "/var/lib/pgsql/9.5/master/pg_xlog/000000010000000000000005": No such file or directory

the procedure is to manually copy the missing WALs and put them into the pg_wal/pg_xlog of the master(pgnode2), then run pg_rewind again on pgnode1


  Step 3 : On New Primary Node (Promoted primary node - pgnode2 ), You have to edit pg_hba.conf file to include new standby server IP.
 
 
#local   replication     all                                     peer
local   replication     repuser                                 trust
host    replication     repuser         192.168.100.11/32       md5 =====================> Change the Stanby Node IP ( Now OLD Primary is the standby)
host    replication     all             ::1/128                 ident

  Step 4 : Restart cluster services to take affect pg_ctl -D $PG_DATA restart on Promoted Standby ( Primary Node now )

  Step 4 : On Standby Node ( OLD Primary Node - pgnode1 ),  go to $PG_DATA location and make changes on recovery.conf file.
This file is responsible to know who is the Primary for this standby recovery process.

Rename the resolve.done to resolve.conf file on pgnode1..Change the host IP address. It should match the Actual PROD running now. ( pgnode2 )

bash-4.2$ hostname
pgnode1
bash-4.2$ cat recovery.conf
standby_mode = 'on'
primary_conninfo = 'user=repuser password=repuser123 host=192.168.100.12 port=5432 sslmode=prefer sslcompression=1 krbsrvname=postgres target_session_attrs=any'
restore_command = 'cp /var/lib/pgsql/10/backups/archivedir/%f %p'
recovery_target_timeline = 'latest'
bash-4.2$

Step 5 : On pgnode1 ( standby node ) : Run the below command to start and check the logs.

     pg_ctl -D /var/lib/pgsql/10/data start
 
Note : Now you can check the updates from logs on both New Primary and converted secondary nodes..Create any table on new primary should reflect to converted standby ..

Comments

Popular posts from this blog

How to configure Barman along with One Node Wal Streaming.

How to configure Barman : Step 1 : Remove existing postgres Instllations if you have for safer side. sudo su root ==> yum remove postgres* & yum remove barman* step 2 : Install this on Both Prod & Backup servers. PostgreSQL 11 You can install the 2ndQuadrant's General Public RPM repository for PostgreSQL 11 by running the following instructions as a user with sudo privileges on the destination Linux server: curl https://dl.2ndquadrant.com/default/release/get/11/rpm | sudo bash The above command will run a shell script that will install the repository definition in your server. You can check the content of the script by running: curl https://dl.2ndquadrant.com/default/release/get/11/rpm | less Step 3 : Install EPEL depends on OS version on prod :  CentOS/RHEL - 7 rpm -Uvh https://yum.postgresql.org/11/redhat/rhel-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm yum install postgresql11-server yum install postgresql11-contrib  ===> Th

PG Frequently used DBA Commands

************************************************************************************************************************************************** NOTE :  By default if You connect any DB in the Cluster with out specified User Name, Eventhough those DBs are owned by different users, then internally postgres will be the User and Public is the default Schema Name respective to that. If you create any tables after logging into DB, It will use Public Schema only since this will be the default search path. If you connect db along with particular user then, objects will be created to the respective corresponding schema pointed to the owner. Even you revoke Public Schema access to respective of all users or specified users, By default search path will be public only. But since public is revoked, it will look out who is the owner of the schema to own the object as owner. ALTER ROLE advuser IN DATABASE "Adventureworks"     SET search_path TO adv_s