部署repmgr PG14集群<无autofailover>
系统环境
OS: CentOS7
192.168.56.11 node1
192.168.56.12 node2
部署操作流程
1. 系统环境初始化
echo redhat | passwd root --stdin
echo -e "192.168.56.11 node1 " >>/etc/hosts
echo -e "192.168.56.12 node2 " >>/etc/hosts
sed -i 's/vault.centos.org/mirrors.aliyun.com/g' /etc/yum.repos.d/CentOS*
2. 所有节点安装PG14 <root执行>
sudo yum install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm
sudo yum -y install epel-release yum-utils
sudo yum-config-manager --enable pgdg14
sudo yum install postgresql14-server postgresql14 -y
echo -e " export PATH=/usr/pgsql-14/bin/:$PATH" >>/var/lib/pgsql/.bash_profile
3. 所有节点安装repmgr<root执行>
## on all nodes
sudo yum -y install repmgr_14*
4. 关闭防火墙或者放行PG端口(默认5432)<root执行>
systemctl stop firewalld
5. 主节点创建repmgr账户 , pg_hba.cfg 以及postgresql.conf 参数调整 , repmgr.conf文件配置 <PG owner执行>
5.1 创建repmgr
createuser repmgr -P -s
createdb repmgr -O repmgr;
5.2 调整pg_hba.cfg 以及postgresql.conf参数
## -- postgresql.conf
max_wal_senders = 10
max_replication_slots = 10
wal_level = 'hot_standby'
hot_standby = on
archive_mode = on
archive_command = '/bin/true'
shared_preload_libraries = 'repmgr'
## -- pg_hba.cfg
--- 这里采用了trust模式,因此无需.pgpass
local replication repmgr trust
host replication repmgr 127.0.0.1/32 trust
host replication repmgr 192.168.56.11/32 trust
host replication repmgr 192.168.56.12/32 trust
local repmgr repmgr trust
host repmgr repmgr 127.0.0.1/32 trust
host repmgr repmgr 192.168.56.11/32 trust
host repmgr repmgr 192.168.56.12/32 trust
若有需要 .pgpass ,则需要如下配置:su - postgres
cd /var/lib/pgsql
touch .pgpass
chmod 0600 .pgpass
echo -e "192.168.56.11:5432:repmgr:repmgr:passwd_123456">>.pgpass
echo -e "192.168.56.12:5432:repmgr:repmgr:passwd_123456">>.pgpass
5.3 repmgr.conf文件配置
#-- 主库
node_id=1
node_name=node1
conninfo='host=node1 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/var/lib/pgsql/14/data/'
#-- 从库
node_id=2
node_name=node2
conninfo='host=node2 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/var/lib/pgsql/14/data/'
6. 启动主节点PG 并 注册为primary server <PG owner执行>
# -- 首先启动主库PG
sudo /usr/pgsql-14/bin/postgresql-14-setup initdb
sudo systemctl enable postgresql-14
sudo systemctl start postgresql-14
# -- 先模拟运行注册
su - postgres
repmgr -f /etc/repmgr/14/repmgr.conf primary register --dry-run
# -- 将上面错误全部解决后,执行注册
repmgr -f /etc/repmgr/14/repmgr.conf primary register
7. 清空备节点PG data目录, 搭建/clone备节点 <PG owner执行>
# - 确保从库PG以及PG 所有data目录是空的,如果不是,则手动删除即可
systemctl stop postgresql-14
rm -rf /var/lib/pgsql/14/data
8. 将备节点注册成standby server 并启动PG实例 <PG owner执行>
repmgr -f /etc/repmgr/14/repmgr.conf standby register --force
9. 校验集群信息
-bash-4.2$ repmgr cluster show
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=192.168.56.11 port=5432 fallback_application_name=repmgr options=-csearch_path="
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=192.168.56.11 port=5432 fallback_application_name=repmgr options=-csearch_path="
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=node2 port=5432 fallback_application_name=repmgr options=-csearch_path="
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=192.168.56.11 port=5432 fallback_application_name=repmgr options=-csearch_path="
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+--------------------------------------------------------------------------
1 | node1 | primary | * running | | default | 100 | 1 | host=192.168.56.11 port=5432 user=repmgr dbname=repmgr connect_timeout=2
2 | node2 | standby | running | node1 | default | 100 | 1 | host=node2 user=repmgr port=5432 dbname=repmgr connect_timeout=2
10. 常见错误
10.1 connection to server at "192.168.56.11", port 5432 failed: Connection refused
# -- 明明PG 实例都启动了,但是死活显示如题,最后排查到 默认情况下 postgresql只监听local,因此修改参数listen-addresses ='*'
-bash-4.2$ repmgr -f /etc/repmgr/14/repmgr.conf primary register --dry-run
INFO: connecting to primary database...
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=192.168.56.11 port=5432 fallback_application_name=repmgr options=-csearch_path="
ERROR: connection to database failed
DETAIL:
connection to server at "192.168.56.11", port 5432 failed: Connection refused
Is the server running on that host and accepting TCP/IP connections?
DETAIL: attempted to connect using:
user=repmgr connect_timeout=2 dbname=repmgr host=192.168.56.11 port=5432 fallback_application_name=repmgr options=-csearch_path=
#-- solution 修改如下配置即可
查看 postgresql.conf
listen-addresses ='*'
10.2.ERROR: "repmgr" is not a superuser and no superuser name supplied
# 需要确保做repmgr的账户拥有超级权限
postgres=# \du
List of roles
Role name | Attributes | Member of
-----------+------------------------------------------------------------+-----------
postgres | Superuser, Create role, Create DB, Replication, Bypass RLS | {}
repmgr | Superuser | {}
10.3. Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
# 系统级别的问题, 配置passwordless时,在节点node1上跑 ssh node2 date会直接拒绝
# - solution
修改如下 PasswordAuthentication = yes
PasswordAuthentication yes
10.4.网络有问题
-- 各种查看,发现没有问题
-bash-4.2$ repmgr -hnode1 -p5432 -U repmgr -d repmgr -f /etc/repmgr/14/repmgr.conf standby clone --dry-run
NOTICE: destination directory "/var/lib/pgsql/14/data" provided
INFO: connecting to source node
DETAIL: connection string is: host=node1 port=5432 user=repmgr dbname=repmgr
ERROR: connection to database failed
DETAIL:
connection to server at "node1" (192.168.56.11), port 5432 failed: No route to host
Is the server running on that host and accepting TCP/IP connections?
-- 最后发现是防火墙需要关掉 - systemctl stop firewalld 或者通过firewall-cmd放行PG 实例端口