Reposted from: http://blog.sina.com.cn/s/blog_773523db0101054f.html
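--1
A long-running query failed with an error.
skytf=> SELECT count(id) from skytf.tbl_info;
ERROR: canceling statement due to conflict with recovery
DETAIL: User query might have needed to see row versions that must be removed.
Note: "skytf.tbl_info" is a large table, with 12 GB of data alone. This count query normally takes about 2 minutes, but each time it was canceled partway through with the error above. Judging from the message, my first guess was that the query, which was running on the standby, conflicted with changes coming from the primary.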
--2 Searching Google turned up the following
Long running queries on the
standby are a bit tricky, because they
might need to see row versions that are already removed on the
master.
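Note: in other words, long-running SQL on a standby node is somewhat tricky, because the query may still need to read row versions that have already been removed on the primary.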
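To confirm that a cancellation like this really comes from a recovery conflict, the standby's conflict counters can be checked. The following is a minimal sketch, assuming PostgreSQL 9.1 or later (where the pg_stat_database_conflicts view exists) and a session connected to the standby. The database name 'skytf' is inferred from the psql prompt and is an assumption.

```sql
-- Returns true when this server is in recovery, i.e. it is a standby.
SELECT pg_is_in_recovery();

-- Per-database counts of queries canceled by recovery conflicts.
-- confl_snapshot counts cancellations caused by old snapshots, which is
-- the case reported as "row versions that must be removed".
SELECT datname, confl_snapshot, confl_lock, confl_bufferpin
FROM pg_stat_database_conflicts
WHERE datname = 'skytf';  -- assumed database name
```

If confl_snapshot increases each time the count query is canceled, the conflict explanation above is very likely the right one.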
--3 Solution: change a parameter
Set the parameter to the following value on the standby:
max_standby_streaming_delay = 300s
max_standby_streaming_delay (integer)
When Hot Standby is active, this parameter determines how long the
standby server should wait before canceling standby queries that
conflict with about-to-be-applied WAL entries, as described in
Section 25.5.2. max_standby_streaming_delay applies when WAL data
is being received via streaming replication. The default is 30
seconds. Units are milliseconds if not specified. A value of -1
allows the standby to wait forever for conflicting queries to
complete. This parameter can only be set in the postgresql.conf
file or on the server command line.
Note that max_standby_streaming_delay is not the same as the
maximum length of time a query can run before cancellation; rather
it is the maximum total time allowed to apply WAL data once it has
been received from the primary server. Thus, if one query has
resulted in significant delay, subsequent conflicting queries will
have much less grace time until the standby server has caught up
again.
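Note: the explanation above is easy to follow. When the standby is serving application queries and a query on the standby conflicts with WAL received from the primary, this parameter determines how long the standby waits for that query. The default is 30 s, so it is no surprise that the count query, which takes about two minutes, was canceled by the standby. The parameter can also be set to -1, which makes the standby wait for the query indefinitely. That is clearly risky: as long as the query does not finish, WAL replay on the standby stays blocked, so the standby does not catch up with the primary until the query completes. The value of 300s was agreed on after discussion with the developers.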
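A minimal sketch of applying and verifying the change, run on the standby. It assumes the line has already been edited into postgresql.conf by hand and that the session is allowed to call pg_reload_conf() (normally a superuser). Because the parameter is read from postgresql.conf, a reload rather than a restart should be enough.

```sql
-- postgresql.conf on the standby (edited beforehand):
--   max_standby_streaming_delay = 300s

-- Ask the server to re-read its configuration files.
SELECT pg_reload_conf();

-- Confirm the value now in effect.
SHOW max_standby_streaming_delay;

-- Rough check of how far replay is behind while a long query runs.
-- NULL means no transaction has been replayed yet; the value also grows
-- when the primary is simply idle, so treat it as an upper bound.
SELECT now() - pg_last_xact_replay_timestamp() AS replay_delay;
```

Watching replay_delay during a long report makes the trade-off visible: the longer conflicting queries are allowed to run, the further the standby's data can lag.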
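--4
Running the count query again:
skytf=> select count(*) from tbl_info;
  count
----------
 88123735
(1 row)
Time: 131068.569 ms
Note: this time it finally completed. The query took a little over two minutes (about 131 seconds), below the 5-minute limit.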
--5 Other suggestions
Another option is to increase vacuum_defer_cleanup_age on the primary
server, so that dead rows will not be cleaned up as quickly as they
normally would be. This will allow more time for queries to execute
before they are cancelled on the standby, without having to set a
high max_standby_streaming_delay. However it is difficult to
guarantee any specific execution-time window with this approach,
since vacuum_defer_cleanup_age is measured in transactions executed
on the primary server.
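Note: the passage above comes from the manual and describes another way to deal with conflicts between the standby and the primary: setting vacuum_defer_cleanup_age. Because this parameter is measured in number of transactions, it is hard to tune in practice, so I did not take this approach.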
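For comparison, a sketch of what the alternative would look like on the primary. The value 100000 is purely illustrative and is not a recommendation from the manual or from this post. Note also that vacuum_defer_cleanup_age was removed in PostgreSQL 16, so this only applies to older releases.

```sql
-- postgresql.conf on the primary (illustrative value only):
--   vacuum_defer_cleanup_age = 100000   -- measured in transactions

-- Re-read the configuration and confirm the setting on the primary.
SELECT pg_reload_conf();
SHOW vacuum_defer_cleanup_age;
```

How much wall-clock time a given value buys depends entirely on the primary's transaction rate, which is why it is hard to map to a query duration such as two minutes.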
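--6 Summary
PostgreSQL Hot Standby is a great feature, but take care when running queries on the standby: if it is not configured appropriately, the standby may cancel queries instead of serving them.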