现象:
ADG无法切换:验证时就报错 ORA-16467
记录问题,顺便展现一次troubleshooting的心路历程。
具体查询:
在主库操作,
@primary
切换验证:
alter database switchover to demorac verify;
报错ORA-16467:
SQL> alter database switchover to demorac verify; alter database switchover to demorac verify * ERROR at line 1: ORA-16467: switchover target is not synchronized
主库alert告警日志:
ORA-16467 signalled during: alter database switchover to demorac verify...
主库传输链路并没有报错:
SQL> select error from v$archive_dest where dest_id= 2; ERROR -----------------------------------------------------------------
但是,如果去查v$archive_dest_status,就会发现问题,说有可解决的GAP:
SQL> select dest_id, status, gap_status from v$archive_dest_status where dest_id = 2; DEST_ID STATUS GAP_STATUS ---------- --------- ------------------------ 2 VALID RESOLVABLE GAP
但是去备库查询,
@standby,
ADG并没有任何延迟:
SQL> select * from v$dataguard_stats; SOURCE_DBID SOURCE_DB_ NAME VALUE UNIT TIME_COMPUTED DATUM_TIME CON_ID ----------- ---------- ---------------------- ---------------------- ------------------------------ ------------------------------ ------------------------------ ---------- 0 transport lag +00 00:00:00 day(2) to second(0) interval 05/11/2023 18:16:43 05/11/2023 18:16:41 0 0 apply lag +00 00:00:00 day(2) to second(0) interval 05/11/2023 18:16:43 05/11/2023 18:16:41 0 0 apply finish time day(2) to second(3) interval 05/11/2023 18:16:43 0 0 estimated startup time 18 second 05/11/2023 18:16:43 0
查MOS文档资料,有一个bug:
可是很快也排除掉:现象细节不完全匹配,另外这个bug正好在19.16已经修复:
[oracle@bogon ~]$ $ORACLE_HOME/OPatch/opatch lsinventory|grep 33663444 29353271, 29706141, 26352569, 33663444, 30476768, 30092280, 30843271
但这个文档给的一些解释中也得到了一些启发:
REDISCOVERY INFORMATION: SELECT GAP_STATUS FROM V$ARCHIVE_DEST_STATUS with show unresolvable gap. Workaround temporarily re-enable the disabled redo thread
首先,我们查V$ARCHIVE_DEST_STATUS已经确认是resolvable gap,不匹配但是也有问题。
然后redo thread的re-enable,这个workaround让我去想到查询redo的thread,结果发现果然有些异常:
SQL> select * from v$log; GROUP# THREAD# SEQUENCE# BYTES BLOCKSIZE MEMBERS ARC STATUS FIRST_CHANGE# FIRST_TIM NEXT_CHANGE# NEXT_TIME CON_ID ---------- ---------- ---------- ---------- ---------- ---------- --- ---------------- ------------- --------- ------------ --------- ---------- 1 1 1132 209715200 512 1 NO CURRENT 44911942 11-MAY-23 9.2954E+18 0 2 1 1130 209715200 512 1 YES INACTIVE 44901958 11-MAY-23 44911931 11-MAY-23 0 3 1 1131 209715200 512 1 YES INACTIVE 44911931 11-MAY-23 44911942 11-MAY-23 0 4 2 0 104857600 512 1 YES UNUSED 0 0 0 5 2 0 104857600 512 1 YES UNUSED 0 0 0
我这里目前主库是单实例,而备库才是RAC,可是,为何主库的redo居然会有 thread#=2 的redo?
虽然都是unused,但出现在这里就很奇怪!
不知道谁动了这个环境,想不起来做过这样的测试。但可以肯定的是,完全可以把这个不该存在的thread删除掉!
直接删除会报错:
alter database drop logfile group 4; alter database drop logfile group 5; SQL> alter database drop logfile group 4; alter database drop logfile group 4 * ERROR at line 1: ORA-01567: dropping log 4 would leave less than 2 log files for instance UNNAMED_INSTANCE_2 (thread 2) ORA-00312: online log 4 thread 2: '/flash/fast_recovery_area/DEMO/onlinelog/o1_mf_4_kxn73zny_.log' SQL> alter database drop logfile group 5; alter database drop logfile group 5 * ERROR at line 1: ORA-01567: dropping log 5 would leave less than 2 log files for instance UNNAMED_INSTANCE_2 (thread 2) ORA-00312: online log 5 thread 2: '/flash/fast_recovery_area/DEMO/onlinelog/o1_mf_5_kxn73zqd_.log'
删除不掉就尝试去先禁用掉这个没有用的thread 2,然后再次尝试删除:
alter database disable thread 2;
SQL> alter database disable thread 2; Database altered. SQL> select * from v$log; GROUP# THREAD# SEQUENCE# BYTES BLOCKSIZE MEMBERS ARC STATUS FIRST_CHANGE# FIRST_TIM NEXT_CHANGE# NEXT_TIME CON_ID ---------- ---------- ---------- ---------- ---------- ---------- --- ---------------- ------------- --------- ------------ --------- ---------- 1 1 1132 209715200 512 1 NO CURRENT 44911942 11-MAY-23 9.2954E+18 0 2 1 1130 209715200 512 1 YES INACTIVE 44901958 11-MAY-23 44911931 11-MAY-23 0 3 1 1131 209715200 512 1 YES INACTIVE 44911931 11-MAY-23 44911942 11-MAY-23 0 4 2 0 104857600 512 1 YES UNUSED 0 0 0 5 2 0 104857600 512 1 YES UNUSED 0 0 0 SQL> alter database drop logfile group 4; Database altered. SQL> alter database drop logfile group 5; Database altered. SQL> SQL> SQL> select * from v$log; GROUP# THREAD# SEQUENCE# BYTES BLOCKSIZE MEMBERS ARC STATUS FIRST_CHANGE# FIRST_TIM NEXT_CHANGE# NEXT_TIME CON_ID ---------- ---------- ---------- ---------- ---------- ---------- --- ---------------- ------------- --------- ------------ --------- ---------- 1 1 1132 209715200 512 1 NO CURRENT 44911942 11-MAY-23 9.2954E+18 0 2 1 1130 209715200 512 1 YES INACTIVE 44901958 11-MAY-23 44911931 11-MAY-23 0 3 1 1131 209715200 512 1 YES INACTIVE 44911931 11-MAY-23 44911942 11-MAY-23 0
删除成功!这次想来总该可以了吧?
SQL> alter database switchover to demorac verify; alter database switchover to demorac verify * ERROR at line 1: ORA-16467: switchover target is not synchronized
额,还是不行,但感觉再刺激刺激就OK了,继续尝试之前的十八般武艺(主要还是多切几次日志):
SQL> select dest_id, status, gap_status from v$archive_dest_status where dest_id = 2; DEST_ID STATUS GAP_STATUS ---------- --------- ------------------------ 2 VALID RESOLVABLE GAP SQL> SQL> alter system switch logfile; System altered. SQL> select dest_id, status, gap_status from v$archive_dest_status where dest_id = 2; DEST_ID STATUS GAP_STATUS ---------- --------- ------------------------ 2 VALID RESOLVABLE GAP SQL> alter system archive log current; System altered. SQL> select dest_id, status, gap_status from v$archive_dest_status where dest_id = 2; DEST_ID STATUS GAP_STATUS ---------- --------- ------------------------ 2 VALID RESOLVABLE GAP SQL> alter system switch logfile; alter system switch logfile; System altered. SQL> System altered. SQL> SQL> SQL> SQL> alter system switch logfile; System altered. SQL> select dest_id, status, gap_status from v$archive_dest_status where dest_id = 2; DEST_ID STATUS GAP_STATUS ---------- --------- ------------------------ 2 VALID NO GAP SQL>
哎呀,直接提示没有GAP了,赶紧尝试继续切换。。
SQL> alter database switchover to demorac verify; alter database switchover to demorac verify * ERROR at line 1: ORA-16475: succeeded with warnings, check alert log for more details SQL> !date Thu May 11 18:36:31 CST 2023
额?成功了,但是有警告,看告警日志又是一个新的ORA-16475 错误。
2023-05-11T18:36:14.906996+08:00 alter database switchover to demorac verify 2023-05-11T18:36:15.094516+08:00 SWITCHOVER VERIFY: Send VERIFY request to switchover target DEMORAC SWITCHOVER VERIFY WARNING: switchover target temporary files are not the same with the primary. See switchover target's alert log for details. ORA-16475 signalled during: alter database switchover to demorac verify...
哎呀,去看看备库的alert日志:
2023-05-11T18:41:29.407685+08:00 SWITCHOVER VERIFY BEGIN SWITCHOVER VERIFY WARNING: primary database has 5 temporary files, this database has 4 temporary files. More temp files should be added to this database. SWITCHOVER VERIFY COMPLETE
注意这个时间不太一致是因为两个机器时间不一样(差了5分钟),可以先不用管。
看问题本身是说临时文件,这无所谓,切换后可以自动创建。
快快来一把久违的切换吧!
--主库执行成功 alter database switchover to demorac; --新主库demorac alter database open; --新备库demo startup recover managed standby database disconnect;
具体ADG切换参考:
嗯,终于OK了,也感觉肚子饿了,去点餐了。