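Editor's introduction: To share industry cases promptly, flag common problems, and help readers prevent them ahead of time, we compile and edit the Enmotech Technical Newsletter (《云和恩墨技术通讯》). By reviewing what we learned and summarizing the failures we saw over the recent period, we aim to provide information that is useful as a reference. We also collect notable events, new product features, and other valuable material so that you have forward-looking support information and stay current on database news, including major database product releases, alerts, updates, new versions, and patches.
Download link: https://www./doc/6459
Below is one commonly encountered problem from the newsletter; we hope it serves as a useful reference.
Incident: HAIP addresses swapped between the two private network interfaces, causing ASM instance startup to fail - Luo Yangjie (罗杨杰)
2020-09-21T17:32:58.644321+08:00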
No connectivity to other instances in the cluster during startup. Hence, LMON is terminating the instance. Please check the LMON trace file for details. Also, please check the network logs of this instance along with clusterwide network health for problems and then re-start this instance.
LMON (ospid: 40805): terminating the instance due to ORA error 481
Cause - 'Instance is being terminated by LMON'
Dumping the Process Summary
1: PSEUDO process
2: PMON ospid 40751 sid 463 ser 40704, waiting for 'pmon timer' (DEAD)
…
13: PMAN ospid 40797 sid 234 ser 18733, waiting for 'pman timer' (DEAD)
14: DIA0 ospid 40801 sid 466 ser 32316, waiting for 'DIAG idle wait' (DEAD)
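15: LMON ospid 40805 sid 698 ser 47006, …
As the process summary shows, the ASM instance's processes were all waiting on idle events. In short, the ASM instance was terminated abnormally because the critical LMON process ran into a problem.
.....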
2020-09-21 17:25:01.016 [OCSSD(35350)]CRS-1601: CSSD Reconfiguration complete. Active nodes are issdb1 issdb2 .
.....
2020-09-21 17:27:59.992 [ORAAGENT(36381)]CRS-5011: Check of resource "ora.asm" failed: details at "(:CLSN00006:)" in "/u01/app/oracle/diag/crs/issdb1/crs/trace/crsd_oraagent_grid.trc"
2020-09-21 17:27:59.994 [ORAAGENT(36381)]CRS-5011: Check of resource "ora.asm" failed: details at "(:CLSN00006:)" in "/u01/app/oracle/diag/crs/issdb1/crs/trace/crsd_oraagent_grid.trc"
ens3f1:
inet 169.254.17.120/20 brd 169.254.31.255 scope global ens3f1:1
ens2f1:
inet 169.254.15.85/20 brd 169.254.15.255 scope global ens2f1:1
ens3f1:
inet 169.254.4.9/20 brd 169.254.15.255 scope global ens3f1:1
ens2f1:
inet 169.254.27.4/20 brd 169.254.31.255 scope global ens2f1:2
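One quick way to compare the two address listings above is to compute the /20 network each HAIP address belongs to and check whether the same interface lands in the same network in both listings. The following is a minimal sketch under stated assumptions, not part of the original diagnosis: the names `listing_a`, `listing_b`, `subnets`, and `mismatched_nics` are hypothetical, and because the text does not say which listing belongs to which node, the labels are placeholders.

```python
import ipaddress

# HAIP addresses copied from the two `ip addr` listings above, keyed by
# interface. Which listing belongs to which node is not stated in the text,
# so "listing_a" and "listing_b" are placeholder labels (an assumption).
listing_a = {"ens3f1": "169.254.17.120/20", "ens2f1": "169.254.15.85/20"}
listing_b = {"ens3f1": "169.254.4.9/20", "ens2f1": "169.254.27.4/20"}


def subnets(haips):
    """Map each interface name to the network its HAIP address belongs to."""
    return {nic: ipaddress.ip_interface(addr).network for nic, addr in haips.items()}


def mismatched_nics(a, b):
    """Return the interfaces whose HAIP network differs between the two listings."""
    net_a, net_b = subnets(a), subnets(b)
    common = net_a.keys() & net_b.keys()
    return sorted(nic for nic in common if net_a[nic] != net_b[nic])


for nic in mismatched_nics(listing_a, listing_b):
    print(nic, subnets(listing_a)[nic], "vs", subnets(listing_b)[nic])
```

If an interface is reported here, the same physical private network carries a different HAIP subnet on each side, which is the condition to look for when the interconnect appears healthy but the instances still cannot reach each other.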
169.254.0.0 0.0.0.0 255.255.240.0 U 0 0 0 ens2f1
169.254.16.0 0.0.0.0 255.255.240.0 U 0 0 0 ens3f1
169.254.0.0 0.0.0.0 255.255.240.0 U 0 0 0 ens3f1
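169.254.16.0 0.0.0.0 255.255.240.0 U 0 0 0 ens2f1
The routes for the two HAIP subnets point to different interfaces on the two nodes: the two HAIP addresses have swapped between the two private network interfaces, so HAIP traffic cannot get through. In this state the ASM instances on the two nodes cannot communicate with each other, which produces the LMON-initiated instance termination described earlier.
grep "f1: NIC Link is" messages-20200920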
Sep 15 10:08:10 issdb1 kernel: i40e 0000:37:00.1 ens2f1: NIC Link is Down
Sep 15 10:08:31 issdb1 kernel: i40e 0000:37:00.1 ens2f1: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None
Sep 15 10:08:33 issdb1 kernel: i40e 0000:13:00.1 ens3f1: NIC Link is Down
Sep 15 10:08:58 issdb1 kernel: i40e 0000:13:00.1 ens3f1: NIC Link is Up, 10 Gbps Full
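The grep output above shows each private interface going down and coming back up within seconds. When the messages file is long, pairing each Down line with the next Up line for the same interface makes such flaps easier to spot. Below is a minimal sketch under stated assumptions: `link_events` and `outages` are hypothetical helper names, the regular expression is fitted only to the four sample lines shown, and since syslog lines carry no year, 2020 is assumed from the file name `messages-20200920`.

```python
import re
from datetime import datetime

# Sample lines copied from the grep output above; the last line is
# truncated in the source and is kept that way here.
sample = """\
Sep 15 10:08:10 issdb1 kernel: i40e 0000:37:00.1 ens2f1: NIC Link is Down
Sep 15 10:08:31 issdb1 kernel: i40e 0000:37:00.1 ens2f1: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None
Sep 15 10:08:33 issdb1 kernel: i40e 0000:13:00.1 ens3f1: NIC Link is Down
Sep 15 10:08:58 issdb1 kernel: i40e 0000:13:00.1 ens3f1: NIC Link is Up, 10 Gbps Full
"""

# Captures: timestamp, interface name, link state.
LINK_RE = re.compile(
    r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ kernel: .*? (\S+): NIC Link is (Up|Down)"
)

ASSUMED_YEAR = 2020  # assumption: inferred from the file name, not from the log lines


def link_events(text):
    """Return (timestamp, interface, state) tuples for every link-state line."""
    events = []
    for line in text.splitlines():
        m = LINK_RE.match(line)
        if m:
            when = datetime.strptime(
                f"{ASSUMED_YEAR} {m.group(1)}", "%Y %b %d %H:%M:%S"
            )
            events.append((when, m.group(2), m.group(3)))
    return events


def outages(events):
    """Pair each Down with the next Up on the same interface; return seconds down."""
    down_at = {}
    result = []
    for when, nic, state in events:
        if state == "Down":
            down_at[nic] = when
        elif nic in down_at:
            result.append((nic, int((when - down_at.pop(nic)).total_seconds())))
    return result


for nic, seconds in outages(link_events(sample)):
    print(f"{nic}: link down for {seconds}s")
```

To run it against a real file, read the file contents into a string and pass that to `link_events` instead of `sample`.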
Bug 29379299 - On ODA The HAIP IP Addresses Are Swapped Between the Network Interfaces (Doc ID 29379299.8)
[grid@xxxxx2 trace]$ opatch lspatches
31335188;TOMCAT RELEASE UPDATE 19.0.0.0.0 (31335188)
31281355;Database Release Update : 19.8.0.0.200714 (31281355)
31304218;ACFS RELEASE UPDATE 19.8.0.0.0 (31304218)
31305087;OCW RELEASE UPDATE 19.8.0.0.0 (31305087)
OPatch succeeded.
[root@xxxx2 olr]# /u01/app/grid/product/19.0.0/grid/bin/crsctl stop crs
[root@xxxx2 olr]# /u01/app/grid/product/19.0.0/grid/bin/crsctl start crs