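Background: I was helping a customer with a validation. The customer runs RHEL 7.6 and my own test environment is OEL 7.6. I assumed the difference would be negligible, but I still ran into an ACFS compatibility problem, so I am recording it here.
Symptom: the asmca GUI shows nothing related to ACFS, so ACFS cannot be used.
At first I thought this was a simple problem, since I had seen a similar symptom caused by a bug before. This time, however, applying the latest RU patch did not help.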
[grid@db193 ~]$ lsmod|grep oracle
This still returns nothing, and another install attempt again reports that the current OS version is not supported:
[root@db193 bin]# pwd
/u01/app/19.3.0/grid/bin
[root@db193 bin]# ./acfsroot install
ACFS-9459: ADVM/ACFS is not supported on this OS version: 'EL7'
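This was very strange. The customer's RHEL 7.6 environment had some issues of its own, but at least ACFS could be installed and used there. Was there some difference between the two?
I searched MOS for the OS platforms that ACFS supports:
The list does mention some bugs, such as bug 27494830. But this environment already has the latest RU applied, and I checked those bugs as well; the fixes are all in place: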
[grid@db193 ~]$ $ORACLE_HOME/OPatch/opatch lsinventory |grep 27494830
     22162072, 27494830, 27917085, 28064731, 28293236, 28321248, 28375150
Reading the MOS article again more carefully, I noticed that the supported OS version actually differs from my environment:
All Updates, 4.14.35-1902 and later UEK 4.14.35 kernels
I looked it up: this UEK level corresponds to OEL 7.7, whereas mine is OEL 7.6, so it really is not supported.
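The gap between the two UEK builds can also be checked mechanically. This is only a minimal sketch, assuming GNU coreutils `sort -V` is available; `version_ge` is a hypothetical helper written for illustration, not an Oracle tool, and the two release strings are the ones quoted in this post.

```shell
# version_ge A B: succeeds when release string A sorts at or after B
# under GNU "sort -V" ordering. Hypothetical helper for illustration only.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

min_uek="4.14.35-1902"                    # minimum UEK level from the MOS matrix line
mine="4.14.35-1818.3.3.el7uek.x86_64"     # uname -r reported by my OEL 7.6 node

if version_ge "$mine" "$min_uek"; then
    echo "$mine meets the $min_uek minimum"
else
    echo "$mine is older than the $min_uek minimum"
fi
```

The comparison only makes sense for UEK release strings; the RHCK line in the matrix has its own 3.10.0 minimum.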
[grid@db193 ~]$ acfsdriverstate -orahome $ORACLE_HOME supported
ACFS-9459: ADVM/ACFS is not supported on this OS version: 'EL7'
ACFS-9201: Not Supported
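Then why is the customer's RHEL 7.6 supported? What is the difference between the two?
OEL offers two kernels to choose from: the UEK kernel and the Red Hat Compatible Kernel (RHCK). My environment boots UEK by default, and unfortunately the UEK release that ships with 7.6 does not support ACFS.
But the testing workload was heavy, and upgrading or reinstalling the OS was not an option. So I wondered whether I could switch to the RHCK kernel instead, because according to the list RHEL 7.6 is a supported version: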
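A quick way to tell which of the two kernels a node is running is to look at the release string. The sketch below assumes that UEK releases on Oracle Linux carry "uek" in `uname -r` (as in the el7uek string shown later in this post); `kernel_flavor` is a hypothetical helper name, and anything without that marker is simply reported as RHCK.

```shell
# kernel_flavor RELEASE: classify a kernel release string.
# Assumption: UEK builds contain "uek" in the release (e.g. ...el7uek.x86_64).
kernel_flavor() {
    case "$1" in
        *uek*) echo "UEK" ;;
        *)     echo "RHCK" ;;
    esac
}

# Classify the kernel this node is currently running.
kernel_flavor "$(uname -r)"
```

On a genuine RHEL host the result "RHCK" just means "not UEK"; the helper does not try to distinguish distributions.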
Update 6 3.10.0-957 and later 3.10.0 Red Hat Compatible kernels
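So I tried changing the kernel, following the MOS document:
Some of the steps were not needed in my environment. In my testing, only the following steps were required: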
-- Switch Oracle Linux from the UEK kernel to the RHCK kernel
[root@db195 ~]# uname -a
Linux db195 4.14.35-1818.3.3.el7uek.x86_64 #2 SMP Mon Sep 24 14:45:01 PDT 2018 x86_64 x86_64 x86_64 GNU/Linux
[root@db195 ~]# awk -F\' '$1=="menuentry " {print i++ " : " $2}' /etc/grub2.cfg
0 : Oracle Linux Server (4.14.35-1818.3.3.el7uek.x86_64 with Unbreakable Enterprise Kernel) 7.6
1 : Oracle Linux Server (3.10.0-957.el7.x86_64 with Linux) 7.6
2 : Oracle Linux Server (0-rescue-06634a96d9af4acdaa83c9227d61a7f3 with Linux) 7.6
[root@db195 ~]# grub2-set-default 1
[root@db195 ~]# grub2-mkconfig -o /boot/grub2/grub.cfg
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-4.14.35-1818.3.3.el7uek.x86_64
Found initrd image: /boot/initramfs-4.14.35-1818.3.3.el7uek.x86_64.img
Found linux image: /boot/vmlinuz-3.10.0-957.el7.x86_64
Found initrd image: /boot/initramfs-3.10.0-957.el7.x86_64.img
Found linux image: /boot/vmlinuz-0-rescue-06634a96d9af4acdaa83c9227d61a7f3
Found initrd image: /boot/initramfs-0-rescue-06634a96d9af4acdaa83c9227d61a7f3.img
done
[root@db195 ~]# reboot
[root@db195 ~]# uname -a
Linux db195 3.10.0-957.el7.x86_64 #1 SMP Thu Nov 1 00:13:43 PDT 2018 x86_64 x86_64 x86_64 GNU/Linux
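When the same switch has to be repeated on several nodes, the menu index can be looked up instead of read off by eye. The following is a rough sketch built on the same awk field split used above; `rhck_menu_index` is a hypothetical helper, and it assumes UEK entries contain the text "Unbreakable Enterprise Kernel" and rescue entries contain "rescue", as they do in the listing above.

```shell
# rhck_menu_index FILE: print the 0-based index of the first menuentry that is
# neither a UEK entry nor a rescue entry. Returns non-zero if none is found.
# Hypothetical helper; the title patterns are assumptions taken from the listing above.
rhck_menu_index() {
    awk -F\' '
        BEGIN { i = 0; idx = -1 }
        $1 == "menuentry " {
            if (idx < 0 && $2 !~ /Unbreakable Enterprise Kernel/ && $2 !~ /rescue/)
                idx = i
            i++
        }
        END { if (idx < 0) exit 1; print idx }
    ' "$1"
}

# Only report the index here; pass it to grub2-set-default manually after checking it.
if [ -r /etc/grub2.cfg ]; then
    rhck_menu_index /etc/grub2.cfg || echo "no RHCK menu entry found"
fi
```

Printing the index rather than calling `grub2-set-default` directly keeps the sketch read-only, so the result can be compared against the awk listing before anything is changed.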
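After switching to the RHCK kernel, check again whether ACFS is supported:
[root@db195 ~]# su - grid
Last login: Tue Sep 14 00:57:30 CST 2021
[grid@db195 ~]$ cd $ORACLE_HOME/bin
[grid@db195 bin]$ ./acfsdriverstate -orahome $ORACLE_HOME supported
ACFS-9200: Supported
Supported at last. I then checked the ACFS modules again, and this time the install succeeded: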
[root@db193 bin]# lsmod|grep oracle
[root@db193 bin]# cd /u01/app/19.3.0/grid/bin
[root@db193 bin]# ./acfsroot install
ACFS-9300: ADVM/ACFS distribution files found.
ACFS-9314: Removing previous ADVM/ACFS installation.
ACFS-9315: Previous ADVM/ACFS components successfully removed.
ACFS-9294: updating file /etc/sysconfig/oracledrivers.conf
ACFS-9307: Installing requested ADVM/ACFS software.
ACFS-9294: updating file /etc/sysconfig/oracledrivers.conf
ACFS-9308: Loading installed ADVM/ACFS drivers.
ACFS-9321: Creating udev for ADVM/ACFS.
ACFS-9323: Creating module dependencies - this may take some time.
ACFS-9154: Loading 'oracleoks.ko' driver.
ACFS-9154: Loading 'oracleadvm.ko' driver.
ACFS-9154: Loading 'oracleacfs.ko' driver.
ACFS-9327: Verifying ADVM/ACFS devices.
ACFS-9156: Detecting control device '/dev/asm/.asm_ctl_spec'.
ACFS-9156: Detecting control device '/dev/ofsctl'.
ACFS-9309: ADVM/ACFS installation correctness verified.
[root@db193 bin]# lsmod|grep oracle
oracleacfs           5184608  0
oracleadvm           1163390  0
oracleoks             757134  2 oracleacfs,oracleadvm
[root@db193 bin]#
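Since the install has to be repeated on every node, it helps to check for all three modules at once rather than eyeballing `lsmod`. A minimal sketch, assuming the three module names shown in the transcript above (oracleoks, oracleadvm, oracleacfs); `missing_acfs_drivers` is a hypothetical helper that reads lsmod-style output on stdin.

```shell
# missing_acfs_drivers: read lsmod-style output on stdin and print every
# expected ADVM/ACFS module that is NOT loaded. Prints nothing if all are present.
# Hypothetical helper; the module list is taken from the acfsroot output above.
missing_acfs_drivers() {
    acfs_loaded="$(awk '{print $1}')"     # first column = module name
    for acfs_mod in oracleoks oracleadvm oracleacfs; do
        printf '%s\n' "$acfs_loaded" | grep -qx "$acfs_mod" || echo "$acfs_mod"
    done
}

# Check the local node when lsmod is available.
if command -v lsmod >/dev/null 2>&1; then
    lsmod | missing_acfs_drivers
fi
```

Empty output means all three drivers are loaded; any name printed is a driver that still needs `acfsroot install` (or a driver load) on that node.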
After installing on all nodes, check the status:
[grid@db193 ~]$ crsctl stat res -t -init
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       db193                    Started,STABLE
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       db193                    STABLE
ora.crf
      1        ONLINE  ONLINE       db193                    STABLE
ora.crsd
      1        ONLINE  ONLINE       db193                    STABLE
ora.cssd
      1        ONLINE  ONLINE       db193                    STABLE
ora.cssdmonitor
      1        ONLINE  ONLINE       db193                    STABLE
ora.ctssd
      1        ONLINE  ONLINE       db193                    ACTIVE:0,STABLE
ora.diskmon
      1        OFFLINE OFFLINE                               STABLE
ora.evmd
      1        ONLINE  ONLINE       db193                    STABLE
ora.gipcd
      1        ONLINE  ONLINE       db193                    STABLE
ora.gpnpd
      1        ONLINE  ONLINE       db193                    STABLE
ora.mdnsd
      1        ONLINE  ONLINE       db193                    STABLE
ora.storage
      1        ONLINE  ONLINE       db193                    STABLE
--------------------------------------------------------------------------------
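There was still no ACFS resource at this point. I tried creating one with asmca, but the script it asks you to run at the end failed, and starting the file system manually failed as well (the PRCA-1138 text in the output is the Chinese-locale message saying that one or more file system resources could not be started):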
[root@db193 bin]# /u01/app/19.3.0/grid/bin/srvctl start filesystem -d /dev/asm/oggsou-85
PRCA-1138 : 无法启动一个或多个文件系统资源:
Not all ADVM/ACFS drivers have been loaded.
CRS-2674: Start of 'ora.data.oggsou.acfs' on 'db195' failed
Not all ADVM/ACFS drivers have been loaded.
CRS-2674: Start of 'ora.data.oggsou.acfs' on 'db193' failed
Try adding the ACFS drivers resource with acfsroot enable:
[root@db193 bin]# cd /u01/app/19.3.0/grid/bin/
[root@db193 bin]# ./acfsroot enable
ACFS-9376: Adding ADVM/ACFS drivers resource succeeded.
CRS-2672: Attempting to start 'ora.drivers.acfs' on 'db193'
CRS-2676: Start of 'ora.drivers.acfs' on 'db193' succeeded
ACFS-9380: Starting ADVM/ACFS drivers resource succeeded.
Querying again now shows that ora.drivers.acfs exists.
Starting the file system again now succeeds:
[root@db193 bin]# /u01/app/19.3.0/grid/bin/srvctl start filesystem -d /dev/asm/oggsou-85
Querying the ACFS resources again shows that the file system is mounted normally:
[grid@db193 ~]$ crsctl stat res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.OGGSOU.advm
               ONLINE  ONLINE       db193                    STABLE
               ONLINE  ONLINE       db195                    STABLE
ora.data.oggsou.acfs
               ONLINE  ONLINE       db193                    mounted on /oggsou,S
                                                             TABLE
               ONLINE  ONLINE       db195                    mounted on /oggsou,S
                                                             TABLE
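Finally I rebooted both machines to verify that ACFS starts automatically at boot, and it does. When I helped with the earlier problem, I had added a service startup entry based on past experience. After going through the normal procedure above, though, no startup entry turns out to be needed, and checking confirms that none exists: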
[root@db193 system]# pwd
/etc/systemd/system
[root@db193 system]# ls -lrth
total 16K
drwxr-xr-x. 2 root root   44 Jul 16  2019 system-update.target.wants
drwxr-xr-x. 2 root root   32 Jul 16  2019 getty.target.wants
drwxr-xr-x. 2 root root   87 Jul 16  2019 default.target.wants
drwxr-xr-x. 2 root root   35 Jul 16  2019 local-fs.target.wants
drwxr-xr-x. 2 root root   38 Jul 16  2019 dev-virtio\x2dports-org.qemu.guest_agent.0.device.wants
drwxr-xr-x. 2 root root   57 Jul 16  2019 basic.target.wants
lrwxrwxrwx. 1 root root   37 Jul 16  2019 default.target -> /lib/systemd/system/multi-user.target
drwxr-xr-x. 2 root root   51 Jul 30  2019 sockets.target.wants
drwxr-xr-x. 2 root root   31 Jul 30  2019 remote-fs.target.wants
drwxr-xr-x. 2 root root 4.0K Sep  9  2019 sysinit.target.wants
drwxr-xr-x  2 root root   34 Sep 13 17:17 oracle-ohasd.service.d
-rw-r--r--  1 root root  699 Sep 13 17:17 oracle-ohasd.service
-rw-r--r--  1 root root  452 Sep 13 17:22 oracle-tfa.service
drwxr-xr-x. 2 root root 4.0K Sep 13 17:22 multi-user.target.wants
drwxr-xr-x  2 root root   60 Sep 13 17:22 graphical.target.wants
[root@db193 system]#
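Working through this hands-on surfaced quite a few points worth knowing. With newer versions it pays to actually verify things yourself rather than rely only on past experience, which bears out the old saying: what you learn from paper always feels shallow; to truly understand something you must practice it yourself.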