1. Oracle Linux 7 and Redhat Linux 7: socket files in /var/tmp/.oracle are deleted
Oracle Database - Enterprise Edition - Version 11.2.0.4 and later
Linux x86-64
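SYMPTOMS
On Oracle Linux 7 and Redhat Linux 7, socket files in /var/tmp/.oracle are deleted without any apparent cause.
CHANGES
None
CAUSE
Oracle Linux 7 and Redhat Linux 7 both ship a system service, systemd-tmpfiles-clean.service, that is managed by systemd and deletes files from temporary locations.
This service deletes:
- files and directories in /tmp that have not been accessed for more than 10 days (defined in tmp.conf)
- files and directories in /var/tmp that have not been accessed for more than 30 days (defined in tmp.conf)
"Not accessed" is determined by checking all of the atime, mtime and ctime timestamps of the file or directory.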
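The age rule above can be sketched in shell as follows. This is only an illustration of the "all three timestamps must be old" idea, not the actual logic of systemd-tmpfiles; the function names `stamps_expired` and `is_cleanup_candidate` are hypothetical, and GNU `stat` is assumed.

```shell
# Pure check: succeed only if ALL given timestamps are older than the limit.
# Usage: stamps_expired NOW_EPOCH MAX_AGE_DAYS STAMP...
stamps_expired() {
    now=$1
    limit=$(( $2 * 86400 ))          # age limit in seconds
    shift 2
    for stamp in "$@"; do
        if [ $(( now - stamp )) -le "$limit" ]; then
            return 1                 # at least one timestamp is still recent
        fi
    done
    return 0
}

# Wrapper: read atime, mtime and ctime of FILE (GNU stat assumed).
# Usage: is_cleanup_candidate FILE MAX_AGE_DAYS
is_cleanup_candidate() {
    # %X = atime, %Y = mtime, %Z = ctime, each in seconds since the epoch
    stamps=$(stat -c '%X %Y %Z' -- "$1") || return 2
    # Word splitting of $stamps into three arguments is intentional.
    stamps_expired "$(date +%s)" "$2" $stamps
}
```

For example, `for f in /var/tmp/.oracle/*; do is_cleanup_candidate "$f" 30 && echo "$f"; done` would list the entries whose three timestamps are all older than 30 days. Because ctime cannot be set back with `touch`, a file whose atime and mtime were made old by hand is still reported as recent.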
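SOLUTION
Exclude the socket files from deletion by systemd-tmpfiles-clean.service
To keep the socket files in the tmp directories from being deleted by the tmpfiles cleanup service, edit /usr/lib/tmpfiles.d/tmp.conf and add: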
x /tmp/.oracle*
x /var/tmp/.oracle*
x /usr/tmp/.oracle*
The "x" option above instructs systemd-tmpfiles-clean.service to exclude the files in the listed directories.
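A minimal sketch of making that change repeatable is shown here. The function name `add_oracle_exclusions` is hypothetical; it takes the configuration file as an optional argument (defaulting to the path named above) and assumes the file already ends with a newline.

```shell
# Append the three exclusion lines to a tmpfiles.d configuration file,
# skipping any line that is already present (safe to run repeatedly).
# Usage: add_oracle_exclusions [CONF_FILE]
add_oracle_exclusions() {
    conf=${1:-/usr/lib/tmpfiles.d/tmp.conf}
    for rule in 'x /tmp/.oracle*' 'x /var/tmp/.oracle*' 'x /usr/tmp/.oracle*'; do
        # -x: match the whole line, -F: treat the rule as a literal string
        if ! grep -qxF -- "$rule" "$conf"; then
            printf '%s\n' "$rule" >> "$conf"
        fi
    done
}
```

Run as root, `add_oracle_exclusions` with no argument edits the file named above; passing a copy of the file first makes it easy to review the result before touching the real one.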
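Note: The directory /var/tmp/.oracle contains many "special" socket files that local clients use to connect over the IPC protocol (sqlnet) to various Oracle processes, including the TNS listener, the CSS, CRS and EVM daemons, and even database or ASM instances. If the socket files are deleted while Clusterware is running, the symptoms described in Doc ID 391790.1 appear.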
ALERT: Setting RemoveIPC=yes on Redhat 7.2 and higher Crashes ASM and Database Instances as Well as Any Application That Uses a Shared Memory Segment (SHM) or Semaphores (SEM) (Doc ID 2081410.1)
Whether systemd-logind removes these IPC objects is controlled by the RemoveIPC option in the /etc/systemd/logind.conf configuration file;
see logind.conf(5) for details.
The default value for RemoveIPC in RHEL7.2 and higher is yes.
As a result, when the last oracle or grid user disconnects, the OS removes shared memory segments and semaphores for those users.
As Oracle ASM and Databases use shared memory segments for SGA, removing shared memory segments will crash the Oracle ASM and database instances.
Please refer to the Redhat bug 1264533 – https://bugzilla.redhat.com/show_bug.cgi?id=1264533
OCCURRENCE
The problem affects all applications, including Oracle Databases, that use shared memory segments and semaphores; both Oracle ASM and database instances are therefore affected.
Oracle Linux 7.2 avoids this problem by explicitly setting RemoveIPC to no in the /etc/systemd/logind.conf configuration file.
However, if /etc/systemd/logind.conf was touched or modified before the upgrade started, the yum update writes the new, correct configuration file (with RemoveIPC=no) as logind.conf.rpmnew,
and if the user retains the original configuration file, the failures described in this note will most likely occur.
To avoid this problem, edit logind.conf after the upgrade and set RemoveIPC=no. This is documented in the Oracle Linux 7.2 release notes.
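The post-upgrade check described above could look roughly like this. Both function names are hypothetical; the sketch inspects only the single file it is given and assumes that, when RemoveIPC is assigned more than once, the last uncommented assignment is the one that counts.

```shell
# Print the value of the last uncommented RemoveIPC= line, or "unset".
# Usage: removeipc_setting [CONF_FILE]
removeipc_setting() {
    conf=${1:-/etc/systemd/logind.conf}
    # Lines starting with '#' do not match, so commented defaults are ignored.
    value=$(sed -n 's/^[[:space:]]*RemoveIPC[[:space:]]*=[[:space:]]*\([^[:space:]]*\).*$/\1/p' "$conf" | tail -n 1)
    printf '%s\n' "${value:-unset}"
}

# Succeed if a .rpmnew file sits next to the configuration file, which the
# text above describes as the sign that the original file was retained.
# Usage: has_pending_rpmnew [CONF_FILE]
has_pending_rpmnew() {
    [ -e "${1:-/etc/systemd/logind.conf}.rpmnew" ]
}
```

If `removeipc_setting` prints anything other than `no`, or `has_pending_rpmnew` succeeds, the file deserves a manual look before ASM or database instances are started.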
SYMPTOMS
1) Installing 11.2 and 12c GI/CRS fails, because ASM crashes towards the end of the installation.
2) Upgrading to 11.2 and 12c GI/CRS fails.
3) After Redhat Linux is upgraded to 7.2 and higher, 11.2 and 12c ASM and database instances crash.
systemd-logind may remove the IPC objects at any time, so the failure patterns can vary greatly. Here are some examples of what the failures may look like:
ORA-27157: OS post/wait facility removed
ORA-27300: OS system dependent operation:semop failed with status: 43
ORA-27301: OS failure message: Identifier removed
ORA-27302: failure occurred at: sskgpwwait1
KFOD-00313: No ASM instances available. CSS group services were successfully initilized by kgxgncin
KFOD-00105: Could not open pfile 'init@.ora'
Creation of ASM password file failed. Following error occurred: Error in Process: $GRID_HOME/bin/orapwd
Enter password for SYS:
OPW-00009: Could not establish connection to Automatic Storage Management instance
2015/11/20 21:38:45 CLSRSC-184: Configuration of ASM failed
2015/11/20 21:38:46 CLSRSC-258: Failed to configure and start ASM
Nov 20 21:38:43 testc201 kernel: traps: oracle[24861] trap divide error
ip:3896db8 sp:7ffef1de3c40 error:0 in oracle[400000+ef57000]
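To check whether a given log contains the ORA- signatures quoted above, a small grep wrapper is enough. The function name `find_ipc_removal_errors` is hypothetical, and the log file to scan (for example an instance alert log) has to be supplied by the caller, since its location depends on the installation.

```shell
# Print every line of LOG_FILE (with its line number) that matches one of
# the error signatures quoted above; the exit status is non-zero when
# nothing matches.
# Usage: find_ipc_removal_errors LOG_FILE
find_ipc_removal_errors() {
    grep -n -E 'ORA-27157|ORA-27300: .*semop failed|ORA-27301: OS failure message: Identifier removed' -- "$1"
}
```

A match is only a hint: the same errors can have other causes, so the RemoveIPC setting still has to be verified separately.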
WORKAROUND
1) Set RemoveIPC=no in /etc/systemd/logind.conf
2) Reboot the server or restart systemd-logind as follows:
# systemctl daemon-reload
# systemctl restart systemd-logind
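Step 1 above can be scripted along these lines. The function name `set_removeipc_no` is hypothetical, GNU `sed` is assumed for `-i`, and the sketch keeps a `.bak` copy of the file before changing it.

```shell
# Set RemoveIPC=no in a logind configuration file, keeping a .bak copy.
# Usage: set_removeipc_no [CONF_FILE]
set_removeipc_no() {
    conf=${1:-/etc/systemd/logind.conf}
    cp -p -- "$conf" "$conf.bak" || return 1
    if grep -qE '^[[:space:]]*#?[[:space:]]*RemoveIPC[[:space:]]*=' "$conf"; then
        # Rewrite every existing (possibly commented-out) RemoveIPC line.
        sed -i -E 's/^[[:space:]]*#?[[:space:]]*RemoveIPC[[:space:]]*=.*/RemoveIPC=no/' "$conf"
    else
        # No such line yet: append one (assumes [Login] is the last section).
        printf 'RemoveIPC=no\n' >> "$conf"
    fi
}
```

After running it as root, the reboot or the two `systemctl` commands from step 2 are still required for the change to take effect.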
PATCHES
Migrating to Oracle Linux 7.2 and higher from Redhat 7.2 and higher resolves this problem.
If migrating to Oracle Linux 7.2 is not possible, use the workaround above and set RemoveIPC=no in /etc/systemd/logind.conf.