直面ODA(Oracle Database Appliance)RAC一体机

《Database Appliance并非Mini版的Exadata-还原真实的Oracle Unbreakable Database Appliance》一文中,我分享了一些ODA官方白皮书中了解到的技术细节。

实际接触到Oracle Database Appliance,这个原本号称mini版的Exadata一体机(后台被证实不是),还是在昨天。

 

IMG_0023

 

 

IMG_0022

 
IMG_0024

Exadata X2-2 vs EMC Greenplum DCA vs Netezza TwinFin 12主要配置大对比

下图列出了Oracle Exadata X2-2 vs EMC Greenplum DCA vs Netezza TwinFin 12 三种一体机的主要配置对比:

 

Slide:了解真实的Oracle Unbreakable Database Appliance

Database Appliance并非Mini版的Exadata-还原真实的Oracle Unbreakable Database Appliance

Oracle甲骨文系统有限公司在北京时间9月23日发布了一款Oracle数据库机即Oracle Database Appliance。Oracle Database Appliance是一款面向中小型企业的使用简单、用得起的高可用数据库专用服务器,该数据库机基于Sun Fire服务器、Oracle Enterprise Linux和Oracle Database Server 11g release 2 Enterprise Edition,构成一套双节点的Oracle Real Application Cluster集群以供使用。

 

Database Appliance的正面图

 

Oracle Unbreakable Database Appliance1

 

Database Appliance的后视图

 

Oracle Unbreakable Database Appliance2

 

Oracle Unbreakable Database Appliance的硬件采用4-RU的机架。包括以下这些硬件配置:

  • 共享磁盘
  • 24 SAS dual ported disk slots
  • 20 x 600GB
    • 12TB RAW, 4 TB usable
  • 4 x 73GB SSD (Flash Disks) for redo logs
  • 2服务器Host
  • 2 Socket x86 per server
  • 6 Cores per Socket, 12 cores per server
  • 96 GB of Memory per server
  • 2-12 cores per server enabled on demand
  • 网络硬件
  • Redundant 1Gb internal private network
  • 2x 10Gb public network per server
  • 6x 1Gb public network per server
  • 硬件冗余
  • Redundant power and cooling
  • Hot-serviceable components
  • Triple-mirrored storage

显然在即将举行的Oracle Open World 2011大会上Oracle Database Appliance将会成为热议的焦点。这里我们来细数一下Oracle Database Appliance这款数据库机产品的优势和弱点。

 

用户目前最为关心的恐怕还是Oracle Unbreakable Database Appliance为我们带来了那些特性和优势:

 

1.一套高度统一标准的平台

 

所有的Oracle Database Appliance数据库机均才采用一致的硬件配置、软件版本、硬件固件,这导致所有的客户手头所拥有的Database Appliance几乎没有什么差别,也就意味着用户在使用Database Appliance过程中所遇到的问题将是高度相似的;充分利用这一点,Database Appliance的用户可以分享全世界所有其他客户的使用经验,这让问题的迅速定位、解决变成可能。Database Appliance在配置上要比Exadata数据库一体机更为统一。

 

2.  一套整合的平台

 

Database Appliance中的每一个组件,包括共享存储、内联网络、数据库主机和Oracle企业版数据库服务器软件在出厂时均已经被整合好了。这同客户从多个厂商那里采购硬件时是不同,客户从DELL、IBM、HP那里订购服务器主机、从EMC、Netapp那里采购存储、从Cisco那里采购网络设备,当这些硬件设备全部到齐后需要一支具有丰富经验的IT team协作来部署Oracle RAC,这其中需要DBA、SA、Storage Admin多种职责的team成员合作并耗费较长的时间才能构建一套RAC集群系统。因为在这个复杂的配置过程中可能出现的网络配置错误或者存储问题可能几天也无法完成工作,更别提在系统今后的运维过程中可能留下的隐患。而如果用户采用Database Appliance这一整合的平台,就可以在很短的时间内完成部署投入使用。

 

3. 一套高可用的平台

 

Oracle Database Appliance 采用极为简单的系统架构,在此之下是Oracle数据库被久经考验的高可用方案。包括冗余的风扇和电源、2路冗余的网络以及使用ASM自动存储管理特性实现的三路存储冗余,此外还采用了2节点的RAC RAC集群系统,并具有在不同的地理位置部署2台Database Appliance以实现Oracle Data Guard数据保护的能力。基于客户的自身对可用性和容量的要求,还可以在2节点RAC集群的基础上部署Oracle RAC ONE Node。

 

4.一套可扩展的平台 – 灵活的CPU许可证(CPU Licensing)

 

Oracle Database Appliance 提供了在x86平台上首创的灵活CPU License许可证采购方式。通过Oracle Database Appliance,客户可以只购买当前需要的CPU数量的LICENSE;当系统有扩容需求时,客户可以购买额外的CPU使用许可证以便在现有系统上启用更多的CPU内核,最大可以扩展到24 cores。所有的Database Appliance采用Intel Xeon X5675作为主机CPU。

出厂的Oracle Database Appliance 最低启用4 cores ,最多启用满配的24 cores。

当需要为系统扩容启用更多的CPU时只需要:

  • 登录My Oracle Support
  • 向Oracle Support说明当前服务器的配置
  • 下载相关的License Key以便重新配置服务器
  • 通过命令行重新配置Database Applianced的BIOS并重启系统

 

5.一套管理简单的平台

 

Oracle Database Appliance附带的Appliance Manager 可以显著地简化appliance整个的安装流程。据某些用户介绍整个安装过程将不超过2个小时,很少需要人工参与,仅仅需要为SCAN Listener监听器分配必要的 IP和DNS域名解析服务,安装完成后用户即获得一套完善的2节点RAC集群系统,以及空的可立即投入使用的数据库DB。

 

Oracle官方宣传的Database Appliance安装概念:

Oracle Unbreakable Database Appliance3

 

Oracle Appliance Manager 可部署的模块包括:

  • Deploys OS, Oracle Appliance Manager, Grid Infrastructure & Database
  • Configures GI & RDBMS (Oracle Database)
  • Ensures correct configuration of disks & networks
  • Consistent implementation of known Best Practices
  • Configures optimal disk layout for ASM
  • Performs initial configuration of disks & ASM DG(s)

 

Oracle Appliance Manager 的存储管理模块包括:

  • Oracle Appliance Manager Daemon (oakd) is started during boot
  • Discovers storage subsystem
  • Tracks configuration by storing metadata
  • Monitors status of disks
  • Generates alerts on failures
  • Takes corrective action on appropriate events
  • Interacts with ASM for complete automation

 

Oracle Appliance Manager 的Patching补丁模块包括 :

  • Patching Module provides tools to patch OS, Oracle Application Manager Modules, Grid Infrastructure(GI), RDBMS
  • Provides a single interface and command to patch all the components including OS, firmware, BIOS , GI and RDBMS
  • Patching Module will update the repository to reflect the newly installed patches and firmware’s
  • Bundle Patches for all components that is to be patched.

 

Oracle Appliance Manager 的检验诊断工具包括:

  • A set of tools for validation & diagnostics
  • Validation tool provides detailed information about the components – both HW & SW
  • Diagget tool collects all the diagnostics information and can be used when experiencing problems.
  • Healthcheck can be used to check the health of OS, DB, Clusterware and other comet components to ensure they are healthy and functionally optimally.

 

使用Appliance Manager部署的示例流程:

 

Oracle Appliance Manager Configurator1

Oracle Appliance Manager Configurator2

Oracle Appliance Manager Configurator3

Oracle Appliance Manager Configurator4

Oracle Appliance Manager Configurator5

Oracle Appliance Manager Configurator6

 

就上述这些特性来看,Database Appliance对于中小型正在发展的企业还是有不小的诱惑力的。当然任何产品都有其弱势的地方,我们来说说Database Appliance的不足:

 

Oracle Unbreakable Database Appliance不是Mini版的Exadata!

 

在Oracle正式揭幕Database Appliance之前,当时还称之为( Comet – CPU Cores by Demand),有很多人预测Appliance将会是Mini版的Exadata,将继承数据库一体机的一些独有优势。

但是实际看来,Oracle Unbreakable Database Appliance除了和Exadata一样也叫数据库机外并没有太多的共同点。一个最大的区别在于Database Appliance没有使用Exadata Storage software,这一Exadata的核心组件。而仅仅是采用了最普通的Oracle Database 11g release 2企业版和Oracle Enterprise Linux的组合,Database Appliance没有Smart Scan、没有Storage Index、没有Hybrid Columnar Compression,这些令人惊艳的X特性,这恐怕是最最令人失落的一点!(想用Smart Scan?请先买Exadata的单!)

 

单系统扩展难

 

虽然Database Appliance提供了方便的CPU License扩展方案,但是一旦单台满配置的Database Appliance无法满足客户的要求时,要想在Database Appliance的基础上扩展几乎是不可能的。

 

无法客制化硬件

 

单一配置的Database Appliance没有为增加客制化的硬件留有余地,这意味着用户不可能为Appliance增加光纤HBA卡或者其他任何控制器。但是这一点也保证了Appliance的硬件配置是高度统一标准的。

 

 Oracle Unbreakable Database Appliance的售价

 

好了,看了上面一堆的评述你肯定对Database Appliance这玩样的售价抱着很大的好奇心,我们来看看Oracle Unbreakable Database Appliance的价格(仅供参考)。

 

Street Prices with 30% HW discount
Configuration HW List SW list System list 40% SW discount 60% SW discount 80% SW discount
EE – 4 cores 50000 95000 145000 92000 73000 54000
RAC One Node – 4 cores 50000 115000 165000 104000 81000 58000
RAC – 4 cores (2×2) 50000 141000 191000 119600 91400 63200
EE – 12 cores 50000 285000 335000 206000 149000 92000
RAC One Node – 12 cores 50000 345000 395000 242000 173000 104000
RAC –  12 cores (6×2) 50000 423000 473000 288800 204200 119600
EE – 24 cores  (12×2) 50000 570000 620000 377000 263000 149000
RAC One Node – 24 cores (12×2) 50000 690000 740000 449000 311000 173000
RAC – 24 cores (12×2) 50000 846000 896000 542600 373400 204200

 

以上可以看到实际Database Appliance的硬件报价(HW List)仅为50000美元,Oracle Database Server 11.2 EE + 4 cores的组合总报价为145000美元,最搞折扣情况下的总报价为54000美元。

 

再来看看Database Appliance的售后支持费用,以下列出了Database Appliance 一年的support报价:

 

Support Prices with 30% HW discount
Configuration HW Supt List SW Supt list Support list 40% SW discount 60% SW discount 80% SW discount
EE – 4 cores 6000 20900 26900 16740 12560 8380
RAC One Node – 4 cores 6000 25300 31300 19380 14320 9260
RAC – 4 cores (2×2) 6000 31020 37020 22812 16608 10404
EE – 12 cores 6000 62700 68700 41820 29280 16740
RAC One Node – 12 cores 6000 75900 81900 49740 34560 19380
RAC – 12 cores (6×2) 6000 93060 99060 60036 41424 22812
EE – 24 cores (12×2) 6000 125400 131400 79440 54360 29280
RAC One Node – 24 cores (12×2) 6000 151800 157800 95280 64920 34560
RAC – 24 cores (12×2) 6000 186120 192120 115872 78648 41424

 

 

Database Appliance或许是一步好棋

 

Exadata一直被Oracle宣称为其30年来最耀眼的明星,但显然这明星不是中小企业所能请得动的。Database Appliance的出现彻底补足了Oracle目前数据库机产品线的缺口。现在Oracle拥有低端的4 cores Appliance, 中端的 24 cores Appliance, 以及高端的Exadata X2-2和 X2-8。

Oracle的整个数据库一体机(Database Machine)战略以及其中的主要产品都已经付出水面,Oracle的硬件帝国建筑于SUN之上,然而风格却与SUN迥异。

 

Oracle Unbreakable Database Appliance

Exadata Database Machine Host的操作系统OS版本

之前有同事问我Exadata用的是什么操作系统这个问题?

最早Oracle与HP合作的Exadata V1采用的是Oracle Enterprise Linux,而Oracle-Sun Exadata V2则目前还仅提供OEL,但是已经通过了Solaris -11 Express在 Exadata V2上的测试, 所以很快Exadata V2将会有Solaris的选择。

目前现有的Exadata X2-2 和 X2-8 绝大多数采用2个OEL 5的小版本:

较早出厂的使用OEL 5.3
# cat /etc/enterprise-release
Enterprise Linux Enterprise Linux Server release 5.3 (Carthage)

近期出场的使用OEL 5.5

# cat /etc/enterprise-release
Enterprise Linux Enterprise Linux Server release 5.5 (Carthage)

# uname -a
Linux vrh1.us.oracle.com 2.6.18-128.1.16.0.1.el5 #1 SMP Tue x86_64 x86_64 x86_64 GNU/Linux

 

The IB should be one of the compatible cards specified in Note 888828.1
If you build a backup server machine it is best tro build as close a clone of the Exadata Compute nodes as you can get.
I.e. install OEL 5 Update 5 and one of the IB cards specified in the note and you will have the correct ofed versions and kernel
This will guarantee interoperabilty and correct operation with the kernel and ofed drivers
From the doc
InfiniBand OFED Software
Exadata Storage Servers and database servers will interoperate with different InfiniBand OFED software versions, however, Oracle recommends that all versions be the same unless performing a rolling upgrade. Review Note 1262380.1 for database server software and firmware guidelines.

InfiniBand HCA
Exadata Storage Servers and database servers will interoperate with different InfiniBand host channel adapter (HCA) firmware versions, however, Oracle recommends that all versions be the same unless performing a rolling upgrade. Review Note 1262380.1 for database server software and firmware guidelines.

For a complete list of the Oracle QDR Infinband adaptors see here:

http://www.oracle.com/technetwork/documentation/oracle-net-sec-hw-190016.html#infinibandadp

For the compute nodes all firmware updates must be done via the bundle patches descibed in Doc 888828.1
So I would advise upgrading to the latest supported bundel patch.

For you backup server choose the same model card that came with the X2 compute nodes.
Install Oracle Eterprise Linux Release 5 Update 5
Upgrade the firmware to the same firmware an on the X2 or higher if not already the same,

Database Machine and Exadata Storage Server 11g Release 2 (11.2) Supported Versions [ID 888828.1]

Exadata用户的困境

当然这是Oracle-Sun Exadata V2 数据库一体机的竞争者,Netezza公司制作的一个小视频,但并非空穴来风!

 

 

很多人告诉我有了Exadata这么智能的一体机后都可以不需要dba了,那么这是事实吗?

 

就我个人的体验,部署了Exadata之后DBA花在管理上的时间更多了。

 

Exadata上复杂的数据库升级步骤:

Infiniband Switch – > Exadata Storage Software – > Exadata Host Minimal Patch -> Grid Infrastructure – > Database

客户视角的Exadata 详细升级过程:

  1. Exadata Cells from 11.2.1.3.1 to 11.2.2.2.0
  2. Exadata database nodes from 11.2.1.3.1 to 11.2.2.2.0
  3. Applying a number of pre and post patching fixes (addendums to the standard patching instructions,probably included due to customer reported issue with eth patching
  4. Upgrade kernel on database nodes
  5. Upgrade GI Home to 11.2.0.2
  6. Upgrade ALL RDBMS Homes to 11.2.0.2
  7. Upgrade the database to 11.2.0.2
  8. Apply 11.2.0.2 BP2 to ALL Homes(Part of this need to be done manually ,as opatch auto doesn’t work when the installation has different GI and RDBMS owners)

 

升级所涉及的多层Component :

这升级步骤不是人写的,有木有?!

Exadata上的 Bundle Patch (Replaces recalled patch 8626760) 是可以recall 召回的有木有?!

#别去搜这个patch,说了recall了就是不会让你能找到了!

号称出厂即最优的配置,结果连简单的ulimit-memlock参数设置都有问题的情况,有木有?

 

昨天客户语重心长的告诉我,他们准备把Exadata V2 上的核心应用迁移走,客户在09年就开始用Exadata,是不是国内第一家我不知道,但至少应该是用于生产的第一批。但是这2年来因为Exadata折 腾了无数次,现在终于不想折腾了。实际是用户在测试Exadata时总是能看到其优点,但实际使用时只看到缺点了!

历数Exadata V2数据库一体机的几大致命缺点

  1. 软硬件一体,耦合过于紧密
  2. 升级操作十分复杂,客户看到升级流程后几乎绝望
  3. 国内使用经验贫乏
  4. support只有Oracle自己一家,没有第三方可选
  5. 就以往的表现来看美国支撑的X team并不给力。。。。。。
  6. 贵,Exadata很贵!具体了解其报价Exadata V2 Pricing

Booting Exadata!

Booting Exadata! It’s a joke here!.

exadata_boot

Warning:Even Exadata has a wrong memlock setting

事情要从大约2个月前的一起事故说起,有一套部署在Oracle-Sun Exadata V2 Database Machine上的4节点11.2.0.1 RAC数据库,其中一个节点的RAC关键后台进程LMS报ORA-00600[kjbmprlst:shadow]错误,随后LMS后台进程将该节点上的实例终止。其他节点上的CRS软件检测到该意外终止后,数据库进入全局资源的重新配置过程(Reconfiguration),Reconfiguration在所有剩余节点上都顺利完成了。

但是随后其中一个节点的告警日志中持续出现”Process W000 died, see its trace file”,似乎是实例无法得到分配新进程的必要资源,同时应用程序出现无法登陆该节点上实例的情况,本来4节点的RAC数据库,因为ORA-00600挂了一个,现在又有一个节点登不上,一下变得只剩下一半性能。

随后我赶到了问题现场,继续诊断问题,并发现了以下症状,在此一一列举:

1.尝试远程登录该实例,但是失败,出现ORA-12516 TNS:listener could not find available handler with matching protocol stack”错误。反复登录会出现以下信息:

Linux Error: 12: Cannot allocate memory
 Additional information: 1
 ORA-01034: ORACLE not available

 

2.确认过ORACLE_SID、ORACLE_HOME等多环境变量后使用”sqlplus / as sysdba”登录却返回”Connected to an idle instance.”(这一点最为蹊跷),无法以sysdba登录就无法收集必要的诊断信息,这个虽然可以通过gdb等手段做systemstate dump,但是暂时绕过

 

3. 后台进程W000由SMCO进程启动, SMCO进程的日志如下,所报状态为KSOSP_SPAWNED:

Process W000 is dead (pid=2648 req_ver=3812 cur_ver=3812 state=KSOSP_SPAWNED).
 *** 2011-07-08 02:44:32.971
 Process W000 is dead (pid=2650 req_ver=3813 cur_ver=3813 state=KSOSP_SPAWNED).

 

4. 确认组成instance的内存和后台进程均存活,且仍有日志产生

[oracle@maclean04 trace]$ ipcs -m

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00000000 0          root      644        72         2
0x00000000 32769      root      644        16384      2
0x00000000 65538      root      644        280        2
0xac5ffd78 491524     oracle    660        4096       0
0x96c5992c 1409029    oracle    660        4096       0  

[oracle@maclean04 trace]$ ls -l /dev/shm
total 34839780
-rw-r----- 1 oracle oinstall 268435456 Jun  7 07:19 ora_maclean4_1409029_0
-rw-r----- 1 oracle oinstall         0 Jun  7 07:19 ora_maclean4_1409029_1
-rw-r----- 1 oracle oinstall         0 Jun  7 07:19 ora_maclean4_1409029_10
-rw-r----- 1 oracle oinstall 268435456 Jun  7 07:19 ora_maclean4_1409029_100
-rw-r----- 1 oracle oinstall 268435456 Jun  7 07:19 ora_maclean4_1409029_101
-rw-r----- 1 oracle oinstall 268435456 Jun  7 07:19 ora_maclean4_1409029_102
-rw-r----- 1 oracle oinstall 268435456 Jun  7 07:19 ora_maclean4_1409029_103
-rw-r----- 1 oracle oinstall 268435456 Jun  7 07:19 ora_maclean4_1409029_104
-rw-r----- 1 oracle oinstall 268435456 Jun  7 07:19 ora_maclean4_1409029_105
-rw-r----- 1 oracle oinstall 268435456 Jun  7 07:19 ora_maclean4_1409029_106
-rw-r----- 1 oracle oinstall 268435456 Jun  7 07:19 ora_maclean4_1409029_107
-rw-r----- 1 oracle oinstall 268435456 Jun  7 07:19 ora_maclean4_1409029_108
-rw-r----- 1 oracle oinstall 268435456 Jun  7 07:19 ora_maclean4_1409029_109
-rw-r----- 1 oracle oinstall         0 Jun  7 07:19 ora_maclean4_1409029_11
-rw-r----- 1 oracle oinstall 268435456 Jun  7 07:19 ora_maclean4_1409029_110
-rw-r----- 1 oracle oinstall 268435456 Jun  7 07:19 ora_maclean4_1409029_111
-rw-r----- 1 oracle oinstall 268435456 Jun  7 07:19 ora_maclean4_1409029_112
-rw-r----- 1 oracle oinstall 268435456 Jun  7 07:19 ora_maclean4_1409029_113
-rw-r----- 1 oracle oinstall 268435456 Jun  7 07:19 ora_maclean4_1409029_114
-rw-r----- 1 oracle oinstall 268435456 Jun  7 07:19 ora_maclean4_1409029_115
-rw-r----- 1 oracle oinstall 268435456 Jun  7 07:19 ora_maclean4_1409029_116
-rw-r----- 1 oracle oinstall 268435456 Jun  7 07:19 ora_maclean4_1409029_117
-rw-r----- 1 oracle oinstall 268435456 Jun  7 07:19 ora_maclean4_1409029_118
-rw-r----- 1 oracle oinstall 268435456 Jun  7 07:19 ora_maclean4_1409029_119
-rw-r----- 1 oracle oinstall         0 Jun  7 07:19 ora_maclean4_1409029_12
.......................

[oracle@maclean04 trace]$ ps -ef|grep ora_
oracle    5466     1  0 Jul03 ?        00:00:18 ora_pz99_maclean4
oracle   14842 10564  0 19:54 pts/9    00:00:00 grep ora_
oracle   18641     1  0 Jun08 ?        00:00:02 ora_q002_maclean4
oracle   23932     1  0 Jun07 ?        00:04:26 ora_pmon_maclean4
oracle   23934     1  0 Jun07 ?        00:00:06 ora_vktm_maclean4
oracle   23938     1  0 Jun07 ?        00:00:00 ora_gen0_maclean4
oracle   23940     1  0 Jun07 ?        00:00:06 ora_diag_maclean4
oracle   23942     1  0 Jun07 ?        00:00:00 ora_dbrm_maclean4
oracle   23944     1  0 Jun07 ?        00:01:01 ora_ping_maclean4
oracle   23946     1  0 Jun07 ?        00:00:16 ora_psp0_maclean4
oracle   23948     1  0 Jun07 ?        00:00:00 ora_acms_maclean4
oracle   23950     1  0 Jun07 ?        02:27:29 ora_dia0_maclean4
oracle   23952     1  0 Jun07 ?        01:19:42 ora_lmon_maclean4
oracle   23954     1  0 Jun07 ?        02:23:59 ora_lmd0_maclean4
oracle   23956     1  5 Jun07 ?        1-13:50:36 ora_lms0_maclean4
oracle   23960     1  4 Jun07 ?        1-12:44:25 ora_lms1_maclean4
oracle   23964     1  0 Jun07 ?        00:00:00 ora_rms0_maclean4
oracle   23966     1  0 Jun07 ?        00:00:00 ora_lmhb_maclean4
oracle   23968     1  0 Jun07 ?        01:58:35 ora_mman_maclean4
oracle   23970     1  0 Jun07 ?        06:28:39 ora_dbw0_maclean4
oracle   23972     1  0 Jun07 ?        06:27:08 ora_dbw1_maclean4
oracle   23974     1  2 Jun07 ?        16:49:56 ora_lgwr_maclean4
oracle   23976     1  0 Jun07 ?        00:20:48 ora_ckpt_maclean4
oracle   23978     1  0 Jun07 ?        00:07:03 ora_smon_maclean4
oracle   23980     1  0 Jun07 ?        00:00:00 ora_reco_maclean4
oracle   23982     1  0 Jun07 ?        00:00:00 ora_rbal_maclean4
oracle   23984     1  0 Jun07 ?        00:01:00 ora_asmb_maclean4
oracle   23986     1  0 Jun07 ?        00:08:15 ora_mmon_maclean4
oracle   23988     1  0 Jun07 ?        00:18:19 ora_mmnl_maclean4
oracle   23992     1  0 Jun07 ?        00:00:00 ora_d000_maclean4
oracle   23994     1  0 Jun07 ?        00:00:00 ora_s000_maclean4
oracle   23996     1  0 Jun07 ?        00:00:00 ora_mark_maclean4
oracle   24065     1  0 Jun07 ?        01:16:54 ora_lck0_maclean4
oracle   24067     1  0 Jun07 ?        00:00:00 ora_rsmn_maclean4
oracle   24079     1  0 Jun07 ?        00:01:02 ora_dskm_maclean4
oracle   24174     1  0 Jun07 ?        00:08:18 ora_arc0_maclean4
oracle   24188     1  0 Jun07 ?        00:08:19 ora_arc1_maclean4
oracle   24190     1  0 Jun07 ?        00:00:59 ora_arc2_maclean4
oracle   24192     1  0 Jun07 ?        00:08:12 ora_arc3_maclean4
oracle   24235     1  0 Jun07 ?        00:00:00 ora_gtx0_maclean4
oracle   24237     1  0 Jun07 ?        00:00:00 ora_rcbg_maclean4
oracle   24241     1  0 Jun07 ?        00:00:00 ora_qmnc_maclean4
oracle   24245     1  0 Jun07 ?        00:00:00 ora_q001_maclean4
oracle   24264     1  0 Jun07 ?        00:08:28 ora_cjq0_maclean4
oracle   25782     1  0 Jun07 ?        00:00:00 ora_smco_maclean4

 

5.确认在问题发生时系统中仍有大量的空闲内存且未发生大量的SWAP,此外/dev/shm共享内存目录仍有27G的空闲。

 

6.在其他节点上查询全局动态性能视图gv$resource_limit发现当前故障节点上的登录进程总数上限仅为404,并不多。

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
PL/SQL Release 11.2.0.1.0 - Production
CORE    11.2.0.1.0      Production
TNS for Linux: Version 11.2.0.1.0 - Production
NLSRTL Version 11.2.0.1.0 - Production

SQL> select * from global_name;

GLOBAL_NAME
--------------------------------------------------------------------------------
www.askmac.cn

SQL> select * from gv$resource_limit where inst_id=4;

RESOURCE_NAME                  CURRENT_UTILIZATION MAX_UTILIZATION INITIAL_ALLOCATION             LIMIT_VALUE
------------------------------ ------------------- --------------- ------------------------------ -------------
processes                                       50             404       1500                           1500
sessions                                        61             616       2272                           2272
enqueue_locks                                  849            1599      31062                          31062
enqueue_resources                              846            1007      15016                      UNLIMITED
ges_procs                                       47             399       1503                           1503
ges_ress                                     65943          109281      67416                      UNLIMITED
ges_locks                                    23448           37966      92350                      UNLIMITED
ges_cache_ress                                7347           14716          0                      UNLIMITED
ges_reg_msgs                                   337            5040       3730                      UNLIMITED
ges_big_msgs                                    26             502       3730                      UNLIMITED
ges_rsv_msgs                                     0               1       1000                           1000
gcs_resources                              2008435         2876561    3446548                        3446548
gcs_shadows                                1888276         2392064    3446548                        3446548
dml_locks                                        0               0       9996                      UNLIMITED
temporary_table_locks                            0              45  UNLIMITED                      UNLIMITED
transactions                                     0               0       2499                      UNLIMITED
branches                                         0               2       2499                      UNLIMITED
cmtcallbk                                        0               3       2499                      UNLIMITED
max_rollback_segments                          109             129       2499                          65535
sort_segment_locks                               0              14  UNLIMITED                      UNLIMITED
k2q_locks                                        0               2       4544                      UNLIMITED
max_shared_servers                               1               1  UNLIMITED                      UNLIMITED
parallel_max_servers                             1              19        160                           3600

 

7. Exadata节点系统内核参数文件sysctl.conf中的配置正确:

# Controls the maximum shared segment size, in bytes
kernel.shmmax = 68719476736

# Controls the maximum number of shared memory segments, in pages
kernel.shmall = 4294967296

########### BEGIN DO NOT REMOVE Added by Oracle Exadata ###########
kernel.shmmni = 4096
kernel.sem = 250 32000 100 128
# bug 8311668 file-max and aio-max-nr
fs.file-max = 6815744
# DB install guide says the above
fs.aio-max-nr = 1048576
# 8976963
net.ipv4.neigh.bond0.locktime=0
net.ipv4.ip_local_port_range = 9000 65500
# DB install guide says the above
net.core.rmem_default = 4194304
net.core.wmem_default = 262144
net.core.rmem_max = 4194304
net.core.wmem_max = 2097152
# The original DB deployment was net.core.wmem_max = 1048586 but IB works
# best for Exadata at the above net.core settings
# bug 8268393 remove vm.nr_hugepages = 2048
# bug 8778821 system reboots after 60 sec on panic
kernel.panic=60
########### END DO NOT REMOVE Added by Oracle Exadata ###########

########### BEGIN DO NOT REMOVE Added by Oracle Exadata ###########
kernel.shmmax = 64547735961
kernel.shmall = 15758724
########### END DO NOT REMOVE Added by Oracle Exadata ###########

 

8. 至此问题还是显得扑朔迷离,主要后台进程和SGA内存的完好,而且操作系统上也仍有大量空闲内存,实例上的资源也没有达到一个临界点。到底是什么造成了无法分配新进程!?

出于谨慎我最后还是检查了系统上的/etc/security/limits.conf参数文件,该参数文件控制了shell的一些ulimit的上限。因为Exadata一体机是由Oracle安装配置后直接交付使用的,我最初的认识是这些配置文件都毫无疑问都应当是最佳配置,遵循Oracle的Best Practices。

但是当我实际打开这个文件后我立即意识到这个配置有问题,似乎少了点什么,以下为该Exadata上的limits.conf文件:

########### BEGIN DO NOT REMOVE Added by Oracle Deployment Scripts ###########

oracle     soft    nproc       2047
oracle     hard    nproc       16384
oracle     soft    nofile      65536
oracle     hard    nofile      65536

########### END DO NOT REMOVE Added by Oracle Deployment Scripts ###########

显然上述limits.conf中缺少了对memlock参数的设置,在不设置memlock参数的情况下使用缺省的memlock为32,以下为Exadata host上的ulimit输出:

[oracle@maclean4 shm]$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 606208
max locked memory (kbytes, -l) 32
max memory size         (kbytes, -m) unlimited
open files                      (-n) 65536
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 2047
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

可以观察到这里的max locked memory确实是缺省的32,而Oracle所推荐的memlock参数却要远大于32。

在Oracle validated Configuration中经过验证的memlock推荐值为50000000,关于Oracle Validated Configuration详见拙作<Understand Oracle Validated Configurations>

[oracle@rh2 ~]$ cat /etc/security/limits.conf

# Oracle-Validated setting for nofile soft limit is 131072
oracle   soft   nofile    131072

# Oracle-Validated setting for nofile hard limit is 131072
oracle   hard   nofile    131072

# Oracle-Validated setting for nproc soft limit is 131072
oracle   soft   nproc    131072

# Oracle-Validated setting for nproc hard limit is 131072
oracle   hard   nproc    131072

# Oracle-Validated setting for core soft limit is unlimited
oracle   soft   core    unlimited

# Oracle-Validated setting for core hard limit is unlimited
oracle   hard   core    unlimited

# Oracle-Validated setting for memlock soft limit is 50000000
oracle   soft   memlock    50000000 

# Oracle-Validated setting for memlock hard limit is 50000000
oracle   hard   memlock    50000000

搜索Mos可以发现Note[Ora-27102: Out Of Memory: Linux Error: 12: Cannot Allocate Memory with LOCK_SGA=TRUE [ID 401077.1]:指出了因max locked memory过小可能引发Linux Error: 12: Cannot Allocate Memory内存无法分配的问题。

因为修改limits.conf配置文件对已经启动的实例是无效的,所以我们无法通过纠正参数来解决现有的问题。

实际我采用了释放一些资源的方法来workaround了这个问题,通过以下脚本将实例内的所有前台服务进程杀死以释放资源。

ps -ef|grep $SID|grep LOCAL=NO|grep -v grep| awk ‘{print $2}’|xargs kill -9

完成以上命令后出现了终端有点卡的现象,之后恢复正常。尝试使用sysdba本地和远程登录实例均成功,应用的链接也恢复正常。

虽然修复了问题,但是还需要和客户做详尽的说明。我在邮件中阐明了该Exadata一体机上配置文件存在的问题,并提出了几点建议:

1.要求Oracle Support确认该/etc/security/limits.conf中的配置是否合理,是否需要修改
2.设置vm.min_free_kbytes = 51200 内核参数,避免因空闲内存不足引起的性能问题
3.安装OSWatcher监控软件,监控必要的系统资源

客户对我的说法也比较信服,但还是将邮件抄送了原厂Exadata一体机的售前人员。

之后售前人员也曾联系过我,我也做了相同的说明。但原厂售前认为在Exadata一体机是在Oracle美国原厂进行配置安装的,在配置上肯定是最优的,而且该limits.conf中的memlock参数的当前值(32)和推荐值(50000000)之间有如此大的差距,他们认为美国原厂的部署人员不可能犯这么低级的错误。

所以实际他们对我对该memlock参数的说明持一种怀疑的态度,我的建议是就该memlock参数和MOS进行进一步的沟通,以确认该问题。当然这不是我需要完成的工作了。因为对该memlock参数存在分歧,所以短期内也没有修改该参数。

这个case就这样过去了,时间过得很快,转眼已经2个月了。恰巧最近有升级Exadata上数据库到11.2.0.2的项目,所以翻阅了相关patch的readme文档,因为升级RAC到11.2.0.2的前提是Exadata Storage Server Software、InfiniBand Switch Software Version软件版本能够兼容,所以查阅了其兼容列表:

Version Compatibility

The following table lists the Exadata Storage Server software versions that are compatible with each supported Oracle Database 11g Release 2 software version.

Oracle Database Software version Required Exadata Storage Server Software version
11g Release 2 (11.2.0.2.0) Patch Set 1 11.2.2.x
11g Release 2 (11.2.0.1.0) 11.2.2.x
11.2.1.x

The following table lists the InfiniBand Switch software versions that are compatible with each supported Exadata Storage Server software version.

Exadata Storage Server Software version Required InfiniBand Switch software version
11.2.2.2.2 and later Exadata Database Machine – Sun Datacenter InfiniBand Switch 36
Switch software version 1.1.3-2 or laterHP Oracle Database Machine – Voltaire ISR 9024D-M and ISR 9024D
Switch software 5.1.1 build ID 872 (ISR 9024D-M only)
Switch firmware 1.0.0 or higher
11.2.2.2.0 or earlier Exadata Database Machine – Sun Datacenter InfiniBand Switch 36
Switch software version 1.0.1-1 or laterHP Oracle Database Machine – Voltaire ISR 9024D-M and ISR 9024D
Switch software 5.1.1 build ID 872 (ISR 9024D-M only)
Switch firmware 1.0.0 or higher

 

为了将Exadata上的RAC数据库升级到11.2.0.2,首先要将Exadata Storage Server Software升级到11.2.2.x,Oracle官方目前推荐的版本是11.2.2.3.2。

所以随后我也翻阅了Exadata Storage Server Software 11.2.2.3.2 的update readme文档,即<Oracle Exadata Database Machine README for patch 12577723 (Support note 1323958.1)>。

该Patch的升级主要分成”Applying the Patch to Exadata Cells”和”Applying the Patch to the Database Server” 2个阶段,即不仅需要在Exadata Cell上实施补丁,还需要在Database节点上实施一个小补丁。

 

查看”Applying the Patch to the Database Server”章节可以发现存在这样一个步骤:

Repeat the following steps for each database host. If you are taking deployment-wide downtime for the patching, then these steps may be performed in parallel on all database hosts.

  1. Update the resource limits for the database and the grid users:
    Note:

    • This step does not apply if you have customized the values for your specific deployment and database requirements.
    WARNING:

    • Do not run this step if you have specific customized values in use for your deployment.
    1. Calculate 75% of the physical memory on the machine using the following command:.
      let -i x=($((`cat /proc/meminfo | grep 'MemTotal:' | awk '{print $2}'` * 3 / 4))); echo $x
    2. Edit the /etc/security/limits.conf file to update or add following limits for the database owner (orauser) and the grid infrastructure user (griduser). Your deployment may use the same operating system user for both and it may be named as oracle user. Adjust the following as needed.
      ########## BEGIN DO NOT REMOVE Added by Oracle ###########
      orauser     soft    core       unlimited
      orauser     hard    core       unlimited
      orauser     soft    nproc       131072
      orauser     hard    nproc       131072
      orauser     soft    nofile      131072
      orauser     hard    nofile      131072
      orauser     soft    memlock     <value of x from step 01.a>
      orauser     hard    memlock     <value of x from step 01.a>
      
      griduser     soft    core       unlimited
      griduser     hard    core       unlimited
      griduser     soft    nproc       131072
      griduser     hard    nproc       131072
      griduser     soft    nofile      131072
      griduser     hard    nofile      131072
      griduser     soft    memlock     <value of x from step 01.a>
      griduser     hard    memlock     <value of x from step 01.a>
      
      ########### END DO NOT REMOVE Added by Oracle ###########

 

以上可以看到在正式实施Patch to Database server前做了一个补救措施,那就是为oracle和grid用户添加memlock参数,这里的memlock参数是通过将/proc/meminfo中的MemTotal取75%获得,在<Exadata Server Hardware Details>中我列出了Exadata Database Host的一些硬件参数,其中总内存MemTotal一般为70GB(74027752 kB),换算过来74027752*75%=55520814,也就是说Oracle实际推荐在Exadata上使用的memlock参数应当为55520814,甚至要高于我之前所说的50000000的验证值。

至此该问题终于真相大白!而我们也可以从中学到很多东西:

1.首先我大胆的猜测,实际部署Sun Exadata Machine的因该是Oracle硬件部门,也就是以前Sun的部门。实际在部署过程中,部门与部门之间的充分交流是很重要的,而这里09年匆匆上线的Oracle-Sun Exadata V2显然没有做好,而直到2011 5月发布的Oracle Exadata Database Machine 11g Release 2 (11.2) 11.2.2.3.2 patch 12577723中才反应并解决了该问题

2.IT始终是以人为本,不管是多么高端的服务器、多么先进的技术,如果没有与之相匹配的人和团队来驾驭的话,那么至多只能发挥出50%的效益,在人员对先进技术极端不熟悉的情况下,智能化只是空谈!

Exadata. Are you ready?

Exadata的宣传视频,看着还挺酷的!

Take a fresh look at the Exadata Database Machine and the extreme performance it offers your enterprise.

沪ICP备14014813号-2

沪公网安备 31010802001379号