- Verify the cable connections using the following steps:
Visually inspect all cables for proper connectivity.
Confirm that the cable links are up:
[root@dm01db01 ~]# cat /sys/class/net/ib0/carrier
1
[root@dm01db01 ~]# cat /sys/class/net/ib1/carrier
1
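Confirm that the output is 1 for both interfaces, which means the physical link is up.
Then check the InfiniBand port error counters with the following command: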
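The two carrier checks above can be wrapped in a small function that tests every IPoIB interface in one pass. This is a minimal sketch. It assumes the interfaces are named ib0 and ib1, as in the transcript. The `NET_DIR` variable is a hypothetical override added only so the function can be tried against a scratch directory.

```shell
#!/bin/sh
# Sketch: check the carrier state of the IPoIB interfaces in one pass.
# Assumptions: interfaces default to ib0 and ib1 (as in the transcript);
# NET_DIR is a hypothetical override for /sys/class/net, used for testing.
check_ib_carrier() {
  local net_dir iface state status
  net_dir=${NET_DIR:-/sys/class/net}
  status=0
  for iface in ${*:-ib0 ib1}; do
    if [ ! -e "$net_dir/$iface/carrier" ]; then
      echo "$iface: no carrier file (interface missing?)"
      status=1
      continue
    fi
    # Reading carrier fails if the interface is administratively down.
    state=$(cat "$net_dir/$iface/carrier" 2>/dev/null)
    if [ "$state" = "1" ]; then
      echo "$iface: link up (carrier=1)"
    else
      echo "$iface: link DOWN (carrier=${state:-unreadable})"
      status=1
    fi
  done
  # 0 = every interface reported carrier 1, 1 = at least one did not
  return "$status"
}
```

Run `check_ib_carrier` with no arguments to test ib0 and ib1, or pass interface names explicitly.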
ls -l /sys/class/infiniband/*/ports/*/*errors*
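To act on the error counters rather than just list them, a sketch like the following prints any error counter that is not zero. It assumes the counters are exposed as files under `<hca>/ports/<port>/counters/` (for example `symbol_error` and `port_rcv_errors`). That is a slightly different path from the glob in the `ls` command. `IB_DIR` is a hypothetical override for testing.

```shell
#!/bin/sh
# Sketch: print every non-zero InfiniBand error counter.
# Assumption: counters live under <hca>/ports/<port>/counters/ in sysfs;
# IB_DIR is a hypothetical override for /sys/class/infiniband.
report_ib_error_counters() {
  local ib_dir f value found
  ib_dir=${IB_DIR:-/sys/class/infiniband}
  found=0
  for f in "$ib_dir"/*/ports/*/counters/*error*; do
    [ -f "$f" ] || continue            # glob matched nothing
    value=$(cat "$f" 2>/dev/null) || continue
    if [ -n "$value" ] && [ "$value" != "0" ]; then
      echo "${f#"$ib_dir"/}: $value"   # e.g. mlx4_0/ports/1/counters/symbol_error: 5
      found=1
    fi
  done
  # 0 = all error counters are zero, 1 = at least one is non-zero
  return "$found"
}
```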
The /opt/oracle.SupportTools/ibdiagtools directory contains the verify-topology and infinicheck tools. Run them to verify the network. Details of these tools follow:
[root@dm01db01 ~]# cd /opt/oracle.SupportTools/
[root@dm01db01 oracle.SupportTools]# ls
asrexacheck defaultOSchoose.pl firstconf make_cellboot_usb PS4ES sys_dirs.tar
CheckHWnFWProfile diagnostics.iso flush_cache.sh MegaSAS.log reclaimdisks.sh
CheckSWProfile.sh em harden_passwords_reset_root_ssh ocrvothostd setup_ssh_eq.sh
dbserver_backup.sh exachk ibdiagtools onecommand sundiag.sh
[root@dm01db01 oracle.SupportTools]# cd ibdiagtools/
[root@dm01db01 ibdiagtools]# ls
cells_conntest.log dcli ibqueryerrors.log perf_cells.log0 perf_mesh.log1 subnet_cells.log VERSION_FILE
cells_user_equiv.log diagnostics.output infinicheck perf_cells.log1 perf_mesh.log2 subnet_hosts.log xmonib.sh
checkbadlinks.pl hosts_conntest.log monitord perf_cells.log2 README topologies
cleanup_remote.log hosts_user_equiv.log netcheck perf_hosts.log0 SampleOutputs.txt topology-zfs
clearcounters.log ibping_test netcheck_scratch perf_mesh.log0 setup-ssh verify-topology
[root@dm01db01 ibdiagtools]# ./verify-topology -h
[ DB Machine Infiniband Cabling Topology Verification Tool ]
[Version IBD VER 2.c 11.2.3.1.1 120607]
Usage: ./verify-topology [-v|--verbose] [-r|--reuse (cached maps)] [-m|--mapfile]
[-ibn|--ibnetdiscover (specify location of ibnetdiscover output)]
[-ibh|--ibhosts (specify location of ibhosts output)]
[-ibs|--ibswitches (specify location of ibswitches output)]
[-t|--topology [torus | quarterrack ] default is fattree]
[-a|--additional [interconnected_quarterrack]
[-factory|--factory non-exadata machines are treated as error]
Please note that halfrack is now redundant. Checks for Half Racks
are now done by default.
-t quarterrack
option is needed to be used only if testing on a stand alone quarterrack
-a interconnected_quarterrack
option is to be used only when testing on large multi-rack setups
-t fattree
option is the default option and not required to be specified
Example : perl ./verify-topology
Example : ././verify-topology -t quarterrack
Example : ././verify-topology -t torus
Example : ././verify-topology -a interconnected_quarterrack
------- Some Important properties of the fattree cabling topology --------
(1) Every internal switch must be connected to every external switch
(2) No 2 external switches must be connected to each other
-------------------------------------------------------------------------------
Please note that switch guid can be determined by logging in to a switch and
trying either of these commands, depending on availability -
>module-firmware show
OR
>opensm
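The two fattree properties in the help text can also be checked by hand against a list of cables. The sketch below is not how verify-topology works internally. It assumes that "internal" means leaf and "external" means spine, which is the wording the tool uses in its results. It also assumes the links arrive on stdin as one `SWITCH_A SWITCH_B` line per cable, for example copied down from `ibnetdiscover` output.

```shell
#!/bin/sh
# Sketch: check the two fattree properties against a hand-made link list.
# Assumptions (ours, not the tool's): "internal" = leaf, "external" = spine;
# stdin holds one "SWITCH_A SWITCH_B" line per inter-switch cable.
# Usage: check_fattree "LEAF1 LEAF2" "SPINE1 SPINE2" < links.txt
check_fattree() {
  local leaves spines links l s rc
  leaves=$1
  spines=$2
  links=$(cat)
  rc=0
  # Property 1: every leaf switch has at least one link to every spine switch.
  for l in $leaves; do
    for s in $spines; do
      if ! printf '%s\n' "$links" | awk -v x="$l" -v y="$s" '
          ($1 == x && $2 == y) || ($1 == y && $2 == x) { hit = 1 }
          END { exit !hit }'; then
        echo "MISSING: $l has no link to $s"
        rc=1
      fi
    done
  done
  # Property 2: no cable may join two spine switches.
  printf '%s\n' "$links" | awk -v sp="$spines" '
    BEGIN { n = split(sp, arr, " "); for (i = 1; i <= n; i++) isSpine[arr[i]] = 1 }
    ($1 in isSpine) && ($2 in isSpine) { print "INVALID: " $1 " is linked to " $2; bad = 1 }
    END { exit bad }' || rc=1
  return "$rc"
}
```

A cable listed twice is reported twice by property 2. That is acceptable for a manual check.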
[root@dm01db01 ibdiagtools]# ./verify-topology -t fattree
[ DB Machine Infiniband Cabling Topology Verification Tool ]
[Version IBD VER 2.c 11.2.3.1.1 120607]
External non-Exadata-image nodes found: check for ZFS if on T4-4 - else ignore
Leaf switch found: dmibsw03.acs.oracle.com (212846902ba0a0)
Spine switch found: 10.146.24.251 (2128469c74a0a0)
Leaf switch found: dmibsw02.acs.oracle.com (21284692d4a0a0)
Spine switch found: 10.146.24.252 (2128b7f744c0a0)
Spine switch found: dmibsw01.acs.oracle.com (21286cc7e2a0a0)
Spine switch found: 10.146.24.253 (2128b7ac44c0a0)
Found 2 leaf, 4 spine, 0 top spine switches
Check if all hosts have 2 CAs to different switches...............[SUCCESS]
Leaf switch check: cardinality and even distribution..............[SUCCESS]
Spine switch check: Are any Exadata nodes connected ..............[SUCCESS]
Spine switch check: Any inter spine switch links..................[ERROR]
Spine switches 10.146.24.251 (2128469c74a0a0) & 10.146.24.252 (2128b7f744c0a0) should not be connected
[ERROR]
Spine switches 10.146.24.251 (2128469c74a0a0) & 10.146.24.253 (2128b7ac44c0a0) should not be connected
[ERROR]
Spine switches 10.146.24.252 (2128b7f744c0a0) & dmibsw01.acs.oracle.com (21286cc7e2a0a0) should not be connected
[ERROR]
Spine switches 10.146.24.252 (2128b7f744c0a0) & 10.146.24.253 (2128b7ac44c0a0) should not be connected
[ERROR]
Spine switches dmibsw01.acs.oracle.com (21286cc7e2a0a0) & 10.146.24.253 (2128b7ac44c0a0) should not be connected
Spine switch check: Any inter top-spine switch links..............[SUCCESS]
Spine switch check: Correct number of spine-leaf links............[ERROR]
Leaf switch dmibsw03.acs.oracle.com (212846902ba0a0) must be linked
to spine switch 10.146.24.252 (2128b7f744c0a0) with
at least 1 links...0 link(s) found
[ERROR]
Leaf switch dmibsw02.acs.oracle.com (21284692d4a0a0) must be linked
to spine switch 10.146.24.252 (2128b7f744c0a0) with
at least 1 links...0 link(s) found
[ERROR]
Spine switch 10.146.24.252 (2128b7f744c0a0) has fewer than 2 links to leaf switches.
It has 0
[ERROR]
Leaf switch dmibsw03.acs.oracle.com (212846902ba0a0) must be linked
to spine switch 10.146.24.253 (2128b7ac44c0a0) with
at least 1 links...0 link(s) found
[ERROR]
Leaf switch dmibsw02.acs.oracle.com (21284692d4a0a0) must be linked
to spine switch 10.146.24.253 (2128b7ac44c0a0) with
at least 1 links...0 link(s) found
[ERROR]
Spine switch 10.146.24.253 (2128b7ac44c0a0) has fewer than 2 links to leaf switches.
It has 0
Leaf switch check: Inter-leaf link check..........................[ERROR]
Leaf switches dmibsw03.acs.oracle.com (212846902ba0a0) & dmibsw02.acs.oracle.com (21284692d4a0a0) have 0 links between them
They should have 7 links instead.
Leaf switch check: Correct number of leaf-spine links.............[SUCCESS]
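A run like this one has many [ERROR] detail lines, so it helps to reduce it to the list of failed checks. This is a minimal sketch. It assumes every check summary line ends in a run of dots followed by `[SUCCESS]` or `[ERROR]`, as in the output above. Bare `[ERROR]` lines and detail messages are skipped.

```shell
#!/bin/sh
# Sketch: summarize verify-topology output read from stdin.
# Assumption: a check summary line ends in dots plus [SUCCESS] or [ERROR];
# bare "[ERROR]" lines and detail lines are ignored.
# Exit status: 0 if no check failed, 1 otherwise.
topology_summary() {
  awk '
    /\.\[SUCCESS\]$/ { ok++ }
    /\.\[ERROR\]$/ {
      bad++
      sub(/ *\.+\[ERROR\]$/, "")   # strip the dot leader and the tag
      print "FAILED: " $0
    }
    END {
      printf "%d passed, %d failed\n", ok, bad
      exit (bad > 0)
    }'
}
```

Usage: `./verify-topology -t fattree | topology_summary`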
Verify the hardware and firmware:
cd /opt/oracle.cellos/
[root@dm01db01 oracle.cellos]# ./CheckHWnFWProfile
[SUCCESS] The hardware and firmware profile matches one of the supported profiles
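To run the same hardware and firmware check on several servers, a loop over ssh is enough. This sketch assumes passwordless root ssh to each host. It judges the result by the `[SUCCESS]` line, because the transcript shows only the message and not the script's exit status.

```shell
#!/bin/sh
# Sketch: run CheckHWnFWProfile on several hosts and report PASS/FAIL.
# Assumptions: passwordless root ssh to every host; the result is judged
# from the "[SUCCESS]" message because the exit status is not shown here.
hw_profile_status() {
  # Reads CheckHWnFWProfile output on stdin and prints PASS or FAIL.
  if grep -q '^\[SUCCESS\]'; then
    echo PASS
  else
    echo FAIL
  fi
}

check_hw_profiles() {
  local host result
  for host in "$@"; do
    result=$(ssh -o BatchMode=yes "root@$host" /opt/oracle.cellos/CheckHWnFWProfile 2>&1 | hw_profile_status)
    printf '%s: %s\n' "$host" "$result"
  done
}
```

Usage: `check_hw_profiles dm01cel01-priv dm01cel02-priv` (the second host name is hypothetical).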
Verify the platform software:
[root@dm01db01 oracle.cellos]# cd /opt/oracle.SupportTools/
[root@dm01db01 oracle.SupportTools]# ./CheckSWProfile.sh
usage: ./CheckSWProfile.sh options
This script returns 0 when the platform and software on the
machine on which it runs matches one of the suppored platform and
software profiles. It will return nonzero value in all other cases.
The check is applicable both to Exadata Cells and Database Nodes
with Oracle Enterprise Linux (OEL) and RedHat Enterprise Linux (RHEL).
OPTIONS:
-h Show this message
-s Show supported platforms and software profiles for this machine
-c Check this machine for supported platform and software profiles
-I <No space comma separated list of Infiniband switch names/ip addresses>
To check configuration for SPINE switch prefix the switch host name or
ip address with IS_SPINE.
Example: CheckSWProfile.sh -I IS_SPINEswitch1.company.com,switch2.company.com
Check for the software revision on the managed Infiniband switches
in the Database Machine. You will need to supply the password for
admin user.
-S <No space comma separated list of Infiniband switch names/ip addresses>
Example: CheckSWProfile.sh -S switch1.company.com,switch2.company.com
Prints the Serial number and Hardware version for the switches
in the Database Machine. You will need to supply the password for
admin user for Voltaire switches and root user for Sun switches.
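The `-I` option expects one comma-separated list with no spaces, with `IS_SPINE` prefixed to each spine switch name. Below is a hypothetical helper that builds that string. It takes spine switches as `spine:NAME` and all other switches as plain names.

```shell
#!/bin/sh
# Hypothetical helper: build the argument for CheckSWProfile.sh -I.
# Spine switches are passed as "spine:NAME" and get the IS_SPINE prefix;
# the result is one comma-separated list with no spaces.
build_switch_list() {
  local out sw
  out=""
  for sw in "$@"; do
    case "$sw" in
      spine:*) sw="IS_SPINE${sw#spine:}" ;;
    esac
    out="${out:+$out,}$sw"   # add a comma only after the first entry
  done
  printf '%s\n' "$out"
}
```

`./CheckSWProfile.sh -I "$(build_switch_list spine:switch1.company.com switch2.company.com)"` passes the same list as the example in the help text. The script still prompts for the admin password.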
[root@dm01db01 oracle.SupportTools]# ./CheckSWProfile.sh -c
[INFO] Software checker check option is only available on Exadata cells.
[root@dm01db01 oracle.SupportTools]# ssh dm01cel01-priv
[root@dm01cel01 oracle.SupportTools]# ./CheckSWProfile.sh -c
[INFO] SUCCESS: Meets requirements of operating platform and InfiniBand software.
[INFO] Check does NOT verify correctness of configuration for installed software.
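Per its help text, `CheckSWProfile.sh` returns 0 when the platform and software match a supported profile and nonzero otherwise. Per the [INFO] message, the `-c` check runs only on cells. The sketch below runs it on each cell over ssh and relies on that exit status, since ssh passes the remote command's exit status through. It assumes passwordless root ssh. Cell names other than dm01cel01-priv are hypothetical.

```shell
#!/bin/sh
# Sketch: run CheckSWProfile.sh -c on each cell and use its exit status
# (0 = supported profile, nonzero otherwise, per the script's help text).
# Assumption: passwordless root ssh to every cell.
run_check() {
  # Runs the given command, then prints PASS or FAIL with its exit status.
  local rc
  "$@"
  rc=$?
  if [ "$rc" -eq 0 ]; then
    printf 'PASS: %s\n' "$*"
  else
    printf 'FAIL (rc=%d): %s\n' "$rc" "$*"
  fi
  return "$rc"
}

check_cells_sw() {
  local cell status
  status=0
  for cell in "$@"; do
    run_check ssh -o BatchMode=yes "root@$cell" /opt/oracle.SupportTools/CheckSWProfile.sh -c || status=1
  done
  return "$status"
}
```

Usage: `check_cells_sw dm01cel01-priv dm01cel02-priv dm01cel03-priv`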
[root@dm01cel01 oracle.SupportTools]# cd /opt/oracle.cellos/
[root@dm01cel01 oracle.cellos]# ./CheckHWnFWProfile
[SUCCESS] The hardware and firmware profile matches one of the supported profiles
If hardware is replaced, rerun the /opt/oracle.cellos/CheckHWnFWProfile script.