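XX Mobile Network Management Center, CMNET Phase III expansion: installation of four T5120 servers plus one ST2530. After the OS was installed, probe-based IPMP was configured. In the first NIC failover test, unplugging the e1000g0 cable made the floating IP fail over to e1000g1 automatically. When the e1000g0 cable was plugged back in, the floating IP did not fail back and e1000g0 was shown as failed. After a reboot, the system consistently detected the second NIC as failed, so IPMP could not fail over.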
Jul 29 16:31:42 HB01-DNS-SV02 in.mpathd[249]: NIC failure detected on e1000g1 of group sc_ipmp0
Jul 29 16:31:42 HB01-DNS-SV02 in.mpathd[240]: Successfully failed over from NIC e1000g1 to NIC e1000g0
root@HB01-DNS-SV02 # ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
inet 127.0.0.1 netmask ff000000
e1000g0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
inet 192.168.100.6 netmask ffffff00 broadcast 192.168.100.255
groupname sc_ipmp0
ether 0:21:28:3a:fa:36
e1000g0:1: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 2
inet 192.168.100.5 netmask ffffff00 broadcast 192.168.100.255
e1000g1: flags=19040803<UP,BROADCAST,MULTICAST,DEPRECATED,IPv4,NOFAILOVER,FAILED> mtu 1500 index 3
inet 192.168.100.4 netmask ffffff00 broadcast 192.168.100.255
groupname sc_ipmp0
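When comparing states before and after a cable pull, it helps to pull the flags out of `ifconfig -a` mechanically rather than by eye. The following is a minimal sketch, assuming the Solaris 10 output layout shown above (a `name: flags=HEX<FLAG,...>` header followed by indented `inet` and `groupname` lines); the function names are hypothetical and not part of any Solaris tool.

```python
import re

# Interface header line, e.g.
# "e1000g1: flags=19040803<UP,BROADCAST,MULTICAST,...,FAILED> mtu 1500 index 3"
HEADER = re.compile(r"^(\S+): flags=[0-9a-fA-F]+<([^>]*)>")


def parse_ifconfig(text):
    """Parse Solaris `ifconfig -a` output into a dict keyed by interface name.

    Each value holds the flag set, the IPv4 address and the IPMP group name
    (None when that line is absent, e.g. for logical interfaces like e1000g0:1).
    """
    interfaces = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            current = header.group(1)
            interfaces[current] = {
                "flags": set(header.group(2).split(",")),
                "inet": None,
                "group": None,
            }
        elif current is None:
            continue  # ignore prompts or anything before the first header
        elif line.startswith("inet "):
            interfaces[current]["inet"] = line.split()[1]
        elif line.startswith("groupname "):
            interfaces[current]["group"] = line.split()[1]
    return interfaces


def failed_interfaces(interfaces):
    """Sorted names of interfaces whose flags include FAILED."""
    return sorted(name for name, info in interfaces.items()
                  if "FAILED" in info["flags"])
```

Fed the output above, `failed_interfaces` reports only `e1000g1`; its flag set also lacks `RUNNING`, which is worth checking alongside `FAILED`.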
bash-3.00# more /etc/host*
/etc/hostname.e1000g0
::::::::::::::
HB01-DNS-SV02 netmask + broadcast + group sc_ipmp0 up /
addif HB01-DNS-SV02-e1000g0-test netmask + broadcast + deprecated -failover up
::::::::::::::
/etc/hostname.e1000g1
::::::::::::::
HB01-DNS-SV02-e1000g1-test netmask + broadcast + group sc_ipmp0 deprecated -failover up
::::::::::::::
/etc/hosts
::::::::::::::
#
# Internet host table
#
::1 localhost
127.0.0.1 localhost
192.168.100.6 HB01-DNS-SV02 HB01-DNS-SV02. loghost
192.168.100.4 HB01-DNS-SV02-e1000g0-test
192.168.100.5 HB01-DNS-SV02-e1000g1-test
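Because the hostname files refer to addresses by name, it can help to resolve those names against /etc/hosts and compare the result with the `ifconfig -a` output above. This is a minimal sketch, assuming the file contents are passed in as strings and that each line follows the simple layout used here (hostname first, or hostname right after `addif`); `parse_hosts` and `planned_addresses` are hypothetical helper names.

```python
def parse_hosts(text):
    """Map every hostname or alias in an /etc/hosts-style table to its address."""
    table = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            table.setdefault(name, address)  # first matching entry wins
    return table


def planned_addresses(hostname_file, hosts):
    """Addresses a /etc/hostname.<if> file asks ifconfig to configure.

    Simplification: the address comes from the first token of each line,
    or from the token after "addif"; names missing from `hosts` are
    returned unchanged (they may already be literal addresses).
    """
    addresses = []
    for raw in hostname_file.splitlines():
        tokens = raw.split()
        if not tokens:
            continue
        if tokens[0] == "addif" and len(tokens) > 1:
            name = tokens[1]
        else:
            name = tokens[0]
        addresses.append(hosts.get(name, name))
    return addresses
```

Running `planned_addresses` on each /etc/hostname.* file gives the addresses each physical interface is expected to carry, which can then be checked line by line against what `ifconfig -a` actually reports.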
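Checked that the OBP variable local-mac-address? was set to true.
IPMP was then configured on e1000g2 and e1000g3 instead, but the fault persisted: the system detected e1000g3 as failed. The other three T5120s showed the same problem, which ruled out a NIC hardware fault on the host.
I borrowed two Cisco 2950 switches and a D-Link hub from the customer, connected the host's two NICs to a switch or the hub, configured my laptop as the gateway, and repeated the failover test. The problem persisted, which ruled out the gateway.
Analysis suggested the cause was either the IPMP configuration or a bug.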
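The probe-based IPMP configured earlier works at layer 3 (the network layer) of the OSI model and must be able to ping the gateway. Since Solaris 10, link-based IPMP is also available; it works at layer 2 (the data link layer), needs no test IP addresses, and does not need to ping the gateway. I recommend link-based IPMP: it saves IP addresses and avoids a lot of trouble. The host was then reconfigured for link-based IPMP: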
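The practical difference between the two modes shows up in the /etc/hostname.* files. The sketch below generates both layouts in the same form as the files in this post: probe-based gets a data address plus one deprecated, non-failover test address per NIC, while link-based gets data addresses only. The function names are hypothetical, and the generated text is a starting point to check against the Solaris 10 IPMP documentation, not a verified template.

```python
PROBE_TEST_OPTS = "netmask + broadcast + deprecated -failover up"


def probe_based(group, data_host, test_hosts):
    """Contents of /etc/hostname.<if> for probe-based IPMP.

    test_hosts maps interface name -> test hostname (dict order = NIC order).
    The first NIC also carries the data address; every NIC gets a
    deprecated, non-failover test address that in.mpathd uses for probing.
    """
    files = {}
    for index, (ifname, test_host) in enumerate(test_hosts.items()):
        if index == 0:
            files[ifname] = (
                f"{data_host} netmask + broadcast + group {group} up\n"
                f"addif {test_host} {PROBE_TEST_OPTS}\n"
            )
        else:
            files[ifname] = (
                f"{test_host} netmask + broadcast + group {group} "
                "deprecated -failover up\n"
            )
    return files


def link_based(group, data_hosts):
    """Contents of /etc/hostname.<if> for link-based IPMP (no test addresses)."""
    return {
        ifname: f"{host} netmask + broadcast + group {group} up\n"
        for ifname, host in data_hosts.items()
    }


def address_count(files):
    """Number of addresses the files configure (one per non-empty line)."""
    return sum(len(text.splitlines()) for text in files.values())
```

With the two NICs here, the probe-based layout consumes three addresses (192.168.100.4, .5 and .6), while the link-based layout consumes two (.5 and .6), which is the IP saving mentioned above.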
root@HB01-DNS-SV02# more /etc/host*
::::::::::::::
/etc/hostname.e1000g0
::::::::::::::
HB01-DNS-SV02 netmask + broadcast + group ipmp_group0 up
::::::::::::::
/etc/hostname.e1000g1
::::::::::::::
HB01-DNS-SV02-e1000g1-test netmask + broadcast + group ipmp_group0 up
::::::::::::::
/etc/hosts
::::::::::::::
#
# Internet host table
#
::1 localhost
127.0.0.1 localhost
192.168.100.6 HB01-DNS-SV02 HB01-DNS-SV02. loghost
#192.168.100.4 HB01-DNS-SV02-e1000g0-test
192.168.100.5 HB01-DNS-SV02-e1000g1-test
root@HB01-DNS-SV02 # ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
inet 127.0.0.1 netmask ff000000
e1000g0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
inet 192.168.100.6 netmask ffffff00 broadcast 192.168.100.255
groupname ipmp_group0
ether 0:21:28:3a:fa:34
e1000g1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3
inet 192.168.100.5 netmask ffffff00 broadcast 192.168.100.255
groupname ipmp_group0
ether 0:21:28:3a:fa:35
Further research showed that this is a Solaris 10 u8 bug, bug ID 271519: "Solaris 10 Kernel Patches 141444-09 and 141445-09 May Cause Interface Failure in IP Multipathing (IPMP)".
Testing showed that applying EIS 2.2.4 2010.06 resolves the fault.
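To check whether other hosts carry the kernel patch revisions named in the alert, the output of `showrev -p` can be scanned. A minimal sketch, assuming the usual one-line-per-patch `Patch: NNNNNN-RR Obsoletes: ...` format; it flags only the exact revisions in the alert title and says nothing about whether later revisions fix the problem. The function names are hypothetical.

```python
import re

# Patch revisions named in the alert title quoted above.
AFFECTED = frozenset({"141444-09", "141445-09"})

# `showrev -p` prints one line per patch, starting "Patch: NNNNNN-RR ..."
PATCH_LINE = re.compile(r"^Patch:\s*(\d{6}-\d{2})\b")


def installed_patches(showrev_output):
    """Set of patch IDs (number-revision) found in `showrev -p` output."""
    found = set()
    for line in showrev_output.splitlines():
        match = PATCH_LINE.match(line.strip())
        if match:
            found.add(match.group(1))
    return found


def affected_installed(showrev_output):
    """Sorted list of the alert's patch revisions that are installed."""
    return sorted(installed_patches(showrev_output) & AFFECTED)
```

For example, save the `showrev -p` output of each T5120 to a file and pass its contents to `affected_installed`; a non-empty result marks a host worth re-testing after the EIS update.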
Again, I recommend link-based IPMP. Link-based IPMP supports the following NICs:
Solaris OS:
hme
eri
ce
ge
bge
qfe
dmfe