Resolving an IPMP Bug Encountered in the XX Province Mobile CMNET Phase III Expansion Project

    Technology  2022-05-19

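    1. Symptoms

    At the XX Mobile network management center, the CMNET Phase III expansion involved installing four T5120 servers and one ST2530 storage array. After the OS was installed and probe-based IPMP was configured, we ran the first NIC failover test. Unplugging the cable from e1000g0 moved the floating IP to e1000g1 as expected, but after the cable was plugged back into e1000g0 the floating IP did not fail back, and e1000g0 was reported as failed. After a reboot, the system always detected the second NIC as failed, so IPMP could no longer fail over.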

     

    Jul 29 16:31:42 HB01-DNS-SV02 in.mpathd[249]: NIC failure detected on e1000g1 of group sc_ipmp0

    Jul 29 16:31:42 HB01-DNS-SV02 in.mpathd[240]: Successfully failed over from NIC e1000g1 to NIC e1000g0
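
    The log lines above can be filtered mechanically when there are many hosts to check. The following is a minimal sketch, not part of the original procedure: the function name mpathd_failures is hypothetical, and it assumes the in.mpathd message format shown above and a default syslog setup that writes to /var/adm/messages.

```shell
# Print "<nic> <group>" for every NIC failure that in.mpathd logged.
# Reads syslog text on stdin (hypothetical helper, not a Solaris tool).
mpathd_failures() {
  sed -n 's/.*in\.mpathd\[[0-9]*\]: NIC failure detected on \([^ ]*\) of group \([^ ]*\).*/\1 \2/p'
}

# Demo with the two log lines from this article.
# On the host you would run: mpathd_failures < /var/adm/messages
mpathd_failures <<'EOF'
Jul 29 16:31:42 HB01-DNS-SV02 in.mpathd[249]: NIC failure detected on e1000g1 of group sc_ipmp0
Jul 29 16:31:42 HB01-DNS-SV02 in.mpathd[240]: Successfully failed over from NIC e1000g1 to NIC e1000g0
EOF
# prints: e1000g1 sc_ipmp0
```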

    root@HB01-DNS-SV02 # ifconfig -a

    lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1

            inet 127.0.0.1 netmask ff000000

    e1000g0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2

            inet 192.168.100.6 netmask ffffff00 broadcast 192.168.100.255

            groupname sc_ipmp0

            ether 0:21:28:3a:fa:36

    e1000g0:1: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 2

            inet 192.168.100.5 netmask ffffff00 broadcast 192.168.100.255

    e1000g1: flags=19040803<UP,BROADCAST,MULTICAST,DEPRECATED,IPv4,NOFAILOVER,FAILED> mtu 1500 index 3

            inet 192.168.100.4 netmask ffffff00 broadcast 192.168.100.255

            groupname sc_ipmp0
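
    In the output above, the FAILED flag on e1000g1 is the key symptom. As a rough sketch (the helper name failed_nics is made up for illustration), the flag can be picked out of `ifconfig -a` output like this:

```shell
# List physical NICs that carry the FAILED flag.
# Reads `ifconfig -a` output on stdin (hypothetical helper).
failed_nics() {
  # Match FAILED only as a whole flag, so NOFAILOVER does not count.
  grep 'flags=.*[<,]FAILED[,>]' | cut -d: -f1 | sort -u
}

# Demo with the interface lines from this article.
# On the host you would run: ifconfig -a | failed_nics
failed_nics <<'EOF'
e1000g0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
e1000g0:1: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 2
e1000g1: flags=19040803<UP,BROADCAST,MULTICAST,DEPRECATED,IPv4,NOFAILOVER,FAILED> mtu 1500 index 3
EOF
# prints: e1000g1
```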

    bash-3.00# more /etc/host*

    /etc/hostname.e1000g0

    ::::::::::::::

    HB01-DNS-SV02 netmask + broadcast + group sc_ipmp0 up \

    addif HB01-DNS-SV02-e1000g0-test netmask + broadcast + deprecated -failover up

    ::::::::::::::

    /etc/hostname.e1000g1

    ::::::::::::::

    HB01-DNS-SV02-e1000g1-test netmask + broadcast + group sc_ipmp0 deprecated -failover up

    ::::::::::::::

    /etc/hosts

    ::::::::::::::

    #

    # Internet host table

    #

    ::1     localhost      

    127.0.0.1       localhost      

    192.168.100.6   HB01-DNS-SV02   HB01-DNS-SV02.  loghost

    192.168.100.4   HB01-DNS-SV02-e1000g0-test     

    192.168.100.5   HB01-DNS-SV02-e1000g1-test

     

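    2 Troubleshooting process

    2.1 Ruling out hardware faults

    2.1.1 Ruling out NIC faults

    We checked that local-mac-address?=true.

    We then configured IPMP on e1000g2 and e1000g3 instead. The fault persisted: the system again detected e1000g3 as failed. The other three T5120s showed the same problem, so a host NIC fault was ruled out.

    2.1.2 Ruling out gateway faults

    We borrowed two Cisco 2950 switches and a D-Link hub from the customer, connected the host's two NICs to the switch and the hub, set up our own laptop as the gateway, and repeated the failover test. The problem persisted, so a gateway fault was ruled out.

    Our conclusion at this point was that the cause was probably either the IPMP configuration or a bug.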
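
    The reason for this check is that IPMP needs every interface in a group to have its own MAC address, and on SPARC systems that depends on the OpenBoot variable local-mac-address?. A minimal sketch of the check follows; the helper name local_mac_enabled is hypothetical, and it assumes the `variable=value` form that eeprom prints.

```shell
# Succeeds only if the eeprom output on stdin reports
# local-mac-address?=true (hypothetical helper).
local_mac_enabled() {
  grep '^local-mac-address?=true$' >/dev/null
}

# Demo with the expected eeprom output.
# On the host (as root) you would run:
#   eeprom 'local-mac-address?' | local_mac_enabled
printf 'local-mac-address?=true\n' | local_mac_enabled && echo "each port uses its own MAC address"
```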

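    2.2 Solutions

    2.2.1 Configure link-based IPMP

    The probe-based IPMP we configured first works at the IP level, layer 3 (the network layer) of the OSI model, and has to be able to ping the gateway. Solaris 10 and later also offer link-based IPMP, which works at layer 2 (the data link layer) of the OSI model and needs neither test IP addresses nor a reachable gateway. We recommend link-based IPMP: it saves IP addresses and avoids a lot of trouble.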
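
    In the configuration files shown in this article, the visible difference between the two modes is the test address marked -failover: it is present in the probe-based files and absent in the link-based ones. The sketch below classifies a hostname.<interface> file on that basis. The helper name ipmp_mode is hypothetical, and the rule is an assumption based on these files, not a complete description of how in.mpathd decides.

```shell
# Print "probe-based" if the hostname.<if> content on stdin configures
# a -failover test address, otherwise "link-based" (hypothetical helper).
ipmp_mode() {
  if grep ' -failover' >/dev/null; then
    echo probe-based
  else
    echo link-based
  fi
}

# Demo with one line from each configuration in this article.
# On the host you would run: ipmp_mode < /etc/hostname.e1000g1
echo 'HB01-DNS-SV02-e1000g1-test netmask + broadcast + group sc_ipmp0 deprecated -failover up' | ipmp_mode
# prints: probe-based
echo 'HB01-DNS-SV02 netmask + broadcast + group ipmp_group0 up' | ipmp_mode
# prints: link-based
```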

     

    root@HB01-DNS-SV02# more /etc/host*

    ::::::::::::::

    /etc/hostname.e1000g0

    ::::::::::::::

    HB01-DNS-SV02 netmask + broadcast + group ipmp_group0 up

    ::::::::::::::

    /etc/hostname.e1000g1

    ::::::::::::::

    HB01-DNS-SV02-e1000g1-test netmask + broadcast + group ipmp_group0 up

    ::::::::::::::

    /etc/hosts

    ::::::::::::::

    #

    # Internet host table

    #

    ::1     localhost      

    127.0.0.1       localhost      

    192.168.100.6   HB01-DNS-SV02   HB01-DNS-SV02.  loghost

    #192.168.100.4   HB01-DNS-SV02-e1000g0-test     

    192.168.100.5   HB01-DNS-SV02-e1000g1-test

    root@HB01-DNS-SV02 # ifconfig -a

    lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1

            inet 127.0.0.1 netmask ff000000

    e1000g0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2

            inet 192.168.100.6 netmask ffffff00 broadcast 192.168.100.255

            groupname ipmp_group0

            ether 0:21:28:3a:fa:34

    e1000g1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3

            inet 192.168.100.5 netmask ffffff00 broadcast 192.168.100.255

            groupname ipmp_group0

            ether 0:21:28:3a:fa:35
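
    After switching to the link-based configuration, it is worth confirming that both NICs ended up in the same group. A small sketch, assuming the `ifconfig -a` layout shown above; the helper name ipmp_groups is hypothetical, and on Solaris 10 you would likely need nawk in place of awk.

```shell
# Print "<interface> <group>" for every interface that belongs to an
# IPMP group. Reads `ifconfig -a` output on stdin (hypothetical helper).
ipmp_groups() {
  awk '
    /: flags=/        { nic = $1; sub(/:$/, "", nic) }  # interface header line
    $1 == "groupname" { print nic, $2 }                 # group line under it
  '
}

# Demo with the link-based output from this article.
# On the host you would run: ifconfig -a | ipmp_groups
ipmp_groups <<'EOF'
e1000g0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
        inet 192.168.100.6 netmask ffffff00 broadcast 192.168.100.255
        groupname ipmp_group0
        ether 0:21:28:3a:fa:34
e1000g1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3
        inet 192.168.100.5 netmask ffffff00 broadcast 192.168.100.255
        groupname ipmp_group0
        ether 0:21:28:3a:fa:35
EOF
# prints:
# e1000g0 ipmp_group0
# e1000g1 ipmp_group0
```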

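    2.2.2 This is an OS bug; upgrade to 142900-02 or later

    Later research showed that this is a bug in Solaris 10 u8, bug ID 271519: "Solaris 10 Kernel Patches 141444-09 and 141445-09 May Cause Interface Failure in IP Multipathing (IPMP)".

    Testing confirmed that applying EIS 2.2.4 2010.06 resolves the fault.

    3 Summary

    We recommend link-based IPMP. Link-based IPMP supports the following NICs: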
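
    To see whether a host already runs the fixed kernel patch, the kernel version string can be compared against 142900-02. This is a minimal sketch: the helper name kernel_has_fix is hypothetical, it assumes `uname -v` prints a string of the form Generic_<patch>-<revision>, and it only recognizes the 142900 patch series. Any other patch ID is reported as undecided instead of guessed.

```shell
# Exit 0 if the kernel string is patch 142900 at revision 02 or later,
# 1 if it is an older 142900 revision, 2 if the patch ID is a different
# one that this sketch cannot judge (hypothetical helper).
kernel_has_fix() {
  case $1 in
    Generic_142900-*)
      rev=${1##*-}           # keep the revision after the last dash
      [ "$rev" -ge 2 ]
      ;;
    *)
      return 2
      ;;
  esac
}

# Demo with sample strings; on the host you would run:
#   kernel_has_fix "$(uname -v)"
kernel_has_fix Generic_142900-02 && echo "fix present"
kernel_has_fix Generic_141444-09 || echo "not confirmed, check the patch level"
```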

    Solaris OS:

    hme

    eri

    ce

    ge

    bge

    qfe

    dmfe

