
Notes on troubleshooting a network failure on Ubuntu

乐果   posted on May 23, 2025   Tags: ubuntu

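I have recently been integrating Baidu's offline face-recognition SDK for Linux, and the company bought an ARM industrial PC (Rockchip RK3588 SoC) for it.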

The vendor had preinstalled Ubuntu 22.04. When I was about to deploy the program for debugging, I found that the network was not working.

ping could reach neither the LAN nor the Internet (packets never left the machine), and the routing table did indeed look strange:

root@localhost:/data# ip route
default via 192.168.1.1 dev eth0 proto dhcp src 192.168.1.58 metric 202
default via 192.168.1.1 dev eth0 proto dhcp metric 20100
169.254.0.0/16 dev eth0 scope link metric 1000
192.168.1.0/24 dev eth0 proto kernel scope link src 192.168.1.24 metric 100
192.168.1.0/24 dev eth0 proto dhcp scope link src 192.168.1.58 metric 202

The current routing table shows the following characteristics and potential problems:

Duplicate default routes

There are two default routes (default via 192.168.1.1). They differ as follows:

  • First: metric 202, with source address src 192.168.1.58
  • Second: metric 20100, no source address

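This duplicate configuration can cause routing conflicts. For routes to the same destination the kernel uses the one with the lowest metric, so the usual fix is to keep only the valid route with the smaller metric.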
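To see at a glance which default route wins, the routing table can be filtered and sorted by metric. This is a minimal sketch that was not part of the original troubleshooting: the function name list_default_routes is made up for illustration, and it only parses the IPv4 text format printed by ip route.

```shell
# Hypothetical helper: read `ip route` output on stdin and print
# "<metric> <gateway> <src>" for every default route, lowest metric first
# (the route the kernel prefers). `ip route` omits "metric 0", so a route
# without an explicit metric is reported as 0; missing fields print as "-".
list_default_routes() {
    awk '$1 == "default" {
        metric = 0; gw = "-"; src = "-"
        for (i = 2; i <= NF; i++) {
            if ($i == "metric") metric = $(i + 1)
            if ($i == "via")    gw = $(i + 1)
            if ($i == "src")    src = $(i + 1)
        }
        print metric, gw, src
    }' | sort -n -k1,1
}

# Usage on a live system:
#   ip route | list_default_routes
```

More than one line of output means the machine has more than one default route, as in the table above.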

Signs of an IP address conflict

The routing table contains two different source addresses:

192.168.1.24 (proto kernel)
192.168.1.58 (proto dhcp)

Listing the addresses confirms that eth0 has both:

ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 40:62:31:31:16:35 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.24/24 brd 192.168.1.255 scope global dynamic noprefixroute eth0
       valid_lft 86354sec preferred_lft 86354sec
    inet 192.168.1.58/24 brd 192.168.1.255 scope global secondary noprefixroute eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::40cc:4ee3:f3a2:db1b/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
3: eth1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 40:62:31:31:16:36 brd ff:ff:ff:ff:ff:ff
4: eth2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 40:62:31:31:16:37 brd ff:ff:ff:ff:ff:ff
5: eth3: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 40:62:31:31:16:38 brd ff:ff:ff:ff:ff:ff
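The same symptom can be spotted on other machines by counting IPv4 addresses per interface. A minimal sketch, assuming the one-line-per-address format of `ip -o -4 addr show`; find_multi_ipv4 is a hypothetical helper name.

```shell
# Hypothetical helper: read `ip -o -4 addr show` output on stdin and print
# every interface that carries more than one IPv4 address, e.g.
# "eth0: 192.168.1.24/24 192.168.1.58/24". With -o each address is one line:
# "<idx>: <ifname> inet <addr>/<prefixlen> ...".
find_multi_ipv4() {
    awk '$3 == "inet" { count[$2]++; addrs[$2] = addrs[$2] " " $4 }
         END { for (dev in count) if (count[dev] > 1) print dev ":" addrs[dev] }'
}

# Usage on a live system:
#   ip -o -4 addr show | find_multi_ipv4
```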

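Since this is a desktop system, I first suspected a conflict between NetworkManager and systemd-networkd, but further troubleshooting showed that this was not the cause: dhcpcd and dhclient were both running at the same time.

Disabling dhcpcd solved the problem: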

update-rc.d dhcpcd disable
service dhcpcd stop
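Before disabling anything, it is worth confirming that two DHCP clients really are running side by side. The sketch below assumes the process names come from `ps -eo comm=` and only knows about the two clients named in this post; check_dhcp_clients is a hypothetical name. On a systemd-based install like this one, `systemctl disable --now dhcpcd` should be equivalent to the two commands above, assuming the unit is called dhcpcd.service.

```shell
# Hypothetical helper: read process names on stdin (one per line) and report
# whether more than one of the DHCP clients dhcpcd / dhclient is running.
# The client list is an assumption; extend the pattern for other clients.
# Returns 1 on conflict, 0 otherwise.
check_dhcp_clients() {
    # exact name matches only, de-duplicated, joined with spaces
    found=$(grep -xE 'dhcpcd|dhclient' | sort -u | paste -sd ' ' -)
    case "$found" in
        *" "*) echo "conflict: $found"; return 1 ;;
        "")    echo "ok: none" ;;
        *)     echo "ok: $found" ;;
    esac
}

# Usage on a live system:
#   ps -eo comm= | check_dhcp_clients
```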

NetworkManager

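In general, only one of NetworkManager and systemd-networkd should be used.

So on desktop Ubuntu, once NetworkManager is in use, it is best to shut down the systemd-networkd service so the two network-management services do not conflict.

Use networkctl list to see whether systemd-networkd is managing any interfaces. Output like the following, with every link shown as unmanaged, means it is not managing any of them.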

root@localhost:~# networkctl list
IDX LINK TYPE     OPERATIONAL SETUP
  1 lo   loopback n/a         unmanaged
  2 eth0 ether    n/a         unmanaged
  3 eth1 ether    n/a         unmanaged
  4 eth2 ether    n/a         unmanaged
  5 eth3 ether    n/a         unmanaged
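This check can also be scripted. The sketch below reads `networkctl list` output and prints the links whose SETUP column is anything other than `unmanaged`; it assumes the column layout shown above (SETUP is the last column), and networkd_managed_links is a made-up name.

```shell
# Hypothetical helper: read `networkctl list` output on stdin and print the
# links systemd-networkd is actually configuring (SETUP != "unmanaged").
# Skips the header (first line), blank lines and the "N links listed." footer
# (fewer than 5 fields). No output means networkd manages no interface.
networkd_managed_links() {
    awk 'NR > 1 && NF >= 5 && $NF != "unmanaged" { print $2 }'
}

# Usage on a live system:
#   networkctl list | networkd_managed_links
```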

You can also check whether the services are running with systemctl status:

root@localhost:~# systemctl status NetworkManager systemd-networkd
● NetworkManager.service - Network Manager
     Loaded: loaded (/lib/systemd/system/NetworkManager.service; enabled; vendor preset: enabled)
     Active: active (running) since Thu 2025-05-22 21:17:11 CDT; 2min 19s ago
       Docs: man:NetworkManager(8)
   Main PID: 175204 (NetworkManager)
        CPU: 1.355s
     CGroup: /system.slice/NetworkManager.service
             └─175204 /usr/sbin/NetworkManager --no-daemon

May 22 21:17:11 localhost.localdomain NetworkManager[175204]: <info>  [1747966631.3388] device (eth0): state change: ip-check -> secondaries (reason 'none', sys-iface-state: 'managed')
May 22 21:17:11 localhost.localdomain NetworkManager[175204]: <info>  [1747966631.3392] policy: set-hostname: set hostname to 'localhost.localdomain' (no hostname found)
May 22 21:17:11 localhost.localdomain NetworkManager[175204]: <info>  [1747966631.3393] device (eth0): state change: secondaries -> activated (reason 'none', sys-iface-state: 'managed')
May 22 21:17:11 localhost.localdomain NetworkManager[175204]: <info>  [1747966631.3396] manager: NetworkManager state is now CONNECTED_LOCAL
May 22 21:17:11 localhost.localdomain NetworkManager[175204]: <info>  [1747966631.3400] manager: NetworkManager state is now CONNECTED_SITE
May 22 21:17:11 localhost.localdomain NetworkManager[175204]: <info>  [1747966631.3401] policy: set 'netplan-eth0' (eth0) as default for IPv4 routing and DNS
May 22 21:17:11 localhost.localdomain NetworkManager[175204]: <info>  [1747966631.3408] device (eth0): Activation: successful, device activated.
May 22 21:17:17 localhost.localdomain NetworkManager[175204]: <info>  [1747966637.2713] manager: startup complete
May 22 21:17:17 localhost.localdomain NetworkManager[175204]: <info>  [1747966637.8349] manager: NetworkManager state is now CONNECTED_GLOBAL
May 22 21:17:18 localhost.localdomain NetworkManager[175204]: <info>  [1747966638.4699] policy: set-hostname: set hostname to 'localhost.localdomain' (no hostname found)

● systemd-networkd.service - Network Configuration
     Loaded: loaded (/lib/systemd/system/systemd-networkd.service; disabled; vendor preset: enabled)
     Active: active (running) since Thu 2025-05-22 21:16:04 CDT; 3min 26s ago
TriggeredBy: ● systemd-networkd.socket
       Docs: man:systemd-networkd.service(8)
   Main PID: 174807 (systemd-network)
     Status: "Processing requests..."
        CPU: 68ms
     CGroup: /system.slice/systemd-networkd.service
             └─174807 /lib/systemd/systemd-networkd

May 22 21:16:04 localhost.localdomain systemd-networkd[174807]: eth2: Link UP
May 22 21:16:04 localhost.localdomain systemd-networkd[174807]: eth1: Link UP
May 22 21:16:04 localhost.localdomain systemd-networkd[174807]: eth0: Link UP
May 22 21:16:04 localhost.localdomain systemd-networkd[174807]: eth0: Gained carrier
May 22 21:16:04 localhost.localdomain systemd-networkd[174807]: lo: Link UP
May 22 21:16:04 localhost.localdomain systemd-networkd[174807]: lo: Gained carrier
May 22 21:16:04 localhost.localdomain systemd-networkd[174807]: eth0: Gained IPv6LL
May 22 21:16:04 localhost.localdomain systemd-networkd[174807]: Enumeration completed
May 22 21:16:04 localhost.localdomain systemd[1]: Started Network Configuration.
May 22 21:17:12 localhost.localdomain systemd-networkd[174807]: eth0: Gained IPv6LL
root@localhost:~# systemctl stop systemd-networkd
Warning: Stopping systemd-networkd.service, but it can still be activated by:
  systemd-networkd.socket

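Shutting down systemd-networkd: as the warning shows, stopping the service alone is not enough, because systemd-networkd.socket can start it again. That is also why the status output above shows it active (running) even though the service is disabled (TriggeredBy: systemd-networkd.socket). Stop and disable both units:

systemctl stop systemd-networkd
systemctl stop systemd-networkd.socket
systemctl disable systemd-networkd
systemctl disable systemd-networkd.socket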
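To confirm the result, both units can be queried with `systemctl is-active` and `systemctl is-enabled`. A minimal sketch: check_networkd_off is a hypothetical helper, and accepting `failed` as "not running" is a choice made here, not something from the original post.

```shell
# Hypothetical helper: print the state of systemd-networkd and its socket and
# return 0 only if both are not running and both are disabled (or masked).
# is-active / is-enabled exit non-zero for inactive / disabled units, which is
# the expected case here, hence the "|| true".
check_networkd_off() {
    rc=0
    for unit in systemd-networkd.service systemd-networkd.socket; do
        active=$(systemctl is-active "$unit" || true)
        enabled=$(systemctl is-enabled "$unit" || true)
        echo "$unit: $active, $enabled"
        case "$active" in inactive|failed) ;; *) rc=1 ;; esac
        case "$enabled" in disabled|masked) ;; *) rc=1 ;; esac
    done
    return "$rc"
}

# Usage on a live system:
#   check_networkd_off && echo "systemd-networkd is fully off"
```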

nmcli

nmcli is the command-line client for managing the NetworkManager service.
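For scripting, nmcli has a terse mode (`-t`, colon-separated output) and a field selector (`-f`); `nmcli device status` and `nmcli connection show` give the human-readable views. The sketch assumes the output format of `nmcli -t -f DEVICE,STATE device`; nm_connected_devices is a hypothetical name.

```shell
# Hypothetical helper: read `nmcli -t -f DEVICE,STATE device` output on stdin
# ("<device>:<state>" per line) and print the devices NetworkManager reports
# as connected. Matching the "connected" prefix also covers states such as
# "connected (externally)"; "disconnected" does not match.
nm_connected_devices() {
    awk -F: '$2 ~ /^connected/ { print $1 }'
}

# Usage on a live system:
#   nmcli -t -f DEVICE,STATE device | nm_connected_devices
```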

netplan

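Since 18.04, Ubuntu no longer configures NIC addresses through /etc/network/interfaces. Interface configuration is handled by netplan instead, which generates the configuration for a backend (NetworkManager or systemd-networkd, selected with renderer).

For example, edit the /etc/netplan/01-network-manager-all.yaml configuration file: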

network:
  version: 2
  renderer: NetworkManager
  #renderer: networkd
  ethernets:
    eth0:
      dhcp4: true
      #addresses: [192.168.100.160/24]
      #gateway4: 192.168.100.1
      #nameservers:
      #  addresses: [114.114.114.114, 8.8.8.8]
    eth1:
      dhcp4: true
      #addresses: [192.168.100.161/24]
      #gateway4: 192.168.100.1
      #nameservers:
      #  addresses: [114.114.114.114, 8.8.8.8]
    eth2:
      dhcp4: true
      #addresses: [192.168.100.162/24]
      #gateway4: 192.168.100.1
      #nameservers:
      #  addresses: [114.114.114.114, 8.8.8.8]
    eth3:
      dhcp4: false
      addresses: [192.168.100.163/24]
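      routes:
        # host route to the gateway only, so eth3 gets no default route;
        # a default gateway would be written as "- to: default"
        - to: 192.168.100.1
          via: 192.168.100.1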
      nameservers:
        addresses: [114.114.114.114, 8.8.8.8]

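After editing, run netplan apply (or netplan --debug apply for verbose output) to make the configuration take effect; netplan generate can be used to check that the configuration file is valid.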
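To avoid applying a broken file, generation can serve as a gate before apply. A minimal sketch: apply_netplan and the NETPLAN_DEBUG switch are made up for this example, and the commands need root. `netplan try` is another option: it applies the configuration and rolls it back automatically unless you confirm it within a timeout.

```shell
# Hypothetical helper: run `netplan generate` first and only run
# `netplan apply` if generation succeeded, so a typo in the YAML is not
# applied. Set NETPLAN_DEBUG=1 (a variable invented for this sketch) to pass
# --debug to both commands. Must be run as root.
apply_netplan() {
    dbg=""
    [ "${NETPLAN_DEBUG:-0}" = "1" ] && dbg="--debug"
    # $dbg is intentionally unquoted so that an empty value adds no argument
    netplan $dbg generate || { echo "netplan generate failed; not applying" >&2; return 1; }
    netplan $dbg apply
}

# Usage:
#   sudo sh -c '. ./apply_netplan.sh && apply_netplan'
```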

Use netplan status to check the resulting state:

root@localhost:~# netplan status
     Online state: offline
    DNS Addresses: 127.0.0.53 (stub)
       DNS Search: .

●  1: lo ethernet UNKNOWN/UP (unmanaged)
      MAC Address: 00:00:00:00:00:00
        Addresses: 127.0.0.1/8
                   ::1/128
           Routes: ::1 metric 256

●  2: eth0 ethernet UP (NetworkManager: eth0)
      MAC Address: 40:62:31:31:16:35
        Addresses: 192.168.1.24/24
                   fe80::4262:31ff:fe31:1635/64 (link)
           Routes: default via 192.168.1.1 metric 100 (dhcp)
                   169.254.0.0/16 metric 1000 (boot, link)
                   192.168.1.0/24 from 192.168.1.24 metric 100 (link)
                   fe80::/64 metric 256
                   ff00::/8 metric 256 (multicast)

3 inactive interfaces hidden. Use "--all" to show all.
