I assembled my desktop PC about 2 years ago. It’s fairly beefy (AMD Ryzen 9 3950X 16-core processor, 128 GB RAM, NVIDIA RTX 3080 Ti). It’s running Debian stable.

Once in a while (not that often, roughly every 2 weeks), seemingly at random and not especially under heavy load, the system crashes and freezes, and does not even respond to the Linux magic SysRq keys. I have never managed to find the cause. One interesting fact is that when it happens, it also seems to “freeze my network”, i.e., other (Ethernet) devices on my local network lose connectivity. They’re all connected to the same router, but not through this crashing PC. Connectivity comes back as soon as I force a shutdown of the crashing PC.

What can cause this and how could I fix these freezes?

  • Test_Tickles@lemmynsfw.com
    10 months ago

    Your network card is hammering the network with packets and not taking a break. It doesn’t give the rest of the network a chance to talk.
    If this were a Windows machine, I would start by reinstalling the network driver, but I don’t know Linux well enough to say.
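
One way to test the “hammering the network” idea is to record the interface's packet counters to disk continuously, so that the last lines written before a freeze show whether the transmit rate spiked. Below is a minimal Python sketch, assuming the usual Linux sysfs layout under /sys/class/net. The function names, the log file name and the interface name enp5s0 are placeholders, not anything taken from this thread, and the log only covers the time up to the freeze.

```python
import os
import time
from pathlib import Path

SYSFS_NET = Path("/sys/class/net")  # standard Linux sysfs location


def read_counters(iface, base=SYSFS_NET):
    """Return (rx_packets, tx_packets) for one interface."""
    stats = Path(base) / iface / "statistics"
    rx = int((stats / "rx_packets").read_text())
    tx = int((stats / "tx_packets").read_text())
    return rx, tx


def packets_per_second(prev, cur, seconds):
    """Per-second (rx, tx) rates between two counter samples."""
    return (cur[0] - prev[0]) / seconds, (cur[1] - prev[1]) / seconds


def log_rates(iface, logfile, interval=1.0, samples=None, base=SYSFS_NET):
    """Append one line per interval; fsync so lines survive a hard freeze."""
    prev = read_counters(iface, base)
    taken = 0
    with open(logfile, "a") as out:
        while samples is None or taken < samples:
            time.sleep(interval)
            cur = read_counters(iface, base)
            rx, tx = packets_per_second(prev, cur, interval)
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            out.write(f"{stamp} {iface} rx={rx:.0f}/s tx={tx:.0f}/s\n")
            out.flush()
            os.fsync(out.fileno())  # force the line onto disk right away
            prev = cur
            taken += 1


# Hypothetical usage (runs until interrupted; find the real name with `ip link`):
# log_rates("enp5s0", "nic-rates.log")
```

After the next freeze and reboot, the tail of the log file would show the packet rates seen in the seconds before the machine stopped responding.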

    • nicocool84@sh.itjust.worksOP
      10 months ago

      Oh, this would explain why it kills the connectivity of all Ethernet-connected devices. The Ethernet interface is the one on the mobo. The drivers are included in the Linux kernel, AFAIK. The problem persisted across 2 Debian versions, so I am not sure reinstalling drivers would do anything here. But thanks for the plausible explanation of the network issue!

      • d3Xt3r@lemmy.nz
        10 months ago

        You won’t have much luck doing anything to the driver part of it, but you could try a custom kernel. There are two advantages to that: one is that it would be more recent than whatever kernel Debian is using, and the second is the optimized networking stack, which speeds up packet processing and improves the congestion handling algorithm. I’d recommend the Xanmod kernel for this: https://xanmod.org/

        Alternatively, if we suspect your network card is the culprit, then the solution could be as simple as buying a new card and disabling the built-in one.

        • nicocool84@sh.itjust.worksOP
          10 months ago

          I like my debian vanilla but thanks for the suggestion. The other network card would be interesting to try out. I don’t really suspect the network card, since I have no idea whether the network block is a consequence or a cause here.

  • longshanks197@kbin.social
    10 months ago

    I can only offer some additional troubleshooting steps.

    1. Your network connection is fairly simple, so I would suggest taking NM (NetworkManager) out of the equation and setting up your network device manually to see if that eliminates the issue. This goes back to the comment (@despotic_machine) and the log listing the p2p and wireless interfaces. It seems NM may be trying to set up your wifi interfaces. Though looking at the log you provided, NM sees the wireless interface, identifies that it is not connected, and sets it to inactive, so there may not be an issue. I had issues with NM many years ago on a laptop and preferred wicd; however, it seems that development on wicd has stalled. Regardless, I do not run NetworkManager at all on my desktop (just isc-dhcp-client and an entry in /etc/network/interfaces), since it is not roaming (it is plugged into a switch). It seems you don’t even need to uninstall anything: just set up the network manually and NM should leave the interface alone. If you want it to be clean, make sure NM is not running, or purge it from the system and set up your networking manually. The assumption about manual setup is based on the Debian wiki:

    https://wiki.debian.org/NetworkManager#Wired_Networks_are_Unmanaged

    NOTE: Unless you know networking, this is probably going to take you down a networking rabbit hole, so glhf.

    Some Debian references regarding networking and different configurations:
    https://www.debian.org/doc/manuals/debian-reference/ch05.en.html
    https://www.debian.org/doc/manuals/debian-handbook/sect.network-config

    2. If you want to stick with NM, it seems you can change the logging level to see if you get more details. I would check the man page or documentation for NM for debugging instructions. I would expect that you can disable interfaces in NM to reduce the likelihood of some fringe case that is plaguing your setup. Since I don’t run NM, I can’t provide any detailed suggestions.

    3. More of a question: has the switch or router also been the same device for the last 2 years? Is it possible that the network device is misbehaving and causing the desktop to lock up? This would feed into @0v0’s request to run wireshark/tcpdump from a laptop or other device connected to the router/switch to see what’s going on traffic-wise.

  • 0v0@sopuli.xyz
    10 months ago

    Have you tried running tcpdump / wireshark on another device in the network when this happens?
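
As a rough illustration of what to do with such a capture: on another wired device, something like `sudo tcpdump -i <iface> -e -n > capture.txt` prints one line per frame, including the source and destination MAC addresses (`-e`) and without name resolution (`-n`). The Python sketch below counts frames per source MAC so that a device flooding the LAN stands out. It assumes tcpdump's usual Ethernet text output, and `top_talkers` and `capture.txt` are hypothetical names.

```python
import re
from collections import Counter

# Matches the link-level header printed by `tcpdump -e`, e.g.
# "23:38:05.000001 aa:bb:cc:dd:ee:01 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), ..."
FRAME = re.compile(
    r"^\S+ (?P<src>(?:[0-9a-f]{2}:){5}[0-9a-f]{2})"
    r" > (?P<dst>(?:[0-9a-f]{2}:){5}[0-9a-f]{2}),",
    re.IGNORECASE,
)


def top_talkers(lines, limit=5):
    """Return the `limit` most frequent source MACs as (mac, frame_count)."""
    counts = Counter()
    for line in lines:
        match = FRAME.match(line)
        if match:  # skip continuation lines and anything non-Ethernet
            counts[match.group("src").lower()] += 1
    return counts.most_common(limit)


# Hypothetical usage on a saved text capture:
# with open("capture.txt") as fh:
#     for mac, frames in top_talkers(fh):
#         print(mac, frames)
```

If the frozen PC's MAC address dominates the list during a freeze, that would point at its network interface as the source of the traffic. If it is absent, the cause of the outage lies elsewhere.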

    • nicocool84@sh.itjust.worksOP
      10 months ago

      Nope, I don’t know the first thing about these tools, but now I’m kind of impatient and hope that the next freeze happens soon so I can try. :-)

    • nicocool84@sh.itjust.worksOP
      10 months ago

      No, good old SSID+passphrase. But this PC is connected via ethernet (although the mobo does have a wifi chip, that I don’t use).

  • d3Xt3r@lemmy.nz
    10 months ago

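    Check your system logs such as dmesg and journalctl right after the freeze (if it’s still occurring). Once you have rebooted, you could filter the journal to show only the high-priority messages (emerg through err) that the previous boot logged in the last 5 minutes. Note that --since is relative to the current time, so run this within 5 minutes of the freeze or widen the window: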

    journalctl --boot=-1 --since="5 min ago" --priority=0..3
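
Because `--since` counts back from the current time rather than from the end of the previous boot, a window anchored on the last journal entry can be easier to work with. Below is a minimal Python sketch, assuming the previous boot's journal has been exported with `journalctl --boot=-1 --output=json` (one JSON object per line, with the standard `__REALTIME_TIMESTAMP` and `PRIORITY` fields). The helper name `tail_of_boot` and the file name are hypothetical.

```python
import json


def tail_of_boot(json_lines, seconds=300, max_priority=3):
    """Entries from the final `seconds` of a boot with priority <= max_priority.

    The window is measured back from the last entry of any priority,
    which approximates the moment the machine froze.
    """
    entries = [json.loads(line) for line in json_lines if line.strip()]
    if not entries:
        return []
    # __REALTIME_TIMESTAMP is microseconds since the epoch, stored as a string
    last = max(int(e["__REALTIME_TIMESTAMP"]) for e in entries)
    cutoff = last - seconds * 1_000_000
    return [
        e for e in entries
        if int(e["__REALTIME_TIMESTAMP"]) >= cutoff
        # entries without a PRIORITY field are treated as "info" (6)
        and int(e.get("PRIORITY", 6)) <= max_priority
    ]


# Hypothetical usage, after `journalctl --boot=-1 --output=json > prev-boot.json`:
# with open("prev-boot.json") as fh:
#     for entry in tail_of_boot(fh):
#         print(entry.get("SYSLOG_IDENTIFIER"), entry.get("MESSAGE"))
```

An empty result is informative too: it suggests the kernel stopped before it could log any error, which fits a hard freeze.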

    • nicocool84@sh.itjust.worksOP
      10 months ago

      It happened yesterday, and here are the latest log lines before the freeze:

      Sep 14 23:30:30 licorne NetworkManager[1291]:   [1694727030.1207] device (wlp4s0): set-hw-addr: set MAC address to CA:D0:86:5F:F9:85 (scanning)
      Sep 14 23:30:30 licorne NetworkManager[1291]:   [1694727030.1478] device (wlp4s0): supplicant interface state: inactive -> disconnected
      Sep 14 23:30:30 licorne NetworkManager[1291]:   [1694727030.1478] device (p2p-dev-wlp4s0): supplicant management interface state: inactive -> disconnected
      Sep 14 23:30:30 licorne NetworkManager[1291]:   [1694727030.1530] device (wlp4s0): supplicant interface state: disconnected -> inactive
      Sep 14 23:30:30 licorne NetworkManager[1291]:   [1694727030.1530] device (p2p-dev-wlp4s0): supplicant management interface state: disconnected -> inactive
      Sep 14 23:30:58 licorne syncthing[3169286]: [VY2L4] INFO: Established secure connection to REDACTED1 at [::]:22000-192.168.0.14:22000/quic-client/TLS1.3-TLS_CHACHA20_POLY1305_SHA256/LAN-P20
      Sep 14 23:30:58 licorne syncthing[3169286]: [VY2L4] INFO: Device REDACTED1 client is "syncthing v1.23.4" named "REDACTED2.lan" at [::]:22000-192.168.0.14:22000/quic-client/TLS1.3-TLS_CHACHA20_POLY1305_SHA256/LAN-P20
      Sep 14 23:31:03 licorne rtkit-daemon[1541]: Supervising 4 threads of 4 processes of 1 users.
      Sep 14 23:31:03 licorne rtkit-daemon[1541]: Supervising 4 threads of 4 processes of 1 users.
      Sep 14 23:31:11 licorne syncthing[3169286]: [VY2L4] INFO: Established secure connection to REDACTED1 at 192.168.0.98:22000-192.168.0.14:22000/tcp-client/TLS1.3-TLS_AES_128_GCM_SHA256/LAN-P10
      Sep 14 23:31:11 licorne syncthing[3169286]: [VY2L4] INFO: Replacing old connection [::]:22000-192.168.0.14:22000/quic-client/TLS1.3-TLS_CHACHA20_POLY1305_SHA256/LAN-P20 with 192.168.0.98:22000-192.168.0.14:22000/tcp-client/TLS1.3-TLS_AES_128_GCM_SHA256/LAN-P10 for REDACTED1
      Sep 14 23:31:11 licorne syncthing[3169286]: [VY2L4] INFO: Connection to REDACTED1 at [::]:22000-192.168.0.14:22000/quic-client/TLS1.3-TLS_CHACHA20_POLY1305_SHA256/LAN-P20 closed: replacing connection
      Sep 14 23:31:11 licorne syncthing[3169286]: [VY2L4] INFO: Device REDACTED1 client is "syncthing v1.23.4" named "REDACTED2.lan" at 192.168.0.98:22000-192.168.0.14:22000/tcp-client/TLS1.3-TLS_AES_128_GCM_SHA256/LAN-P10
      Sep 14 23:32:03 licorne rtkit-daemon[1541]: Supervising 4 threads of 4 processes of 1 users.
      Sep 14 23:32:03 licorne rtkit-daemon[1541]: Supervising 4 threads of 4 processes of 1 users.
      Sep 14 23:33:03 licorne rtkit-daemon[1541]: Supervising 4 threads of 4 processes of 1 users.
      Sep 14 23:33:03 licorne rtkit-daemon[1541]: Supervising 4 threads of 4 processes of 1 users.
      Sep 14 23:33:28 licorne systemd[1]: Started anacron.service - Run anacron jobs.
      Sep 14 23:33:28 licorne anacron[4171587]: Anacron 2.3 started on 2023-09-14
      Sep 14 23:33:28 licorne anacron[4171587]: Normal exit (0 jobs run)
      Sep 14 23:33:28 licorne systemd[1]: anacron.service: Deactivated successfully.
      Sep 14 23:34:03 licorne rtkit-daemon[1541]: Supervising 4 threads of 4 processes of 1 users.
      Sep 14 23:34:03 licorne rtkit-daemon[1541]: Supervising 4 threads of 4 processes of 1 users.
      Sep 14 23:35:03 licorne rtkit-daemon[1541]: Supervising 4 threads of 4 processes of 1 users.
      Sep 14 23:35:03 licorne rtkit-daemon[1541]: Supervising 4 threads of 4 processes of 1 users.
      Sep 14 23:36:03 licorne rtkit-daemon[1541]: Supervising 4 threads of 4 processes of 1 users.
      Sep 14 23:36:03 licorne rtkit-daemon[1541]: Supervising 4 threads of 4 processes of 1 users.
      Sep 14 23:37:04 licorne rtkit-daemon[1541]: Supervising 4 threads of 4 processes of 1 users.
      Sep 14 23:37:04 licorne rtkit-daemon[1541]: Supervising 4 threads of 4 processes of 1 users.
      Sep 14 23:37:25 licorne NetworkManager[1291]:   [1694727445.1045] device (wlp4s0): set-hw-addr: set MAC address to EE:65:E2:6E:73:D1 (scanning)
      Sep 14 23:38:03 licorne rtkit-daemon[1541]: Supervising 4 threads of 4 processes of 1 users.
      Sep 14 23:38:03 licorne rtkit-daemon[1541]: Supervising 4 threads of 4 processes of 1 users.
      
      
  • Longpork_afficianado@lemmy.nz
    10 months ago

    Is it possible that the freeze you’re seeing on that machine is actually caused by a network failure, rather than the other way around?

    Many times I have encountered what appears to be a system freeze but is actually the result of background processes trying to access a network resource that no longer exists (e.g., a disk mounted over a VPN connection after the VPN has dropped out).
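
A quick way to see whether this scenario can apply at all is to check for mounted network filesystems. Below is a minimal Python sketch that scans text in the /proc/mounts format (device, mount point, filesystem type, options). The set of filesystem types is an assumption and not exhaustive, and `network_mounts` is a hypothetical helper name.

```python
# Filesystem types treated as "network" here; an illustrative, incomplete list.
NETWORK_FS = {"nfs", "nfs4", "cifs", "smb3", "fuse.sshfs"}


def network_mounts(mounts_text):
    """Return (mount_point, fs_type, source) for each network filesystem."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts columns: source, mount point, fs type, options, ...
        if len(fields) >= 3 and fields[2] in NETWORK_FS:
            found.append((fields[1], fields[2], fields[0]))
    return found


# Hypothetical usage on the running system:
# with open("/proc/mounts") as fh:
#     print(network_mounts(fh.read()))
```

An empty list would make a hung network mount an unlikely explanation for the freeze.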

      • sin_free_for_00_days@sopuli.xyz
        10 months ago

        I had a similar issue which seemed to pop up anytime from an hour to a couple days after I used flameshot. It took me a long time to figure out what was triggering it. I stopped using flameshot and the freezes stopped. I’ve mentioned this a couple other times to people who ended up having the same problems and fix. But if you aren’t using it, I don’t have anything else to suggest.

  • 🧟‍♂️ Cadaver@lemmy.one
    10 months ago

    Uninstall (I don’t know how, on debian) NetworkManager and reinstall it (better get a .deb)

    Then sudo systemctl enable NetworkManager.service

    Reboot and hope for the best.

    • nicocool84@sh.itjust.worksOP
      10 months ago

      This has been happening for 2 years, with the previous debian version too, so I doubt this would do anything?

      • 🧟‍♂️ Cadaver@lemmy.one
        10 months ago

        Have you been updating or reinstalling?

        Because if it’s been update upon update, it could come from that. In that case, maybe reinstall?

        • nicocool84@sh.itjust.worksOP
          10 months ago

          Updating. I’m willing to try your solution but I am a little bit worried about not being able to reinstall anything after I sudo apt remove network-manager. Why would a package reinstallation help? Wouldn’t resetting the config files be more efficient btw?

          EDIT: It’s not update upon update; there was just bullseye (first testing, then stable), and then I recently moved to bookworm. But the problem has been there from the start. It’s not too much of a pain because it’s rare, but it still bugs me.

          • 🧟‍♂️ Cadaver@lemmy.one
            10 months ago

            Thing is, I really haven’t used debian based distros for the better part of the last two years so I’m not sure how to reinstall it if something goes south. With arch you just have to do a pacstrap with a liveUSB.

            So… it seems kinda dangerous if you don’t have a backup .deb. I’m not sure I would advise you to go this way.

            I looked at your journalctl output. The error might come from your wireless card. If that is the case, and since you don’t use it at all, there is a simple trick: sudo systemctl disable wpa_supplicant, then reboot.

            It won’t have any effect on the Ethernet connection but will more or less disable your wifi card. (Not exactly, but you get the gist of it.)

            If I’m right it should make all of your problems go away. It might be worth a try. And if it doesn’t work a simple sudo systemctl enable wpa_supplicant will reverse it back to the way it was.

            It’s still a pain, even if it doesn’t happen daily.

      • CameronDev@programming.dev
        10 months ago

        If it’s been happening for multiple years and OSes, maybe your network card is dead/dying? Buy a new network card and see if that helps?

        • nicocool84@sh.itjust.worksOP
          10 months ago

          Everything is 2 yo, so this would mean the mobo (well, the onboard ethernet thing) was malfunctioning from the start. Maybe!

          I might try disabling and using the onboard wifi chip temporarily instead, just to see if I notice a new freeze. The issue is, I’ve never understood what triggers it, and it’s quite rare (less than once a week), so it’s really annoying to debug…

  • seasick@lemmy.world
    10 months ago

    I have no idea, but it seems like an interesting problem. Good luck finding a solution. (Just commenting to get notified if someone has a solution.)