Hi all. I’ve been having some problems keeping fedia.io running. At the moment, either the message workers or the PHP web server processes die after an hour or so, and I have to restart everything. I have been working with the Mbin team and installed some updates that we hoped would fix the problems, but no luck. I am going to work on a cron job to automatically restart things once an hour. The downside is that you’ll likely see some 500 errors if you happen to hit the site while the processes are restarting, but the restart should be quick, and refreshing the page should make it work again.

  • jerry@fedia.io OP M
    arrow-up
    8
    ·
    3 months ago

    OK - I took a slightly different approach. Since I know which error in RabbitMQ’s log file is associated with things coming to a stop on fedia.io, I installed swatchdog and set it up to look for that word (which is, btw, “timeout”). I created a script that stops all the messengers, then stops php-fpm, keydb, and rabbitmq. Then it starts rabbitmq, keydb, and php-fpm, in that order. Finally, it restarts the messengers.

    I will be surprised if it works the first time, so it may still crash again, but I’ll be watching.
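
For anyone curious what that watchdog-plus-restart flow might look like, here is a minimal sketch in Python. It is not the actual script running on fedia.io: the unit names (`mbin-messenger.target`, `rabbitmq-server`, `keydb-server`, `php-fpm`) and the use of `systemctl` are assumptions for illustration, and swatchdog itself is replaced by a simple substring check on a log line. Only the keyword “timeout” and the stop/start order come from the comment above.

```python
import subprocess

# Hypothetical unit names; the real ones on fedia.io are not given in the thread.
MESSENGERS = "mbin-messenger.target"
BACKENDS = ["rabbitmq-server", "keydb-server", "php-fpm"]  # listed in start order


def should_restart(log_line, keyword="timeout"):
    """Return True when a RabbitMQ log line contains the trigger word."""
    return keyword in log_line


def restart_plan(messengers=MESSENGERS, backends=BACKENDS):
    """Build the ordered list of commands described in the comment above."""
    # 1. Stop the messengers first so nothing is consuming from the queue.
    plan = [["systemctl", "stop", messengers]]
    # 2. Stop php-fpm, keydb, rabbitmq (the reverse of the start order).
    plan += [["systemctl", "stop", unit] for unit in reversed(backends)]
    # 3. Start rabbitmq, keydb, php-fpm, in that order.
    plan += [["systemctl", "start", unit] for unit in backends]
    # 4. Finally bring the messengers back.
    plan.append(["systemctl", "start", messengers])
    return plan


def run(plan, dry_run=True):
    """Print the commands, or execute them when dry_run is False."""
    for cmd in plan:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    sample = "connection closed: timeout waiting for heartbeat"  # made-up log line
    if should_restart(sample):
        run(restart_plan())  # dry run: only prints the commands
```

Keeping the plan as data makes it easy to dry-run the order before letting it touch real services.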

  • tiredofsametab@fedia.io
    arrow-up
    2
    ·
    2 months ago

    I’ve noticed recently that I’m getting errors trying to vote on any posts in a discussion I’ve had open for more than maybe a minute (I haven’t actually timed it). I don’t remember it from before these issues, but I also switched to this instance just before. Might it be related?

      • tiredofsametab@fedia.io
        arrow-up
        2
        ·
        2 months ago

        This always fails for me: https://fedia.io/ecf/7236913?choice=1

        Normally, if I refresh a page once and immediately vote, it works. In this case, it has never worked.

        This happens periodically, and it does not seem to be specific to any instance (I’ve seen it across posts from several instances, whether that is the OP’s instance or the commenter’s).

        My gut says it may be a timezone issue somewhere, with my offset (UTC+9) far enough out to trigger it. I have no evidence for that. Looking at the request and response in dev tools hasn’t yielded anything particularly useful, so far as I can tell.

        • jerry@fedia.io OP M
          arrow-up
          4
          ·
          2 months ago

          I moved fedia.io away from Fastly. I have a nagging feeling the problem has something to do with Fastly. Can you let me know if you continue to see this?

          • melroy@kbin.melroy.org
            arrow-up
            2
            ·
            2 months ago

            I found:

            [2024-09-12T20:42:54.414611+02:00] request.ERROR: Uncaught PHP Exception Symfony\Component\HttpKernel\Exception\BadRequestHttpException: "Invalid CSRF token" at AbstractController.php line 39 {"exception":"[object] (Symfony\\Component\\HttpKernel\\Exception\\BadRequestHttpException(code: 0): Invalid CSRF token at /var/www/kbin.melroy.org/html/src/Controller/AbstractController.php:39)
            [stacktrace]
            #0 /var/www/kbin.melroy.org/html/src/Controller/FavouriteController.php(24): App\\Controller\\AbstractController->validateCsrf()
            #1 /var/www/kbin.melroy.org/html/vendor/symfony/http-kernel/HttpKernel.php(183): App\\Controller\\FavouriteController->__invoke()
            #2 /var/www/kbin.melroy.org/html/vendor/symfony/http-kernel/HttpKernel.php(76): Symfony\\Component\\HttpKernel\\HttpKernel->handleRaw()
            #3 /var/www/kbin.melroy.org/html/vendor/symfony/http-kernel/Kernel.php(182): Symfony\\Component\\HttpKernel\\HttpKernel->handle()
            #4 /var/www/kbin.melroy.org/html/vendor/symfony/runtime/Runner/Symfony/HttpKernelRunner.php(35): Symfony\\Component\\HttpKernel\\Kernel->handle()
            #5 /var/www/kbin.melroy.org/html/vendor/autoload_runtime.php(29): Symfony\\Component\\Runtime\\Runner\\Symfony\\HttpKernelRunner->run()
            #6 /var/www/kbin.melroy.org/html/public/index.php(7): require_once('...')
            #7 {main}
            "} []
            

            And you found:

            {"message":"Uncaught PHP Exception Symfony\\Component\\HttpKernel\\Exception\\BadRequestHttpException: \"Invalid CSRF token\" at AbstractController.php line 39","context":{"exception":{"class":"Symfony\\Component\\HttpKernel\\Exception\\BadRequestHttpException","message":"Invalid CSRF token","code":0,"file":"/var/www/mbin/src/Controller/AbstractController.php:39"}},"level":400,"level_name":"ERROR","channel":"request","datetime":"2024-09-12T18:54:45.620576+00:00","extra":{}}
            {"message":"Uncaught PHP Exception Symfony\\Component\\HttpKernel\\Exception\\BadRequestHttpException: \"Invalid CSRF token\" at AbstractController.php line 39","context":{"exception":{"class":"Symfony\\Component\\HttpKernel\\Exception\\BadRequestHttpException","message":"Invalid CSRF token","code":0,"file":"/var/www/mbin/src/Controller/AbstractController.php:39"}},"level":400,"level_name":"ERROR","channel":"request","datetime":"2024-09-12T18:54:45.803347+00:00","extra":{}}
            

            Not sure yet what the root cause is, but it’s on our radar now.

            • tiredofsametab@fedia.io
              arrow-up
              3
              ·
              2 months ago

              Y’all are great. Feel free to ask if you need me to try anything. I haven’t touched PHP in years, but I am a software engineer, so feel free to be as technical as you’d like.

              • melroy@kbin.melroy.org
                arrow-up
                3
                ·
                2 months ago

                We can definitely use more developers. There are currently only two of us: me and bentigorlich (debounced and e-five both left recently). I also hadn’t used Symfony (the PHP framework behind it) before, but I have picked up those skills by now… So no worries, we are happy to help you. You can join us on Matrix, so it’s easier to chat and discuss: Mbin Matrix space. I hope to see you there!

                EDIT: GitHub repo is at: https://github.com/MbinOrg/mbin

                • melroy@kbin.melroy.org
                  arrow-up
                  2
                  ·
                  2 months ago

                  Sorry you also went through this: -> kbin.social (died) -> kbin.run (died) -> fedia. Kbin.run was the instance of debounced, mentioned earlier…

            • melroy@kbin.melroy.org
              arrow-up
              2
              ·
              2 months ago

              For now try Firefox or a fork: Floorp, LibreWolf, etc. I heard that works better… I know this isn’t the solution, but that is the best workaround atm.

              • jerry@fedia.io OP M
                arrow-up
                2
                ·
                2 months ago

                Most interesting: the problem had only been happening on MS Edge on my laptop. I have been using Safari on my phone without issue. Just a bit ago, I refreshed the page, and now every time I revisit the site I have to log back in, just like on Edge. It’s like the old session expired and the new ones aren’t sticking. I’ll try FF on my phone.

                Note: even in the time I started typing this reply to when I hit the “add comment” button, I got logged out

                • melroy@kbin.melroy.org
                  arrow-up
                  1
                  ·
                  2 months ago

                  Note: even in the time I started typing this reply to when I hit the “add comment” button, I got logged out

                  That is really bad indeed. And the only error you see on the server side is “Invalid CSRF token”?
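
A rough illustration of why sessions that “aren’t sticking” and “Invalid CSRF token” errors could show up together. This is a generic sketch of a session-bound CSRF check, not Mbin or Symfony code, and it does not claim to be the root cause here. The class and method names (`SessionStore`, `new_session`, `csrf_token`, `validate`) are hypothetical. The only assumption is that the token is stored in the server-side session, which, as far as I know, is how Symfony’s default CSRF token storage works.

```python
import hmac
import secrets


class SessionStore:
    """Toy in-memory session store; each session carries its own CSRF token."""

    def __init__(self):
        self._sessions = {}

    def new_session(self):
        """Create a session with a fresh CSRF token and return its id."""
        sid = secrets.token_hex(16)
        self._sessions[sid] = {"csrf": secrets.token_hex(16)}
        return sid

    def csrf_token(self, sid):
        """Token that would be embedded in forms rendered for this session."""
        return self._sessions[sid]["csrf"]

    def validate(self, sid, submitted):
        """Accept the submitted token only if it matches the current session."""
        session = self._sessions.get(sid)
        if session is None:
            return False  # unknown or expired session
        return hmac.compare_digest(session["csrf"], submitted)


store = SessionStore()
old_sid = store.new_session()
page_token = store.csrf_token(old_sid)  # token baked into the page when it was rendered

# The browser silently ends up on a new session (e.g. the cookie did not stick).
new_sid = store.new_session()

print(store.validate(old_sid, page_token))  # True: same session as the page
print(store.validate(new_sid, page_token))  # False: the server would reject the vote
```

If something like this is going on, a vote from a page rendered under the old session would fail until the page is reloaded, which would match the “refresh once and vote immediately” behaviour described earlier in the thread.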

              • tiredofsametab@fedia.io
                arrow-up
                1
                ·
                2 months ago

                Will do. This morning I have work to do outside.

                I will also note that there are three patterns when I post a comment that may or may not be related:

                • it just publishes when I hit the button
                • I hit the button, it thinks for a second, and then the button is interactable again. Pushing it again has worked in every case so far (i.e. it seems something goes wrong, but there is no UI error). I haven’t had dev tools open to see what happens there. This feels like it took too long for me to reply in some cases, but not all.
                • I hit post and get moved to a new page which is just my post with a preview. I’m not sure if this is just how it works with certain sites or whether it is also related.
        • melroy@kbin.melroy.org
          arrow-up
          2
          ·
          2 months ago

          We need server error logs from the moment such a problem happens, ideally once you can fully replicate the issue. I hope you can test it with @[email protected] and see whether an error is logged on the server side as well.

          That will hopefully allow us (developers) to find the root cause of this issue, if it’s still present.
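
In case it helps with collecting those logs, here is a small sketch for pulling matching entries out of a JSON-lines log like the one quoted above. The field names (`message`, `level_name`, `channel`, `datetime`) are taken from those quoted entries; the function name `find_errors`, the log path in the usage note, and the shortened sample line are made up for illustration.

```python
import json


def find_errors(lines, needle="Invalid CSRF token"):
    """Return (datetime, channel, message) for ERROR entries mentioning needle."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines such as plain-text stack traces
        if not isinstance(entry, dict):
            continue
        if entry.get("level_name") == "ERROR" and needle in entry.get("message", ""):
            hits.append((entry.get("datetime"), entry.get("channel"), entry["message"]))
    return hits


# Shortened, made-up sample in the same shape as the quoted log entries.
sample = [
    '{"message":"Uncaught PHP Exception: \\"Invalid CSRF token\\"",'
    '"level_name":"ERROR","channel":"request",'
    '"datetime":"2024-09-12T18:54:45.620576+00:00"}',
    "#0 some stack trace line that is not JSON",
]

for when, channel, message in find_errors(sample):
    print(when, channel, message)
```

Against a real file it could be called as `find_errors(open("/path/to/prod.log", encoding="utf-8"))`, where the path is a placeholder. Comparing the timestamps of the hits with the time a vote failed should show whether the two line up.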

      • tiredofsametab@fedia.io
        arrow-up
        1
        ·
        2 months ago

        It might only be with certain instances. I just noticed it wasn’t happening on a lemmy.world post I’d had open for a while. It could also have been something temporary. I’ll try to spot/report any patterns.

  • ciferecaNinjo@fedia.io
    arrow-up
    1
    ·
    3 months ago

    I noticed that when I visit my profile page (https://fedia.io/u/ciferecaNinjo) while logged out, I get a 504 gateway error, but if I log in then my profile page renders fine. It has been this way for the past few days. If I view my profile from a logged-out browser while logged in in another browser, the logged-out browser sees the profile fine. So to reproduce it, it would be interesting to visit the profile of anyone who is logged out.

      • ciferecaNinjo@fedia.io
        arrow-up
        1
        ·
        3 months ago

        Thanks!

        Apparently it’s not as reproducible as I thought. I was just now able to render my profile before logging in.

        • jerry@fedia.io OP M
          arrow-up
          2
          ·
          3 months ago

          Fedia is still having some stability issues. I set up some automation to detect the problem and then restart the services. If you were to look at that page while the services were restarting, you would see that error.