Posts tagged with “monitoring”

self hosting is a serious matter - but fun

I've had my hands full in the last few weeks.. let's see what's new for @wpn

First of all, I removed some #DNS records (and related services), namely:

laltrowiki (a wiki I share with some old friends; it didn't really fit with @wpn, so I'm keeping it for myself)

trilium (note-taking app which I kept on @wpn, but it was for personal use only.. Now it's hosted at home)

grocy (pantry/shopping aid app, same as trilium above)

I upgraded the plugins for roundcube (#webmail) and FreshRSS (#RSS reader). The main services are always kept up to date, while these add-ons only get updated from time to time.
I did some tinkering with the #sqlite DBs of this blog and of the xmpp-it homepage: if you're interested, follow here
I installed and then removed 2 #gemini to #html #proxies, because I believe the one I've been using so far (and still am) is the best one. I also tweaked its stylesheet a bit. The two proxies I tried are september and kineto, while the one currently in use is loxy
I first upgraded uptime-kuma to V2 Beta, but then backed it up and moved to gatus, which lacks some features but is lightweight and straightforward. In the process we lost the live webhook notifications (about service status) for the XMPP chatroom, so I also installed an ntfy server and the related Android app but, obviously, I'm currently the only one who sees those. If any @wpn user is interested, I can share info about the "topic", the server's "address:port" and such. For the time being everything is in plain text and without authentication.. I don't think @wpn's service status notifications are "sensitive" enough to require encryption, so I didn't even bother 😀
It's "6" already.. wow!

transmission-daemon was replaced by good old rtorrent, which also got a web UI that only I can access so far.
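For anyone curious about the ntfy part, publishing a notification to a topic is a plain HTTP POST, so it can be sketched with curl. The server address, port and topic name below are made-up placeholders (the real ones are only shared on request), and the notify helper is just an illustration, not what gatus does internally.

```shell
#!/bin/sh
# Assumed values: replace with the real server "address:port" and "topic".
NTFY_URL="${NTFY_URL:-http://ntfy.example.org:8080}"
NTFY_TOPIC="${NTFY_TOPIC:-wpn-status}"

# notify TITLE MESSAGE: publish a plain-text notification to the topic.
# No authentication and no TLS here, matching the setup described above.
notify() {
    curl -fsS -H "Title: $1" -d "$2" "$NTFY_URL/$NTFY_TOPIC"
}
```

Calling notify "gatus" "xmpp is down" should make the message pop up on every device subscribed to that topic on that server.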

All of the aforementioned changes and fixes were mainly meant to reduce the load on the machine, in order to provide a better experience for everyone. Things still look complicated though: even though RAM usage decreased significantly, CPU usage, on the other hand, seems almost worse - and I still can't explain that, other than by suspecting that contabo may be oversubscribing resources on their host.

Last 2 things, then I'll shut up! 😁

Some of the users' home directories (only the ones that belong to me or to system-related user accounts) are now backed up via rsnapshot to my #homelab. I intend to write a how-to on that topic later on, because I'm still testing/figuring it out.
The onboarding tool has got a new checkbox for (legal) age verification.

That's it. Feel free to take a "tour" if you're new or haven't had the chance yet: https://woodpeckersnest.space/

See you soon!

Downtimes

For a few days now I've been experiencing downtimes at night and in the early morning.

When I wake up, connect to the VPS and attach to tmux, I am welcomed by these messages in console:

        Message from syslogd@pandora at Nov 3 05:37:13 ...
        kernel:[1586232.350737] Dazed and confused, but trying to continue

        Message from syslogd@pandora at Nov 3 05:37:24 ...
        kernel:[1586235.049143] Uhhuh. NMI received for unknown reason
        30 on CPU 1.

        Message from syslogd@pandora at Nov 3 05:37:24 ...
        kernel:[1586235.049145] Dazed and confused, but trying to continue

        Message from syslogd@pandora at Nov 3 05:37:55 ...
        kernel:[1586273.642163] watchdog: BUG: soft lockup - CPU#2 stuck
        for 27s! [dockerd:526408]

        Message from syslogd@pandora at Nov 3 05:38:00 ...
        kernel:[1586278.545172] watchdog: BUG: soft lockup - CPU#1 stuck
        for 24s! [systemd-journal:257]

        Message from syslogd@pandora at Nov 3 05:38:02 ...
        kernel:[1586281.187611] watchdog: BUG: soft lockup - CPU#3 stuck
        for 35s! [lua5.4:1702]

Needless to say, when this happens the server is completely frozen and doesn't respond to anything.
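To get a quick idea of which CPUs were hit, messages like the ones above can be tallied once they are saved to a file. This is only a small sketch: it assumes the console output (or the kernel log of the previous boot, if the journal is persistent) was copied into a text file, and the function name is made up for the example.

```shell
#!/bin/sh
# lockups_per_cpu FILE: count "soft lockup" lines per CPU in a saved log.
# Lines without "soft lockup" (e.g. the NMI messages) are ignored.
lockups_per_cpu() {
    grep 'soft lockup' "$1" | grep -o 'CPU#[0-9]*' | sort | uniq -c
}
```

The output is one line per CPU with the number of lockups in front, which is handy to attach to a support ticket.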

I already contacted support, but they didn't investigate at all, I believe. They manually restarted my VPS once and did some pings and connection tests (VNC, SSH) afterwards.. "everything is working fine!"

This last Saturday I was up when it happened, so I ran mtr from my PC to the VPS's IP and logged it, then I sent another email with the output to support.. Still waiting for them to reply, I guess tomorrow (Monday).
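Logging an mtr run for support can be done roughly like this. It is a sketch rather than the exact command I used: the log_mtr helper, the default of 60 cycles and the file naming are my own assumptions, and the IP in the usage note is a documentation address, not the real VPS.

```shell
#!/bin/sh
# log_mtr HOST [CYCLES]: run mtr in report mode (no DNS lookups, wide output)
# and save the result to a timestamped file in the current directory.
# Prints the name of the log file on success.
log_mtr() {
    host="$1"
    cycles="${2:-60}"
    outfile="mtr-$host-$(date +%Y%m%d-%H%M%S).log"
    mtr --report --report-wide --no-dns --report-cycles "$cycles" "$host" > "$outfile" &&
        echo "$outfile"
}
```

For example, log_mtr 203.0.113.10 100 sends 100 probes per hop and leaves a text file that can be pasted into an email.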

Friends like lorenzo and shai are having difficulties too, with the same provider, so I'm not imagining things.

Well, that's all I've got to say, I'll keep you posted if there's any news.

beszel

Beszel Setup

Interesting project at https://github.com/henrygd/beszel

It collects resource statistics from one or more systems, displays CPU/RAM/DISK/NET/DOCKER information and alerts you when an "event" happens.

These days I set up the beszel HUB on my VPS and beszel-agent on the same VPS, on our chatmail server and even on my desktop PC at home on WSL2: so now I'm monitoring 3 systems from a web interface and I get notified if one of them becomes unreachable or exceeds the resource usage percentage set for each type of monitor.

This is my compose.yaml for the HUB and agent on "woodpeckersnest.eu":

services:
  beszel:
    image: 'henrygd/beszel'
    container_name: 'beszel'
    networks:
      beszel:
        ipv4_address: 172.30.0.2
    restart: unless-stopped
    ports:
      - '8090:8090'
    volumes:
      - beszel_data:/beszel_data
    environment:
      DISABLE_PASSWORD_AUTH: "false"

  beszel-agent:
    image: "henrygd/beszel-agent"
    container_name: "beszel-agent"
    networks:
      beszel:
        ipv4_address: 172.30.0.3
    restart: unless-stopped
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
    environment:
      PORT: 45876
      KEY: "SECRET"
      FILESYSTEM: /dev/sda3
      
networks:
  beszel:
    name: "beszel"
    external: true

volumes:
  beszel_data:
    external: true
    name: "beszel_data"

I created a named docker volume and a custom network beforehand:

docker volume create beszel_data

docker network create --subnet=172.30.0.0/16 --gateway=172.30.0.1 beszel

I didn't want to run the agent with network_mode: host, so here's a bridged setup. Network stats for my VPS won't be meaningful, since the only network the agent can see is the docker bridge one, but I don't care very much.

For chatmail and my home desktop I'm running the binary agent: as a systemd unit on chatmail and via a bash script on my PC.

The systemd unit looks like this:

# /etc/systemd/system/beszel-agent.service
[Unit]
Description=Beszel Agent Service
After=network.target

[Service]
Environment="PORT=45876"
Environment="KEY=SECRET"
ExecStart=/home/chatmail/bin/beszel-agent
User=chatmail
Restart=always

[Install]
WantedBy=multi-user.target
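As for the desktop PC, a wrapper script can be tiny. The following is a hedged sketch and not my actual script: it assumes the agent reads PORT and KEY from the environment exactly as in the unit above, the binary path is hypothetical, and "SECRET" is a placeholder.

```shell
#!/bin/sh
# Hypothetical location of the agent binary; adjust to your own path.
AGENT_BIN="${AGENT_BIN:-$HOME/bin/beszel-agent}"

# start_agent: run the agent in the foreground with the same variables
# the systemd unit sets. "SECRET" is a placeholder, as in the unit above.
start_agent() {
    PORT="${PORT:-45876}" KEY="${KEY:-SECRET}" "$AGENT_BIN"
}
```

Adding a start_agent call as the last line turns this into a runnable script; both variables can be overridden from the calling shell.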

The HUB also has (automatic) backups, locally or on S3 - I tested both without issues. Beszel (both HUB and agent) takes almost no CPU/RAM to do its job and stats are easy to read. Even the management tool is quite straightforward and gets the basic stuff right - I skipped OAuth entirely though, and manage users manually.

Still a young project, but already recommended 😍

If you're interested in setting this up on WSL2, I opened a discussion about this topic here.. No need to repeat myself 😀

Shout-out to "mforester@rollenspiel.social", who posted about beszel on the fediverse and gave me the idea to try it.