Posts tagged with “monitoring”

Downtimes

Written by Simone

It's been a few days now that I'm experiencing downtimes at night, early mornings.

When I wake up, connect to the VPS and attach to tmux, I am welcomed by these messages in console:

        Message from syslogd@pandora at Nov 3 05:37:13 ...
        kernel:[1586232.350737] Dazed and confused, but trying to continue

        Message from syslogd@pandora at Nov 3 05:37:24 ...
        kernel:[1586235.049143] Uhhuh. NMI received for unknown reason
        30 on CPU 1.

        Message from syslogd@pandora at Nov 3 05:37:24 ...
        kernel:[1586235.049145] Dazed and confused, but trying to continue

        Message from syslogd@pandora at Nov 3 05:37:55 ...
        kernel:[1586273.642163] watchdog: BUG: soft lockup - CPU#2 stuck
        for 27s! [dockerd:526408]

        Message from syslogd@pandora at Nov 3 05:38:00 ...
        kernel:[1586278.545172] watchdog: BUG: soft lockup - CPU#1 stuck
        for 24s! [systemd-journal:257]

        Message from syslogd@pandora at Nov 3 05:38:02 ...
        kernel:[1586281.187611] watchdog: BUG: soft lockup - CPU#3 stuck
        for 35s! [lua5.4:1702]

There's no need to say that when this happens, the server is completely frozen and doesn't respond to anything.

I already contacted support, but they didn't investigate at all, I believe. They manually restarted my VPS once and did some pings and connection tests (VNC, SSH) afterwards.. "everything is working fine!"

This last Saturday I was up when it happened, so I did a mtr from my PC to the VPS's IP and logged it, then I sent another email with the output to support.. Still waiting for them to reply, I guess tomorrow (Monday).

Friends like lorenzo and shai are having difficulties too, with the same provider, so I'm not imagining things.

Well, that's all I got to say, will keep you posted if any news.

beszel

Written by Simone

Beszel Setup

Interesting project at https://github.com/henrygd/beszel

Collects resource statistics from one or more systems, display CPU/RAM/DISK/NET/DOCKER information and be alerted in case "event" happens.

These days I set up beszel HUB on my VPS and beszel-agent on the same VPS, on our chatmail server and even on my desktop PC at home on WSL2: so now I'm monitoring 3 systems from a web interface and I'm being notified if one of them becomes unreachable or has exceeded %resources for every type of monitor.

This is my compose.yaml for the HUB and agent on "woodpeckersnest.eu":

services:
  beszel:
    image: 'henrygd/beszel'
    container_name: 'beszel'
    networks:
      beszel:
        ipv4_address: 172.30.0.2
    restart: unless-stopped
    ports:
      - '8090:8090'
    volumes:
      - beszel_data:/beszel_data
    environment:
      DISABLE_PASSWORD_AUTH: false

  beszel-agent:
    image: "henrygd/beszel-agent"
    container_name: "beszel-agent"
    networks:
      beszel:
        ipv4_address: 172.30.0.3
    restart: unless-stopped
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
    environment:
      PORT: 45876
      KEY: "SECRET"
      FILESYSTEM: /dev/sda3
      
networks:
  beszel:
    name: "beszel"
    external: true

volumes:
   beszel_data:
       external: true
       name: "beszel_data"

I created a named docker volume and a custom network beforehand:

docker volume create --name beszel_data

docker network create --subnet=172.30.0.0/16 --gateway=172.30.0.1 beszel

I didn't want to run the agent with network_mode: host, so here's a bridged setup. Network stats on my VPS won't be relevant, since the only net beszel can monitor is the docker bridged one, but I don't care very much.

For chatmail and home desktop I'm running the binary agent, respectively in a systemd unit for chatmail and with a bash script for my PC.

The systemd unit looks like this:

# /etc/systemd/system/beszel-agent.service
[Unit]
Description=Beszel Agent Service
After=network.target

[Service]
Environment="PORT=45876"
Environment="KEY=SECRET"
ExecStart=/home/chatmail/bin/beszel-agent
User=chatmail
Restart=always

[Install]
WantedBy=multi-user.target

The HUB has also got (automatic) backups, locally or on S3 - I tested both without issues. Beszel (both HUB and agent) take almost no CPU/RAM to do their job and stats are easy to read. Even the management tool is quite straightforward and has got the basic stuff right - I skipped OAUTH entirely though, while managing users manually.

Still young project but already recommended 😍

If you're interested in setting this up on WSL2, here I opened a discussion about this topic.. No need to repeat myself 😀

shout-out to "mforester@rollenspiel.social" who posted about beszel on the fediverse and gave me the idea to try it.