Monitoring a process for high memory consumption using Monit

I run Pi-hole on an old PogoPlug E02 with a custom-compiled dnsmasq (or pihole-FTL, as they now call their customised version of it). Lately my DNS queries have been slowing down at erratic intervals. On further investigation, it looked like pihole-FTL's memory usage balloons until it consumes all of the 256 MB of memory available and the system starts swapping, bringing everything almost to a standstill.

In comes Monit, a highly configurable process supervisor. This is how I set up monitoring for the errant pihole-FTL process. Monit checks whether the process uses more than 100 MB of memory for three consecutive cycles, and if it does, restarts it. This has taken care of the manual tinkering I used to do whenever there were complaints of the internet being slow.

check process pihole-FTL with pidfile /run/
    start program = "/usr/sbin/service pihole-FTL start" with timeout 20 seconds
    stop program = "/usr/sbin/service pihole-FTL stop"
    if totalmem > 100.0 MB for 3 cycles then restart
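
How long "3 cycles" lasts depends on Monit's poll interval, which is set globally in the control file. As a rough sketch, assuming a 60-second interval (an illustrative value, not necessarily what this setup uses), the rule above would fire after about three minutes of sustained high memory use:

```
# Assumed value for illustration: poll all services every 60 seconds.
# With this setting, "for 3 cycles" spans roughly 3 minutes.
set daemon 60
```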

PS: Monit has nice commands for checking the status of the processes, files, directories, etc. that it monitors: monit summary for succinct information, or monit status for more verbose output. Note that you might need to turn on the HTTP API for these to work.
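
For reference, here is a minimal sketch of enabling the HTTP interface in the Monit control file. It assumes the conventional port 2812 and access restricted to the local machine; adjust both to your own setup.

```
# Assumed values: port 2812, reachable from the local machine only.
set httpd port 2812 and
    # Only listen on the loopback interface.
    use address localhost
    # Only accept clients connecting from localhost.
    allow localhost
```

After editing the control file, monit -t checks its syntax and monit reload makes the running daemon pick up the change.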

soumik@pi-hole:~# monit summary
Monit 5.20.0 uptime: 32m
│ Service Name │ Status  │ Type    │
│ pi-hole      │ Running │ System  │
│ pihole-FTL   │ Running │ Process │
soumik@pi-hole:~# monit status
Monit 5.20.0 uptime: 32m
Process 'pihole-FTL'
  status              Running
  monitoring status   Monitored
  monitoring mode     active
  on reboot           start
  pid                 6363
  parent pid          1
  uid                 999
  effective uid       999
  gid                 999
  uptime              22h 51m
  threads             6
  children            0
  cpu                 0.2%
  cpu total           0.2%
  memory              8.6% [20.7 MB]
  memory total        8.6% [20.7 MB]
  data collected      Tue, 26 Feb 2019 18:40:28

System 'pi-hole'
  status              Running
  monitoring status   Monitored
  monitoring mode     active
  on reboot           start
  load average        [0.00] [0.00] [0.07]
  cpu                 0.4%us 0.3%sy 0.3%wa
  memory usage        43.1 MB [17.8%]
  swap usage          8.2 MB [1.6%]
  uptime              1d 20h 37m
  boot time           Sun, 24 Feb 2019 22:03:33
  data collected      Tue, 26 Feb 2019 18:40:28