I recently noticed that after I migrated my OPNSense firewall to another KVM host for some maintenance, CPU load became unusually high.  What is typically a fairly idle system was consuming 2 full cores all the time, so I set out to figure out what was going on.

A quick glance at the web console confirmed the load there.  I also noticed some strange gaps in the RRD graphs where it appeared logging had stopped, even though the firewall had continued to function.

I had initially been tipped off about the load at the hypervisor level, so it was good to have confirmation, and disconcerting to see the gaps in the graphs.

I am currently running the firewall as a KVM virtual appliance with OpenVSwitch.  I love the ease of administration, the portability, and the snapshotting benefits that come with virtualization.  The firewall only shares the hypervisor with a couple of locked down OpenBSD hosts.

A quick virsh console OPNSense and I was logged into the firewall console, ready to begin troubleshooting.  If you use OPNSense, I think you will find the console very handy.  It offers a variety of easy ways to view system information, as well as access to a shell:

OPNSense console
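
For reference, the whole trip from the hypervisor shell to the firewall console is only a couple of commands.  This is a quick sketch, assuming the libvirt domain really is named OPNSense; adjust to whatever virsh list reports:

# list running domains to get the exact name
virsh list
# attach to the firewall's console
virsh console OPNSense
# when finished, detach from the console with Ctrl+]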

Dropping into a shell lets you quickly run tools like top to see what the big hitters are.  In my rush to fix the problem, I didn't grab a screenshot while the load was still high, but here is a view anyway.  I found three culprits: suricata, which is the intrusion detection system, plus syslog and syslog-ng, both of which should speak for themselves.  Small side note: I also send all of my log traffic off of the firewall to a separate log host:

Oops, forgot to capture it when load was actually high! Looks much better fixed though :D
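
If you prefer to skip the menu and work straight from the shell, plain FreeBSD top is enough to spot the heavy hitters.  A couple of flags make the picture clearer (flag meanings are from memory, so double check top(1) on your box):

# -a shows full argument lists, -S includes system processes,
# and -H breaks processes like suricata out into their individual threads
top -aSH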

Now that I had the likely culprits, it was time to dig in deeper.  Looking at the logs in the console, I could see that suricata was complaining about reads on the pppoe0 virtual interface.  There were quite a few of these errors, seemingly arriving many times per second.  My guess is that the gaps in the monitoring that I mentioned above were related to the high CPU time, but I still need to dig into that more.

Feb  1 08:13:22 gateway suricata: [100240] <Error> -- [ERRCODE: SC_ERR_NETMAP_READ(264)]
- Error reading data from iface 'pppoe0': (55u) No buffer space available
Feb  1 08:13:22 gateway suricata: [100240] <Error> -- [ERRCODE: SC_ERR_NETMAP_READ(264)]
- Error reading data from iface 'pppoe0': (55u) No buffer space available
Feb  1 08:13:22 gateway suricata: [100240] <Error> -- [ERRCODE: SC_ERR_NETMAP_READ(264)]
- Error reading data from iface 'pppoe0': (55u) No buffer space available
Feb  1 08:13:22 gateway suricata: [100240] <Error> -- [ERRCODE: SC_ERR_NETMAP_READ(264)]
- Error reading data from iface 'pppoe0': (55u) No buffer space available
[ the same error repeated many more times within the same second ]

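As an aside, error 55 on FreeBSD is ENOBUFS, which matches the "No buffer space available" text, so the netmap buffer settings seemed worth a look while suricata was spinning.  Nothing below is OPNSense specific; these are just the stock FreeBSD netmap sysctls as I remember them, purely for inspection:

# confirm that errno 55 really is ENOBUFS
grep -w 55 /usr/include/sys/errno.h
# see how netmap is currently tuned
sysctl dev.netmap
# the buffer related values in particular
sysctl dev.netmap.buf_num dev.netmap.buf_size
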
One issue I had noticed after my migration BACK to the original host was that my pppoe0 interface would come up, but not pass traffic.  A simple disable/enable of the interface would bring it back, but it made me wonder whether it was related to the root cause of this whole issue.  Bouncing the interface did NOT fix the load issue with suricata and syslog.

My next step was to look at suricata's configuration itself.  Within the suricata configuration, you can specify the interfaces you want intrusion detection to monitor.  With pppoe0 being the interface that was causing me problems, I decided to remove the interface, then re-add it:

Intrusion Detection configuration

To do this, I simply unchecked my pppoe interface in the Interfaces dropdown, applied the changes, then checked the pppoe interface again and applied once more.
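
If you want to confirm from the shell that the change actually landed, the config that OPNSense generates for suricata should list the interface again in its netmap section, and configd can report on the IDS service.  Treat the path and action name below as assumptions from memory rather than gospel:

# check which interfaces the generated config hands to netmap
grep -A 3 'netmap:' /usr/local/etc/suricata/suricata.yaml
# ask configd about the state of the IDS service
configctl ids status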

Voila!  After suricata made it through its initialization process, CPU time dropped down to an expected level (under 5% WCPU on average) and syslog quieted down as well.

At this point, I'm still not sure of the root cause, but there are several items at play here:

  • Migration of the VM from one KVM host to another, and then back.  I have lots of interfaces, with OpenVSwitch handling all the backend plumbing for me, so this was an extremely easy process (a rough sketch of the migration command follows this list).  I will investigate an OPNSense HA configuration soon to remove the need to migrate at all; PPPoE is where I have the most questions so far.
  • I recently enabled the Emerging Threats rules by allowing the telemetry to be shared.  This added a bunch of new rules for suricata when I applied the license code.
  • The issue does appear to have started on the first migration, and continued when I migrated back.  The initial KVM host was rebuilt.
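
For anyone curious what moving the VM looks like with plain libvirt tooling, here is a minimal sketch, assuming shared (or separately synced) storage and a placeholder destination host; run it again pointed the other way to bring the VM back:

# live-migrate the firewall VM to the other KVM host over SSH
virsh migrate --live OPNSense qemu+ssh://other-kvm-host/system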

OPNSense has quickly become a favorite for me, and I highly recommend it.  The ease with which you can delve into the system and find out what it is doing, from Web Administration to Console Administration, makes it a tool that lets me spend time on the things I need to, and not waste time on the things I don't.

- Haplo