Introduction
ntopng alerts are either:
- stateful
- stateless
Stateless alerts
Stateless alerts are stored when an issue is detected. Issues which trigger stateless alerts have no duration associated, that is, they are basically events such as the connection/disconnection of a device, or the change of the status of an SNMP device port.
To store a stateless alert, method alerts_api.store is called.
Stateless alerts lifecycle
1. alerts_api.store enqueues the alert into ~~the ntopng internal SQLite queue (ntop.pushSqliteAlert) and also into~~ the ntopng recipients queue (ntop.pushAlertNotification).
2. housekeeping.lua dequeues, every three seconds, the alert from ~~the ntopng internal SQLite queue (ntop.popSqliteAlert) and also from~~ the ntopng recipients queue (alert_utils.processAlertNotifications).
   - Alerts dequeued from the ntopng internal SQLite queue are sent to the C engine for the actual insertion into SQLite (interface.storeAlert).
   - Alerts dequeued from the ntopng recipients queue are enqueued again into per-recipient queues (notification_recipients.dispatchNotification), using the host pool id carried inside the alert to choose the recipients (recipients = pools:get_recipients(message.pool_id)). Alerts are also always enqueued into the builtin SQLite recipient builtin_sqlite_recipient.
3. housekeeping.lua dequeues, every three seconds, alerts from every per-recipient queue (notification_recipients.processNotifications).
   - Alerts are dequeued from the per-recipient queue with a single call to m.dequeueRecipientAlerts, which is also in charge of processing the dequeued alerts (e.g., batching them and sending them out in email messages).
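The flow from the recipients queue to the per-recipient queues can be modelled in a few lines. The following is a minimal Python sketch, not ntopng code: the queue objects, the POOL_RECIPIENTS table (standing in for pools:get_recipients), the recipient names and the function names are all assumptions made for illustration. It covers only the recipients-queue path and leaves the internal SQLite queue out.

```python
from collections import defaultdict, deque

BUILTIN_SQLITE_RECIPIENT = "builtin_sqlite_recipient"

# Hypothetical pool-to-recipients table; stands in for pools:get_recipients(pool_id).
POOL_RECIPIENTS = {0: ["email_admins"], 1: ["webhook_noc", "email_admins"]}

recipients_queue = deque()                 # models the ntopng recipients queue
per_recipient_queues = defaultdict(deque)  # models the per-recipient queues


def store(alert):
    """Step 1: enqueue a stateless alert into the recipients queue."""
    recipients_queue.append(alert)


def dispatch_notifications():
    """Step 2: move alerts from the recipients queue to per-recipient queues."""
    while recipients_queue:
        alert = recipients_queue.popleft()
        # Recipients are chosen from the host pool id carried inside the alert.
        recipients = list(POOL_RECIPIENTS.get(alert["pool_id"], []))
        # The builtin SQLite recipient always receives the alert.
        recipients.append(BUILTIN_SQLITE_RECIPIENT)
        for recipient_id in recipients:
            per_recipient_queues[recipient_id].append(alert)


def process_notifications():
    """Step 3: drain every per-recipient queue and return what was delivered."""
    delivered = {}
    for recipient_id, queue in per_recipient_queues.items():
        delivered[recipient_id] = list(queue)
        queue.clear()
    return delivered


def housekeeping_tick():
    """One simulated housekeeping run: steps 2 and 3, in order."""
    dispatch_notifications()
    return process_notifications()
```

Calling store({"pool_id": 1, "alert_type": "port_status_change"}) followed by housekeeping_tick() delivers the alert once to each recipient of pool 1 and once to the builtin SQLite recipient.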
Stateful alerts
Stateful alerts are triggered when the issue is detected, and are released when the issue no longer occurs. Issues which trigger stateful alerts have a duration associated, that is, they are events such as threshold crossings (e.g., the throughput is above 1Mbps) or suspicious activities (e.g., the host is performing a TCP scan).
Methods alerts_api.trigger and alerts_api.release are called to trigger and release stateful alerts, respectively.
Stateful alerts lifecycle
1. alerts_api.trigger sets the triggered alert into the C core, straight into the entity (e.g., host.storeTriggeredAlert), and enqueues the alert into the ntopng recipients queue (ntop.pushAlertNotification). No SQLite enqueues/dequeues/insertions come into play.
2. alerts_api.release removes the previously triggered alert from the C core (e.g., host.releaseTriggeredAlert) and enqueues the alert into ~~the ntopng internal SQLite queue (ntop.pushSqliteAlert) and also into~~ the ntopng recipients queue (ntop.pushAlertNotification).
3. housekeeping.lua performs the very same operations 2 and 3 described above for stateless alerts.
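The trigger/release pairing can be illustrated with a small Python model. This is a sketch under stated assumptions, not the ntopng implementation: the Entity class, the "engage"/"release" action labels and the message layout are hypothetical, and so is the choice to ignore a trigger for an alert that is already engaged.

```python
from collections import deque

recipients_queue = deque()  # models the ntopng recipients queue


class Entity:
    """Hypothetical stand-in for a C-core entity such as a host."""

    def __init__(self, name):
        self.name = name
        self.triggered = {}  # alert key -> alert currently engaged on this entity

    def trigger(self, key, alert):
        """Record the alert on the entity and enqueue a notification."""
        if key in self.triggered:
            # Assumption of this sketch: a repeated trigger is a no-op.
            return False
        self.triggered[key] = alert
        recipients_queue.append({"action": "engage", "entity": self.name, "key": key})
        return True

    def release(self, key):
        """Remove a previously triggered alert and enqueue a notification."""
        if key not in self.triggered:
            return False  # nothing to release
        del self.triggered[key]
        recipients_queue.append({"action": "release", "entity": self.name, "key": key})
        return True
```

With host = Entity("192.168.1.10"), a host.trigger("throughput", {...}) followed by host.release("throughput") leaves the entity with no engaged alerts and two messages in the recipients queue, one per state change.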
Alert Queues
Queues are used to decouple the dispatch from the processing of alerts. Currently used queues are:
- One in-memory queue for SQLite alerts (ntop->getSqliteAlertsQueue())
- One in-memory queue for the ntopng recipients (ntop->getAlertsNotificationsQueue())
- Multiple Redis queues for per-recipient queues (get_endpoint_recipient_queue(recipient_id))
Queue messages format
JSON messages are queued/dequeued. The format of the JSON is undocumented and contains variable fields that depend on the alert type. However, a minimum set of fields is constant and includes alert type, entity and severity.
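Since the format is undocumented, a consumer can at least check for the constant fields before processing a message. The Python sketch below shows one way to do that; the key names alert_type, alert_entity and alert_severity are assumptions chosen for illustration (the text only says that alert type, entity and severity are always present), and the helper names are hypothetical.

```python
import json

# Assumed key names for the constant fields; the real JSON keys may differ.
REQUIRED_FIELDS = ("alert_type", "alert_entity", "alert_severity")


def missing_fields(alert):
    """Return the constant fields that are absent from an alert dictionary."""
    return [field for field in REQUIRED_FIELDS if field not in alert]


def encode_alert(alert):
    """Serialize an alert to JSON, refusing alerts without the constant fields."""
    missing = missing_fields(alert)
    if missing:
        raise ValueError("alert is missing fields: " + ", ".join(missing))
    return json.dumps(alert, sort_keys=True)


def decode_alert(message):
    """Parse a queued JSON message and check the constant fields."""
    alert = json.loads(message)
    missing = missing_fields(alert)
    if missing:
        raise ValueError("message is missing fields: " + ", ".join(missing))
    return alert
```

Type-specific fields pass through untouched, so the check does not need to know every alert type.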
Queue drops
- When the SQLite queue is full, alerts are dropped and counted into the system interface dropped alerts (iface->incNumDroppedAlerts(1)).
- When the ntopng recipients queue is full, alerts are dropped ~~but drops are NOT counted~~ and counted into the system interface dropped alerts (iface->incNumDroppedAlerts(1)).
- When any of the per-recipient queues is full, alerts are dropped with a queue trim but drops are NOT counted.
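The difference between a counted drop and a silent trim can be made concrete with two bounded queues. This is a minimal Python sketch with hypothetical class names. It assumes that a full in-memory queue rejects the new alert, and that a trim discards the oldest entries; the text does not say which end of a per-recipient queue is trimmed.

```python
from collections import deque


class CountingQueue:
    """Bounded queue that rejects new alerts when full and counts the drops."""

    def __init__(self, max_len):
        self.items = deque()
        self.max_len = max_len
        self.dropped = 0  # plays the role of the interface dropped-alerts counter

    def push(self, alert):
        if len(self.items) >= self.max_len:
            self.dropped += 1
            return False
        self.items.append(alert)
        return True


class TrimmingQueue:
    """Bounded queue that trims silently: no counter records the loss."""

    def __init__(self, max_len):
        # A deque with maxlen discards from the opposite end on append,
        # so the oldest alert is lost (assumed trim direction).
        self.items = deque(maxlen=max_len)

    def push(self, alert):
        self.items.append(alert)
```

Pushing five alerts into a CountingQueue(3) leaves three alerts queued and dropped == 2, while a TrimmingQueue(3) also holds three alerts but keeps no trace that two were lost.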
Alert Recipients
Recipients are implemented as plugins, e.g., plugins/{email,webhook}_alert_endpoint/. Recipients are loaded in memory with plugins_utils.getLoadedAlertEndpoints(). A require is used to avoid loading them more than once per Lua VM. Recipients are not loaded when enqueuing alerts; they are loaded when dequeuing alerts in housekeeping.lua, whose VM is reused and recreated only once every two minutes.
It would be desirable to migrate the current recipients implementation to an object-oriented one, to ease the monitoring of queue fill levels.
Critical Points
- Only one in-memory queue is used for SQLite alerts and ntopng recipients.
  - An interface generating many alerts can saturate the queue and cause other interfaces' alerts to be dropped.
  - A high number of alerts of a certain type can saturate the queue and cause alerts of other types to be dropped.
- When ~~the ntopng recipients queue or~~ any of the per-recipient queues is full, alerts are dropped but drops are NOT counted.
- housekeeping.lua is assumed to run every three seconds; however, it can be much slower than this, for example when it starts refreshing/downloading blacklists from the web. If the housekeeping gets stuck for a long time, alerts will not be dequeued, queues will grow, and eventually this will cause alert drops.
- notification_recipients.processNotifications relies on housekeeping.lua running every three seconds ((now % m.EXPORT_FREQUENCY) < periodic_frequency). This assumption is wrong and can cause alerts to stay in per-recipient queues indefinitely.
- notification_recipients.processNotification relies on dequeueRecipientAlerts. If dequeueRecipientAlerts is slow, or performs only one operation at a time, then alerts will be processed at a much slower rate than the generation rate (e.g., currently, at most one email is sent out every minute). It would be ideal to process all recipients in round-robin until there is no more work to do.
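Two of the points above lend themselves to a short Python sketch: the time-window test (now % m.EXPORT_FREQUENCY) < periodic_frequency, and the suggested round-robin processing. The values 60 and 3 are assumptions (the text mentions one email per minute and a three-second housekeeping period), and count_exports and drain_round_robin are hypothetical helpers, not ntopng functions.

```python
from collections import deque

EXPORT_FREQUENCY = 60   # assumed value of m.EXPORT_FREQUENCY, in seconds
PERIODIC_FREQUENCY = 3  # expected housekeeping period, in seconds


def export_due(now):
    """The time-window test quoted in the text."""
    return (now % EXPORT_FREQUENCY) < PERIODIC_FREQUENCY


def count_exports(run_times):
    """Count how many housekeeping runs fall inside the export window."""
    return sum(1 for now in run_times if export_due(now))


def drain_round_robin(queues, send):
    """Send one alert per recipient per pass until every queue is empty."""
    sent = 0
    pending = True
    while pending:
        pending = False
        for recipient_id, queue in queues.items():
            if queue:
                send(recipient_id, queue.popleft())
                sent += 1
                pending = True
    return sent
```

Under these assumptions, a housekeeping that runs exactly every three seconds starting at second 0 hits the window once per minute, so count_exports(range(0, 600, 3)) finds ten exports in ten minutes. A housekeeping that runs every ten seconds starting at second 5 never lands in the window, so count_exports(range(5, 600, 10)) finds none and the per-recipient queues are never drained. The round-robin helper, by contrast, does not depend on wall-clock alignment: it keeps cycling over the recipients until no queue has work left.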