Swap from Aho-Corasick to an experimental/home-grown algorithm that uses a probabilistic approach for handling Internet domain names.

To switch back to Aho-Corasick, edit ndpi-typedefs.h and uncomment the line
// #define USE_LEGACY_AHO_CORASICK
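
As an illustration of how such a compile-time switch typically works, here is a minimal sketch. The helper domain_matcher_name() is hypothetical and is not part of nDPI; only the macro name comes from this commit message.

```c
/* Uncomment to build with the legacy matcher, as described above. */
// #define USE_LEGACY_AHO_CORASICK

/* Hypothetical helper (not an nDPI API): reports which matcher
   this translation unit was compiled with. */
const char *domain_matcher_name(void) {
#ifdef USE_LEGACY_AHO_CORASICK
  return "aho-corasick";
#else
  return "probabilistic";
#endif
}
```

Because the choice is made by the preprocessor, switching algorithms requires a rebuild. It cannot be toggled at runtime.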

[1] With Aho-Corasick
$ ./example/ndpiReader -G ./lists/ -i tests/pcap/ookla.pcap | grep Memory
nDPI Memory statistics:
nDPI Memory (once):      37.34 KB
Flow Memory (per flow):  960 B
Actual Memory:           33.09 MB
Peak Memory:             33.09 MB

[2] With the new algorithm
$ ./example/ndpiReader -G ./lists/ -i tests/pcap/ookla.pcap | grep Memory
nDPI Memory statistics:
nDPI Memory (once):      37.31 KB
Flow Memory (per flow):  960 B
Actual Memory:           7.42 MB
Peak Memory:             7.42 MB

In essence, actual and peak memory drop from ~33 MB to ~7 MB.

This new algorithm will enable larger lists to be loaded (e.g. the top 1M domains,
https://s3-us-west-1.amazonaws.com/umbrella-static/index.html)
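
This message does not describe the algorithm itself. As a generic illustration of why a probabilistic set can take far less memory than a string-matching automaton, here is a minimal Bloom filter sketch. It is a swapped-in textbook technique, not the code added by this commit, and every name in it (domain_filter, FILTER_BITS, ...) is hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define FILTER_BITS (1u << 16) /* illustrative size: 65536 bits = 8 KB */

typedef struct {
  uint8_t bits[FILTER_BITS / 8];
} domain_filter;

/* FNV-1a 32-bit hash; the seed perturbs the initial state so that
   two seeds yield two different bit positions per domain. */
static uint32_t fnv1a(const char *s, uint32_t seed) {
  uint32_t h = 2166136261u ^ seed;

  while(*s) {
    h ^= (uint8_t)*s++;
    h *= 16777619u;
  }
  return h;
}

void domain_filter_init(domain_filter *f) {
  memset(f->bits, 0, sizeof(f->bits));
}

/* Sets two bits per domain; the domain string itself is never stored. */
void domain_filter_add(domain_filter *f, const char *domain) {
  uint32_t seed;

  for(seed = 0; seed < 2; seed++) {
    uint32_t bit = fnv1a(domain, seed) % FILTER_BITS;
    f->bits[bit / 8] |= (uint8_t)(1u << (bit % 8));
  }
}

/* Returns 1 if the domain may be present (false positives are possible),
   0 if it is definitely absent. */
int domain_filter_maybe_contains(const domain_filter *f, const char *domain) {
  uint32_t seed;

  for(seed = 0; seed < 2; seed++) {
    uint32_t bit = fnv1a(domain, seed) % FILTER_BITS;
    if((f->bits[bit / 8] & (1u << (bit % 8))) == 0) return 0;
  }
  return 1;
}
```

The memory cost of such a structure is fixed by the number of bits rather than by the length of the stored strings. The trade-off is an occasional false positive.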

The files in ./lists are named <category>_<string>.list.
With -G, ndpiReader loads all of them at startup.
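
The naming convention can be sketched as follows. This is a hedged illustration, not the loader used by ndpiReader: list_file_category() is a hypothetical helper, and the sketch assumes the category is everything before the first underscore.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not an nDPI API): given "<category>_<string>.list",
   copies the <category> part into out. Returns 0 on success, -1 if the
   file name does not follow the convention or out is too small. */
int list_file_category(const char *file_name, char *out, size_t out_len) {
  const char *underscore = strchr(file_name, '_');
  size_t name_len = strlen(file_name);
  size_t cat_len;

  /* Need a non-empty category followed by an underscore */
  if(underscore == NULL || underscore == file_name) return -1;

  /* Need the ".list" suffix */
  if(name_len < 5 || strcmp(file_name + name_len - 5, ".list") != 0) return -1;

  cat_len = (size_t)(underscore - file_name);
  if(cat_len >= out_len) return -1;

  memcpy(out, file_name, cat_len);
  out[cat_len] = '\0';
  return 0;
}
```

For a made-up file name such as gambling_sites.list, the helper yields the category "gambling". Names without an underscore or without the .list suffix are rejected.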
Luca Deri 2023-08-29 17:34:04 +02:00
parent 1f693c3f5a
commit 36abf06c6f
11 changed files with 270 additions and 43 deletions


@@ -367,7 +367,7 @@ typedef enum {
   NDPI_PROTOCOL_HOTS = 336, /* Heroes of the Storm */
   NDPI_PROTOCOL_FACEBOOK_REEL_STORY = 337,
   NDPI_PROTOCOL_SRTP = 338,
-  NDPI_PROTOCOL_GAMBLING = 339,
+  NDPI_PROTOCOL_FREE = 339, /* Formerly used by gambling now a category. It can be reused in the future */
   NDPI_PROTOCOL_EPICGAMES = 340,
   NDPI_PROTOCOL_GEFORCENOW = 341,
   NDPI_PROTOCOL_NVIDIA = 342,