Commit graph

624 commits

Author SHA1 Message Date
Ivan Nardi
2740a4f4e3
Update all IP lists (#2515)
The `suffix_id` is simply an incremental index (see
`ndpi_load_domain_suffixes`), so its value might changes every time we
update the public suffix list.
2024-08-02 15:06:08 +02:00
Ivan Nardi
65e31b0ea3
FPC: small improvements (#2512)
Add printing of fpc_dns statistics and add a general cconfiguration option.
Rework the code to be more generic and ready to handle other logics.
2024-07-22 17:42:23 +02:00
Petr
92d0b8d91f
ndpi_strncasestr: optimization, fixes, tests (#2507) 2024-07-18 19:40:09 +02:00
Petr
2f66a6a3e1
ndpi_memmem: optimized, fixed bug, added tests (#2499) 2024-07-15 08:35:10 +02:00
Petr
e059daa0f1
Optimize performance of ndpi_strnstr() and possible bugfix (#2494) 2024-07-15 08:34:08 +02:00
Ivan Nardi
843e487270
Add infrastructure for explicit support of Fist Packet Classification (#2488)
Let's start with some basic helpers and with FPC based on flow addresses.

See: #2322
2024-07-03 18:02:07 +02:00
Luca Deri
731b75b44c Modified separator from , (comma) to | (pipe) as some fields such as the HTTP user agent as sometimes they contain commas and create parsing problems 2024-07-01 09:53:38 +02:00
Nardi Ivan
556f892a56 wireshark: lua: export some metadata
Export some metadata (for the moment, SNI and TLS fingerprints) to
Wireshark/tshark via extcap.
Note that:
* metadata are exported only once per flow
* metadata are exported (all together) when nDPI stopped processing
the flow

Still room for a lot of improvements!
In particular:
* we need to add some boundary checks (if we are going to export other
attributes)
* we should try to have a variable length trailer
2024-06-25 16:39:45 +02:00
Nardi Ivan
b5afa165f0 wireshark: extcap: restore filtering mechanism 2024-06-25 16:39:45 +02:00
Mark Jeffery
aa1d7247d1
Added default port mappings to ndpiReader help -H (#2477)
Close #2125
2024-06-19 13:47:18 +02:00
Nardi Ivan
526cf6f291 Zoom: remove "stun_zoom" LRU cache
Since 070a0908b we are able to detect P2P calls directly from the packet
content, without any correlation among flows
2024-06-17 10:19:55 +02:00
Mark Jeffery
312dc424bd
Added NDPI_PROTOCOL_NTOP assert and removed percentage comparison (#2460)
Close #2413
2024-06-10 19:45:19 +02:00
Ivan Nardi
0109014f2c
Follow-up of 2093ac5bf (#2451) 2024-05-21 12:47:25 +02:00
Luca Deri
2093ac5bf6 Minor dissector optimizations 2024-05-20 12:17:04 +02:00
Luca Deri
42dba2e4af Added dpi.compute_entropy configuration parameter 2024-05-18 09:46:15 +02:00
Ivan Nardi
a064261e85
Revert ndpi_strnstr() optimization introduced in a813121e0 (#2439)
New implementation fails tests 11b, 12 and 13.
Revert to the original (BSD) implementation (with also some basic
parameters check)
2024-05-11 23:37:31 +02:00
Vladimir Gavrilov
a813121e0a
ndpi_strnstr() optimization (#2433) 2024-05-10 22:43:59 +02:00
Toni
e9dc035c5c
Added optimized memmem/strlcpy version (#2424)
* credits goes to Vladimir Gavrilov

Signed-off-by: Toni Uhlig <matzeton@googlemail.com>
2024-05-08 11:38:53 +02:00
Ivan Nardi
95fe21015d
Remove "zoom" cache (#2420)
This cache was added in b6b4967aa, when there was no real Zoom support.
With 63f349319, a proper identification of multimedia stream has been
added, making this cache quite useless: any improvements on Zoom
classification should be properly done in Zoom dissector.

Tested for some months with a few 10Gbits links of residential traffic: the
cache pretty much never returned a valid hit.
2024-05-06 12:51:45 +02:00
0x41CEA55
1b2e2cd968
Add strlcpy implementation (#2395) 2024-04-19 17:16:40 +02:00
Luca Deri
ad117bfaab
Domain Classification Improvements (#2396)
* Added
size_t ndpi_compress_str(const char * in, size_t len, char * out, size_t bufsize);
size_t ndpi_decompress_str(const char * in, size_t len, char * out, size_t bufsize);

used to compress short strings such as domain names. This code is based on
https://github.com/Ed-von-Schleck/shoco

* Major code rewrite for ndpi_hash and ndpi_domain_classify

* Improvements to make sure custom categories are loaded and enabled

* Fixed string encoding

* Extended SalesForce/Cloudflare domains list
2024-04-18 23:21:40 +02:00
Luca Deri
95897c6436 Fixed minor glitches 2024-04-15 14:25:26 +02:00
Ivan Nardi
8edb2f133c
STUN: add support for ipv6 in some metadata (#2389) 2024-04-13 14:12:20 +02:00
Luca Deri
b83eb7c7a2 Implemented STUN peer_address, relayed_address, response_origin, other_address parsing
Added code to ignore invalid STUN realm
Extended JSON output with STUN information
2024-04-12 19:50:04 +02:00
Ivan Nardi
1b3ef7d7b2
STUN: improve extraction of Mapped-Address metadata (#2370)
Enable parsing of Mapped-Address attribute for all STUN flows: that
means that STUN classification might require more packets.

Add a configuration knob to enable/disable this feature.

Note that we can have (any) STUN metadata also for flows *not*
classified as STUN (because of DTLS).

Add support for ipv6.

Restore the correct extra dissection logic for Telegram flows.
2024-04-08 10:24:51 +02:00
Luca Deri
9185c2ccc4 Added support for STUN Mapped IP address 2024-04-03 23:03:46 +02:00
Toni
41eef9246c
Disable -Wno-unused-parameter -Wno-unused-function. (#2358)
* unused parameters and functions pollute the code and decrease readability

Signed-off-by: Toni Uhlig <matzeton@googlemail.com>
2024-04-03 14:10:21 +02:00
Luca Deri
51f5fc7140
Added support for roaring bitmap v3 (#2355)
* Integrated RoaringBitmap v3

* Renamed ndpi_bitmap64 ro ndpi_bitmap64_fuse

* Fixes to ndpi_bitmap for new roaring library

* Fixes for bitmap serialization

* Fixed format

* Warning fix

* Conversion fix

* Warning fix

* Added check for roaring v3 support

* Updated file name

* Updated path

* Uses clang-9 (instead of clang-7) for builds

* Fixed fuzz_ds_bitmap64_fuse

* Fixes nDPI printf handling

* Disabled printf

* Yet another printf fix

* Cleaup

* Fx for compiling on older platforms

* Fixes for old compilers

* Initialization changes

* Added compiler check

* Fixes for old compilers

* Inline function is not static inline

* Added missing include
2024-03-25 08:15:19 +01:00
Ivan Nardi
21da53d3a0
ahocorasick: improve matching with subdomains (#2331)
The basic idea is to have the following logic:
* pattern "DOMAIN" matches the domain itself (i.e exact match) *and* any
subdomains (i.e. "ANYTHING.DOMAIN")
* pattern "DOMAIN." matches *also* any strings for which is a prefix
[please, note that this kind of match is handy but it is quite
dangerous...]
* pattern "-DOMAIN" matches *also* any strings for which is a postfix

Examples:
* pattern "wikipedia.it":
  * "wikipiedia.it" -> OK
  * "foo.wikipedia.it -> OK
  * "foowikipedia.it -> NO MATCH
  * "wikipedia.it.com -> NO MATCH
* pattern "wikipedia.":
  * "wikipedia.it" -> OK
  * "foo.wikipedia.it -> OK
  * "foowikipedia.it -> NO MATCH
  * "wikipedia.it.com -> OK
* pattern "-wikipedia.it":
  * "wikipedia.it" -> NO MATCH
  * "foo.wikipedia.it -> NO MATCH
  * "0001-wikipedia.it -> OK
  * "foo.0001-wikipedia.it -> OK

Bottom line:
* exact match
* prefix with "." (always, implicit)
* prefix with "-" (only if esplicitly set)
* postfix with "." (only if esplicitly set)

That means that the patterns cannot start with '.' anymore.

Close #2330
2024-03-06 19:25:59 +01:00
Ivan Nardi
c2b5b48fc8
ndpiReader: restore ndpiReader -x $DOMAIN_NAME functionality (#2329) 2024-02-26 20:50:07 +01:00
Nardi Ivan
0d36d9c8c2 Remove spurious call to exit() 2024-02-12 17:10:45 +01:00
Luca Deri
4072cb8862 Added stress test 2024-02-11 22:35:33 +01:00
Toni
ede25cc2b3
Improve ndpi_set_config error printing. (#2300)
* exit `ndpiReader` if a invalid configuration setting detected

Signed-off-by: Toni Uhlig <matzeton@googlemail.com>
2024-02-02 15:15:30 +01:00
Ivan Nardi
400cd516b5
Allow multiple struct ndpi_detection_module_struct to share some state (#2271)
Add the concept of "global context".

Right now every instance of `struct ndpi_detection_module_struct` (we
will call it "local context" in this description) is completely
independent from each other. This provide optimal performances in
multithreaded environment, where we pin each local context to a thread,
and each thread to a specific CPU core: we don't have any data shared
across the cores.

Each local context has, internally, also some information correlating
**different** flows; something like:
```
if flow1 (PeerA <-> Peer B) is PROTOCOL_X; then
  flow2 (PeerC <-> PeerD) will be PROTOCOL_Y
```
To get optimal classification results, both flow1 and flow2 must be
processed by the same local context. This is not an issue at all in the far
most common scenario where there is only one local context, but it might
be impractical in some more complex scenarios.

Create the concept of "global context": multiple local contexts can use
the same global context and share some data (structures) using it.
This way the data correlating multiple flows can be read/write from
different local contexts.
This is an optional feature, disabled by default.

Obviously data structures shared in a global context must be thread safe.
This PR updates the code of the LRU implementation to be, optionally,
thread safe.

Right now, only the LRU caches can be shared; the other main structures
(trees and automas) are basically read-only: there is little sense in
sharing them. Furthermore, these structures don't have any information
correlating multiple flows.

Every LRU cache can be shared, independently from the others, via
`ndpi_set_config(ndpi_struct, NULL, "lru.$CACHE_NAME.scope", "1")`.

It's up to the user to find the right trade-off between performances
(i.e. without shared data) and classification results (i.e. with some
shared data among the local contexts), depending on the specific traffic
patterns and on the algorithms used to balance the flows across the
threads/cores/local contexts.

Add some basic examples of library initialization in
`doc/library_initialization.md`.

This code needs libpthread as external dependency. It shouldn't be a big
issue; however a configure flag has been added to disable global context
support. A new CI job has been added to test it.

TODO: we should need to find a proper way to add some tests on
multithreaded enviroment... not an easy task...

*** API changes ***

If you are not interested in this feature, simply add a NULL parameter to
any `ndpi_init_detection_module()` calls.
2024-02-01 15:33:11 +01:00
Luca Deri
65b9c68d7d Fixed loading of non-ICANN domains that caused false positives with ndpi_load_domain_suffixes
Minor hash optimization
2024-01-27 20:40:27 +01:00
Ivan Nardi
9b26e74bb7
example: rework code between ndpiReader.c and reader_util.c (#2273) 2024-01-22 18:12:06 +01:00
Ivan Nardi
82e8bf91dd
Improve handling of custom rules (#2276)
Avoid collisions between user-ids and internal-ids protocols in the
`example/protos.txt` file.
Add a new value for the classification confidence:
`NDPI_CONFIDENCE_CUSTOM_RULE`

With `./example/ndpiReader -p example/protos.txt -H` we now see also the
custom protocols and their internal/external ids:

```
nDPI supported protocols:
 Id Userd-id Protocol               Layer_4    Nw_Proto Breed        Category
  0        0 Unknown                TCP        X        Unrated      Unspecified

...

387      387 Mumble                 UDP        X        Fun          VoIP
388      388 iSCSI                  TCP                 Acceptable   Unspecified
389      389 Kibana                 TCP                 Acceptable   Unspecified
390      390 TestProto              TCP                 Acceptable   Unspecified
391      391 HomeRouter             TCP                 Acceptable   Unspecified
392      392 CustomProtocol         TCP                 Acceptable   Unspecified
393      393 AmazonPrime            TCP                 Acceptable   Unspecified
394      394 CustomProtocolA        TCP                 Acceptable   Unspecified
395      395 CustomProtocolB        TCP                 Acceptable   Unspecified
396      800 CustomProtocolC        TCP                 Acceptable   Unspecified
397     1024 CustomProtocolD        TCP                 Acceptable   Unspecified
398     2048 CustomProtocolE        TCP                 Acceptable   Unspecified
399     2049 CustomProtocolF        TCP                 Acceptable   Unspecified
400     2050 CustomProtocolG        TCP                 Acceptable   Unspecified
401    65535 CustomProtocolH        TCP                 Acceptable   Unspecified
```

We likely need to take a better look in general at the iteration between
internal and external protocols ids...

This PR fixes the issue observed in
https://github.com/ntop/nDPI/pull/2274#discussion_r1460674874 and in
https://github.com/ntop/nDPI/pull/2275.
2024-01-21 19:53:32 +01:00
Ivan Nardi
42d23cff6a
config: follow-up (#2268)
Some changes in the parameters names.
Add a fuzzer to fuzz the configuration file format.
Add the infrastructure to configuratin callbacks.
Add an helper to map LRU cache indexes to names.
2024-01-20 16:14:41 +01:00
Nardi Ivan
0712d496fe config: allow configuration of guessing algorithms 2024-01-18 10:21:24 +01:00
Nardi Ivan
6c85f10cd5 config: move debug/log configuration to the new API 2024-01-18 10:21:24 +01:00
Nardi Ivan
88720331ae config: remove enum ndpi_prefs 2024-01-18 10:21:24 +01:00
Nardi Ivan
1289951b32 config: remove ndpi_set_detection_preferences() 2024-01-18 10:21:24 +01:00
Nardi Ivan
311d8b6dae config: move cfg of aggressiviness and opportunistic TLS to the new API 2024-01-18 10:21:24 +01:00
Nardi Ivan
f55358973f config: move LRU cache configurations to the new API 2024-01-18 10:21:24 +01:00
Nardi Ivan
3107a95881 Make ndpi_finalize_initialization() returns an error code
We should check if the initialization was fine or not
2024-01-18 10:21:24 +01:00
Nardi Ivan
d72a760ac3 New API for library configuration
This is the first step into providing (more) configuration options in nDPI.

The idea is to have a simple way to configure (most of) nDPI: only one
function (`ndpi_set_config()`) to set any configuration parameters
(in the present or on in the future) and we try to keep this function
prototype as agnostic as possible.

You can configure the library:
* via API, using `ndpi_set_config()`
* via a configuration file, in a text format

This way, anytime we need to add a new configuration parameter:
* we don't need to add two public functions (a getter and a setter)
* we don't break API/ABI compatibility of the library; even changing
the parameter type (from integer to a list of integer, for example)
doesn't break the compatibility.

The complete list of configuration options is provided in
`doc/configuration_parameters.md`.

As a first example, two configuration knobs are provided:
* the ability to enable/disable the extraction of the sha1 fingerprint of
the TLS certificates.
* the upper limit on the number of packets per flow that will be subject
to inspection
2024-01-18 10:21:24 +01:00
Luca
ca7df1db82 Improved ndpi_get_host_domain 2024-01-16 07:25:03 +01:00
Luca
1637a991a4 Added ndpi_get_host_domain() for returning the host domain
vs ndpi_get_host_domain_prefix() that instead returnd the host TLD
2024-01-16 06:56:51 +01:00
Ivan Nardi
111015b872
ndpiReader: improve the check on max number of pkts processed per flow (#2261)
Allow to disable this check.

I don't know how much sense these limits have in the application
(especially with those default values...) since we have always had a
hard limit on the library itself (`max_packets_to_process` set to 32).
The only value might be that they provide different limits for TCP and
UDP traffic.

Keep them for the time being...
2024-01-15 20:12:57 +01:00
Nardi Ivan
b22fa558ff ndpiReader: fix memory leak
Change the working directory of `ndpiReader` in the Github Actions so
that it can load the domain suffix list during `domainsUnitTest()`
2024-01-15 19:49:27 +01:00