Commit graph

465 commits

Author SHA1 Message Date
Luca Deri
856d7d2916 Improved DGA detection skipping names containign at least 3 consecutive digits in the first word 2022-03-26 09:59:55 +01:00
Vitaly Lavrov
c390085f91
Bug fixing. (#1459)
The '--enable-debug-messages' option works again.
Fixed warning in ahocorasick.c
Fixed integer overflow in ndpiReader.c for 32bit systems.
2022-02-28 15:01:00 +01:00
Ivan Nardi
fbb9700086
fuzz: purge old sessions (#1451)
At every fuzz iteration (i.e for every trace file):
* keep the same ndpi context (`ndpi_init_detection_module` is very
slow);
* reset the flow table, otherwise it grows indefinitely.

This change should fix the "out-of-memory" errors reported by oss-fuzz.
2022-02-21 20:32:50 +01:00
Luca Deri
a2878af1ee Added newflow risk NDPI_HTTP_CRAWLER_BOT 2022-02-17 17:20:52 +01:00
Ivan Nardi
5bb5bec477
Remove struct ndpi_id_struct (#1427)
Remove the last uses of `struct ndpi_id_struct`.
That code is not really used and it has not been updated for a very long
time: see #1279 for details.

Correlation among flows is achieved via LRU caches.

This change allows to further reduce memory consumption (see also
91bb77a8).

At nDPI 4.0 (more precisly, at a6b10cf, because memory stats
were wrong until that commit):
```
nDPI Memory statistics:
	nDPI Memory (once):      221.15 KB
	Flow Memory (per flow):  2.94 KB
```
Now:
```
nDPI Memory statistics:
	nDPI Memory (once):      235.27 KB
	Flow Memory (per flow):  688 B        <--------
```
i.e. memory usage per flow has been reduced by 77%.

Close #1279
2022-01-30 19:18:12 +01:00
Luca Deri
42d74171b2 Minor cosmetic changes 2022-01-16 12:47:56 +01:00
Luca Deri
406ac7e8c8 Added the ability to specify trusted issueDN often used in companies to self-signed certificates
This allows to avoid triggering alerts for trusted albeit private certificate issuers.

Extended the example/protos.txt with the new syntax for specifying trusted issueDN.
Example:

trusted_issuer_dn:"CN=813845657003339838, O=Code42, OU=TEST, ST=MN, C=US"
2022-01-13 19:06:21 +01:00
Luca Deri
552d199d2e Removed outdated comment 2022-01-11 21:46:30 +01:00
Luca Deri
f5545a80f9 Removed legacy code 2022-01-11 21:45:27 +01:00
Ivan Nardi
3a087e951d
Add a "confidence" field about the reliability of the classification. (#1395)
As a general rule, the higher the confidence value, the higher the
"reliability/precision" of the classification.

In other words, this new field provides an hint about "how" the flow
classification has been obtained.
For example, the application may want to ignore classification "by-port"
(they are not real DPI classifications, after all) or give a second
glance at flows classified via LRU caches (because of false positives).

Setting only one value for the confidence field is a bit tricky: more
work is probably needed in the next future to tweak/fix/improve the logic.
2022-01-11 15:23:39 +01:00
Alfredo Cardigliano
23a4761276 Update copyright 2022-01-03 11:00:45 +01:00
Ivan Nardi
91bb77a880
A final(?) effort to reduce memory usage per flow (#1389)
Remove some unused fields and re-organize other ones.
In particular:
* Update the parameters of `ndpi_ssl_version2str()` function
* Zattoo, Thunder: these timestamps aren't really used.
* Ftp/mail: these protocols are dissected only over TCP.
* Attention must be paid to TLS.Bittorrent flows to avoid invalid
read/write to `flow->protos.bittorrent.hash` field.

This is the last(?) commit of a long series (see 22241a1d, 227e586e,
730c2360, a8ffcd8b) aiming to reduce library memory consumption.

Before, at nDPI 4.0 (more precisly, at a6b10cf7, because memory stats
were wrong until that commit):
```
nDPI Memory statistics:
	nDPI Memory (once):      221.15 KB
	Flow Memory (per flow):  2.94 KB
```
Now:
```
nDPI Memory statistics:
	nDPI Memory (once):      231.71 KB
	Flow Memory (per flow):  1008 B       <---------
```
i.e. memory usage per flow has been reduced by 66%, dropping below the
psychological threshold of 1 KB.

To further reduce this value, we probably need to look into #1279:
let's fight this battle another day.
2021-12-22 19:54:06 +01:00
Ivan Nardi
947896ad7d
Fix configure script (after fb85dac9) (#1381)
Fix/disable some LGTM warnings
2021-12-04 20:22:05 +01:00
Luca Deri
fe2822c6a8 Added example for finding similarities in RRDs using nDPI statistical APIs 2021-12-04 10:09:01 +01:00
Ivan Nardi
b1e9245d94
ndpiReader: slight simplificaton of the output (#1378) 2021-11-27 17:32:23 +01:00
Ivan Nardi
a8ffcd8bb0
Rework how hostname/SNI info is saved (#1330)
Looking at `struct ndpi_flow_struct` the two bigger fields are
`host_server_name[240]` (mainly for HTTP hostnames and DNS domains) and
`protos.tls_quic.client_requested_server_name[256]`
(for TLS/QUIC SNIs).

This commit aims to reduce `struct ndpi_flow_struct` size, according to
two simple observations:
 1) maximum one of these two fields is used for each flow. So it seems safe
to merge them;
 2) even if hostnames/SNIs might be very long, in practice they are rarely
longer than a fews tens of bytes. So, using a (single) large buffer is a
waste of memory for all kinds of flows. If we need to truncate the name,
we keep the *last* characters, easing domain matching.

Analyzing some real traffic, it seems safe to assume that the vast
majority of hostnames/SNIs is shorter than 80 bytes.

Hostnames/SNIs are always converted to lowercase.

Attention was given so as to be sure that unit-tests outputs are not
affected by this change.

Because of a bug, TLS/QUIC SNI were always truncated to 64 bytes (the
*first* 64 ones): as a consequence, there were some "Suspicious DGA
domain name" and "TLS Certificate Mismatch" false positives.
2021-11-24 10:46:48 +01:00
Ivan Nardi
afc2b641eb
Fix writes to flow->protos union fields (#1354)
We can write to `flow->protos` only after a proper classification.

This issue has been found in Kerberos, DHCP, HTTP, STUN, IMO, FTP,
SMTP, IMAP and POP code.
There are two kinds of fixes:
 * write to `flow->protos` only if a final protocol has been detected
 * move protocol state out of `flow->protos`
The hard part is to find, for each protocol, the right tradeoff between
memory usage and code complexity.

Handle Kerberos like DNS: if we find a request, we set the protocol
and an extra callback to further parsing the reply.

For all the other protocols, move the state out of `flow->protos`. This
is an issue only for the FTP/MAIL stuff.

Add DHCP Class Identification value to the output of ndpiReader and to
the Jason serialization.

Extend code coverage of fuzz tests.

Close #1343
Close #1342
2021-11-15 16:20:57 +01:00
Ivan Nardi
acb1de69aa
Reduce memory used by ndpiReader (#1371)
`ndpiReader` is only an example, aiming to show nDPI capabilities
and integration, without any claim about performances.

Nonetheless its memory usage per flow is *huge*, limiting the kinds
of traces that we can test on a "normal" hardware (example: scan
attacks).

The key reason of that behaviour is that we preallocate all the memory
needed for *all* the available features.

Try to reduce memory usage simply allocating some structures only
when they are really needed. Most significant example: JOY algorithms.

This way we should use a lot less memory in the two most common
user-cases:
 * `ndpiReader` invoked without any particular flag (i.e `ndpiReader -i
$FILENAME_OR_IFACE`)
 * internal unit tests

Before (on x86_64):
```
struct ndpi_flow_info {
[...]
	/* size: 7320, cachelines: 115, members: 72 */
```
After:
```
struct ndpi_flow_info {
[...]
	/* size: 2128, cachelines: 34, members: 75 */
```
2021-11-11 12:37:25 +01:00
Ivan Nardi
3e5491fa10
Add detection of OCSP (#1370)
This protocol is detected via HTTP Content-Type header.

Until 89d548f9, nDPI had a dedicated automa (`content_automa`) to
classify a HTTP flow according to this header. Since then, this automa has
been useless because it is always empty.
Re-enable it to match only a string seems overkilling.

Remove all `content_automa` leftovers.
2021-11-11 12:36:55 +01:00
Luca Deri
937357e4bc Implemented ndpi_ses_fitting() and ndpi_des_fitting()
for comuting the best alpha/beta values for exponential smoothing
2021-10-12 13:08:58 +02:00
Ivan Nardi
2dfc478ad4
Fix compilation with clang-13 or if some debug macros are enabled (#1326) 2021-10-06 18:32:15 +02:00
Luca Deri
408d78e628 Improved DGA detection for skipping potential DGAs of known/popular domain names 2021-10-05 16:51:24 +02:00
Luca Deri
bb7aff6526 Added -a <num> to ndpiReader for generating OPNsense configuration
See https://github.com/ntop/opnsense
2021-10-04 22:34:49 +02:00
Luca Deri
fd0e65cb57 Removed trace 2021-10-03 23:41:31 +02:00
Alfredo Cardigliano
721031210b Fix warning 2021-09-28 09:44:31 +02:00
Luca Deri
af35d6b963 Added unit test for bitmap iteration 2021-09-27 12:47:05 +02:00
Luca Deri
72df138a7d Warnign fix 2021-09-27 09:08:04 +02:00
Luca Deri
1efabef4cf Added API for handling compressed bitmaps
ndpi_bitmap* ndpi_bitmap_alloc();
void ndpi_bitmap_free(ndpi_bitmap* b);
u_int64_t ndpi_bitmap_cardinality(ndpi_bitmap* b);
void ndpi_bitmap_set(ndpi_bitmap* b, u_int32_t value);
void ndpi_bitmap_unset(ndpi_bitmap* b, u_int32_t value);
bool ndpi_bitmap_isset(ndpi_bitmap* b, u_int32_t value);
void ndpi_bitmap_clear(ndpi_bitmap* b);
size_t ndpi_bitmap_serialize(ndpi_bitmap* b, char **buf);
ndpi_bitmap* ndpi_bitmap_deserialize(char *buf);

based on https://github.com/RoaringBitmap/CRoaring
2021-09-26 22:55:15 +02:00
Alfredo Cardigliano
57fce4e350 Fix unused var 2021-09-03 17:46:04 +02:00
Luca Deri
a6b10cf73f Fixed memory stats 2021-08-26 12:12:22 +02:00
Luca Deri
88040f0449 Compilation fix 2021-08-20 22:47:46 +02:00
Ivan Nardi
8fdffbf3a1
Compile everything with "-W -Wall -Wno-unused-parameter" flags (#1276)
Fix all the warnings.

Getting rid of "-Wno-unused-parameter" is quite complex because some
parameters usage depends on compilation variable (i.e.
`--enable-debug-messages`).

The "-Werror" flag has been added only in Travis builds to avoid
breaking the builds to users using uncommon/untested
OS/compiler/enviroment.

Tested on:
* x86_64; Ubuntu 20.04; gcc 7,8,9,10,11; clang 7,8,9,10,11,12
* x86_64; CentOS 7.7; gcc 4.8.5 (with "--disable-gcrypt" flag)
* Raspberry 4; Debian 10.10; gcc 8.3.0
2021-08-20 18:11:13 +02:00
Toni
8d0c7b1fae
Fixed Mingw64 build, SonerCloud-CI and more. (#1273)
* Added ARM build and unit test run for SonarCloud-CI.

Signed-off-by: Toni Uhlig <matzeton@googlemail.com>

* Fixed Mingw64 build.

 * adapted to SonarCloud-CI workflow
 * removed broken and incomplete Windows example (tested on VS2017/VS2019)
 * removed unnecessary include (e.g. pthread.h for the library which does not make use of it)

Signed-off-by: Toni Uhlig <matzeton@googlemail.com>
2021-08-18 11:34:16 +02:00
Luca Deri
a13f1fe52f Report whether a protocol is encrypted 2021-08-07 17:35:34 +02:00
Toni
6ad0d6666c
Implemented function to retrieve flow information. #1253 (#1254)
* fixed [h]euristic typo

Signed-off-by: Toni Uhlig <matzeton@googlemail.com>
2021-07-23 10:37:20 +02:00
Ivan Nardi
57b8969a3d
Fix setting of flow risks on 32 bit machines (#1251)
Since 19a29e1e (NDPI_TLS_CERT_VALIDITY_TOO_LONG is 32), unit tests are
failing on 32 bit machines (i.e Raspberry 4)
2021-07-19 16:22:39 +02:00
Ivan Nardi
cccf794265
ndpiReader: add statistics about nDPI performance (#1240)
The goal is to have a (roughly) idea about how many packets nDPI needs
to properly classify a flow.

Log this information (and guessed flows number too) during unit tests,
to keep track of improvements/regressions across commits.
2021-07-13 12:28:39 +02:00
Vitaly Lavrov
c418b7110b
ahoсorasick. Code review. Part 2. (#1236)
Simplified the process of adding lines to AC_AUTOMATA_t.
Use the ndpi_string_to_automa() function to add patterns with domain names.
For other cases can use ndpi_add_string_value_to_automa().

ac_automata_feature(ac_automa, AC_FEATURE_LC) allows adding
and compare data in a case insensitive manner. For mandatory pattern comparison
from the end of the line, the "ac_pattern.rep.at_end=1" flag is used.
This eliminated unnecessary conversions to lowercase and adding "$" for
end-of-line matching in domain name patterns.

ac_match_handler() has been renamed ac_domain_match_handler() and has been greatly simplified.
ac_domain_match_handler() looks for the template with the highest domain level.
For special cases it is possible to manually specify the domain level.
Added test for checking ambiguous domain names like:
 - short.weixin.qq.com is QQ, not Wechat
 - instagram.faae1-1.fna.fbcdn.net is Instagram, not Facebook

If you specify a NULL handler when creating the AC_AUTOMATA_t structure,
then a pattern with the maximum length that satisfies the search conditions will be found
(exact match, from the beginning of the string, from the end of the string, or a substring).

Added debugging for ac_automata_search.
To do this, you need to enable debugging globally using ac_automata_enable_debug(1) and
enable debugging in the AC_AUTOMATA_t structure using ac_automata_name("name", AC_FEATURE_DEBUG).
The search will display "name" and a list of matching patterns.
Running "AHO_DEBUG=1 ndpiReader ..." will show the lines that were searched for templates
and which templates were found.

The ac_automata_dump() prototype has been changed. Now it outputs data to a file.
If it is specified as NULL, then the output will be directed to stdout.
If you need to get data as a string, then use open_memstream().

Added the ability to run individual tests via the do.sh script
2021-07-12 17:39:43 +02:00
Luca Deri
6a1fd9ad97 Added missing check to prevent crashes 2021-06-23 12:17:21 +02:00
Vitaly Lavrov
2234b97149
ndpiReader: memory leak (#1215)
Non-critical bugs.
If a file list is used, then all files except the last are not closed.
Opening the next file loses the memory allocated via pcap_open_offline() for the previous file.
If a bpf filter is used, then no memory is freed after pcap_compile.
2021-06-23 12:04:03 +02:00
Alfredo Cardigliano
4aefbe0c7a Call ac_automata_release with free_pattern = 1 (malloc'ed patterns expected in ndpi_add_string_to_automa) 2021-06-14 14:41:14 +02:00
Luca Deri
380286c069 Fixes https://github.com/ntop/ntopng/issues/5482 2021-06-11 22:21:03 +02:00
Ivan Nardi
9d427faafe
ndpiReader: fix collecting of risks statistics (#1192) 2021-06-01 16:50:46 +02:00
Luca
c620858671 Reworked ndpi flow risk score adding client and server score 2021-06-01 09:17:26 +02:00
Luca Deri
732bcecd17 Added flow risk score 2021-05-18 21:05:47 +02:00
Luca Deri
86f3c29d03 Typo 2021-05-18 19:52:33 +02:00
Luca Deri
ca15e3295e Added risk/score dump (ndpiReader -h)
Added ndpi_dump_risks_score() API score
2021-05-18 19:34:17 +02:00
Luca Deri
43a8576efb Reworked human readeable string search in flows
Removed fragment manager code
2021-05-17 20:55:06 +02:00
Luca Deri
ac1eaca8a6 Added browser TLS heuristic 2021-05-13 20:00:27 +02:00
Luca Deri
a62be9b8ec Implemented heuristic to detect Safari and Firefox TLS browsing 2021-05-13 12:37:07 +02:00