505 commits

Author SHA1 Message Date
Alfredo Cardigliano
23a4761276 Update copyright 2022-01-03 11:00:45 +01:00
Ivan Nardi
91bb77a880
A final(?) effort to reduce memory usage per flow (#1389)
Remove some unused fields and re-organize other ones.
In particular:
* Update the parameters of `ndpi_ssl_version2str()` function
* Zattoo, Thunder: these timestamps aren't really used.
* Ftp/mail: these protocols are dissected only over TCP.
* Attention must be paid to TLS.Bittorrent flows to avoid invalid
read/write to `flow->protos.bittorrent.hash` field.

This is the last(?) commit of a long series (see 22241a1d, 227e586e,
730c2360, a8ffcd8b) aiming to reduce library memory consumption.

Before, at nDPI 4.0 (more precisely, at a6b10cf7, because memory stats
were wrong until that commit):
```
nDPI Memory statistics:
	nDPI Memory (once):      221.15 KB
	Flow Memory (per flow):  2.94 KB
```
Now:
```
nDPI Memory statistics:
	nDPI Memory (once):      231.71 KB
	Flow Memory (per flow):  1008 B       <---------
```
i.e. memory usage per flow has been reduced by 66%, dropping below the
psychological threshold of 1 KB.
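
The 66% figure can be checked with a short calculation. This is only a sketch: it assumes that "KB" in the statistics above means 1024 bytes, and the helper name `percent_reduction` is made up for illustration.

```c
/* Percentage reduction when going from `before` to `after`
 * (both expressed in the same unit). */
double percent_reduction(double before, double after) {
  return 100.0 * (before - after) / before;
}

/* Example: percent_reduction(2.94 * 1024.0, 1008.0) is roughly 66.5,
 * assuming 1 KB = 1024 B. */
```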

To further reduce this value, we probably need to look into #1279:
let's fight this battle another day.
2021-12-22 19:54:06 +01:00
Ivan Nardi
947896ad7d
Fix configure script (after fb85dac9) (#1381)
Fix/disable some LGTM warnings
2021-12-04 20:22:05 +01:00
Luca Deri
fe2822c6a8 Added example for finding similarities in RRDs using nDPI statistical APIs 2021-12-04 10:09:01 +01:00
Ivan Nardi
b1e9245d94
ndpiReader: slight simplification of the output (#1378) 2021-11-27 17:32:23 +01:00
Ivan Nardi
a8ffcd8bb0
Rework how hostname/SNI info is saved (#1330)
Looking at `struct ndpi_flow_struct` the two bigger fields are
`host_server_name[240]` (mainly for HTTP hostnames and DNS domains) and
`protos.tls_quic.client_requested_server_name[256]`
(for TLS/QUIC SNIs).

This commit aims to reduce `struct ndpi_flow_struct` size, according to
two simple observations:
 1) at most one of these two fields is used for each flow. So it seems safe
to merge them;
 2) even if hostnames/SNIs might be very long, in practice they are rarely
longer than a few tens of bytes. So, using a (single) large buffer is a
waste of memory for all kinds of flows. If we need to truncate the name,
we keep the *last* characters, easing domain matching.

Analyzing some real traffic, it seems safe to assume that the vast
majority of hostnames/SNIs is shorter than 80 bytes.

Hostnames/SNIs are always converted to lowercase.
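
A minimal sketch of the truncation policy described above (keep the *last* characters, store everything in lowercase). The function name `save_hostname` and its signature are hypothetical and are not taken from nDPI; the 80-byte figure is only the size suggested by the commit message.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Copy `name` into `dst` (capacity `dst_len`, including the NUL),
 * converting it to lowercase. If the name does not fit, drop the
 * leading characters so that the tail (the registrable domain)
 * survives. */
void save_hostname(char *dst, size_t dst_len, const char *name) {
  size_t len, skip, i;

  if (dst == NULL || dst_len == 0)
    return;
  if (name == NULL) {
    dst[0] = '\0';
    return;
  }

  len = strlen(name);
  /* Number of leading characters to discard when truncating. */
  skip = (len > dst_len - 1) ? len - (dst_len - 1) : 0;

  for (i = 0; i + skip < len; i++)
    dst[i] = (char)tolower((unsigned char)name[skip + i]);
  dst[i] = '\0';
}
```

With a 12-byte buffer, `"WWW.Example.COM"` is stored as `"example.com"`: the suffix needed for domain matching is preserved, whereas keeping the first characters would have stored `"www.example"`.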

Attention was given so as to be sure that unit-tests outputs are not
affected by this change.

Because of a bug, TLS/QUIC SNIs were always truncated to 64 bytes (the
*first* 64 ones): as a consequence, there were some "Suspicious DGA
domain name" and "TLS Certificate Mismatch" false positives.
2021-11-24 10:46:48 +01:00
Ivan Nardi
afc2b641eb
Fix writes to flow->protos union fields (#1354)
We can write to `flow->protos` only after a proper classification.

This issue has been found in Kerberos, DHCP, HTTP, STUN, IMO, FTP,
SMTP, IMAP and POP code.
There are two kinds of fixes:
 * write to `flow->protos` only if a final protocol has been detected
 * move protocol state out of `flow->protos`
The hard part is to find, for each protocol, the right tradeoff between
memory usage and code complexity.
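
The first kind of fix can be illustrated with a simplified sketch. The structure below is hypothetical and far smaller than the real `struct ndpi_flow_struct`; the names `example_flow`, `EX_PROTO_DHCP` and `save_dhcp_class_ident` are invented for this example.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

enum { EX_PROTO_UNKNOWN = 0, EX_PROTO_DHCP = 1 };

/* Hypothetical flow: the union members share the same bytes, so a
 * write through one member clobbers whatever another dissector
 * stored through a different member. */
struct example_flow {
  int detected_protocol; /* EX_PROTO_UNKNOWN until classified */
  union {
    struct { char class_ident[48]; } dhcp;
    struct { uint8_t hash[20]; } bittorrent;
  } protos;
};

/* Write to the union only once the flow has been classified as DHCP.
 * Returns 1 if the value was stored, 0 otherwise. */
int save_dhcp_class_ident(struct example_flow *flow, const char *ident) {
  if (flow->detected_protocol != EX_PROTO_DHCP)
    return 0;
  snprintf(flow->protos.dhcp.class_ident,
           sizeof(flow->protos.dhcp.class_ident), "%s", ident);
  return 1;
}
```

The second kind of fix would instead move `class_ident` out of the union, trading a larger structure for a simpler dissector.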

Handle Kerberos like DNS: if we find a request, we set the protocol
and an extra callback to further parse the reply.

For all the other protocols, move the state out of `flow->protos`. This
is an issue only for the FTP/MAIL stuff.

Add DHCP Class Identification value to the output of ndpiReader and to
the JSON serialization.

Extend code coverage of fuzz tests.

Close #1343
Close #1342
2021-11-15 16:20:57 +01:00
Ivan Nardi
acb1de69aa
Reduce memory used by ndpiReader (#1371)
`ndpiReader` is only an example, aiming to show nDPI capabilities
and integration, without any claim about performance.

Nonetheless its memory usage per flow is *huge*, limiting the kinds
of traces that we can test on a "normal" hardware (example: scan
attacks).

The key reason for that behaviour is that we preallocate all the memory
needed for *all* the available features.

Try to reduce memory usage by simply allocating some structures only
when they are really needed. Most significant example: JOY algorithms.

This way we should use a lot less memory in the two most common
use cases:
 * `ndpiReader` invoked without any particular flag (i.e. `ndpiReader -i
$FILENAME_OR_IFACE`)
 * internal unit tests

Before (on x86_64):
```
struct ndpi_flow_info {
[...]
	/* size: 7320, cachelines: 115, members: 72 */
```
After:
```
struct ndpi_flow_info {
[...]
	/* size: 2128, cachelines: 34, members: 75 */
```
2021-11-11 12:37:25 +01:00
Ivan Nardi
3e5491fa10
Add detection of OCSP (#1370)
This protocol is detected via HTTP Content-Type header.

Until 89d548f9, nDPI had a dedicated automa (`content_automa`) to
classify an HTTP flow according to this header. Since then, this automa has
been useless because it is always empty.
Re-enabling it just to match a single string seems overkill.

Remove all `content_automa` leftovers.
2021-11-11 12:36:55 +01:00
Luca Deri
937357e4bc Implemented ndpi_ses_fitting() and ndpi_des_fitting()
for computing the best alpha/beta values for exponential smoothing
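
As an illustration of what "fitting" alpha for single exponential smoothing can mean, here is a generic grid-search sketch. It is not the nDPI implementation: the signatures and the algorithm of `ndpi_ses_fitting()` are not shown in this log, and the names `ses_sse` and `ses_fit_alpha` are hypothetical.

```c
#include <stddef.h>

/* Sum of squared one-step-ahead forecast errors of single
 * exponential smoothing over `v`, for a given alpha. The first
 * sample is used as the initial forecast. */
double ses_sse(const double *v, size_t n, double alpha) {
  double forecast, sse = 0.0;
  size_t i;

  if (n == 0)
    return 0.0;

  forecast = v[0];
  for (i = 1; i < n; i++) {
    double err = v[i] - forecast;

    sse += err * err;
    forecast = alpha * v[i] + (1.0 - alpha) * forecast;
  }
  return sse;
}

/* Try alpha = 0.01, 0.02, ..., 0.99 and return the value with the
 * lowest SSE (the smallest alpha wins on ties). */
double ses_fit_alpha(const double *v, size_t n) {
  double best_alpha = 0.01;
  double best_sse = ses_sse(v, n, best_alpha);
  int step;

  for (step = 2; step <= 99; step++) {
    double alpha = step / 100.0;
    double sse = ses_sse(v, n, alpha);

    if (sse < best_sse) {
      best_sse = sse;
      best_alpha = alpha;
    }
  }
  return best_alpha;
}
```

Double exponential smoothing would add a second parameter (beta) for the trend, turning the search into a two-dimensional grid.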
2021-10-12 13:08:58 +02:00
Ivan Nardi
2dfc478ad4
Fix compilation with clang-13 or if some debug macros are enabled (#1326) 2021-10-06 18:32:15 +02:00
Luca Deri
408d78e628 Improved DGA detection for skipping potential DGAs of known/popular domain names 2021-10-05 16:51:24 +02:00
Luca Deri
bb7aff6526 Added -a <num> to ndpiReader for generating OPNsense configuration
See https://github.com/ntop/opnsense
2021-10-04 22:34:49 +02:00
Luca Deri
fd0e65cb57 Removed trace 2021-10-03 23:41:31 +02:00
Alfredo Cardigliano
721031210b Fix warning 2021-09-28 09:44:31 +02:00
Luca Deri
af35d6b963 Added unit test for bitmap iteration 2021-09-27 12:47:05 +02:00
Luca Deri
72df138a7d Warning fix 2021-09-27 09:08:04 +02:00
Luca Deri
1efabef4cf Added API for handling compressed bitmaps
ndpi_bitmap* ndpi_bitmap_alloc();
void ndpi_bitmap_free(ndpi_bitmap* b);
u_int64_t ndpi_bitmap_cardinality(ndpi_bitmap* b);
void ndpi_bitmap_set(ndpi_bitmap* b, u_int32_t value);
void ndpi_bitmap_unset(ndpi_bitmap* b, u_int32_t value);
bool ndpi_bitmap_isset(ndpi_bitmap* b, u_int32_t value);
void ndpi_bitmap_clear(ndpi_bitmap* b);
size_t ndpi_bitmap_serialize(ndpi_bitmap* b, char **buf);
ndpi_bitmap* ndpi_bitmap_deserialize(char *buf);

based on https://github.com/RoaringBitmap/CRoaring
2021-09-26 22:55:15 +02:00
Alfredo Cardigliano
57fce4e350 Fix unused var 2021-09-03 17:46:04 +02:00
Luca Deri
a6b10cf73f Fixed memory stats 2021-08-26 12:12:22 +02:00
Luca Deri
88040f0449 Compilation fix 2021-08-20 22:47:46 +02:00
Ivan Nardi
8fdffbf3a1
Compile everything with "-W -Wall -Wno-unused-parameter" flags (#1276)
Fix all the warnings.

Getting rid of "-Wno-unused-parameter" is quite complex because the usage
of some parameters depends on compile-time options (e.g.
`--enable-debug-messages`).

The "-Werror" flag has been added only in Travis builds to avoid
breaking the build for users with an uncommon/untested
OS/compiler/environment.

Tested on:
* x86_64; Ubuntu 20.04; gcc 7,8,9,10,11; clang 7,8,9,10,11,12
* x86_64; CentOS 7.7; gcc 4.8.5 (with "--disable-gcrypt" flag)
* Raspberry 4; Debian 10.10; gcc 8.3.0
2021-08-20 18:11:13 +02:00
Toni
8d0c7b1fae
Fixed Mingw64 build, SonarCloud-CI and more. (#1273)
* Added ARM build and unit test run for SonarCloud-CI.

Signed-off-by: Toni Uhlig <matzeton@googlemail.com>

* Fixed Mingw64 build.

 * adapted to SonarCloud-CI workflow
 * removed broken and incomplete Windows example (tested on VS2017/VS2019)
 * removed unnecessary include (e.g. pthread.h for the library which does not make use of it)

Signed-off-by: Toni Uhlig <matzeton@googlemail.com>
2021-08-18 11:34:16 +02:00
Luca Deri
a13f1fe52f Report whether a protocol is encrypted 2021-08-07 17:35:34 +02:00
Toni
6ad0d6666c
Implemented function to retrieve flow information. #1253 (#1254)
* fixed [h]euristic typo

Signed-off-by: Toni Uhlig <matzeton@googlemail.com>
2021-07-23 10:37:20 +02:00
Ivan Nardi
57b8969a3d
Fix setting of flow risks on 32 bit machines (#1251)
Since 19a29e1e (NDPI_TLS_CERT_VALIDITY_TOO_LONG is 32), unit tests have been
failing on 32 bit machines (e.g. Raspberry 4)
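
The commit message does not show the fix itself. One common way this class of bug arises is shifting a constant whose type is only 32 bits wide by 32 or more positions, which is undefined behaviour in C. The sketch below shows a width-independent way to set such a bit; the helper names `set_risk` and `is_risk_set` are hypothetical.

```c
#include <stdint.h>

/* Set bit `risk_id` (0..63) in a 64-bit risk mask. The constant is
 * widened to 64 bits *before* the shift, so ids >= 32 also work where
 * `long` is 32 bits wide. */
uint64_t set_risk(uint64_t mask, unsigned int risk_id) {
  return mask | ((uint64_t)1 << risk_id);
}

/* Return 1 if bit `risk_id` is set in `mask`, 0 otherwise. */
int is_risk_set(uint64_t mask, unsigned int risk_id) {
  return (int)((mask >> risk_id) & 1u);
}
```

By contrast, an expression such as `1UL << 32` only behaves as intended on platforms where `unsigned long` is 64 bits wide.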
2021-07-19 16:22:39 +02:00
Ivan Nardi
cccf794265
ndpiReader: add statistics about nDPI performance (#1240)
The goal is to have a (rough) idea of how many packets nDPI needs
to properly classify a flow.

Log this information (and guessed flows number too) during unit tests,
to keep track of improvements/regressions across commits.
2021-07-13 12:28:39 +02:00
Vitaly Lavrov
c418b7110b
ahocorasick. Code review. Part 2. (#1236)
Simplified the process of adding lines to AC_AUTOMATA_t.
Use the ndpi_string_to_automa() function to add patterns with domain names.
For other cases, ndpi_add_string_value_to_automa() can be used.

ac_automata_feature(ac_automa, AC_FEATURE_LC) allows adding
and comparing data in a case-insensitive manner. To force a pattern to match
at the end of the line, the "ac_pattern.rep.at_end=1" flag is used.
This eliminated unnecessary conversions to lowercase and adding "$" for
end-of-line matching in domain name patterns.

ac_match_handler() has been renamed ac_domain_match_handler() and has been greatly simplified.
ac_domain_match_handler() looks for the template with the highest domain level.
For special cases it is possible to manually specify the domain level.
Added test for checking ambiguous domain names like:
 - short.weixin.qq.com is QQ, not Wechat
 - instagram.faae1-1.fna.fbcdn.net is Instagram, not Facebook

If you specify a NULL handler when creating the AC_AUTOMATA_t structure,
then a pattern with the maximum length that satisfies the search conditions will be found
(exact match, from the beginning of the string, from the end of the string, or a substring).

Added debugging for ac_automata_search.
To do this, you need to enable debugging globally using ac_automata_enable_debug(1) and
enable debugging in the AC_AUTOMATA_t structure using ac_automata_name("name", AC_FEATURE_DEBUG).
The search will display "name" and a list of matching patterns.
Running "AHO_DEBUG=1 ndpiReader ..." will show the lines that were searched for templates
and which templates were found.

The ac_automata_dump() prototype has been changed. Now it outputs data to a file.
If it is specified as NULL, then the output will be directed to stdout.
If you need to get data as a string, then use open_memstream().
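
A minimal sketch of the `open_memstream()` approach mentioned above. `open_memstream()` is a POSIX.1-2008 function; `example_dump()` is a hypothetical stand-in for any function that writes to a `FILE *`, since the new `ac_automata_dump()` prototype is not shown in this log.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical dump function: writes its output to `out`. */
void example_dump(FILE *out) {
  fprintf(out, "pattern: %s\n", "example.com");
}

/* Run `dump` against an in-memory stream and return its output as a
 * malloc'ed, NUL-terminated string (the caller must free it).
 * Returns NULL on failure. */
char *dump_to_string(void (*dump)(FILE *)) {
  char *buf = NULL;
  size_t len = 0;
  FILE *mem = open_memstream(&buf, &len);

  if (mem == NULL)
    return NULL;

  dump(mem);

  /* The buffer and its length are only guaranteed to be up to date
   * after fflush() or fclose(). */
  if (fclose(mem) != 0) {
    free(buf);
    return NULL;
  }
  return buf;
}
```

A caller would use `char *s = dump_to_string(example_dump);` and release the result with `free(s)`.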

Added the ability to run individual tests via the do.sh script
2021-07-12 17:39:43 +02:00
Luca Deri
6a1fd9ad97 Added missing check to prevent crashes 2021-06-23 12:17:21 +02:00
Vitaly Lavrov
2234b97149
ndpiReader: memory leak (#1215)
Non-critical bugs.
If a file list is used, then all files except the last are not closed.
Opening the next file loses the memory allocated via pcap_open_offline() for the previous file.
If a bpf filter is used, then no memory is freed after pcap_compile.
2021-06-23 12:04:03 +02:00
Alfredo Cardigliano
4aefbe0c7a Call ac_automata_release with free_pattern = 1 (malloc'ed patterns expected in ndpi_add_string_to_automa) 2021-06-14 14:41:14 +02:00
Luca Deri
380286c069 Fixes https://github.com/ntop/ntopng/issues/5482 2021-06-11 22:21:03 +02:00
Ivan Nardi
9d427faafe
ndpiReader: fix collecting of risks statistics (#1192) 2021-06-01 16:50:46 +02:00
Luca
c620858671 Reworked ndpi flow risk score adding client and server score 2021-06-01 09:17:26 +02:00
Luca Deri
732bcecd17 Added flow risk score 2021-05-18 21:05:47 +02:00
Luca Deri
86f3c29d03 Typo 2021-05-18 19:52:33 +02:00
Luca Deri
ca15e3295e Added risk/score dump (ndpiReader -h)
Added ndpi_dump_risks_score() API score
2021-05-18 19:34:17 +02:00
Luca Deri
43a8576efb Reworked human readable string search in flows
Removed fragment manager code
2021-05-17 20:55:06 +02:00
Luca Deri
ac1eaca8a6 Added browser TLS heuristic 2021-05-13 20:00:27 +02:00
Luca Deri
a62be9b8ec Implemented heuristic to detect Safari and Firefox TLS browsing 2021-05-13 12:37:07 +02:00
Toni
87076dcd5b
Fixed obsolete error printing if CTRL-C is pressed. #1165 (#1184)
* This fix was proposed by @robertsong2019

Signed-off-by: Toni Uhlig <matzeton@googlemail.com>
2021-05-11 21:38:56 +02:00
Luca Deri
4297a65ce8 Implemented flow score in Wireshark integration 2021-05-10 22:43:05 +02:00
Luca
ae2470fad4 Initial work towards detection via TLS of browser types 2021-05-06 21:42:06 +02:00
Luca Deri
dd65142020 Compilation fix 2021-04-27 08:26:08 +02:00
Luca Deri
70686249c9 Updated code due to https://github.com/ntop/nDPI/pull/1175 2021-04-27 08:12:14 +02:00
Luca Deri
4a09707e48 Added flow risk to wireshark dissection 2021-04-26 10:17:29 +02:00
Ivan Nardi
fb74785282
Fix some warnings about unused variables/functions (#1160) 2021-04-05 19:21:30 +02:00
Luca Deri
a1dba74346 Trace fix 2021-04-02 12:55:15 +02:00
Luca Deri
4f8ca9485a Fixed incompatibilities with the latest extcap/wireshark 2021-04-01 23:53:53 +02:00
Luca Deri
fcbc16da00 Fixed invalid guess stats 2021-03-30 17:49:48 +02:00