Commit graph

91 commits

Author SHA1 Message Date
Ivan Nardi
c85f2fb0f4
TLS: add basic, basic, detection of Encrypted ClientHello (#2053) 2023-07-21 03:41:43 +02:00
Chiara Maggi
0b0f255cc2
added feature to extract filename from http attachment (#2037)
* added feature to extract filename from http attachment

* fixed some issues

* added check for filename format

* added check for filename format

* remove an unnecessary print

* changed the size from 952 to 960

* modified some test result files

* small changes string size

* comment removed and mallocs checked
2023-07-11 22:45:19 +02:00
Ivan Nardi
88425e0199
Simplify the report of streaming multimedia info (#2026)
The two fields `flow->flow_type` and `flow->protos.rtp.stream_type` are
pretty much identical: rename the former in `flow->flow_multimedia_type`
and remove the latter.
2023-06-26 12:05:16 +02:00
Luca Deri
d0609ea601 Implemented Zoom/Teams stream type detection 2023-06-14 23:44:57 +02:00
Ivan Nardi
46ff069117
ndpiReader: improve printing of payload statistics (#1989)
Add a basic unit test

Fix an endianess issue
2023-05-29 16:53:11 +02:00
Ivan Nardi
0223d3c4f5
HTTP: improve extraction of metadata and of flow risks (#1959) 2023-05-05 13:35:20 +02:00
Ivan Nardi
22fb8349b9
ndpiReader: print how many packets (per flow) were needed to perform full DPI (#1891)
Average values are already printed, but this change should ease to
identify regressions/improvements.
2023-03-01 21:50:47 +01:00
Ivan Nardi
b51a2ac72a
fuzz: some improvements and add two new fuzzers (#1881)
Remove `FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION` define from
`fuzz/Makefile.am`; it is already included by the main configure script
(when fuzzing).

Add a knob to force disabling of AESNI optimizations: this way we can
fuzz also no-aesni crypto code.

Move CRC32 algorithm into the library.

Add some fake traces to extend fuzzing coverage. Note that these traces
are hand-made (via scapy/curl) and must not be used as "proof" that the
dissectors are really able to identify this kind of traffic.

Some small updates to some dissectors:

CSGO: remove a wrong rule (never triggered, BTW). Any UDP packet starting
with "VS01" will be classified as STEAM (see steam.c around line 111).
Googling it, it seems right so.

XBOX: XBOX only analyses UDP flows while HTTP only TCP ones; therefore
that condition is false.

RTP, STUN: removed useless "break"s

Zattoo: `flow->zattoo_stage` is never set to any values greater or equal
to 5, so these checks are never true.

PPStream: `flow->l4.udp.ppstream_stage` is never read. Delete it.

TeamSpeak: we check for `flow->packet_counter == 3` just above, so the
following check `flow->packet_counter >= 3` is always false.
2023-02-09 20:02:12 +01:00
Ivan Nardi
29c5cc39fb
Some small changes (#1869)
All dissector callbacks should not be exported by the library; make static
some other local functions.
The callback logic in `ndpiReader` has never been used.
With internal libgcrypt, `gcry_control()` should always return no
errors.
We can check `categories` length at compilation time.
2023-01-25 11:44:09 +01:00
Luca Deri
fc7b070030 Added RTP stream type in flow metadata 2022-12-09 14:26:53 +01:00
Luca Deri
e0afc16aa2 Exported HTTP server in metadata 2022-12-05 21:27:30 +01:00
Luca Deri
4231f48059 Added support for Linux Cooked Capture v2 2022-11-16 17:48:28 +01:00
Nardi Ivan
52b562c328 Fix json export of ipv6 addresses
The "string" buffer was to short; better start using `INET6_ADDRSTRLEN`
as reported in the man page of `inet_ntop`.

Close: #1794
2022-11-07 20:36:55 +01:00
Ivan Nardi
ca5ffc4988
TLS: improve handling of ALPN(s) (#1784)
Tell "Advertised" ALPN list from "Negotiated" ALPN; the former is
extracted from the CH, the latter from the SH.

Add some entries to the known ALPN list.

Fix printing of "TLS Supported Versions" field.
2022-10-25 17:06:29 +02:00
Luca
de59eb8237 Added the ability to track the payload via -E and via the new option 'ndpi_track_flow_payload' 2022-10-04 11:26:44 +02:00
Toni
644ad34962
Improved NATPMP dissection. (#1745)
Signed-off-by: Toni Uhlig <matzeton@googlemail.com>

Signed-off-by: Toni Uhlig <matzeton@googlemail.com>
2022-09-21 18:24:04 +02:00
Toni
2e25c36396
Add TiVoConnect dissector. Fixes #1697. (#1699)
* added static assert if supported, to complain if the flow struct changes

Signed-off-by: lns <matzeton@googlemail.com>
2022-08-08 19:04:20 +02:00
Toni
b3e722e5a8
Improved nDPI JSON serialization. (#1689)
* fixed autoconf CFLAGS/LDFLAGS MSAN issue which could lead to build errors
 * introduced portable version of gmtime_r aka ndpi_gmtime_r
 * do as most as possible of the serialization work in ndpi_utils.c
 * use flow2json in ndpiReader

Signed-off-by: lns <matzeton@googlemail.com>
2022-08-02 17:54:44 +02:00
Toni
ed4f106a0d
Add Softether dissector. (#1679)
Signed-off-by: lns <matzeton@googlemail.com>
2022-07-29 19:29:54 +02:00
Ivan Nardi
405a52ed65
Patricia tree, Ahocarasick automa, LRU cache: add statistics (#1683)
Add (basic) internal stats to the main data structures used by the
library; they might be usefull to check how effective these structures
are.

Add an option to `ndpiReader` to dump them; enabled by default in the
unit tests.
This new option enables/disables dumping of "num dissectors calls"
values, too (see b4cb14ec).
2022-07-29 15:25:00 +02:00
Ivan Nardi
df2e11ef51
Revert "Patricia tree, Ahocarasick automa, LRU cache: add statistics (#1677)" (#1682)
This reverts commit bb83899985.
2022-07-29 12:08:40 +02:00
Ivan Nardi
bb83899985
Patricia tree, Ahocarasick automa, LRU cache: add statistics (#1677)
Add (basic) internal stats to the main data structures used by the
library; they might be usefull to check how effective these structures
are.

Add an option to `ndpiReader` to dump them; disabled by default to avoid
too much fuss with the unit tests.
2022-07-29 12:07:41 +02:00
Ivan Nardi
b4cb14ec19
Keep track of how many dissectors calls we made for each flow (#1657) 2022-07-11 09:47:47 +02:00
Luca Deri
f25deeccb1 Added RiskInfo string 2022-05-30 00:32:32 +02:00
Toni
87f93ea4fd
Replaced ndpiReader's libjson-c support with libnDPI's internal serialization interface. (#1535)
* Fixes #1528
 * Serialization Interface should also fuzzed
 * libjson-c may only be used in the unit test to verify the internal serialization interface
 * Serialization Interface supports tlv(broken), csv and json
 * Unit test does work again and requires libjson-c

Signed-off-by: lns <matzeton@googlemail.com>
2022-05-07 09:26:09 +02:00
Ivan Nardi
fbb9700086
fuzz: purge old sessions (#1451)
At every fuzz iteration (i.e for every trace file):
* keep the same ndpi context (`ndpi_init_detection_module` is very
slow);
* reset the flow table, otherwise it grows indefinitely.

This change should fix the "out-of-memory" errors reported by oss-fuzz.
2022-02-21 20:32:50 +01:00
Ivan Nardi
0c70411b1b
Make some protocols more "big-endian" friendly (#1402)
See #1312
2022-01-29 09:18:32 +01:00
Ivan Nardi
3a087e951d
Add a "confidence" field about the reliability of the classification. (#1395)
As a general rule, the higher the confidence value, the higher the
"reliability/precision" of the classification.

In other words, this new field provides an hint about "how" the flow
classification has been obtained.
For example, the application may want to ignore classification "by-port"
(they are not real DPI classifications, after all) or give a second
glance at flows classified via LRU caches (because of false positives).

Setting only one value for the confidence field is a bit tricky: more
work is probably needed in the next future to tweak/fix/improve the logic.
2022-01-11 15:23:39 +01:00
Luca Deri
708d4ea33a Improved user agent analysis 2022-01-09 18:47:47 +01:00
Alfredo Cardigliano
23a4761276 Update copyright 2022-01-03 11:00:45 +01:00
Ivan Nardi
a8ffcd8bb0
Rework how hostname/SNI info is saved (#1330)
Looking at `struct ndpi_flow_struct` the two bigger fields are
`host_server_name[240]` (mainly for HTTP hostnames and DNS domains) and
`protos.tls_quic.client_requested_server_name[256]`
(for TLS/QUIC SNIs).

This commit aims to reduce `struct ndpi_flow_struct` size, according to
two simple observations:
 1) maximum one of these two fields is used for each flow. So it seems safe
to merge them;
 2) even if hostnames/SNIs might be very long, in practice they are rarely
longer than a fews tens of bytes. So, using a (single) large buffer is a
waste of memory for all kinds of flows. If we need to truncate the name,
we keep the *last* characters, easing domain matching.

Analyzing some real traffic, it seems safe to assume that the vast
majority of hostnames/SNIs is shorter than 80 bytes.

Hostnames/SNIs are always converted to lowercase.

Attention was given so as to be sure that unit-tests outputs are not
affected by this change.

Because of a bug, TLS/QUIC SNI were always truncated to 64 bytes (the
*first* 64 ones): as a consequence, there were some "Suspicious DGA
domain name" and "TLS Certificate Mismatch" false positives.
2021-11-24 10:46:48 +01:00
Ivan Nardi
afc2b641eb
Fix writes to flow->protos union fields (#1354)
We can write to `flow->protos` only after a proper classification.

This issue has been found in Kerberos, DHCP, HTTP, STUN, IMO, FTP,
SMTP, IMAP and POP code.
There are two kinds of fixes:
 * write to `flow->protos` only if a final protocol has been detected
 * move protocol state out of `flow->protos`
The hard part is to find, for each protocol, the right tradeoff between
memory usage and code complexity.

Handle Kerberos like DNS: if we find a request, we set the protocol
and an extra callback to further parsing the reply.

For all the other protocols, move the state out of `flow->protos`. This
is an issue only for the FTP/MAIL stuff.

Add DHCP Class Identification value to the output of ndpiReader and to
the Jason serialization.

Extend code coverage of fuzz tests.

Close #1343
Close #1342
2021-11-15 16:20:57 +01:00
Ivan Nardi
acb1de69aa
Reduce memory used by ndpiReader (#1371)
`ndpiReader` is only an example, aiming to show nDPI capabilities
and integration, without any claim about performances.

Nonetheless its memory usage per flow is *huge*, limiting the kinds
of traces that we can test on a "normal" hardware (example: scan
attacks).

The key reason of that behaviour is that we preallocate all the memory
needed for *all* the available features.

Try to reduce memory usage simply allocating some structures only
when they are really needed. Most significant example: JOY algorithms.

This way we should use a lot less memory in the two most common
user-cases:
 * `ndpiReader` invoked without any particular flag (i.e `ndpiReader -i
$FILENAME_OR_IFACE`)
 * internal unit tests

Before (on x86_64):
```
struct ndpi_flow_info {
[...]
	/* size: 7320, cachelines: 115, members: 72 */
```
After:
```
struct ndpi_flow_info {
[...]
	/* size: 2128, cachelines: 34, members: 75 */
```
2021-11-11 12:37:25 +01:00
Toni
6ad0d6666c
Implemented function to retrieve flow information. #1253 (#1254)
* fixed [h]euristic typo

Signed-off-by: Toni Uhlig <matzeton@googlemail.com>
2021-07-23 10:37:20 +02:00
Ivan Nardi
cccf794265
ndpiReader: add statistics about nDPI performance (#1240)
The goal is to have a (roughly) idea about how many packets nDPI needs
to properly classify a flow.

Log this information (and guessed flows number too) during unit tests,
to keep track of improvements/regressions across commits.
2021-07-13 12:28:39 +02:00
Luca
ae2470fad4 Initial work towards detection via TLS of browser types 2021-05-06 21:42:06 +02:00
Luca Deri
4a09707e48 Added flow risk to wireshark dissection 2021-04-26 10:17:29 +02:00
Ivan Nardi
a6029d250d
ndpiReader: print an error msg if we found an unsupported datalink type (#1157) 2021-03-23 11:47:29 +01:00
Luca Deri
e2f6569adb Fixed CPHA missing protocol initialization
Improved IEC104 and IRC detection
2021-02-10 15:22:20 +01:00
Ivan Nardi
a772e18977
Fix a warning (#1125)
Introduced in 5f7b9d802

reader_util.c: In function ‘process_ndpi_collected_info’:
reader_util.c:1148:60: warning: ‘%s’ directive output may be truncated writing up to 255 bytes into a region of size 64 [-Wformat-truncation=]
 1148 |       sizeof(flow->ssh_tls.client_requested_server_name), "%s",
      |                                                            ^~
reader_util.c:1147:5: note: ‘snprintf’ output between 1 and 256 bytes into a destination of size 64
 1147 |     snprintf(flow->ssh_tls.client_requested_server_name,
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 1148 |       sizeof(flow->ssh_tls.client_requested_server_name), "%s",
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 1149 |       flow->ndpi_flow->protos.tls_quic_stun.tls_quic.client_requested_server_name);
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2021-02-03 11:56:37 +01:00
Luca Deri
d964c3e081 Code cleanup: third party uthash is at the right place 2021-01-20 19:11:36 +01:00
Luca Deri
68b6ac7da8 (C) Update 2021-01-07 11:13:36 +01:00
Luca Deri
eb37f8f1fb Split HTTP request from response Content-Type. Request Content-Type should be present with POSTs and not with other methods such as GET 2021-01-06 18:28:24 +01:00
Toni
74a77e7b3d
Added --ignore-vlanid / -I to exclude VLAN ids for flow hash calculation. #1073 (#1085)
Signed-off-by: Toni Uhlig <matzeton@googlemail.com>
2020-12-11 21:01:51 +01:00
Luca Deri
948a906037 Added -D flag for detecting DoH in the wild
Removed heuristic from CiscoVPN as it leads to false positives
2020-10-26 21:40:59 +01:00
Adrian Zgorzałek
8f74d5733d OpenBSD: Introduce pkt_timeval to deal with (bpf_)_timeval
Some BSD APIs called in example/ return `struct bpf_timeval`, where nDPI
APIs expect `struct timeval`. These two structs, besides having
a different name, share the exact same set of fields.
2020-08-09 14:30:12 +01:00
Toni Uhlig
20fed83e0f
Removed csv_fp as external symbol. Instead passing csv_fp through as argument.
Signed-off-by: Toni Uhlig <matzeton@googlemail.com>
2020-07-08 23:21:35 +02:00
Luca Deri
db707e0829
Merge pull request #932 from IvanNardi/log
Log
2020-07-07 14:43:32 +02:00
Nardi Ivan
c08693fda5 Incorporated some feedback 2020-07-01 20:16:16 +02:00
Nardi Ivan
b24f5c4c0a Fix memory leak about purged/expired flows
Create an helper to avoid similar errors in the future
Fixes: 1a62f4c7
2020-06-28 12:05:12 +02:00