Commit graph

1122 commits

Author SHA1 Message Date
Ivan Nardi
03fd155ae3
IPv6: add support for custom categories (#2126) 2023-10-29 12:56:44 +01:00
Ivan Nardi
32b50f5aa4
IPv6: add support for IPv6 risk exceptions (#2122) 2023-10-29 12:14:20 +01:00
Ivan Nardi
c711251578
IPv6: add support for custom rules (#2120) 2023-10-29 11:26:35 +01:00
Ivan Nardi
e8e4b9e8ff
IPv6: add support for IPv6 risk tree (#2118)
Fix the script to download crawler addressess
2023-10-27 13:58:15 +02:00
Ivan Nardi
611c3b66f0
ipv6: add support for ipv6 addresses lists (#2113) 2023-10-26 20:15:44 +02:00
Nardi Ivan
4a0eda69ad QUIC: export QUIC version as metadata 2023-10-11 15:15:20 +02:00
Nardi Ivan
86115a8a65 fuzz: extend fuzzing coverage 2023-10-07 13:34:37 +02:00
Luca
77e5daf03e Cleaned up mining datastructure 2023-09-27 17:05:12 +02:00
Toni
ef3adb9830
Added printf/fprintf replacement for some internal modules. (#1974)
* logging is instead redirected to `ndpi_debug_printf`

Signed-off-by: lns <matzeton@googlemail.com>
Signed-off-by: Toni Uhlig <matzeton@googlemail.com>
2023-09-26 23:10:57 +02:00
Luca Deri
1bf7e5face Fixes matches with domain name strings that start with a dot 2023-09-11 22:50:03 +02:00
Ivan Nardi
2a0052f25e
fuzz: add fuzzers to test reader_util code (#2080) 2023-09-10 15:07:52 +02:00
Luca Deri
1d480c18e3 Reworked domain classification based on binary filters 2023-09-02 19:16:40 +02:00
Luca Deri
854c2d80f1 Improvement for reducing false positives 2023-09-01 10:26:07 +02:00
Luca Deri
36abf06c6f Swap from Aho-Corasick to an experimental/home-grown algorithm that uses a probabilistic
approach for handling Internet domain names.

For switching back to Aho-Corasick it is necessary to edit
ndpi-typedefs.h and uncomment the line
// #define USE_LEGACY_AHO_CORASICK

[1] With Aho-Corasick
$ ./example/ndpiReader -G ./lists/ -i tests/pcap/ookla.pcap | grep Memory
nDPI Memory statistics:
nDPI Memory (once):      37.34 KB
Flow Memory (per flow):  960 B
Actual Memory:           33.09 MB
Peak Memory:             33.09 MB

[2] With the new algorithm
$ ./example/ndpiReader -G ./lists/ -i tests/pcap/ookla.pcap | grep Memory
nDPI Memory statistics:
nDPI Memory (once):      37.31 KB
Flow Memory (per flow):  960 B
Actual Memory:           7.42 MB
Peak Memory:             7.42 MB

In essence from ~33 MB to ~7 MB

This new algorithm will enable larger lists to be loaded (e.g. top 1M domans
https://s3-us-west-1.amazonaws.com/umbrella-static/index.html)

In ./lists there are file names that are named as <category>_<string>.list
With -G ndpiReader can load all of them at startup
2023-08-29 17:34:04 +02:00
Luca Deri
34986de297 Search fixes 2023-08-26 19:47:50 +02:00
Luca Deri
4ca94369e1 Improved domain search tet unit 2023-08-26 10:43:19 +02:00
Luca Deri
2c565c77c9 Added ndpi_domain_classify_XXX(0 API 2023-08-26 00:24:33 +02:00
Ivan Nardi
cc4461f424
fuzz: extend coverage (#2073) 2023-08-20 15:18:19 +02:00
Luca Deri
049fbf819c Reworked ndpi_filter_xxx implementation using compressed bitmaps 2023-08-14 12:50:54 +02:00
Luca Deri
ec7adc212e Added new API calls for implementing Bloom-filter like data structures
ndpi_filter* ndpi_filter_alloc(uint32_t elements_number);
bool         ndpi_filter_add(ndpi_filter *f, uint64_t value);
bool         ndpi_filter_contains(ndpi_filter *f, uint64_t value);
void         ndpi_filter_free(ndpi_filter *f);
2023-08-11 22:49:31 +02:00
Luca Deri
196395398e Compilation fixes for older C compilers 2023-08-01 15:35:33 +02:00
Toni
e4d3d619bc
Add Service Location Protocol dissector. (#2036)
Signed-off-by: Toni Uhlig <matzeton@googlemail.com>
2023-08-01 08:50:46 +02:00
Ivan Nardi
5019022e13
DNS: extract geolocation information, if available (#2065)
The option NSID (RFC5001) is used by Google DNS to report the
airport code of the metro where the DNS query is handled.

This option is quite rare, but the added overhead in DNS code is pretty
much zero for "normal" DNS traffic
2023-07-31 07:44:43 +02:00
Ivan Nardi
bc91192aca
ProtonVPN: split the ip list (#2060)
Use two separate lists:
* one for the ingress nodes, which triggers a ProtonVPN classification
* one for the egress nodes, which triggers the
`NDPI_ANONYMOUS_SUBSCRIBER` risk

Add a command line option (to `ndpiReader`) to easily test IP/port
matching.

Add another example of custom rule.
2023-07-27 09:05:22 +02:00
Ivan Nardi
c85f2fb0f4
TLS: add basic, basic, detection of Encrypted ClientHello (#2053) 2023-07-21 03:41:43 +02:00
Ivan Nardi
890f17788b
ndpireader: fix detection of DoH traffic based on packet distributions (#2045) 2023-07-14 23:20:06 +02:00
Luca Deri
1f55dc511f Implemented Count-Min Sketch [count how many times a value has been observed]
- ndpi_cm_sketch_init()
- ndpi_cm_sketch_add()
- ndpi_cm_sketch_count()
- ndpi_cm_sketch_destroy()
2023-07-13 21:54:51 +02:00
Chiara Maggi
0b0f255cc2
added feature to extract filename from http attachment (#2037)
* added feature to extract filename from http attachment

* fixed some issues

* added check for filename format

* added check for filename format

* remove an unnecessary print

* changed the size from 952 to 960

* modified some test result files

* small changes string size

* comment removed and mallocs checked
2023-07-11 22:45:19 +02:00
Ivan Nardi
88425e0199
Simplify the report of streaming multimedia info (#2026)
The two fields `flow->flow_type` and `flow->protos.rtp.stream_type` are
pretty much identical: rename the former in `flow->flow_multimedia_type`
and remove the latter.
2023-06-26 12:05:16 +02:00
Ivan Nardi
3608ab01b6
STUN: keep monitoring/processing STUN flows (#2012)
Look for RTP packets in the STUN sessions.
TODO: tell RTP from RTCP
2023-06-21 09:16:20 +02:00
Luca Deri
9cc4cbb9d1 Reworked teams handling 2023-06-15 22:31:11 +02:00
Luca Deri
d0609ea601 Implemented Zoom/Teams stream type detection 2023-06-14 23:44:57 +02:00
Toni
4e284b5e40
Set _DEFAULT_SOURCE and _GNU_SOURCE globally. (#2010)
Signed-off-by: Toni Uhlig <matzeton@googlemail.com>
2023-06-12 19:53:57 +02:00
Luca Deri
63a2b359b9 Added vlan_id in ndpi_flow2json() prototype 2023-06-09 19:50:47 +02:00
Ivan Nardi
9987e5b482
ndpiReader: allow to configure LRU caches TTL and size (#2004) 2023-06-08 17:06:32 +02:00
Toni
04f5c5196e
Changed logging callback function sig. (#2000)
* make user data available for any build config

Signed-off-by: lns <matzeton@googlemail.com>
2023-05-30 08:27:37 +02:00
Ivan Nardi
efb261a95c
Fix some memory errors triggered by allocation failures (#1995)
Some low hanging fruits found using nallocfuzz.
See: https://github.com/catenacyber/nallocfuzz
See: https://github.com/google/oss-fuzz/pull/9902

Most of these errors are quite trivial to fix; the only exception is the
stuff in the uthash.
If the insertion fails (because of an allocation failure), we need to
avoid some memory leaks. But the only way to check if the `HASH_ADD_*`
failed, is to perform a new lookup: a bit costly, but we don't use that
code in any critical data-path.
2023-05-29 19:24:00 +02:00
Ivan Nardi
46ff069117
ndpiReader: improve printing of payload statistics (#1989)
Add a basic unit test

Fix an endianess issue
2023-05-29 16:53:11 +02:00
Ivan Nardi
86b56646b5
ndpiReader: fix export of DNS/BitTorrent attributes (#1985)
There is no BitTorrent hash in the DNS flows
2023-05-20 17:23:48 +02:00
Ivan Nardi
9004d5c2ca
ndpiReader: fix export of HTTP attributes (#1982) 2023-05-20 15:12:14 +02:00
headshog
dd7198fcaf fixed numeric truncation error 2023-05-20 15:09:51 +02:00
Luca Deri
5ca6f0ac62 Implemented ndpi_predict_linear() for predicting a timeseries value overtime 2023-05-19 11:46:03 +02:00
Ivan Nardi
0223d3c4f5
HTTP: improve extraction of metadata and of flow risks (#1959) 2023-05-05 13:35:20 +02:00
Ivan Nardi
02a2c80453
ndpiReader: fix print of flow payload (#1960) 2023-05-04 11:52:48 +02:00
Luca Deri
f7138f07a6 Missing module termination 2023-04-28 23:00:33 +02:00
Ivan Nardi
8934f7b45f
Add an heuristic to detect/ignore some anomalous TCP ACK packets (#1948)
In some networks, there are some anomalous TCP flows where the smallest
ACK packets have some kind of zero padding.
It looks like the IP and TCP headers in those frames wrongly consider the
0x00 Ethernet padding bytes as part of the TCP payload.
While this kind of packets is perfectly valid per-se, in some conditions
they might be treated by the TCP reassembler logic as (partial) overlaps,
deceiving the classification engine.
Add an heuristic to detect these packets and to ignore them, allowing
correct detection/classification.

This heuristic is configurable. Default value:
* in the library, it is disabled
* in `ndpiReader` and in the fuzzers, it is enabled (to ease testing)

Credit to @vel21ripn for the initial patch.

Close #1946
2023-04-25 19:25:07 +02:00
Toni
b6629ba2be
Improved debug output. (#1951)
* try to get rid of some `printf(..)`s as they do not belong to a shared library
 * replaced all `exit(..)`s with `abort()`s to indicate an abnormal process termination

Signed-off-by: Toni Uhlig <matzeton@googlemail.com>
2023-04-21 12:40:26 +02:00
Ivan Nardi
ba99330031
ndpiReader: fix flow stats (#1943) 2023-04-14 11:17:37 +02:00
lns
77ea80862f Improved debug logging.
Signed-off-by: lns <matzeton@googlemail.com>
Signed-off-by: Toni Uhlig <matzeton@googlemail.com>
2023-04-11 15:39:09 +02:00
Luca Deri
c47e9d201d Implemented ndpi_XXX_reset() API calls whre XXX is ses, des, hw 2023-04-08 09:52:14 +02:00