Commit graph

62 commits

Author SHA1 Message Date
Ivan Nardi
a7c2734b38
Remove classification "by-ip" from protocol stack (#1743)
Basically:
* "classification by-ip" (i.e. `flow->guessed_protocol_id_by_ip` is
NEVER returned in the protocol stack (i.e.
`flow->detected_protocol_stack[]`);
* if the application is interested into such information, it can access
`ndpi_protocol->protocol_by_ip` itself.

There are mainly 4 points in the code that set the "classification
by-ip" in the protocol stack:  the generic `ndpi_set_detected_protocol()`/
`ndpi_detection_giveup()` functions and the HTTP/STUN  dissectors.

In the unit tests output, a print about `ndpi_protocol->protocol_by_ip`
has been added for each flow: the huge diff of this commit is mainly due
to that.

Strictly speaking, this change is NOT an API/ABI breakage, but there are
important differences in the classification results. For examples:
* TLS flows without the initial handshake (or without a matching
SNI/certificate) are simply classified as `TLS`;
* similar for HTTP or QUIC flows;
* DNS flows without a matching request domain are simply classified as
`DNS`; we don't have `DNS/Google` anymore just because the server is
8.8.8.8 (that was an outrageous behaviour...);
* flows previusoly classified only "by-ip" are now classified as
`NDPI_PROTOCOL_UNKNOWN`.

See #1425 for other examples of why adding the "classification by-ip" in
the protocol stack is a bad idea.

Please, note that IPV6 is not supported :(  (long standing issue in nDPI) i.e.
`ndpi_protocol->protocol_by_ip` wil be always `NDPI_PROTOCOL_UNKNOWN` for
IPv6 flows.

Define `NDPI_CONFIDENCE_MATCH_BY_IP` has been removed.

Close #1687
2022-09-20 22:24:47 +02:00
Ivan Nardi
4f584f78a0
Fix ndpi_do_guess() (#1731)
Avoid a double call of `ndpi_guess_host_protocol_id()`.
Some code paths work for ipv4/6 both
Remove some never used code.
2022-09-12 19:28:41 +02:00
Ivan Nardi
0a47f745cc
Avoid useless host automa lookup (#1724)
The host automa is used for two tasks:
* protocol sub-classification (obviously);
* DGA evaluation: the idea is that if a domain is present in this
automa, it can't be a DGA, regardless of its format/name.

In most dissectors both checks are executed, i.e. the code is something
like:

```
ndpi_match_host_subprotocol(..., flow->host_server_name, ...);
ndpi_check_dga_name(..., flow->host_server_name,...);

```

In that common case, we can perform only one automa lookup: if we check the
sub-classification before the DGA, we can avoid the second lookup in
the DGA function itself.
2022-09-05 13:59:51 +02:00
Ivan Nardi
405a52ed65
Patricia tree, Ahocarasick automa, LRU cache: add statistics (#1683)
Add (basic) internal stats to the main data structures used by the
library; they might be usefull to check how effective these structures
are.

Add an option to `ndpiReader` to dump them; enabled by default in the
unit tests.
This new option enables/disables dumping of "num dissectors calls"
values, too (see b4cb14ec).
2022-07-29 15:25:00 +02:00
Ivan Nardi
b190dab6bc
Improve handling of HTTP-Proxy and HTTP-Connect (#1673)
Treat HTTP-Proxy and HTTP-Connect flows like the HTTP ones:
print/serialize all the attributes and allow parsing of replies.

The line about "1kxun" has been removed to avoid regressions in 1KXUN
classification in `tests/pcap/1kxun.pcap`. I haven't fully understod
what was happening but the comment at the beginning of `static
ndpi_category_match category_match[]` says that we can't have overlaps
between `host_match` and `category_match` lists and that is no longer true
since 938e89ca.
Bottom line: removing this line seems the right thing to do, anyway.
2022-07-25 12:57:33 +02:00
Ivan Nardi
b4cb14ec19
Keep track of how many dissectors calls we made for each flow (#1657) 2022-07-11 09:47:47 +02:00
Ivan Nardi
ff4e010501
Avoid spurious calls to extra dissection (#1648)
If the extra callabck is not set, calling the extra dissection is only a
waste of resources...
2022-07-07 17:49:35 +02:00
Luca Deri
354addd693 Updated risk results 2022-05-30 23:28:59 +02:00
Ivan Nardi
71636dcafd
Sync unit tests results (#1533) 2022-04-27 18:22:11 +02:00
Toni
bc2ad3407a
Added generic user agent setter. (#1530)
* ndpiReader: Print user agent if one was set and not just for certain protocols.

Signed-off-by: lns <matzeton@googlemail.com>
2022-04-25 13:00:50 +02:00
Ivan Nardi
075bce5f3d
XIAOMI: add detection of Xiaomi traffic (#1529)
Most of the credits should go to @utoni (see #1521)
2022-04-25 11:00:02 +02:00
Ivan Nardi
42909673ce
Add some scripts to easily update some IPs lists (#1522)
Follow-up of 8b062295

Add a new protocol id for generic Tencent/Wechat flows
2022-04-21 20:43:52 +02:00
Ivan Nardi
3a087e951d
Add a "confidence" field about the reliability of the classification. (#1395)
As a general rule, the higher the confidence value, the higher the
"reliability/precision" of the classification.

In other words, this new field provides an hint about "how" the flow
classification has been obtained.
For example, the application may want to ignore classification "by-port"
(they are not real DPI classifications, after all) or give a second
glance at flows classified via LRU caches (because of false positives).

Setting only one value for the confidence field is a bit tricky: more
work is probably needed in the next future to tweak/fix/improve the logic.
2022-01-11 15:23:39 +01:00
Luca Deri
708d4ea33a Improved user agent analysis 2022-01-09 18:47:47 +01:00
Ivan Nardi
b1e9245d94
ndpiReader: slight simplificaton of the output (#1378) 2021-11-27 17:32:23 +01:00
Luca Deri
ea435c46f5 Reworked HTTP protocol dissection including HTTP proxy and HTTP connect 2021-11-25 22:53:46 +01:00
Ivan Nardi
5464bad6db
Differentiate between standard Amazon stuff (i.e market) and AWS (#1369) 2021-11-04 00:20:45 +01:00
Luca Deri
9e97d20c25 Refreshed results list 2021-10-16 12:03:16 +02:00
Luca Deri
e8455236bd Updated output 2021-08-07 17:38:33 +02:00
Ivan Nardi
cccf794265
ndpiReader: add statistics about nDPI performance (#1240)
The goal is to have a (roughly) idea about how many packets nDPI needs
to properly classify a flow.

Log this information (and guessed flows number too) during unit tests,
to keep track of improvements/regressions across commits.
2021-07-13 12:28:39 +02:00
Luca Deri
23a15bae5f Fixes #1029 2020-11-27 18:51:56 +01:00
Luca Deri
dd75060932 Fixed false positive in suspicous user agent
Optimized stddev calculation
2020-08-30 12:25:15 +02:00
Luca Deri
e71df49b3e Changed due to bin size extension 2020-07-30 00:06:46 +02:00
Luca Deri
879cec94b2 User agent detection improvements 2020-07-21 12:06:34 +02:00
Luca Deri
12abcd516b Updated test results due to bin changes 2020-07-09 17:28:02 +02:00
Luca Deri
1a62f4c799 Added ndpi_bin_XXX API
Added packet lenght distribution bins
2020-06-22 01:02:54 +02:00
Luca Deri
b7e666e465 Added fix to avoid potential heap buffer overflow in H.323 dissector
Modified HTTP report information to make it closer to the HTTP field names
2020-05-19 08:31:05 +02:00
Luca Deri
e5e69d0f7a Added the ability to detect when a known protocol is using a non-standard port
Added check to spot executables exchanged via HTTP
2020-05-10 21:25:38 +02:00
emanuele-f
fd94270507 Remove decimals in test results for IAT, packet lengths and goodput ratio 2020-02-14 11:42:20 +01:00
Luca Deri
e98b994a39 Updated results 2019-11-21 13:35:04 +01:00
Luca
4802987178 Initial work towards HTTP content-type export 2019-10-31 00:14:20 +01:00
Luca
0e54f87b18 Added telnet dissector
Improved data report
2019-10-29 19:12:42 +01:00
Luca Deri
044ba7697a Improved guess 2019-10-25 16:02:44 +02:00
Luca Deri
e6bd64b3ea Improved HTTP reporting in ndpiReader 2019-10-25 15:56:47 +02:00
Luca Deri
0974075fa0 Major cleanup
Removed ndpi_pref_http_dont_dissect_response and ndpi_pref_dns_dont_dissect_response as the ndpi_extra_dissection_possible() call will now handle everything
2019-10-24 19:48:55 +02:00
Luca Deri
4fd7e5734a Manual merge of pull #769 2019-10-02 23:01:29 +02:00
Luca Deri
6a22bee2ca Added URL in results 2019-10-01 12:26:15 +02:00
Luca Deri
f2a5bbef17 Reworked categories handling
Removed GenericProtocol and replaced with categories
Removed ndpi_pref_enable_category_substring_match option: substring matching is now default
2019-09-29 21:46:41 +02:00
Luca Deri
c839dcb74c Improved category handlign in subprotocols
Further DNS dissection fixes
Fixed WeChat invalid category
2019-09-27 17:34:22 +02:00
Luca
0ed679e795 Improves IAT calculation 2019-09-24 16:37:42 +02:00
Luca
886d575157 Added -C to generate CSV analysis files
Improved IAT and byte distribution
2019-09-03 18:38:54 +02:00
Luca
b1270fc7bb Uodated results 2019-08-29 15:23:01 +02:00
Luca
e4e40e3c70 Added entropy, average, stddev, variance, bytes ratio calculation 2019-08-28 14:02:39 +02:00
Luca Deri
63173e7360 Updated results with new dissection 2019-07-24 00:13:07 +02:00
Luca Deri
b8867642fc Refresh after data leak detection 2019-07-18 11:49:53 +02:00
Luca Deri
17c49b2e6d Updated test resultss after export changes 2019-07-13 18:37:57 +02:00
Luca
1290706fad Tests result fix
Merge branch 'dev' of https://github.com/ntop/nDPI into dev
2019-04-05 12:51:59 +02:00
Luca Deri
f88648fbc8 Tests update 2018-08-16 12:05:07 +02:00
Luca
a499f369a5 Updated results based on the new output format 2018-07-21 15:20:11 +02:00
Luca Deri
36c1b72118 Updated test resuls 2018-05-18 23:22:14 +02:00