Commit graph

21 commits

Author SHA1 Message Date
Luca Deri
1fecf69359 Added DNS error code mapping number -> string 2022-10-14 09:39:15 +02:00
Nardi Ivan
83de3e4716 DNS: change category of DNS flows
DNS flows should have `NDPI_PROTOCOL_CATEGORY_NETWORK` as category,
regardless of the subprotocol (if any).
2022-09-22 13:23:55 +02:00
Ivan Nardi
a7c2734b38
Remove classification "by-ip" from protocol stack (#1743)
Basically:
* "classification by-ip" (i.e. `flow->guessed_protocol_id_by_ip` is
NEVER returned in the protocol stack (i.e.
`flow->detected_protocol_stack[]`);
* if the application is interested into such information, it can access
`ndpi_protocol->protocol_by_ip` itself.

There are mainly 4 points in the code that set the "classification
by-ip" in the protocol stack:  the generic `ndpi_set_detected_protocol()`/
`ndpi_detection_giveup()` functions and the HTTP/STUN  dissectors.

In the unit tests output, a print about `ndpi_protocol->protocol_by_ip`
has been added for each flow: the huge diff of this commit is mainly due
to that.

Strictly speaking, this change is NOT an API/ABI breakage, but there are
important differences in the classification results. For examples:
* TLS flows without the initial handshake (or without a matching
SNI/certificate) are simply classified as `TLS`;
* similar for HTTP or QUIC flows;
* DNS flows without a matching request domain are simply classified as
`DNS`; we don't have `DNS/Google` anymore just because the server is
8.8.8.8 (that was an outrageous behaviour...);
* flows previusoly classified only "by-ip" are now classified as
`NDPI_PROTOCOL_UNKNOWN`.

See #1425 for other examples of why adding the "classification by-ip" in
the protocol stack is a bad idea.

Please, note that IPV6 is not supported :(  (long standing issue in nDPI) i.e.
`ndpi_protocol->protocol_by_ip` wil be always `NDPI_PROTOCOL_UNKNOWN` for
IPv6 flows.

Define `NDPI_CONFIDENCE_MATCH_BY_IP` has been removed.

Close #1687
2022-09-20 22:24:47 +02:00
Ivan Nardi
4f584f78a0
Fix ndpi_do_guess() (#1731)
Avoid a double call of `ndpi_guess_host_protocol_id()`.
Some code paths work for ipv4/6 both
Remove some never used code.
2022-09-12 19:28:41 +02:00
Ivan Nardi
0a47f745cc
Avoid useless host automa lookup (#1724)
The host automa is used for two tasks:
* protocol sub-classification (obviously);
* DGA evaluation: the idea is that if a domain is present in this
automa, it can't be a DGA, regardless of its format/name.

In most dissectors both checks are executed, i.e. the code is something
like:

```
ndpi_match_host_subprotocol(..., flow->host_server_name, ...);
ndpi_check_dga_name(..., flow->host_server_name,...);

```

In that common case, we can perform only one automa lookup: if we check the
sub-classification before the DGA, we can avoid the second lookup in
the DGA function itself.
2022-09-05 13:59:51 +02:00
Ivan Nardi
405a52ed65
Patricia tree, Ahocarasick automa, LRU cache: add statistics (#1683)
Add (basic) internal stats to the main data structures used by the
library; they might be usefull to check how effective these structures
are.

Add an option to `ndpiReader` to dump them; enabled by default in the
unit tests.
This new option enables/disables dumping of "num dissectors calls"
values, too (see b4cb14ec).
2022-07-29 15:25:00 +02:00
Ivan Nardi
b4cb14ec19
Keep track of how many dissectors calls we made for each flow (#1657) 2022-07-11 09:47:47 +02:00
Ivan Nardi
7645909460
Fix handling of NDPI_UNIDIRECTIONAL_TRAFFIC risk (#1636) 2022-07-05 17:01:00 +02:00
Luca Deri
ab09b8ce2e Added unidirectional traffic flow risk 2022-06-20 00:22:13 +02:00
Luca Deri
defe7d7f79 Updated DNS alert triggered only with TTL == 0 2022-06-14 00:13:05 +02:00
Luca Deri
cf5873ffd7 Improved DNS traffic analysis
Added ability to identify application and network protocols
2022-06-13 23:19:47 +02:00
Luca Deri
1da9f1a36f Updated tests results
Code cleanup
2022-05-30 00:54:17 +02:00
Ivan Nardi
71636dcafd
Sync unit tests results (#1533) 2022-04-27 18:22:11 +02:00
Luca Deri
c40496eaac Updated test results 2022-02-03 13:54:38 +01:00
Ivan Nardi
3a087e951d
Add a "confidence" field about the reliability of the classification. (#1395)
As a general rule, the higher the confidence value, the higher the
"reliability/precision" of the classification.

In other words, this new field provides an hint about "how" the flow
classification has been obtained.
For example, the application may want to ignore classification "by-port"
(they are not real DPI classifications, after all) or give a second
glance at flows classified via LRU caches (because of false positives).

Setting only one value for the confidence field is a bit tricky: more
work is probably needed in the next future to tweak/fix/improve the logic.
2022-01-11 15:23:39 +01:00
Ivan Nardi
b1e9245d94
ndpiReader: slight simplificaton of the output (#1378) 2021-11-27 17:32:23 +01:00
Luca Deri
9e97d20c25 Refreshed results list 2021-10-16 12:03:16 +02:00
Luca Deri
e8455236bd Updated output 2021-08-07 17:38:33 +02:00
Ivan Nardi
cccf794265
ndpiReader: add statistics about nDPI performance (#1240)
The goal is to have a (roughly) idea about how many packets nDPI needs
to properly classify a flow.

Log this information (and guessed flows number too) during unit tests,
to keep track of improvements/regressions across commits.
2021-07-13 12:28:39 +02:00
Luca Deri
96b71def49 Minor change 2021-07-12 17:51:25 +02:00
Vitaly Lavrov
c418b7110b
ahoсorasick. Code review. Part 2. (#1236)
Simplified the process of adding lines to AC_AUTOMATA_t.
Use the ndpi_string_to_automa() function to add patterns with domain names.
For other cases can use ndpi_add_string_value_to_automa().

ac_automata_feature(ac_automa, AC_FEATURE_LC) allows adding
and compare data in a case insensitive manner. For mandatory pattern comparison
from the end of the line, the "ac_pattern.rep.at_end=1" flag is used.
This eliminated unnecessary conversions to lowercase and adding "$" for
end-of-line matching in domain name patterns.

ac_match_handler() has been renamed ac_domain_match_handler() and has been greatly simplified.
ac_domain_match_handler() looks for the template with the highest domain level.
For special cases it is possible to manually specify the domain level.
Added test for checking ambiguous domain names like:
 - short.weixin.qq.com is QQ, not Wechat
 - instagram.faae1-1.fna.fbcdn.net is Instagram, not Facebook

If you specify a NULL handler when creating the AC_AUTOMATA_t structure,
then a pattern with the maximum length that satisfies the search conditions will be found
(exact match, from the beginning of the string, from the end of the string, or a substring).

Added debugging for ac_automata_search.
To do this, you need to enable debugging globally using ac_automata_enable_debug(1) and
enable debugging in the AC_AUTOMATA_t structure using ac_automata_name("name", AC_FEATURE_DEBUG).
The search will display "name" and a list of matching patterns.
Running "AHO_DEBUG=1 ndpiReader ..." will show the lines that were searched for templates
and which templates were found.

The ac_automata_dump() prototype has been changed. Now it outputs data to a file.
If it is specified as NULL, then the output will be directed to stdout.
If you need to get data as a string, then use open_memstream().

Added the ability to run individual tests via the do.sh script
2021-07-12 17:39:43 +02:00