We should pay attention to tell ndpiReader configuration files and
libnDPI configuration files!! Better solution?
Be sure that configuration files are located where they are expected.
In oss-fuzz enviroment we can't make any assumptions about the current
working directory of your fuzz target.
Application should keep calling nDPI until flow state became
`NDPI_STATE_CLASSIFIED`.
The main loop in the application is simplified to something like:
```
res = ndpi_detection_process_packet(...);
if(res->state == NDPI_STATE_CLASSIFIED) {
/* Done: you can get finale classification and all metadata.
nDPI doesn't need more packets for this flow */
} else {
/* nDPI needs more packets for this flow. The provided
classification is not final and more metadata might be
extracted.
If `res->state` is `NDPI_STATE_PARTIAL`, partial/initial
classification is available in `res->proto`
as usual but it can be updated later.
*/
}
/*
Example A (QUIC flow):
pkt 1: proto QUIC state NDPI_STATE_PARTIAL
pkt 2: proto QUIC/Youtube state NDPI_STATE_CLASSIFIED
Example B (GoogleMeet call):
pkt 1: proto STUN state NDPI_STATE_PARTIAL
pkt N: proto DTLS state NDPI_STATE_PARTIAL
pkt N+M: proto DTLS/GoogleCall state NDPI_STATE_CLASSIFIED
Example C (standard TLS flow):
pkt 1: proto Unknown state NDPI_STATE_INSPECTING
pkt 2: proto Unknown state NDPI_STATE_INSPECTING
pkt 3: proto Unknown state NDPI_STATE_INSPECTING
pkt 4: proto TLS/Facebook state NDPI_STATE_PARTIAL
pkt N: proto TLS/Facebook state NDPI_STATE_CLASSIFIED
*/
}
```
You can take a look at `ndpiReader` for a slightly more complex example.
API changes:
* remove the third parameter from `ndpi_detection_giveup()`. If you need
to know if the classification flow has been guessed, you can access
`flow->protocol_was_guessed`
* remove `ndpi_extra_dissection_possible()`
* change some prototypes from accepting `ndpi_protocol foo` to
`ndpi_master_app_protocol bar`. The update is trivial: from `foo` to
`foo.proto`
Initial work to support out-of-tree builds
```
./autogen.sh
mkdir build
cd build
../configure
make
make check
```
IMPORTANT: `autogen.sh` doesn't call `configure` automatically anymore!!
You have to do: `./autogen.sh && ./configure --$OPTIONS`.
A little bit annoying but the pattern `autogen && configure && make` is
very common on Linux.
Known issues:
* `make doc` doesn't work in out-of-tree builds, yet
* Windows/MinGW/DPDK (out-of-tree) builds have not been tested, so it is unlikely they work
See: #2992
We don't know how much memory we are currently using: we only know the
amount of total memory allocated. Use proper label to report this
information in a correct way
It was completly broken.
Pay some attention to HTTP case where we might have Host header in the
"$DOMAIN:$PORT" form: we usually want to strip the port part
`memrchr` is not available on macOS and on Windows: create a wrapper
- keeping track of the number of updates without rank changes
- not creating new slots (but overwriting the last one) when a new update with no rank changes is computed. This way in the ranking atastructure there are only entries that caused ranking chnages
Right now, there is, in essence, a static mapping between flow protocols
and flow breeds.
Make it dynamic: allow to have different flows, with the same
classification but differents breeds. This is the same logic that we
already have for categories....
Preliminary work to support breed in category lists.
API change from the app POV: to get the flow breed don't use anymore
`ndpi_get_proto_breed()`, but access directly `struct ndpi_proto->breed`
The functions `ndpi_domain_classify_*()` and
`ndpi_get_host_domain_suffix()` now have a `u_int32_t` parameter as
`class_id` (instead of `u_int_16_t`), with the following logic:
```
class_id = (breed << 16) | category
```
instead of the old:
```
class_id = category
```
Please note that this change is back-compatible: if you are not
interested into breeds, you don't need to update the application code.
```
ndpi_analyze.c: In function ‘ndpi_deserialize_ranking’:
ndpi_analyze.c:2244:3: warning: ignoring return value of ‘fread’ declared with attribute ‘warn_unused_result’ [-Wunused-result]
2244 | fread(&rank->header, sizeof(ndpi_ranking_header), 1, fd);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ndpi_main.c: In function ‘ndpi_match_host_subprotocol’:
ndpi_main.c:11798:9: warning: ‘__builtin_strncpy’ output may be truncated copying between 0 and 63 bytes from a string of length 255 [-Wstringop-truncation]
11798 | strncpy(str, string_to_match, ndpi_min(string_to_match_len, sizeof(str)-1));
| ^
ndpi_main.c:11811:7: warning: ‘__builtin_strncpy’ output may be truncated copying between 0 and 63 bytes from a string of length 255 [-Wstringop-truncation]
11811 | strncpy(str, string_to_match, ndpi_min(string_to_match_len, sizeof(str)-1));
```
- TCP fingerprint
- JA4 fingepriint
- TLS SHA1 certificate (if present), or JA3S fingerprint (is SHA1 is missing)
By default the fingerprint uses the client and server fingerprints (format 0)
and combines them. However you can chnge it format (eg. use only the client info,
format 1) with
--cfg NULL,metadata.ndpi_fingerprint_format,X
where X is the fingerprint format.
By default nDPI fingerprint is enabled but you can enable/disble it as follows
--cfg NULL,metadata.ndpi_fingerprint,0
The idea is to remove the limitation of only two protocols ("master" and
"app") in the flow classifcation.
This is quite handy expecially for STUN flows and, in general, for any
flows where there is some kind of transitionf from a cleartext protocol
to TLS: HTTP_PROXY -> TLS/Youtube; SMTP -> SMTPS (via STARTTLS msg).
In the vast majority of the cases, the protocol stack is simply
Master/Application.
Examples of real stacks (from the unit tests) different from the standard
"master/app":
* "STUN.WhatsAppCall.SRTP": a WA call
* "STUN.DTLS.GoogleCall": a Meet call
* "Telegram.STUN.DTLS.TelegramVoip": a Telegram call
* "SMTP.SMTPS.Google": a SMTP connection to Google server started in
cleartext and updated to TLS
* "HTTP.Google.ntop": a HTTP connection to a Google domain (match via
"Host" header) and to a ntop server (match via "Server" header)
The logic to create the stack is still a bit coarse: we have a decade of
code try to push everything in only ywo protocols... Therefore, the
content of the stack is still **highly experimental** and might change
in the next future; do you have any suggestions?
It is quite likely that the legacy fields "master_protocol" and
"app_protocol" will be there for a long time.
Add some helper to use the stack:
```
ndpi_stack_get_upper_proto();
ndpi_stack_get_lower_proto();
bool ndpi_stack_contains(struct ndpi_proto_stack *s, u_int16_t proto_id);
bool ndpi_stack_is_tls_like(struct ndpi_proto_stack *s);
bool ndpi_stack_is_http_like(struct ndpi_proto_stack *s);
```
Be sure new stack logic is compatible with legacy code:
```
assert(ndpi_stack_get_upper_proto(&flow->detected_protocol.protocol_stack) ==
ndpi_get_upper_proto(flow->detected_protocol));
assert(ndpi_stack_get_lower_proto(&flow->detected_protocol.protocol_stack) ==
ndpi_get_lower_proto(flow->detected_protocol));
```
- Changed ndpi_flow_info: replaced fixed-size char arrays (always INET6_ADDRSTRLEN) for src_name and dst_name with char* pointers.
- Now IPv4 flows use only INET_ADDRSTRLEN when needed, instead of always reserving IPv6 size.
Call ndpi_stats_reset() once per thread instead of once per flow root
Moved ndpi_stats_reset() outside the loop that destroys ndpi_flows_root[]
to avoid redundant resets. The stats structure is shared per thread and
should only be reset once after all roots are cleared.
Refactored stats allocation and reset logic to avoid segmentation faults
when running ndpiReader in live_capture mode with the -m (duration) option.
- Introduced ndpi_stats_init(), ndpi_stats_reset(), and ndpi_stats_free()
to encapsulate lifecycle management of stats.
- Applied these functions in ndpiReader.c and reader_util.{c,h}.
- Prevented multiple allocations and ensured safe reuse of cumulative_stats
and per-thread stats structures between capture iterations.
Fixes: https://github.com/ntop/nDPI/issues/2903
Change the API to enable/disable protocols: you can set that via the
standard `ndpi_set_config()` function, as every configuration
parameters. By default, all protocols are enabled.
Split the (local) context initialization into two phases:
* `ndpi_init_detection_module()`: generic part. It does not depend on the
configuration and on the protocols being enabled or not. It also
calculates the real number of internal protocols
* `ndpi_finalize_initialization()`: apply the configuration. All the
initialization stuff that depend on protocols being enabled or not
must be put here
This is the last step to have the protocols number fully calculated at
runtime
Remove a (now) useless fuzzer.
Important API changes:
* remove `NDPI_LAST_IMPLEMENTED_PROTOCOL` define
* remove `ndpi_get_num_internal_protocols()`. To get the number of
configured protocols (internal and custom) you must use
`ndpi_get_num_protocols()` after having called `ndpi_finalize_initialization()`