Application should keep calling nDPI until flow state became
`NDPI_STATE_CLASSIFIED`.
The main loop in the application is simplified to something like:
```
res = ndpi_detection_process_packet(...);
if(res->state == NDPI_STATE_CLASSIFIED) {
/* Done: you can get finale classification and all metadata.
nDPI doesn't need more packets for this flow */
} else {
/* nDPI needs more packets for this flow. The provided
classification is not final and more metadata might be
extracted.
If `res->state` is `NDPI_STATE_PARTIAL`, partial/initial
classification is available in `res->proto`
as usual but it can be updated later.
*/
}
/*
Example A (QUIC flow):
pkt 1: proto QUIC state NDPI_STATE_PARTIAL
pkt 2: proto QUIC/Youtube state NDPI_STATE_CLASSIFIED
Example B (GoogleMeet call):
pkt 1: proto STUN state NDPI_STATE_PARTIAL
pkt N: proto DTLS state NDPI_STATE_PARTIAL
pkt N+M: proto DTLS/GoogleCall state NDPI_STATE_CLASSIFIED
Example C (standard TLS flow):
pkt 1: proto Unknown state NDPI_STATE_INSPECTING
pkt 2: proto Unknown state NDPI_STATE_INSPECTING
pkt 3: proto Unknown state NDPI_STATE_INSPECTING
pkt 4: proto TLS/Facebook state NDPI_STATE_PARTIAL
pkt N: proto TLS/Facebook state NDPI_STATE_CLASSIFIED
*/
}
```
You can take a look at `ndpiReader` for a slightly more complex example.
API changes:
* remove the third parameter from `ndpi_detection_giveup()`. If you need
to know if the classification flow has been guessed, you can access
`flow->protocol_was_guessed`
* remove `ndpi_extra_dissection_possible()`
* change some prototypes from accepting `ndpi_protocol foo` to
`ndpi_master_app_protocol bar`. The update is trivial: from `foo` to
`foo.proto`
Add the concept of "global context".
Right now every instance of `struct ndpi_detection_module_struct` (we
will call it "local context" in this description) is completely
independent from each other. This provide optimal performances in
multithreaded environment, where we pin each local context to a thread,
and each thread to a specific CPU core: we don't have any data shared
across the cores.
Each local context has, internally, also some information correlating
**different** flows; something like:
```
if flow1 (PeerA <-> Peer B) is PROTOCOL_X; then
flow2 (PeerC <-> PeerD) will be PROTOCOL_Y
```
To get optimal classification results, both flow1 and flow2 must be
processed by the same local context. This is not an issue at all in the far
most common scenario where there is only one local context, but it might
be impractical in some more complex scenarios.
Create the concept of "global context": multiple local contexts can use
the same global context and share some data (structures) using it.
This way the data correlating multiple flows can be read/write from
different local contexts.
This is an optional feature, disabled by default.
Obviously data structures shared in a global context must be thread safe.
This PR updates the code of the LRU implementation to be, optionally,
thread safe.
Right now, only the LRU caches can be shared; the other main structures
(trees and automas) are basically read-only: there is little sense in
sharing them. Furthermore, these structures don't have any information
correlating multiple flows.
Every LRU cache can be shared, independently from the others, via
`ndpi_set_config(ndpi_struct, NULL, "lru.$CACHE_NAME.scope", "1")`.
It's up to the user to find the right trade-off between performances
(i.e. without shared data) and classification results (i.e. with some
shared data among the local contexts), depending on the specific traffic
patterns and on the algorithms used to balance the flows across the
threads/cores/local contexts.
Add some basic examples of library initialization in
`doc/library_initialization.md`.
This code needs libpthread as external dependency. It shouldn't be a big
issue; however a configure flag has been added to disable global context
support. A new CI job has been added to test it.
TODO: we should need to find a proper way to add some tests on
multithreaded enviroment... not an easy task...
*** API changes ***
If you are not interested in this feature, simply add a NULL parameter to
any `ndpi_init_detection_module()` calls.
In a lot of places in ndPI we use *packet* source/dest info
(address/port/direction) when we are interested in *flow* client/server
info, instead.
Add basic logic to autodetect this kind of information.
nDPI doesn't perform any "flow management" itself but this task is
delegated to the external application. It is then likely that the
application might provide more reliable hints about flow
client/server direction and about the TCP handshake presence: in that case,
these information might be (optionally) passed to the library, disabling
the internal "autodetect" logic.
These new fields have been used in some LRU caches and in the "guessing"
algorithm.
It is quite likely that some other code needs to be updated.