nfstream/README.md
2020-09-07 23:56:51 +02:00

16 KiB

NFStream Logo


NFStream is a Python package providing fast, flexible, and expressive data structures designed to make working with online or offline network data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real world network data analysis in Python. Additionally, it has the broader goal of becoming a common network data processing framework for researchers providing data reproducibility across experiments.

Live Notebook live notebook
Project Website website
Discussion Channel Gitter
Latest Release latest release
Supported Versions python3
Project License License
Build Status Github WorkFlows Travis CI
Code Quality Quality
Code Coverage Coverage

Main Features

  • Performance: NFStream is designed to be fast: parallel processing, native C (using CFFI) for critical computation and PyPy support.
  • Encrypted layer-7 visibility: NFStream deep packet inspection is based on nDPI. It allows NFStream to perform reliable encrypted applications identification and metadata fingerprinting (e.g. TLS, SSH, DHCP, HTTP).
  • Statistical fingerprinting: NFStream provides state of the art of flow-based statistical feature extraction. It includes both post-mortem statistical features (e.g. min, mean, stddev and max of packet size and inter arrival time) and early flow features (e.g. sequence of first n packets sizes, inter arrival times and directions).
  • Flexibility: NFStream is easily extensible using NFPlugins. It allows to create a new feature within a few lines of Python.
  • Machine Learning oriented: NFStream aims to make Machine Learning Approaches for network traffic management reproducible and deployable. By using NFStream as a common framework, researchers ensure that models are trained using the same feature computation logic and thus, a fair comparison is possible. Moreover, trained models can be deployed and evaluated on live network using NFPlugins.

How to use it?

Encrypted application identification and metadata extraction

Dealing with a big pcap file and just want to aggregate into labeled network flows? NFStream make this path easier in few lines:

from nfstream import NFStreamer
# We display all streamer parameters with their default values.
# See documentation for detailed information about each parameter.
my_streamer = NFStreamer(source="facebook.pcap", # or network interface
                         decode_tunnels=True,
                         bpf_filter=None,
                         promiscuous_mode=True,
                         snapshot_length=1500,
                         idle_timeout=30,
                         active_timeout=300,
                         accounting_mode=0,
                         udps=None,
                         n_dissections=20,
                         statistical_analysis=False,
                         splt_analysis=0,
                         n_meters=0,
                         performance_summary=False)
                         
for flow in my_streamer:
    print(flow)  # print it.
NFlow(id=0,
      expiration_id=0,
      src_ip='192.168.43.18',
      src_ip_is_private=1,
      src_port=52066,
      dst_ip='66.220.156.68',
      dst_ip_is_private=0,
      dst_port=443,
      protocol=6,
      ip_version=4,
      vlan_id=0,
      bidirectional_first_seen_ms=1472393122365,
      bidirectional_last_seen_ms=1472393123665,
      bidirectional_duration_ms=1300,
      bidirectional_packets=19,
      bidirectional_bytes=5745,
      src2dst_first_seen_ms=1472393122365,
      src2dst_last_seen_ms=1472393123408,
      src2dst_duration_ms=1043,
      src2dst_packets=9,
      src2dst_bytes=1345,
      dst2src_first_seen_ms=1472393122668,
      dst2src_last_seen_ms=1472393123665,
      dst2src_duration_ms=997,
      dst2src_packets=10,
      dst2src_bytes=4400,
      application_name='TLS.Facebook',
      application_category_name='SocialNetwork',
      application_is_guessed=0,
      requested_server_name='facebook.com',
      client_fingerprint='bfcc1a3891601edb4f137ab7ab25b840',
      server_fingerprint='2d1eb5817ece335c24904f516ad5da12',
      http_user_agent='',
      http_content_type='')

Extract post-mortem statistical flow features

NFStream performs 48 post mortem flow statistical features extraction:

from nfstream import NFStreamer
my_streamer = NFStreamer(source="facebook.pcap",
                         # Disable L7 dissection for readability purpose.
                         n_dissections=0,  
                         statistical_analysis=True)
for flow in my_streamer:
    print(flow)
NFlow(id=0,
      expiration_id=0,
      src_ip='192.168.43.18',
      src_ip_is_private=1,
      src_port=52066,
      dst_ip='66.220.156.68',
      dst_ip_is_private=0,
      dst_port=443,
      protocol=6,
      ip_version=4,
      vlan_id=0,
      bidirectional_first_seen_ms=1472393122365,
      bidirectional_last_seen_ms=1472393123665,
      bidirectional_duration_ms=1300,
      bidirectional_packets=19,
      bidirectional_bytes=5745,
      src2dst_first_seen_ms=1472393122365,
      src2dst_last_seen_ms=1472393123408,
      src2dst_duration_ms=1043,
      src2dst_packets=9,
      src2dst_bytes=1345,
      dst2src_first_seen_ms=1472393122668,
      dst2src_last_seen_ms=1472393123665,
      dst2src_duration_ms=997,
      dst2src_packets=10,
      dst2src_bytes=4400,
      bidirectional_min_ps=66,
      bidirectional_mean_ps=302.36842105263156,
      bidirectional_stddev_ps=425.53315715259754,
      bidirectional_max_ps=1454,
      src2dst_min_ps=66,
      src2dst_mean_ps=149.44444444444446,
      src2dst_stddev_ps=132.20354676701294,
      src2dst_max_ps=449,
      dst2src_min_ps=66,
      dst2src_mean_ps=440.0,
      dst2src_stddev_ps=549.7164925870628,
      dst2src_max_ps=1454,
      bidirectional_min_piat_ms=0,
      bidirectional_mean_piat_ms=72.22222222222223,
      bidirectional_stddev_piat_ms=137.34994188549086,
      bidirectional_max_piat_ms=398,
      src2dst_min_piat_ms=0,
      src2dst_mean_piat_ms=130.375,
      src2dst_stddev_piat_ms=179.72036811192467,
      src2dst_max_piat_ms=415,
      dst2src_min_piat_ms=0,
      dst2src_mean_piat_ms=110.77777777777777,
      dst2src_stddev_piat_ms=169.51458475436397,
      dst2src_max_piat_ms=409,
      bidirectional_syn_packets=2,
      bidirectional_cwr_packets=0,
      bidirectional_ece_packets=0,
      bidirectional_urg_packets=0,
      bidirectional_ack_packets=18,
      bidirectional_psh_packets=9,
      bidirectional_rst_packets=0,
      bidirectional_fin_packets=0,
      src2dst_syn_packets=1,
      src2dst_cwr_packets=0,
      src2dst_ece_packets=0,
      src2dst_urg_packets=0,
      src2dst_ack_packets=8,
      src2dst_psh_packets=4,
      src2dst_rst_packets=0,
      src2dst_fin_packets=0,
      dst2src_syn_packets=1,
      dst2src_cwr_packets=0,
      dst2src_ece_packets=0,
      dst2src_urg_packets=0,
      dst2src_ack_packets=10,
      dst2src_psh_packets=5,
      dst2src_rst_packets=0,
      dst2src_fin_packets=0)

Early statistical flow features extraction

NFStream performs early (up to 255 packets) flow statistical features extraction (also referred as SPLT analysis in the literature)

from nfstream import NFStreamer
my_streamer = NFStreamer(source="facebook.pcap",
                         # We disable both l7 dissection and statistical analysis
                         # for readability purpose.
                         n_dissections=0,
                         statistical_analysis=False,
                         splt_analysis=True)
for flow in my_streamer:
    print(flow)
NFlow(id=0,
      expiration_id=0,
      src_ip='192.168.43.18',
      src_ip_is_private=1,
      src_port=52066,
      dst_ip='66.220.156.68',
      dst_ip_is_private=0,
      dst_port=443,
      protocol=6,
      ip_version=4,
      vlan_id=0,
      bidirectional_first_seen_ms=1472393122365,
      bidirectional_last_seen_ms=1472393123665,
      bidirectional_duration_ms=1300,
      bidirectional_packets=19,
      bidirectional_bytes=5745,
      src2dst_first_seen_ms=1472393122365,
      src2dst_last_seen_ms=1472393123408,
      src2dst_duration_ms=1043,
      src2dst_packets=9,
      src2dst_bytes=1345,
      dst2src_first_seen_ms=1472393122668,
      dst2src_last_seen_ms=1472393123665,
      dst2src_duration_ms=997,
      dst2src_packets=10,
      dst2src_bytes=4400,
      # The sequence of 10 first packet direction, size and inter arrival time.
      splt_direction=[0, 1, 0, 0, 1, 1, 0, 1, 0, 1],
      splt_ps=[74, 74, 66, 262, 66, 1454, 66, 1454, 66, 463],
      splt_piat_ms=[0, 303, 0, 0, 313, 0, 0, 0, 0, 1])

Convert your network flows into a Pandas dataframe

From pcap to Pandas DataFrame?

my_dataframe = NFStreamer(source='facebook.pcap').to_pandas(ip_anonymization=False)
my_dataframe.head(5)

Convert your network flows into a CSV file

From pcap to csv file?

flows_count = NFStreamer(source='facebook.pcap').to_csv(path=None,
                                                        flows_per_file=0,
                                                        ip_anonymization=False)

Extending NFStream

Didn't find a specific flow feature? add a plugin to NFStream in few lines:

from nfstream import NFPlugin
    
class MyCustomFeature(NFPlugin):
    def on_init(self, packet, flow):
        # flow creation with the first packet
        if packet.raw_size == self.custom_size:
            flow.udps.packet_with_custom_size = 1
        else:
            flow.udps.packet_with_custom_size = 0
	
    def on_update(self, packet, flow):
        # flow update with each packet belonging to the flow 
        if packet.raw_size == self.custom_size:
            flow.udps.packet_with_custom_size += 1


extended_streamer = NFStreamer(source='facebook.pcap', 
                               udps=MyCustomFeature(custom_size=555))

for flow in extended_streamer:
    # see your dynamically created metric in generated flows
    print(flow.udps.packet_with_custom_size) 

Run your Machine Learning models

In the following, we want to run a classification of flows application category based on a trained machine learning model than takes as features bidirectional_packets and bidirectional_bytes of a flow and predict if traffic category is SocialNetwork or not. In the following example, we decide to predict only at flow expiration stage.

Training your model

from nfstream import NFPlugin, NFStreamer
import numpy
from sklearn.ensemble import RandomForestClassifier

df = NFStreamer(source="training_traffic.pcap").to_pandas()
X = df[["bidirectional_packets", "bidirectional_bytes"]]
y = df["application_category_name"].apply(lambda x: 1 if 'SocialNetwork' in x else 0)
model = RandomForestClassifier()
model.fit(X, y)

Start your ML powered streamer on live traffic

class ModelPrediction(NFPlugin):
    def on_init(self, packet, flow):
        flow.udps.model_prediction = 0
    def on_expire(self, flow):
        # You can do the same in on_update entrypoint and force expiration with custom id. 
        to_predict = numpy.array([flow.bidirectional_packets,
                                  flow.bidirectional_bytes]).reshape((1,-1))
        flow.udps.model_prediction = self.my_model.predict(to_predict)

ml_streamer = NFStreamer(source="eth0", udps=ModelPrediction(my_model=model))
for flow in ml_streamer:
    print(flow.udps.model_prediction)

More example and details are provided on the official documentation. You can test NFStream without installation using our live demo notebook.

Installation

Using pip

Binary installers for the latest released version are available:

python3 -m pip install nfstream

Build from sources

If you want to build NFStream from sources on your local machine:

linux Linux

sudo apt-get update
sudo apt-get install autoconf automake libtool pkg-config libpcap-dev flex bison
sudo apt-get install libusb-1.0-0-dev libdbus-glib-1-dev libbluetooth-dev libnl-genl-3-dev
git clone https://github.com/nfstream/nfstream.git
cd nfstream
python3 -m pip install -r requirements.txt
python3 setup.py bdist_wheel

osx MacOS

brew install autoconf automake libtool pkg-config
git clone https://github.com/nfstream/nfstream.git
cd nfstream
python3 -m pip install -r requirements.txt
python3 setup.py bdist_wheel

Contributing

Please read Contributing for details on our code of conduct, and the process for submitting pull requests to us.

Ethics

NFStream is intended for network data research and forensics. Researchers and network data scientists can use these framework to build reliable datasets, train and evaluate network applied machine learning models. As with any packet monitoring tool, NFStream could potentially be misused. Do not run it on any network of which you are not the owner or the administrator.

License

This project is licensed under the LGPLv3 License - see the License file for details