Datacenter Traces

In our research we use real traces collected from different production datacenters. We share sanitized versions of these traces for the benefit of the research community.

Traces Info

Our traces are collected from three different datacenters, and each trace is provided as a GEXF graph file. Each graph node is either a datacenter machine or an internet host. Some of the machines are directly monitored (indicated by machine type = ASSET), and we can track their traffic with unmonitored machines. For each dataset we also provide ground-truth information on the app (id) of each ASSET machine, stored in the field named sol.
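As a rough illustration of how such a file could be inspected, the sketch below reads node attributes from a GEXF file using only the Python standard library and counts ASSET machines per app id. It is not the authors' tooling. The attribute title "type" for the machine type, the value "INTERNET", and the contents of SAMPLE are assumptions made up for the example; only the value ASSET and the field name sol come from the description above. The helper names read_gexf_nodes and app_sizes are hypothetical.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def _local(tag):
    """Strip the XML namespace so the code works across GEXF versions."""
    return tag.rsplit("}", 1)[-1]


def read_gexf_nodes(source):
    """Return {node_id: {attribute_title: value}} for a GEXF file or file object."""
    root = ET.parse(source).getroot()

    # Map attribute ids to their titles, using the node-level declarations.
    titles = {}
    for el in root.iter():
        if _local(el.tag) == "attributes" and el.get("class") == "node":
            for attr in el:
                if _local(attr.tag) == "attribute":
                    titles[attr.get("id")] = attr.get("title")

    # Collect the attribute values attached to each node.
    nodes = {}
    for el in root.iter():
        if _local(el.tag) == "node":
            values = {}
            for sub in el.iter():
                if _local(sub.tag) == "attvalue":
                    key = sub.get("for")
                    values[titles.get(key, key)] = sub.get("value")
            nodes[el.get("id")] = values
    return nodes


def app_sizes(nodes, type_key="type", app_key="sol"):
    """Count ASSET machines per app id.

    type_key is an ASSUMED attribute title; check the actual files.
    """
    return Counter(
        attrs[app_key]
        for attrs in nodes.values()
        if attrs.get(type_key) == "ASSET" and app_key in attrs
    )


# Made-up miniature graph, used only to exercise the functions above.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<gexf xmlns="http://www.gexf.net/1.2draft" version="1.2">
  <graph defaultedgetype="directed">
    <attributes class="node">
      <attribute id="0" title="type" type="string"/>
      <attribute id="1" title="sol" type="string"/>
    </attributes>
    <nodes>
      <node id="a"><attvalues><attvalue for="0" value="ASSET"/><attvalue for="1" value="7"/></attvalues></node>
      <node id="b"><attvalues><attvalue for="0" value="ASSET"/><attvalue for="1" value="7"/></attvalues></node>
      <node id="c"><attvalues><attvalue for="0" value="ASSET"/><attvalue for="1" value="9"/></attvalues></node>
      <node id="x"><attvalues><attvalue for="0" value="INTERNET"/></attvalues></node>
    </nodes>
    <edges><edge id="0" source="a" target="x"/></edges>
  </graph>
</gexf>
"""

if __name__ == "__main__":
    sample_nodes = read_gexf_nodes(io.BytesIO(SAMPLE.encode("utf-8")))
    print(len(sample_nodes), "nodes")
    print(dict(app_sizes(sample_nodes)))
```

To run this on a real trace, pass the path of the downloaded GEXF file to read_gexf_nodes instead of the in-memory sample, and adjust type_key if the files use a different attribute title for the machine type.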

The number of nodes in each trace is provided by the following table:


The distribution of app sizes is provided by the following figure:

For more details, please see our paper on Automated and Traffic-Pattern Based Application Clustering in Datacenters. If you use these traces, please cite the paper with the following BibTeX entry:

@inproceedings{schiff2018netslicer,
 author = {Liron Schiff and Ofri Ziv and Manfred Jaeger and Stefan Schmid},
 title = {NetSlicer: Automated and Traffic-Pattern Based Application Clustering in Datacenters},
 booktitle = {Proc. ACM SIGCOMM 2018 Workshop on Big Data Analytics and Machine Learning for Data Communication Networks (Big-DAMA)},
 year = {2018}
}