Interpretation of results
The Graphing section describes ways to arrange graphs: splitting a graph into sub-graphs for each level of a variable, ignoring less important variables, …
But if you have many variables and results, graphs will be hard to comprehend. It might be best to use statistical tools to reduce the search space and conduct a more restricted experimental design (see Variables).
Statistics
The --statistics argument will generate some statistical information.
The output below is the result of --statistics with the 01-iperf-advanced.npf example, which
uses the iPerf tool to measure the performance of local TCP connections. Compared to the front page
example, this version has more variables, such as the number of CPUs allocated to iPerf.
The statistics are generated in two parts. First, for each observed metric (THROUGHPUT, L1 misses, etc.), some quantification is proposed. Then, a correlation matrix is built for all metrics at once.
Building dataset...
Statistics for THROUGHPUT
Learning dataset built with 6144 samples and 6 features...
No tree graph when maxdepth is > 8. Use --statistics-maxdepth 8 to fix it to 8.
Feature importance:
CONGESTION : 0.0017
NODELAY : 0.0025
PARALLEL : 0.2349
CPU : 0.2898
WINDOW : 0.4711
Max:
PARALLEL = 8, CPU = 8, WINDOW = 512, CONGESTION = reno, NODELAY = Nagle enabled, TIME = 2 : 272730423296.00
Min:
PARALLEL = 1, CPU = 5, WINDOW = 4, CONGESTION = bbr, NODELAY = Nagle enabled, TIME = 2 : 17406361.60
Means per variables:
PARALLEL:
1 : 23406949034.67
2 : 41691482521.60
3 : 55738238730.24
4 : 64936604357.97
5 : 69935443312.64
6 : 73196809803.09
7 : 73895472223.57
8 : 71723332840.11
CPU:
1 : 10506062342.83
2 : 29201748241.07
3 : 43929965499.73
4 : 53619540923.73
5 : 70731434530.13
6 : 84852326263.47
7 : 89814343543.47
8 : 91868911479.47
WINDOW:
1 : 104438169.60
2 : 104145715.20
4 : 104596548.27
8 : 104852411.73
16 : 104377548.80
32 : 106264166.40
64 : 99640165444.27
128 : 93488939008.00
256 : 94197328991.57
512 : 94784307855.36
1024 : 94311106478.08
2048 : 94559996477.44
4096 : 94357551404.37
8192 : 94045998503.25
16384 : 94278390906.88
32768 : 94756206018.56
CONGESTION:
bbr : 60031966392.32
cubic : 58966340638.72
reno : 58948317777.92
NODELAY:
Nagle disabled : 57695090414.93
Nagle enabled : 60935992791.04
Correlation matrix:
                PARALLEL         CPU      WINDOW  CONGESTION     NODELAY  THROUGHPUT
PARALLEL            1.00       -0.00       -0.00       -0.00       -0.00        0.23
CPU                             1.00        0.00        0.00        0.00        0.41
WINDOW                                      1.00        0.00        0.00        0.25
CONGESTION                                              1.00       -0.00       -0.01
NODELAY                                                             1.00        0.02
THROUGHPUT                                                                      1.00
Graph of correlation matrix saved to doc/covariance-THROUGHPUT-correlation.png
P-value of ANOVA (low p-value indicates a probable interaction):
                PARALLEL         CPU      WINDOW  CONGESTION     NODELAY  THROUGHPUT
PARALLEL                        0.00        0.00        0.24        0.36        0.00
CPU                                         0.00        0.88        0.16        0.00
WINDOW                                                  0.74        0.29        0.00
CONGESTION                                                          0.37        0.51
NODELAY                                                                         0.02
Graph of a ANOVA matrix saved to doc/covariance-THROUGHPUT-anova.png
Generating graphs...
Pandas dataframe written to doc/covariance.csv
Graph of test written to /etinfo/users2/tbarbette/workspace/npf/doc/covariance-THROUGHPUT.png
Feature importance
The feature importance is built using the entropy of a regression tree.
It shows the relative importance of each variable. Here WINDOW (0.47) is more important than CPU (0.29) and PARALLEL (0.23), but all three clearly contribute to the THROUGHPUT metric, while CONGESTION and NODELAY have almost no influence.
The regression tree is saved to a PDF file for visualization. In the example above, it is not generated because the tree is too deep (no tree graph is produced when the depth exceeds 8). Use --statistics-maxdepth 5 to limit the tree depth.
The tree can be read as the most significant decisions to reach the best (or worst) performance.
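To make the idea of tree-based importance concrete, here is a minimal pure-Python sketch. It is not NPF's implementation: it assumes squared error as the split criterion, the data is made up, and all function names are hypothetical. Each split credits its error reduction to the variable it splits on, and the totals are normalised to sum to 1.

```python
from statistics import fmean

def sse(ys):
    """Sum of squared errors around the mean (impurity of a node)."""
    if not ys:
        return 0.0
    mean = fmean(ys)
    return sum((y - mean) ** 2 for y in ys)

def best_split(rows, ys):
    """Return (gain, feature, threshold) for the split that reduces SSE most."""
    base = sse(ys)
    best = (0.0, None, None)
    for feature in rows[0]:
        # Candidate thresholds: every observed value except the largest,
        # so that the right-hand side is never empty.
        for threshold in sorted({row[feature] for row in rows})[:-1]:
            left = [y for row, y in zip(rows, ys) if row[feature] <= threshold]
            right = [y for row, y in zip(rows, ys) if row[feature] > threshold]
            gain = base - sse(left) - sse(right)
            if gain > best[0]:
                best = (gain, feature, threshold)
    return best

def accumulate(rows, ys, depth, gains):
    """Grow the tree recursively, crediting each split's gain to its feature."""
    if depth == 0 or len(ys) < 2:
        return
    gain, feature, threshold = best_split(rows, ys)
    if feature is None:
        return
    gains[feature] += gain
    for keep_left in (True, False):
        part = [(row, y) for row, y in zip(rows, ys)
                if (row[feature] <= threshold) == keep_left]
        accumulate([r for r, _ in part], [y for _, y in part], depth - 1, gains)

def feature_importance(rows, ys, max_depth=3):
    """Normalised share of the total SSE reduction credited to each feature."""
    gains = {name: 0.0 for name in rows[0]}
    accumulate(rows, ys, max_depth, gains)
    total = sum(gains.values())
    return {name: (g / total if total else 0.0) for name, g in gains.items()}

# Made-up data (not NPF results): throughput is 1 with a small window,
# otherwise 10 per CPU; NODELAY has no effect at all.
rows = [{"WINDOW": w, "CPU": c, "NODELAY": n}
        for w in (32, 64, 128) for c in (1, 2) for n in (0, 1)]
ys = [10 * r["CPU"] if r["WINDOW"] >= 64 else 1 for r in rows]
importance = feature_importance(rows, ys)
print(importance)
```

In this toy data set the first split is on WINDOW and the second on CPU, so WINDOW gets the largest share and NODELAY gets exactly 0. This mirrors how the tree is read: the first splits are the most significant decisions.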
Max/min and means per variable
The next lines show the combination of variable values that gave the best (Max) and the worst (Min) result.
Then, for each variable, the mean of the result (in this case the throughput) is given for each value of that variable.
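As a rough illustration of what these lines compute, the sketch below derives the best run, the worst run and the per-value means from a list of runs. The data and the function name means_per_variable are hypothetical, and the actual NPF code may be organised differently.

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical runs: (variable values, measured result). Not real NPF data.
runs = [
    ({"CPU": 1, "NODELAY": 0}, 10.0),
    ({"CPU": 1, "NODELAY": 1}, 12.0),
    ({"CPU": 2, "NODELAY": 0}, 20.0),
    ({"CPU": 2, "NODELAY": 1}, 22.0),
]

def means_per_variable(runs):
    """Average the result over all runs sharing the same value of a variable."""
    groups = defaultdict(lambda: defaultdict(list))
    for variables, result in runs:
        for name, value in variables.items():
            groups[name][value].append(result)
    return {name: {value: fmean(results) for value, results in by_value.items()}
            for name, by_value in groups.items()}

best = max(runs, key=lambda run: run[1])    # corresponds to the "Max" line
worst = min(runs, key=lambda run: run[1])   # corresponds to the "Min" line
means = means_per_variable(runs)
print(best, worst, means)
```

Note that each mean averages over every value of all the other variables, so a single number can hide a large spread. In the output above, for instance, the mean for a given WINDOW value mixes runs with 1 to 8 CPUs.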
Interactions with ANOVA
Finally, the last available statistic is the p-value of a two-way ANOVA test for each pair of variables.
It shows the possible interaction between each pair of variables. If the p-value is smaller than 0.05, there is a probable interaction. A value higher than 0.05 only means that no clear linear interaction between the variables was detected.
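To show what "interaction" means here, the following sketch computes the interaction F statistic of a balanced two-way ANOVA in plain Python. It is an illustration under assumptions, not NPF's code: the data is made up, the function name interaction_f is hypothetical, and every cell must hold the same number of replicates (at least two). Turning F into a p-value needs the F distribution, which the Python standard library does not provide, so the sketch stops at F. A large F corresponds to a small p-value.

```python
from statistics import fmean

def interaction_f(cells):
    """Interaction F statistic for a balanced two-way design.

    cells maps (level_a, level_b) to a list of replicated measurements.
    """
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))  # replicates per cell
    grand = fmean(y for ys in cells.values() for y in ys)
    cell_mean = {key: fmean(ys) for key, ys in cells.items()}
    a_mean = {a: fmean(cell_mean[(a, b)] for b in b_levels) for a in a_levels}
    b_mean = {b: fmean(cell_mean[(a, b)] for a in a_levels) for b in b_levels}
    # What remains of each cell mean once both main effects are removed.
    ss_inter = n * sum(
        (cell_mean[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
        for a in a_levels for b in b_levels)
    # Residual variation of the replicates inside each cell.
    ss_error = sum((y - cell_mean[key]) ** 2
                   for key, ys in cells.items() for y in ys)
    df_inter = (len(a_levels) - 1) * (len(b_levels) - 1)
    df_error = len(a_levels) * len(b_levels) * (n - 1)
    return (ss_inter / df_inter) / (ss_error / df_error)

# Made-up (CPU, window) cells. Additive case: one more CPU always adds 10.
additive = {(1, "small"): [10, 12], (2, "small"): [20, 22],
            (1, "large"): [30, 32], (2, "large"): [40, 42]}
# Interacting case: one more CPU only helps when the window is large.
interacting = {(1, "small"): [10, 12], (2, "small"): [10, 12],
               (1, "large"): [30, 32], (2, "large"): [60, 62]}

f_additive = interaction_f(additive)
f_interacting = interaction_f(interacting)
print(f_additive, f_interacting)
```

In the additive case the effect of one variable does not depend on the other and F is 0. In the second case the benefit of an extra CPU depends on the window size, which is exactly the kind of dependency a low p-value in the matrix points to.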
Correlation matrix for all parameters
All the other statistics are per-metric; the correlation matrix, in contrast, is built for all metrics at once. It shows the Pearson correlation between each factor and each observation.
The correlation matrix is printed textually but also generated as a picture.
Correlation matrices are symmetrical. In this case, it shows that the parameters have no correlation between themselves, but the interesting part is the correlation between factors and results. We find again a notion of importance of the factors towards the throughput: CPU (0.41), WINDOW (0.25) and PARALLEL (0.23) stand out.
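The zero correlation between parameters is what one would expect when every combination of values is run: the 6144 samples of the example match 8 × 8 × 16 × 3 × 2 combinations of PARALLEL, CPU, WINDOW, CONGESTION and NODELAY. The sketch below illustrates this with a small full-factorial design. The response formula is invented for the example and pearson is a hand-written helper, not an NPF function.

```python
from itertools import product
from math import sqrt
from statistics import fmean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Full-factorial design: every PARALLEL value is run with every CPU value.
design = list(product([1, 2, 4, 8], [1, 2, 3]))
parallel = [p for p, _ in design]
cpu = [c for _, c in design]
# Invented response: CPU weighs more than PARALLEL.
throughput = [3.0 * c + 1.0 * p for p, c in design]

r_factors = pearson(parallel, cpu)
r_cpu = pearson(cpu, throughput)
r_parallel = pearson(parallel, throughput)
print(r_factors, r_cpu, r_parallel)
```

The two factors are uncorrelated by construction, while both correlate positively with the result. Keep in mind that Pearson correlation only captures linear relations, so a factor with a step-like effect (such as WINDOW in the output above) can have a modest coefficient despite a strong influence.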