Cluster operations

NPF is built to distribute scripts over a cluster of (possibly virtual) machines.

Cluster configuration

For simple test cases, the argument --cluster client=user@server is enough to run the client role on the machine server with the username user. However, to define more parameters for the server, such as the NIC order and interface names, you must use a .node file. Typically, you create a file named server.node in a folder named cluster alongside your NPF test files; it contains a few configuration variables for the machine named server.

Note: all communication is done through SSH. You should set up passwordless connections using SSH keys.

Three global parameters are supported:

addr=full_address_of_node //if unset, the node name is used
path=path/to/npf
user=user_for_ssh_connection

For instance, using --cluster client=user01@server01.network.edu is the same as using --cluster client=server01 with the following server01.node file:

addr=server01.network.edu
user=user01

Syncing the NPF folder across your cluster improves performance, as it avoids “rsyncing” dependencies and scripts for each test. One suggested option is to use an NFS share; another is to run sshfs on each node. If neither is possible, add the nfs=0 global parameter in the repo file.

Test files can use special variables such as ${role:0:ip}, which is replaced by the IP address of NIC 0 on the node that runs the given role. The allowed types, such as ip, are defined below. Currently NPF can neither read a NIC’s IP or MAC address automatically nor ensure a specific NIC order. By default, each node has 32 randomly generated NICs, referenced 0 to 31, with ifname eth%d, a random MAC address, and a random IP address in 10.0.0.0/8.
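
As a minimal sketch of this substitution, the fragment below has a client ping NIC 0 of the node that runs a server role. The role names client and server and the ping command are illustrative assumptions, not taken from an existing testie.

```
%script@client
# The variable below is replaced with the IP address of NIC 0
# on the node mapped to the "server" role
ping -c 4 ${server:0:ip}
```

With the default randomly generated NICs, the substituted address would be a random one in 10.0.0.0/8, so a test like this is only meaningful once the ip of NIC 0 is set in the server's .node file.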

Most testies reference NIC 0 as the first data plane NIC used to run the test. You should therefore list your data plane NICs first, and read each testie’s %info section to understand topology specifics.

NIC parameters are defined in the format N:type, where N is a NIC reference number and type is one of the following:

ifname=interface_name

mac=ma:ca:dd:rr:es:s_

pci=0000:00:00.0

ip=static_address

For tests using the standard networking stack, setting the ip and the ifname is enough: each script can then reference the other nodes’ IP addresses and run tools such as ifconfig with the ifname.

For tests using DPDK, and more generally L2 tests, the PCI address and MAC address should be defined.

See cluster01.node.sample for an example. Note that localhost.node is the default node used when roles are not defined or not mapped.
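
To make the format concrete, here is a sketch of what a cluster/server01.node file could look like for a machine with a single data plane NIC. Every value (address, user, path, interface name, PCI address, MAC, IP) is a hypothetical placeholder, and the // comment lines assume the same comment syntax as in the list of global parameters above; cluster01.node.sample remains the reference.

```
// Global parameters
addr=server01.network.edu
user=user01
path=/home/user01/npf

// NIC 0: first data plane NIC
0:ifname=enp3s0f0
0:pci=0000:03:00.0
0:mac=02:00:00:00:00:01
0:ip=10.100.0.1
```

With such a file in place, --cluster server=server01 would map the server role to this machine, and a test file could refer to its address as ${server:0:ip}.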

Running a single script on multiple machines

You can assign multiple machines to the same role, for example by running --cluster client=user01@server01.network.edu client=user01@server02.network.edu.

The script runs on each machine, so this example has multiple client machines. The constant $NPF_NODE_MAX can be used inside scripts and files to get the number of machines assigned to the role of the script or file. Each machine is assigned an increasing ID, available through $NPF_NODE_ID.

The number of machines assigned to a specific role can be retrieved via ${role:node}.
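
As a hedged illustration of these variables, assume several machines are mapped to a role named client and one machine to a role named server (both role names are illustrative). The sketch below prints the per-machine ID on each client and the client count on the server.

```
%script@client
# Runs once on every machine mapped to the client role
echo "Client ${NPF_NODE_ID} out of ${NPF_NODE_MAX}"

%script@server
# Uses the role:node form described above, with the role set to client
echo "Expecting traffic from ${client:node} client machines"
```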

The following example shows how multiple clients can be configured with different IP addresses and use the WRK module to generate HTTP requests towards a common server. This is useful to create more load on a device under test.

%variables
WRK_HOST=10.100.0.100

%init@client
ifconfig ${self:0:ifname} 10.100.0.${NPF_NODE_ID} netmask 255.255.255.0

%import@client wrk2

Accessing node information from Jinja templates

When using the jinja keyword in a script section, you can call the get_nodes(role) function to retrieve the list of nodes assigned to a given role. This is useful when a script on one role needs to reference properties (e.g. IP addresses) of nodes assigned to another role.

The function returns the list of node objects for the given role. Each node exposes its NIC information via get_nic(index), which provides attributes like ip, mac, ifname, and pci.

For example, the following test has multiple clients and a server that iterates over all client nodes to print their IP addresses:

%script@client autokill=false
echo "Client ${NPF_NODE_ID}"

%script@server jinja autokill=false
{% for client in get_nodes("client") %}
echo "Client IP is {{client.get_nic(0).ip}}"
{% endfor %}

Run with: npf_run --test mytest.npf --cluster client=user@machine1 client=user@machine2

See Meta-scripting with jinja for more details on using Jinja templating in NPF scripts.

Running the same script multiple times on each machine

It can be useful to run the same script multiple times in parallel on each machine of the same role, for instance when the software does not support multi-threading, or to use multiple network namespaces.

To run the “client” scripts 16 times, use --cluster client=user01@server01.network.edu,multi=16. As when running a script on multiple nodes, ${NPF_MULTI_MAX} holds the number of times each script runs, and ${NPF_MULTI_ID} holds an increasing ID from 1 to $NPF_MULTI_MAX for each instance.

The script declaration must be suffixed with -* so that the script runs for each multi value, e.g.:

%script@client-*
echo "Script $NPF_MULTI_ID over $NPF_MULTI_MAX"

You can also append mode=netns to run each script inside a different network namespace, e.g. --cluster client=user01@server01.network.edu,multi=16,mode=netns. This is equivalent to running each of the “client” scripts with ip netns exec npfns$N …, with N varying from 1 to 16. The namespace must be created in an init script with ip netns add npfns${NPF_MULTI_ID}. See the modules/wrk-nsdelay.npf example, which supports both multiple nodes and the “multi” feature, to simulate many different clients with a per-namespace link delay emulated using netem.