Veth Devices, Network Namespaces and Open vSwitch
Posted: February 4, 2018 Filed under: Networking, Unix
It’s useful to be able to set up miniature networks on a Linux machine, for development, testing or just for fun. Here we use veth devices and network namespaces to create a small virtual network, connected together with an Open vSwitch instance. I’m using a Raspberry Pi 3 for this, as it’s less inconvenient when it goes wrong, but I don’t think anything is Pi specific (and I certainly wouldn’t recommend a Pi for serious routing applications).
A veth device pair is a virtual ethernet cable: packets sent on one end come out the other (and vice versa of course):
$ sudo ip link add veth0 type veth peer name veth1
$ ip link show type veth
4: veth1@veth0: mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 92:e3:6f:51:b7:96 brd ff:ff:ff:ff:ff:ff
5: veth0@veth1: mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether aa:4c:fd:e3:cc:a1 brd ff:ff:ff:ff:ff:ff
I could assign an IP to both ends and try to send traffic through the link, but since the system knows about both ends, the traffic would get sent directly to the destination interface. Instead, I need to hide one end in a network namespace. Each namespace has a set of interfaces, routing tables etc. that are private to that namespace. Initially everything is in the global namespace. We can create a new namespace (these are often named after colours) with the ip command:
$ sudo ip netns add blue
Now put the “lower” end of the veth device into the new namespace:
$ sudo ip link set veth1 netns blue
veth1 is no longer visible in the global namespace:
$ ip link show type veth
5: veth0@if4: mtu 1500 qdisc noqueue state LOWERLAYERDOWN mode DEFAULT group default qlen 1000
    link/ether aa:4c:fd:e3:cc:a1 brd ff:ff:ff:ff:ff:ff link-netnsid 0
but we can see it in the blue namespace:
$ sudo ip netns exec blue ip link show
1: lo: mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
4: veth1@if5: mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 92:e3:6f:51:b7:96 brd ff:ff:ff:ff:ff:ff link-netnsid 0
Note that veth0 is in state LOWERLAYERDOWN because veth1 is now DOWN (as is the loopback interface lo in the namespace). We can now assign addresses to veth0 and veth1 and make sure all the interfaces are up:
$ sudo ip addr add 10.0.0.10/24 dev veth0
$ sudo ip netns exec blue ip addr add 10.0.0.1/24 dev veth1
$ sudo ip link set veth0 up
$ sudo ip netns exec blue ip link set veth1 up
$ sudo ip netns exec blue ip link set lo up
Now we can ping the other end:
$ ping -c1 10.0.0.1
PING 10.0.0.1 (10.0.0.1) 56(84) bytes of data.
64 bytes from 10.0.0.1: icmp_seq=1 ttl=64 time=0.197 ms
--- 10.0.0.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.197/0.197/0.197/0.000 ms
and tcpdump confirms that traffic really is being sent over the veth link:
$ sudo tcpdump -i veth0 icmp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on veth0, link-type EN10MB (Ethernet), capture size 262144 bytes
16:36:50.390395 IP 10.0.0.10 > 10.0.0.1: ICMP echo request, id 20113, seq 1, length 64
16:36:50.390714 IP 10.0.0.1 > 10.0.0.10: ICMP echo reply, id 20113, seq 1, length 64
That’s all we need for the most basic setup. Now we’ll add a second namespace and connect everything together with a switch – we could use a normal Linux bridge for this, but it’s more fun to use Open vSwitch and later use some very basic OpenFlow commands to set up a learning switch.
For a more complicated setup it’s usually a good idea to enable IP forwarding, so while we remember:
$ sudo bash -c "echo 1 > /proc/sys/net/ipv4/ip_forward"
And we now want another veth pair and another namespace:
$ sudo ip netns add red
$ sudo ip link add veth2 type veth peer name veth3
$ sudo ip link set veth3 netns red
$ sudo ip netns exec red ip addr add 10.0.0.2/24 dev veth3
$ sudo ip netns exec red ip link set lo up
$ sudo ip netns exec red ip link set veth3 up
Let’s remove the address assigned above to veth0 (we are going to put veth0 in the bridge anyway, but explicitly removing the address is tidier and prevents confusion later):
$ sudo ip addr del 10.0.0.10/24 dev veth0
Check we have openvswitch installed, on Ubuntu:
$ sudo apt-get install openvswitch-switch
and see what is already running:
$ sudo ovs-vsctl show
b494c304-46b7-4ff8-9fa4-581952fae2f1
    ovs_version: "2.3.0"
Add a new bridge:
$ sudo ovs-vsctl add-br ovsbr0
$ sudo ovs-vsctl show
b494c304-46b7-4ff8-9fa4-581952fae2f1
    Bridge "ovsbr0"
        Port "ovsbr0"
            Interface "ovsbr0"
                type: internal
    ovs_version: "2.3.0"
If you’ve been experimenting, remove the upper veths from any bridge they might be in:
$ sudo ip link set veth0 nomaster
$ sudo ip link set veth2 nomaster
and add to the OVS bridge:
$ sudo ovs-vsctl add-port ovsbr0 veth0
$ sudo ovs-vsctl add-port ovsbr0 veth2
$ sudo ovs-vsctl show
b494c304-46b7-4ff8-9fa4-581952fae2f1
    Bridge "ovsbr0"
        Port "veth0"
            Interface "veth0"
        Port "veth2"
            Interface "veth2"
        Port "ovsbr0"
            Interface "ovsbr0"
                type: internal
    ovs_version: "2.3.0"
$ ip link show type veth
5: veth0@if4: mtu 1500 qdisc noqueue master ovs-system state UP mode DEFAULT group default qlen 1000
    link/ether 4a:b7:05:b1:29:d6 brd ff:ff:ff:ff:ff:ff link-netnsid 0
7: veth2@if6: mtu 1500 qdisc noqueue master ovs-system state UP mode DEFAULT group default qlen 1000
    link/ether 5e:a7:f8:99:d4:ba brd ff:ff:ff:ff:ff:ff link-netnsid 1
The master for veth0 and veth2 is now the ovs-system device. Note that both links are UP.
Now we are ready to go (we set up everything within the namespaces earlier):
$ sudo ip netns exec blue ping -c1 10.0.0.2
PING 10.0.0.2 (10.0.0.2) 56(84) bytes of data.
64 bytes from 10.0.0.2: icmp_seq=1 ttl=64 time=0.922 ms
--- 10.0.0.2 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.922/0.922/0.922/0.000 ms
Let’s try external connectivity:
$ sudo ip netns exec blue ping -c1 8.8.8.8
connect: Network is unreachable
Looks like a routing problem:
$ sudo ip netns exec blue ip route
10.0.0.0/24 dev veth1 proto kernel scope link src 10.0.0.1
There is no default route, let’s add one:
$ sudo ip netns exec blue ip route add default via 10.0.0.254
and this will need an address on the bridge itself:
$ sudo ip addr add 10.0.0.254/24 dev ovsbr0
Try again:
$ sudo ip netns exec blue ping -c1 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
--- 8.8.8.8 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms
Better, we seem to be sending packets out of the namespace and this is confirmed by tcpdump on the bridge interface:
$ sudo tcpdump -i ovsbr0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ovsbr0, link-type EN10MB (Ethernet), capture size 262144 bytes
13:20:19.777324 IP 10.0.0.1 > google-public-dns-a.google.com: ICMP echo request, id 5667, seq 1, length 64
13:20:24.831303 ARP, Request who-has 10.0.0.254 tell 10.0.0.1, length 28
13:20:24.831565 ARP, Reply 10.0.0.254 is-at 06:d4:34:9b:26:42 (oui Unknown), length 28
And we can see the packet exiting on wlan0:
$ sudo tcpdump -i wlan0 icmp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on wlan0, link-type EN10MB (Ethernet), capture size 262144 bytes
13:21:54.727306 IP 10.0.0.1 > google-public-dns-a.google.com: ICMP echo request, id 5697, seq 1, length 64
but sadly the source address is still in the 10.0.0.0 subnet and it’s not surprising that the Google DNS server isn’t responding.
Now, part of this exercise is to find out about Open vSwitch and its capabilities, and I would hope that they include setting up simple NAT, but I have no idea how to do that right now. So we’ll just use iptables to set up NAT and make sure forwarding is enabled:
$ sudo iptables -t nat -A POSTROUTING -s 10.0.0.0/24 -o wlan0 -j MASQUERADE
$ sudo iptables -F
$ sudo iptables -P FORWARD ACCEPT
Now all is well:
$ sudo ip netns exec blue ping -n -c1 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=57 time=19.4 ms
--- 8.8.8.8 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 19.434/19.434/19.434/0.000 ms
This assumes we have a default forwarding policy of ACCEPT. It may be prudent to be more selective:
$ sudo iptables -P FORWARD DROP
$ sudo iptables -A FORWARD -s 10.0.0.0/24 -o wlan0 -j ACCEPT
$ sudo iptables -A FORWARD -d 10.0.0.0/24 -i wlan0 -j ACCEPT
so we are just prepared to forward to and from the switch network.
We can add an external IP to the switch. The main connection to my Pi 3 is through wlan0 and so I’d like to leave that alone, so let’s put eth0 into the switch:
$ sudo ovs-vsctl add-port ovsbr0 eth0
Now attach a network cable, e.g. directly to another laptop (crossover cables are largely a thing of the past), and configure an IP address in our 10.0.0.0/24 subnet:
$ sudo ip addr add 10.0.0.100/24 dev eth0
and we have connectivity out of our box – the external IP is now the switch address.
Finally, since we have been doing so well, let’s program our OVS bridge to be a learning switch. See, for example, http://openvswitch.org/support/dist-docs-2.5/tutorial/Tutorial.md.html for further information.
First, turn off the default flow rules:
$ sudo ovs-vsctl set Bridge ovsbr0 fail-mode=secure
When we created the OVS bridge earlier, it started with a single default flow rule whose action is NORMAL, which makes the bridge behave like a conventional MAC-learning Ethernet switch. Setting “fail-mode=secure” means there are no default rules so all packets are dropped.
If we have been experimenting already, it’s a good idea to clear the flow table before going further:
ovs-ofctl del-flows ovsbr0
Now set up the learning rules. The idea is that when a packet comes in from a particular MAC address, the switch remembers which interface the packet arrived on, so when it wants to send a packet to that address, it can just send it on the interface recorded earlier. We can do a similar thing with the local interface so we don’t need to configure the rules to handle whatever the local MAC address is (maybe there is a better way to handle the local interface – comments welcome).
ovs-ofctl add-flow ovsbr0 "table=0, priority=60, in_port=LOCAL, actions=learn(table=10, NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[], load:0xffff->NXM_NX_REG0[0..15]), resubmit(,1)"
ovs-ofctl add-flow ovsbr0 "table=0, priority=50, actions=learn(table=10, NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[], load:NXM_OF_IN_PORT[]->NXM_NX_REG0[0..15]), resubmit(,1)"
The first rule says that when a packet originating locally is received, i.e. one that is being sent from a local process, add a rule (to table 10) that says that when an incoming packet is received, addressed to the same MAC address, put the value 0xFFFF in the lower 16 bits of register 0. The second is the same but for packets received from the other interfaces in the switch: add a rule that puts the interface number in register 0. Having added a rule, processing continues with table 1.
In table 1, we have:
ovs-ofctl add-flow ovsbr0 "table=1 priority=99 dl_dst=01:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,2)"
ovs-ofctl add-flow ovsbr0 "table=1 priority=50 actions=resubmit(,10), resubmit(,2)"
The first rule sends packets with a multicast or broadcast ethernet address (the low bit of the first octet is set) directly through to table 2. The second rule goes through table 10 first – the idea being that if the packet is being sent to a known MAC address, table 10 will put the number of the interface in register 0, or 0xffff if it’s the local MAC address, or 0 if the interface hasn’t been learned yet.
Finally table 2 just sends packets off to the right place using the register 0 values:
ovs-ofctl add-flow ovsbr0 "table=2 reg0=0 actions=LOCAL,1,2,3"
ovs-ofctl add-flow ovsbr0 "table=2 reg0=1 actions=1"
ovs-ofctl add-flow ovsbr0 "table=2 reg0=2 actions=2"
ovs-ofctl add-flow ovsbr0 "table=2 reg0=3 actions=3"
ovs-ofctl add-flow ovsbr0 "table=2 reg0=0xffff actions=LOCAL"
To inspect all the rule tables (including any rules that have been added to table 10 by the learn actions in table 0):
$ sudo ovs-ofctl dump-flows ovsbr0
Now things should work much as they did before. If not, see the link above for further information on OVS testing and debugging.
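To make the three-table pipeline a little more concrete, here is a rough C++ model of what the rules above are intended to do. This is only a sketch for reasoning about the logic, not OVS code: the LearningSwitch type, the process function and the use of 0xffff as a stand-in port number for LOCAL are all my own assumptions, and MAC addresses are simply packed into the low 48 bits of a 64-bit integer.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical marker for the LOCAL port, mirroring the 0xffff loaded into reg0.
constexpr uint16_t LOCAL_MARK = 0xffff;

struct LearningSwitch {
    std::map<uint64_t, uint16_t> table10;   // learned MAC -> port ("table 10")
    std::vector<uint16_t> ports{1, 2, 3};   // same port numbers as the table 2 rules

    // Returns the ports a frame is sent out of.
    std::vector<uint16_t> process(uint16_t in_port, uint64_t src, uint64_t dst) {
        assert(in_port != 0);               // 0 is reserved for "not learned yet"

        // Table 0: remember which port the source MAC was seen on.
        table10[src] = in_port;

        // Table 1: multicast/broadcast skips the lookup, so reg0 stays 0.
        uint16_t reg0 = 0;
        const bool group = ((dst >> 40) & 1) != 0;  // low bit of the first octet
        if (!group) {
            auto it = table10.find(dst);
            if (it != table10.end()) reg0 = it->second;
        }

        // Table 2: reg0 == 0 floods, anything else goes to the learned port.
        std::vector<uint16_t> wanted;
        if (reg0 == 0) {
            wanted.push_back(LOCAL_MARK);
            wanted.insert(wanted.end(), ports.begin(), ports.end());
        } else {
            wanted.push_back(reg0);
        }

        // OpenFlow does not send a packet back out of its ingress port
        // unless that is requested explicitly, so drop in_port here.
        std::vector<uint16_t> out;
        for (uint16_t port : wanted) {
            if (port != in_port) out.push_back(port);
        }
        return out;
    }
};
```

The first frame to an unknown destination is flooded; once the reply has been seen, traffic in both directions goes out of a single port.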
IPv6 TUN reflector
Posted: August 25, 2013 Filed under: C++, Networking
We had a look at a simple network simulation using TUN a couple of posts ago: https://matthewarcus.wordpress.com/2013/05/18/fun-with-tun/.
Let’s now have a look at getting it all working with IPv6. Changing the address swapping code is fairly straightforward, and for some extra points we’ll add a facility for printing out the source and destination addresses of each packet forwarded. The correct function for doing this is now inet_ntop, which works for both v4 and v6 addresses.
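As a small illustration of inet_ntop handling both families, here is a sketch of a helper that formats a raw address from a packet buffer. The name format_addr is hypothetical (it is not in the reflect code); the only assumption is that the buffer holds 4 bytes for AF_INET or 16 bytes for AF_INET6 in network order.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <cassert>
#include <cstdint>
#include <string>

// Format a raw address of the given family (AF_INET or AF_INET6).
// Returns an empty string if inet_ntop rejects the family.
std::string format_addr(int family, const uint8_t *raw) {
    char buf[INET6_ADDRSTRLEN];  // big enough for either family
    const char *s = inet_ntop(family, raw, buf, sizeof(buf));
    return s ? std::string(s) : std::string();
}
```

A single INET6_ADDRSTRLEN buffer is enough for both cases, since it is the larger of the two limits.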
Here are the main changes [see https://github.com/matthewarcus/stuff/tree/master/tun for full code]:
It’s convenient to define a 32-bit swap function:
void swap32(uint8_t *p, uint8_t *q) {
  uint32_t t = get32(p);
  put32(p,get32(q));
  put32(q,t);
}
and our main function is now:
#define SRC_OFFSET4 12
#define DST_OFFSET4 16
#define SRC_OFFSET6 8
#define DST_OFFSET6 24

void reflect(uint8_t *p, size_t nbytes) {
  uint8_t version = p[0] >> 4;
  switch (version) {
  case 4:
    if (verbosity > 0) {
      char fromaddr[INET_ADDRSTRLEN];
      char toaddr[INET_ADDRSTRLEN];
      inet_ntop(AF_INET, p+SRC_OFFSET4, fromaddr, sizeof(fromaddr));
      inet_ntop(AF_INET, p+DST_OFFSET4, toaddr, sizeof(toaddr));
      printf("%zu: %s->%s\n", nbytes, fromaddr, toaddr);
    }
    // Swap source and dest of an IPv4 packet
    // No checksum recalculation is necessary
    swap32(p+SRC_OFFSET4,p+DST_OFFSET4);
    break;
  case 6:
    if (verbosity > 0) {
      char fromaddr[INET6_ADDRSTRLEN];
      char toaddr[INET6_ADDRSTRLEN];
      inet_ntop(AF_INET6, p+SRC_OFFSET6, fromaddr, sizeof(fromaddr));
      inet_ntop(AF_INET6, p+DST_OFFSET6, toaddr, sizeof(toaddr));
      printf("%zu: %s->%s\n", nbytes, fromaddr, toaddr);
    }
    // Swap source and dest of an IPv6 packet
    // No checksum recalculation is necessary
    for (int i = 0; i < 4; i++) {
      swap32(p+SRC_OFFSET6+4*i,p+DST_OFFSET6+4*i);
    }
    break;
  default:
    fprintf(stderr, "Unknown protocol %u\n", version);
    exit(0);
  }
}
Setting up the v6 addresses for our new interface is a little different. As before, we bring the interface up:
$ ip link set tun0 up
Now, we need to add a link-local address, mandatory for all IPv6 interfaces:
$ ip -6 addr add fe80::1/64 dev tun0
We can use ping6 to try this out, localizing the request to the tun0 interface:
$ ping6 -I tun0 fe80::a617:31ff:fe5a:334f
PING fe80::a617:31ff:fe5a:334f(fe80::a617:31ff:fe5a:334f) from fe80::1 tun0: 56 data bytes
64 bytes from fe80::a617:31ff:fe5a:334f: icmp_seq=1 ttl=64 time=0.164 ms
...
This address is also the link-local address of my Wifi interface, but there is no ambiguity as we must specify which interface to use.
We can also add a private network address. IPv6 does not have the same concept of a private network as IPv4; instead we define Unique Local Addresses: prepend 0xfd to a random 10 digit hex global id and add an arbitrary 4 digit subnet identifier. Any random global id is fine – the idea is to ensure that any given network will have a different id from any other private network it is likely to come in contact with – we don’t need to worry about true global uniqueness though the Birthday Paradox tells us that we are likely to have a potential conflict with only about a million private networks (there might be lots of people out there with the same name and birthday as you, but you are unlikely to meet one of them at random).
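The “about a million” figure can be checked with the usual birthday approximation: with N = 2^40 possible global ids and n networks, the chance of at least one clash is roughly 1 - exp(-n(n-1)/(2N)). A minimal sketch follows; the function name is my own, and the formula is the standard approximation rather than an exact count.

```cpp
#include <cassert>
#include <cmath>

// Approximate probability that at least two of n randomly chosen
// 40-bit global ids coincide (birthday approximation).
double ula_collision_probability(double n) {
    const double ids = std::ldexp(1.0, 40);  // 2^40 possible global ids
    return 1.0 - std::exp(-n * (n - 1.0) / (2.0 * ids));
}
```

By this approximation a million networks gives a bit over a one in three chance of some clash, and the even-odds point is a little above 1.2 million, so “about a million” is the right order of magnitude.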
We can generate our own random address, for example, using the method described in RFC 4193, or use /dev/urandom:
$ hexdump -v -e '/1 "%02x"' -n 5 /dev/urandom; echo
2acd2c8bc4
or just copy a random sequence from somewhere on the Internet, for example, the one used here:
$ ip -6 route add fd2a:cd2c:8bc4:0::/64 dev tun0
This adds a local network with a global id of 2acd2c8bc4 and a subnet id of 0.
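For illustration, here is how the pieces fit together when building such a prefix by hand: the fixed fd byte, then the five random bytes of the global id, then the 16-bit subnet id. The helper name ula_prefix64 is hypothetical, and the output is deliberately left in the same uncompressed style as the route command above.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Build the textual /64 prefix for a Unique Local Address from a
// 40-bit global id (5 bytes) and a 16-bit subnet id.
std::string ula_prefix64(const std::array<uint8_t, 5> &gid, uint16_t subnet) {
    char buf[32];  // "fdxx:xxxx:xxxx:xxxx::/64" needs 25 bytes with the NUL
    std::snprintf(buf, sizeof(buf), "fd%02x:%02x%02x:%02x%02x:%x::/64",
                  static_cast<unsigned>(gid[0]), static_cast<unsigned>(gid[1]),
                  static_cast<unsigned>(gid[2]), static_cast<unsigned>(gid[3]),
                  static_cast<unsigned>(gid[4]), static_cast<unsigned>(subnet));
    return std::string(buf);
}
```

Feeding in the bytes 2a cd 2c 8b c4 from the hexdump output with subnet 0 reproduces the prefix used in the route command.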
We can also define a larger subnet:
$ ip -6 route add fd2a:cd2c:8bc4:1100::/56 dev tun0
Now traffic to any IPv6 address of form fd2a:cd2c:8bc4:11xx:… will be sent to our TUN device:
$ ping6 fd2a:cd2c:8bc4:11ff::23
PING fd2a:cd2c:8bc4:11ff::23(fd2a:cd2c:8bc4:11ff::23) 56 data bytes
64 bytes from fd2a:cd2c:8bc4:11ff::23: icmp_seq=1 ttl=64 time=0.110 ms
...
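The “11xx” pattern is just a 56-bit prefix match: the first seven bytes of the address must equal those of the route. Here is a sketch of that test, assuming textual addresses parsed with inet_pton; in_prefix6 is a hypothetical helper, and the kernel’s real lookup additionally picks the longest matching prefix among all routes.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <cassert>
#include <cstdint>
#include <cstring>

// Return true if addr lies inside prefix/len (both textual IPv6 addresses).
bool in_prefix6(const char *addr, const char *prefix, unsigned len) {
    in6_addr a, p;
    if (len > 128) return false;
    if (inet_pton(AF_INET6, addr, &a) != 1) return false;
    if (inet_pton(AF_INET6, prefix, &p) != 1) return false;

    const unsigned full = len / 8;  // whole bytes covered by the prefix
    const unsigned rest = len % 8;  // leftover bits in the next byte
    if (std::memcmp(a.s6_addr, p.s6_addr, full) != 0) return false;
    if (rest == 0) return true;

    const uint8_t mask = static_cast<uint8_t>(0xff << (8 - rest));
    return (a.s6_addr[full] & mask) == (p.s6_addr[full] & mask);
}
```

So fd2a:cd2c:8bc4:11ff::23 matches the /56 route, while an address in fd2a:cd2c:8bc4:12xx would not.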
Indeed, we can define all subnets for another global id:
$ hexdump -e '/1 "%02x"' -n 5 /dev/urandom; echo
40bd2f7ba0
$ sudo ip -6 route add fd40:bd2f:7ba0::/48 dev tun0
Just for interest, here’s our entire IPv6 routing table:
$ route -A inet6
Kernel IPv6 routing table
Destination Next Hop Flag Met Ref Use If
fd2a:cd2c:8bc4::/64 :: U 1024 0 0 tun0
fd2a:cd2c:8bc4:1100::/56 :: U 1024 0 0 tun0
fd40:bd2f:7ba0::/48 :: U 1024 0 0 tun0
fe80::/64 :: U 256 0 0 wlan0
fe80::/64 :: U 256 0 0 tun0
::/0 :: !n -1 1 524 lo
::1/128 :: Un 0 1 35 lo
fe80::1/128 :: Un 0 1 10 lo
fe80::a617:31ff:fe5a:334f/128 :: Un 0 1 7 lo
ff00::/8 :: U 256 0 0 wlan0
ff00::/8 :: U 256 0 0 tun0
::/0 :: !n -1 1 524 lo
Finally, to set up a simple service to use IPv6:
In one terminal:
$ nc -l -6 9901
...
In another:
$ nc -6 fd2a:cd2c:8bc4:11ff::23 9901
...
and our logging now looks like this:
$ ./reflect -v
Capability CAP_NET_ADMIN: 1 0 1
Created tun device tun0
48: fe80::1->ff02::2
72: fe80::1->fd2a:cd2c:8bc4:11ff::23
72: fe80::1->fd2a:cd2c:8bc4:11ff::23
72: fe80::1->fd2a:cd2c:8bc4:11ff::23
72: fe80::1->fd2a:cd2c:8bc4:11ff::23
48: fe80::1->ff02::2
48: fe80::1->ff02::2
80: fe80::1->fd2a:cd2c:8bc4:11ff::23
80: fe80::1->fd2a:cd2c:8bc4:11ff::23
72: fe80::1->fd2a:cd2c:8bc4:11ff::23
79: fe80::1->fd2a:cd2c:8bc4:11ff::23
72: fe80::1->fd2a:cd2c:8bc4:11ff::23
72: fe80::1->fd2a:cd2c:8bc4:11ff::23
72: fe80::1->fd2a:cd2c:8bc4:11ff::23
72: fe80::1->fd2a:cd2c:8bc4:11ff::23
Those ff02::2 addresses are for IPv6 router discovery. The rest are two TCP flows, one in each direction (we can get more detail from Wireshark, in particular, the relevant port numbers, but this gives the general idea).
Fun with TUN
Posted: May 18, 2013 Filed under: C, Networking
TUN devices are much used for virtualization, VPNs, network testing programs, etc. A TUN device is essentially a network interface that also exists as a user space file descriptor: data sent to the interface can be read from the file descriptor, and data written to the file descriptor emerges from the network interface.
Here’s a simple example of their use. We create a TUN device that simulates an entire network, with traffic to each network address just routed back to the original host.
For a complete program, see:
https://github.com/matthewarcus/stuff/blob/master/tun/reflect.cpp
First create your TUN device. This is fairly standard; most public code seems to be derived from Maxim Krasnyansky’s:
https://www.kernel.org/doc/Documentation/networking/tuntap.txt
and our code is no different:
int tun_alloc(char *dev) {
  assert(dev != NULL);
  int fd = open("/dev/net/tun", O_RDWR);
  CHECKFD(fd);
  struct ifreq ifr;
  memset(&ifr, 0, sizeof(ifr));
  ifr.ifr_flags = IFF_TUN | IFF_NO_PI;
  strncpy(ifr.ifr_name, dev, IFNAMSIZ);
  CHECKSYS(ioctl(fd, TUNSETIFF, (void *) &ifr));
  strncpy(dev, ifr.ifr_name, IFNAMSIZ);
  return fd;
}
We want a TUN device (rather than TAP, essentially the same thing but at the ethernet level) and we don’t want packet information at the moment. We copy the name of the allocated device to the char array given as a parameter.
Now all our program needs to do is create the TUN device and sit in a loop copying packets:
int main(int argc, char *argv[]) {
  char dev[IFNAMSIZ+1];
  memset(dev,0,sizeof(dev));
  if (argc > 1) strncpy(dev,argv[1],sizeof(dev)-1);
  // Allocate the tun device
  int fd = tun_alloc(dev);
  if (fd < 0) exit(0);
  uint8_t buf[2048];
  while(true) {
    // Sit in a loop, read a packet from fd, reflect
    // addresses and write back to fd.
    ssize_t nread = read(fd,buf,sizeof(buf));
    CHECK(nread >= 0);
    if (nread == 0) break;
    reflect(buf,nread);
    ssize_t nwrite = write(fd,buf,nread);
    CHECK(nwrite == nread);
  }
}
The TUN mechanism ensures that we get exactly one packet for each read, we don’t need to worry about fragmentation, and we just send each packet back with the source and destination IPs swapped:
static inline void put32(uint8_t *p, size_t offset, uint32_t n) {
  memcpy(p+offset,&n,sizeof(n));
}

static inline uint32_t get32(uint8_t *p, size_t offset) {
  uint32_t n;
  memcpy(&n,p+offset,sizeof(n));
  return n;
}

void reflect(uint8_t *p, size_t nbytes) {
  (void)nbytes;
  uint8_t version = p[0] >> 4;
  switch (version) {
  case 4:
    break;
  case 6:
    fprintf(stderr, "IPv6 not implemented yet\n");
    exit(0);
  default:
    fprintf(stderr, "Unknown protocol %u\n", version);
    exit(0);
  }
  uint32_t src = get32(p,12);
  uint32_t dst = get32(p,16);
  put32(p,12,dst);
  put32(p,16,src);
}
We don’t need to recalculate the header checksum as it doesn’t get changed by just swapping two 32 bit segments.
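The reason is that the IPv4 header checksum is a ones-complement sum of the header’s 16-bit words, and a sum doesn’t care what order its terms come in. Here is a small sketch that checks this on a hand-made header; inet_checksum and swap_addrs4 are my own illustrative helpers (not from reflect.cpp), and the checksum routine assumes an even number of bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Internet checksum (ones-complement sum of big-endian 16-bit words,
// then inverted). Assumes nbytes is even, as an IPv4 header always is.
uint16_t inet_checksum(const uint8_t *p, size_t nbytes) {
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < nbytes; i += 2) {
        sum += (static_cast<uint32_t>(p[i]) << 8) | p[i + 1];
    }
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);  // fold the carries back in
    }
    return static_cast<uint16_t>(~sum);
}

// Swap the 4-byte source (offset 12) and destination (offset 16)
// addresses of an IPv4 header in place.
void swap_addrs4(uint8_t *hdr) {
    uint8_t tmp[4];
    std::memcpy(tmp, hdr + 12, 4);
    std::memcpy(hdr + 12, hdr + 16, 4);
    std::memcpy(hdr + 16, tmp, 4);
}
```

Computing the checksum before and after swap_addrs4 gives the same value. The same argument covers the TCP and UDP checksums, which include both addresses in a pseudo-header that is summed in the same way.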
Handling IPv6 is left as an exercise for the reader (we just need to use a different offset and address size I think).
In this day and age, security should be prominent in our minds, particularly for long-running programs like our TUN server, so for extra points, let’s add in some capability processing.
(You might need to install a libcap-dev package for this to work, for example, with “sudo apt-get install libcap-dev” and link with -lcap).
Once we have started up, we should check if we have the required capability, we just require CAP_NET_ADMIN to be permitted:
cap_t caps = cap_get_proc();
CHECK(caps != NULL);
cap_value_t cap = CAP_NET_ADMIN;
const char *capname = STRING(CAP_NET_ADMIN);
cap_flag_value_t cap_permitted;
CHECKSYS(cap_get_flag(caps, cap, CAP_PERMITTED, &cap_permitted));
if (!cap_permitted) {
  fprintf(stderr, "%s not permitted, exiting\n", capname);
  exit(0);
}
and then make effective what we require:
CHECKSYS(cap_clear(caps));
CHECKSYS(cap_set_flag(caps, CAP_PERMITTED, 1, &cap, CAP_SET));
CHECKSYS(cap_set_flag(caps, CAP_EFFECTIVE, 1, &cap, CAP_SET));
CHECKSYS(cap_set_proc(caps));
Finally, after creating our TUN object, before entering our main loop, we can relinquish our extra privileges altogether:
CHECKSYS(cap_clear(caps));
CHECKSYS(cap_set_proc(caps));
CHECKSYS(cap_free(caps));
For completeness, here are the error checking macros used above:
#define CHECKAUX(e,s) \
  ((e)? \
   (void)0: \
   (fprintf(stderr, "'%s' failed at %s:%d - %s\n", \
            s, __FILE__, __LINE__,strerror(errno)), \
    exit(0)))
#define CHECK(e) (CHECKAUX(e,#e))
#define CHECKSYS(e) (CHECKAUX((e)==0,#e))
#define CHECKFD(e) (CHECKAUX((e)>=0,#e))
#define STRING(e) #e
Of course, production code will want to do something more sophisticated than calling exit(0) when an error occurs…
To use, compile for example with:
g++ -W -Wall -O3 reflect.cpp -lcap -o reflect
We can set permissions for our new executable to include the relevant capability, so we don’t need to start it as root:
$ sudo setcap cap_net_admin+ep ./reflect
Actually start it:
$ ./reflect&
Capability CAP_NET_ADMIN: 1 0 1
Created tun device tun0
We now have an interface, but it isn’t configured:
$ ifconfig tun0
tun0 Link encap:UNSPEC HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
POINTOPOINT NOARP MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:500
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
With the interface running, set up networking:
$ sudo ip link set tun0 up
$ sudo ip addr add 10.0.0.1/8 dev tun0
Check all is well:
$ ifconfig tun0
tun0 Link encap:UNSPEC HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
inet addr:10.0.0.1 P-t-P:10.0.0.1 Mask:255.0.0.0
UP POINTOPOINT RUNNING NOARP MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:500
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
And try it out:
$ ping -c 1 10.0.0.41
PING 10.0.0.41 (10.0.0.41) 56(84) bytes of data.
64 bytes from 10.0.0.41: icmp_req=1 ttl=64 time=0.052 ms
--- 10.0.0.41 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.052/0.052/0.052/0.000 ms
Let’s check performance, firstly, a flood ping on the loopback device:
$ sudo ping -f -c10000 -s1500 127.0.0.1
PING 127.0.0.1 (127.0.0.1) 1500(1528) bytes of data.
--- 127.0.0.1 ping statistics ---
10000 packets transmitted, 10000 received, 0% packet loss, time 778ms
rtt min/avg/max/mdev = 0.003/0.006/0.044/0.002 ms, pipe 2, ipg/ewma 0.077/0.006 ms
compared to one through the TUN connection:
$ sudo ping -f -c10000 -s1500 10.0.0.100
PING 10.0.0.100 (10.0.0.100) 1500(1528) bytes of data.
--- 10.0.0.100 ping statistics ---
10000 packets transmitted, 10000 received, 0% packet loss, time 945ms
rtt min/avg/max/mdev = 0.022/0.032/3.775/0.038 ms, pipe 2, ipg/ewma 0.094/0.032 ms
Respectable. We have got ourselves a network!