After following the P4Lang P4 for dummies [ #002 ] article, you should now have a working P4 development environment.
Requirement
Overview
Let's start writing, compiling and running our first P4 program.
Article objective
This 3rd article proposes to write your first P4 program, based on the my_program.p4 specification from P4Lang P4 for dummies [ #001 ].
Diagram: my_program.p4
[ #003 ] - Cookbook
Verification
Conclusion
In this article you:
- wrote your first P4 program
- used p4c to compile it
- learned how to instantiate a virtual ethernet pair in order to bind it to simple_switch
- launched simple_switch and loaded your program on it
- set up a test environment using scapy
- and verified your program using a combination of scapy and tcpdump
P4Lang P4 for dummies [ #003 ] - key take-away
- my_program.p4 is written against the V1Model definition, which defines:
- a parsing stage
- a checksum verification stage
- an ingress packet processing control stage
- an egress packet processing control stage
- a checksum computation stage
- a deparser stage
V1Switch( prs_main(), ctl_verify_checksum(), ctl_ingress(), ctl_egress(), ctl_compute_checksum(), ctl_deprs() ) main;
It is described by the diagram below:
In a subsequent article we will dissect my_program.p4. As you could observe, P4 programming is quite intuitive: it is all about switching a packet based on the values of its header fields and of intrinsic metadata (like the packet ingress port).
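To make the stage order concrete, here is a minimal Python sketch that chains six functions in the same order as the V1Switch( ... ) main instantiation above and forwards a packet according to its ingress port. It is only a toy model: it is not P4, it is not the content of my_program.p4, and the port mapping (port 1 to port 2 and back) is an assumption made up for the illustration.

```python
# Toy Python model of the V1Model stage order. This is NOT P4 code and NOT the
# content of my_program.p4; the port map below is a hypothetical example.

FORWARDING = {1: 2, 2: 1}  # assumed rule: ingress port -> egress port


def prs_main(pkt, meta):
    meta["trace"].append("parser")  # a real parser would extract headers here


def ctl_verify_checksum(pkt, meta):
    meta["trace"].append("verify_checksum")


def ctl_ingress(pkt, meta):
    # Ingress control: pick the egress port from the ingress port metadata.
    meta["trace"].append("ingress")
    meta["egress_port"] = FORWARDING.get(meta["ingress_port"])


def ctl_egress(pkt, meta):
    meta["trace"].append("egress")


def ctl_compute_checksum(pkt, meta):
    meta["trace"].append("compute_checksum")


def ctl_deprs(pkt, meta):
    meta["trace"].append("deparser")  # a real deparser would re-emit headers


# Same order as the arguments of V1Switch(...) main.
PIPELINE = (prs_main, ctl_verify_checksum, ctl_ingress,
            ctl_egress, ctl_compute_checksum, ctl_deprs)


def v1switch(pkt, ingress_port):
    """Run one packet through the six stages and return its metadata."""
    meta = {"ingress_port": ingress_port, "egress_port": None, "trace": []}
    for stage in PIPELINE:
        stage(pkt, meta)
    return meta


if __name__ == "__main__":
    result = v1switch(b"\x00" * 14, ingress_port=1)
    print(result["egress_port"], result["trace"])
```

In this sketch a packet arriving on a port that is absent from the map keeps an egress port of None, which can be read as a drop.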
Requirement
Overview
To start programming in P4, we will first set up a P4 development environment using the open-source software of the P4Lang community.
Article objective
This article explains how to install:
- P4Lang PI
- P4Lang BMv2
- P4Lang p4c
Operating system supported
- Debian 10 (stable aka buster)
- Ubuntu 18.04 (Bionic Beaver)
- Ubuntu 20.04 (Focal Fossa)
Note
You can of course use the distribution of your choice, as long as your operating system provides all the third-party dependencies required by the P4Lang software, mainly:
- protobuf
- grpc
- thrift
- nanomsg
- nnpy
You can find the full list here in Launchpad.
Diagram:
[ #002 ] - Cookbook
Verification
Conclusion
In this article you learned how to set up a P4 development environment on:
- Debian 10
- Ubuntu 18.04
- Ubuntu 20.04
You also tested the installation by compiling RARE P4 code.
P4Lang P4 for dummies [ #002 ] - key take-away
- P4Lang P4 development environment creation is easy
- it uses P4Lang packages on Debian and Ubuntu
- These packages are maintained by the RARE project and are built nightly from the official P4Lang GitHub
In the next article we will:
- write our first P4 program: my_program.p4 as it is specified in P4Lang P4 for dummies [ #001 ]
- compile my_program.p4
- launch the P4Lang virtual switch called simple_switch and load my_program.p4 on it
- perform basic verification
Requirement
Overview
P4 is a language for programming the data plane of network devices. From p4.org web site:
«P4 is a domain-specific programming language for specifying the behaviour of the dataplanes of network-forwarding elements. »
Article objective
This 1st article presents:
- A brief introduction to the P4 language
- A basic P4 development workflow
- Some basic specificities of the P4 language
Note
This article is primarily an introduction to P4Lang P4. It is in no way an extensive description of the programming language, nor a P4 compilation guide.
Diagram: P4 development workflow
[ #001 ] - Cookbook: P4 development workflow
Router for Academia Research & Education (RARE) & P4
Conclusion
In this article you:
- had a brief introduction to the P4Lang P4 language
- were given a 10,000-foot view of the P4 development workflow
- were shown a list of P4 targets and the use cases enabled by these targets
P4Lang P4 for dummies [ #001 ] - key take-away
__THE__ exciting INNOVATION provided by P4 boils down to this: a community language that opens the door to a system's dataplane. Until now, dataplane programming was reserved to commercial vendors. Some of these dataplanes, like the well-known CEF (Cisco Express Forwarding), are specific to Cisco equipment. Juniper has its own dataplane (not sure about the name), implemented by the Forwarding Plane component (see the vMX architecture for an example).
P4 language inherent characteristics:
- Behavioral programming language
- Language with constraints
- Limited number of variable types
- With fixed size
- P4 is not a general-purpose language: unlike C, C++ or Java, you cannot use it to write arbitrary software
It is therefore a simple language, one that network managers can tame more easily than pure software developers. Indeed, writing a P4 program is all about defining the behavior of a network packet processing algorithm based on intrinsic variables encoded in the packet header.
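As an illustration of the "limited number of variable types, with fixed size" constraint listed above, the sketch below mimics in Python how an unsigned fixed-width value (written bit<W> in P4) behaves: results are truncated to the declared width, so arithmetic wraps around. The Bit class is a hypothetical helper written for this example only; it is not part of any P4 tool.

```python
class Bit:
    """Toy stand-in for an unsigned fixed-width P4 value (bit<W>)."""

    def __init__(self, width, value):
        if width <= 0:
            raise ValueError("width must be positive")
        self.width = width
        # Keep only the lowest `width` bits, as a fixed-size field would.
        self.value = value & ((1 << width) - 1)

    def __add__(self, other):
        # This sketch only allows operands of the same width.
        if self.width != other.width:
            raise TypeError("operands must have the same width")
        return Bit(self.width, self.value + other.value)

    def __sub__(self, other):
        if self.width != other.width:
            raise TypeError("operands must have the same width")
        return Bit(self.width, self.value - other.value)

    def __eq__(self, other):
        return (self.width, self.value) == (other.width, other.value)

    def __repr__(self):
        return f"Bit<{self.width}>({self.value})"


if __name__ == "__main__":
    ttl = Bit(8, 255)             # an 8-bit field holding its maximum value
    print(ttl + Bit(8, 1))        # wraps around to 0
    print(Bit(8, 0) - Bit(8, 1))  # wraps around to 255
```

The point of the sketch is that every variable has a width known in advance, which is what allows a P4 program to be mapped onto packet header fields and hardware tables.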
Requirement
Overview
BGP is THE protocol of the Internet: it is used to exchange routing information between BGP systems across Internet domains. It comes in two flavours:
External BGP (eBGP): Network Layer Reachability Information (NLRI) is exchanged between network domains called Autonomous Systems, which are usually administratively independent. This is BGP inter-domain routing. As an example, let us assume a BGP speaker from AS2200 (RENATER) advertising NLRI to AS20965 (GÉANT R&E). From that point on, AS20965 knows how to reach any network advertised by AS2200 based on this NLRI.
Internal BGP (iBGP): NLRI is propagated between BGP speakers inside the same domain. This is BGP intra-domain routing. As an example, assume an AS2200 border router in Paris that is connected to the GÉANT network and receives NLRI from AS20965. It will then propagate this information internally, advertising the GÉANT NLRI via iBGP sessions to the other BGP speakers inside the AS2200 network domain.
iBGP requires a full mesh of sessions between all BGP speakers inside a domain because of its loop-avoidance rule: a route learned from one iBGP peer is not re-advertised to another iBGP peer. A domain with n speakers therefore needs n*(n-1)/2 sessions. BGP route reflection is a proposal that removes the full mesh requirement. A BGP edge router now needs only 1 BGP session, toward the RR, which reduces the workload of the network equipment.
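The session arithmetic above can be checked with a few lines of Python. This is a small sketch, not a design tool: the function names are made up for the example, and the route reflector case assumes that every client peers with every RR and that the RRs are meshed among themselves.

```python
def full_mesh_sessions(n):
    """Number of iBGP sessions needed to fully mesh n BGP speakers."""
    return n * (n - 1) // 2


def rr_sessions(clients, reflectors=1):
    """Sessions when each client peers only with the route reflectors.

    Assumption of this sketch: every client has one session to each RR,
    and the RRs keep a full mesh among themselves.
    """
    return clients * reflectors + full_mesh_sessions(reflectors)


if __name__ == "__main__":
    # Illustrative numbers: 10 routers in total, either fully meshed or
    # split into 8 clients and 2 route reflectors.
    print("full mesh of 10 speakers:", full_mesh_sessions(10))
    print("8 clients + 2 RRs       :", rr_sessions(8, 2))
```

With these assumed numbers the full mesh needs 45 sessions, while the route reflector design needs 17, and each client only maintains as many sessions as there are RRs.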
Article objective
In this article we will describe how to build a carrier-grade route reflector cluster composed of RR1 and RR2, in order to reach the 99.999% availability expected from a telecom Internet service provider.
Let's consider the network architecture of a fictitious service provider below: route reflectors RR1 and RR2 are dual-homed to core P routers.
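To give an order of magnitude for the five-nines target mentioned above, the sketch below converts an availability percentage into a yearly downtime budget, and shows why two redundant route reflectors help. The second function assumes that both RRs fail independently, which is a simplification made for this example and not a claim about the design presented here.

```python
def downtime_minutes_per_year(availability_percent, days_per_year=365):
    """Yearly downtime allowed by a given availability percentage."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * days_per_year * 24 * 60


def redundant_availability(availability_percent, copies=2):
    """Availability of `copies` redundant elements, in percent.

    Assumption of this sketch: failures are independent, so the service is
    down only when all copies are down at the same time.
    """
    unavailable_fraction = 1 - availability_percent / 100
    return (1 - unavailable_fraction ** copies) * 100


if __name__ == "__main__":
    print(round(downtime_minutes_per_year(99.999), 2), "minutes per year")
    print(round(redundant_availability(99.9, 2), 4), "% with two 99.9% RRs")
```

Five nines leaves a little more than five minutes of downtime per year, a budget that is hard to meet with a single route reflector.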
Diagram
[ #001 ] - Cookbook
Verification
Conclusion
In this article you:
- had a brief introduction to the BGP protocol and the BGP route reflector rationale
- learned the design considerations related to BGP RR setup
- got a typical BGP configuration example with a long list of AFI/SAFI enabled
- This configuration is not exhaustive: for example, BGP add-path is available but not configured
- verified BGP RR operation
RARE validated design: [ BGP RR #001 ]- key take-away
- The BGP Route Reflector use case does not require a commercial vendor router: it can be handled perfectly by a software solution running on a server with enough RAM.
The example above is a high-availability Route Reflector able to handle BGP signalling for a large carrier Service Provider, for all address families.
- Redundant BGP Route Reflection is ensured by deploying 2 RRs (at minimum) belonging to the same BGP RR cluster
In addition to having several RRs for the whole domain, it is also common to see hierarchical RR designs. Some Service Providers deploy dedicated RRs for specific address families (L3VPN unicast for example).
- RRs in the same cluster run a basic iBGP session
These RRs also share the same cluster ID, so that a route advertisement reflected back by the other RR of the cluster is recognised (via the CLUSTER_LIST attribute) and discarded.
- RR should not be in the traffic datapath
This is the reason why we set a high cost (4444 and 6666 for IPv4 and IPv6 respectively) in both directions on the RR interconnection ports.
- RR design for a multi-service backbone
In the example, the RR clients run only IPv4/IPv6, but the RR design above can empower a Service Provider backbone with additional services running on top of MPLS: L3VPN, 6VPE, VPLS, EVPN, etc.
- In the next article we will dissect the rr1 configuration
This will demonstrate some nice features offered by freeRouter, such as BGP templates and next-hop tracking, among other features not mentioned here (like BGP add-path).
RR design test
You can test the design above in order to check RR and backbone router signalling.
- Set up the freeRouter environment as described above
- Get RARE code
git clone https://github.com/frederic-loui/RARE.git
cd RARE/00-unit-labs/0101-rare-validated-design-bgp/
make
c1: telnet localhost 10001
c2: telnet localhost 10002
c3: telnet localhost 10003
c4: telnet localhost 10004
c5: telnet localhost 10005
c6: telnet localhost 10006
c7: telnet localhost 10007
c8: telnet localhost 10008
rr1: telnet localhost 10010
rr2: telnet localhost 10011
cd RARE/00-unit-labs/0101-rare-validated-design-bgp/
make clean
In article #005 you learned how RARE/freeRouter is controlling a P4Emu/pcap dataplane. We also demonstrated that this setup could be integrated into real networks.
Requirement
Overview
Though P4Emu/pcap can be used for SOHO and can handle nx1GE of traffic, this comes at a high CPU load cost and thus a higher power consumption.
"Why write yet another software dataplane as freeRouter has already a working native software dataplane ?"
The partial answer to the question raised in the previous article was:
"decoupling control plane from the dataplane"
We learned that P4Emu:
- is able to understand strictly the VERY same control messages from freeRouter as a P4 dataplane does
- is able to switch packets, emulating router.p4, using a libpcap packet forwarding backend
However, even though libpcap is a performant packet processing library, the kernel is still heavily solicited: the higher the traffic rate, the higher the CPU workload.
Article objective
In this article we'll use the freeRouter setup deployed in #005 and replace the P4Emu/pcap dataplane with a P4Emu/dpdk dataplane.
Source Wikipedia: https://en.wikipedia.org/wiki/Data_Plane_Development_Kit
The Data Plane Development Kit (DPDK) is an Open source software project managed by the Linux Foundation. It provides a set of data plane libraries and network interface controller polling-mode drivers for offloading TCP packet processing from the operating system kernel to processes running in user space. This offloading achieves higher computing efficiency and higher packet throughput than is possible using the interrupt-driven processing provided in the kernel.
It is important to note that, despite what its name suggests, P4Emu/dpdk does not emulate V1Model. P4Emu emulates the router.p4 packet processing logic and uses a packet forwarding library to effectively transmit packets from a specific ingress port to the right egress port, as defined by freeRouter control plane messages. However, in this precise case, packet processing is offloaded from the kernel to user space. As a consequence, with a DPDK-compatible NIC and driver, tremendous traffic rates can be reached. DPDK is not available on all hardware; please refer to the DPDK HCL.
Diagram
[ #006 ] - Cookbook
Verification
Conclusion
In this article you:
- had a demonstration of how to integrate freeRouter into a local area network (Similar to article #002)
- However instead of using P4Emu/pcap we used a P4Emu/dpdk dataplane
- communication between freeRouter control plane and P4Emu/dpdk is ensured by pcapInt via veth pair [ veth0a - veth0b ]
- In this example the freeRouter with P4Emu/dpdk has only 1 dataplane interface that is bound to enp0s3 VM interface exposed to the local network as a bridged interface
[ #006 ] RARE/FreeRouter-101 - key take-away
- FreeRouter uses UNIX sockets to forward the packets dedicated to control plane + dataplane communication.
This essential paradigm is used to ensure communication between freeRouter and the P4Emu/dpdk dataplane. It is ensured by the pcapInt binary from freeRouter net-tools, which binds the freeRouter socket (veth0a@localhost:22001) to a virtual network interface (veth0b@localhost:22002) connected to CPU_PORT 1.
- freeRouter is the control plane for P4Emu/dpdk dataplane
freeRouter does all the control plane route computation and sends write/modify/remove messages, so that entries are created/modified/removed accordingly in the P4Emu/dpdk tables. Although the name is P4Emu, it does not emulate BMv2 V1Model.p4, but rather router.p4.
- dpdk port_id allocation
dpdk port_id allocation follows the pci_id port naming convention, starting from id 0. p4dpdk.bin is invoked with the parameter: (number_of_dpdk_port - 1) + 1 <--- CPU_PORT
- In this setup the combination of freeRouter/P4Emu/dpdk delivers a solution for small campus networks having 10GE links (100GE links to be validated)
dpdk removes the kernel intervention for each packet processed. In this configuration packet processing is offloaded to user space, reducing kernel intervention to ~ 0%. Congratulations, you now have a hardware NIC-assisted forwarding system!
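The dpdk port_id allocation rule given in the take-aways above can be spelled out as follows. This is one reading of the formula (dataplane ports numbered from 0, CPU_PORT taking the next free id), written as a hypothetical helper; it is not taken from the p4dpdk.bin source code.

```python
def dpdk_port_layout(number_of_dpdk_port):
    """Return (dataplane port ids, CPU_PORT id) for a given number of ports.

    Assumed reading of the rule: dataplane ports are numbered from 0 to
    number_of_dpdk_port - 1, and CPU_PORT is (number_of_dpdk_port - 1) + 1.
    """
    if number_of_dpdk_port < 1:
        raise ValueError("at least one dpdk port is required")
    dataplane_ports = list(range(number_of_dpdk_port))
    cpu_port = (number_of_dpdk_port - 1) + 1
    return dataplane_ports, cpu_port


if __name__ == "__main__":
    # One dataplane interface, as in this article: port 0, CPU_PORT 1.
    print(dpdk_port_layout(1))
    # A hypothetical server with four dpdk ports.
    print(dpdk_port_layout(4))
```

With a single dataplane interface this gives port 0 and CPU_PORT 1, which matches the CPU_PORT 1 used in this setup.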
In a subsequent article we will see how this setup behaves with a DELL 640 server powered by 2 x Intel(R) Xeon(R) Gold 6138 CPUs and equipped with a Mellanox ConnectX-5 EX Dual Port 100GbE QSFP28 PCIe Adapter Low Profile card. We will also see how to connect this server to a P4 switch, the BF2556X-1T. So stay tuned!
In articles #003 and #004 you learned how RARE/freeRouter controls a P4 dataplane (BMv2 or TOFINO virtual model). We also demonstrated that this setup can be integrated into real networks. However, these P4 dataplanes are not suitable for day-to-day operation, as they have inherent software limitations. While the freeRouter native software dataplane has the advantage of offering the entire feature set and is sufficient to handle a home network traffic load, we investigated ways to improve dataplane performance. In that context we studied:
- VMWare P4 XDP project
- ELTE T4P4S project
Requirement
Overview
However, the XDP model was not complete enough to compile router.p4, and we could not generate the corresponding kernel bypass code with ELTE T4P4S based on BMv2 V1Model.p4 (a GitHub issue is still pending). In that context, Csaba, freeRouter's lead developer, decided to develop P4Emu, a software dataplane that has the particularity to:
- understand the freeRouter control plane messages meant for a P4 dataplane
- thus keep the control plane decoupled from the dataplane, as was the case with BMv2 and BF_SWITCHD
One could ask: why write yet another software dataplane when freeRouter already has a working native software dataplane? This is a very good and valid question. The answer boils down to:
"decoupling control plane from the dataplane"
We will see in a subsequent article how P4Emu unlocks new valid use cases.
Article objective
In this article we'll use the freeRouter setup deployed in #004 and replace bf_switchd, which provided freeRouter with an INTEL/BAREFOOT TOFINO dataplane, with P4Emu/pcap.
It is important to note that, despite its name, P4Emu/pcap does not emulate V1Model. P4Emu emulates the router.p4 packet processing logic and uses a packet forwarding library to effectively transmit packets from a specific ingress port to the right egress port, as defined by freeRouter control plane messages.
Diagram
[ #005 ] - Cookbook
Verification
Conclusion
In this article you:
- had a demonstration of how to integrate freeRouter into a local area network (Similar to article #002)
- However instead of using bmv2 or TOFINO we used a P4Emu/pcap dataplane
- communication between freeRouter control plane and P4Emu/pcap is ensured by pcapInt via veth pair [ veth250 - veth251 ]
- In this example the freeRouter with P4Emu/pcap has only 1 dataplane interface that is bound to enp0s3 VM interface exposed to the local network as a bridged interface
[ #005 ] RARE/FreeRouter-101 - key take-away
- FreeRouter uses UNIX sockets to forward the packets dedicated to control plane + dataplane communication.
This essential paradigm is used to ensure communication between freeRouter and the P4Emu/pcap dataplane. It is ensured by the pcapInt binary from freeRouter net-tools, which binds the freeRouter socket (veth251@localhost:22710) to a virtual network interface (veth250@localhost:22709) connected to CPU_PORT 0.
- freeRouter is the control plane for P4Emu/pcap dataplane
freeRouter does all the control plane route computation and sends write/modify/remove messages, so that entries are created/modified/removed accordingly in the P4Emu/pcap tables. Although the name is P4Emu, it does not emulate BMv2 V1Model.p4, but rather router.p4.
- In this setup the combination of freeRouter/pcap delivers a solution for SOHO networks having 1GE links
However, a 1GE traffic rate requires 50% of one CPU thread. Nevertheless, the traffic rate achieved with P4Emu/pcap is higher than with the freeRouter native software packet forwarding.
In a subsequent article we will see how we can reduce this CPU requirement of P4Emu/pcap.
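The write/modify/remove exchange described in the take-aways above can be pictured with a toy table. The sketch below is only an illustration of the idea: the class, the message names and the longest-prefix lookup are assumptions made for this example, and they do not reproduce the actual freeRouter message format or the P4Emu implementation.

```python
import ipaddress


class ToyRouteTable:
    """Toy dataplane table driven by write/modify/remove messages."""

    def __init__(self):
        self.entries = {}  # ip_network -> egress port

    def handle(self, op, prefix, port=None):
        """Apply one hypothetical control plane message to the table."""
        net = ipaddress.ip_network(prefix)
        if op == "write":
            if net in self.entries:
                raise KeyError(f"{net} already present")
            self.entries[net] = port
        elif op == "modify":
            if net not in self.entries:
                raise KeyError(f"{net} not found")
            self.entries[net] = port
        elif op == "remove":
            del self.entries[net]
        else:
            raise ValueError(f"unknown operation: {op}")

    def lookup(self, address):
        """Return the egress port of the longest matching prefix, or None."""
        addr = ipaddress.ip_address(address)
        matches = [net for net in self.entries if addr in net]
        if not matches:
            return None
        best = max(matches, key=lambda net: net.prefixlen)
        return self.entries[best]


if __name__ == "__main__":
    table = ToyRouteTable()
    table.handle("write", "10.0.0.0/8", port=1)
    table.handle("write", "10.1.0.0/16", port=2)
    print(table.lookup("10.1.2.3"))   # most specific prefix wins
    table.handle("remove", "10.1.0.0/16")
    print(table.lookup("10.1.2.3"))   # falls back to the /8 entry
```

The control plane decides what goes into the table, and the dataplane only performs lookups: this is the decoupling discussed in this article.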
In the previous article #003 "Are you P4 compliant ?" we presented a setup where RARE/freeRouter was controlling a BMv2 P4 dataplane called simple_switch_grpc. In this article we replace the open source BMv2 target by a commercial virtual target provided by INTEL/BAREFOOT. As a side note, we will show that this setup can be integrated with real networks (with inherent software limitations).
Requirement
Overview
I'm repeating the core message from #003. For those who are not familiar with data plane programming and especially with P4: "P4 is a domain-specific programming language for specifying the behaviour of the dataplanes of network-forwarding elements." (from p4.org). In short, it helps you write a "program specifying how a switch processes packets".
Article objective
In this article we'll use the freeRouter setup deployed in #003 and replace bmv2/simple_switch_grpc, which provided freeRouter with a P4Lang dataplane, with INTEL BAREFOOT/bf_switchd. The effective dataplane is provided by the INTEL/BAREFOOT virtual bf_switchd model running the RARE P4 program called bf_router.p4.
Diagram
[ #004 ] - Cookbook
Verification
Conclusion
In this article you:
- had a demonstration of how to integrate freeRouter into a local area network (Similar to article #002)
- However instead of using bmv2 we used an INTEL/BAREFOOT P4 dataplane called: TOFINO (bf_switchd)
- TOFINO bf_switchd target is running RARE bf_router.p4
- communication between freeRouter control plane and TOFINO is ensured by pcapInt via veth pair [ veth250 - veth251 ]
- This communication is possible via RARE bf_forwarder.py based on GRPC P4Lang BfRuntime python binding
- In this example the TOFINO bf_switchd P4 virtual switch model has only 1 dataplane interface that is bound to enp0s3 VM interface exposed to the local network as a bridged interface
[ #004 ] RARE/FreeRouter-101 - key take-away
- FreeRouter uses UNIX sockets to forward the packets dedicated to control plane + dataplane communication.
This essential paradigm is used to ensure communication between freeRouter and the TOFINO bf_switchd P4 dataplane. It is ensured by the pcapInt binary from freeRouter net-tools, which binds the freeRouter socket (veth251@localhost:22710) to a virtual network interface (veth250@localhost:22709) connected to CPU_PORT 64.
- freeRouter control plane and dataplane communication is enabled by RARE bf_forwarder.py
bf_forwarder.py is a simple python script based on the BfRuntime GRPC client python library.
- freeRouter is the control plane for TOFINO bf_switchd P4 dataplane
freeRouter does all the control plane route computation and sends write/modify/remove messages via BfRuntime, so that P4 entries are created/modified/removed accordingly in the P4 tables.
- TOFINO bf_switchd virtual model target
While the TOFINO bf_switchd virtual model target is a very good choice for validating packet processing algorithms on the TOFINO platform, the virtual model is not a target for production use. We will see in the next articles how we can reach the TREMENDOUS traffic throughput required by Internet Service Provider use cases. Indeed, while the model lets us validate algorithm accuracy, the traffic throughput it achieves is very low. (I could barely get the setup described above to work.)
- TOFINO bf_switchd hardware target
In a subsequent article we will demonstrate how, with the RARE/freeRouter/TOFINO TNA architecture, we can create a service provider/carrier grade router that is technically able to switch 3.3 Tbps of traffic (line rate) using an EdgeCore WEDGE100BF32X hardware switch.
The most powerful programmable switching ASIC of the TOFINO family can switch 6.5 Tbps of traffic. Our WEDGE100BF32X switches are powered by its little brother, which handles 3.3 Tbps of line-rate traffic.