With the growing popularity of the disaggregated approach in data center networking, ensuring the quality of the final solution becomes even more challenging than before. This article describes a re-imagined approach for SAI’s functional and integration testing, based on SONiC’s native infrastructure. The SAI Challenger framework extends the boundaries from pure SAI testing to SAI integration with SONiC, to interoperability with external PHYs, and beyond.
SAI and SONiC – the two wings of OCP Networking
The Open Compute Project (OCP) positions itself as “a collaborative community focused on redesigning hardware technology to efficiently support the growing demands on compute infrastructure.”
Founded 10 years ago around the idea of designing the world’s most energy-efficient data center, OCP now comprises specifications for the most critical aspects of modern data center architecture – servers, storage, rack and power, security, networking, hardware management, etc. – and beyond.
When speaking about data centers, we usually mean distributed, powerful computing resources complemented by gigantic storage systems that can be allocated and scaled as needed. But what holds all the pieces together is networking. As of now, the OCP Networking Project includes 5 sub-projects: ONIE, ONL, SAI, SONiC and CBW. From my point of view, the most crucial are SAI and SONiC (Software for Open Networking in the Cloud). Why? Simply because it’s hard to name either a silicon vendor or an OEM/ODM not involved in at least one of these open initiatives. But is it just a trend or, let’s say, big money? From an engineering perspective, it’s definitely something more.
SAI defines generic, object-based CRUD APIs for switching silicon configuration and monitoring. Strict, unified naming rules are applied to both the APIs and the attributes. In addition, SAI is self-documented in the Doxygen format, with compile-time interface validation and supplementary metadata generation. To appreciate how well thought out the SAI design is, it is worth mentioning the PHY extension to SAI – a successful attempt to reuse SAI as a generic interface for external PHY chip configuration. The same design approach was also used in the Transponder Abstraction Interface (TAI) definition, which adopted many terms from SAI.
SONiC is not the only network operating system, but it is certainly the most popular one using the SAI abstraction to interface with the ASIC in a vendor-independent manner. Over the past several years, SONiC has evolved from a framework with a list of applications installed independently on a target device into a mature NOS with a rich set of features. With its Docker-based microservices architecture and Redis DB as the communication channel between applications, SONiC became more than just a flexible and extendable core component of a modern DC. It is now a very powerful collaboration platform for networking experts all over the world to speed up the development and adoption of new technologies – the true F1 of networking.
Switch Abstraction Interface (SAI) as a critical component of SONiC
Let’s take a deeper look into SONiC’s architecture and the way it integrates SAI as a south bound API for switching silicon abstraction.
From a high-level perspective, SONiC consists of a set of networking applications (e.g., FRR, LLDP, LACP, NAT) running independently in dedicated Docker containers, all using Redis as a shared source of configuration and the system’s current state.
Also, SONiC runs a generic SyncD daemon that uses a vendor-specific SAI library to configure the switching silicon. Redis is the only communication channel between SyncD and the SONiC applications. This communication happens through ASIC DB, a dedicated Redis namespace. It contains serialized string data in the SAI format, produced by the Orchagent service. Orchagent performs this translation based on the configuration and runtime state (routes, neighbor MACs, etc.) generated by the applications. The SyncD serialization library then converts the SAI attributes from ASIC DB strings into C language binary data types, based on SAI metadata.
What’s interesting about this architecture is that it allows decoupling a major part of the control plane from the thin shim ASIC driver layer (vendor SDK/SAI) – resulting in many benefits, such as:
- Independent development, testing and debugging of new NOS components.
- Isolation of components’ dependencies.
- Running the ASIC driver layer (Redis + SyncD + PMON) as a thin NOS – a thin SONiC model – similar to the Stratum NOS approach where P4Runtime, gNMI and gNOI interfaces are used as northbound API in a thin switch implementation model. This model enables:
- SONiC deployment on switches with limited HW capabilities
- The SDN-based approach for network configuration (the centralized control plane)
- Control of Smart NICs
Therefore, if your networking device (either a smart NIC or a switch ASIC) still does not support the SAI interface, it may be time to put it on your development roadmap.
SAI testing and integration
A common rule in a modular software development process is – whenever possible, each component should be well-tested independently. The same is applicable to the NOS development, in general, and the SAI library as a part of it.
Various ASIC vendors tend to use a typical SAI testing approach based on Python bindings implemented over the Thrift client-server model. This method allows the SAI library to be interfaced with PTF – the unittest-based dataplane test framework. Despite the many obvious benefits of this approach, several downsides cannot be ignored:
- Implementation and maintenance of Python bindings for SAI is quite challenging and time-consuming.
- A very small coverage of SAI PTF testcases is available in open source at the moment.
- There is no way to contribute new testcases for those who do not have HW with a proper SAI implementation.
- PTF does not cover the SAI library integration part of testing.
What if a thin SONiC model could be reused for SAI testing and integration?
Admittedly, SyncD does not utilize all available SAI functions and attributes (the majority, but not all). However, if SyncD does not support something, it might be better to extend it than to implement additional Python code. This is a win-win cooperation for SAI and SONiC.
To better understand the complexity of a possible PTF-based testing alternative, it’s time to dive deep into a few technical aspects of SONiC inter-application communication.
Communication between SyncD and SONiC applications
The communication rules between SyncD and SONiC applications are well-defined and quite simple. When Orchagent creates a new SAI object through libsairedis (a SAI interface that serializes SAI calls into a string format and writes data into Redis DB), it actually performs two operations on Redis DB:
- 1. Enqueues a SAI object create operation through LPUSH (in fact, a Redis linked list object is used as a queue to pass SAI object data between Orchagent and Syncd).
- 2. Informs Syncd about the new data in the linked list through a PUBLISH operation.
/*
 * KEYS[1] : tableName + "_KEY_VALUE_OP_QUEUE"
 * ARGV[1] : key
 * ARGV[2] : value
 * ARGV[3] : op
 * KEYS[2] : tableName + "_CHANNEL"
 * ARGV[4] : "G"
 */
string luaEnque =
    "redis.call('LPUSH', KEYS[1], ARGV[1], ARGV[2], ARGV[3]);"
    "redis.call('PUBLISH', KEYS[2], ARGV[4]);";
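The enqueue-and-notify pattern implemented by this Lua script can be modeled in a few lines of Python. This is a minimal in-memory sketch, not SONiC code: all class names are illustrative, a deque stands in for the Redis list, and a plain callback list stands in for PUBLISH/SUBSCRIBE.

```python
from collections import deque

class FakeAsicDb:
    """In-memory stand-in for the ASIC DB Redis namespace (illustrative only)."""
    def __init__(self):
        self.queue = deque()       # models the *_KEY_VALUE_OP_QUEUE linked list
        self.subscribers = []      # models PUBLISH/SUBSCRIBE on *_CHANNEL

    def lpush(self, key, value, op):
        self.queue.appendleft((key, value, op))

    def publish(self):
        for callback in self.subscribers:
            callback()

class FakeOrchagent:
    """Serializes a SAI operation into strings and notifies the consumer."""
    def __init__(self, db):
        self.db = db

    def create(self, key, attrs, op="Screate"):
        value = str(attrs)         # the real code serializes attrs to JSON
        self.db.lpush(key, value, op)
        self.db.publish()

class FakeSyncd:
    """Drains the queue when notified, like SyncD's consumer side."""
    def __init__(self, db):
        self.db = db
        self.applied = []
        db.subscribers.append(self.drain)

    def drain(self):
        while self.db.queue:
            self.applied.append(self.db.queue.pop())

db = FakeAsicDb()
syncd = FakeSyncd(db)
orch = FakeOrchagent(db)
orch.create("SAI_OBJECT_TYPE_VLAN:oid:0x26000000000001",
            ["SAI_VLAN_ATTR_VLAN_ID", "100"])
print(syncd.applied[0][2])   # -> Screate
```

The key property this sketch preserves is that the producer and consumer share nothing but the database: SyncD only ever sees serialized strings and an "operation" tag.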
Both Orchagent and SyncD use the VIDCOUNTER key-value entry in the ASIC DB to generate new virtual ID (VID) values. These VIDs are independent of the real object IDs (RIDs) – the SAI object IDs (SAI OIDs) returned by the vendor SAI library. The RID-to-VID and VID-to-RID translation happens in SyncD. ASIC DB also stores the vid2rid and rid2vid mappings, so the applications can find a RID by VID and vice versa.
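This VID machinery can be sketched in pure Python. The class below is an illustrative model, not sairedis code: the counter mimics INCR on the VIDCOUNTER key, the two dicts mimic the RID/VID hashes, and the bit layout of the VID (object type in the upper bits) is an assumption for the sake of the example.

```python
class FakeVidAllocator:
    """Illustrative model of VID allocation and the RID<->VID maps in ASIC DB."""
    def __init__(self):
        self.counter = 0      # models the VIDCOUNTER entry (INCR in real Redis)
        self.vid2rid = {}     # models the VID -> RID hash
        self.rid2vid = {}     # models the RID -> VID hash

    def alloc_vid(self, obj_type_id):
        self.counter += 1
        # Assumed encoding: object type in the upper bits, counter in the
        # lower bits. The real encoding lives in sairedis and may differ.
        vid = (obj_type_id << 48) | self.counter
        return "oid:0x%x" % vid

    def map_rid(self, vid, rid):
        self.vid2rid[vid] = rid
        self.rid2vid[rid] = vid

db = FakeVidAllocator()
vid = db.alloc_vid(0x21)       # e.g., the switch object type
db.map_rid(vid, "oid:0x1")     # RID as returned by the vendor SAI library
print(vid)                     # -> oid:0x21000000000001
```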
Here is an example of a typical SAI object create request executed by Orchagent:
LPUSH ASIC_STATE_KEY_VALUE_OP_QUEUE "SAI_OBJECT_TYPE_SWITCH:oid:0x21000000000000" \
    '["SAI_SWITCH_ATTR_INIT_SWITCH","true", \
      "SAI_SWITCH_ATTR_SRC_MAC_ADDRESS","52:54:00:EE:BB:70" \
    ]' \
    Screate
PUBLISH ASIC_STATE_CHANNEL G
Here is an example of a SAI object’s attribute get request:
LPUSH ASIC_STATE_KEY_VALUE_OP_QUEUE \
    "SAI_OBJECT_TYPE_SWITCH:oid:0x21000000000000" \
    '["SAI_SWITCH_ATTR_DEFAULT_VIRTUAL_ROUTER_ID","oid:0x0"]' \
    Sget
PUBLISH ASIC_STATE_CHANNEL G
As you can see, the format of these requests is pretty simple and can be mapped easily into SAI definitions – even without knowing “the rules of the game.”
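A helper that builds such a request triple is easy to write. The sketch below is not part of SAI Challenger; the function name and return shape are illustrative assumptions derived from the request format shown above.

```python
import json

def build_create_request(obj_type, vid, attrs):
    """Build the (key, value, op) triple pushed into ASIC_STATE_KEY_VALUE_OP_QUEUE.

    attrs is a dict of SAI attribute name -> string value; ASIC DB stores
    them as a JSON array of alternating names and values.
    """
    key = "%s:%s" % (obj_type, vid)
    flat = []
    for name, value in attrs.items():
        flat += [name, value]
    return key, json.dumps(flat), "Screate"

key, value, op = build_create_request(
    "SAI_OBJECT_TYPE_SWITCH", "oid:0x21000000000000",
    {"SAI_SWITCH_ATTR_INIT_SWITCH": "true",
     "SAI_SWITCH_ATTR_SRC_MAC_ADDRESS": "52:54:00:EE:BB:70"})
print(op)   # -> Screate
```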
SAI Challenger is a Docker-based environment with a pre-installed Redis DB and the SyncD application. It can be used for SAI functional and integration testing in an environment that most closely replicates the real SONiC NOS runtime.
SAI Challenger provides a simple Python CRUD API to operate on SAI data in the ASIC DB. In addition, SAI Challenger uses pytest as a framework for test case (TC) development and PTF test utilities to send/receive packets over the data plane. That’s it.
# This class defines SAI API to ASIC DB
class Sai:

    # Allocate or retrieve VID for specific SAI object type.
    # API returns VID as a string in format "oid:0x0000000000000000"
    def get_vid(self, obj_type, value=None)

    # Remove VID
    def pop_vid(self, obj_type, value)

    # SAI objects CRUD API
    def create(self, obj, attrs)
    def remove(self, obj)
    def set(self, obj, attr)
    def get(self, obj, attrs, do_assert=True)

# Fixture with session scope to access SAI API from TCs
@pytest.fixture(scope="session")
def sai():
    return Sai()
To get SAI Challenger sources:
git clone https://github.com/PLVision/sai-challenger.git
cd sai-challenger/
git submodule update --init --recursive
To prepare the SAI Challenger framework for TCs execution on top of SONiC vslib:
docker build -f Dockerfile.saivs -t saivs-challenger .
To run SAI Challenger and execute TCs:
docker run --name sai-challenger-run -v $(pwd):/sai-challenger \
    --cap-add=NET_ADMIN --device /dev/net/tun:/dev/net/tun \
    -d saivs-challenger
docker exec -ti sai-challenger-run pytest -v
SAI Challenger as a framework for external PHYs testing
A PHY is a connector between the MAC (SerDes) and a physical medium, such as optical fiber or copper. Whether a PHY is needed depends on the platform/hardware design. Some platforms may be designed without PHYs (PHY-less) or with the PHY integrated into the ASIC (internal PHY). But in some cases, it might be an external PHY. External PHYs can serve various purposes, such as gearbox, retimer, MACsec, or multi-gigabit Ethernet PHY transceivers.
As mentioned, SAI was extended a while back to provide a generic interface for external PHY configuration. This approach allows a PHY chip to be treated as a separate type of SAI switch object with a system-side to line-side port hierarchy.
In SONiC, a new SAIPHY SyncD Docker was created to support multiple external PHYs. Each PHY is controlled by a separate physyncd daemon that has been extended to support the new SAI APIs.
The communication flow between Orchagent and physyncd instances is identical to the flow between Orchagent and SyncD. This makes it quite simple to adopt SAI Challenger for external PHY testing needs as well. Moreover, a single instance of SAI Challenger can configure multiple external PHY instances together with a switching silicon instance, which introduces a new level of integration testing. The same L2/L3/ACL/etc. testcases written for switching silicon testing can be reused in this more complicated scenario, with external PHY configuration as an additional setup/teardown procedure that can be enabled on demand.
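Such an on-demand setup/teardown procedure maps naturally onto a context manager (or a pytest fixture). Below is a hedged sketch: the Sai stand-in is an in-memory toy, and the PHY-as-a-switch attribute pair is an assumption for illustration, not a verified provisioning sequence.

```python
from contextlib import contextmanager

class FakeSai:
    """Minimal in-memory stand-in for SAI Challenger's Sai class."""
    def __init__(self):
        self.objects = {}

    def create(self, obj, attrs):
        self.objects[obj] = attrs

    def remove(self, obj):
        del self.objects[obj]

@contextmanager
def external_phy(sai, phy_oid):
    """Optional setup/teardown wrapped around the regular switch testcases."""
    obj = "SAI_OBJECT_TYPE_SWITCH:" + phy_oid   # a PHY is a SAI switch object
    # Hypothetical attribute set for a PHY-type switch instance:
    sai.create(obj, ["SAI_SWITCH_ATTR_TYPE", "SAI_SWITCH_TYPE_PHY"])
    try:
        yield obj
    finally:
        sai.remove(obj)

sai = FakeSai()
with external_phy(sai, "oid:0x21000000000002") as phy:
    assert phy in sai.objects   # run the usual L2/L3/ACL testcases here
assert not sai.objects          # the PHY object is removed on teardown
```

The try/finally in the context manager guarantees the teardown runs even if a testcase inside the `with` block fails, which is exactly the behavior wanted from an optional PHY setup step.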
The testing approach described here is not something completely new or unique. For example, SONiC contains the SAI Player tool used to reproduce SAI provisioning sequences based on the record files created by Orchagent. Similar files with predefined SAI operations are used by the SONiC community for SyncD unit testing. I attempted to adapt and simplify this approach for wider use in SAI testing and integration. However, I wouldn’t consider it a replacement for PTF. The SAI Challenger framework is rather a simple option to reduce time to market, starting from SAI development to the final SONiC integration without compromising quality.