In this blog post series, we are exploring the emerging SONiC-DASH networking technology, including its use cases, its integration with DPUs, and the specifics of DASH testing. Part 2 of our series describes DASH’s enablement for a new DPU platform in 4 key steps.
Why SONiC-DASH is a Game Changer for DPU and SmartNIC
Programmable networking hardware platforms have been growing in popularity over the past few years – a trend that is predicted to continue. An emerging open-source technology, DASH (Disaggregated APIs for SONiC Hosts), has become a game changer for DPUs and SmartNICs, creating new means for implementation of the SDN data plane. It lays a foundation for new DPU-based solutions that can meet pressing market needs.
Initiated by Microsoft and transitioned to the SONiC community in 2021, DASH optimizes cloud services performance by leveraging programmable hardware for flexible high-speed flow processing at scale. It builds upon traditional SONiC network operating system architecture while delivering the same benefits that allow software products to be customized for a variety of use cases.
In Part 1 of our SONiC-DASH series, we took a closer look at the most fundamental VNET-to-VNET scenario. In this article, I dive deeper into the technology and explore the steps required to integrate DASH with a DPU, based on PLVision’s experience developing DASH-enabled products.
SONiC-DASH integration is like any other NOS integration
What does it mean to enable SONiC-DASH for a new DPU platform? Based on our experience with various open network operating systems and recently delivered DASH projects, we find it very similar to any other NOS integration. The following common steps are required:
- First, platform enablement needs to happen – make sure that your Linux of choice is up and running on the platform and DPU-specific drivers are loaded and able to control the datapath. This is the bare minimum to bring any new platform to life and start working with it. To complete platform integration for a NOS, a platform abstraction interface must be developed. Usually, it’s a set of software plugins aimed at accessing and controlling peripheral hardware like PSU, fans, EEPROM, etc.
- The next step is hardware abstraction layer development and integration. In terms of a NOS, this is the interface between control plane components and the hardware datapath. For a specific NOS, each new switch ASIC should provide its own implementation. Once the hardware abstraction layer (HAL) is implemented and integrated with the NOS, control plane components can push configuration to hardware and get operational data back. Usually, there is a database right on top of the HAL. This DB keeps the state of the hardware on the side of NOS applications. There is also a translation layer that implements the “configuration logic” for a particular ASIC. It sits between the DB and the HAL and works in both directions:
– monitoring changes in the DB and translating them into a proper sequence of HAL calls;
– monitoring hardware events via the HAL and propagating updates to the DB.
Ultimately, this enables NOS protocols and features to operate with hardware assistance.
- There is an operation and configuration database at the heart of each NOS. This database stores the configuration and run-time state of network nodes. Each control plane protocol and feature keeps its data there, in particular things like port mode/speed configuration, L2 interfaces, routing entries, ACL entries, and counters. The structure of this database is defined by device capabilities, i.e. which protocols and features the product is intended to run. For each new product, the schema for this DB needs to be tuned and adjusted to match its requirements.
- Finally, user interfaces need to be enabled to provide configuration and monitoring capabilities. Again, the set of UIs is driven by product requirements and use cases.
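The two-way translation layer from the HAL step above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `FakeHal`, `TranslationLayer`, the `PORT:<name>` key format, and the field names are hypothetical, and a plain dictionary stands in for the database.

```python
from dataclasses import dataclass, field


@dataclass
class FakeHal:
    """Hypothetical HAL stand-in that only records the calls it receives."""
    calls: list = field(default_factory=list)

    def create_port(self, name):
        self.calls.append(("create_port", name))

    def set_port_speed(self, name, speed):
        self.calls.append(("set_port_speed", name, speed))


class TranslationLayer:
    """Sits between the hardware DB and the HAL and works in both directions."""

    def __init__(self, db, hal):
        self.db = db    # a dict standing in for the hardware DB
        self.hal = hal

    def on_db_change(self, key, fields):
        # DB -> hardware: one record may expand into several ordered HAL calls.
        self.db[key] = dict(fields)
        kind, name = key.split(":", 1)
        if kind == "PORT":
            self.hal.create_port(name)
            if "speed" in fields:
                self.hal.set_port_speed(name, fields["speed"])

    def on_hw_event(self, name, oper_status):
        # Hardware -> DB: publish an operational update for NOS applications.
        self.db.setdefault(f"PORT:{name}", {})["oper_status"] = oper_status


db = {}
hal = FakeHal()
layer = TranslationLayer(db, hal)
layer.on_db_change("PORT:Ethernet0", {"speed": 100000})
layer.on_hw_event("Ethernet0", "up")
```

The point of the sketch is the ordering: the layer, not the application, decides that a port must be created before its speed is set, which is the ASIC-specific “configuration logic” mentioned above.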
Now, let’s map these steps to the DASH environment.
DASH resembles SONiC’s System Architecture: it reuses most of the core components, while control plane elements that are irrelevant for DPU are removed, and a few elements that extend SONiC to DPU use cases are added. Overall, the following levels of architecture are present in SONiC-DASH:
- On the platform side, we have Debian Linux, similar to SONiC, as the base OS. The control over peripherals is implemented via Python platform plugins that wrap Linux drivers for LEDs, PSU, fans, etc.
- In the SONiC realm, HAL is called SAI (Switch Abstraction Interface). For every new DPU, a DASH SAI library needs to be implemented and integrated with the syncd component. The main purpose of syncd is to synchronize the hardware state with a Redis DB. You might already have guessed that Redis is used at this level to keep the configuration and run-time data of the DASH control plane.
- Moving to the next level, we have Redis DB keeping state for control plane applications with the help of an orchestration agent. For DASH, the orchestration agent is called dashorch. The dashorch agent, along with control plane application agents, is packed into the SWSS container. Overall, the SWSS container is responsible for preserving NOS state at the application level and translating the application state to the hardware DB. In turn, hardware DB drives DPU configuration via SAI calls.
- Now we’ve reached the user interface level. DASH APIs are exposed via the gRPC interface. A regular SONiC component container runs the gRPC server and exposes the API to an external management system. An SDN controller configures DASH via gRPC (set/get) calls to make the DPU process traffic in the required way.
Building a basic version of SONiC-DASH for a new DPU in 4 steps
The full network operating system architecture diagram looks massive and complicated, but it can be reduced to just the essential components. We will take a bottom-up approach, starting from the hardware level and moving up to user interfaces, picking only the fundamental components needed to get SONiC-DASH running on a DPU. The result is a bare-minimum DASH build for a new DPU, which is enough to perform essential tests on a new product. The SONiC-DASH integration roadmap therefore includes four steps:
- Step #1: “DPU Platform integration for SONiC-DASH”
- Step #2: “SAI library development. Integration with syncd and redisDB”
- Step #3: “Tuning SWSS Lite to product capabilities”
- Step #4: “Enabling gRPC”
Step #1 is all about bringing up Linux OS along with platform-specific drivers for the pmon container and dependencies. It results in SONiC-DASH platform containers running on a single DPU. It’s possible to run control plane containers as well, but it will have no effect until the libsai library is integrated in the next step. In short, it gives us control over platform-specific devices and prepares the environment for development on higher levels.
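To give a feel for what a platform plugin from Step #1 does, here is a minimal sketch of a Python class that wraps driver-exported files. It is not the SONiC platform API: the `SysfsFan` class, its method names, and the `fan<N>_present` file are hypothetical, and a temporary directory stands in for the driver's sysfs node so the example can run anywhere.

```python
import tempfile
from pathlib import Path


class SysfsFan:
    """Hypothetical plugin wrapping a Linux driver's sysfs-style files."""

    def __init__(self, root, index):
        self.root = Path(root)
        self.index = index

    def _read(self, name, default=None):
        # A missing or unreadable file is treated as "no data", not an error.
        try:
            return (self.root / name).read_text().strip()
        except OSError:
            return default

    def get_presence(self):
        return self._read(f"fan{self.index}_present") == "1"

    def get_speed_rpm(self):
        raw = self._read(f"fan{self.index}_input")
        return int(raw) if raw is not None else None


# Demo against a temporary directory standing in for the driver's sysfs node.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "fan1_present").write_text("1\n")
    Path(tmp, "fan1_input").write_text("8400\n")
    fan = SysfsFan(tmp, 1)
    present, rpm = fan.get_presence(), fan.get_speed_rpm()
    missing = SysfsFan(tmp, 2).get_presence()
```

In a real integration, classes like this would be consumed by the pmon container's daemons; the sketch only shows the wrapping pattern.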
Step #2 starts with the definition of DPU capabilities required to implement mandatory underlay and overlay SAI APIs and attributes. This translates into a set of DASH SAI APIs to be implemented and the structure of Redis ASIC_DB. Then the set of underlay and overlay SAI APIs is developed to support DASH operations. Once libsai for the DPU is ready, it is integrated with the syncd application. At the end of this step, the DPU datapath is properly initialized and you can configure hardware via changes to Redis DB records.
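The end state of Step #2, where a change to a DB record results in a SAI call, can be sketched as follows. This is a deliberately simplified model and not how syncd is actually implemented: `FakeDashSai` and `sync_record` are hypothetical, a function call stands in for a Redis notification, and the key and attribute names are illustrative, modeled loosely on SONiC naming rather than checked against the real schema.

```python
class FakeDashSai:
    """Hypothetical stand-in for a vendor libsai; it only records objects."""

    def __init__(self):
        self.objects = {}

    def create(self, object_type, oid, attrs):
        self.objects[(object_type, oid)] = dict(attrs)

    def remove(self, object_type, oid):
        self.objects.pop((object_type, oid), None)


def sync_record(sai, key, attrs):
    """Apply one ASIC_DB-style record; attrs=None means the key was deleted."""
    prefix, object_type, oid = key.split(":", 2)
    if prefix != "ASIC_STATE":
        raise ValueError(f"unexpected key: {key}")
    if attrs is None:
        sai.remove(object_type, oid)
    else:
        sai.create(object_type, oid, attrs)


sai = FakeDashSai()
vnet_key = "ASIC_STATE:SAI_OBJECT_TYPE_VNET:oid:0x1"  # illustrative key
sync_record(sai, vnet_key, {"SAI_VNET_ATTR_VNI": "1000"})
```

Calling `sync_record(sai, vnet_key, None)` afterwards removes the object again, mirroring a deleted DB record.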
Step #3 focuses on a customized swss container. For DASH, this is called “SWSS Lite”. A new orchestration agent called dashorch subscribes to Redis APP_DB objects and translates their configuration to ASIC_DB, which is in turn applied to hardware via the DASH SAI API. In addition, dashorch collects state data and populates the state of each configured table to STATE_DB. This closes the gap between control plane applications (containers) and libsai and sets the stage for applications like LLDP or BGP to operate.
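The translation role described in Step #3 can be illustrated with a small sketch. It is a hypothetical model, not the real dashorch: the `DashOrchSketch` class, the `vni` field, the `state` value, and the key formats are assumptions for illustration, and dictionaries stand in for the Redis databases.

```python
import itertools


class DashOrchSketch:
    """Hypothetical, heavily simplified model of an orchestration agent."""

    def __init__(self, asic_db, state_db):
        self.asic_db = asic_db
        self.state_db = state_db
        self._oids = itertools.count(1)  # naive object-id allocator

    def on_app_db_set(self, table, name, fields):
        if table != "DASH_VNET_TABLE":
            return  # a real agent handles many more tables
        # APP_DB -> ASIC_DB: translate application fields to SAI-style attributes.
        oid = f"oid:{next(self._oids):#x}"
        self.asic_db[f"ASIC_STATE:SAI_OBJECT_TYPE_VNET:{oid}"] = {
            "SAI_VNET_ATTR_VNI": fields["vni"],
        }
        # Report the outcome for this table entry to STATE_DB.
        self.state_db[f"{table}|{name}"] = {"state": "ok"}


asic_db, state_db = {}, {}
orch = DashOrchSketch(asic_db, state_db)
orch.on_app_db_set("DASH_VNET_TABLE", "Vnet1", {"vni": "1000"})
```

The application-level entry is keyed by a human-readable name, while the hardware-level entry is keyed by an object id; bridging those two views is the core of the agent's job.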
Step #4 enables user interfaces. In the case of DASH, we have a rather limited list: gNMI and DASH API over gRPC. This brings us to a DASH/gNMI container, which enables an external SDN controller to operate a DPU device. The DASH/gNMI server writes DASH object attribute values to the CONFIG_DB and/or DASH APP_DB. Finally, we are at the point where a DPU can be externally configured and monitored. Major execution and control flows for DASH are enabled.
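The set/get flow from Step #4 can be sketched without any RPC machinery. The example below is an assumption-laden illustration: `DashApiSketch`, the `/<TABLE>/<name>` path format, and the `TABLE:name` key layout are hypothetical, and the gRPC/gNMI transport is left out entirely so that only the mapping from a controller request to an APP_DB entry is shown.

```python
class DashApiSketch:
    """Hypothetical front end; a real server would sit behind gRPC/gNMI."""

    def __init__(self, app_db):
        self.app_db = app_db  # a dict standing in for Redis APP_DB

    @staticmethod
    def _key(path):
        # Map "/<TABLE>/<name>" to a "<TABLE>:<name>" key.
        parts = [p for p in path.split("/") if p]
        if len(parts) != 2:
            raise ValueError(f"expected /<TABLE>/<name>, got {path!r}")
        table, name = parts
        return f"{table}:{name}"

    def set(self, path, fields):
        self.app_db[self._key(path)] = dict(fields)

    def get(self, path):
        return self.app_db.get(self._key(path))


app_db = {}
api = DashApiSketch(app_db)
api.set("/DASH_VNET_TABLE/Vnet1", {"vni": "1000"})
```

Here `api.get("/DASH_VNET_TABLE/Vnet1")` returns the stored fields, and a path that was never set returns `None`.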
These essential steps lead us to a minimal set of components sufficient to start the DASH stack and perform a functional validation of the system.
Of course, each integration step requires testing, and we have different approaches to ensure quality while taking these steps.
As a cutting-edge open networking system, SONiC-DASH gives you the architectural freedom to develop the optimal solution for your DPU or SmartNIC. Leveraging the outcomes of community collaboration allows you to significantly speed up the launch of your new DASH-based product to the market. DASH provides scalability, interoperability, and cost-efficiency, and is expected to have the same success and wide adoption for DPUs and SmartNICs as SONiC for switches.
However, with this emerging technology, DASH developers face significant development challenges, including a lack of proven and defined standards. Evidently, there is also a lack of relevant DASH expertise on the market. PLVision’s experience working with open NOS and early DASH exploration has enabled us to create a comprehensive, transparent, step-by-step roadmap for integrating it with a new DPU, as presented in this blog post.
In the next part of our DASH series, we will provide an in-depth review of DASH testing, which is a critical stage for the successful delivery of your product.
- SONiC-DASH: Integration for a DPU in 4 Steps - January 23, 2023