Hardware acceleration for computational I/O, that is, the integration of specialized computational devices into the I/O path, is one of the most promising technologies for further improving the performance and energy efficiency of analyzing high-volume and high-velocity datasets and streams. Despite the general availability of devices such as Data Processing Units (DPUs, also known as SmartNICs) and Samsung’s SmartSSDs, the open source data science ecosystem lacks an open and shared computational I/O software stack. This gap hampers composability and innovation and increases design cost. To address this, the Center for Research in Open Source Software launched Skyhook Data Management (SkyhookDM) to create open source blueprints for a computational I/O stack that can be adopted by industry. With seed funding from industry component makers, SkyhookDM had a promising start: a blueprint using the unmodified Ceph open source distributed storage system was contributed to Apache Arrow in 2022 and has been included in every release since v7.0.0. It serves as a use case for the SNIA Computational Storage TWG and has attracted world-leading experts from industry and national labs.
This workshop invites participants to help put together a roadmap for an open and shared computational I/O software stack ecosystem at UC Santa Cruz, following best practices in open source software techniques, strategies, and governance. We will discuss technical and organizational opportunities that leverage readily available technologies and institutions.