Storage for containerized applications is a complicated subject. Containers can't persist data inside their own file systems, because anything written to a container's writable layer is lost once the container is removed or replaced. On top of that, there are multiple approaches to implementing the external storage that containers can use instead.
Fortunately, the Container Storage Interface, or CSI, simplifies matters by making it possible to use multiple container storage solutions in a standardized way on Kubernetes and other popular container platforms.
Let’s take a look at how CSI works and why it matters.
A note on container storage
As noted above, containers can’t store data persistently inside their own file systems. If you want to implement persistent storage for a containerized application, you need to store the data in a location that is external to the containers, then provide a way for the containers to connect to it.
There are many approaches to building storage that containers can use. You could set up storage directories on the individual servers that host your containers, then connect your containers to those directories. You could build a distributed storage system that spans multiple servers using a platform like GlusterFS or CephFS. Or you could use an object storage bucket in the cloud to store data for containerized applications.
Not only does each of these approaches represent a different storage architecture, but there are also multiple software tools or services that you could use within each category. A variety of open source and closed source distributed storage engines exist, for instance, and each major cloud provider has its own object storage service.
As a result, there are myriad potential configurations for persistent container storage.
What is CSI?
The Container Storage Interface, or CSI, was created to provide a consistent, standardized way to integrate these various storage setups with container environments.
CSI is a standard that defines how storage systems expose volumes to container orchestration platforms, and how those platforms request, attach and mount that storage. As long as both the storage system and the container orchestrator you use are CSI-compliant, you can connect the two using the same process you would follow for any other CSI-compliant pairing.
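To make the idea of a shared contract concrete, here is a minimal Python sketch, offered as an illustration rather than the real CSI API. The actual specification defines gRPC services (Identity, Controller and Node) with protobuf messages; everything below, including `CSIStyleDriver`, `LocalDirDriver`, `ObjectBucketDriver` and `attach_storage`, is hypothetical and heavily simplified, with method names only loosely modeled on the spec. The toy drivers do no real I/O. What the sketch shows is that the orchestrator-side function runs the same steps no matter which backend sits behind the interface.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Volume:
    volume_id: str
    capacity_bytes: int


class CSIStyleDriver(ABC):
    """Hypothetical, heavily simplified interface loosely modeled on the
    Identity, Controller and Node services in the CSI spec.
    It is NOT the real gRPC API."""

    @abstractmethod
    def get_plugin_info(self) -> str:
        """Identity: report the driver's name."""

    @abstractmethod
    def create_volume(self, name: str, capacity_bytes: int) -> Volume:
        """Controller: provision a volume in the backing storage system."""

    @abstractmethod
    def node_publish_volume(self, volume_id: str, target_path: str) -> None:
        """Node: make the volume available at a path a container can use."""


class LocalDirDriver(CSIStyleDriver):
    """Toy backend standing in for host directories (no real I/O happens)."""

    def __init__(self) -> None:
        self.mounts: dict[str, str] = {}  # target path -> volume ID

    def get_plugin_info(self) -> str:
        return "local-dir.example"

    def create_volume(self, name: str, capacity_bytes: int) -> Volume:
        return Volume(f"/srv/volumes/{name}", capacity_bytes)

    def node_publish_volume(self, volume_id: str, target_path: str) -> None:
        self.mounts[target_path] = volume_id


class ObjectBucketDriver(CSIStyleDriver):
    """Toy backend standing in for a cloud object storage bucket (no real I/O)."""

    def __init__(self) -> None:
        self.mounts: dict[str, str] = {}  # target path -> volume ID

    def get_plugin_info(self) -> str:
        return "object-bucket.example"

    def create_volume(self, name: str, capacity_bytes: int) -> Volume:
        return Volume(f"bucket://{name}", capacity_bytes)

    def node_publish_volume(self, volume_id: str, target_path: str) -> None:
        self.mounts[target_path] = volume_id


def attach_storage(driver: CSIStyleDriver, name: str,
                   capacity_bytes: int, target_path: str) -> str:
    """Orchestrator side: the same two calls work for any conforming driver."""
    volume = driver.create_volume(name, capacity_bytes)
    driver.node_publish_volume(volume.volume_id, target_path)
    return f"{driver.get_plugin_info()}: {volume.volume_id} -> {target_path}"


if __name__ == "__main__":
    # Identical orchestrator code, two very different (toy) storage backends.
    for drv in (LocalDirDriver(), ObjectBucketDriver()):
        print(attach_storage(drv, "app-data", 10 * 2**30, "/var/lib/app"))
```

Swapping one backend for another changes only which driver object is passed in; the orchestrator code stays the same. That separation is the property a standard like CSI is meant to guarantee.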
CSI grew out of a Cloud Native Computing Foundation (CNCF) initiative to standardize container storage. In the early days of containers (circa 2015, shortly after Docker and Kubernetes first appeared), storage was fragmented across many different methods and technologies.
Today, there are more than 100 CSI-compatible storage solutions that work with popular orchestrators like Kubernetes.
Why is CSI important?
The value of CSI lies in the fact that it standardizes storage management while still providing flexibility and choice with regard to individual storage solutions.
In other words, CSI lets you choose whichever storage architecture and software platform make the most sense for you, while keeping the configuration process simple. Whether you want to use cloud storage, an on-premises scale-out storage system or something else, you can connect it to your container orchestrator in a consistent way, thanks to CSI.
From this perspective, CSI is important for the same reasons that make any widely used community standard important: it ensures interoperability between solutions from different vendors. Just as the TCP/IP protocol suite lets different operating systems talk to each other over a network, and HTTP lets virtually any browser open any Web page, CSI lets virtually any mainstream storage solution work with any CSI-compliant container orchestrator.
Which orchestrators support CSI?
Most major container orchestrators available today, including Kubernetes, Cloud Foundry, Mesos and Nomad, support CSI.
The major exception is Docker Swarm, the orchestrator provided by the Docker project. Swarm uses Docker Data Volumes, Docker’s own storage solution, to provide storage to containers. CSI support for Swarm has been proposed, but given that Swarm has more or less been eclipsed by Kubernetes, it seems unlikely that there will be much community interest in extending it with CSI compatibility. Most developers who want CSI support are already using Kubernetes, and few are likely to switch to Swarm even if it becomes CSI-compatible.
Choosing a container storage solution
Thanks to CSI, most teams that deploy containers can select whichever storage solution they want without worrying about how they will integrate it with their orchestrator. Whether you prefer to store data on-premises or in the cloud, and whether you use an open source or proprietary storage engine, you can easily connect your storage system to Kubernetes and almost every other container orchestration platform. Docker Swarm is the only notable exception.