Software-defined storage
This chapter provides an overview of how IT storage infrastructures can be designed from programmable software-defined storage (SDS) components into systems controlled through software, enabling dynamic assignment of IT resources. Resources can then be assigned based on application workload requirements, with the best available resources aligned to service level policies that are derived from business requirements. The chapter also illustrates how storage infrastructure can be provisioned, configured, reconfigured, and deprovisioned through SDS to optimize the use of IT resources based on real-time business needs.
This chapter includes the following sections:
2.1 Introduction to SDS
2.2 SDS overview
2.3 SDS Data-access protocols
2.4 SDS reference architecture
2.5 Ransomware Considerations
2.1 Introduction to SDS
What exactly does Software-Defined Storage refer to? Most IT storage systems today are already based on software, microcode, or both.
SDS in today’s business context refers to IT storage that goes beyond typical array interfaces (for example, the command-line interface and graphical user interface) to operate within a higher architectural construct. SDS supports overall IT architectural definition, configuration, and operations, often referred to as software-defined infrastructure (SDI). The greatest value and versatility of this approach lie in the standardized programming interfaces applied across a heterogeneous, multivendor IT infrastructure.
IDC, an industry analysis firm, has defined three criteria for software-defined storage, as shown in Figure 2-1:
Runs on industry-standard hardware
Offers a full suite of storage services
Embraces multiple storage options
SDS runs on industry-standard hardware, and does not rely on proprietary adapters, ASICs, FPGAs, or other co-processors. In this case, standard hardware includes x86-based servers as well as OpenPOWER servers.
SDS offers a full suite of storage services. These services include data footprint reduction technologies (such as compression, data deduplication, and thin provisioning) and copy services (such as point-in-time copies or remote distance mirroring). Software that merely reroutes I/O traffic to different devices is not SDS.
SDS embraces multiple storage options. SDS can be sold as software-only by one vendor, enabling customers to purchase industry-standard servers from a variety of other suppliers. SDS can also be pre-installed on industry-standard servers and sold as a pre-built system or appliance. In addition, SDS can run in private or public cloud configurations, with a storage utility license that charges only for the amount of storage capacity or I/O bandwidth actually used.
 
Figure 2-1 Storage for software-defined infrastructure
2.2 SDS overview
It is important to recognize that the SDS and the larger SDI architectural concepts are relatively new and still evolving technological approaches to supporting software-defined architectures. These approaches are not all fully developed into mature, straightforward implementations. Therefore, this document provides guidance in planning SDS deployments in an incremental and evolutionary way that optimizes value while avoiding potential operational disruptions or dead-end IT architecture investments.
SDS has been defined in different ways for different users and technology providers, but to fully realize the potential of this technology, true SDS implementations incorporate the following characteristics:
Programmable interfaces to support dynamic storage configuration and management
Support for block, file, and object storage interfaces
Dynamic service level configuration
Enterprise scale storage virtualization and pooling
Generic data storage and control components
SDS is a new storage architecture for a wide variety of data storage requirements based on a set of loosely coupled software and hardware components dynamically configurable to meet customers’ workload requirements. It is a model that encompasses traditional workloads (systems of record) and newer types of workloads (systems of engagement), and is optimized for interoperability across hardware and software platforms. SDS provides greater storage infrastructure flexibility to share resources while maintaining the required service levels and enabling customers to better use data for greater business insights.
SDS delivers software-based storage services to an SDI through the following capabilities:
Storage virtualization
Automated policy-driven administration for storage management functions
Analytics and optimization
Backup and copy management
Integration and API services
Security
Massive scale-out architecture
Cloud accessibility
Although point SDS solutions are available, it is important to recognize that an enterprise-wide SDS implementation will generally not be realized by installing one product offering, and will not be software only. Software and hardware products and their specific features must be orchestrated to meet specific customer workload requirements in an enterprise. Most IT organizations want an evolutionary transition path into SDS and SDI to gain experience, avoid risk, and preserve existing infrastructure investments.
Loosely coupled means that each product brings its own advantages in terms of new features and functions without having to redesign all of the storage infrastructure to be able to integrate them.
Meet customer’s workload requirements presumes that documented or implied SLAs are established between the storage provider and the users/customers, and that metrics are collected and reports delivered to validate that SLAs are met.
SDS creates smarter interactions between workloads and storage by using the following techniques:
Exposing storage capabilities for the workloads to dynamically provision storage with the most suitable (architecturally fit for purpose with required service level capabilities) characteristics
Introducing new operations and concepts between workloads and storage to help storage better adapt to the needs of workloads (Applied Analytics)
Moving some of the storage functions closer to the workloads to use higher-level knowledge and lower cost (capability to provide storage functions at the server rather than rely solely on storage array features)
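The first technique above, exposing storage capabilities so that workloads can provision the most suitable tier, can be sketched as a best-fit match between workload requirements and a service catalog. The catalog entries, tier names, and requirement fields below are hypothetical illustrations, not any vendor's actual API:

```python
# Hypothetical service catalog: each tier advertises its capabilities.
CATALOG = [
    {"tier": "gold",   "max_latency_ms": 1,  "iops": 100_000, "cost_per_gb": 0.50},
    {"tier": "silver", "max_latency_ms": 5,  "iops": 20_000,  "cost_per_gb": 0.20},
    {"tier": "bronze", "max_latency_ms": 20, "iops": 2_000,   "cost_per_gb": 0.05},
]

def best_fit_tier(required_latency_ms, required_iops):
    """Return the cheapest tier that satisfies the workload's requirements."""
    candidates = [t for t in CATALOG
                  if t["max_latency_ms"] <= required_latency_ms
                  and t["iops"] >= required_iops]
    if not candidates:
        raise ValueError("no tier satisfies the requested service level")
    return min(candidates, key=lambda t: t["cost_per_gb"])

# A latency-sensitive database workload lands on the fastest tier;
# an archival workload lands on the cheapest tier that still qualifies.
print(best_fit_tier(2, 50_000)["tier"])   # gold
print(best_fit_tier(30, 1_000)["tier"])   # bronze
```

Selecting the cheapest qualifying tier, rather than the fastest, is what makes the allocation "architecturally fit for purpose" instead of simply maximal.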
2.2.1 SDS supports emerging as well as traditional IT consumption models
The fundamental emerging market that the software-defined (SD) framework addresses is the so-called digital consumption model, which is aligned to the digital economy. This chapter outlines traditional consumption models to compare and contrast with new and emerging ones, and shows how the SD framework can support them more effectively and efficiently. This comparison covers consumption models such as legacy dedicated and managed services, as well as newer and in many cases overlapping cloud models such as IaaS, PaaS, and SaaS.
The term consumption model was borrowed from economics to describe how resources are purchased and used by consumers. In this context, the relative perspective of the various types of consumers is key. Although this is not a new concept, it figures prominently in the basis for SDS as a means of addressing IT requirements that arise from the emerging digital economy consumption models where flexibility and rapid deployment (agility) are essential.
From glass house mainframes to departmental computing through client/server and most recently cloud computing, a central theme has been to optimize the costs of data processing. Is it ultimately more cost-effective (including monetized risk) for an organization to create its own data processing environment or to procure external capability?
Architectural criteria have undergone significant shifts and realignments from dedicated, customized, and consolidated to shared, commoditized, and distributed, depending on business requirements and technological capabilities, balanced against relative costs. Ultimately, users have never been concerned with the underlying structure that delivers the information they consume. Their concern is with content, format, ease of use, and accessibility.
SDS supports the SDI goal of technical agility in supporting new workloads across the cloud, mobile, social, and analytics infrastructure spaces with the necessary security for data storage by using these techniques:
Automation: Realization of autonomic data storage capabilities for deployment, provisioning, monitoring, reconfiguration, performance management, and capacity planning.
Virtualization: The near-universal abstraction of functionality across underlying components is the primary enabler of the software-defined model, making systems integration and configuration of infrastructure components possible by using software programming interfaces.
Programmatically administered: Storage is deployed, configured, and administered by using open programming standard APIs, enabling policy-based automation of infrastructure storage resources.
SDS enables new storage consumption models that, in some respects, resemble vending machines. Customers (data storage consumers) see the products (storage service catalog) that they want to buy, and decide which one meets their requirements (service levels). The customer then decides how much is needed, inserts the coins (chargeback/pay-per-use), and presses the dispense button.
The vending machine starts a series of actions (provisioning orchestration) to provide the customer with the chosen products. The release of the products is immediate (real-time). Beyond this analogy though, SDS analytically monitors the product usage and adjusts service level and capacity based on real-time application needs and available resources.
Figure 2-2 shows the new storage consumption model.
Figure 2-2 SDS control plane versus data plane vending machine analogy
Importantly, this hypothetical storage capacity vending system is able to measure (analytics and optimization) the available products (storage services), and when they fall below a predefined threshold, it can send alerts so that the machine is refilled with capacity at the required service level. The access method (service catalog and orchestration) and the products (storage services) exposed by the vending machine (SDS infrastructure) can be tailored to various customers.
New products can be added as required by business needs. The dispenser (SDS system) is responsible for ensuring the quality of its products. Only the customers who have access to the vending machine’s main panel can buy them (security).
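The threshold-and-refill behavior in the vending-machine analogy amounts to a simple monitoring check over per-tier capacity. The pool names, sizes, and threshold below are illustrative only:

```python
# Hypothetical capacity report per service tier (GB free vs. GB total).
pools = {
    "gold":   {"free_gb": 120, "total_gb": 2000},
    "silver": {"free_gb": 900, "total_gb": 4000},
    "bronze": {"free_gb": 300, "total_gb": 8000},
}

REFILL_THRESHOLD = 0.10  # alert when less than 10% of a tier remains free

def refill_alerts(pools, threshold=REFILL_THRESHOLD):
    """Return the tiers whose free capacity has dropped below the threshold."""
    return sorted(tier for tier, p in pools.items()
                  if p["free_gb"] / p["total_gb"] < threshold)

print(refill_alerts(pools))  # ['bronze', 'gold']
```

In a real deployment, the analytics component would feed such alerts into the orchestration workflows that acquire or rebalance capacity, rather than merely printing them.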
2.2.2 Required SDS capabilities
Depending on the business scenario, SDS needs to support SDI with storage services characterized by some fundamental capabilities:
Optimal workload allocation: SDS must be able to support applications so that they deliver optimal performance and store data according to its relative business importance. Therefore, SDS must be able to orchestrate the movement of data and workloads across storage service level tiers to achieve the agreed-upon performance, availability, retention, and security. These requirements are typically defined in SLAs.
Data Retention and Archive: Robust SDS should also support archival capability for long-term data retention in line with business requirements to support regulatory requirements, as well as effective data rationalization (purge expired data).
Agility and Scalability: As mentioned, the ability to rapidly deploy IT support for new business initiatives lies at the core of the SD framework. Therefore, SDS must ensure that storage services can scale according to business needs and be agile enough to respond within the requested time frame. This feature requires the following capabilities:
 – The ability to provide metrics about the storage infrastructure usage and to report on those metrics
 – The ability to tier capacity by service level, and to allocate and reclaim capacity based on current business requirements
 – Fault tolerance as appropriate and rapid troubleshooting of the infrastructure
 – The ability to provision storage on demand, with support for various consumption models including cloud
Flexible Data Access: SDS must be able to provide the access methods and protocols that are required by business applications, enabling data sharing and multi-tenancy across data centers if needed.
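The allocate-and-reclaim capability listed above can be sketched as a minimal pool manager for a single tier. The class, volume names, and sizes are hypothetical, and real SDS layers add thin provisioning, quotas, and asynchronous reclamation on top of this idea:

```python
class StoragePool:
    """Minimal sketch of on-demand allocation and reclamation in one tier."""

    def __init__(self, capacity_gb):
        self.capacity_gb = capacity_gb
        self.volumes = {}  # volume name -> size in GB

    @property
    def free_gb(self):
        return self.capacity_gb - sum(self.volumes.values())

    def provision(self, name, size_gb):
        """Allocate capacity for a new volume, failing if the tier is full."""
        if size_gb > self.free_gb:
            raise RuntimeError("insufficient capacity in this tier")
        self.volumes[name] = size_gb

    def reclaim(self, name):
        """Return a volume's capacity to the pool for reuse."""
        return self.volumes.pop(name)

pool = StoragePool(1000)
pool.provision("app-data", 400)
pool.provision("logs", 100)
pool.reclaim("logs")          # capacity flows back to the pool
print(pool.free_gb)           # 600
```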
2.2.3 SDS Functions
To support these capabilities, SDS-enabled components must include the following integrated, API-controlled functions when needed to support business application service levels:
Storage virtualization: Enables the creation of unique pools of storage capacity starting from heterogeneous disk storage arrays and devices. Advanced function capabilities like transparent data migration, tiering, thin-provisioning, compression, and local/remote copy services can be software-defined or use array capabilities, depending on the disk storage vendor and its native technology. These features can be leveraged and integrated by the orchestrator to provide the SDS capabilities.
Policy automation: Enables automation of storage provisioning with definition of storage policies that are correlated to automation workflows based on the features provided by the virtualized storage capacity, and the measurements and reports provided by the analytics tools.
Analytics and Optimization: Provides the metrics to measure storage performance and capacity usage, and the tools to report on them and compare them to the required service levels. Also ensures optimization of storage capacity usage based on the collected metrics and the policies aligned to the automation workflows. Metrics can include chargeback data as required.
Availability, Backup, and Copy Management: Ensures the physical and logical integrity of data within the storage infrastructure by providing backup/restore capability, and local/remote copies for test or disaster recovery purposes.
Integration and API services: All of the features that are provided by SDS must be integrated, not only internally but also with the other functions available in the SDI and with the business applications. The most common way of doing that is to communicate with the other infrastructure services by using specific interfaces called application programming interfaces (APIs).
Security: Enables and ensures secure access to the data by authorized persons. Common features that ensure this functionality are encryption of data at rest, SAN zoning, LUN masking, access control lists, role-based access control (RBAC), and the use of enhanced attributes in the object storage environment that enable per-object access granularity.
Massive scale-out architecture: Support for big data and analytics is a key driver for SDS, with the capability for rapid deployment (and recovery/redeployment) of massive storage capacity for these and other Systems of Insight.
Cloud accessibility: SDS must be able to support cloud implementations, connect to the cloud, and potentially use storage cloud services from other providers. SDS can offer storage cloud services to customers who want to access them in a “cloud” fashion. It should also expose a storage service catalog and support storage self-provisioning according to business requirements.
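Of the functions above, RBAC is the most compact to illustrate: roles are mapped to the storage operations they may perform, and every request is checked against that mapping. The roles and operations below are hypothetical examples, not a specific product's permission model:

```python
# Minimal role-based access control (RBAC) sketch: each role maps to the
# set of storage operations it is permitted to perform.
ROLE_PERMISSIONS = {
    "storage_admin": {"provision", "delete", "read", "write"},
    "app_owner":     {"read", "write"},
    "auditor":       {"read"},
}

def is_allowed(role, operation):
    """Check whether a role is permitted to perform an operation;
    unknown roles are denied everything by default."""
    return operation in ROLE_PERMISSIONS.get(role, set())

print(is_allowed("auditor", "read"))      # True
print(is_allowed("app_owner", "delete"))  # False
```

Denying by default for unknown roles keeps the check fail-safe, which matters when the same control plane is exposed to multiple tenants.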
2.3 SDS Data-access protocols
It is fundamental to know the way that data will be accessed, and particularly the protocol that the customer wants or needs to use. This section covers the main standard data access protocols that can be used in an SDS environment, although this is not a complete list.
Figure 2-3 shows that in traditional IT environments there were essentially two ways of accessing data:
The block I/O protocol, which is the closest to the physical hardware
The file I/O access method, provided by file systems
Object storage, described in 2.3.3, is a more recent third approach.
Figure 2-3 Data types in an SDS environment
2.3.1 Block I/O
The term block I/O is traditionally used to represent all local disk I/O on a computer. This term is also used to describe I/O over the Fibre Channel SAN, because the I/O in this environment is driven by the operating system. This type of I/O occurs at a logical layer below the layer of a file system. The operating system uses file system drivers that manage the file system. It is these drivers that manipulate the file system at the block I/O or disk level.
Generally, applications only use file I/O commands to manipulate bytes of data. They issue commands to the operating system to access and manipulate their files. The operating system then translates those I/O requests into lower-level commands that use block I/O to manipulate the file system and the application data file.
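The two layers described above can be illustrated in a few lines: an application writes through ordinary file I/O, while the operating system's file system drivers address the same data at fixed-size block offsets. This sketch reads a regular file rather than a raw device, so it demonstrates the concept only, not actual block-device access:

```python
import os
import tempfile

# Application-level file I/O: named files, sequential byte streams.
path = os.path.join(tempfile.mkdtemp(), "example.dat")
with open(path, "wb") as f:
    f.write(b"A" * 512 + b"B" * 512)  # two 512-byte "blocks" of data

# Lower-level I/O through the OS: read at an explicit offset, the way a
# file system driver addresses data in fixed-size blocks.
BLOCK_SIZE = 512
fd = os.open(path, os.O_RDONLY)
second_block = os.pread(fd, BLOCK_SIZE, BLOCK_SIZE)  # (fd, length, offset)
os.close(fd)
print(second_block[:1])  # b'B'
```

The application never computed a block address; the `open`/`write` calls hid that translation, which is exactly the division of labor the text describes.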
2.3.2 File I/O
In order for an application to access a file, it needs to know the name of the file and its location (file path). If the name of the file is not predetermined, a user can browse a location or storage container to locate the file or create one.
An application that performs file I/O is generally expected to know how data is organized within a file, or has access to a description that provides that information. The descriptor can also be another file, such as an XML file, or be embedded within the data file at a predetermined location. This is typical of documents that are created in Microsoft Word, Microsoft Excel, and many other document types, as well as in databases. It is also possible for the file to contain only data, and the descriptor is stored within the application. In any case, if an application can get access to its data correctly, it is then able to process that data.
2.3.3 Object Storage
The most recent approach is called Object Storage, which is a way of accessing data seen as objects described by a richer set of metadata when compared to a normal File System implementation. This way of accessing data can be used with normal applications, but it is especially beneficial when dealing with large quantities of unstructured data. An object consists of the following parts:
User Data: Data coming from the user application.
Metadata: Information about the data that includes system properties and custom, user-defined attributes, which improves data sharing and searching among applications.
Object ID: The unique identification code of the object.
Objects are accessed over HTTP by using RESTful protocols such as Amazon S3 or OpenStack Swift.
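The three parts of an object can be modeled with a minimal in-memory store. This is an illustration of the concept only, not the S3 or Swift API, which expose the same ideas (put, get, and metadata-driven search) over HTTP:

```python
import uuid

class ObjectStore:
    """Minimal in-memory model of an object's three parts:
    user data, metadata, and a unique object ID."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes, **metadata):
        object_id = str(uuid.uuid4())  # system-assigned unique identifier
        self._objects[object_id] = {"data": data, "metadata": metadata}
        return object_id

    def get(self, object_id):
        return self._objects[object_id]

    def search(self, **criteria):
        """Find object IDs whose metadata matches all given attributes —
        the kind of query that rich metadata makes possible."""
        return [oid for oid, obj in self._objects.items()
                if all(obj["metadata"].get(k) == v for k, v in criteria.items())]

store = ObjectStore()
oid = store.put(b"scan data", content_type="image/tiff", department="radiology")
store.put(b"report", content_type="text/plain", department="radiology")
print(store.search(content_type="image/tiff") == [oid])  # True
```

Note that retrieval is by opaque ID or by metadata query, never by a directory path; that flat namespace is what lets object stores scale to very large quantities of unstructured data.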
2.4 SDS reference architecture
SDS is one of the three main components of the new SDI architecture. The main characteristic of the SDS reference architecture, similar to the software-defined networking (SDN) environment, is the separation of the storage functions into two main layers:
SDS control plane: The control plane is a software layer that manages the virtualized storage resources. It provides the high-level functions that are needed by the customer to run the business workload and enable optimized, flexible, scalable, and rapid provisioning of storage infrastructure capacity. These capabilities span functions like policy automation, analytics and optimization, backup and copy management, security, and integration with API services, including other cloud provider services.
SDS data plane: The data plane encompasses the infrastructure where data is processed. It consists of all basic storage management functions, such as virtualization, RAID protection, tiering, copy services (remote, local, synchronous, asynchronous, and point-in-time), encryption, compression, and data deduplication that can be requested by the control plane. The data plane is the interface to the hardware infrastructure where the data is stored. It provides a complete range of data access possibilities and spans traditional access methods, such as block I/O (for example, iSCSI), file I/O (NFS, SMB, or the Hadoop Distributed File System (HDFS)), and object storage.
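The control plane/data plane split can be sketched as two cooperating components: the control plane translates a named service level into the data-plane features it requires. The policy names and feature flags are hypothetical, and a real control plane would orchestrate many data planes from different vendors:

```python
class DataPlane:
    """Sketch of basic storage functions that the control plane can request."""

    def create_volume(self, size_gb, compressed=False, encrypted=False):
        # A real data plane would carve capacity from an array or pool here.
        return {"size_gb": size_gb, "compressed": compressed,
                "encrypted": encrypted}

class ControlPlane:
    """Sketch of the policy-driven layer above the data plane: it maps a
    service level name to concrete data-plane feature requests."""

    POLICIES = {
        "gold":   {"compressed": False, "encrypted": True},
        "bronze": {"compressed": True,  "encrypted": False},
    }

    def __init__(self, data_plane):
        self.data_plane = data_plane

    def provision(self, size_gb, service_level):
        policy = self.POLICIES[service_level]
        return self.data_plane.create_volume(size_gb, **policy)

cp = ControlPlane(DataPlane())
vol = cp.provision(100, "gold")
print(vol["encrypted"])  # True
```

Because the consumer asks only for "gold" or "bronze", the underlying data plane can be replaced or extended without changing how storage is requested, which is the point of the separation.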
Figure 2-4 shows the SDS capabilities of the control plane and data plane.
Figure 2-4 SDS capabilities through the control plane and data plane
SDS provides the agility, control, and efficiency needed to meet rapidly changing business requirements by dynamically optimizing infrastructure capabilities to application service level requirements. New business requirements are the driving force for the emergence of this new IT storage infrastructure architecture.
SDS is built with standardized software APIs to provide organizations the underlying capabilities to support applications aligned to the digital consumption model of the digital economy. It also provides a compelling value proposition for optimizing traditional workloads within this consolidated architectural construct.
2.5 Ransomware Considerations
Ransomware has reached high levels of attention due to the dramatic damage that it can cause by locking away user data. Security is among the various functions that SDS has to provide, and SDS can play a critical role in preventing damage caused by ransomware in particular. Through encryption, auditing, and virus detection, abnormal access to data can be detected and, if needed, interrupted. Furthermore, copies of data can be protected through “air gap” measures that physically separate them, such as exporting off-site copies that can no longer be accessed online.
 