Research Area: Distributed storage
In cooperation with:
Proposed start date: 2015-01-19
Proposed end date: 2017-12-29
Funded by: European Union
The main objective of the project is to create IOStack, a software-defined storage (SDS) toolkit for Big Data on top of the OpenStack platform. IOStack will enable efficient execution of virtualized analytics applications over virtualized storage resources through flexible, automated, and low-cost data management models based on SDS.
To achieve this general objective, IOStack pursues the following specific objectives:
G-1 Storage and compute disaggregation and virtualization. Virtualizing data analytics to reduce costs implies disaggregating existing hardware resources. This requires a virtual model for compute, storage, and networking that allows orchestration tools to manage resources efficiently. For the orchestration layer, it is essential to provide policy-based provisioning tools so that virtual components for the analytics platform are provisioned according to a defined set of QoS policies.
G-2 SDS Services for Analytics. The objective is to define, design, and build a stack of SDS data services enabling virtualized analytics with improved performance and usability. These services include native object store analytics, which will allow analytics to run close to the data without a costly initial data migration; data reduction services optimized for the special requirements of virtualized analytics platforms; and specialized persistent caching mechanisms, advanced prefetching, and data placement.
G-3 Orchestration and deployment of big data analytics services. The objective is to design and build efficient deployment strategies for virtualized analytics-as-a-service instances (both ephemeral and permanent). In particular, the focus of this work is on data-intensive scalable computing (DISC) systems such as Apache Hadoop and Apache Spark, which enable users to define both batch and latency-sensitive analytics. This objective includes the design of scalable algorithms that strive to optimize a service-wide objective function (e.g., maximizing performance or minimizing cost) under heterogeneous workloads.
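As a rough illustration of what optimizing a service-wide objective function in G-3 could look like, the following minimal Python sketch picks one deployment out of a few candidates by minimizing a weighted sum of estimated runtime and cost. It is not part of IOStack: the `Deployment` class, the `choose_deployment` function, the weights, and all numbers are hypothetical and exist only to make the idea concrete.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Deployment:
    """Hypothetical candidate deployment of an analytics cluster."""
    name: str
    est_runtime_s: float  # estimated job runtime in seconds (made-up value)
    est_cost_eur: float   # estimated monetary cost in euros (made-up value)


def objective(d: Deployment, w_perf: float, w_cost: float) -> float:
    """Weighted sum to minimize: lower runtime and lower cost are both better."""
    return w_perf * d.est_runtime_s + w_cost * d.est_cost_eur


def choose_deployment(
    candidates: Iterable[Deployment],
    w_perf: float = 1.0,
    w_cost: float = 1.0,
    max_cost_eur: Optional[float] = None,
) -> Deployment:
    """Return the feasible candidate with the smallest objective value."""
    feasible = [
        d for d in candidates
        if max_cost_eur is None or d.est_cost_eur <= max_cost_eur
    ]
    if not feasible:
        raise ValueError("no candidate satisfies the cost limit")
    return min(feasible, key=lambda d: objective(d, w_perf, w_cost))


# Illustrative candidates only; the figures are assumptions, not measurements.
CANDIDATES = [
    Deployment("small-4-nodes", est_runtime_s=1200.0, est_cost_eur=2.0),
    Deployment("medium-8-nodes", est_runtime_s=700.0, est_cost_eur=3.5),
    Deployment("large-16-nodes", est_runtime_s=450.0, est_cost_eur=6.0),
]

if __name__ == "__main__":
    # Performance only, cost only, and a mixed trade-off.
    print(choose_deployment(CANDIDATES, w_perf=1.0, w_cost=0.0).name)
    print(choose_deployment(CANDIDATES, w_perf=0.0, w_cost=1.0).name)
    print(choose_deployment(CANDIDATES, w_perf=1.0, w_cost=200.0).name)
```

Changing the weights shifts the choice between the fastest and the cheapest candidate, and the optional `max_cost_eur` limit plays the role of a simple policy constraint. A real system would replace the static estimates with monitored workload data and would need algorithms that scale beyond enumerating a handful of candidates.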
Finally, we will create an experimental prototype of the SDS toolkit for Big Data on top of OpenStack. To this end, IOStack will mainly contribute to widely used OpenStack projects, including OpenStack Swift, OpenStack Nova, OpenStack Cinder, and OpenStack Sahara. We will leverage the large user communities of these open source projects to disseminate the results of the project and to validate the platform. In particular, we outline three main software objectives:
S-1 Create an open SDS toolkit and APIs targeting virtualized data analytics in OpenStack. By leveraging the experience gained with MPSTOR Orchestra SDS, we will devise a set of novel SDS APIs providing full control of the logical and physical infrastructure required to launch a scalable data analytics platform. This implies contributions and extensions to OpenStack Cinder for managing virtual storage, OpenStack Nova for adaptations of the compute scheduler, and OpenStack Swift for virtualizing object storage. This objective is directly related to main objective [G-1].
S-2 Implement SDS services for analytics on top of [S-1] standard SDS APIs. We will demonstrate extensions to OpenStack Swift in order to offer true native object store analytics. In particular, we will design and implement a specialized SDS service able to manage computation close to the actual data thanks to storlets embedded in the object store. In addition, end-users will be able to define QoS policies to instruct the SDS controller to deploy such data services and optimize data flows in analytic experiments based on data reduction and caching techniques. This objective is directly related to main objective [G-2].
S-3 Implement efficient deployment strategies on top of OpenStack Sahara. This objective leverages the entire infrastructure created in the preceding objectives (SDS APIs and data services). We will create a novel monitoring software service to understand and exploit the dynamics of both service demand and the underlying Cloud system. We will also contribute to the OpenStack project through novel “plug-in” software components for Sahara. Finally, we will provide system deployment tools that instantiate these deployment strategies on top of OpenStack Sahara.