Cassandra trade-off models and assessment
Facebook operates five data centres comprising 180,000+ servers. Daily data growth is ~500 TB; the number of daily user accesses is ~1 million.
Very large systems like Facebook will partition at some point due to node failures (e.g. Google reports an MTTF of 9 hours in its 1800-node cluster), packet loss, network disconnections, congestion and other accidents. Thus, high availability and performance requirements demand the use of data replication.
However, the necessity to run several data replicas increases the overall power consumed by the system. The more replicas we run, the better the availability we can achieve, at the cost of higher energy expenses. Moreover, the more replicas we invoke simultaneously to increase data consistency, the more additional energy is spent on parallel request processing and on transferring a larger amount of data over the network.
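The availability side of this trade-off can be made concrete with a back-of-the-envelope sketch. Assuming independent replica failures with probability p and energy growing roughly linearly with replica count (both simplifying assumptions of this illustration, not project results):

```python
# Illustrative sketch of the replication trade-off: more replicas raise the
# probability that at least one copy is reachable, but each replica adds to
# the energy bill. Failure independence and the linear energy proxy are
# simplifying assumptions.

def availability(n_replicas: int, p_fail: float) -> float:
    """Probability that at least one of n independent replicas is available."""
    return 1.0 - p_fail ** n_replicas

def relative_energy(n_replicas: int) -> float:
    """Crude energy proxy: each replica adds one unit of consumption."""
    return float(n_replicas)

if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(n, availability(n, 0.01), relative_energy(n))
```

With p = 0.01, going from one replica to two already pushes availability from 99% to 99.99%, while the energy proxy doubles; this diminishing return is exactly the kind of trade-off the project aims to quantify.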
Developers of large-scale Internet systems and distributed data storages need to understand these energy overheads and must account for the additional delays and their uncertainty. Providing consistency among replicas is another major issue in distributed systems.
Some existing Big Data solutions, such as the Cassandra NoSQL database originally developed at Facebook, allow users to choose a desired consistency level. However, there are no known tools that predict system latency as a function of the chosen consistency level, the number of replicas, and other essential system parameters.
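Cassandra's tunable consistency reduces to simple replica arithmetic: each level determines how many replicas must acknowledge a request, and reads are guaranteed to see the latest write when the read and write replica sets overlap (R + W > RF). The sketch below captures this for the single-datacentre levels; it is an illustration of the semantics, not part of Cassandra's own API, and it ignores multi-datacentre levels such as LOCAL_QUORUM and EACH_QUORUM.

```python
# Simplified model of Cassandra consistency levels (single datacentre).

def replicas_required(level: str, replication_factor: int) -> int:
    """Number of replicas that must acknowledge a request at a given level."""
    levels = {
        "ONE": 1,
        "TWO": 2,
        "THREE": 3,
        "QUORUM": replication_factor // 2 + 1,  # majority of replicas
        "ALL": replication_factor,
    }
    return levels[level]

def is_strongly_consistent(read_level: str, write_level: str, rf: int) -> bool:
    """Reads see the latest write when R + W > RF (read/write sets overlap)."""
    return replicas_required(read_level, rf) + replicas_required(write_level, rf) > rf
```

For example, with a replication factor of 3, QUORUM reads combined with QUORUM writes are strongly consistent (2 + 2 > 3), whereas ONE/ONE is only eventually consistent (1 + 1 < 3).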
Project concept and architecture
The purpose of the project is to develop a framework allowing developers of large-scale distributed systems to trade off Energy Consumption, Latency, Availability, Durability and Consistency in distributed data storages like Cassandra NoSQL.
- Cassandra trade-off models and assessment tools. We will provide a set of tools for quantitatively estimating and predicting non-functional Cassandra properties (consistency, latency, energy consumption, availability, durability, etc.) by identifying the fundamental trade-offs between them. In addition, we will provide tool support for estimating optimal Cassandra cluster settings (replication factor, consistency level, timeout settings, caching strategy, etc.) taking into account the desired system properties (e.g. response time, energy consumption).
- Enhanced create/read/write/update APIs. We will extend the Cassandra API to give users better control over system properties when executing create/read/write/update requests. New strategies for data reads/writes/updates will be introduced to minimise system response time.
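To illustrate the kind of latency prediction such tools could perform, consider a toy model (an assumption of this sketch, not the project's actual model) in which replica response times are i.i.d. exponential. Memorylessness then gives a closed form for the expected time until the k-th fastest of n replicas has responded, which is exactly the waiting time a request incurs at a consistency level requiring k acknowledgements:

```python
# Toy latency model: expected time until k of n replicas respond, assuming
# i.i.d. exponential response times with the given mean. By memorylessness,
# the j-th inter-arrival gap (while j replicas are still pending) has mean
# mean_latency / j, so the expectation is a partial harmonic sum.

def expected_kth_response(n_replicas: int, k_required: int, mean_latency: float) -> float:
    """E[time until k_required of n_replicas have responded]."""
    return mean_latency * sum(
        1.0 / j for j in range(n_replicas - k_required + 1, n_replicas + 1)
    )
```

With 3 replicas and a mean per-replica latency of 100 ms, waiting for ONE acknowledgement costs about 33 ms on average, QUORUM (2 of 3) about 83 ms, and ALL about 183 ms, showing how stronger consistency inflates both the mean latency and its sensitivity to slow replicas.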
Target audiences
- Companies operating and developing distributed Internet systems and data storages.
- Cassandra community.
- Facebook customers.
Sources of revenue
- Enhancement of flexibility, availability, performance, energy efficiency and other system properties of Facebook services.
- Providing commercial tools for Cassandra DB management and control.
Implementation stage
The research group runs extensive experiments investigating trade-offs between different system properties. Some experimental and theoretical results have already been reported in:
- A. Gorbenko, O. Tarasyuk, A. Romanovsky, “Fault Tolerant Internet Computing: Benchmarking and Modelling Trade-offs between Availability, Latency and Consistency,” Journal of Network and Computer Applications, vol. 146, 2019, pp. 1–14.
- A. Gorbenko, A. Romanovsky, “Timeouting Internet Services,” IEEE Security & Privacy, vol. 11, no. 2, 2013.
- O. Tarasyuk, A. Gorbenko, A. Romanovsky, “The Impact of Consistency on System Latency in Fault Tolerant Internet Computing,” Distributed Applications and Interoperable Systems, LNCS, vol. 9038, 2015.