When sequential code meets replicated data: programming language support to simplify the development of correct replicated data types

Research output: ThesisPhD Thesis

36 Downloads (Pure)


Many applications today run atop a geo-distributed system, that is, a system that replicates data over several machines (called replicas) located a strategic positions around the globe. Such systems typically ensure good performance and high availability by allowing replicas to be updated independently. However, replicas may execute concurrent updates in different orders, which can lead to conflicts. Programmers must foresee and handle these conflicts, which is extremely difficult, even for experts.
To avoid manual conflict resolution, researchers proposed replicated variants for sequential data types, called Replicated Data Types (RDTs). RDTs resemble common data structures but internally implement mechanisms to detect and solve conflicts. A data type may have several RDTs exhibiting different semantics depending on how conflicts are solved.
n this dissertation, we argue that real-world applications require custom RDTs that are tailored to the needs of the application. However,existing RDTs cannot be customized as they exhibit hardcoded conflict resolution semantics. This leaves programmers no choice but to build their own RDTs or modify existing RDTs using ad-hoc conflict detection and resolution mechanisms. This is known to be error-prone and results in brittle systems. Moreover, these new or modified RDTs are seldom verified due to the complexity of software verification, and thus bugs are likely to go unnoticed.
To address the aforementioned issues, this dissertation explores programming language abstractions and techniques to support the design,implementation, and verification of geo-distributed systems using RDTs.This led to a solution that is twofold. First, we propose a general approach for replicating existing sequential data types instead of building dedicated RDTs for every use case. Our approach lets programmers augment sequential data types with a declarative specification of the desired conflict resolution semantics expressed through application invariants. We statically analyze the data type to detect all conflicts and find solutions that adhere to the desired semantics. At runtime, our novel replication protocol efficiently serializes operations such that replicas converge to the same state and maintain application-specific invariants with minimal co-ordination. To validate our approach, we successfully built an extensive portfolio of RDTs and several real-world applications that exhibit performance similar to state-of-the-art approaches.
Second, we devise VeriFx, a novel high-level programming language to implement and verify ad-hoc RDTs. Programmers implement custom RDTs in VeriFx and automatically get a proof of correctness or a counterexample if the implementation is wrong. Verified RDTs can be transpiled to mainstream languages in order to deploy them in an existing system. To demonstrate its effectiveness, we used VeriFx to implement and verify a portfolio comprising more than 40 RDTs, distilled from the literature on Conflict-free Replicated Data Types (CRDTs) and Operational Transformation (OT) and from industrial databases.
Our results show two important insights. First, it is possible to build efficient RDTs with application-specific concurrency semantics, without having to manually handle conflicts. Second, the implementation and verification of RDTs can be unified in a high-level language such that software engineers without deep knowledge of verification can nevertheless implement RDTs and verify them automatically.
Original languageEnglish
Awarding Institution
  • Vrije Universiteit Brussel
  • Gonzalez Boix, Elisa, Supervisor
Award date19 Dec 2022
Publication statusPublished - 2022

Cite this