Project Details
Description
Big data and scalability have been buzzwords in recent years. The trend is to use a set of clusters to
perform a computation over massive amounts of data. These new large-scale systems call for
innovative ways to understand the complexity of their computations. In this setting, we can often ignore
the I/O factor by assuming that the data is available in the local memories of the clusters performing
the computation. At the same time, the complexity of a distributed computation often involves other
factors that are not relevant in the centralized setting, such as the communication cost and the
number of rounds of communication. From a database perspective, it is essential to understand how
to evaluate queries over such large-scale systems efficiently. Of the classical operators,
join is considered the most challenging because it combines data from different
tables; in a distributed setting, such data is likely to be located on different clusters,
so it needs to be sent from one server to another. Therefore, understanding the complexity of
join in a distributed setting in terms of communication cost is essential for building efficient large-scale systems. In this proposal, we are interested in developing distributed join algorithms that
minimize the communication cost of the evaluation while keeping the number of communication
rounds constant. In particular, we are interested in providing optimality guarantees for our
algorithms.
| Acronym | FWOTM1224 |
|---|---|
| Status | Active |
| Effective start/end date | 1/10/24 → 30/09/27 |
Keywords
- Query processing with optimality guarantees
- Database management
- Distribution schemes for parallel systems
Flemish discipline codes in use since 2023
- Database theory
- Database systems and architectures
Research output
- 1 Article
- Expressiveness within Sequence Datalog
  Aamer, H., Hidders, J., Paredaens, J. & Van den Bussche, J., 30 Jun 2025, In: ACM Transactions on Database Systems. 50, 3, p. 1-38 38 p., 12.
  Research output: Contribution to journal › Article › peer-review
  Open Access
- Distributed Multi-way Joins: Worst-case Optimality
  Aamer, H. (Speaker) & Ketsman, B. (Contributor)
  26 Nov 2025
  Activity: Talk or presentation › Talk or presentation at a workshop/seminar
- PAC: Computing Join Queries with Semi-Covers
  Aamer, H. (Speaker) & Ketsman, B. (Contributor)
  25 Mar 2025
  Activity: Talk or presentation › Talk or presentation at a conference