I am a researcher, employed by Inria, at the Boreal joint project-team
(LIRMM, Inria, University of Montpellier, CNRS) in Montpellier, France.
My research focuses on the theory of database query optimization.
Contact.
nofar.carmeli at inria.fr
(!) I no longer have access to my old email address at cs.technion.ac.il.
My research interests revolve around the theory of data management, often using fine-grained complexity and enumeration complexity. My main line of research aspires to determine, for any given query that the user wants to compute over a database, what is an algorithm that runs with optimal guarantees. So far, I focused mostly on identifying the query structures that can be answered with near-optimal time guarantees. For a full list of publications, see Google Scholar or dblp.
An array holding the sorted list of query answers can be useful in many scenarios, even when we do not need all answers (e.g., for finding the median or creating a boxplot), but such an array can be very big, and creating it may take a lot of time. Can we simulate such an array more efficiently than materializing it? We call the task of simulating such an array "direct access".
Dichotomies for conjunctive queries. When can we simulate an array containing the answers with only linear preprocessing and logarithmic overhead? We looked into conjunctive queries, when the answers are required to be sorted lexicographically or by sum of weights. Published as "Tractable Orders for Direct Access to Ranked Answers of Conjunctive Queries".(1)
Arxiv PODS 21 TODS 23 TalkOptimal preprocessing for join queries. How much preprocessing is required to achieve logarithmic overhead? We looked into full conjunctive queries, when the answers are required to be sorted lexicographically. Published as "Tight Fine-Grained Bounds for Direct Access on Join Queries".(1)
Arxiv PODS 22 TODS 24 TalkQuantile Queries. Simulating the sorted array of answers can only be done efficiently for queries of some structures. However, sometimes we only need to read a small number of answers from this array. We propose an efficient algorithm for a single access to the answers array, and when this is provably not possible - an approximation algorithm. We looked into full conjunctive queries, when the answers are required to be sorted lexicographically, by sum of weights, or by the minimum of attributes. Published as "Efficient Computation of Quantiles over Joins".
Arxiv PODS 23Conjunctive queries with aggregation. We looked into conjunctive queries with grouping and aggregation, when the answers are required to be sorted lexicographically. When can we simulate an array containing the answers with only linear preprocessing and logarithmic overhead? Published as "Direct Access for Answers to Conjunctive Queries with Aggregation".(1)
Arxiv ICDT 24 TalkJoin queries with negation. We looked into join queries where some atoms are negated, when the answers are required to be sorted lexicographically. We showed that the preprocessing needed in such case is the same as for the worst (positive) join query containing all positive atoms and some negated atoms. Published as "Direct Access for Conjunctive Queries with Negations".(1)
Arxiv(1) Invited for a special issue of the journal on selected publications from the conference.
Suppose we want to efficiently list all query answers. If the queries we consider are Conjunctive Queries (CQs) without self joins over general databases, we know exactly for which queries this can be done with ideal time guarantees (linear preprocessing and constant delay). We look into how this classification extends to more general settings.
Answering queries with self joins. Known hardness results apply only for queries without self joins. We study how self joins affect the complexity of listing all query answers and identify several cases where they turn queries with an intractable structure into tractable queries. Published as "Conjunctive Queries With Self-Joins, Towards a Fine-Grained Enumeration Complexity Analysis".
Arxiv PODS 23 TalkAnswering queries in random order. We would like to efficiently list all query answers in uniformly random order, so every prefix of the answers is statistically meaningful. When can this be achieved with only linear preprocessing and polylogarithmic delay? We propose an algorithm that uses direct access to the query answers, and we inspect when conjunctive queries and unions of conjunctive queries admit efficient algorithms for these two tasks. Published as "Answering (Unions of) Conjunctive Queries using Random Access and Random-Order Enumeration".
Arxiv PODS 20 TODS 22 SlidesAnswering unions of conjunctive queries. When can we list the answers to unions of conjunctive queries with optimal time guarantees (linear preprocessing and constant delay)? Among our results, we show that the tractable cases are not limited to unions of tractable CQs, and in fact, some unions consisting of only intractable CQs are tractable too. Published as "Enumeration Complexity of Unions of Conjunctive Queries".(1)(2)
Arxiv PODS 19 TODS 21 TalkWe show that unbalanced triangle detection is a closely related problem, and assuming its hardness, the previous work captures all tractable unions of 2 self-join-free CQs.
ArxivAnswering CQs with functional dependencies. Very often databases contain dependencies among attributes, and some queries classified as hard over general databases become tractable if we are allowed to exploit these dependencies. We inspect in which cases the dependencies are enough to make conjunctive queries solvable with optimal time guarantees (linear preprocessing and constant delay). Published as "Enumeration Complexity of Conjunctive Queries with Functional Dependencies".(1)
Arxiv ICDT 18 TOCS 20 Corrigendum Slides
(1) Invited for a special issue of the journal on selected publications from the conference.
(2) PODS 2019 Best Student Paper.
While the previous sections treated cases where things can be done very efficiently, things are not always that simple. This section proposes solutions for when databases contain uncertainty (in the form of noise that needs to be cleaned or probabilistic databases), when data does not appear in a neat structured form, or when the queries we want to answer cannot be quickly tweaked into a tractable acyclic form.
Repairing databases using soft functional dependencies. Assume we have a database that contains noise in the form of redundant facts, and we have constraints in the form of Functional Dependencies (FDs) that are interpreted as soft: they do not necessarily always hold. Instead, every violation and every removal of a fact from the database are penalized. When can we efficiently find an optimal subset of the database? Published as "Database Repairing with Soft Functional Dependencies".(1)
Arxiv ICDT 21 TODS 24 TalkRepresenting infinite probabilistic databases. Infinite Probabilistic Databases (PDBs) model uncertainty when the possible values are not restricted to a finite domain. In the finite case, tuple-independent representations have many advantages, but in the infinite, they cannot always be used. When can they represent the desired infinite PDB? Published as "Tuple-Independent Representations of Infinite Probabilistic Databases".(1)
Arxiv PODS 21 TalkSummarizing reviews with structural explanations. Given a large number of reviews expressing opinions regarding a product, we want to automatically build a graph describing the (deduplicated) opinions expressed in the reviews, together with the explanation relationship (e.g. "walking distance to the beach" explains "great location"). Published as "ExtremeReader: An interactive explorer for customizable and explainable review summarization" and "Constructing Explainable Opinion Graphs from Reviews".
Arxiv WWW 20 WWW 21 Project PageFinding tree decompositions. Tree decompositions can be used to optimize join queries, but finding the best decomposition is known to be hard. We propose an algorithm that enumerates all the proper tree decompositions (or minimal triangulations) of a graph in incremental polynomial time. This can be used as an anytime algorithm to improve on any method for finding a single tree decomposition (or triangulation). Published as "Efficiently Enumerating Minimal Triangulations".
Arxiv PODS 17 DAM 20 Code Talk(1) Invited for a special issue of ACM Transaction on Database Systems (TODS) on selected publications from the conference.
The complexity of enumerating (i.e., listing) query answers over a database depends on the query structure (acyclic is easy). However, a query with a naturally difficult structure can sometimes be adjusted into an easy form by using commonly available additional information (hence the acyclicity is implicit). My work aims at identifying how fast we can solve any given query. This is done either by providing an efficient algorithm for answering the query or by proving that an algorithm with a given complexity cannot be achieved.
Thesis TalkDate | Length | Topics | Venue | |
---|---|---|---|---|
Video | 15/11/23 | 28:45 | Direct access for conjunctive queries with aggregation | Logic and Algebra for Query Evaluation workshop, Simons Institute, Berkeley |
Part 1 Part 2 | 25/09/23 | 1:15:00 | Enuemrating answers to join queries, CQs and UCQs | Fine-Grained Complexity, Logic, and Query Evaluation workshop, Simons Institute, Berkeley |
Video | 29/03/22 | 47:00 | Invited tutorial: answering UCQs with ideal time guarantees (enumeration and related problems) | International Conference on Database Theory (ICDT) |
Video | 22/06/21 | 12:29 | Direct access to CQ answers | The Symposium on Principles of Database Systems (PODS) |
Video | 02/03/21 | 48:35 | Enumerating tree decompositions, enumerating answers to CQs with FDs, enumerating answers to UCQs, and random-order enumeration. | The seminar of the Formal Methods group at LaBRI, Bordeaux |
Video Slides | 02/01/20 | 16:02 | Random-order enumeration and UCQs | TAU Theory Fest - Fine Grained Complexity workshop, Tel Aviv |
Video | 01/07/19 | 19:29 | Enumerating answers to UCQs | The Symposium on Principles of Database Systems (PODS), Amsterdam |
Video | 16/05/17 | 20:00 | Enumerating tree decompositions | The Symposium on Principles of Database Systems (PODS), Chicago |
STACS PC member (2025), PODS proceedings chair (2025), WEPA PC member (2024), SWERC programming contest judge (2023-2024, 2020-2021), PODS PC member (2024, 2023), ICDT test-of-time award committee member (2024), BDA PC member (2023), ICDT PC member (2022), PODS publicity chair (2021), Dagstuhl seminar collector (2019).
Technion, 2015-2020
Thesis: The Power of Implicit Acyclicity in the Enumeration Complexity of Database Queries. Advised by Prof. Benny Kimelfeld at Technion Data & Knowledge Laboratory.
2019-2020 Google PhD Fellowship.
2019 Summer research internship working on the Extreme Reading project with Megagon Labs.
2019 Visiting researcher at the Factorised Databases group at University of Oxford.
2018-2019 Jacobs Excellence Scholarship.
2016 Summer engineering internship with the Cultural Institute front-end team at Google Paris.
2015 Summer engineering internship with the Live Results back-end team at Google Haifa.
Faculty of Computer Science, Technion, 2013-2019
Combinatorics for CS: lecturer (summer18, summer17), TA in charge (winter16, winter15, spring15), TA (spring16, spring13).
Workshop in Competitive Programming: TA in charge (spring19, spring18), TA (spring17, spring16).
Logic for CS: TA (winter19).
Data Structures 1: TA (spring14).
Technion, 2011-2015
B.Sc. in Computer Science. Graduated summa cum laude.
Member of Lapidim excellence program.
2014 Built an Android app for EyeControl to helps locked-in individuals communicate using eye movements with Hello World NPO & Prize4Life.
2013 Built a prototype of real-time hand detection applied to a Pong game on a Samsung Beam cellphone with GIP lab.
2013 Exchange student at University of Toronto.
Based on a website template by w3.css