I am a researcher, employed by Inria, at the BOREAL joint project-team (LIRMM, Inria, University of Montpellier, CNRS) in Montpellier, France.
My research focuses on theoretical aspects of database query optimization.
nofar.carmeli at inria.fr
(!) I no longer have access to my old email address at cs.technion.ac.il.
|12/11/23||Logic and Algebra for Query Evaluation Workshop at the Simons Institute for the Theory of Computing||Berkeley, California|
|15/01/24||Dagstuhl Seminar 24032 on Representation, Provenance, and Explanations in Database Theory and Logic||Dagstuhl, Germany|
My research interests revolve around the theory of data management and enumeration complexity. As the data we wish to process becomes bigger, there is an increasing interest in the fine-grained complexity analysis of query answering problems over databases. My main line of research aspires to determine, for any given query that the user wants to compute, what is an algorithm that runs with optimal guarantees. So far, I focused mostly on identifying the query structures that can be answered with near-optimal time guarantees. For a full list of publications, see Google Scholar or dblp.
An array holding the sorted list of query answers can be useful in many scenarios, even when we do not need all answers (e.g., for finding the median or creating a boxplot), but such an array can be very big, and creating it may take a lot of time. Can we simulate such an array much more efficiently than materializing it?
Dichotomies for Conjunctive Queries (CQs). When can we simulate such an array with only linear preprocessing and polylogarithmic overhead? We looked into conjunctive queries with lexicographic or sum of weights orders. Published as "Tractable Orders for Direct Access to Ranked Answers of Conjunctive Queries".(1)Arxiv PODS 21 TODS 23 Talk
Preprocessing for Join Queries. How much preprocessing do we need to achieve polylogarithmic overhead? We looked into full conjunctive queries with lexicographic orders. Published as "Tight Fine-Grained Bounds for Direct Access on Join Queries".Arxiv PODS 22 Talk
Quantile Queries. Simulating an array efficiently only works for queries of some structures. For other queries, we propose an efficient algorithm for a single access to the array, and when this is provably not possible - an approximation algorithm. Published as "Efficient Computation of Quantiles over Joins".Arxiv PODS 23
CQs with Aggregation. We look into when such an array simulation can be done with only linear preprocessing for CQs with grouping and aggregation. To appear in ICDT 2024 as "Direct Access for Answers to Conjunctive Queries with Aggregation".Arxiv Talk
Suppose we want to efficiently list all query answers. If the queries we consider are Conjunctive Queries (CQs) without self-joins over general databases, we know exactly for which queries this can be done with ideal time guarantees (linear preprocessing and constant delay). We look into how this classification extends to more general settings.
Answering queries with self joins. Known hardness results apply only for queries without self-joins. We study how self-joins affect the complexity of listing all query answers and identify several cases where they turn queries with an intractable structure into tractable queries. Published as "Conjunctive Queries With Self-Joins, Towards a Fine-Grained Enumeration Complexity Analysis".Arxiv PODS 23 Talk
Enumerating answers in random order. We would like to efficiently list all query answers in uniformly random order, so every prefix of the answers is statistically meaningful. When can this be achieved with only linear preprocessing and polylogarithmic delay? We propose an algorithm that uses direct access to the query answers, and we inspect when conjunctive queries and unions of conjunctive queries admit efficient algorithms for these two tasks. Published as "Answering (Unions of) Conjunctive Queries using Random Access and Random-Order Enumeration".Arxiv PODS 20 TODS 22 Slides
Answering Unions of Conjunctive Queries (UCQs). When can we list the answers to UCQs with optimal time guarantees (linear preprocessing and constant delay)? Among our results, we show that the tractable cases are not limited to unions of tractable CQs, and in fact, some unions consisting of only intractable CQs are tractable too. Published as "Enumeration Complexity of Unions of Conjunctive Queries".(1)(2)Arxiv PODS 19 TODS 21 Talk
We show that unbalanced triangle detection is a closely related problem, and assuming its hardness, the previous work captures all tractable unions of 2 self-join-free CQs.Arxiv
Answering CQs with Functional Dependencies (FDs). Very often databases contain dependencies among attributes, and some queries classified as hard over general databases become tractable if we are allowed to exploit these dependencies. We inspect in which cases the dependencies are enough to make conjunctive queries solvable with optimal time guarantees (linear preprocessing and constant delay). Published as "Enumeration Complexity of Conjunctive Queries with Functional Dependencies".(1)Arxiv ICDT 18 TOCS 20 Corrigendum Slides
(1) Invited for a special issue of the journal on selected publications from the conference.
(2) PODS 2019 Best Student Paper.
While the previous section focused on cases where things can be done very efficiently, life is not always that simple. This section proposes solutions for when databases contain uncertainty (in the form of noise that need to be cleaned or probabilistic databases), when data does not appear in a neat structured form, or when the queries we want to answer cannot be quickly tweaked into a tractable acyclic form.
Repairing using soft FDs. Assume we have a database that contains noise in the form of redundant facts, and we have constraints in the form of Functional Dependencies (FDs) that are interpreted as soft: they do not necessarily always hold. Instead, every violation and every removal of a fact from the database are penalized. When can we efficiently find an optimal subset of the database? Published as "Database Repairing with Soft Functional Dependencies".(1)Arxiv ICDT 21 Talk
Representing infinite PDBs. Infinite Probabilistic DataBases (PDBs) model uncertainty when the possible values are not restricted to a finite domain. In the finite case, tuple-independent representations have many advantages, but in the infinite, they cannot always be used. When can they represent the desired infinite PDB? Published as "Tuple-Independent Representations of Infinite Probabilistic Databases".(1)Arxiv PODS 21 Talk
Summarizing reviews with structural explanations. Given a large number of reviews expressing opinions regarding a product, we want to automatically build a graph describing the (deduplicated) opinions expressed in the reviews, together with the explanation relationship (e.g. "walking distance to the beach" explains "great location"). Published as "ExtremeReader: An interactive explorer for customizable and explainable review summarization" and "Constructing Explainable Opinion Graphs from Reviews".Arxiv WWW 20 WWW 21 Project Page
Finding Tree Decompositions (TDs). Tree decompositions can be used to optimize join queries, but finding the best decomposition is known to be hard. We propose an algorithm that enumerates all the proper tree decompositions (or minimal triangulations) of a graph in incremental polynomial time. This can be used as an anytime algorithm to improve on any method for finding a single tree decomposition (or triangulation). Published as "Efficiently Enumerating Minimal Triangulations".Arxiv PODS 17 DAM 20 Code Talk
(1) Invited for a special issue of ACM Transaction on Database Systems (TODS) on selected publications from the conference.
The complexity of listing (= enumerating) query answers over a database depends on the query structure (acyclic is easy). However, a query with a naturally difficult structure can sometimes be adjusted into an easy form by using commonly available additional information (hence the acyclicity is implicit). My work aims at identifying how fast we can solve any given query. This is done either by providing an efficient algorithm for answering the query or by proving that an algorithm with a given complexity cannot be achieved.Thesis Talk
|Video||15/11/23||28:45||Direct Access for Conjunctive Queries with Aggregation||Logic and Algebra for Query Evaluation workshop, Simons Institute, Berkeley|
|Part 1 Part 2||25/09/23||1:15:00||Enuemrating answers to join queries, CQs and UCQs||Fine-Grained Complexity, Logic, and Query Evaluation workshop, Simons Institute, Berkeley|
|Video||29/03/22||47:00||Invited tutorial: answering UCQs with ideal time guarantees (enumeration and related problems)||International Conference on Database Theory (ICDT)|
|Video||22/06/21||12:29||Ranked access to CQ answers||The Symposium on Principles of Database Systems (PODS)|
|Video||02/03/21||48:35||TDs, FDs, UCQs, random order||The seminar of the Formal Methods group at LaBRI, Bordeaux|
|Video Slides||02/01/20||16:02||Random-order enumeration and UCQs||TAU Theory Fest - Fine Grained Complexity workshop, Tel Aviv|
|Video||01/07/19||19:29||Enumerating answers to UCQs||The Symposium on Principles of Database Systems (PODS), Amsterdam|
|Video||16/05/17||20:00||Enumerating Tree Decompositions||The Symposium on Principles of Database Systems (PODS), Chicago|
Combinatorics. This is the first class I taught, two years into my undergraduate studies, and my favorite "standard" class to teach. For me, the satisfaction in teaching comes from showing students how to think more effectively, and I enjoy the type of thinking that is at the core of this class. This class is taken by up to 350 students in normal semesters and 50 students in summer semesters. I received the Permanent Excellent Faculty Instructor Award for teaching this class in spring 2015 and winter 2016.
Competitive Programming. Workshop in Competitive Programming is a very unique class where students gain experience solving problems quickly in teams of three using short programs. The problems sometimes require ad-hoc solutions, sometimes use known paradigms (such as dynamic programming, divide and conquer, clever complete search, or a greedy approach), and sometimes incorporate known algorithms involving graphs, strings, maths, computational geometry, or any topic that can be considered computer science. Teamwork is also a big component of this class as each team of three students works together on one computer to try to solve up to 10 problems in 5 hours. To this aim, it is important to identify bottlenecks, prioritize the work, and communicate effectively with the other teammates. This is where students often get hands-on experience with topics they had only learned theoretically before. A competition at the end of the semester determines which team represents the Technion in the following Southwestern Europe Regional Contest (SWERC) of the International Collegiate Programming Contest (ACM-ICPC). I took this class myself as a student, and this was undoubtedly the best class of my degree! My team then competed in SWERC 2014 in Porto. As a TA, I pushed for expanding this class significantly to equip the students with the tools and tips we gained with experience. We added an hour of teaching prior to the weekly two-hour practices, and built lessons covering what we thought are the most useful parts in each of the aspects mentioned above.
Based on a website template by w3.css