I am a post-doctoral researcher at the Value from Data (Valda) team at École Normale Supérieure (ENS) Paris. My research focuses on theoretical aspects of database query optimization. On December 2020, I completed a PhD advised by Prof. Benny Kimelfeld at the Technion Data & Knowledge Laboratory.
Contact. Nofar.Carmeli at ens.fr
My research interests revolve around the theory of data management and enumeration complexity. For a full list of publications, see Google Scholar or dblp.
As the data we wish to process becomes bigger, there is an increasing interest in the fine-grained analysis of query answering problems over databases. This line of research aspires to determine, for any given query that the user wants to compute, what is an algorithm that runs with optimal guarantees. So far, I focused mostly on identifying the query structures that can be answered with near-optimal time guarantees.
Accessing ranked answers. An array holding the sorted list of query answers can be useful in many scenarios, even when we do not need all answers (e.g., for creating a boxplot). Can we simulate such an array much more efficiently than materializing it?
When can we simulate such an array with only linear preprocessing and polylogarithmic overhead? We looked into conjunctive queries with lexicographic or sum of weights orders. Published as "Tractable Orders for Direct Access to Ranked Answers of Conjunctive Queries".^{(1)}
Arxiv PODS 21 TalkHow much preprocessing do we need to achieve polylogarithmic overhead? We looked into full conjunctive queries with lexicographic orders. Published as "Tight Fine-Grained Bounds for Direct Access on Join Queries".
Arxiv PODS 22 TalkEnumerating answers in random order. We would like to efficiently list all query answers in uniformly random order, so every prefix of the answers is statistically meaningful. When can this be achieved with only linear preprocessing and polylogarithmic delay? We propose an algorithm that uses direct access to the query answers, and we inspect when conjunctive queries and unions of conjunctive queries admit efficient algorithms for these two tasks. Published as "Answering (Unions of) Conjunctive Queries using Random Access and Random-Order Enumeration".
Arxiv PODS 20 TODS 22 SlidesAnswering Unions of Conjunctive Queries (UCQs). When can we list the answers to UCQs with optimal time guarantees (linear preprocessing and constant delay)? Among our results, we show that the tractable cases are not limited to unions of tractable CQs, and in fact, some unions consisting of only intractable CQs are tractable too. Published as "Enumeration Complexity of Unions of Conjunctive Queries".^{(1)(2)}
Arxiv PODS 19 TODS 21 TalkAnswering CQs with Functional Dependencies (FDs). Very often databases contain dependencies among attributes, and some queries classified as hard over general databases become tractable if we are allowed to exploit these dependencies. We inspect in which cases the dependencies are enough to make conjunctive queries solvable with optimal time guarantees (linear preprocessing and constant delay). Published as "Enumeration Complexity of Conjunctive Queries with Functional Dependencies".^{(1)}
Arxiv ICDT 18 TOCS 19 Corrigendum Slides
(1) Invited for a special issue of the journal on selected publications from the conference.
(2) PODS 2019 Best Student Paper.
While the previous section focused on cases where things can be done very efficiently, life is not always that simple. This section proposes solutions for when databases contain uncertainty (in the form of noise that need to be cleaned or probabilistic databases), when data does not appear in a neat structured form, or when the queries we want to answer cannot be quickly tweaked into a tractable acyclic form.
Repairing using soft FDs. Assume we have a database that contains noise in the form of redundant facts, and we have constraints in the form of Functional Dependencies (FDs) that are interpreted as soft: they do not necessarily always hold. Instead, every violation and every removal of a fact from the database are penalized. When can we efficiently find an optimal subset of the database? Published as "Database Repairing with Soft Functional Dependencies".^{(1)}
Arxiv ICDT 21 TalkRepresenting infinite PDBs. Infinite Probabilistic DataBases (PDBs) model uncertainty when the possible values are not restricted to a finite domain. In the finite case, tuple-independent representations have many advantages, but in the infinite, they cannot always be used. When can they represent the desired infinite PDB? Published as "Tuple-Independent Representations of Infinite Probabilistic Databases".^{(1)}
Arxiv PODS 21 TalkSummarizing reviews with structural explanations. Given a large number of reviews expressing opinions regarding a product, we want to automatically build a graph describing the (deduplicated) opinions expressed in the reviews, together with the explanation relationship (e.g. "walking distance to the beach" explains "great location"). Published as "ExtremeReader: An interactive explorer for customizable and explainable review summarization" and "Constructing Explainable Opinion Graphs from Reviews".
Arxiv WWW 20 WWW 21 Project PageFinding Tree Decompositions (TDs). Tree decompositions can be used to optimize join queries, but finding the best decomposition is known to be hard. We propose an algorithm that enumerates all the proper tree decompositions (or minimal triangulations) of a graph in incremental polynomial time. This can be used as an anytime algorithm to improve on any method for finding a single tree decomposition (or triangulation). Published as "Efficiently Enumerating Minimal Triangulations".
Arxiv PODS 17 DAM 20 Code Talk(1) Invited for a special issue of ACM Transaction on Database Systems (TODS) on selected publications from the conference.
The complexity of listing (= enumerating) query answers over a database depends on the query structure (acyclic is easy). However, a query with a naturally difficult structure can sometimes be adjusted into an easy form by using commonly available additional information (hence the acyclicity is implicit). My work aims at identifying how fast we can solve any given query. This is done either by providing an efficient algorithm for answering the query or by proving that an algorithm with a given complexity cannot be achieved.
Thesis TalkDate | Length | Topics | Venue | |
---|---|---|---|---|
Video | 29/03/22 | 47:00 | invited tutorial: answering UCQs with ideal time guarantees (enumeration and related problems) | International Conference on Database Theory (ICDT) |
Video | 22/06/21 | 12:29 | ranked access | The Symposium on Principles of Database Systems (PODS) |
Video | 02/03/21 | 48:35 | TDs, FDs, UCQs, random order | The seminar of the Formal Methods group at LaBRI, Bordeaux |
Video Slides | 02/01/20 | 16:02 | random order, UCQs | TAU Theory Fest - Fine Grained Complexity workshop, Tel Aviv |
Video | 01/07/19 | 19:29 | UCQs | The Symposium on Principles of Database Systems (PODS), Amsterdam |
Video | 16/05/17 | 20:00 | TDs | The Symposium on Principles of Database Systems (PODS), Chicago |
Combinatorics. This is the first class I taught, two years into my undergraduate studies, and my favorite "standard" class to teach. For me, the satisfaction in teaching comes from showing students how to think more effectively, and I enjoy the type of thinking that is at the core of this class. This class is taken by up to 350 students in normal semesters and 50 students in summer semesters. I received the Permanent Excellent Faculty Instructor Award for teaching this class in spring 2015 and winter 2016.
Competitive Programming. Workshop in Competitive Programming is a very unique class where students gain experience solving problems quickly in teams of three using short programs. The problems sometimes require ad-hoc solutions, sometimes use known paradigms (such as dynamic programming, divide and conquer, clever complete search, or a greedy approach), and sometimes incorporate known algorithms involving graphs, strings, maths, computational geometry, or any topic that can be considered computer science. Teamwork is also a big component of this class as each team of three students works together on one computer to try to solve up to 10 problems in 5 hours. To this aim, it is important to identify bottlenecks, prioritize the work, and communicate effectively with the other teammates. This is where students often get hands-on experience with topics they had only learned theoretically before. A competition at the end of the semester determines which team represents the Technion in the following Southwestern Europe Regional Contest (SWERC) of the International Collegiate Programming Contest (ACM-ICPC). I took this class myself as a student, and this was undoubtedly the best class of my degree! My team then competed in SWERC 2014 in Porto. As a TA, I pushed for expanding this class significantly to equip the students with the tools and tips we gained with experience. We added an hour of teaching prior to the weekly two-hour practices, and built lessons covering what we thought are the most useful parts in each of the aspects mentioned above.
Technion, 2015-2020
Advised by Prof. Benny Kimelfeld at Technion Data & Knowledge Laboratory.
2019-2020 Google PhD Fellowship.
2019 Visiting researcher at the Factorised Databases group at University of Oxford.
2018-2019 Jacobs Excellence Scholarship.
ThesisMegagon Labs, 2019
Summer internship working on the Extreme Reading project.
About the ProjectGoogle, 2015-2016
2016 Summer internship with the Cultural Institute front-end team at Google Paris.
2015 Summer internship with the Live Results back-end team at Google Haifa.
Technion, 2011-2015
Graduated summa cum laude.
Lapidim excellence program.
2013 exchange student at University of Toronto.
2014 An Android app for EyeControl to helps locked-in individuals communicate using eye movements. Hello World NPO & Prize4Life.
2013 A prototype of real-time hand detection applied to a Pong game on a Samsung Beam cellphone. GIP lab.
Pong VideoBased on a website template by w3.css