Query-Efficient Computation in Property Testing and Learning Theory
David García Soriano
Abstract:
How can we solve computational tasks when we simply don't have enough time to
process all the data? For example, given a sequence of numbers, can we
determine if they start repeating by looking at just a few of them? Or, given
query access to some boolean function $f$, how many queries to $f$ do we need
to test whether it is monotone? How about deciding if $f$ is ``essentially''
the same as another, known, function $g$?
One of the central goals of theoretical computer science is to understand the
limits of computation in a variety of models, and to characterize the
resources required to solve certain tasks. The kinds of problems outlined
above are particularly amenable to study in the property testing model,
and it is through this lens that we investigate them. In property testing,
algorithms are required to distinguish those objects which satisfy the desired
property from those which are far from it. We seek query-efficient randomized
testers to solve them (upper bounds), along with rigorous proofs
explaining why no significantly better solutions are possible, even in
principle (lower bounds). The results presented here draw from
techniques in probability theory, graph theory, extremal combinatorics, the
theory of permutation groups, coding theory, the analysis of boolean
functions, and number theory.
The starting point is our work on the problem of testing function
isomorphism in Chapter 2. Two boolean functions on $n$-bit inputs are
isomorphic if they are the same up to permutations of the $n$ input
variables. The main case of interest is when $g$ is known and $f$ needs to
be queried. It is known that the task can be accomplished with
$\widetilde{O}(n)$ queries, but this is exponentially larger than the best
lower bounds from prior work. Here we bridge the gap by contributing an
almost-optimal adaptive lower bound of $\Omega(n)$ queries for worst-case
functions~$f$. Several other variants of the problem are also discussed.
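To fix notation, the isomorphism relation itself can be checked by brute force over all $n!$ variable permutations. The following sketch (illustrative only, and exponential-time; it is a reference implementation, not a tester) is an assumption of ours, not part of the thesis:

```python
from itertools import permutations

def isomorphic(f, g, n):
    """Check whether boolean functions f, g on n-bit inputs are isomorphic,
    i.e., equal up to a permutation of the n input variables.
    Brute-force reference implementation (exponential time)."""
    inputs = [tuple((x >> i) & 1 for i in range(n)) for x in range(2 ** n)]
    for pi in permutations(range(n)):
        # test f(x) == g(x permuted by pi) on every input
        if all(f(x) == g(tuple(x[pi[i]] for i in range(n))) for x in inputs):
            return True
    return False
```

A tester, by contrast, must decide with few queries whether $f$ is isomorphic to $g$ or far from every function isomorphic to $g$.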
In addition, we show in Chapter 3 that when $f$ is a $k$-junta (meaning that
it is determined by just $k$ of the $n$ input variables), the complexity
of testing isomorphism to $f$ is reduced to $\widetilde{O}(k)$. (In contrast,
another result of ours is that if we impose the seemingly weak restriction
that the tester have one-sided error, the complexity of testing
isomorphism to $k$-juntas becomes roughly $\log\binom{n}{k}$, which is
much larger for small $k$.) In the process we construct objects of independent
interest called sample extractors. These are efficient algorithms that
allow us to draw samples from the truth table of the ``core'' function on $k$
variables that determines a given $k$-junta function.
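As an idealized sketch (our own illustration, not the thesis construction), if the $k$ relevant coordinates of a junta were already known, one labeled sample from the core could be drawn with a single query; the whole difficulty addressed by sample extractors is achieving this without knowing the relevant coordinates in advance:

```python
import random

def core_sample(f, relevant, n):
    """Idealized sample extractor: given oracle access to a k-junta f on n
    variables WHOSE RELEVANT COORDINATES ARE KNOWN (a strong, hypothetical
    assumption), return one uniform labeled sample (z, core(z)) from the
    truth table of the k-variable core, using one query to f."""
    x = [random.randint(0, 1) for _ in range(n)]  # uniform random point
    z = tuple(x[i] for i in relevant)             # projection onto the core
    return z, f(x)                                # f(x) equals core(z)
```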
Next we give a partial characterization of the set of functions against which
it is possible to test isomorphism with constant query complexity. We show in
Chapter 4 that, for any function $f$ with polynomially many distinct
isomorphic copies (functions obtained from $f$ by permuting its variables),
isomorphism to $f$ is testable with constantly many queries.
This theorem extends previous results from junta testing and in fact covers
all functions known to date for which isomorphism testing is easy. On a
related note, observing the connection between testing function isomorphism
and testing hypergraph isomorphism, we turn our attention to testing
isomorphism for uniform hypergraphs. We give a characterization of the
class of hypergraphs of constant rank for which isomorphism can be efficiently
tested, generalizing a result of Fischer (STOC'04)
regarding graph isomorphism.
In Chapter 5 we establish links with the field of group testing,
and observe that ideas from isomorphism testing can be applied to the study
of a naturally-arising relaxation of group testing problems; both the lower
bound methods and the algorithms are of use. We determine the query
complexity of the relaxed group testing problem for non-adaptive algorithms
up to constant factors. The question of obtaining explicit lower bounds for
these problems is also addressed; it turns out that parity functions (XORs of
subsets of the input variables) exemplify the worst-case lower bounds for
testing function isomorphism, both for one-sided and for two-sided testers.
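The role of parities here rests on a basic fact: any two distinct parity functions disagree on exactly half of all inputs, so they are pairwise far apart. This can be verified directly on small instances (a toy illustration of ours, not from the thesis):

```python
from itertools import product

def parity(S):
    """The parity function XOR-ing the variables indexed by the set S."""
    return lambda x: sum(x[i] for i in S) % 2

def distance(f, g, n):
    """Normalized Hamming distance: the fraction of the 2^n inputs on
    which f and g disagree (exhaustive, for small n only)."""
    pts = list(product([0, 1], repeat=n))
    return sum(f(x) != g(x) for x in pts) / len(pts)
```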
We move on to discuss other property testing problems in Chapter 6. It turns
out that our sample extractors can be used to enhance the best algorithms
known for many other testing problems, specifically those defined by the
property of having concise representations, such as testing if $f$ can
be computed by small circuits, or decision trees of small size. We also
provide new lower bounds for some of them, resolving several open questions
posed by Diakonikolas et al.~(FOCS'07).
In Chapter 7 we examine parity functions in a different light: that of
computational learning theory. Testing and learning are two closely
related areas. Instead of testing whether the function $f$ is a parity on a
small number of variables, in learning we assume that it is. Then, based on
some example values of $f$, the learner tries to predict $f(x)$ for other
inputs $x$ with good accuracy. We work in the mistake-bound model of
learning, which is stronger than the (more usual) $\PAC$ model. It is a
standard fact that parities can be learned with mistake bound (or sample
complexity) $O(k \log n)$, but no polynomial-time implementation of such a
learning algorithm is known. We design a simple, deterministic,
polynomial-time algorithm for learning $k$-parities with mistake
bound $O(n^{1-\frac{1}{k}})$. This is the first polynomial-time
algorithm to learn $\omega(1)$-parities with mistake bound $o(n)$,
and using standard conversion techniques it implies an improvement
over the results of Klivans and Servedio (COLT'04) for learning $k$-parities
in the $\PAC$ model.
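For orientation, the classical polynomial-time baseline for this problem is online Gaussian elimination over $\mathrm{GF}(2)$, which makes at most $n$ mistakes (one per linearly independent example). The sketch below shows this baseline, not the thesis's $O(n^{1-\frac{1}{k}})$ algorithm; examples are encoded as integer bitmasks:

```python
class ParityLearner:
    """Online learner for parity functions via Gaussian elimination over
    GF(2).  Classical baseline: at most n mistakes on n variables, since
    each mistake contributes a new linearly independent equation."""

    def __init__(self):
        self.basis = {}  # pivot bit -> (row vector, its true parity label)

    def _reduce(self, x):
        """Reduce x against the basis rows (highest pivot first); return
        the residue and the XOR of the labels of the rows used."""
        label = 0
        for pivot in sorted(self.basis, reverse=True):
            if x >> pivot & 1:
                v, b = self.basis[pivot]
                x ^= v
                label ^= b
        return x, label

    def predict(self, x):
        residue, label = self._reduce(x)
        return label if residue == 0 else 0  # guess 0 outside the span

    def update(self, x, y):
        """Receive the true label y; add a new basis row if x was not
        already determined by the equations seen so far."""
        residue, label = self._reduce(x)
        if residue:
            self.basis[residue.bit_length() - 1] = (residue, y ^ label)
```

Once an example lies in the span of the stored equations, its label is forced, so no further mistakes occur on it; the thesis shows how to do substantially better than this $n$-mistake bound for $k$-parities.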
Apart from this, we also consider one of the fundamental problems in property
testing: monotonicity of functions (not necessarily boolean). In Chapter 8 we
consider functions defined on the $n$-dimensional hypercube, and give an
$\Omega(n)$ lower bound for one-sided, non-adaptive testers of monotonicity.
As there is still a sizable gap between this and the best upper bounds known,
we look at a natural approach to obtaining upper bounds via the study of
routing properties of the hypercube. It had been previously observed that
if every set of source-sink pairs on the directed hypercube (where all
sources and sinks are distinct) could be connected with edge-disjoint
paths, then monotonicity of functions on the $n$-dimensional hypercube
would be testable with $O(n)$ queries. Determining whether this property
holds was posed as an open problem by Lehman and Ron (J.~Comb.~Theory,
Ser.~A, 2001), but the question had remained elusive for nearly a decade.
By analyzing the combinatorial properties of the hypercube, we show that the
answer is negative, and that this approach must fall short of matching the
current lower bounds, or even approaching them.
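For concreteness, the classical one-sided, non-adaptive tester in this setting is the edge tester: sample random hypercube edges and reject only upon seeing a violated pair, which is certain evidence of non-monotonicity. A minimal sketch (our illustration; parameter choices are not from the thesis):

```python
import random

def edge_tester(f, n, queries):
    """One-sided, non-adaptive edge tester for monotonicity on the
    n-dimensional hypercube: sample random edges (x with bit i cleared,
    x with bit i set) and reject iff some sampled edge is violated.
    Never rejects a monotone function, so errors are one-sided."""
    for _ in range(queries):
        x = random.getrandbits(n)
        i = random.randrange(n)
        lo, hi = x & ~(1 << i), x | (1 << i)
        if f(lo) > f(hi):
            return False  # violated edge: f is certainly not monotone
    return True  # no violation found: accept
```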
In the final chapter we address the cycle detection problem. Perhaps
counter-intuitively, we prove that if the smallest periodic part of the
sequence contains distinct elements, the period length $r$ can be determined
by making a number of probes to the sequence that is sublogarithmic in $r$;
moreover, this is not too far from optimal. We also study variants of the
problem where random access to the sequence is not allowed, but we can instead
``jump'' between positions that are not too far apart. The latter
results improve on related work of Cleve (CCC'00).
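As a point of comparison, the classical sequential-access baseline is Brent's cycle-finding algorithm, which walks the sequence forward and needs a number of evaluations linear in the tail length plus the period $r$; the query model studied here, with random access, allows probes sublogarithmic in $r$. A sketch of the baseline (our illustration):

```python
def brent_period(f, x0):
    """Brent's cycle-finding algorithm on the sequence x0, f(x0), f(f(x0)),
    ...: returns the period length using O(tail + period) sequential
    evaluations, by racing a hare against a tortoise that teleports at
    powers of two."""
    power = lam = 1
    tortoise, hare = x0, f(x0)
    while tortoise != hare:
        if power == lam:              # start a new power-of-two window
            tortoise, power, lam = hare, power * 2, 0
        hare = f(hare)
        lam += 1
    return lam
```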