## Abbas Mehrabian's Research Projects

• New algorithms for batched bandit problems (2019 – 2020) (the paper). A bandit problem is a sequential game between a player/learner and an environment. In each round, first the player chooses an action from a set of actions and then the environment presents a reward. The player's goal is to maximize her total accumulated reward. The standard performance measure for a bandit policy is its regret, defined as the difference between the total expected reward collected by the policy and the total expected reward collected by the optimal policy. In the batched version of this problem, the player commits to a sequence of actions and observes the rewards after all actions in that sequence are played. The player is subject to using at most a given number of batches, and the total number of actions played by the player must sum to a given number.

With Hossein Esfandiari, Amin Karbasi, and Vahab Mirrokni we presented simple and efficient algorithms for batched stochastic multi-armed bandit and batched stochastic linear bandit problems. We proved bounds for their expected regrets that improve over the best known regret bounds for any number of batches. In particular, our algorithms in both settings achieve the optimal expected regrets by using only a logarithmic number of batches. We also studied the batched adversarial multi-armed bandit problem for the first time and found the optimal regret, up to logarithmic factors, of any algorithm with predetermined batch sizes (the paper).

• A new randomized algorithm for bandit problems (2019) (the paper). Two standard algorithms for bandit problems are upper-confidence-bound (UCB) and Thompson sampling (TS) algorithms. In general, UCB has better theoretical guarantees, while TS has better empirical performance. With Audrey Durand, Branislav Kveton, and Sharan Vaswani we proposed RandUCB, a bandit strategy that uses theoretically derived confidence intervals similar to UCB algorithms, but akin to Thompson sampling, uses randomization to trade off exploration and exploitation. In the $K$-armed bandit setting, we show RandUCB achieves the minimax-optimal $\widetilde{O}(\sqrt{K T})$ regret after $T$ rounds. For linear bandits, we prove RandUCB achieves the minimax-optimal $\widetilde{O}(d \sqrt{T})$ regret even in the case of infinite arms, where $d$ denotes the dimension. We demonstrate the practical effectiveness of RandUCB with experiments in both the multi-armed and linear bandits. Our results illustrate that RandUCB matches the empirical performance of TS while obtaining the theoretically optimal regret bounds of UCB algorithms, thus achieving the best of both worlds (the paper).

• New algorithms for homogenous and heterogeneous multiplayer bandits (2018 – 2019) (paper 1, paper 2). Consider a multiplayer stochastic multi-armed bandit game in which the players cannot communicate, and if two or more players pull the same arm, a collision occurs and the involved players receive zero reward.

1. Two feedback models can be considered: a model in which the players can observe whether a collision has occurred, and a more difficult setup when no collision information is available. With Gabor Lugosi we gave the first theoretical guarantees for the second model: an algorithm with a logarithmic regret, and an algorithm with a square-root regret type that does not depend on the gaps between the means. For the first model, we gave the first square-root regret bounds that do not depend on the gaps. Building on these ideas, we also gave an algorithm for reaching approximate Nash equilibria quickly in stochastic anti-coordination games (the paper).

2. Next consider the setting where each arm has a different mean for each player. In the model in which the players can observe whether a collision has occurred, an algorithm with regret $O((\log T)^{2+\kappa})$ for any constant $\kappa$ was presented in this paper, who left the existence of an algorithm with $O(\log T)$ regret as an open question ($T$ is the number of rounds). With Etienne Boursier, Emilie Kaufmann and Vianney Perchet we provided an affirmative answer to this question in the case when there is a unique optimal assignment of players to arms. For the general case we presented an algorithm with expected regret $O((\log T)^{1+\kappa})$, for any constant $\kappa$ (the paper).

• Density estimation, or sample complexity of distribution learning (2017 – 2019) (paper 1, paper 2, a tutorial covering the results of papers 1 and 2, paper 3). Given samples from an unknown probability distribution with density $f$, we want to output a density function $g$ such that with high probability $\|g-f\|_1\leq\varepsilon$. Given that $f$ belongs to (or can be approximated by) a class of distributions $\mathcal F$, what is the minimum number of samples required?

1. Suppose we know $f$ is a mixture of $k$ Gaussians in $d$ dimensions, and we would like $g$ to have a similar structure. With Hassan Ashtiani, Shai Ben-David, Nick Harvey, Chris Liaw and Yaniv Plan, we proved that $\widetilde \Theta (kd^2/\varepsilon^2)$ samples are necessary and sufficient for this task, improving both the known upper and lower bounds. Moreover, for mixtures of axis-aligned Gaussians, $\widetilde \Theta (kd/\varepsilon^2)$ samples are necessary and sufficient. For proving these bounds we have developed a novel distribution learning technique based on compression schemes (the paper).
With Luc Devroye and Tommy Reddad, we removed the logarithmic factors in the lower bound, and showed that $\Omega(kd^2/\varepsilon^2)$ samples are necessary for this problem (the paper).

2. Suppose the sample complexity for learning class $\mathcal F$ is $m(\varepsilon)$. Under mild technical conditions, with Hassan Ashtiani and Shai Ben-David we showed the sample complexity for learning $k$-mixtures of distributions in $\mathcal F$ is bounded by $O( m(\varepsilon) k \log (k) / \varepsilon^2)$ (the published paper had a bug which was fixed in the arXiv version).

3. Let $G$ be an undirected graph with $m$ edges and $d$ vertices, and let $\mathcal F$ be the class of Ising models defined on $G$. With Luc Devroye and Tommy Reddad, we showed the optimal sample complexity for learning $\mathcal F$ is $\Theta((m+d)/\varepsilon^2)$ (the paper).

• Total variation distance between high-dimensional Gaussians (2018) (the paper). What is the total variation distance between two high-dimensional Gaussians $N(\mu_1,\Sigma_1)$ and $N(\mu_2,\Sigma_2)$? If $\mu_1=\mu_2$, we show the answer is $\Theta(\min\{1,\|\Sigma_1^{-1/2}\Sigma_2\Sigma_1^{-1/2}-I\|_F\})$, where $\|A\|_F$ denotes the Frobenius norm of $A$. If $\mu_1\neq\mu_2$, then let $v= \mu_1-\mu_2$, and let $\Pi$ be a matrix whose columns form a basis for the subspace orthogonal to $v$. Then the total variation distance is $\Theta \left(\min\left\{1, \max\left\{ \frac{|v^{\mathsf{T}} (\Sigma_1-\Sigma_2)v|}{v^{\mathsf{T}} \Sigma_1 v},\frac{{v^{\mathsf{T}} v}}{\sqrt{v^{\mathsf{T}} \Sigma_1 v}}, \| (\Pi^{\mathsf{T}} \Sigma_1 \Pi)^{-1/2} (\Pi^{\mathsf{T}} \Sigma_2 \Pi) (\Pi^{\mathsf{T}} \Sigma_1 \Pi)^{-1/2} - I \|_F\right\} \right\} \right)$. Joint work with Luc Devroye and Tommy Reddad.

• VC-dimension of neural networks (2016 – 2017) (the extended abstract and the full version of the paper). Deep neural networks underlie many of the recent breakthroughs of applied machine learning, particularly in image and speech recognition. These successes motivate a renewed study of these networks' theoretical properties. Classification is one of the learning tasks in which deep neural networks have been particularly successful, e.g., for image recognition. A natural foundational question that arises is: what are the theoretical limits on the classification power of these networks? The established way to formalize this question is by considering the VC-dimension, as it is well known that this asymptotically determines the sample complexity of PAC learning with such classifiers.

With Peter L. Bartlett, Nick Harvey and Chris Liaw, we proved new upper and lower bounds on the VC-dimension of deep neural networks with the ReLU activation function. These bounds are tight for almost the entire range of parameters. Letting $W$ be the number of parameters and $L$ be the number of layers, we proved that the VC-dimension is $O(W L \log(W))$ and $\Omega(WL\log(W/L)).$ This improves both the previously known upper bounds and lower bounds. These bounds generalize to arbitrary piecewise linear activation functions, and also hold for the pseudodimension of these networks. In terms of the number $U$ of non-linear units, we proved a tight bound $\Theta(W U)$ on the VC-dimension and pseudodimension, which holds for arbtirary piecewise polynomial activation functions (the paper).

• Growing a tree in a graph (2016 – 2017) (the paper). Given a graph and a starting vertex, we iteratively build a random spanning tree as follows: initially the tree has just one vertex. in each round, we select a random edge with exactly one endpoint in the tree, and add that edge to the tree. We give bounds for the height of the resulting tree, in terms of parameters of the graph (diameter, maximum degree, genus, degeneracy, edge expansion, etc).

• Randomized rumour spreading (2013 – 2017) (paper 1, paper 2, paper 3, a survey in Persian). The synchronous push&pull protocol is a round-robin rumour spreading protocol, defined as follows. Suppose that one node in a network is aware of a piece of information, the rumour. The protocol proceeds in rounds. In each round, every informed node contacts a random neighbour and sends the rumour to it (pushes the rumour), and every uninformed nodes contacts a random neighbour and gets the rumour if the neighbour possibly knows it (pulls the rumour). This model was defined in this 1987 paper and popularized by this 2000 paper.

1. Random $k$-trees are a class of low-diameter scale-free random graphs with large clustering coefficient, built as follows: initially we have a $k$-clique. In every step a new vertex is born, a random $k$-clique of the current graph is chosen, and the new vertex is joined to all vertices of the $k$-clique. With Ali Pourmiri I studied the behaviour of the synchronous push&pull protocol on random $k$-trees and proved that for each fixed $k>1$, if initially a random vertex is aware of the rumour, then with high probability after polylogarithmic (in the number of vertices) many rounds, the rumour propagates to almost all vertices. We proved a similar result for a closely related class of graphs, random $k$-Apollonian networks. On the other hand, we proved that if we want to inform each and every vertex, with high probability we need at least polynomially many rounds (the paper, see also its talk video).

2. The asynchronous variant of the push&pull protocol is defined as follows. Independent Poisson clocks of rate 1 are associated with the vertices of graph $G$, one to each vertex. Initially, one vertex of $G$ knows the rumour. Whenever the clock of a vertex $x$ rings, it calls a random neighbour $y$: if $x$ knows the rumour and $y$ does not, then $x$ tells $y$ the rumour, and if $x$ does not know the rumour and $y$ knows it, $y$ tells $x$ the rumour. The guaranteed spread time is the smallest time that with high probability all vertices know the rumour. With Huseyin Acan, Andrea Collevecchio and Nick Wormald, we found the worst and best possible spread times, up to constant factors, for both variants. We also proved the first theoretical relationships between the guaranteed spread times in the two versions. Firstly, in all graphs the guaranteed spread time in the asynchronous version is within an $O(\log n)$ factor of that in the synchronous version, and this is tight. Next, we found examples of graphs whose asynchronous spread times are logarithmic, but the synchronous versions are polynomially large. Finally, we show for any graph that the ratio of the synchronous spread time to the asynchronous spread time is $O(n^{2/3})$. (the paper).

3. With Omer Angel and Yuval Peres, we showed for any graph that the ratio of the guaranteed synchronous spread time to the guaranteed asynchronous spread time is $O(n\log n)^{1/3}$, which is tight up to logarithmic factors (the paper).

• Load balancing via randomized local search (2015 – 2016) (the paper). Consider $n$ identical resources and $m$ identical users. Each user must be assigned to exactly one resource. We analyze a distributed randomized local search protocol introduced in 2004 by Paul W. Goldberg, whose goals is to assign users to resources such that the load is perfectly balanced. In this protocol, users are activated by independent exponential clocks of rate 1. On activation, a user samples a random resource and compares that resource's load with its current load. If the new load is lower, the user switches. It is shown in this paper that the expected balancing time is $O(\ln^2 n+\ln(n)\cdot n^2/m)$. With Petra Berebrink, Peter Kling, and Chris Liaw, we improved this to $O(\ln(n)+n^2/m)$, which is tight.

• Properties of the spatial preferential attachment model (2014 – 2016) (the paper). Rumour spreading is a protocol that models the spread of information through a network via user-to-user interaction. The spread time of a graph is the number of rounds needed to spread the rumour to the entire graph. The Spatial Preferred Attachment (SPA) model (introduced here) is a random graph model for complex networks: vertices are placed in a metric space, and the link probability depends on the metric distance between vertices, and on their degree. With Jeannette Janssen we showed that the SPA model typically produces graphs that have small effective diameter, i.e. $O(\log^2 n)$, while rumour spreading is relatively slow, namely polynomial in $n$.

• Geometric bounds for the deviation of random matrices (2015 – 2016) (the paper). Let $A$ be an isotropic, random sub-gaussian $m\times n$ matrix. The Gaussian complexity of a set $T\subseteq \mathbb{R}^n$ is defined as $\mathbb E \sup_{x \in T} |\langle g , x \rangle |$, where $g$ has i.i.d. standard normal entries. With Chris Liaw, Yaniv Plan, and Roman Vershynin, we showed that for any bounded set $T$, the deviation of $\|Ax\|_2$ around its mean is uniformly bounded by the Gaussian complexity of $T$. We also proved a local version of this theorem, which allows for unbounded sets. These theorems have various applications, in particular, we gave a new result regarding model selection in the constrained linear model.

• Diameters of real-world network models (2013 – 2014) (paper 1, paper 2)
1. I have developed a new technique for proving logarithmic upper bounds for the diameters of evolving random graph models, which is based on defining a coupling between random graphs and variants of random recursive trees. The advantage of the technique is two-fold: it is quite simple and provides short proofs, and it is applicable to a broad variety of models, including those incorporating preferential attachment. Using this I proved, for the first time, logarithmic upper bounds for the diameters of the following well known models: the forest fire model, the copying model, the PageRank-based selection model, the Aiello-Chung-Lu models, the generalized linear preference model, directed scale-free graphs, and the Cooper-Frieze model (the paper).

2. The random-surfer Webgraph model is defined as follows. Let $d$ be a positive integer and let $p \in (0,1]$. Generate a random directed rooted $n$-vertex multigraph, with all vertices having out-degree $d$, as follows. Start with a single vertex $v_0$, the root, with $d$ self-loops. At each subsequent step $s=1,2,\dots,n-1$, a new vertex $v_s$ appears and $d$ new edges are created from it to some of the present vertices, by doing the following probabilistic procedure $d$ times, independently: choose a vertex $u$ uniformly at random from $\{v_0,v_1,\dots,v_{s-1}\}$, and a fresh geometric random variable $X$ with parameter $p$; perform a simple random walk of length $X$ starting from $u$, and join $v_s$ to the last vertex of the walk. This model is equivalent to the PageRank-based selection model. With Nick Wormald, we proved the diameter of both models is at most $8 e^p \log n / p$. This is the first logarithmic upper bound for these models, and verifies that they are small-world. In the special case $d=1$, we obtain a directed tree with some self-loops at the root. We proved sharp results for the underlying undirected tree. When $p \geq 0.21$ we determined the asymptotic value of its height and diameter, and when $p < 0.21$, we provided logarithmic lower and upper bounds (the paper).

• Properties of Random Apollonian Networks (2012 – 2014) (paper 1, paper 2)
1. Start with a triangle. In each step, choose a random face (not the unbounded face), add a point in that face and join it to the vertices on the face. After $n$ steps, you will have a (random) triangulated plane graph with $n+3$ vertices. With Ehsan Ebrahimzadeh, Linda Farczadi, Jane Gao, Cristiane Sato, Nick Wormald, and Jonathan Zung, we showed that asymptotically almost surely (a.a.s.) as $n \rightarrow \infty$, every path in a RAN has length $o(n)$, refuting a conjecture of Frieze and Tsourakakis. We also showed that a RAN always has a path of length $(2n-5)^{\log(3)/\log(2)}$, and that the expected length of its longest path is $\Omega\left(n^{0.88}\right)$. Finally, we proved that a.a.s. the diameter of a RAN is asymptotic to $c \ln n$, where $c \approx 1.668$ is the solution of an explicit equation (the paper).

2. With Andrea Collevecchio and Nick Wormald, we proved there exists a constant $a<1$ such that a.a.s. every path in a RAN has length smaller than $n^a$, verifying a conjecture of Cooper and Frieze. Using a similar technique, we showed that if $r < d$ are fixed constants, then a.a.s. every $r$-ary subtree of a random $d$-ary recursive tree on $n$ vertices has less than $n^b$ vertices, for some $b=b(d,r)<1$ (the paper).

• Unique-Path Labellings or Good Edge-Labellings (2011 – 2013) (paper 1, paper 2)
1. A unique-path labelling (a.k.a. good edge-labelling) of a simple graph is a labelling of its edges with real numbers such that, for any ordered pair of vertices $(u,v)$, there is at most one nondecreasing path from $u$ to $v$. Say a graph is good if it admits a good edge-labelling, and is bad otherwise. Bode, Farzad, and Theis conjectured that all graphs with large enough girth are good (their paper). I proved that if the maximum degree of a graph is within a constant factor of its average degree, and the graph is good, then it has at most $O\left(n^{1+o(1)}\right)$ edges, disproving their conjecture (the paper).

2. Araújo, Cohen, Giroire, and Havet observed that the maximum number of edges of a good graph on $n$ vertices is $\Omega(n \log(n))$ and $O(n\sqrt{n})$, and asked about more precise bounds (their paper). With Dieter Mitsche and Pawel Pralat, we solved this problem by proving that any good $n$-vertex graph has at most $n \log_2 (n)$ edges and for every $n$ there exists a good $n$-vertex graph with $n \log_2 (n) - O(n)$ edges. (the paper)

• Stretch Factor of a Randomly Embedded Random Graph (2011 – 2012) (the paper). Consider a random graph $G(n,p)$ whose vertex set has been randomly embedded in the unit square and whose edges are given weight equal to the geometric distance between their end vertices. Then each pair $u,v$ of vertices have a distance in the weighted graph, and a Euclidean distance. The stretch factor of the embedded graph is defined as the maximum ratio of these two distances over all pairs $u,v$. With Nick Wormald, we gave upper and lower bounds on the stretch factor (holding asymptotically almost surely), and showed that for $p$ bounded away from 0 and 1, these bounds are best possible in a certain sense. Our results imply that the stretch factor is bounded with probability tending to $1$ if and only if $n(1-p)\rightarrow 0$, answering a question of Joseph O'Rourke.

• DAG Partitioning Problem (2010 – 2012) (the paper). Consider the following DAG Partitioning problem: given a directed acyclic graph with arc weights, delete a set of arcs of minimum total weight so that each of the resulting connected components has exactly one sink. With Soroush Alamdari, we proved that the problem is hard to approximate, namely, there is no $n^{1-\varepsilon}$-approximation algorithm, for any fixed $\varepsilon>0$. We also gave a polynomial time algorithm for solving the DAG Partitioning problem in graphs with bounded pathwidth.

• Network Creation Games (2010 – 2011) (the conference paper, the full version). Consider a network creation game, in which each player (vertex) has a fixed budget to establish links to other players; each link has unit price and each agent tries to minimize its cost, which is either its local diameter or its total distance to other players in the (undirected) underlying graph of the created network. This is similar to a directed version of this basic network creation game. With Shayan Ehsani, MohammadAmin Fazli, Sina Sadeghian Sadeghabad, MohammadAli Safari, Morteza Saghafian and Saber Shokatfadaee, we proved some bounds for price of anarchy of this game, and in particular, we showed a nonmonotone property of this game, (similar to the Braess' paradox in network routing games).

• The Cops and Robber Game (2009 – 2012) (paper 1, paper 2.1, paper 2.2, paper 3.1, paper 3.2, paper 3.3)
1. My BSc thesis was on the Cops and Robber game, a round-robin vertex-to-vertex pursuit game played on a graph. In my thesis, written in Persian in 2009 under the supervision of MohammadAli Safari, I surveyed some results on this game. I also proved a few results about the capture time of grids (the capture time of a graph, introduced here, is the minimum number of steps needed to capture the robber in that graph) (the paper).

2. In my Master’s I continued working on this game. I specially focused on the variant where the robber is faster than the cops (introduced here). With Noga Alon, we proved that if the robber has speed $s$, then the cop number of a connected $n$-vertex graph can be as large as $\Omega\left(n^{s/(s+1)}\right)$ (the paper). I had previously proved this result for $s=2, 4$, and had conjectured that this bound is asymptotically tight, i.e., that $O\left(n^{s/(s+1)}\right)$ cops can always capture the robber (the paper). This generalizes Meyniel's well known conjecture (see this survey), which states the same thing for $s=1$.

3. Next, I worked on the variant in which the robber has infinite speed. With Noga Alon, we showed that the cop number and treewidth of planar graphs are of the same order, and determined the cop number of random graphs (the paper). Then I characterized graphs with cop number one and gave an algorithm for their detection, and proved some bounds for the cop number of expander graphs (the paper). Finally, I showed that any interval graph has cop number $O(\sqrt n)$ but there are chordal graphs with cop number $\Omega(n / \log n)$ (the paper).

• Flows in Graphs (2006 – 2008) (paper 1, paper 2). An unoriented flow in a graph (introduced here) is an assignment of real numbers to the edges, such that the sum of the values of all edges incident with each vertex is zero. This is equivalent to a flow in a bidirected graph all of whose edges are extraverted. A nowhere-zero unoriented $k$-flow is an unoriented flow with values from the set $\{\pm 1,... ,\pm (k-1)\}$. Together with Saieed Akbari, Aliakbar Daemi, Omid Hatami and Adel Javanmard, we worked on the problem of finding the minimum $k$ such that a graph has an unoriented $k$-flow. We proved several results for regular graphs (the paper) and hamiltonian graphs (the paper).

• Pacman Intelligent Controller (2007 – 2008) (the technical report). I worked along with Arian Khosravi and Ali Dehghan, under the supervision of Ramin Halavati, to implement a software controller for Ms. Pac-Man game, a simple real-time game. We played, discussed and implemented a variety of approaches, and finally came up with a heuristic-based program, which became first in the IEEE Congress on Evolutionary Computation (CEC 2007) competition and was the best known agent for the game for two years (the technical report).

• Complexity of Graph Homomorphisms (2007). I studied cylindrical construction, a concept introduced by Amir Daneshgar in order to prove hardness results for Graph Homomorphism Problem. I was able to move a few small steps towards the Bang-Jensen and Hell's conjecture about the dichotomy of digraph homomorphism problem, but meanwhile the whole conjecture was proved in this paper.

Back to my homepage