In this short note, we discuss applications of some concepts of Compressed Sensing (sparsity, compressibility)
in the recent work on invertibility of random matrices
due to Rudelson and the author.
We sketch an argument leading to the optimal bound N^{-1/2} on the median
of the smallest singular value of an N × N matrix with random independent
entries. We highlight the parts of the argument where sparsity ideas played a key role.

In this short note, we respond to some concerns raised by Y. Censor, G. Herman, and M. Jiang about the randomized Kaczmarz method, which we proposed earlier. Basically, we emphasize that the convergence rate of this algorithm does depend on a condition number of the system of linear equations (which in turn depends on the normalization of rows). For example, multiplying a single equation through by a constant obviously does not change the solution of the system, but it can change its condition number (and, as a result, the convergence rate of the randomized Kaczmarz algorithm).
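For a concrete illustration, here is a minimal numpy sketch (not part of either note): the quantity below, ||A||_F ||A^{-1}||, is the scaled condition number that controls the expected convergence rate in our analysis, and rescaling a single row changes it dramatically.

```python
import numpy as np

def scaled_condition_number(A):
    """kappa(A) = ||A||_F * ||A^{-1}||, the quantity controlling the
    expected convergence rate of the randomized Kaczmarz method."""
    s_min = np.linalg.svd(A, compute_uv=False)[-1]   # smallest singular value
    return np.linalg.norm(A, "fro") / s_min

rng = np.random.default_rng(0)
A = rng.standard_normal((500, 50))
print(scaled_condition_number(A))   # baseline

B = A.copy()
B[0] *= 1000                        # rescale a single equation
print(scaled_condition_number(B))   # much larger, hence slower convergence
```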

This point was probably not made very clear in our original paper. To clarify it, JFAA decided to publish the concerns of Y. Censor, G. Herman, and M. Jiang along with our response, as two notes in the same journal issue.

This paper is an attempt to understand random matrices whose entries are not
independent, but which can be *factorized* through random matrices with independent entries.
In language suited to statistical applications,
we are interested in sample covariance matrices of a wide class of random vectors --
linear transformations of vectors with independent entries.

Here we study the spectral norm of such matrices.
Suppose a matrix M can be factored as M=BA,
where A is a random matrix with independent mean zero entries and B is a fixed matrix.
Under a (4+epsilon)-th moment assumption on the entries of A, we show that
the spectral norm of such an m × n matrix M is at most of order
m^{1/2} + n^{1/2}, which is sharp.
In other words, in regard to the spectral norm, products of random and deterministic
matrices behave *similarly to random matrices with independent entries*.
Note that this bound is independent of the intermediate dimension through which
M factors (i.e. the number of rows of A and the number of columns of B).

The main motivation for proving this result was an application in the
recent work of M. Rudelson and the author
on the smallest singular value of random rectangular matrices.
We showed there that the smallest singular value of a random
m × n matrix with i.i.d. mean zero entries with unit variance
is bounded below by m^{1/2} - (n-1)^{1/2} with high probability.
This was proved under the assumption that the entries have subgaussian tails.
We left it open in that work whether the same holds under some finite moment
assumption. Using the main result of the present paper, this bound automatically
extends to the case when entries have bounded (4+epsilon)-th moment.

It would be interesting to know if the main result of this paper can be
stated as an "isoperimetric" result. Namely, we may ask -- what matrices B maximize
the expected spectral norm of M=BA, subject to natural constraints?
A natural constraint, for example, is that B is an orthogonal projection
onto R^{m}. Is it true then that the coordinate projections B are the
maximizers? If so, the result of this paper would quickly follow from
known estimates by
Y. Seginer
and R. Latala.

In the absence of the isoperimetric picture, we prove our main result as follows.
Let us assume m=n for simplicity.
The columns of M are independent random vectors in R^{n} (even though
the rows of M are not necessarily independent).
So we can write M^{*}M as the sum of independent rank one linear
operators X_{j} ⊗ X_{j} (the tensor products of the columns of M with themselves).
A sharp estimate for sums of independent random operators is given by
a well-known theorem of
M. Rudelson.
This approach leads to the bound
O(n log n) on the spectral norm, i.e. we are off by a logarithmic factor
from the correct answer.
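In symbols, writing X_1, ..., X_n for the columns of M, the decomposition reads

```latex
M^{*} M = \sum_{j=1}^{n} X_j \otimes X_j ,
\qquad \text{so} \qquad
\|M\|^{2} = \|M^{*} M\| = \Big\| \sum_{j=1}^{n} X_j \otimes X_j \Big\| ,
```

a sum of independent rank-one operators, to which Rudelson's theorem applies.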

The loss of a logarithmic factor is unfortunately inherent in the use of M. Rudelson's theorem,
where the logarithmic term is needed in full generality.
It would be interesting to understand the situations where the logarithmic
term can be removed from M. Rudelson's theorem.
So far, only one such situation is known from the
work of G. Aubrun --
when the independent random vectors X_{j} are uniformly distributed in an unconditional
convex body.

In the absence of a suitable lossless form of M. Rudelson's theorem, the next
stage of the argument generally resembles
R. Latala's approach,
where one first reduces the problem to Gaussian matrices A (by a standard symmetrization),
and then uses a fairly standard concentration of measure technique in the
Gauss space, coupled with delicate constructions of nets.
An interesting feature of this part of the argument is the use of a technique one
might call the *entropy-concentration tradeoff*. This method has roots in
some earlier arguments, including the one by R. Latala, and it has been developed in
the work of
M. Rudelson and the author
on invertibility of random matrices.

To illustrate the entropy-concentration tradeoff, recall that we need to bound
the spectral norm ||M|| = sup_{x}||Mx||, where the supremum is taken over
all vectors x in the unit Euclidean sphere S^{n-1}. The well known
"epsilon-net" method proceeds by first fixing x and bounding ||Mx|| with high
probability (using standard concentration inequalities),
then finishes by taking the unit bound over a sufficiently fine
epsion-net of the unit sphere.
Unfortunately, a single probability bound for every fixed vector x
is not strong enough to make this method work.
*Sparse vectors* -- those which have few but large nonzero coordinates --
produce worse concentration bounds than *spread vectors*,
which have many but small nonzero coordinates.
(The reason is that sparse vectors produce worse Lipschitz constants.)
What helps us is that there are fewer sparse vectors on the sphere than there are spread vectors.
This leads to a tradeoff between concentration and entropy, i.e. between
the probability with which ||Mx|| is nicely bounded, and the size of an epsilon-net for the
vectors x which achieve this probability bound.
One then divides the unit sphere into classes of vectors according
to their ``sparsity'', and uses the entropy-concentration tradeoff for
each class separately.
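For reference, here is the generic form of the net argument being refined (with the usual volumetric bound on the net; the constants are the standard generic ones, not the ones used in the paper). If \mathcal{N} is a (1/2)-net of the sphere S^{n-1}, then

```latex
\|M\| \le 2 \max_{x \in \mathcal{N}} \|Mx\|,
\qquad \text{hence} \qquad
\mathbb{P}\big( \|M\| > 2t \big)
\le |\mathcal{N}| \cdot \max_{x \in \mathcal{N}} \mathbb{P}\big( \|Mx\| > t \big),
\qquad |\mathcal{N}| \le 5^{n}.
```

The entropy-concentration tradeoff enters through the entropy term |\mathcal{N}|: the vectors supported on at most k coordinates admit a much smaller net, of size roughly \binom{n}{k} 5^{k}, which compensates for their worse concentration.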

Here we settle the other direction of the conjectural bound on the least
singular value of a random square matrix. Let A be a matrix whose
entries are real i.i.d. centered random variables with unit variance
and suitable moment assumptions. Then the smallest singular value of A
is of order n^{-1/2} with high probability. Together with Mark
Rudelson, we proved the lower estimate of this
type; in this note we establish the matching upper estimate.

Here we further develop the invertibility method from our
first paper on the topic.
We consider a *rectangular* random matrix (N × n),
and we show that with high probability its smallest singular value is at least
of order N^{1/2} - (n-1)^{1/2}.
Known results of random matrix theory show that *in the limit*,
as the dimension n increases to infinity while
the aspect ratio n/N converges to a constant, the smallest singular
value converges almost surely to the quantity above.
In contrast, our estimate holds *for all fixed dimensions*
(up to a constant factor), i.e. without taking the limit, and with very high
probability. Estimates in fixed dimensions are crucial for various
problems in geometric functional analysis, numerical analysis and other
fields.

Our estimate above bridges the known results for tall matrices and
square matrices. For square matrices, our estimate is of order n^{-1/2},
which is our result from the first paper.
For tall matrices, where n is any fixed proportion of N, our estimate is of
order N^{1/2}, which yields the result of
Litvak, Pajor, Rudelson and Tomczak-Jaegermann, now with the
*optimal dependence on the proportion*.

Our argument is a development of our method from the first paper for square matrices. However, dealing with rectangular matrices is in several ways considerably harder. Several new tools are developed in this paper, which are of independent interest.

One new key ingredient is a *small ball probability* bound
for sums of independent random vectors in R^{d}.
This is the probability that such a sum falls into a given small
Euclidean ball in R^{d}. Our method is a development of the
Littlewood-Offord theory, which connects the small ball probabilities
to the additive structure of the sum. The less additive structure, the
smaller is the small ball probability. We include a version of a new
and simple argument due to Sodin and Friedland, which allows us to avoid
any use of the Halász regularity lemma.

We use the small ball probability estimate to prove an optimal bound
for the distance problem:
*how close is a random vector X to an independent random subspace H*?
We show that the distance is at least of order m^{1/2} with high
probability, where m is the codimension of H. To prove this, we first
show the intuitive (albeit nontrivial) fact that
*random subspaces have no additive structure*.
As stated above, this fact makes our small ball probability
estimates powerful enough.

Finally, we use the distance estimate to bound below the smallest
singular value of a random matrix A. Our random vector X is a column of
A, and our random subspace H is the span of the other columns. The
simple rank argument shows that the smallest singular value is zero if
and only if X lies in H for some column. A simple quantitative version
of this argument is that a lower estimate on the smallest singular
value yields a lower bound on the distance dist(X,H). We show how to
*reverse the rank argument*, and thus
derive bounds for the smallest singular value from the distance problem.

This is our second paper with my student
Deanna Needell,
where we continue to study alternatives to linear programming methods
in Compressed Sensing.
In our first paper, we developed a greedy algorithm
called ROMP, which can perfectly recover any sparse signal
from a small number of linear non-adaptive measurements.
In the present paper, we prove *stability* of ROMP.

ROMP seems to be the first greedy algorithm in
Compressed Sensing that is provably stable in the sense that the
recovery error is proportional to the level of the noise. In
particular, the recovery is exact in the absence of noise.
Also, one can use ROMP to accurately recover
*arbitrary* signals, not necessarily sparse.
This kind of stability was known for the linear programming methods
from the work of Candes,
Romberg and Tao, but has been previously unknown for any greedy
method (like OMP).

This is the first paper we wrote with my student Deanna Needell. There we settle a basic problem in the area of Compressed Sensing: how to construct a greedy algorithm for sparse recovery with uniform guarantees.

Compressed Sensing is focused on the *sparse recovery problem*:
how to reconstruct a vector X that has very few non-zero coordinates
from few linear measurements of X. Such measurements can be described
as a measurement vector AX, which is the product of some measurement matrix A by X.
There are two major algorithmic approaches to the
sparse recovery problem: methods based on Linear Programming (a.k.a.
*L1-minimization*, Basis Pursuit) and greedy iterative methods (for example,
*Orthogonal Matching Pursuit*, Thresholding Algorithms).
Each of the two approaches has its own advantages and disadvantages.

The L1-minimization method has the strongest known guarantees.
Once the measurement matrix A satisfies an abstract form of the
*Uncertainty Principle* (in the form of the Restricted Isometry Condition),
a
result of Candes and Tao
states that L1-minimization yields exact recovery.
Specifically, X is the unique vector of minimal L1-norm among all vectors
with the same measurements AX.
The L1-minimization can be described as a linear program, for which
an abundance of solvers is available.
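As an illustration, here is a minimal sketch of this linear program (assuming numpy and scipy, which are not discussed in the paper), using the standard reformulation x = p - q with p, q >= 0:

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, y):
    """Solve min ||x||_1 subject to A x = y as a linear program.

    Standard reformulation: write x = p - q with p, q >= 0 and
    minimize sum(p) + sum(q) subject to A (p - q) = y."""
    m, n = A.shape
    c = np.ones(2 * n)                       # objective: sum(p) + sum(q)
    A_eq = np.hstack([A, -A])                # A p - A q = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    return res.x[:n] - res.x[n:]

# tiny demo: recover a 3-sparse vector from 40 Gaussian measurements
rng = np.random.default_rng(1)
n, m = 100, 40
x = np.zeros(n)
x[[5, 17, 42]] = [1.0, -2.0, 0.5]
A = rng.standard_normal((m, n)) / np.sqrt(m)
print(np.linalg.norm(basis_pursuit(A, A @ x) - x))   # close to 0 w.h.p.
```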

An alternative approach is using *greedy iterative algorithms*.
The measurement vector AX is a linear combination of very few columns
of A (because X has few nonzero coefficients), and we need to know
*which* columns.
It would then be natural to select the columns that are most correlated
with AX (i.e. whose inner products with AX are biggest in magnitude).
However, the set selected this way will not in general recover the
nonzeros of X exactly, even for very nice measurement matrices A.
Instead, one could select just the one column of A most correlated with AX,
subtract its contribution from AX (making the residual orthogonal to that column),
and iterate. This simple algorithm is
called the Orthogonal Matching Pursuit (OMP), see the
paper by Tropp and Gilbert.
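For concreteness, here is a minimal sketch of OMP (assuming numpy; s denotes the sparsity level, and the details follow the generic description above rather than any particular paper):

```python
import numpy as np

def omp(A, y, s):
    """Orthogonal Matching Pursuit: recover an s-sparse x from y = A x.

    At each step, pick the column of A most correlated with the current
    residual, re-solve least squares over all selected columns (which
    makes the residual orthogonal to them), and iterate."""
    support = []
    residual = y
    for _ in range(s):
        correlations = A.T @ residual
        support.append(int(np.argmax(np.abs(correlations))))
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x_hat = np.zeros(A.shape[1])
    x_hat[support] = coef
    return x_hat
```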

The advantage of OMP is its transparency, ease of implementation, and great speed (OMP is currently considerably faster than L1-minimization). The disadvantage of OMP is its weaker guarantees. It is actually known that OMP does not have uniform guarantees for natural random matrices (i.e. guarantees that, with high probability, one measurement matrix correctly recovers all sparse vectors). Furthermore, no analysis at all is known for some important classes of measurements, such as random Fourier measurements (where the measurements are random frequencies of the signal X).

Our paper essentially settles these concerns about OMP.
We prove that *a simple regularized version of OMP (ROMP) has essentially
the same guarantees as L1-minimization*. Once the measurement
matrix A satisfies the Uncertainty Principle (a.k.a. Restricted Isometry Condition),
then ROMP recovers every sparse vector X from its measurements AX exactly.
ROMP is therefore the first greedy algorithm for sparse recovery with uniform
guarantees.

ROMP is based on the following idea. To recover the signal X from its
measurements Y = AX, we can use the "observation vector" U = A*Y as a
good *local approximation* to X.
Indeed, U encodes correlations of Y with the columns of A. By the
Uniform Uncertainty Principle, every n columns form approximately an
orthonormal system. Therefore, every n coordinates of U look like
correlations of Y with the orthonormal basis and therefore are close to
the coefficients of X.

This suggests making use of the n biggest coordinates of the observation
vector U, rather than the single biggest coordinate as OMP does. ROMP thus
selects the n biggest coordinates, and performs *regularization*
by further selecting only the coordinates of comparable sizes
(to ensure that the information is spread evenly among them).
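A sketch of one selection step, under the assumption that a sweep over dyadic magnitude bands implements the regularization up to constants (here s denotes the sparsity level, written as n above):

```python
import numpy as np

def romp_select(u, s):
    """One ROMP selection step (a sketch; s is the sparsity level).

    Take the s largest coordinates of the observation vector u in
    magnitude, then 'regularize': keep a band of coordinates of
    comparable size (within a factor of 2 of each other) with maximal
    l2 energy. A sweep over dyadic magnitude bands finds such a band
    up to constants."""
    J = np.argsort(np.abs(u))[-s:]          # indices of the s biggest |u_i|
    mags = np.abs(u[J])
    J, mags = J[mags > 0], mags[mags > 0]   # drop exact zeros
    best, best_energy = J, -1.0
    hi = mags.max()
    while hi >= mags.min():
        mask = (mags > hi / 2) & (mags <= hi)   # band of comparable sizes
        energy = float(np.sum(mags[mask] ** 2))
        if mask.any() and energy > best_energy:
            best, best_energy = J[mask], energy
        hi /= 2
    return best
```

The full algorithm then adds the selected coordinates to the running support, solves a least squares problem over it, and iterates on the residual, as in OMP.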

**Added May 4, 2009:** Stephane Mallat brought to our attention a reference that
we unfortunately omitted in the published version of our paper. It is a reference to his work with
Zhifeng Zhang in 1993-94, where they first introduced matching pursuits and specifically
the OMP. The published version of their work introduces and discusses matching pursuits (among
other things), while Zhifeng Zhang's 1993 thesis introduced the OMP.

Here we prove two basic conjectures on the invertibility of random
matrices. One of them has been a prediction of von Neumann and his
associates, who speculated that
*the condition number
of a random n by n matrix with i.i.d. entries should be linear in n*
(with high probability). The condition number is the ratio of the
largest to the smallest singular values of the matrix, or equivalently
the product of the norms of the matrix and its inverse.

The largest singular value of random matrices is well understood -- it
is of order n^{1/2} with high probability, and it converges to a
Tracy-Widom distribution as n increases. The hard part is the smallest
singular value, conjectured to be of order n^{-1/2}. This was only
known for Gaussian matrices
(Smale's
problem solved by Edelman in 1988).
Rudelson, and Tao and Vu,
proved weaker estimates in some more general cases. The full conjecture is
proved in the present paper. Thus
*the norms of a random matrix and of its inverse are both of order n^{1/2} with
high probability*. This confirms von Neumann's prediction in full
generality.

The next question is: what is this high probability? For random
Bernoulli matrices (whose entries are 1 and -1 with probability 1/2),
Spielman and Teng conjectured at ICM 2002
the following asymptotics for the
smallest singular value normalized (i.e. divided) by its mean n^{-1/2}.
The probability that it is smaller than t should be linear in t for all
t > c^{n} (where 0 < c < 1 is some absolute constant). This asymptotic behavior must break
down for smaller t because Bernoulli matrices are singular with nonzero
probability. The general Spielman-Teng conjecture is proved in this
paper (for all subgaussian matrices).

One immediate consequence of the general Spielman-Teng conjecture
(now a theorem) for t=0 is that
*every subgaussian random matrix is singular with probability exponentially small in n*.
For Bernoulli matrices, this was proved by Kahn, Komlos and Szemeredi
in 1995. Even the fact that this singularity probability converges to
zero as n increases is nontrivial; Komlos proved this in 1967. Erdos
conjectured the singularity probability is (1/2 + o(1))^{n}. The best
known bound (3/4 + o(1))^{n} is due to
Tao and Vu.

Our random matrix results come as a consequence of our attempts to
develop the Littlewood-Offord theory of *small ball probability*.

The classical concentration inequalities of probability theory demonstrate that nice random variables, such as sums of i.i.d. random variables, are nicely concentrated about their means. Less studied but often needed is the reverse phenomenon -- that the concentration is not too tight. The Littlewood-Offord problem asks, for i.i.d. random variables X_k and real numbers a_k, to put an upper bound on the probability P that the sum of a_k X_k lies near some number v.
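In the standard notation, the quantity to bound is the small ball probability (the Lévy concentration function) of the sum:

```latex
P = p_{\varepsilon}(a) = \sup_{v \in \mathbb{R}} \;
\mathbb{P} \Big( \Big| \sum_{k=1}^{n} a_k X_k - v \Big| \le \varepsilon \Big).
```

For instance, for random signs X_k = ±1 and coefficients a_k = 1 this probability is of order n^{-1/2}, and Erdős's classical solution of the Littlewood-Offord problem gives the same bound for arbitrary coefficients with |a_k| ≥ 1.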

Unlike the concentration inequalities, which are controlled only by the
magnitude of the coefficients a_k (e.g. the norms of the coefficient
vectors, as in Bernstein's inequality), the anti-concentration
inequalities are less stable; they are controlled by the
*arithmetic structure* of the coefficients a_k.
Tao and Vu put forth a program
to show that if the concentration
probability P is large then there is a strong additive structure in the
coefficients a_1, ... , a_n: this sequence should be embedded in a
short arithmetic progression (or a generalization thereof). This says
that the only reason for a random sum to concentrate too much is due to
(essential) cancellations, which can only be caused by a rich additive
structure of the summands.

We give an optimal result for the Littlewood-Offord problem for
arbitrary coefficients a_k
of the same order of magnitude by showing that these
*coefficients essentially lie in an arithmetic progression of length 1/P*.

This paper is written so that von Neumann's prediction can be read separately from the rest. The Littlewood-Offord theory is not needed there; it becomes crucial in the proof of the stronger Spielman-Teng conjecture.

Here we study random submatrices of large matrices. The main problem
is: how can one *learn a matrix from a random sample of its entries*?

We show how to *approximately compute any matrix from its random submatrix*
of the smallest possible size O(r log r) with a small error in the spectral norm,
where r is the *numerical rank* of the matrix.
The numerical rank is the ratio of the Frobenius (a.k.a.
Hilbert-Schmidt) norm squared to the operator (a.k.a. spectral) norm
squared. The numerical rank is thus always bounded by the usual rank,
and unlike the usual rank it is stable under perturbations. The
stability is crucial for numerical applications, where the matrix is
blurred by noise (e.g. roundoff errors), so the matrix is of full rank
but can be of low numerical rank.
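A minimal numerical illustration (assuming numpy; the matrix and noise level are made up for the example):

```python
import numpy as np

def numerical_rank(A):
    """r(A) = ||A||_F^2 / ||A||^2: squared Frobenius norm over squared
    spectral norm. Always at most the usual rank."""
    return np.linalg.norm(A, "fro") ** 2 / np.linalg.norm(A, 2) ** 2

rng = np.random.default_rng(2)
# a rank-5 matrix blurred by noise: full usual rank, low numerical rank
A = rng.standard_normal((300, 5)) @ rng.standard_normal((5, 300))
A += 1e-6 * rng.standard_normal((300, 300))
print(np.linalg.matrix_rank(A))   # 300: the usual rank sees the noise
print(numerical_rank(A))          # small (at most about 5): stable
```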

Our approximation result gives an asymptotically optimal guarantee for
an algorithm to *compute low-rank approximations* of A.
We also give asymptotically optimal estimates on the
*spectral norm and the cut-norm of random submatrices* of A.
The result for the cut-norm yields a slight
improvement on the best known sample complexity for an approximation
algorithm for MAX-2CSP problems.

We use methods of Probability in Banach spaces, in particular the law of large numbers for operator-valued random variables. This project started in the summer of 2002 and has undergone several major transformations since then.

Here we touch a computational aspect of the basic problem in asymptotic
convex geometry: *what matrices realize Euclidean projections of the cube*?
A sufficient condition for such matrices turns out to be some abstract
form of the *Uncertainty Principle* (UP).
This Uncertainty Principle was recently set
forth by Candes and Tao in the context of the fast
developing applied area of
Compressed Sensing.
For orthogonal matrices, UP says that all
submatrices have norm slightly smaller than 1. For matrices that
realize the (discrete) Fourier transform, this leads to variants of the
classical Uncertainty Principle in harmonic analysis. Gluskin
[unpublished] and Talagrand [Selecting a proportion of
characters] observed that some form of UP implies Euclidean
sections of L_1 (thus Euclidean sections of the cube).

We use this idea to observe a new connection
between the Uncertainty Principle and the vector
quantization theory. We show that for frames in N dimensions that
satisfy the Uncertainty Principle, one can quickly convert every frame
representation into a more regular Kashin's representation,
whose coefficients all have the same magnitude N^{-1/2}.
Information tends to spread evenly among these
coefficients. As a consequence, Kashin's representations have a great
power for reduction of errors in their coefficients. In particular, scalar
quantization of Kashin's representations yields robust vector quantizers.

This project started in the Fall of 2003 and has undergone several major transformations since then.

The Kaczmarz method for solving linear systems of equations Ax=b is a simple iterative algorithm with many applications ranging from computed tomography to digital signal processing, but whose performance is poorly understood. The algorithm starts with an arbitrary guess x_0 of the solution, and iteratively projects the running approximation x_k onto the solution spaces of the equations, in cyclic order. Despite the simplicity and popularity of this method, there are no useful estimates for its rate of convergence.

It has been observed several times in the tomography literature that the Kaczmarz algorithm should become faster if one projects onto the solution spaces in a random, rather than cyclic, order. However, no estimates on the convergence of such a randomized version have been known either.

In this paper, we prove that a randomized Kaczmarz algorithm converges with expected exponential rate. This is the first solver whose rate does not depend on the number of equations in the system. The solver does not even need to know the whole system, but only its small random part. It thus outperforms all previously known methods on extremely overdetermined systems. Even for moderately overdetermined systems, numerical simulations reveal that our algorithm can converge faster than the celebrated conjugate gradient algorithm.
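Here is a minimal sketch of the algorithm (assuming numpy; following the paper, each row is selected with probability proportional to its squared norm):

```python
import numpy as np

def randomized_kaczmarz(A, b, iters=5000, seed=0):
    """Randomized Kaczmarz: at each step pick a row i with probability
    proportional to ||a_i||^2, then project the running approximation
    onto the solution hyperplane {x : <a_i, x> = b_i}."""
    rng = np.random.default_rng(seed)
    row_norms_sq = np.einsum("ij,ij->i", A, A)
    probs = row_norms_sq / row_norms_sq.sum()
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        i = rng.choice(A.shape[0], p=probs)
        x += (b[i] - A[i] @ x) / row_norms_sq[i] * A[i]
    return x

# demo on a consistent, highly overdetermined system
rng = np.random.default_rng(3)
A = rng.standard_normal((2000, 50))
x_true = rng.standard_normal(50)
print(np.linalg.norm(randomized_kaczmarz(A, A @ x_true) - x_true))
```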

Here we introduce a new effective phase-I in linear programming, and significantly improve the smoothed analysis of the simplex method.

There exist algorithms which perform nicely in practice, but can be very slow in (specifically designed) worst cases. The celebrated simplex method is one of them. In practice, the simplex method is observed to solve linear programs in time linear in the dimension, but there exist linear programs for which it takes exponential time (Klee-Minty counterexample).

To explain the discrepancy between the empirical evidence and theoretical analysis, Spielman and Teng introduced the concept of smoothed analysis of algorithms, where we measure the expected running time of an algorithm under slight random perturbations of arbitrary inputs. The idea is that the worst cases should be so few and isolated that one can rule them out by slightly perturbing the input.

Spielman and Teng proved that the (shadow-vertex) simplex method had polynomial smoothed complexity. On a slight random perturbation of arbitrary linear program, the simplex method finds the solution after a walk on the vertices of the feasible polytope with expected length polynomial in the number of constraints n, the number of variables d and the inverse standard deviation of the perturbation 1/sigma.

We show that the length of the walk in the simplex method is actually
*polylogarithmic in the number of constraints n*.
Spielman-Teng's
bound on the walk was O^{*}(n^{86} d^{55} sigma^{-30}),
where O^{*} hides logarithmic factors. We improve this to O^{*}(d^{9} sigma^{-4}).
This shows that the
tight conjectural bound n-d on the length of a walk on polytopes (the Hirsch
Conjecture) is not a limitation for smoothed Linear Programming.
Random perturbations create short paths between vertices.

One ingredient of independent interest that is developed here is
*a new phase-I for solving arbitrary linear programs*.
Instead of finding a vertex of a feasible set, we add
a vertex at random to the feasible set. This does not affect the
solution of the linear program with constant probability. So, in
expectation it takes a constant number of independent trials until a
correct solution is found. This overcomes one of the major difficulties
of smoothed analysis of the simplex method -- one can now statistically
decouple the walk from the smoothed linear program. This yields a much
better reduction of the smoothed complexity to a geometric quantity --
*the size of planar sections of random polytopes*.
We also improve upon the known estimates for that size.

Compressible signals are pervasive in signal processing applications.
The essential feature of a compressible signal is that it can be
approximated well by a sparse signal. It has recently been observed
that it is possible to collect the essential information from a
compressible signal by projecting it onto a low-dimensional random
subspace. This idea is called *sketching*
in computer science and *compressed sensing* in signal processing.

*Geometric functional analysis*
has studied non-algorithmic aspects of projections onto random
low-dimensional subspaces since the 1980s. Many tools are now available to
check if such a projection is an almost isometry on a class of
functions (a set of vectors in R^{d}). None of these, however,
suggest *reconstruction algorithms*.
Suppose we know that a certain projection is an almost isometry on a
certain set in R^{d}.
*Is it possible to reconstruct points in the set from their projections
in polynomial time?*

Much progress on this problem was made in 2004 by
Candes-Tao,
Donoho
and others, including
Rudelson-Vershynin.
See our joint FOCS presentation
Candes-Tao-Rudelson-Vershynin;
also papers by
Muthukrishnan
and the Compressed Sensing Page.
The reconstruction problem has been solved via *linear programming*,
this beautiful idea going back to Donoho and his collaborators. Due to the development
of interior point methods in the 1980s, the complexity of the linear
program is polynomial in d. However, this might be too slow in
applications, especially when d is very large
(in streaming algorithms,
d=2^{32} is not uncommon).

This paper develops the first "superfast" reconstruction algorithm whose complexity is polylogarithmic, rather than polynomial, in dimension d (with one specifically designed projection that works for all vectors).

This is a journal version of the conference paper below. We added an
(exponential) probability estimate for our result on random Fourier
measurements, and we included a more accurate calculation of the
constants for Gaussian measurements. Recently, Donoho and Tanner have
been able to compute *precise* asymptotic behavior for these constants.

Here we try to understand how to
*compute a sparse signal from few frequencies*, and we prove the best
known estimate on the number of frequencies from which one can compute the
signal f using linear programming.

So we want to compute a signal f, viewed as a vector in R^n with support of size r, from k = k(r,n) frequencies of f -- the values of its discrete Fourier transform. Simple linear algebra shows that k = 2r frequencies suffice. But is there a formula or an effective (polynomial time) algorithm to compute f from these frequencies?

Donoho and his
collaborators advocated that one should be able to "convexify" the
reconstruction problem -- relax it to a convex problem with the same
solution, which can then be solved using methods of linear programming.
When exactly the convex relaxation is equivalent to the original
problem is an open question.
Candes and Tao found a
sufficient condition for this equivalence -- the Restricted
Isometry Condition (R.I.C.). This reduces the reconstruction problem to
verifying R.I.C. We prove the best known bound for the number of
measurements for which R.I.C. holds:
k(r,n) = O(r log(n) log^{2}(r) log(r log n)).
The optimal bound should be O(r log n).

Our arguments are based on the techniques of Geometric Functional Analysis and Probability in Banach spaces, in particular on the development of Rudelson's sampling method for random vectors in the isotropic position.

This note deals with a problem of the "probabilistic Ramsey theory". Given a linear
operator T on a Hilbert space with an orthogonal basis, we define the
*isomorphic structure* Sigma(T)
as the family of all finite subsets of the basis so that T restricted to their
span is a nice isomorphism. We give a dimension-free optimal lower
bound on the size of Sigma(T). This improves and extends in several
ways the principle of restricted invertibility due to Bourgain and
Tzafriri. With an appropriate notion of randomness, we obtain a
randomized principle of restricted invertibility.

We develop an approach through geometric functional analysis to error correcting codes and to reconstruction of signals from few linear measurements. An error correcting code encodes an n-letter word x into an m-letter word y in such a way that x can be decoded correctly when any r letters of y are corrupted. We prove that most linear orthogonal transformations Q from R^n into R^m form efficient and robust error correcting codes over the reals. The decoder (which corrects the corrupted components of y) is the metric projection onto the range of Q in the L_1 norm.

An equivalent problem arises in signal processing: how to reconstruct a signal that belongs to a small class from few linear measurements? We prove that for most sets of Gaussian measurements, all signals of small support can be exactly reconstructed by the L_1 norm minimization. This is a substantial improvement of recent results of Donoho and of Candes and Tao.

Yet one more equivalent problem in combinatorial geometry is the
existence of *neighborly polytopes*
-- polytopes with a fixed number of facets and the maximal number of
lower-dimensional faces. We prove that most sections of the cube form
such polytopes.

Large deviation estimates are by now a standard tool in Asymptotic Convex Geometry, in contrast to small deviation results. In this note we present a novel application of a small deviations inequality to a problem related to the diameters of random sections of high-dimensional convex bodies. Our results imply an unexpected distinction between the lower and the upper inclusions in Dvoretzky's theorem.

In modern communication systems such as the Internet, random losses of information can be mitigated by oversampling the source. This is equivalent to expanding the source using overcomplete systems of vectors (frames), as opposed to the traditional basis expansions.

Dependencies among the coefficients in frame expansions often allow for better performance compared to bases under random losses of coefficients. We show that for any n-dimensional frame, any source can be linearly reconstructed from only n log n randomly chosen frame coefficients, with a small error and with high probability. Thus every frame expansion withstands random losses better (for worst case sources) than the orthogonal basis expansion, for which the n log n bound is attained.

The proof reduces the problem to M. Rudelson's selection technique for random vectors in the isotropic position, which in turn is based on the non-commutative Khintchine inequality.

Existence of nicely bounded sections of two symmetric convex bodies K and L implies
that the intersection of random rotations of K and L is nicely bounded.
When L is a subspace, this main result immediately yields the following phenomenon:
*"If K has one nicely bounded section, then most sections of K are nicely bounded"*.
This "existence implies randomness" consequence was proved independently in
[Giannopoulos, Milman and Tsolomitis] and even before that, in
[Mankiewicz-Tomczak]. Our main result represents a new
connection between the local asymptotic convex geometry (study of
sections of convex bodies) and the global asymptotic convex geometry
(study of convex bodies as a whole). The method relies on the new
'isoperimetry of waists' on the sphere due to Gromov.

We find a sharp combinatorial bound for the metric entropy of sets
in R^{n} and general classes of functions.
This solves two basic combinatorial conjectures on the empirical
processes:

- A class of functions satisfies the uniform Central Limit Theorem if the square root of its combinatorial dimension is integrable.
- The uniform entropy is equivalent to the combinatorial dimension under minimal regularity.

Here a version of the classical Minkowski's theorem is proved for
integer cells rather than integer points. Namely,
*every convex set K in R ^{n} admits a coordinate projection that contains at
least vol(0.1 K) cells of the integer lattice*.

Our proof is based on an extension of the
*combinatorial density theorem*
of Sauer, Shelah and Vapnik-Chervonenkis from the Boolean cube {0,1}^{n}
to Z^{n}. This leads to a new approach to
*coordinate sections of convex bodies*.
In particular, the Volume Ratio Theorem and Milman's duality of the diameters
admit natural versions for coordinate sections.