Lecture 1: Overview
In this lecture, I give an introduction to Mazur's theorem, including a sketch of the proof. The purpose of this course is to develop these ideas in more detail. I also talk a little about some related results: Serre's uniformity theorem and Merel's uniform boundedness theorem.
Mazur's theorem
Consider the following problem:
Problem. Let $f \in \bQ[x,y]$ be a polynomial. Describe the set of points $(x,y) \in \bQ^2$ such that $f(x,y)=0$.
This can be phrased equivalently as:
Problem. Let $C/\bQ$ be an algebraic curve (connected, smooth, projective). Describe the set $C(\bQ)$.
This is an extremely fundamental problem, and cases of it have been considered for thousands of years. Much about this problem has been discovered in the last century. The first thing to mention, probably, is the fundamental trichotomy depending on the genus of $C$:

Genus 0. There are two possibilities: either $C$ has no rational points, or $C$ is isomorphic to $\bP^1$, in which case it has infinitely many rational points, and these points form a 1-parameter algebraic family.

Genus 1. If $C(\bQ)$ is nonempty then $C(\bQ)$ has the structure of a finitely generated commutative group. The hard part of this statement (the finite generation) is Mordell's theorem, from 1922. (First conjectured by Poincaré.)

Genus $\ge 2$. The set $C(\bQ)$ is finite. This is Faltings' theorem (MR0718935), proved in 1983 and first conjectured by Mordell in 1922.
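To make the genus 0 case concrete, here is a minimal Python sketch of the 1-parameter family for one particular curve, the unit circle $x^2+y^2=1$ (my choice of example, not one discussed above). Projecting from the rational point $(-1,0)$ along a line of rational slope $t$ gives a second rational point. The function name `circle_point` is hypothetical.

```python
from fractions import Fraction

def circle_point(t):
    """Second intersection of the unit circle with the line through (-1, 0) of slope t."""
    t = Fraction(t)
    denom = 1 + t * t          # never zero for rational t
    x = (1 - t * t) / denom
    y = 2 * t / denom
    return x, y

# A few members of the family; each satisfies x^2 + y^2 = 1 exactly.
for t in (Fraction(0), Fraction(1, 2), Fraction(2, 3), Fraction(5)):
    x, y = circle_point(t)
    assert x * x + y * y == 1
    print(t, "->", (x, y))
```

Every rational point on the circle other than $(-1,0)$ arises from exactly one rational slope $t$, which is the sense in which the points form a 1-parameter family.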
Given these results, one can start to ask more quantitative questions. For example:
Question. How many rational points can a genus 2 curve have?
It is conjectured that there is an absolute bound, i.e., there exists a number $N$ such that if $C/\bQ$ is any genus 2 curve then $\# C(\bQ) \le N$. This has not been proved, however. So far, the record for number of rational points seems to be 642, found by Michael Stoll in 2008. It is worth mentioning here a recent result of Manjul Bhargava (arXiv:1308.0395): most genus 2 curves have no rational points. The situation in higher genus is similar.
In genus 1, one should not simply ask "how many points" but "what is the structure of the group of points." Suppose $C$ is a genus 1 curve with a point. Then, according to Mordell's theorem, we have a decomposition $C(\bQ)=C(\bQ)_{\mathrm{tors}} \times \bZ^r$, where $C(\bQ)_{\mathrm{tors}}$ is a finite abelian group (the torsion subgroup) and $r \ge 0$ is an integer, called the rank. So, one would like to know the possibilities for $r$ and $C(\bQ)_{\mathrm{tors}}$.
Very little is known about the rank. For instance, it is unknown if it can be arbitrarily large. The current record is $r \ge 28$, found by Noam Elkies in 2006.
The situation is much better for the torsion subgroup; in fact, this is exactly what Mazur's theorem describes:
Theorem (Mazur, 1977, MR488287). $C(\bQ)_{\mathrm{tors}}$ is isomorphic to one of the following 15 groups:

$\bZ/n \bZ$ with $1 \le n \le 10$ or $n=12$.

$\bZ/2 \bZ \times \bZ/n \bZ$ with $n=2,4,6,8$.
Furthermore, each of these groups does occur.
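As a quick sanity check on the count, the following sketch lists the groups in the theorem by their cyclic factors and tallies the possible orders. The representation as tuples and the function names are my own conventions for illustration.

```python
def mazur_torsion_groups():
    """Cyclic factors of the 15 groups in Mazur's theorem, e.g. (2, 8) for Z/2 x Z/8."""
    cyclic = [(n,) for n in range(1, 11)] + [(12,)]
    non_cyclic = [(2, n) for n in (2, 4, 6, 8)]
    return cyclic + non_cyclic

def group_order(factors):
    """Order of the product of cyclic groups with the given factor sizes."""
    order = 1
    for n in factors:
        order *= n
    return order

groups = mazur_torsion_groups()
orders = sorted({group_order(g) for g in groups})
print(len(groups), "groups; possible orders:", orders)
```

In particular, the torsion subgroup never has order 11, and its order is at most 16.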
One can view this theorem as a first step to a quantitative answer to the original problem.
Overview of the proof
The goal of this course is to prove Mazur's theorem. I'll now give an overview of the proof, which I'll break into three steps. I must thank Jacob Tsimerman here, as he and I came to this organization while reading Mazur's paper together.
Step 1. A criterion for the nonexistence of $N$-torsion.
The hard part of the proof is to show that an elliptic curve over $\bQ$ cannot have an $N$-torsion point, for $N$ a prime $\gt 7$. Let $Y_1(N)$ be the set of isomorphism classes of pairs $(E, P)$ where $E/\mathbf{C}$ is an elliptic curve and $P \in E$ is a point of exact order $N$. Then $Y_1(N)$ is actually (the set of complex points of) an algebraic curve defined over the rational numbers. Furthermore, the set of rational points on $Y_1(N)$ is exactly what you'd expect: it consists of those $(E, P)$ for which $E$ is defined over $\bQ$ and $P \in E(\bQ)$. Thus Mazur's theorem essentially amounts to showing that $Y_1(N)(\bQ)$ is empty for $N \gt 7$. The proof will appeal to the dual nature of $Y_1(N)$: it can be thought of as a single geometric object, or as a set of geometric objects.
We'll need some slight variants of $Y_1(N)$. Let $Y_0(N)$ be the set of pairs $(E, G)$ where $E/\mathbf{C}$ is an elliptic curve and $G \subset E$ is a cyclic subgroup of order $N$. This is also an algebraic curve over $\bQ$. There is a natural map $Y_1(N) \to Y_0(N)$ (take the subgroup generated by the point). The curve $Y_0(N)$ is affine: it's missing two points, which are labeled 0 and $\infty$, and called the cusps. The compactified curve is denoted $X_0(N)$. We can now state the criterion:
Theorem (Theorem A). Suppose $N \gt 7$ and there exists an abelian variety $A/\bQ$ and a map of varieties $f \colon X_0(N) \to A$ (defined over $\bQ$) such that the following conditions hold:

$A$ has good reduction away from $N$.

$f(0) \ne f(\infty)$.

$A(\bQ)$ has rank 0.
Then no elliptic curve defined over $\bQ$ has a point of order $N$.
Proof. We just offer a sketch here. Suppose $E/\bQ$ is an elliptic curve which has a point of order $N$. Let $x \in X_0(N)(\bQ)$ be the resulting rational point. We first remark that $X_0(N)$ naturally extends to a scheme over $\bZ[1/N]$ (or even all of $\bZ$) and $x$ extends to a section over this base as well. By studying the reduction of $E$ mod 3, and using the fact that 3 is small compared to $N$, one finds that $E$ must have bad reduction at 3. This means that $x$ must reduce to either 0 or $\infty$ mod 3, and in fact it must be $\infty$. To see this, one must be familiar with the modular interpretations of the two cusps, which we will cover later in the course.
Now for the key step: the difference $f(x)-f(\infty)$ is an element of $A(\bZ[1/N])$ which reduces to 0 in $A(\bF_3)$. However, $f(x)-f(\infty)$ is a torsion point (since $A$ has rank 0), and the reduction map is injective on torsion. We conclude that $f(x)=f(\infty)$. It follows from this, and the assumption that $f(\infty) \ne f(0)$, that if $p$ is any prime of bad reduction for $E$ then $x$ reduces to $\infty$ mod $p$.
Now, consider $E[N]$ as a 2-dimensional representation of the absolute Galois group $G_{\bQ}$ over the finite field $\bZ/N\bZ$. Since $E$ has an $N$-torsion point, this representation contains the trivial representation $\bZ/N\bZ$ as a sub. The Weil pairing implies that the quotient is $\mu_N$. The modular interpretation of the cusp $\infty$ shows that the resulting extension is actually split at all the bad primes. A number-theoretic argument then shows that the extension is split globally, i.e., $E[N]$ is isomorphic to $\bZ/N\bZ \oplus \mu_N$. One can apply the same argument to $E/\mu_N$ to see that its $N$-torsion is split; continuing in this way, one finds that the $N$-adic Tate module of $E$ is reducible, which cannot happen. This contradiction completes the proof. ◾
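The injectivity of reduction on torsion, used in the key step above, can be seen in action in a much simpler setting. The sketch below is not part of the proof: it takes an elliptic curve $y^2=x^3+ax+b$ over $\bQ$ and uses the standard fact that, for an odd prime $p$ of good reduction, the torsion subgroup injects into $E(\bF_p)$, so its order divides $\# E(\bF_p)$. The function names are hypothetical, and the caller is assumed to supply only odd primes of good reduction.

```python
from math import gcd

def count_points(a, b, p):
    """Number of points on y^2 = x^3 + a*x + b over F_p, including the point at infinity."""
    # square_roots[r] = how many y in F_p satisfy y^2 = r
    square_roots = [0] * p
    for y in range(p):
        square_roots[y * y % p] += 1
    affine = sum(square_roots[(x**3 + a * x + b) % p] for x in range(p))
    return affine + 1

def torsion_bound(a, b, primes):
    """gcd of #E(F_p) over the given primes (assumed odd and of good reduction)."""
    bound = 0
    for p in primes:
        bound = gcd(bound, count_points(a, b, p))
    return bound

# Example: y^2 = x^3 - x, which has good reduction away from 2.
print(torsion_bound(-1, 0, [3, 5, 7]))
```

For this curve the bound is 4, and it is attained: the point at infinity together with $(0,0)$, $(1,0)$, $(-1,0)$ form a subgroup $\bZ/2\bZ \times \bZ/2\bZ$.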
Step 2. A criterion for rank 0.
To apply Theorem A, we must find the abelian variety $A$ and verify the conditions of the theorem. The hardest of these is the rank 0 condition. We now give a criterion for an abelian variety to have rank 0. This may look like a general criterion, but the hypotheses are actually very restrictive; it will apply to the case of interest, however.
Theorem (Theorem B). Let $A/\bQ$ be an abelian variety and let $N$ and $p$ be distinct prime numbers, with $N$ odd. Suppose the following conditions hold:

$A$ has good reduction away from $N$.

$A$ has completely toric reduction at $N$.

The Jordan-Hölder constituents of $A[p](\ol{\bQ})$ are 1-dimensional, and either trivial or cyclotomic.
Then $A(\bQ)$ has rank 0.
Proof. Again, just a sketch. Let $\cA/\bZ$ be the Néron model of $A$. One first shows that the group scheme $\cA[p^n]$ is built of very simple pieces: it has a filtration such that the successive quotients are each one of four very specific group schemes. Computing explicitly with these specific group schemes, one shows that the order of $\rH^1_{\rm fppf}(\mathrm{Spec}(\bZ), \cA[p^n])$ is bounded independent of $n$. This implies that the inverse limit over $n$ of these cohomology groups is finite, which completes the proof, as $A(\bQ)$ injects into the inverse limit. This proof is closely related to the proof of the Mordell-Weil theorem, which I'll talk about some. ◾
Step 3. Completion of the proof.
We now wish to prove Mazur's theorem by applying the above criteria. But first we must find the abelian variety $A$. Every curve has a Jacobian, a universal abelian variety to which it maps (given a point). Thus we are more or less forced to try to find $A$ as a quotient of the Jacobian $J_0(N)$ of $X_0(N)$.
Using the modular interpretation of $X_0(N)$, one constructs certain Hecke operators $T_p$ on $J_0(N)$. These generate a commutative ring of operators, called the Hecke algebra. We'll find $A$ by defining an explicit ideal in the Hecke algebra (closely related to the Eisenstein ideal appearing in the title of Mazur's paper), and forming the corresponding quotient of $J_0(N)$. We'll then go through each of the hypotheses in the two criteria and verify that $A$ satisfies them.
In fact, this argument will only end up working for $N \gt 13$, so auxiliary arguments are needed for $N=11,13$. For $N=11$, the result was first established in 1939 by Billing-Mahler. For $N=13$, it was established by Mazur-Tate in 1973.
Plan of the course
The course will be divided into three parts, corresponding to the three steps above (though out of order!).
Part I. Elliptic curves and abelian varieties

Theory over fields. I will give very few proofs here. I'll assume you're either familiar with this, or can do outside reading to learn it.

Group schemes. I won't assume you know much at all here, and I'll attempt to prove nearly everything we'll need.

Theory in mixed characteristic, including Néron models (though not the proof of their existence).

Jacobians.

The culmination of Part I will be the proof of Theorem B.
Part II. Moduli of elliptic curves

Modular curves, over $\mathbf{C}$, $\bQ$, and $\bZ$.

Modular forms and Hecke operators.

The Eichler-Shimura theorem, and the Galois representation attached to a modular form.

The culmination of Part II will be the proof of Theorem A.
Part III. Proof of Mazur's theorem

The Eisenstein ideal and the Eisenstein quotient of $J_0(N)$.

The special fiber at $N$ of $J_0(N)$.

Ogg's theorem on the order of $[0]-[\infty]$ in $J_0(N)$.

Application of Theorems A and B.

Auxiliary results (Mazur-Tate, etc).
Related results
To end this lecture, I'll discuss two families of results that generalize Mazur's theorem. Unfortunately, we probably won't have time in this course to discuss these results further.
Merel's theorem
You might be wondering: does a version of Mazur's theorem exist over general number fields? The answer is yes! For an integer $d \ge 1$, let $S(d)$ be the set of prime numbers $p$ for which there exists a number field $K/\bQ$ of degree $\le d$ and an elliptic curve $E/K$ such that $E$ has a $K$-point of order $p$. Mazur's theorem is that $S(1)=\{2,3,5,7\}$. Kamienny proved (in 1992, MR1172689) that $S(2)=\{2,3,5,7,11,13\}$. Mazur and Kamienny then conjectured that $S(d)$ is always finite (the Uniform Boundedness Conjecture, or UBC), which was proven by Merel:
Theorem (Merel, 1996, MR1369424). The set $S(d)$ is finite. In fact, if $d \ge 2$ and $p \in S(d)$ then $p \le d^{3d^2}$.
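To get a feel for the size of this bound, here is a small sketch that evaluates $d^{3d^2}$ and compares it with the value of $S(2)$ quoted above. Note that for $d=1$ the expression equals 1, which is smaller than the primes in $S(1)$, so the sketch only accepts $d \ge 2$. The function name `merel_bound` is hypothetical.

```python
def merel_bound(d):
    """The quantity d**(3*d*d) from Merel's theorem (only informative for d >= 2)."""
    if d < 2:
        raise ValueError("the bound is only meaningful for d >= 2")
    return d ** (3 * d * d)

S2 = {2, 3, 5, 7, 11, 13}   # Kamienny's determination of S(2), as quoted in the text
assert max(S2) <= merel_bound(2)

for d in (2, 3, 4):
    print(d, merel_bound(d))
```

Already for $d=2$ the bound is 4096, against a largest actual prime of 13, so the bound is very far from sharp.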
In 2003, Parent computed $S(3)$ and found it to be the same as $S(2)$ (MR2142238, see also MR1779891). The sets $S(4)$, $S(5)$, and $S(6)$ have also been computed, as I learned from Maarten's comment below. See his slides for more information (his website has other relevant notes and slides).
The review of Merel's paper by Darmon, linked above, contains a thorough overview of the UBC.
Serre's uniformity theorem
Let $E/\bQ$ be an elliptic curve, and let $N$ be a prime. Then $E[N](\overline{\bQ})$ is isomorphic to $(\bZ/N \bZ)^2$, and carries an action of the absolute Galois group $G_{\bQ}$. We can therefore regard it as a representation $\rho_{E,N} \colon G_{\bQ} \to \mathrm{GL}_2(\bZ/N \bZ)$. Serre proved the following:
Theorem (Serre, 1972, MR387283). Assume $E$ does not have complex multiplication. Then there exists a number $N_0(E)$ such that $\rho_{E,N}$ is surjective for all $N \gt N_0(E)$.
Serre posed the following question:
Question (Serre's uniformity problem). Can $N_0(E)$ be taken independent of $E$? Precisely, does there exist a number $N_0$ such that $\rho_{E,N}$ is surjective whenever $E$ is non-CM and $N \gt N_0$?
It is thought that the answer to this question is yes, and that one can even take $N_0=37$.
If $\rho_{E,N}$ is not surjective, then its image is a proper subgroup of $\mathrm{GL}_2(\bZ/N\bZ)$, and thus contained in a maximal proper subgroup. One can therefore attack Serre's question by proving, for each maximal proper subgroup $G$ of $\mathrm{GL}_2(\bZ/N\bZ)$, that the image of $\rho_{E,N}$ is not contained in $G$ (for $N$ large enough). It's not difficult to enumerate the maximal proper subgroups:

The Borel subgroup (upper-triangular matrices).

The normalizer of the split Cartan (monomial matrices).

The normalizer of the nonsplit Cartan.

Exceptional subgroups (those having projective image $A_4$, $S_4$, or $A_5$).
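For small primes one can simply count these subgroups by brute force. The sketch below does this for $\mathrm{GL}_2(\bF_p)$, the Borel, and the normalizer of the split Cartan (writing $p$ for the prime called $N$ above). The nonsplit Cartan is omitted, since describing it requires choosing a non-square. The function name and dictionary keys are my own.

```python
from itertools import product

def subgroup_orders(p):
    """Brute-force orders of GL_2(F_p), the Borel, and the normalizer of the split Cartan."""
    counts = {"GL2": 0, "borel": 0, "split_cartan_normalizer": 0}
    for a, b, c, d in product(range(p), repeat=4):
        if (a * d - b * c) % p == 0:
            continue                      # determinant zero: not invertible
        counts["GL2"] += 1
        if c == 0:
            counts["borel"] += 1          # upper-triangular
        if (b == 0 and c == 0) or (a == 0 and d == 0):
            counts["split_cartan_normalizer"] += 1   # monomial: diagonal or anti-diagonal
    return counts

for p in (3, 5, 7):
    counts = subgroup_orders(p)
    # Compare with the standard closed formulas.
    assert counts["GL2"] == (p * p - 1) * (p * p - p)
    assert counts["borel"] == p * (p - 1) ** 2
    assert counts["split_cartan_normalizer"] == 2 * (p - 1) ** 2
    print(p, counts)
```

The counts agree with the formulas $(p^2-1)(p^2-p)$, $p(p-1)^2$, and $2(p-1)^2$, so the Borel has index $p+1$ in $\mathrm{GL}_2(\bF_p)$.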
Serre himself dealt with the exceptional case: the image of $\rho_{E,N}$ is not contained in an exceptional subgroup if $N \gt 7$ (check this!). (In this and what follows, $E$ is non-CM.)
Mazur's theorem that we have been discussing above is close to handling the Borel, but doesn't quite: it shows that the image of $\rho_{E,N}$ is not contained in the group of matrices of the form $\left( \begin{array}{cc} 1 & \ast \\ 0 & \ast \end{array} \right)$ for $N \gt 7$. However, Mazur extended his results (MR482230) and handled the Borel case: he showed that the image of $\rho_{E,N}$ is not contained in a Borel for $N \gt 37$.
In 2009, Bilu and Parent (MR2753610) handled the split Cartan case: the image of $\rho_{E,N}$ is not contained in the normalizer of the split Cartan for $N \gt N_0$, for some constant $N_0$. I took a very brief look at their paper and wasn't able to see if $N_0$ can be made explicit or not.
The nonsplit Cartan case is still open!