Fractal Broccoli

Probability Digital Music Linear Algebra Sorting Lists Mathematics
Abstract Algebra Numerical Methods Optimization Parsing Quantum Signal Processing
Cellular Automata Information Theory Strings Graph Theory Physics Cryptography
Graphics Compression Geometry VLSI Data Structures Supervised Learning
Unsupervised Learning Reinforcement Learning Time Series Robotics Control Theory Computer Vision


Mersenne twister

Description: The default pseudo-random number generator for Python, Ruby, R, and MATLAB.

Blum Blum Shub

Description: Cryptographic pseudo-random number generator; not appropriate for Monte Carlo simulations.

Linear feedback shift register

Description: A simple pseudo-random number generator often implemented in hardware and used in cryptography. The code includes a Galois LFSR and a Fibonacci LFSR.

Linear congruential generator

Description: Another simple pseudo-random number generator used for the rand() function in glibc.
Code: lcg.c

Park-Miller random number generator

Description: Also known as the Lehmer random number generator. This is a variant of the linear congruential generator that operates in the multiplicative group of integers mod n. Unlike the linear congruential generator, it requires the seed to be coprime to the modulus n. MINSTD and RANF from the GNU Scientific Library are Lehmer RNGs as well as the famously terrible RANDU.


Description: RANDU is considered the worst random number generator ever conceived. It is a Park-Miller type random number generator that was developed by IBM in the 1960's.

Box-Muller transform

Description: Pseudo-random number generator for a pair of independent normal distributions (zero mean, unit variance). The algorithm requires a source of uniform random numbers.

Marsaglia polar method

Description: Pseudo-random number generator for a pair of independent normal distributions (zero mean, unit variance). The algorithm requires a source of uniform random numbers. Whereas the Box-Muller transform requires computing a square root, natural log, and cosine, the polar method only requires a square root and natural log. However, it requires rejection sampling to draw a uniformly distributed point on the unit circle.

Smirnov transform / Inverse transform sampling

Description: A general pseudo-random sampling method for an arbitrary distribution given the inverse CDF of that distribution. Note that the inverse CDF does not always exist analytically; consider, for example, the normal distribution. This is the main reason why this method isn't used universally, though the R language actually does use this method for sampling from a normal distribution using polynomial approximations. No relation to the alcoholic beverage.

Rejection sampling

Description: Rejection sampling is a simple but inefficient way to generate samples from any distribution. For the distribution you want to sample from, find a different distribution we know how to sample from (say, through the Smirnov transform) whose probabilities are greater than or equal to probabilities of the original distribution at each point. Sample a point x from this new distribution, then draw another sample from the uniform distribution (0,p(x)), where p(x) is the probability of the point at x in the proposal distribution. If the uniform sample is greater than the probability of x in the original distribution, reject the sample (hence the name), otherwise, consider it a sample from the original distribution.


Description: Metropolis-Hastings is a famous Markov chain Monte Carlo method for generating random samples from an arbitrary probability distribution or computing an integral for an expected value. Unlike normal Monte Carlo methods, which draws independent samples from a uniform distribution, the sequence of samples in Metropolis-Hastings forms a Markov chain, hence the name of the algorithm. In the context of drawing samples from an arbitrary distribution P(x), we can use any distribution P'(x) that is proportional to P(x). First, we choose a random starting point x. To sample our next point, we draw from a jumping distribution Q(x), which is usually just a Gaussian distribution centered around the latest sampled point. We compute an acceptance ratio, defined as P'(xnew)/P'(xold). If the acceptance ratio is greater than one, we accept the new point as our new sample. If the ratio is less than one, we accept the point with probability equal to the acceptance ratio. If we reject the new point, we reuse our old point as the new sample. We then continue to repeat this procedure to generate samples.
Code: metropolis.m

Monte Carlo integration

Description: Unlike integration methods that cut up the function into small rectangles or trapezoids and add up the area, Monte Carlo integration leverages the law of large numbers and integrates using random samples. In the general case where we are integrating a function f(x) from A to B, we can reinterpret the integral as an expected value with respect to a uniform distribution from A to B. The expectation would take the form of an integral over h(x)*p(x), where p(x) is the uniform distribution with bounds A and B and h(x)=f(x)*(B-A). By taking many uniform samples between A and B, evaluating each sample at h(x), and averaging all of the results, the computed average will converge to the true value of the integral.
Code: mc.m

Multivariate normal generator

Description: This method produces samples from a multivariate normal distribution with arbitrary mean and covariance. I give a small proof in the comments that multiplying a sample from a zero mean, unit variance normal distribution by the triangular matrix of the new distribution covariance's Cholesky decomposition and adding the mean of the new distribution results in a sample from the new distribution.
Code: multivariate_normal.m

Multivariate normal conditional generator

Description: This method uses the property that conditional distributions of a multivariate normal distribution are also normally distributed.
Code: multivariate_normal_conditional.m

Gibbs sampling

Collapsed Gibbs sampling

Blocked Gibbs sampling

Baum-Welch algorithm

Expectation-maximization algorithm

Importance sampling

Berlekamp-Massey algorithm

Ziggurat algorithm

Viterbi algorithm


Forward-backward algorithm

Ising model


Umbrella sampling

Gillespie algorithm

Affine invariant MCMC ensemble sampler

Digital Music


Description: Used to make Nintendo-like sounds and the "yoi yoi" effect in dubstep music. Bitcrushing refers to reducing the bit resolution in the samples and reducing the sampling rate from 44100 Hz to as low as 4096 or 8192 Hz. Like overdrive below, this effect generates square waves that fill the spectrum with harmonic frequencies.
Code: Bitcrush.m


Description: This the characterstic over-amplified guitar sound from bands like Led Zeppelin and Jimi Hendrix. Once you start amplifying the signal too much, the highest peaks will begin hitting their maximum limit. The tops of sine waves will be flattened to look like square waves, and since square waves are composed of the original frequency plus odd-integer harmonic frequencies, this results in a full and rich spectrum of new frequencies.
Code: Overdrive.m

Karplus-Strong string synthesis






Ping-pong delay

Auto pan

Vinyl distortion

Linear Algebra

Cholesky decomposition

Description: Matrix decomposition for symmetric, positive-definite matrices. It's useful for solving systems of linear equations and transforming a multivariate normal distribution to a new mean and covariance.

Von Mises iteration

Description: An iterative method for finding the eigenvector of a matrix with the largest eigenvalue. Also known as the power method. Used in Google's PageRank algorithm to find the most important websites according to hyperlinks.

Freivald's algorithm

Description: Verifies matrix multiplication in a randomized way. If A*B=C is correct, the algorithm always returns true. If A*B!=C, the algorithm returns true half of the time. The algorithm can be called multiple times to reduce the error probability to .5^k (if called k times).
Time: O(n^2)

Woodbury matrix identity

Description: This is a trick commonly used in Kalman filters for computing a matrix inverse. In this case, you have already computed the inverse of a matrix A, but now you need to compute the inverse of A plus another matrix BCD. With this identity, you can reuse the inverse of A and calculate inverses on matrices of matrix C's size. When the dimension of C has a lower dimension than A, this method can be more efficient than computing A + BCD and inverting the new matrix.
Code: woodbury.m

Arnoldi iteration

Lanczos iteration

Gaussian elimination


QR decomposition

Singular value decomposition (SVD)

Jacobi method

Strassen algorithm

Coppersmith-Winograd algorithm

Gauss-Seidel method

Minimum degree algorithm



Description: Quicksort is a divide-and-conquer comparison sort algorithm.
Time: Worst: O(n^2), Average O(n log n)
Space: Worst: O(n), O(log n)


Description: Mergesort is a divide-and-conquer comparison sort algorithm.
Time: Worst: O(n log n)
Space: Worst: O(n)

Bubble sort

Description: Bubble sort is a comparison sort algorithm. It's typically used as a motivating example for the O(n log n) selection sorts, although its behavior is acceptable on small lists.
Time: Average: O(n^2)
Space: Worst: O(1)

In-place quicksort

Description: The original quicksort algorithm builds the sorted list in a new memory area, separate from the original input list. This requires O(n) space, where n is the number of elements in the list. In this variation, quicksort sorts the original input list instead of building a new list.
Space: O(log^2 n), so it's not really an in-place algorithm. It's O(log^2 n) because the call stack takes up O(log n) space, and each function call contains a pointer to the list of size O(log n), so log n * log n = log^2 n.

Counting sort

Description: In contrast the the O(n log n) comparison sorts, this is a linear integer sorting algorithm. Unfortunately, it is also linear in the range of elements, so this algorithm only works well for sorting small integers. For larger integers, radix sort is more efficient. Counting sort is a stable sorting algorithm.
Time: O(n + m), where n is the number of list elements and m is the range of the elements
Code: counting_sort.c

Radix sort

Description: This is another linear integer sorting algorithm that uses counting sort as a subroutine at each digit. Surprisingly, radix sort sorts in order from least significant digit to most significant digit.
Time: O(kn), where n is the number of list elements and k is the number of digits in the largest element
Code: radix_sort.c





In-place mergesort

Pancake sort


Binary search

Description: Searches for a given value in a sorted list.

Kadane maximum subarray

Fibonacci search

Median of medians



Sieve of Eratosthenes

Description: Sieve of Eratosthenes is an ancient and efficient way to discover all the prime numbers less than some number N. My favorite fact about this algorithm is its computational complexity of O(n log log n), which is a consequence of Euler's lower bound on the prime harmonic series.

Sieve of Atkin

Description: Sieve of Atkin is the more modern and efficient version of the Sieve of Eratosthenes - it's able to discover all prime numbers less than some number N in O(n / log log n) time.

Quake III fast inverse square root

Description: This is the famous magic fast inverse square root algorithm used to compute angle of incidence and reflection in Quake III Arena. Computing angle of incidence requires normalized vectors, and normalized vectors are computed by dividing the vector by its length, the square root of the sum of squares of the vector components. The code is the same as the original, but I've added some comments explaining the magic number step and the Newton iteration.
Code: quake_inv_sqrt.c

Gauss-Legendre algorithm

Description: Pi calculator. Each iteration doubles the number of correct digits.

Borwein's pi algorithm

Description: Pi calculator. Each iteration quadruples the number of correct digits, but the update step is more complicated than Gauss-Legendre.

Collatz conjecture

Description: Take any natural number n. If n is even, divide it by 2 (n/2). If n is odd, multiply it by 3 and add 1 (3n+1). If you repeat this process forever, Collatz conjectured that every natural number will always eventually reach 1. Despite how simple this conjecture sounds, it has never been proven to be true. Conway proved that a general version of this conjecture is undecidable, so it can never be proven nor disproven. The function below returns the sequence of numbers the input number took to reach 1.

Bakhshali method

Description: Ancient method for calculating the square root of a number

Babylonian method

Description: Ancient method for calculating the square root of a number

Discrete Legendre transformation

Risch algorithm

Goldschmidt iteration

General number field sieve

Special number field sieve

Geometric algebra

Borwein's Dirichlet eta function method

Gauss-Kronrod quadrature integration

Vedic duplex method

Newton-Raphson divison

SRT division

Pell/Brahmagupta equation


Karatsuba algorithm

Toom-Cook multiplication

Euler factorization

Abstract Algebra

Todd-Coxeter algorithm

Chien search

Numerical Methods

Runge-Kutta method (RK4)

Description: Popular iterative method for approximating solutions to an ODE.

Galerkin projection/Reduced order modeling (ROM)

Multigrid method

Verlet integration

Euler integration

Finite difference method

Finite element method


Newton's method

Description: Iteratively approximates the roots of a function (f(x)=0) using the derivative of the function and the function itself. Many of the other methods in this section are elaborations on this basic method.

Simplex method

Interior point method

Sequential minimum optimization

Branch and bound





Simulated annealing

Genetic algorithm

Simplex method

Ant colony optimization


Packrat parser

LL parser

LR parser

LALR parser

Recursive descent parser

CYK algorithm

Earley parser


Quantum teleportation

Shor's algorithm

Grover's algorithm

Simon's algorithm

Signal Processing

Hann window

Description: Popular windowing function with low aliasing but slightly decreased resolution from a wider main lobe.

Hamming window

Description: Popular windowing function for minimizing the worst-case side lobe error.

Blackman window

Description: Popular windowing function with 18dB side lobe roll off.

Blackman-Harris window

Description: Similar to the Blackman window with one extra term. Side lobe levels are lower than the Blackman window.

Constant Q transform

Remez algorithm

Discrete Fourier transform (DFT)

Fast Fourier transform (FFT)

IIR filter

FIR filter

Compressed sensing

Fractional discrete Fourier transform

Linear prediction


Perceptual hash


Slepian window

Fast wavelet transform

Cellular Automata


Rule 30

Rule 110

Information Theory

Reed-Solomon code

Raptor code

LT code

Online code


Fisher information metric

Kullback-Leibler divergence


In-place reversal

Description: Reverses a string in place.
Time: O(n)
Code: reverse_inplace.c

Levenshtein distance

Description: Calculates the difference between two strings in terms of inserting, switching, or deleting letters. For example, the Levenshtein distance between 'human' and 'tan' is 3 because you need to (1) delete 'h', (2) delete 'u', and (3) switch 'm' to 't' to change 'human' to 'tan'.



Burrows-Wheeler transform


Graph Theory

Union Find

Description: Also known as the disjoint set, this an efficient data structure that keeps track of the disjoint subsets of a set. The data structure has two main functions: union, which combines two subsets together, and find, which reports the subset that an element belongs to.

Kruskal's algorithm

Description: Kruskal's is a greedy algorithm for finding the minimum spanning tree in a weighted graph. It's one of my favorite algorithms because it uses both the union-find algorithm and radix sort (assuming integer weights in the graph).

Path addition planarity testing

Vertex addition planartiy testing

Edge addition planarity testing


Dijkstra's algorithm

Planar Dijkstra

Planar max flow

Floyd-Warshall algorithm

Prim's algorithm

Christofides algorithm

Greedy traveling salesman

Boruvka's algorithm

Breath-first search

Depth-firsth search

In-order traversal

Ford-Fulkerson algorithm

Edmonds-Karp algorithm

Push-relabel algorithm

FKT algorithm


Fast multipole method

Barnes-Hut simulation


Lucas-Lehmer primality test

Description: Given a Mersenne number, this algorithm decides if it is also a Mersenne prime.







Full homomorphic encryption

Elliptical curve cryptography

AKS primality test

Miller-Rabin randomized primality test

Euler totient function


Phong shading

Gouraud shading



Lempel-Ziv (LZ77)

Huffman coding


Convex hull

Delauney triangulation

Voronoi diagram


Iterative improvement

Simulated annealing

Quadratic placement

Recursive partitioning (PROUD)

Boolean constraint propagation (DPLL)




Shannon expansion

Boolean difference

Unate recursive complement

Binary decision diagrams


Data Structures

van Emde Boas tree

y-fast tree

Bloom filter

Link-cut tree

Red-black tree

Cuckoo hashing

Splay tree

Tango tree

Suffix tree


Circular buffer

Linked list





Adjacency list

Adjacency matrix


Skip list

Judy array

Hash table


Disjoint Set

Fibonacci heap

BSP tree

Finger tree

Hash tree

Inverted index

Partial persistence BST

Full persistence BST

Partial retroactive BST

Full retroactive BST

Supervised Learning

Gaussian process regression

Description: Also known as kriging, GPR approximates a function from a set of independent (X) and dependent (Y) variables. When a new X is given, its corresponding Y is computed as a weighted average of the other Y's. Those weights depend on the distance of the other X's to the new X, so that closer X's to the new X will have a stronger affect on its Y value than X's which are farther away.
Code: gpr.m

k-nearest neighbors

Description: K-nearest neighbors assigns a label to an input vector based on the labels of its closest neighbors. The 'K' refers to the K closest training vectors to the input vector, and the majority class determines the input vector class. K is typically an odd integer like 1, 3, and 5 in order to avoid ties during the voting process for binary classification problems. Increasing K leads to better noise immunity but less class boundary granularity.
Code: knn.m

Ordinary least squares

Description: Ordinary least squares models the target variables as an affine function of the predictor variables with independent and normally distributed errors. The algorithm outputs a line equation that minimizes the sum of the squared residuals between the line and the target variables.
Code: least_squares.m

Weighted least squares

Description: This model is identical to ordinary least squares with the exception that the target variables can be modeled with different variances in their normally distributed error terms. This manifests itself in the routine as an information matrix W that is the inverse of the covariance matrix and multiplies the data matrix.
Code: weighted_least_squares.m

Total least squares

Description: Total least squares models errors in both the dependent and independent variables. The resulting line minimizes the squared projection error of each point to the line.
Code: total_least_squares.m

Linear discriminant analysis

Naive Bayes

Locally linear regression

Logistic regression

Decision tree

Linear support vector machine

Kernel support vector machine

One-class support vector machine

Kernel least squares

Kernel logistic regression

Bayes network

Ridge regression

Whitening transform

Feed forward neural network

Restricted Boltzmann machine

Deep Boltzmann machine

Deep belief network

Linear decoder

Explicit Gaussian kernel

Least squares SVM

SVM with squared hinge loss

Convolutional neural network

Unsupervised Learning

Kernel k-means

Description: This is the kernelized version of the k-means algorithm. The algorithm avoids explicitly computing the cluster means in the higher dimensional space by embedding the step within the cluster assignment step and kernelizing it. Unfortunately, this algorithm easily gets stuck in local minima, so researchers typically initialize the cluster assignments using the output of spectral clustering.
Code: kernel_kmeans.m

Kernel density estimation

Description: KDE is a nonparametric method for estimating the data's probability density function. The intuition behind this method is that the density function is going to be higher in more densely clustered areas and lower in less dense areas, so the density estimate at any given point is the average of a function of the distances from that point to all of the other data points.
Code: kde.m


Description: Iterative algorithm for independent component analysis that maximizes the non-Gaussianity of the unmixed signal as a substitute for statistical independence.
Code: fastICA.m

Non-negative matrix factorization (Euclidean metric)

Description: Iterative algorithm for factorizing a non-negative matrix V into two non-negative matrices W and H such that V = WH. This method uses two simple multiplicative update rules that were proposed by Lee and Seung back in 2001, but it doesn't always produce a sparse solution. Their paper also offers a gradient-based update rule, but those require tuning a step-size parameter.
Code: nmf.m

Non-negative matrix factorization (KL divergence)

Description: This method is identical in principle to the NMF method for minimizing the Euclidean metric. The multiplicative update rule has been modified to minimize the KL divergence between V and WH.
Code: nmf_kl.m

Non-negative sparse coding

Robust kernel density estimation

Blind source separation

Kernel blind source separation

Expectation-Maximization (EM) algorithm

k-means clustering


Spectral clustering


Principal component analysis (PCA)

Kernel PCA

Independent component analysis

Maximum variance unfolding

Latent Dirichlet allocation

Hierarchical latent Dirichlet allocation

Self-organizing map

Sparse autoencoder

Indian buffet process

Reinforcement Learning


Temporal difference learning

Actor-critic model


Time Series

Viterbi algorithm

Hopfield network

Long short term memory



Weighted least squares SLAM

Max-mixtures method


Correlative scan matching

Particle filter

Joint compatibility branch and bound (JCBB)

Spectral cluster graph partitioning (SCGP)

Kinematic Jacobian


Control Theory

Popov-Belevitch-Hautus test (PBH)

Description: This is another way to test if a system is controllable (i.e. there is a way to move the system from any initial state to any final state). The test states that the system is controllable if and only if no left eigenvectors of A are orthogonal to B.
Code: pbh_test.m

Left eigenvectors

Description: This routine finds the left eigenvectors of a matrix and is used as a subroutine in the PBH test. It assumes that the matrix has a complete set of distinct right eigenvectors.
Code: left_eig.m

Controller canonical form

Code: controller_canonical.m

Full state feedback

Code: fsf.m

Observer canonical form

Code: observer_canonical.m

Linear quadratic Gaussian

Code: lqg.m

Luenberger observer

Code: luenberger_observer.m

Kalman decomposition

Code: kalman_decomposition.m

Controllability matrix

Code: controllability.m

Observability matrix

Code: observability.m

Jordan form

Code: jordan_form.m

Modal decomposition

Code: modal_decomposition.m

Adjoint system

Code: adjoint_system.m

Bass-Gura formula

Code: bass_gura.m

Ackermann's formula

Code: ackermann.m

Gilbert realization

Code: gilbert_realization.m

Lyapunov stability (linear systems)

Code: lyapunov_lti.m

Lyapunov equation solver

Riccati equation solver

Markov parameter system identification

Feedback linearization

Kalman filter

Extended Kalman filter

Unscented Kalman filter

Vector space intersection

Linear quadratic regulator

Proportional-integral-derivative controller (PID)

Model predictive control

Computer Vision

Direct Linear Transformation (DLT)




Trifocal tensor

Difference of Gaussians


Hough transform

Iterative closest point (ICP)

Lucas-Kanade optical flow

Horn-Schunck optical flow

Phase correlation

Poisson surface reconstruction

Marching cubes

Viola-Jones object detection

Euler video magnification

Parallel tracking and mapping (PTAM)

Loopy belief propagation

Camera calibration

Object-Oriented Design Patterns


Description: This pattern allows a class to select different behaviors or algorithms at runtime.
Code: strategy.cpp


Description: This pattern restricts the instantation of a class to one object.



Abstract factory

Object pool

Dependency injection








Rate monotonic

O(1) scheduler

O(n) scheduler

Completely fair scheduler

Earliest deadline first


Universal hashing

K-wise indepedent hashing

Simple tabulation hashing


Cuckoo hashing

Dynamic perfect hashing

Linear probing

Game Theory

Iterative removal

Mixed strategy Nash equilibrium

Dominant strategy







Online Algorithms

Online variance

Description: The typical formula for variance, var(X) = E[X^2] - E[X]^2, requires you to average over all the elements of X at once, an O(N) step. This algorithm lets you update the variance in an O(1) step with each new element added and in O(1) space, so the elements do not need to be kept around to compute new variances.
Code: online_stat.cpp

Online average

Description: This algorithm updates the average of a set of points in O(1) time and O(1) space with each new element added.
Code: online_stat.cpp

Canadian traveller problem

Image Processing

Discrete wavelet transform

Multiresolution analysis

Active contour model

Histogram equalization

GrowCut algorithm

Felzenszwalb-Huttenlocher segmentation

Floyd-Steinberg dithering

Canny edge detector

Seam carving

RGB-HSV conversion



White balancing

Tone mapping

High dynamic range (HDR)