 Probability Digital Music Linear Algebra Sorting Lists Mathematics Abstract Algebra Numerical Methods Optimization Parsing Quantum Signal Processing Cellular Automata Information Theory Strings Graph Theory Physics Cryptography Graphics Compression Geometry VLSI Data Structures Supervised Learning Unsupervised Learning Reinforcement Learning Time Series Robotics Control Theory Computer Vision

## Probability

#### Mersenne twister

Description: The default pseudo-random number generator for Python, Ruby, R, and MATLAB.
Code: mersenne_twister.py

#### Blum Blum Shub

Description: Cryptographic pseudo-random number generator; not appropriate for Monte Carlo simulations.
Code: blum_blum_shub.py

#### Linear feedback shift register

Description: A simple pseudo-random number generator often implemented in hardware and used in cryptography. The code includes a Galois LFSR and a Fibonacci LFSR.
Code: lfsr.py

#### Linear congruential generator

Description: Another simple pseudo-random number generator used for the rand() function in glibc.
Code: lcg.c

#### Park-Miller random number generator

Description: Also known as the Lehmer random number generator. This is a variant of the linear congruential generator that operates in the multiplicative group of integers mod n. Unlike the linear congruential generator, it requires the seed to be coprime to the modulus n. MINSTD and RANF from the GNU Scientific Library are Lehmer RNGs as well as the famously terrible RANDU.
Code: lehmer.py

#### RANDU

Description: RANDU is considered the worst random number generator ever conceived. It is a Park-Miller type random number generator that was developed by IBM in the 1960's.
Code: randu.py

#### Box-Muller transform

Description: Pseudo-random number generator for a pair of independent normal distributions (zero mean, unit variance). The algorithm requires a source of uniform random numbers.
Code: box_muller.py

#### Marsaglia polar method

Description: Pseudo-random number generator for a pair of independent normal distributions (zero mean, unit variance). The algorithm requires a source of uniform random numbers. Whereas the Box-Muller transform requires computing a square root, natural log, and cosine, the polar method only requires a square root and natural log. However, it requires rejection sampling to draw a uniformly distributed point on the unit circle.
Code: marsaglia.py

#### Smirnov transform / Inverse transform sampling

Description: A general pseudo-random sampling method for an arbitrary distribution given the inverse CDF of that distribution. Note that the inverse CDF does not always exist analytically; consider, for example, the normal distribution. This is the main reason why this method isn't used universally, though the R language actually does use this method for sampling from a normal distribution using polynomial approximations. No relation to the alcoholic beverage.
Code: smirnov.py

#### Rejection sampling

Description: Rejection sampling is a simple but inefficient way to generate samples from any distribution. For the distribution you want to sample from, find a different distribution we know how to sample from (say, through the Smirnov transform) whose probabilities are greater than or equal to probabilities of the original distribution at each point. Sample a point x from this new distribution, then draw another sample from the uniform distribution (0,p(x)), where p(x) is the probability of the point at x in the proposal distribution. If the uniform sample is greater than the probability of x in the original distribution, reject the sample (hence the name), otherwise, consider it a sample from the original distribution.
Code: rejection.py

#### Metropolis-Hastings

Description: Metropolis-Hastings is a famous Markov chain Monte Carlo method for generating random samples from an arbitrary probability distribution or computing an integral for an expected value. Unlike normal Monte Carlo methods, which draws independent samples from a uniform distribution, the sequence of samples in Metropolis-Hastings forms a Markov chain, hence the name of the algorithm. In the context of drawing samples from an arbitrary distribution P(x), we can use any distribution P'(x) that is proportional to P(x). First, we choose a random starting point x. To sample our next point, we draw from a jumping distribution Q(x), which is usually just a Gaussian distribution centered around the latest sampled point. We compute an acceptance ratio, defined as P'(xnew)/P'(xold). If the acceptance ratio is greater than one, we accept the new point as our new sample. If the ratio is less than one, we accept the point with probability equal to the acceptance ratio. If we reject the new point, we reuse our old point as the new sample. We then continue to repeat this procedure to generate samples.
Code: metropolis.m

#### Monte Carlo integration

Description: Unlike integration methods that cut up the function into small rectangles or trapezoids and add up the area, Monte Carlo integration leverages the law of large numbers and integrates using random samples. In the general case where we are integrating a function f(x) from A to B, we can reinterpret the integral as an expected value with respect to a uniform distribution from A to B. The expectation would take the form of an integral over h(x)*p(x), where p(x) is the uniform distribution with bounds A and B and h(x)=f(x)*(B-A). By taking many uniform samples between A and B, evaluating each sample at h(x), and averaging all of the results, the computed average will converge to the true value of the integral.
Code: mc.m

#### Multivariate normal generator

Description: This method produces samples from a multivariate normal distribution with arbitrary mean and covariance. I give a small proof in the comments that multiplying a sample from a zero mean, unit variance normal distribution by the triangular matrix of the new distribution covariance's Cholesky decomposition and adding the mean of the new distribution results in a sample from the new distribution.
Code: multivariate_normal.m

#### Multivariate normal conditional generator

Description: This method uses the property that conditional distributions of a multivariate normal distribution are also normally distributed.
Code: multivariate_normal_conditional.m

## Digital Music

#### Bitcrushing

Description: Used to make Nintendo-like sounds and the "yoi yoi" effect in dubstep music. Bitcrushing refers to reducing the bit resolution in the samples and reducing the sampling rate from 44100 Hz to as low as 4096 or 8192 Hz. Like overdrive below, this effect generates square waves that fill the spectrum with harmonic frequencies.
Code: Bitcrush.m

#### Overdrive

Description: This the characterstic over-amplified guitar sound from bands like Led Zeppelin and Jimi Hendrix. Once you start amplifying the signal too much, the highest peaks will begin hitting their maximum limit. The tops of sine waves will be flattened to look like square waves, and since square waves are composed of the original frequency plus odd-integer harmonic frequencies, this results in a full and rich spectrum of new frequencies.
Code: Overdrive.m

## Linear Algebra

#### Cholesky decomposition

Description: Matrix decomposition for symmetric, positive-definite matrices. It's useful for solving systems of linear equations and transforming a multivariate normal distribution to a new mean and covariance.
Code: matrix.py

#### Von Mises iteration

Description: An iterative method for finding the eigenvector of a matrix with the largest eigenvalue. Also known as the power method. Used in Google's PageRank algorithm to find the most important websites according to hyperlinks.
Code: matrix.py

#### Freivald's algorithm

Description: Verifies matrix multiplication in a randomized way. If A*B=C is correct, the algorithm always returns true. If A*B!=C, the algorithm returns true half of the time. The algorithm can be called multiple times to reduce the error probability to .5^k (if called k times).
Time: O(n^2)
Code: matrix.py

#### Woodbury matrix identity

Description: This is a trick commonly used in Kalman filters for computing a matrix inverse. In this case, you have already computed the inverse of a matrix A, but now you need to compute the inverse of A plus another matrix BCD. With this identity, you can reuse the inverse of A and calculate inverses on matrices of matrix C's size. When the dimension of C has a lower dimension than A, this method can be more efficient than computing A + BCD and inverting the new matrix.
Code: woodbury.m

## Sorting

#### Quicksort

Description: Quicksort is a divide-and-conquer comparison sort algorithm.
Time: Worst: O(n^2), Average O(n log n)
Space: Worst: O(n), O(log n)
Code: quicksort.py

#### Mergesort

Description: Mergesort is a divide-and-conquer comparison sort algorithm.
Time: Worst: O(n log n)
Space: Worst: O(n)
Code: mergesort.py

#### Bubble sort

Description: Bubble sort is a comparison sort algorithm. It's typically used as a motivating example for the O(n log n) selection sorts, although its behavior is acceptable on small lists.
Time: Average: O(n^2)
Space: Worst: O(1)
Code: bubble_sort.py

#### In-place quicksort

Description: The original quicksort algorithm builds the sorted list in a new memory area, separate from the original input list. This requires O(n) space, where n is the number of elements in the list. In this variation, quicksort sorts the original input list instead of building a new list.
Space: O(log^2 n), so it's not really an in-place algorithm. It's O(log^2 n) because the call stack takes up O(log n) space, and each function call contains a pointer to the list of size O(log n), so log n * log n = log^2 n.

#### Counting sort

Description: In contrast the the O(n log n) comparison sorts, this is a linear integer sorting algorithm. Unfortunately, it is also linear in the range of elements, so this algorithm only works well for sorting small integers. For larger integers, radix sort is more efficient. Counting sort is a stable sorting algorithm.
Time: O(n + m), where n is the number of list elements and m is the range of the elements
Code: counting_sort.c

Description: This is another linear integer sorting algorithm that uses counting sort as a subroutine at each digit. Surprisingly, radix sort sorts in order from least significant digit to most significant digit.
Time: O(kn), where n is the number of list elements and k is the number of digits in the largest element

## Lists

#### Binary search

Description: Searches for a given value in a sorted list.
Code: binary.py

## Mathematics

#### Sieve of Eratosthenes

Description: Sieve of Eratosthenes is an ancient and efficient way to discover all the prime numbers less than some number N. My favorite fact about this algorithm is its computational complexity of O(n log log n), which is a consequence of Euler's lower bound on the prime harmonic series.
Code: sieve_of_eratosthenes.py

#### Sieve of Atkin

Description: Sieve of Atkin is the more modern and efficient version of the Sieve of Eratosthenes - it's able to discover all prime numbers less than some number N in O(n / log log n) time.
Code: sieve_of_atkin.py

#### Quake III fast inverse square root

Description: This is the famous magic fast inverse square root algorithm used to compute angle of incidence and reflection in Quake III Arena. Computing angle of incidence requires normalized vectors, and normalized vectors are computed by dividing the vector by its length, the square root of the sum of squares of the vector components. The code is the same as the original, but I've added some comments explaining the magic number step and the Newton iteration.
Code: quake_inv_sqrt.c

#### Gauss-Legendre algorithm

Description: Pi calculator. Each iteration doubles the number of correct digits.
Code: gauss_legendre.py

#### Borwein's pi algorithm

Description: Pi calculator. Each iteration quadruples the number of correct digits, but the update step is more complicated than Gauss-Legendre.
Code: borwein.py

#### Collatz conjecture

Description: Take any natural number n. If n is even, divide it by 2 (n/2). If n is odd, multiply it by 3 and add 1 (3n+1). If you repeat this process forever, Collatz conjectured that every natural number will always eventually reach 1. Despite how simple this conjecture sounds, it has never been proven to be true. Conway proved that a general version of this conjecture is undecidable, so it can never be proven nor disproven. The function below returns the sequence of numbers the input number took to reach 1.
Code: collatz.py

#### Bakhshali method

Description: Ancient method for calculating the square root of a number
Code: bakhshali.py

#### Babylonian method

Description: Ancient method for calculating the square root of a number
Code: babylonian.py

## Numerical Methods

#### Runge-Kutta method (RK4)

Description: Popular iterative method for approximating solutions to an ODE.
Code: rk4.py

## Optimization

#### Newton's method

Description: Iteratively approximates the roots of a function (f(x)=0) using the derivative of the function and the function itself. Many of the other methods in this section are elaborations on this basic method.
Code: newton.py

## Signal Processing

#### Hann window

Description: Popular windowing function with low aliasing but slightly decreased resolution from a wider main lobe.
Code: hann.py

#### Hamming window

Description: Popular windowing function for minimizing the worst-case side lobe error.
Code: hamming.py

#### Blackman window

Description: Popular windowing function with 18dB side lobe roll off.
Code: blackman.py

#### Blackman-Harris window

Description: Similar to the Blackman window with one extra term. Side lobe levels are lower than the Blackman window.
Code: blackman_harris.py

## Strings

#### In-place reversal

Description: Reverses a string in place.
Time: O(n)
Code: reverse_inplace.c

#### Levenshtein distance

Description: Calculates the difference between two strings in terms of inserting, switching, or deleting letters. For example, the Levenshtein distance between 'human' and 'tan' is 3 because you need to (1) delete 'h', (2) delete 'u', and (3) switch 'm' to 't' to change 'human' to 'tan'.
Code: levenshtein.py

## Graph Theory

#### Union Find

Description: Also known as the disjoint set, this an efficient data structure that keeps track of the disjoint subsets of a set. The data structure has two main functions: union, which combines two subsets together, and find, which reports the subset that an element belongs to.
Code: union_find.py

#### Kruskal's algorithm

Description: Kruskal's is a greedy algorithm for finding the minimum spanning tree in a weighted graph. It's one of my favorite algorithms because it uses both the union-find algorithm and radix sort (assuming integer weights in the graph).
Code: kruskal.py

## Cryptography

#### Lucas-Lehmer primality test

Description: Given a Mersenne number, this algorithm decides if it is also a Mersenne prime.
Code: lucas_lehmer_test.py

## Supervised Learning

#### Gaussian process regression

Description: Also known as kriging, GPR approximates a function from a set of independent (X) and dependent (Y) variables. When a new X is given, its corresponding Y is computed as a weighted average of the other Y's. Those weights depend on the distance of the other X's to the new X, so that closer X's to the new X will have a stronger affect on its Y value than X's which are farther away.
Code: gpr.m

#### k-nearest neighbors

Description: K-nearest neighbors assigns a label to an input vector based on the labels of its closest neighbors. The 'K' refers to the K closest training vectors to the input vector, and the majority class determines the input vector class. K is typically an odd integer like 1, 3, and 5 in order to avoid ties during the voting process for binary classification problems. Increasing K leads to better noise immunity but less class boundary granularity.
Code: knn.m

#### Ordinary least squares

Description: Ordinary least squares models the target variables as an affine function of the predictor variables with independent and normally distributed errors. The algorithm outputs a line equation that minimizes the sum of the squared residuals between the line and the target variables.
Code: least_squares.m

#### Weighted least squares

Description: This model is identical to ordinary least squares with the exception that the target variables can be modeled with different variances in their normally distributed error terms. This manifests itself in the routine as an information matrix W that is the inverse of the covariance matrix and multiplies the data matrix.
Code: weighted_least_squares.m

#### Total least squares

Description: Total least squares models errors in both the dependent and independent variables. The resulting line minimizes the squared projection error of each point to the line.
Code: total_least_squares.m

## Unsupervised Learning

#### Kernel k-means

Description: This is the kernelized version of the k-means algorithm. The algorithm avoids explicitly computing the cluster means in the higher dimensional space by embedding the step within the cluster assignment step and kernelizing it. Unfortunately, this algorithm easily gets stuck in local minima, so researchers typically initialize the cluster assignments using the output of spectral clustering.
Code: kernel_kmeans.m

#### Kernel density estimation

Description: KDE is a nonparametric method for estimating the data's probability density function. The intuition behind this method is that the density function is going to be higher in more densely clustered areas and lower in less dense areas, so the density estimate at any given point is the average of a function of the distances from that point to all of the other data points.
Code: kde.m

#### FastICA

Description: Iterative algorithm for independent component analysis that maximizes the non-Gaussianity of the unmixed signal as a substitute for statistical independence.
Code: fastICA.m

#### Non-negative matrix factorization (Euclidean metric)

Description: Iterative algorithm for factorizing a non-negative matrix V into two non-negative matrices W and H such that V = WH. This method uses two simple multiplicative update rules that were proposed by Lee and Seung back in 2001, but it doesn't always produce a sparse solution. Their paper also offers a gradient-based update rule, but those require tuning a step-size parameter.
Code: nmf.m

#### Non-negative matrix factorization (KL divergence)

Description: This method is identical in principle to the NMF method for minimizing the Euclidean metric. The multiplicative update rule has been modified to minimize the KL divergence between V and WH.
Code: nmf_kl.m

## Control Theory

#### Popov-Belevitch-Hautus test (PBH)

Description: This is another way to test if a system is controllable (i.e. there is a way to move the system from any initial state to any final state). The test states that the system is controllable if and only if no left eigenvectors of A are orthogonal to B.
Code: pbh_test.m

#### Left eigenvectors

Description: This routine finds the left eigenvectors of a matrix and is used as a subroutine in the PBH test. It assumes that the matrix has a complete set of distinct right eigenvectors.
Code: left_eig.m

#### Controller canonical form

Description:
Code: controller_canonical.m

Description:
Code: fsf.m

#### Observer canonical form

Description:
Code: observer_canonical.m

Description:
Code: lqg.m

#### Luenberger observer

Description:
Code: luenberger_observer.m

#### Kalman decomposition

Description:
Code: kalman_decomposition.m

#### Controllability matrix

Description:
Code: controllability.m

#### Observability matrix

Description:
Code: observability.m

#### Jordan form

Description:
Code: jordan_form.m

#### Modal decomposition

Description:
Code: modal_decomposition.m

Description:

#### Bass-Gura formula

Description:
Code: bass_gura.m

#### Ackermann's formula

Description:
Code: ackermann.m

#### Gilbert realization

Description:
Code: gilbert_realization.m

#### Lyapunov stability (linear systems)

Description:
Code: lyapunov_lti.m

## Object-Oriented Design Patterns

#### Strategy

Description: This pattern allows a class to select different behaviors or algorithms at runtime.
Code: strategy.cpp

#### Singleton

Description: This pattern restricts the instantation of a class to one object.
Code: singleton.java

## Online Algorithms

#### Online variance

Description: The typical formula for variance, var(X) = E[X^2] - E[X]^2, requires you to average over all the elements of X at once, an O(N) step. This algorithm lets you update the variance in an O(1) step with each new element added and in O(1) space, so the elements do not need to be kept around to compute new variances.
Code: online_stat.cpp

#### Online average

Description: This algorithm updates the average of a set of points in O(1) time and O(1) space with each new element added.
Code: online_stat.cpp