Chordination

Phil Sisk, Bryan Hughes, Rohan Paul, Ashok Aggarwal, Miheer Patankar

Chordination is a fast, automatic musical chord detection algorithm implemented in MATLAB. It takes audio files as inputs and outputs a string vector with a list of the chords that were detected in the file. It utilizes the Pitch Class Profile algorithm developed by Takuya Fujishima [1]. Chordination is demonstrated to be fairly accurate with clean, relatively noiseless musical data and a simple set of chords.

Musical Background

Notes

Musical notes are the basic building blocks of music. A musical note is a sound produced by a voice or musical instrument characterized by a particular fundamental frequency. In Western music there are twelve unique notes, named using the first seven letters of the alphabet together with sharps: A, A#, B, C, C#, D, D#, E, F, F#, G, and G#. A note with double the fundamental frequency of another is perceived by the ear to be the same note and is said to be one octave higher than the other. For example, the note with fundamental frequency 440 Hz is an A, and the note with fundamental frequency 880 Hz is also an A.

Harmonics

Musical notes also contain frequencies at integer multiples of the fundamental frequency known as overtones or harmonics. For example, the A at 440 Hz has harmonics at 880 Hz, 1320 Hz, 1760 Hz, and so on. The relative strengths of these harmonics have a large impact on the way the note sounds.

Figure: FFT of an A3 played on a piano, showing the fundamental frequency and harmonics of the note.

Chords

A chord is any set of two or more musical notes played simultaneously. A chord is characterized by the notes it contains; for example, an A major chord contains the notes A, C#, and E. Usually, it does not matter what octave the note is played in. A series of chords played successively is called a chord progression. Certain chords known as major chords, minor chords, and seventh chords are extremely common in Western music, especially popular music.

Algorithm

Given any WAV file as input, noisy or clean, the algorithm shown in the block diagram above outputs a vector of strings corresponding to the detected chord in each frame. It also outputs a time-dependent distribution of likelihood across all 12 notes.

Windowing and Preprocessing

In order to extract information about a given time in a song, the samples in the time domain around that given time must be analyzed. Thus we found it necessary to create a windowing scheme for any input sound signal.

We first implemented rectangular windows simply by using MATLAB to splice the full signal into separate segments of length 0.25 seconds. We found 0.25 seconds to be an optimal length for spectral resolution as well as chord resolution. Using significantly shorter samples (< 0.1 seconds) created spectral leakage and noise in the FFT. Using significantly longer samples (> 1 second) combined several different chords into a single analysis, which made it impossible to detect the pitches associated with a single chord.

Non-overlapping Windows

The spliced signals were multiplied by a Hamming window to further improve spectral resolution and reduce leakage.

Initially, we took an approach where all windows were adjacent; the start of each new window was shifted in the time domain by one window length (see Figure 1).

Overlapping Windows

We eventually changed this approach by using overlapping windows; we can specify some variable N such that every kth and (k+N)th window are adjacent. For example, if we specify N = 5, the start of each new window is only shifted by 0.05 seconds from the start of the previous. It thus takes 5 new windows to produce a window adjacent to the initial window (see Figure 2). Our current algorithm implementation uses N = 10.
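As a rough illustration of this windowing scheme, a minimal MATLAB sketch might look like the following; the file name and the variable names (x, fs, winLen, hop, frames) are illustrative choices, not necessarily those used in our actual code:

    % Minimal sketch: split the signal into 0.25-second Hamming-windowed frames
    % that overlap with a hop of one-Nth of a window length.
    [x, fs] = audioread('song.wav');          % read the input WAV file
    x = mean(x, 2);                           % collapse stereo to mono
    winLen = round(0.25 * fs);                % 0.25-second window
    N = 10;                                   % overlap factor used in our implementation
    hop = round(winLen / N);                  % each new window starts winLen/N later
    w = hamming(winLen);                      % Hamming window to reduce spectral leakage
    numFrames = floor((length(x) - winLen) / hop) + 1;
    frames = zeros(winLen, numFrames);
    for k = 1:numFrames
        idx = (k-1)*hop + (1:winLen);
        frames(:, k) = x(idx) .* w;           % one windowed time-domain frame per column
    end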

We took the FFT of each windowed audio sample to convert it into the frequency domain, using the built-in MATLAB function fft().

We pass the windowed audio sample through a bandpass filter that removes frequencies that are not associated with any musical note or chord. We found that filtering out frequencies lower than 40 Hz and higher than 1500 Hz gave us the cleanest data.
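Continuing the sketch above, one simple way to realize these two steps is to take the FFT of each frame and zero out the bins outside the 40-1500 Hz band; our actual implementation may use a different filter design, so this is only an illustration:

    % Sketch: magnitude FFT of each frame, then a crude frequency-domain bandpass.
    NFFT = size(frames, 1);
    X = abs(fft(frames, NFFT, 1));            % magnitude spectrum, one column per frame
    freqs = (0:NFFT-1).' * fs / NFFT;         % frequency of each FFT bin in Hz
    band = (freqs >= 40) & (freqs <= 1500);   % keep only musically relevant bins
    X(~band, :) = 0;                          % discard everything outside 40-1500 Hz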

Harmonic Product Spectrum (HPS)

In every musical signal there are harmonics at power-of-two multiples of the fundamental frequency that contribute energy to the signal. These harmonics do not add any new information that our PCP algorithm could use in detecting which notes were played; thus their impact should be minimized. To reduce these harmonics, the harmonic product spectrum (HPS) emphasizes the fundamental frequency, which the PCP algorithm is most concerned with, and in turn reduces the relative magnitude of the harmonics. An added benefit of working with the HPS is that it does an excellent job of recognizing pitches in polyphonic signals.

When working with the HPS, we found that completely removing a harmonic did not have the effect we wanted: by diminishing harmonics we were also diminishing frequencies that our PCP algorithm needed to pick up on. To remedy this, we took an average of the FFT and the output of the HPS. This average was a good balance between reducing noise and keeping the vital parts of the signal that are needed by the PCP.
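A rough sketch of this step, continuing from the band-limited spectra X above, is shown below; the number of harmonics and the normalization are illustrative guesses rather than the exact values used in our code, and because X is already band-limited the product only survives where the harmonics also fall inside the band:

    % Sketch: harmonic product spectrum, then an average of the HPS and the FFT.
    R = 4;                                    % number of downsampled copies to multiply (illustrative)
    L = floor(size(X, 1) / R);                % only the first L bins have all R harmonics available
    hps = X(1:L, :);
    for r = 2:R
        hps = hps .* X(1:r:r*L, :);           % multiply by the spectrum downsampled by factor r
    end
    hps  = hps / (max(hps(:)) + eps);         % normalize both terms before averaging
    Xn   = X(1:L, :) / (max(X(:)) + eps);
    Xavg = (Xn + hps) / 2;                    % average of the FFT and the HPS, as described above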

Pitch Class Profile (PCP)

The PCP is the heart of our chord detection algorithm. It is effectively a system for binning any frequency spectrum into distinct notes. First, N(w) is generated. N(w) is shown below - its steps are centered around the frequencies of notes (A = 110, 220, 440, ... Hz).

An integer value is assigned to each note: A = 0, A# = 1, B = 2, etc. We are given the FFT of the input signal, X(w). Then, for each n, we sum X(w) over all w such that N(w) = n; this sum is the nth entry of the PCP vector.
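Continuing the earlier sketches, the original (Fujishima-style) N(w) can be built by assigning every bin to its nearest note class, after which the PCP sums the spectrum over the bins belonging to each class; again, the variable names are illustrative:

    % Sketch: basic PCP binning with the theoretical N(w) (nearest-note assignment).
    f = freqs(1:size(Xavg, 1));                          % frequencies of the retained bins
    noteOfBin = -ones(size(f));                          % note class per bin, A = 0, ..., G# = 11
    valid = (f >= 40) & (f <= 1500);
    noteOfBin(valid) = mod(round(12 * log2(f(valid) / 440)), 12);
    pcp = zeros(12, size(Xavg, 2));
    for n = 0:11
        pcp(n+1, :) = sum(Xavg(noteOfBin == n, :), 1);   % sum of magnitudes mapped to note n
    end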

The following diagram shows the output of the PCP algorithm when the input is a B-flat major chord.

The output vector has large magnitudes for A#, D, and F, the notes in the detected chord.

We eventually realized that the N(w) given by [1] is only correct in theory and not in practice. The given N(w) treats every frequency in the FFT as an equal contributor to a note - the magnitude X(440 Hz) is weighted the same as the magnitude X(453 Hz). This is unintuitive and musically misleading - 453 Hz is midway between A and A#, and its magnitude should not be added to either note. Even a low-resolution FFT in our processing scheme would not shift the 440 Hz frequency component to 453 Hz due to the quantization error of the DFT. A further inspection of several FFTs of real songs indicated that the peaks of the frequency spectra fell within 5 Hz of a theoretical note frequency more than 95% of the time. Thus we generated a new N(w) that discards any frequency further than 10 Hz away from a theoretical note frequency. For example, frequencies between 430 and 450 Hz and between 456 and 476 Hz are summed in the PCP - they are within 10 Hz of A and A# (at 440 and 466 Hz respectively). The frequency component at 453 Hz is dropped.
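The modified N(w) can be sketched by tabulating the theoretical note frequencies and keeping only the bins within 10 Hz of one of them; bins further away are simply discarded. Note that at low frequencies neighbouring notes are less than 20 Hz apart, so their tolerance bands overlap - the sketch does not resolve this, and our actual implementation may handle it differently:

    % Sketch: improved N(w) that discards bins more than 10 Hz from a note frequency.
    semitones = -41:21;                                  % semitone offsets from A440, spanning roughly 41-1480 Hz
    noteFreqs = 440 * 2.^(semitones / 12);               % theoretical note frequencies
    noteClass = mod(semitones, 12);                      % pitch class of each note (A = 0, A# = 1, ...)
    noteOfBin = -ones(size(f));                          % -1 marks bins that are dropped entirely
    for j = 1:numel(noteFreqs)
        noteOfBin(abs(f - noteFreqs(j)) <= 10) = noteClass(j);
    end
    pcp = zeros(12, size(Xavg, 2));
    for n = 0:11
        pcp(n+1, :) = sum(Xavg(noteOfBin == n, :), 1);
    end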

Our improved PCP was implemented in our final algorithm and is successful at reducing noise in the output PCP vectors, thus increasing the accuracy of the chord detector.

Smoothing Convolution

We often found that the detected notes were “spiky”; certain notes would be detected in certain windows but not in adjacent ones. This would cause the detected chords to vary widely from window to window, resulting in an erratic output. To solve this problem, we implemented a smoothing convolution. We created a vector whose length was three times the number of overlaps and set the first half of its elements to 0s and the second half to 1s. For example, if the number of overlaps were 4, the smoothing vector would be [0 0 0 0 0 0 1 1 1 1 1 1]. We then convolved this vector with each row of notes (a rough MATLAB sketch of this step appears after the diagrams below). The following diagrams show the output of the PCP algorithm with a C major chord, before smoothing:

And after smoothing:

The smoothing removes the higher-order variations from the correct notes and forces the erroneous notes closer to 0.
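A minimal sketch of this smoothing step, applied to the 12-row PCP matrix from the earlier sketches, is shown below; the overall rescaling at the end is our own illustrative choice:

    % Sketch: convolve each note's activation over time with a half-zeros/half-ones kernel.
    kernelLen = 3 * N;                                   % three times the overlap factor
    kernel = [zeros(1, kernelLen/2), ones(1, kernelLen/2)];
    pcpSmooth = zeros(size(pcp));
    for n = 1:12
        pcpSmooth(n, :) = conv(pcp(n, :), kernel, 'same');
    end
    pcpSmooth = pcpSmooth / (max(pcpSmooth(:)) + eps);   % rescale roughly into [0, 1]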

ChordFinder

Chord Dictionary

The final task for our algorithm is to use the detected notes output from the smoothed PCP to determine which chords are being played. This is accomplished using our ChordFinder, a function which searches a dictionary of chords to find a closest match for the detected notes.

We discovered an online database of 660 chord names and their corresponding notes. Using C, we parsed this file into two parts: a string array containing the 660 chord names (“Cmaj”, “Cmin”, …) and a parallel array containing 660 1x12 binary vectors. In these vectors, each element corresponds to a note in the chromatic scale (A A# B C C# D D# E F F# G G#), and each 1 corresponds to a note present in the chord. For example, the vector [1 0 0 0 1 0 0 1 0 0 0 0] represents an A major chord (which contains the notes A, C#, and E). We then converted the binary vectors to decimal numbers (for example, [1 0 0 0 1 0 0 1 0 0 0 0] became 2^11 + 2^7 + 2^4 = 2192) and saved the arrays in .csv files. Finally, we created a map in MATLAB with the chord names as values and the decimal values as keys.
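The encoding and map construction can be sketched as follows with a tiny illustrative subset of the dictionary (the real dictionary is parsed from the 660-chord file):

    % Sketch: chord dictionary as a containers.Map from decimal note-vector keys to names.
    % Note order in each vector: A A# B C C# D D# E F F# G G#.
    chordNames = {'Amaj', 'Amin', 'Cmaj'};               % tiny illustrative subset
    chordVecs  = [1 0 0 0 1 0 0 1 0 0 0 0;               % Amaj: A, C#, E
                  1 0 0 1 0 0 0 1 0 0 0 0;               % Amin: A, C, E
                  0 0 0 1 0 0 0 1 0 0 1 0];              % Cmaj: C, E, G
    chordKeys  = chordVecs * (2.^(11:-1:0)).';           % binary vector -> decimal (first note is the MSB)
    chordMap   = containers.Map(chordKeys.', chordNames);
    chordMap(2192)                                       % returns 'Amaj', since 2192 = 2^11 + 2^7 + 2^4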

Through testing our algorithm, we discovered that it often detected erroneous notes and consequently chose very complicated and unusual chords from the dictionary. These chords, such as F#5(add13), are very unlikely to occur in real music, especially popular music. Therefore, we decided to create a smaller chord dictionary containing 64 of the most common chords in popular music: the major chords, the minor chords, the seventh chords, and the minor-seventh chords. Using this simpler dictionary made our algorithm much more robust to noisy data, at the expense of losing the ability to detect unusual chords.

ChordFinder Algorithm

The ChordFinder algorithm first removes notes in the PCP vector below a certain threshold to clean the data. It also “rewards” notes above this threshold by boosting their levels closer to 1. It then computes the differences between the PCP vector and each binary vector in the keys of our chord map. It computes this difference as the sum of the absolute value of the differences between each element of the vectors. The “closer” the PCP vector is to a given chord key, the smaller this difference will be. Finally, it chooses the key with the smallest difference and uses this key to decide which chord is being played. We also created a version of this algorithm which took the dot product of the vectors rather than their difference; this version tended to be accurate with our simplified dictionary but would err towards complicated chords with our full dictionary.
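A minimal sketch of this matching step for a single frame is shown below; the threshold and the boosting formula are illustrative placeholders rather than our tuned values, and chordVecs/chordNames are the dictionary arrays from the previous sketch:

    % Sketch: match one frame's smoothed PCP vector against the chord dictionary.
    k = 1;                                               % frame index (illustrative)
    p = pcpSmooth(:, k).';                               % 1x12 smoothed PCP vector for frame k
    thresh = 0.3;                                        % illustrative threshold
    p(p < thresh) = 0;                                   % remove weak notes
    p(p >= thresh) = 0.5 + 0.5 * p(p >= thresh);         % "reward" strong notes by pushing them towards 1
    dists = sum(abs(chordVecs - p), 2);                  % L1 distance to every dictionary chord
    [~, best] = min(dists);
    detectedChord = chordNames{best};                    % closest chord name for this frame
    % (the dot-product variant would instead pick the maximum of chordVecs * p.')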

Final Output

The most basic output of our algorithm is a string vector containing the detected chords in each window. Additionally, we created two optional outputs to give the user more insight into how our algorithm functions:

LiveDisplay: If this option is chosen, once Chordination has completed detecting chords it will play the song and simultaneously output the corresponding detected chords in the command window. This allows the user to hear the audio and see whether the detected chords match in real time.

PlotDisplay: If this option is chosen, Chordination will output a plot showing the strength of the notes detected in each window. This gives the user a good visual representation of Chordination’s output as well as an insight into how accurately notes are detected. Here is an example of the plot for the first verse of the Beatles’ “Hey Jude”:

Benchmarking

We wanted a quantifiable metric for how well our algorithm performed. We created a simple chord progression with seven chords:

This chord progression is good for testing because it contains six unique chords, and each chord lasts for exactly one second, so we know exactly what chords should be chosen in each window. We created 8 different audio versions of this chord progression, 5 synthetic and 3 real:

  1. A basic synthetic piano version
  2. An arpeggiated (broken chord) synthetic piano version
  3. A “short” synthetic piano version (each chord plays for half a second rather than a second)
  4. A synthetic version without the bass notes in the chords
  5. A synthetic organ version
  6. A basic real guitar version (a recording of Phil strumming these chords)
  7. An arpeggiated real guitar version
  8. A “short” guitar version

We then created an “answer key” in MATLAB, a string vector consisting of the correct chords for each windowed time interval. We ran the algorithm with each of the 8 versions on our chord progression: for each version, we compared the detected chords to the answer key and gave a score to the algorithm based on the number of chords it correctly named. These scores gave us an insight into how well our algorithm performed, and allowed us to see the impact of changing our algorithm parameters.
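The scoring itself is a straightforward comparison; a sketch is shown below, where detected and answerKey are illustrative names for the algorithm output and the hand-made answer key (both lists of chord names, one per window):

    % Sketch: fraction of windows whose detected chord matches the answer key.
    numCorrect = sum(strcmp(detected, answerKey));
    score = numCorrect / numel(answerKey);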

The baseline (Case 1) is the final algorithm as presented at the end of the term, and Cases 2-6 show different variants of the algorithm.

Results

We tested our algorithm on several songs. Here are the results from three:

1. “Hey Jude”, The Beatles, 1968 (First Verse)

Detected chords, sampled once a second for the length of the song:

"F#maj", "Fmaj", "Fmaj", "Fmaj", "Cmaj", "Cmaj", "Cmaj", "C7", "A#m", "Gmin7", "Dmin7", "Fmaj", "Fmaj", "Fmaj", "A#maj", "A#maj", "A#maj", "Fmaj", "Fmaj", "Fmaj", "Cmaj", "C7", "C7", "Fmaj", "Fmaj"

2. “Cleopatra”, The Lumineers, 2016 (second verse and chorus)

Output of PlotDisplay:

Detected chords, sampled once a second for the length of the song:

"G#maj", "F7", "G#maj", "G#maj", "G#maj", "A#7", "G#maj", "G#maj", "G#maj", "C#maj", "G#maj", "G#maj", "Fm", "Cmin7", "G#7", "D#maj", "G#m" , "F7", "A#7", "G#maj", "F7", "D#7", "C#maj", "G#maj", "Cm", "Amin7", "G#maj", "D#maj", "D#maj", "D#maj", "G#m", "G#maj", "D#maj", "G#maj", "C#maj", "Cm", "Gmin7", "Cmin7", "D#maj", "Fmin7", "G#maj", "Cmin7", "D#maj", "D#maj", "Cmaj", "G#maj"

3. “Für Elise”, Ludwig van Beethoven (first section)

Detected chords, sampled once a second for the length of the song:

"Emaj", "Am", "Am", "Emaj", "Am", "G#maj", "Am", "Am", "Emaj", "Amaj", "Amaj", "Gmaj", "Am", "Emaj", "Am", "Amaj", "E7", "Amaj", "Em", "Am", "Em", "Cmaj", "Emin7", "G7", "Amaj", "Am", "Emaj", "G#maj", "C#m", "Emaj", "Am"

The algorithm is fairly accurate for “Hey Jude” and “Für Elise”, but detects many erroneous chords in “Cleopatra”.

Discussion

The benchmarking tests for the algorithm showed that our largest gains in accuracy came from the reduction of our chord dictionary. As seen in benchmarking Cases 6 and 7, the full 660-chord dictionary results in approximately 50% lower accuracy on those test cases. Because the full 660-chord database is so expansive, small variations in noise vastly alter which chord is detected. By limiting the chord choices, the algorithm may not find the “mathematically optimal” chord, but will find the most likely chord given a priori knowledge of common chords in popular songs, i.e. those given in the reduced database. The tests also show that our enhanced PCP algorithm only marginally improves accuracy when compared to the original PCP; in a couple of cases the additional filtering even reduces accuracy.

Overall, the benchmarking tests demonstrate a tradeoff between accuracy in specific cases and robustness in all cases. While the original pitch class profile and the discrete Fourier transform are input-independent, the HPS and the enhanced pitch class profile make assumptions about the underlying structure of the music and its frequency spectrum. The harmonic product spectrum assumes that higher-frequency coefficients in the DFT are primarily relics of lower-frequency notes. The enhanced pitch class profile assumes that the musical piece displays well-defined, narrow peaks centered around the expected note frequencies in the spectrum. These assumptions provide structure to the data, and when valid they add accuracy to our detection algorithm. However, when the underlying assumptions break, these methods lose accuracy. Tuning parameters to gain accuracy for a specific instrument or style of music may result in a loss of accuracy for other instruments or musical forms.

Rubric

We feel that our entire project essentially entails novel application of existing algorithms (HPS, PCP, nearest neighbors):

Algorithms + Creativity

We not only implemented the PCP algorithm first introduced by Fujishima, but also discovered ways to improve upon it. We found that the relationships between note magnitudes contain information when used in the nearest-neighbor approach.

Algorithms + Technical Merit and Understanding

We improved upon both the HPS and PCP algorithms. We recognized the real-world meaning of the output of the HPS and decided to average the HPS output with the input signal to produce an output more reflective of what we needed. We also recognized the shortcoming in the construction of N(w) for the PCP, and were able to improve N(w) to suit our application and reduce noise.

Algorithms + Effort

We did considerable research into potential algorithms we could use. We also spent considerable time refining our algorithms, looking at ways they could be improved, and ultimately validating the algorithms with data. We developed tests to quantifiably benchmark our system and determine the best parameters.

Mathematics or Theory + Technical Merit and Understanding

We feel that we were diligent in applying and critiquing our implementation of these algorithms. We understand why our algorithm fundamentally succeeds and fails with certain inputs, and we were able to change the detection parameters and algorithm implementations to address many of these failures.

Member Contributions

Miheer

Ashok

Rohan

Bryan

Phil

The Team

From left to right:

Phil Sisk, Rohan Paul, Ashok Aggarwal, Miheer Patankar, Bryan Hughes

References

[1] T. Fujishima, “Realtime Chord Recognition of Musical Sound: A System Using Common Lisp Music,” Proceedings of the International Computer Music Conference (ICMC), 1999.