John Hugo Marcoux III - Systems Engineer

Television Commercial Detection

High Level Overview: System Architecture

Our system used a DE2-70 FPGA board for the video hardware and a Raspberry Pi for the classification algorithm. The full system architecture is shown below:

Input:
On the DE2-70, two analog input signals were attached via RCA cables to the Video Line In ports. Each signal arrived in NTSC (National Television System Committee) analog format. The feed came in one line at a time and was held in a FIFO queue while it was processed. Each pixel was first converted from YCbCr to RGB and then stored in SDRAM. Each frame of video was 720x480 pixels, with 16 bits per pixel. The input signal was also interlaced, meaning that the source sends all even lines in succession, followed by all odd lines. Each set of lines arrives at a nominal 30 Hz rate, making the total field rate a nominal 60 Hz. The process was identical for both inputs, so there were 2 separate SDRAM modules, each constantly updated with the frames of its respective television input. The input portion is shown below:
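To make the colour conversion and field handling concrete, here is a minimal Python sketch. It is illustrative only and is not the project's hardware description: the BT.601 studio-range coefficients, the 5-6-5 layout of the 16-bit pixel, and the function names are all assumptions, since the text only states that pixels were converted from YCbCr to RGB and stored as 16 bits.

```python
def clamp8(x):
    """Round a number and clamp it to the 0..255 range."""
    return max(0, min(255, int(round(x))))


def ycbcr_to_rgb(y, cb, cr):
    """Convert one YCbCr sample to 8-bit RGB.

    ASSUMPTION: BT.601 studio range (Y 16..235, Cb/Cr 16..240).
    """
    c, d, e = y - 16, cb - 128, cr - 128
    r = clamp8(1.164 * c + 1.596 * e)
    g = clamp8(1.164 * c - 0.392 * d - 0.813 * e)
    b = clamp8(1.164 * c + 2.017 * d)
    return r, g, b


def pack_rgb565(r, g, b):
    """Pack 8-bit RGB into one 16-bit word (ASSUMED 5-6-5 bit layout)."""
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)


def weave_fields(even_lines, odd_lines):
    """Interleave the even-line set and the odd-line set into one frame."""
    frame = []
    for even, odd in zip(even_lines, odd_lines):
        frame.append(even)
        frame.append(odd)
    return frame


# Storage needed for one 720x480 frame at 16 bits (2 bytes) per pixel.
FRAME_BYTES = 720 * 480 * 2
```

Under these assumptions one frame occupies 691,200 bytes, which gives a feel for how much SDRAM each input consumes per stored frame.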

UART Transmission:
After a whole frame was stored in memory, a timer triggered the sending of a cropped, reduced-quality portion of the frame from SDRAM to the Raspberry Pi via a UART serial connection. These portions were sent at a 10 Hz rate over a 1.5 Mbps link. The portion, which contained the logo of the television network, was always 64x46 pixels and was sent as 8-bit grayscale. Since each network placed its logo in a different position, 3 DIP switches were used to select which of our 8 networks was being played at any given time. These switches told our frame capture module which portion of SDRAM to read from, and also told the Raspberry Pi which channel we were currently on. This process is outlined in the diagram below:
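The figures above can be sanity-checked with a short Python sketch that also shows one way the switch setting could select the crop window. Several things here are assumptions rather than details from the project: 8N1 framing (10 bits on the wire per byte), the 5-6-5 pixel layout, the integer luma weights, and in particular the `LOGO_ORIGIN` table, whose coordinates are placeholders.

```python
LOGO_W, LOGO_H = 64, 46        # logo window size, from the text
FRAME_RATE_HZ = 10             # logo windows sent per second, from the text
BAUD = 1_500_000               # UART rate, from the text
WIRE_BITS_PER_BYTE = 10        # ASSUMPTION: 8N1 framing (start + 8 data + stop)

# HYPOTHETICAL top-left corner (x, y) of the logo for each switch setting 0..7.
LOGO_ORIGIN = {n: (600, 400) for n in range(8)}


def switches_to_network(sw2, sw1, sw0):
    """Combine three switch bits (most significant first) into 0..7."""
    return (sw2 << 2) | (sw1 << 1) | sw0


def rgb565_to_gray(pixel):
    """Reduce one 16-bit pixel (ASSUMED 5-6-5) to an 8-bit gray level."""
    r = (pixel >> 11) & 0x1F
    g = (pixel >> 5) & 0x3F
    b = pixel & 0x1F
    # Expand each channel back to 8 bits by replicating its top bits.
    r8 = (r << 3) | (r >> 2)
    g8 = (g << 2) | (g >> 4)
    b8 = (b << 3) | (b >> 2)
    # Integer approximation of 0.299 R + 0.587 G + 0.114 B.
    return (77 * r8 + 150 * g8 + 29 * b8) >> 8


def crop_logo(frame, network):
    """Cut the logo window out of a frame given as a list of pixel rows."""
    x0, y0 = LOGO_ORIGIN[network]
    return [[rgb565_to_gray(p) for p in row[x0:x0 + LOGO_W]]
            for row in frame[y0:y0 + LOGO_H]]


def uart_load():
    """Return (bits per second needed, fraction of the link used)."""
    bits_per_second = LOGO_W * LOGO_H * WIRE_BITS_PER_BYTE * FRAME_RATE_HZ
    return bits_per_second, bits_per_second / BAUD
```

With these assumptions the link carries 294,400 bits per second, just under 20% of its 1.5 Mbps capacity, so a 10 Hz send rate leaves comfortable headroom.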

Classification Algorithm:
As previously stated, our entire classification algorithm was implemented on the Raspberry Pi. The program ran as one large while loop, which received the cropped logo portion of the frame, processed it, told the DE2-70 whether that frame was part of a commercial or a television program, and then looped again. Each iteration began by reading the 8-bit pixels arriving over the UART connection into a global array of unsigned chars. Once all 2944 bytes (64 x 46 pixels) had been received, the global array was transposed from row-wise to column-wise order, which was the format expected by our histogram of oriented gradients (HoG) algorithm. This data was then passed into our HoG function, whose output was an array of 1260 integers known as a feature descriptor. This feature descriptor, or HoG feature vector, was then classified using our Support Vector Machine (SVM).
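The receive-then-transpose step can be sketched in a few lines of Python. This is an illustration rather than the program that ran on the Raspberry Pi: the function names are hypothetical, and an in-memory stream stands in for the serial port.

```python
import io


def read_exact(stream, count):
    """Keep reading from a byte stream until exactly `count` bytes arrive."""
    received = bytearray()
    while len(received) < count:
        chunk = stream.read(count - len(received))
        if not chunk:
            raise EOFError("stream closed before a full frame arrived")
        received.extend(chunk)
    return bytes(received)


def row_major_to_column_major(buf, width, height):
    """Reorder a flat row-wise pixel buffer into column-wise order."""
    if len(buf) != width * height:
        raise ValueError("buffer length does not match width * height")
    out = bytearray(len(buf))
    for row in range(height):
        for col in range(width):
            out[col * height + row] = buf[row * width + col]
    return bytes(out)


# Tiny 3-wide, 2-high example: rows (1, 2, 3) and (4, 5, 6).
demo_stream = io.BytesIO(bytes([1, 2, 3, 4, 5, 6]))
demo_rows = read_exact(demo_stream, 6)
demo_cols = row_major_to_column_major(demo_rows, width=3, height=2)
```

Here `demo_cols` holds the same six pixels read down each column in turn (1, 4, 2, 5, 3, 6), which is the layout change the transposition performs on the full logo window.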

Each network had a previously trained SVM structure saved in a separate header file on the Raspberry Pi. To choose the right one, the Raspberry Pi read 3 GPIO pins and selected the SVM structure for the corresponding network. The HoG feature vector was then run through that SVM, producing a scalar value. This scalar was compared to a pre-computed threshold to decide whether the frame being processed belonged to a television program or a commercial. The result was sent back to the DE2-70 as a Boolean value via a GPIO pin. This process is shown below:

Classification Algorithm on the Raspberry Pi
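The selection and thresholding steps can be illustrated with a minimal Python sketch. It assumes a linear SVM (a dot product plus a bias) and assumes that a score below the threshold means "commercial"; the text specifies neither the kernel nor the direction of the comparison. The model values below are made up for the example.

```python
def read_network_index(pin_levels):
    """Combine three GPIO levels (most significant first) into 0..7."""
    b2, b1, b0 = pin_levels
    return (b2 << 2) | (b1 << 1) | b0


def svm_score(weights, bias, features):
    """Linear SVM decision value: w . x + b (ASSUMED linear kernel)."""
    if len(weights) != len(features):
        raise ValueError("feature vector length does not match the model")
    return sum(w * x for w, x in zip(weights, features)) + bias


def is_commercial(model, features):
    """ASSUMPTION: a score below the threshold means the logo is absent."""
    score = svm_score(model["weights"], model["bias"], features)
    return score < model["threshold"]


# HYPOTHETICAL two-feature models; the real descriptors had 1260 entries.
MODELS = {
    n: {"weights": [1.0, -1.0], "bias": 0.5, "threshold": 0.0}
    for n in range(8)
}

network = read_network_index((0, 1, 1))
program_frame = is_commercial(MODELS[network], [2, 1])     # score 1.5
commercial_frame = is_commercial(MODELS[network], [0, 3])  # score -2.5
```

Keeping one model per network in a lookup keyed by the 3-bit index mirrors the one-structure-per-network arrangement described above.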

Commercial Detection:
Upon receiving the 'Commercial Boolean' value from the Raspberry Pi, the DE2-70 used control logic to determine which SDRAM's FIFO to read from. This decided whether the user viewed the primary feed (meaning no commercial was playing) or the secondary feed (meaning a commercial was playing on the primary feed). That SDRAM's FIFO was then read, and the full 720x480 frame of 16-bit pixels was sent to our VGA display driver, pixel by pixel. This process is shown below:
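In software terms the control logic behaves like a two-way multiplexer. The Python sketch below models that behaviour only; the function names are hypothetical, and it assumes the choice is made once per frame, which the text does not state.

```python
def select_feed(commercial):
    """Name the feed to display for the current frame."""
    return "secondary" if commercial else "primary"


def output_frame(commercial, primary_frame, secondary_frame):
    """Yield the chosen frame's pixels one at a time, row by row.

    ASSUMPTION: the feed is chosen once per frame, not per pixel.
    """
    source = secondary_frame if commercial else primary_frame
    for row in source:
        for pixel in row:
            yield pixel
```

For example, `list(output_frame(True, [[1, 2]], [[3, 4]]))` streams the secondary frame's pixels, while passing `False` streams the primary frame's.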

VGA Display:
The VGA display driver received each pixel, determined which line of the VGA display it belonged to, and wrote the pixel to that line. It had to crop each frame, as the VGA display was only 640x480. This was done by simply ignoring the last 80 pixels of each 720-pixel line. The driver also had to account for the horizontal and vertical front and back porches, whose timings were fixed in advance. This process is shown below:
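The cropping and porch handling can be modelled with a small Python sketch. It assumes the standard 640x480 at 60 Hz VGA timing (800x525 total positions at a 25.175 MHz pixel clock); the text does not list the driver's actual timing constants, so treat these as illustrative.

```python
# ASSUMED standard 640x480 @ 60 Hz timing, in pixels and lines.
H_VISIBLE, H_FRONT, H_SYNC, H_BACK = 640, 16, 96, 48
V_VISIBLE, V_FRONT, V_SYNC, V_BACK = 480, 10, 2, 33
H_TOTAL = H_VISIBLE + H_FRONT + H_SYNC + H_BACK
V_TOTAL = V_VISIBLE + V_FRONT + V_SYNC + V_BACK
PIXEL_CLOCK_HZ = 25_175_000


def crop_line(line, visible=H_VISIBLE):
    """Keep the first `visible` pixels; a 720-pixel line loses its last 80."""
    return line[:visible]


def vga_state(h_count, v_count):
    """Return (visible, in_hsync_pulse, in_vsync_pulse) for a counter pair.

    The pulse flags say when each sync pulse is being driven; on the wire
    both pulses are conventionally active-low for this mode.
    """
    visible = h_count < H_VISIBLE and v_count < V_VISIBLE
    in_hsync = H_VISIBLE + H_FRONT <= h_count < H_VISIBLE + H_FRONT + H_SYNC
    in_vsync = V_VISIBLE + V_FRONT <= v_count < V_VISIBLE + V_FRONT + V_SYNC
    return visible, in_hsync, in_vsync


# Refresh rate implied by the assumed timing, in frames per second.
REFRESH_HZ = PIXEL_CLOCK_HZ / (H_TOTAL * V_TOTAL)
```

A driver built this way only emits pixel data while `visible` is true and outputs black during the porches and sync pulses; with the assumed constants the refresh rate works out to roughly 59.94 Hz.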

This concludes the explanation of our system architecture, outlining the flow of data from the input of our system, to the processing of each frame, and finally the output image seen by the user.