John Hugo Marcoux III - Systems Engineer
Television Commercial Detection
We detect commercial transitions in real time from the presence of a channel logo. This requires image processing and a classifier that determines whether a logo is present in a frame. The method is limited by the fact that the logo is not always present during a television show and sometimes changes during a show, but our implementation is robust to several of these cases: an early disappearance of the logo, short disappearances, logo changes, and logo animation. Whether or not a frame is part of a television show is decided by the classification algorithm described below.
Classification Algorithm: Histogram of Oriented Gradients (HoG)
The Histogram of Oriented Gradients (HoG) is a feature descriptor that is traditionally used in human detection. Here, it is applied to the cropped portion of the frame to describe the gradient orientations within that portion. The idea is to divide the image into smaller regions called cells, each consisting of a set number of pixels. Note that these cells overlap one another to better capture shape information. A histogram of oriented gradients is then calculated for each cell. At the next level, a certain number of cells are grouped into blocks, and the histograms are normalized per block. Below is a visualization of how a television network logo appears to a person on the left and a HoG visualization of that same image on the right:
Gradient Computation: The first step is to compute the gradient values for the pixels within a cell. This is done by applying a discrete derivative mask over the image in both the horizontal and vertical directions. In other words, the pixel intensity values are filtered with the kernels shown below:
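As a rough illustration of this step, the sketch below assumes the commonly used centered derivative masks [-1, 0, 1] (horizontal) and its transpose (vertical); the exact kernels used in our implementation are the ones in the figure, so treat this choice as an assumption. The function name compute_gradients and the zero-valued border handling are hypothetical.

```python
import numpy as np

def compute_gradients(gray):
    """Return (gx, gy) using centered [-1, 0, 1] difference masks (assumed kernels)."""
    gray = np.asarray(gray, dtype=np.float64)
    gx = np.zeros_like(gray)
    gy = np.zeros_like(gray)
    # Horizontal mask [-1, 0, 1]: right neighbor minus left neighbor.
    gx[:, 1:-1] = gray[:, 2:] - gray[:, :-2]
    # Vertical mask [-1, 0, 1]^T: lower neighbor minus upper neighbor.
    gy[1:-1, :] = gray[2:, :] - gray[:-2, :]
    # Border pixels have no neighbor on one side and are left at zero.
    return gx, gy

# Usage: a horizontal intensity ramp has a constant horizontal gradient
# and no vertical gradient.
ramp = np.tile(np.arange(5, dtype=np.float64), (4, 1))
gx, gy = compute_gradients(ramp)
```

The per-pixel gradient magnitude and orientation used in the next step can then be derived from gx and gy.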
Orientation Binning: The orientations obtained in the previous step are then binned into their respective slots of the histogram. Each pixel within a cell casts a weighted vote for an orientation-based histogram channel based on the values found in the gradient computation. The cells themselves can be either rectangular or radial in shape, and the histogram bins are evenly spread over 0 to 180 degrees. In our algorithm, we use 9 histogram bins. The vote weight for each pixel is the gradient magnitude itself.
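The binning described above might look roughly like the following for a single rectangular cell. It uses 9 bins over 0 to 180 degrees with magnitude-weighted votes, as stated; assigning each pixel to exactly one bin (no interpolation between neighboring bins) is an assumption of this sketch, and cell_histogram is a hypothetical name.

```python
import numpy as np

def cell_histogram(gx, gy, n_bins=9):
    """Magnitude-weighted histogram of unsigned orientations for one cell."""
    gx = np.asarray(gx, dtype=np.float64)
    gy = np.asarray(gy, dtype=np.float64)
    magnitude = np.hypot(gx, gy)
    # Unsigned orientation: fold angles into the 0 to 180 degree range.
    angle = np.degrees(np.arctan2(gy, gx)) % 180.0
    bin_width = 180.0 / n_bins
    # Hard assignment (assumption); rounding edge cases are clipped to the last bin.
    bins = np.minimum((angle // bin_width).astype(int), n_bins - 1)
    # Each pixel votes for its bin with a weight equal to its gradient magnitude.
    return np.bincount(bins.ravel(), weights=magnitude.ravel(), minlength=n_bins)

# Usage: an 8x8 cell whose gradients all point horizontally with magnitude 1.
hist_h = cell_histogram(np.ones((8, 8)), np.zeros((8, 8)))
# Usage: the same cell with purely vertical gradients (90 degrees).
hist_v = cell_histogram(np.zeros((8, 8)), np.ones((8, 8)))
```

With 20-degree bins, the horizontal gradients all fall into the first bin and the vertical ones into the bin covering 80 to 100 degrees.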
Descriptor Blocks: The cells are then grouped at a higher level into blocks. The HoG descriptor is the vector formed from the components of the normalized cell histograms of all the block regions. These blocks typically overlap, meaning that each cell contributes more than once to the final descriptor.
Block Normalization: The L2-norm is used to normalize each block. This normalization gives better invariance to changes in illumination and shadowing. The normalization is shown below:
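To make the block grouping and normalization concrete, here is a minimal sketch. It assumes the usual form of L2 normalization, v / sqrt(||v||^2 + eps^2), where v is the concatenation of the cell histograms in one block and eps is a small constant that avoids division by zero. The 2x2 block size, the one-cell stride, the value of eps, and the name hog_descriptor are assumptions for illustration only.

```python
import numpy as np

def hog_descriptor(cell_hists, block_size=2, eps=1e-6):
    """Concatenate L2-normalized blocks of cell histograms.

    cell_hists has shape (rows, cols, n_bins). Blocks cover
    block_size x block_size cells and slide one cell at a time,
    so neighboring blocks overlap (assumed layout).
    """
    cell_hists = np.asarray(cell_hists, dtype=np.float64)
    rows, cols, _ = cell_hists.shape
    blocks = []
    for r in range(rows - block_size + 1):
        for c in range(cols - block_size + 1):
            v = cell_hists[r:r + block_size, c:c + block_size].ravel()
            # L2 normalization: v / sqrt(||v||^2 + eps^2)
            blocks.append(v / np.sqrt(np.dot(v, v) + eps ** 2))
    return np.concatenate(blocks)

# Usage: a 3x3 grid of cells with 9 bins each gives 4 overlapping 2x2 blocks.
cells = np.ones((3, 3, 9))
desc = hog_descriptor(cells)
# Scaling every histogram (e.g. a brighter image) barely changes the descriptor.
desc_bright = hog_descriptor(10.0 * cells)
```

The near-identical descriptors for the original and scaled histograms illustrate the illumination invariance mentioned above.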
Classification Algorithm: Support Vector Machine (SVM)
This classification algorithm decides whether a particular frame is part of a TV show or a commercial. An SVM finds the optimal separating hyperplane for linearly separable data. We observed that the HoG feature descriptors of the logo frames and non-logo frames are linearly separable, and hence we use the linear kernel (i.e. there is no need to map the data into some new space using a different kernel function). The SVM is first trained and then used to classify incoming HoG feature descriptors. An example of the SVM hyperplane is shown below:
Training: The SVM is fed a large number of positive (with logo) and negative (without logo) descriptors for each channel. A small number of the positive and negative descriptors are selected as support vectors, which define the separating hyperplane. The hyperplane is chosen to maximize the margin between the positive and negative support vectors. The hyperplane has a bias value, and each support vector has a corresponding weight. These are the only values needed for classification. The weights, support vectors, and bias are stored in an SVM Structure, which is consulted during classification.
Classification: Classification is done by consulting the SVM Structure for that particular channel. The following equation generates a scalar, which is compared to a threshold to make the final decision about the frame. The calculation is shown below:
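For a linear kernel, a decision value of this kind is commonly computed as f(x) = sum_i w_i * <s_i, x> + b, where s_i are the support vectors, w_i their weights, and b the bias. The sketch below assumes that form, assumes that the weights already carry the sign of each support vector's class so that a larger score means "logo present", and assumes a threshold of 0; the dictionary layout and the names svm_score and is_show_frame are hypothetical stand-ins for the SVM Structure.

```python
import numpy as np

def svm_score(x, support_vectors, weights, bias):
    """Linear-kernel decision value: sum_i weights[i] * <support_vectors[i], x> + bias."""
    x = np.asarray(x, dtype=np.float64)
    support_vectors = np.asarray(support_vectors, dtype=np.float64)
    weights = np.asarray(weights, dtype=np.float64)
    return float(weights @ (support_vectors @ x) + bias)

def is_show_frame(x, svm, threshold=0.0):
    """Return True when the score exceeds the threshold (assumed sign convention)."""
    score = svm_score(x, svm["support_vectors"], svm["weights"], svm["bias"])
    return score > threshold

# Usage with a toy two-dimensional structure (illustrative values only).
svm = {
    "support_vectors": np.array([[1.0, 0.0], [0.0, 1.0]]),
    "weights": np.array([1.0, -1.0]),
    "bias": 0.0,
}
score_a = svm_score([2.0, 0.5], svm["support_vectors"], svm["weights"], svm["bias"])
label_a = is_show_frame([2.0, 0.5], svm)
label_b = is_show_frame([0.0, 1.0], svm)

# With a linear kernel the stored values collapse into one weight vector,
# so each frame needs only a single dot product.
w = svm["weights"] @ svm["support_vectors"]
score_a_fast = float(w @ np.array([2.0, 0.5]) + svm["bias"])
```

Precomputing w once per channel is a design choice that suits real-time use, since the cost per frame no longer depends on the number of support vectors.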