Research
- Information Theoretic Methods for Query Learning
with Clayton Scott and Suresh Bhavnani
The standard mathematical formulation of query learning is often idealized relative to many real-world identification tasks, in that
it does not account for task-specific factors such as input errors, time constraints, and the cognitive limitations of a human user in a stressful environment.
In this work, we address some of the main task-specific constraints that arise in emergency response applications such as toxic chemical identification,
where standard algorithms such as generalized binary search (GBS) fail. The main contributions of this work fall into three categories: a) group-based query learning; b) query
learning in the presence of persistent noise; c) query learning with exponential costs. We pose each of these problems as an optimization problem using its
information-theoretic lower bound and provide a greedy solution in each case. The proposed algorithms are extensions of GBS and are based on a re-interpretation
of GBS as generalized Shannon-Fano coding. The generality of these algorithms follows from the information-theoretic framework in which they are developed, making them applicable
to a much broader class of applications, such as other forms of emergency response, fault diagnosis, network failure diagnosis, active learning, computer vision, and Internet-based data search.
- G. Bellala, S. K. Bhavnani and C. Scott, "Query Learning with Exponential Query Costs," Technical Report, Feb 2010. [pdf]
- G. Bellala, S. K. Bhavnani and C. Scott, "Group-based Query Learning for rapid diagnosis in time-critical situations," Technical Report, Nov 2009. [pdf]
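The greedy step of the GBS baseline that this work extends can be sketched as follows. This is a minimal illustration, not code from the papers: the function name, the toy response table, and the tie-breaking by query order are all my own assumptions.

```python
# Minimal sketch of the greedy query-selection step in generalized
# binary search (GBS). Each candidate object has a known yes/no
# response to every query; GBS greedily picks the unasked query that
# splits the remaining prior probability mass most evenly.

def gbs_select_query(objects, priors, responses, asked):
    """Return the unasked query whose "yes" mass is closest to 1/2."""
    num_queries = len(next(iter(responses.values())))
    total = sum(priors[o] for o in objects)
    best_q, best_gap = None, float("inf")
    for q in range(num_queries):
        if q in asked:
            continue
        # Probability mass of the objects that answer "yes" to query q.
        mass_yes = sum(priors[o] for o in objects if responses[o][q])
        gap = abs(mass_yes / total - 0.5)
        if gap < best_gap:
            best_q, best_gap = q, gap
    return best_q


# Toy example: four equally likely objects, two binary queries.
priors = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}
responses = {"a": (True, True), "b": (True, False),
             "c": (False, True), "d": (False, False)}
print(gbs_select_query(list(responses), priors, responses, asked=set()))  # -> 0
```

Both queries here split the mass perfectly, so the first one is chosen; after asking query 0, the same routine would select query 1. The papers modify this greedy objective to account for groups, persistent noise, and exponential costs.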
- Symptom Co-occurrence Patterns in Cancer Patients
with Suresh Bhavnani, Clayton Scott and Maria Silveira
Many cancer patients experience multiple concurrent symptoms, and because of the additive impact of multiple symptoms, these patients generally fare worse than those
with only a few. Understanding how symptoms co-occur in cancer patients can therefore lead to more efficient assessment and management of symptoms, and
significantly improve the overall function and quality of life of these patients. To address this need, recent research has used data-reduction techniques such
as hierarchical clustering and factor analysis to study symptom clusters in cancer patients. However, there is no consensus in these results, and the limitations
and a priori assumptions of these methods often lead to biased outcomes. To overcome these problems, we used a multi-method approach in which we first studied
symptom co-occurrence across cancer patients visually, using existing and novel visualization techniques. These observations were then
analyzed quantitatively through carefully selected existing and novel methods, in addition to comparison with random networks. Contrary to
the previous belief in multiple distinct symptom clusters, our results consistently showed the absence of any such clusters. In
addition, the multi-method approach revealed a strongly nested structure of symptom co-occurrence. This finding has important implications for both
clinical practice and future research.
- S. K. Bhavnani, G. Bellala, A. Ganesan, R. Krishnan, P. Saxman, C. Scott, M. Silveira and C. Given, "The Nested Structure of Cancer Symptoms: Implications for Analyzing Co-occurrence and Managing Symptoms," Methods of Information in Medicine, in press. [pdf]
- S. K. Bhavnani, G. Bellala, A. Ganesan, R. Krishnan, P. Saxman, C. Scott, M. Silveira and C. Given, "Network Analysis of Cancer Patients and Symptoms: Implications for Symptom Management and Treatment," Proceedings of the American Medical Informatics Association, 2009.
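To make the idea of a "nested" co-occurrence structure concrete, here is a toy illustration, not the actual analysis from the papers: in a perfectly nested structure, patients with fewer symptoms have subsets of the symptoms of patients with more. The function name and the pairwise containment measure are my own simplifications.

```python
# Toy illustration of nestedness in symptom co-occurrence data: for
# each pair of patients (ordered from fewer to more symptoms), check
# whether the smaller symptom set is contained in the larger one.

def nestedness_fraction(symptom_sets):
    """Fraction of patient pairs whose symptom sets are nested."""
    sets = sorted(symptom_sets, key=len)
    pairs = contained = 0
    for i in range(len(sets)):
        for j in range(i + 1, len(sets)):
            pairs += 1
            contained += sets[i] <= sets[j]  # Python subset test
    return contained / pairs if pairs else 1.0


# Perfectly nested data scores 1.0; disjoint sets score 0.0.
print(nestedness_fraction([{"pain"}, {"pain", "fatigue"},
                           {"pain", "fatigue", "nausea"}]))  # -> 1.0
```

Real nestedness analyses of bipartite networks use more careful measures and compare against random networks, as the papers describe; this sketch only conveys the structural intuition.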
- Statistical Error Analysis for FDR-Controlled Classification
with Clayton Scott and Rebecca Willett
In this work, we investigated classification performance measured in
terms of the false discovery rate (FDR) and the false nondiscovery rate (FNDR). These measures have
received considerable attention in the literature on multiple testing. In
multiple testing problems, it is usually assumed that p-values can be calculated
or estimated; to control the FDR, the p-values are then adjusted through one of a variety of single-step, step-up, or step-down procedures, and thresholded. However,
in many applications, p-values are difficult to estimate because the null distribution cannot be easily modeled or simulated. Moreover, adjusting and thresholding p-values
is almost never optimal in terms of minimizing the FNDR or any other measure of Type II error. In this work, we demonstrated how training data may be
used to adapt to both the null and alternative
distributions, while making no assumptions on either. Adopting the perspective of statistical learning theory, we also developed
distribution-free results
on the generalization error analysis of FDR and FNDR, including uniform deviation bounds, finite-sample performance guarantees, strong universal consistency,
and variance-based bounds.
- C. Scott, G. Bellala and R. Willett, "The false discovery rate for statistical pattern recognition," Electronic Journal of Statistics, Vol. 3, 651–677, 2009. [pdf|bib|EJS]
- C. Scott, G. Bellala and R. Willett, "Generalization error analysis for FDR controlled classification," IEEE Workshop on Statistical Signal Processing, 792–796, Madison, WI, August 2007. [pdf]
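For readers unfamiliar with the p-value-based procedures the FDR work contrasts with, here is a sketch of the classical Benjamini-Hochberg step-up procedure. This is standard background material, not code from the paper, and the function name is my own.

```python
# Sketch of the Benjamini-Hochberg step-up procedure, a standard
# p-value-adjustment method for controlling the false discovery rate
# (FDR) in multiple testing.

def benjamini_hochberg(pvalues, alpha=0.05):
    """Return sorted indices of hypotheses rejected at FDR level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Step up: find the largest rank k with p_(k) <= (k/m) * alpha,
    # then reject the hypotheses with the k smallest p-values.
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * alpha:
            k = rank
    return sorted(order[:k])


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))  # -> [0]
```

This is exactly the setting the paper moves away from: the procedure needs p-values, and hence a tractable null distribution, whereas the learning-based approach adapts to both null and alternative distributions from training data.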