Research
- Information Theoretic Methods for Query Learning
with Clayton Scott and Suresh Bhavnani
The standard mathematical formulation of query learning is often idealized relative to many real-world identification tasks, in that
it does not account for task-specific factors such as input errors, time constraints, and the cognitive limitations of a human user in a stressful environment.
In this work, we address some of the main task-specific constraints that arise in emergency response applications such as toxic chemical identification,
where standard algorithms such as generalized binary search (GBS) fail. The main contributions of this work fall into three categories: a) group-based query learning; b) query
learning in the presence of persistent noise; c) query learning with exponential costs. We pose each of these problems as an optimization problem using its
information-theoretic lower bound and provide a greedy solution in each case. The proposed algorithms are extensions of GBS and are based on a re-interpretation
of GBS as generalized Shannon-Fano coding. The generality of these algorithms follows from the information-theoretic framework in which they are developed, making them applicable
to a much broader class of applications, such as other forms of emergency response, fault diagnosis, network failure diagnosis, active learning, computer vision, and Internet-based data search.
- G. Bellala, S. K. Bhavnani and C. Scott, "Query Learning with Exponential Query Costs," Technical Report, Feb 2010. [pdf]
- G. Bellala, S. K. Bhavnani and C. Scott, "Group-based Query Learning for rapid diagnosis in time-critical situations," Technical Report, Nov 2009. [pdf]
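The greedy step of the GBS baseline that this work extends can be sketched as follows. This is a minimal illustration, not code from the papers: the function name, the toy response table, and the tie-breaking by query order are all my own assumptions.

```python
# Minimal sketch of the greedy query-selection step in generalized
# binary search (GBS). Each candidate object has a known yes/no
# response to every query; GBS greedily picks the unasked query that
# splits the remaining prior probability mass most evenly.

def gbs_select_query(objects, priors, responses, asked):
    """Return the unasked query whose "yes" mass is closest to 1/2."""
    num_queries = len(next(iter(responses.values())))
    total = sum(priors[o] for o in objects)
    best_q, best_gap = None, float("inf")
    for q in range(num_queries):
        if q in asked:
            continue
        # Probability mass of the objects that answer "yes" to query q.
        mass_yes = sum(priors[o] for o in objects if responses[o][q])
        gap = abs(mass_yes / total - 0.5)
        if gap < best_gap:
            best_q, best_gap = q, gap
    return best_q


# Toy example: four equally likely objects, two binary queries.
priors = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}
responses = {"a": (True, True), "b": (True, False),
             "c": (False, True), "d": (False, False)}
print(gbs_select_query(list(responses), priors, responses, asked=set()))  # -> 0
```

Both queries here split the mass perfectly, so the first one is chosen; after asking query 0, the same routine would select query 1. The papers modify this greedy objective to account for groups, persistent noise, and exponential costs.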
- Symptom Co-occurrence Patterns in Cancer Patients
with Suresh Bhavnani, Clayton Scott and Maria Silveira
Many cancer patients experience multiple concurrent symptoms, and because of the additive impact of multiple symptoms, these patients generally fare worse than those
with only a few. Understanding how symptoms co-occur in cancer patients can therefore lead to more efficient assessment and management of symptoms, and
significantly improve the overall function and quality of life of these patients. To address this need, recent research has used data-reduction techniques such
as hierarchical clustering and factor analysis to study symptom clusters in cancer patients. However, there is no consensus in these results, and the limitations
and a priori assumptions of these methods often lead to biased outcomes. To overcome these problems, we used a multi-method approach in which we first studied
symptom co-occurrence across cancer patients visually, using existing and novel visualization techniques. These observations were then
analyzed quantitatively through carefully selected existing and novel methods, in addition to comparison with random networks. Contrary to
the previous belief in multiple distinct symptom clusters, our results consistently showed the absence of any such clusters. In
addition, the multi-method approach revealed a strongly nested structure of symptom co-occurrence. This finding has important implications for both
clinical practice and future research.
- S. K. Bhavnani, G. Bellala, A. Ganesan, R. Krishnan, P. Saxman, C. Scott, M. Silveira and C. Given, "The Nested Structure of Cancer Symptoms: Implications for Analyzing Co-occurrence and Managing Symptoms," Methods of Information in Medicine, in press. [pdf]
- S. K. Bhavnani, G. Bellala, A. Ganesan, R. Krishnan, P. Saxman, C. Scott, M. Silveira and C. Given, "Network Analysis of Cancer Patients and Symptoms: Implications for Symptom Management and Treatment," Proceedings of the American Medical Informatics Association, 2009.
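To make the idea of a "nested" co-occurrence structure concrete, here is a toy illustration, not the actual analysis from the papers: in a perfectly nested structure, patients with fewer symptoms have subsets of the symptoms of patients with more. The function name and the pairwise containment measure are my own simplifications.

```python
# Toy illustration of nestedness in symptom co-occurrence data: for
# each pair of patients (ordered from fewer to more symptoms), check
# whether the smaller symptom set is contained in the larger one.

def nestedness_fraction(symptom_sets):
    """Fraction of patient pairs whose symptom sets are nested."""
    sets = sorted(symptom_sets, key=len)
    pairs = contained = 0
    for i in range(len(sets)):
        for j in range(i + 1, len(sets)):
            pairs += 1
            contained += sets[i] <= sets[j]  # Python subset test
    return contained / pairs if pairs else 1.0


# Perfectly nested data scores 1.0; disjoint sets score 0.0.
print(nestedness_fraction([{"pain"}, {"pain", "fatigue"},
                           {"pain", "fatigue", "nausea"}]))  # -> 1.0
```

Real nestedness analyses of bipartite networks use more careful measures and compare against random networks, as the papers describe; this sketch only conveys the structural intuition.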
- Statistical Error Analysis for FDR-Controlled Classification
with Clayton Scott and Rebecca Willett
In this work, we investigated classification performance measured in
terms of the false discovery rate (FDR) and the false nondiscovery rate (FNDR). These measures have
received considerable attention in the literature on multiple testing. In
multiple testing problems, it is usually assumed that p-values can be calculated
or estimated; to control the FDR, the p-values are then adjusted through one of a variety of single-step, step-up, or step-down procedures, and thresholded. However,
in many applications, p-values are difficult to estimate because the null distribution cannot be easily modeled or simulated. Moreover, adjusting and thresholding p-values
is almost never optimal in terms of minimizing the FNDR or any other measure of Type II error. In this work, we demonstrated how training data may be
used to adapt to both the null and alternative
distributions, while making no assumptions on either. Adopting the perspective of statistical learning theory, we also developed
distribution-free results
on the generalization error analysis of FDR and FNDR, including uniform deviation bounds, finite-sample performance guarantees, strong universal consistency,
and variance-based bounds.
- C. Scott, G. Bellala and R. Willett, "The false discovery rate for statistical pattern recognition," Electronic Journal of Statistics, Vol. 3, 651–677, 2009. [pdf|bib|EJS]
- C. Scott, G. Bellala and R. Willett, "Generalization error analysis for FDR controlled classification," IEEE Workshop on Statistical Signal Processing, 792–796, Madison, WI, August 2007. [pdf]
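For readers unfamiliar with the p-value-based procedures the FDR work contrasts with, here is a sketch of the classical Benjamini-Hochberg step-up procedure. This is standard background material, not code from the paper, and the function name is my own.

```python
# Sketch of the Benjamini-Hochberg step-up procedure, a standard
# p-value-adjustment method for controlling the false discovery rate
# (FDR) in multiple testing.

def benjamini_hochberg(pvalues, alpha=0.05):
    """Return sorted indices of hypotheses rejected at FDR level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Step up: find the largest rank k with p_(k) <= (k/m) * alpha,
    # then reject the hypotheses with the k smallest p-values.
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * alpha:
            k = rank
    return sorted(order[:k])


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))  # -> [0]
```

This is exactly the setting the paper moves away from: the procedure needs p-values, and hence a tractable null distribution, whereas the learning-based approach adapts to both null and alternative distributions from training data.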