Collective Discourse as a Complex System
With the growth of Web 2.0, millions of individuals engage in collective discourse. They participate in online discussions, share their opinions, and generate content about the same artifacts, objects, and news events. This massive amount of text is written on the Web mainly by non-expert individuals with different perspectives, and yet it exhibits accurate knowledge as a whole.
My current work is focused on the computational analysis of Collective Intelligence that emerges as a result of collective discourse on the Web. I study real-world, Web-scale datasets from various sources ranging from online news headlines, blog comments and movie reviews to citations of scientific articles, microblogs, and click logs. Tracing the generation of content over many instances reveals temporal patterns that allow us to make sense of the text generated around a particular event or object.
My experiments on different types of collective discourse such as headlines, citations, reviews, and tweets have resulted in a number of observations that confirm the diversity of perspectives among people who discuss the same matter. The diversity is partly because of different phrases that represent the same factoid (i.e., semantic information unit), but mainly due to different factoids that individual contributors write about. Although some factoids are very popular, a large number of factoids are covered by only a few users. This confirms that people who write about the same object focus on different aspects of that object. I show that the set of people who engage in collective discourse forms latent communities that emerge from the diversity in content generation. This community structure can be explained with a complex system named Latent Networks.
Identifying Misinformation and Credibility Assessment in Microblogs
A rumor is commonly defined as a statement whose truth value is unverifiable. Rumors may spread misinformation (false information) or disinformation (deliberately false information) on a network of people. Identifying rumors is crucial in online social media, where large amounts of information are easily spread across a large network by sources with unverified authority.
Our definition of a rumor is based on social psychology, where a rumor is defined as a statement whose truth value is unverifiable or deliberately false. In-depth rumor analysis, such as determining the intent and impact behind the spread of a rumor, is a very challenging task and is not possible without first retrieving the complete set of social conversations (e.g., tweets) that are actually about the rumor. In our work, we take this first step and retrieve a complete set of tweets that discuss a specific rumor. Our approach addresses two basic problems. The first is retrieving online microblogs that are rumor-related. The second is identifying tweets in which the rumor is endorsed (the posters show that they believe the rumor). So far, our contributions in this project are two-fold: (1) We propose a general framework that employs statistical models and maximizes a linear function of log-likelihood ratios to retrieve rumorous tweets that match a more general query. (2) We show the effectiveness of the proposed features in capturing tweets that show user endorsement. This will help us identify disinformers, or users who spread false information in online social media.
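To make the idea of a linear function of log-likelihood ratios concrete, here is a minimal sketch under simplifying assumptions; it is not the framework from the project itself. It assumes add-one-smoothed unigram models, two illustrative features (all tokens and hashtags only), and hand-picked weights. All function names, the toy tweets, and the weights are hypothetical, and learning the weights that maximize the objective is not shown.

```python
import math
from collections import Counter


def all_tokens(text):
    """Feature extractor: every whitespace-separated, lowercased token."""
    return text.lower().split()


def hashtags(text):
    """Feature extractor: only tokens that start with '#'."""
    return [t for t in text.lower().split() if t.startswith("#")]


class UnigramModel:
    """Add-one-smoothed unigram model over one feature's tokens."""

    def __init__(self, docs, extract, vocab):
        self.counts = Counter(tok for d in docs for tok in extract(d))
        self.total = sum(self.counts.values())
        self.vocab_size = len(vocab) + 1  # one extra slot for unseen tokens

    def log_prob(self, token):
        return math.log((self.counts[token] + 1) / (self.total + self.vocab_size))


def build_llr_feature(pos_docs, neg_docs, extract):
    """Return a function mapping a tweet to log P(tweet|pos) - log P(tweet|neg)."""
    vocab = {tok for d in pos_docs + neg_docs for tok in extract(d)}
    pos = UnigramModel(pos_docs, extract, vocab)
    neg = UnigramModel(neg_docs, extract, vocab)

    def llr(text):
        return sum(pos.log_prob(t) - neg.log_prob(t) for t in extract(text))

    return llr


def score(text, features, weights):
    """Linear combination of per-feature log-likelihood ratios."""
    return sum(w * f(text) for f, w in zip(features, weights))


# Made-up toy data: tweets about a hypothetical rumor vs. unrelated tweets.
rumor_tweets = [
    "free tickets from the airline #hoax",
    "is the airline really giving free tickets",
]
other_tweets = [
    "the airline delayed my flight again",
    "nice weather at the airport today",
]

features = [
    build_llr_feature(rumor_tweets, other_tweets, all_tokens),
    build_llr_feature(rumor_tweets, other_tweets, hashtags),
]
weights = [1.0, 0.5]  # hand-picked for illustration, not learned

rumor_like = score("free tickets #hoax", features, weights)
unrelated = score("my flight delayed today", features, weights)
```

In this sketch a tweet with a positive score is more likely under the rumor-related models than under the background models, so it would be retrieved. In a real setting the weights would be fit on labeled data rather than set by hand.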
iOpener: Generating Surveys of Scientific Paradigms
The goal of iOPENER (Information Organization for PENning Expositions on Research) is to generate readily-consumable surveys of different scientific domains and topics, targeted to different audiences and levels, e.g., expert specialists, scientists from related disciplines, educators, students, government decision makers, and citizens including minorities and underrepresented groups. Surveyed material is presented in different modalities, e.g., an enumerated list of articles, a bulleted list of key facts, a textual summary, or a visual presentation with zoom and filter capabilities. The original contributions of this research are in the creation of an infrastructure for automatically summarizing entire areas of scientific endeavor by linking three available technologies: (1) bibliometric lexical link mining; (2) summarization techniques; and (3) visualization tools for displaying both structure and content.
Generating readily-consumable surveys of scientific domains is a challenging task. As part of iOpener, I have been involved in a number of projects, all of which are central to our solution. Below is a summary of these projects.