Title: IR, IE and QA over Social Media
1. IR, IE and QA over Social Media
- Social media (blogs, community QA, news aggregators)
  - Complementary to traditional news sources (Rathergate)
  - Grows faster than traditional web content, gap widening
  - Traditional/published: 4 Gb/day; social media: 10 Gb/day (from Andrew Tomkins/Yahoo!, Future of Web Search, May 2007)
- Research challenges
  - Low(er) quality
  - Content more dynamic
  - User interactions crucial
    - ratings, comments, link structure
    - to retrieve documents and to evaluate extracted information
2. Finding High Quality Content for IE/QA
E. Agichtein, C. Castillo, D. Donato, A. Gionis,
G. Mishne, Finding High Quality Content in
Social Media, in Proc. of WSDM 2008
- Goal: find high-quality content (accurate, well-presented)
- Setting: Community QA (Yahoo! Answers)
- Classifying social media (e.g., cQA) is substantially different from document classification
- Sources of information
  - Content analysis
  - Usage data (page views, etc.)
  - Community ratings, link analysis
- General framework for quality estimation in social media
  - Graph-based model of contributor relationships, combined with content and usage analysis
  - Can identify high-quality items with accuracy close to human agreement
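The idea of combining a contributor graph with content and usage signals can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the edge direction (asker to answerer), the PageRank-style authority score, the feature names, and the fixed weights in `quality_score` are all assumptions made for the example. A real system would learn the combination from labeled data.

```python
from collections import defaultdict


def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a directed contributor graph."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = defaultdict(list)
    for src, dst in edges:
        out_links[src].append(dst)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-links is spread uniformly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {node: (1 - damping) / n + damping * dangling / n
                    for node in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


def quality_score(content, usage, authority, weights=(0.5, 0.2, 0.3)):
    """Hypothetical linear combination of three signals in [0, 1]."""
    w_content, w_usage, w_authority = weights
    return w_content * content + w_usage * usage + w_authority * authority


# Assumed toy graph: an edge (asker, answerer) means "answerer replied to asker".
edges = [("alice", "carol"), ("bob", "carol"), ("carol", "dave")]
ranks = pagerank(edges)
max_rank = max(ranks.values())
authority = {user: r / max_rank for user, r in ranks.items()}

# Assumed items with precomputed content and usage scores in [0, 1].
items = [
    {"id": "a1", "author": "alice", "content": 0.6, "usage": 0.5},
    {"id": "a2", "author": "carol", "content": 0.6, "usage": 0.5},
]
scores = {
    item["id"]: quality_score(item["content"], item["usage"],
                              authority[item["author"]])
    for item in items
}
ranked_ids = sorted(scores, key=scores.get, reverse=True)
```

With identical content and usage scores, the item written by the contributor with more graph authority ranks first, which is the effect the graph-based component is meant to add.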
3. Finding Relevant Content for IE/QA
J. Bian, Y. Liu, E. Agichtein and H. Zha, Finding
the Right Facts in the Crowd: Factoid Question
Answering over Social Media, to appear in Proc.
of WWW 2008
- Goal: given a query, rank social content (cQA) by expected relevance and quality
- Approach: learn ranking functions specifically for social media retrieval
- Features
  - Textual content: relevance, stylistics, language models
  - User interactions: link structure, discussion threads
  - User ratings: incorporate user-provided content ratings
- Method: gradient boosting (GBrank)
  - Developed a new objective function for learning a ranking function using (noisy) preference data
- Results
  - Outperforms Yahoo! default ranking and naïve ranking by user votes
  - Can be made robust to ratings spam (same authors, to appear in AIRWeb 2008)
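Learning a ranking function from preference data can be illustrated with a small sketch. It is not the objective function developed in the paper, and it swaps the boosted trees of GBrank for a plain linear scorer trained by gradient descent on a pairwise squared-hinge loss. The two features (text relevance, fraction of positive votes), the toy preference pairs, and the hyperparameters are assumptions chosen only for illustration.

```python
def score(weights, features):
    """Linear scoring function: dot product of weights and features."""
    return sum(w * x for w, x in zip(weights, features))


def train_pairwise(pairs, n_features, margin=1.0, lr=0.1, epochs=200):
    """Fit weights so that score(better) exceeds score(worse) by a margin.

    Minimizes the mean over pairs of max(0, margin - (s_better - s_worse))**2.
    """
    weights = [0.0] * n_features
    for _ in range(epochs):
        grad = [0.0] * n_features
        for better, worse in pairs:
            violation = margin - (score(weights, better) - score(weights, worse))
            if violation > 0:
                # Gradient of violation**2 with respect to the weights.
                for j in range(n_features):
                    grad[j] -= 2 * violation * (better[j] - worse[j])
        weights = [w - lr * g / len(pairs) for w, g in zip(weights, grad)]
    return weights


# Assumed features per answer: [text_relevance, fraction_of_positive_votes].
# Each pair is (preferred answer, less preferred answer) for the same question.
pairs = [
    ([0.9, 0.8], [0.4, 0.3]),
    ([0.7, 0.9], [0.6, 0.2]),
    ([0.8, 0.1], [0.3, 0.2]),
]
weights = train_pairwise(pairs, n_features=2)

# Rank unseen candidate answers by the learned score, best first.
candidates = {"ans1": [0.2, 0.9], "ans2": [0.9, 0.6], "ans3": [0.5, 0.1]}
ranking = sorted(candidates, key=lambda k: score(weights, candidates[k]),
                 reverse=True)
```

The third training pair prefers an answer with fewer positive votes but higher text relevance, so the learned scorer cannot simply sort by votes. This mirrors the slide's point that a learned ranker can beat naïve ranking by user votes.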