Researcher affiliation extraction from homepages

1
Researcher affiliation extraction from homepages

I. Nagy, R. Farkas, M. Jelasity
University of Szeged, Hungary

2
Scientific social information

Research interest
Education
Previous and current affiliations, projects
Professional memberships
Teaching activities
Students, supervisors
Personal (nationality, age)

3
Source of social information

Social sites
structured, lots of information
coverage?
Citation databases
limited information (coauthors, affiliations,
citations)
Homepages
contain what the researcher considers important
almost every researcher has a homepage
unstructured

4
Web Content Mining

Early systems (1999-2000): expert rules
Seed-driven systems
Input: seed pairs of target information
Extract patterns from unlabeled text (e.g. the Web)
Exploit redundancy (celebrities)
High precision
Researcher homepages
Long tail, high recall required

5
Case study: affiliation information

affiliation, position, start date, end date
Frequently given
Experiences can be generalised
Useful
Collegial relationships (whether they worked with the same group at the same time)
Do American or European researchers change their workplace more often?
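
The four-field tuple and the "same group at the same time" question can be sketched as below. This is only an illustration: the class name, the field names and the convention that a missing end year means "still current" are assumptions, not taken from the slides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Affiliation:
    """One extracted tuple; names here are illustrative assumptions."""
    researcher: str
    organisation: str
    position: Optional[str]
    start_year: int
    end_year: Optional[int]  # None is taken to mean "still current"


def overlapped(a: Affiliation, b: Affiliation) -> bool:
    """True if both tuples name the same organisation in overlapping years."""
    if a.organisation != b.organisation:
        return False
    a_end = a.end_year if a.end_year is not None else float("inf")
    b_end = b.end_year if b.end_year is not None else float("inf")
    # Two closed intervals overlap when each starts before the other ends.
    return a.start_year <= b_end and b.start_year <= a_end
```

Exact string equality on the organisation only works after the normalisation step of the architecture; without it, spelling variants of the same institution would not match.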

6
Architecture

Locating the homepage of the researcher
name disambiguation
Locating the relevant parts of the site
pages (focused crawling), parts
Extracting information tuples
Weakly supervised setting
Normalisation
For every source of information

7
Manually tagged corpus

455 sites, 5282 pages for 89 researchers
three-level deep annotation hierarchy with 44 classes
manual annotation in the original HTML format (WYSIWYG) with hyperlinks
low inter-annotator agreement
focus on affiliation

8
Sample

9
Textual information

47% textual, 24% itemised, 29% hybrid
Structured: wrapper induction
Textual: paragraph longer than 40 characters that contains at least one verb
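
A minimal sketch of the "textual paragraph" test follows. The slides do not say how verbs are detected; a real system would presumably use a part-of-speech tagger, so the small VERBS set here is a stand-in assumption that keeps the example self-contained.

```python
import re

# Stand-in verb list (assumption); replace with a part-of-speech tagger.
VERBS = {"am", "is", "are", "was", "were", "work", "worked", "joined",
         "received", "lead", "hold", "held", "studied", "teach"}


def is_textual(paragraph: str, min_chars: int = 40) -> bool:
    """Longer than min_chars and containing at least one known verb."""
    if len(paragraph) <= min_chars:
        return False
    tokens = re.findall(r"[a-z]+", paragraph.lower())
    return any(tok in VERBS for tok in tokens)
```

An itemised CV line such as "2001-2005 Research Fellow, University of Szeged, Hungary" passes the length test but contains no verb, so it is left to the structured (wrapper induction) branch.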

10
Relevant parts

Every researcher has a homepage
Every homepage can be found in the top 10 Google results (query: name)
CV page is always at depth 1
Textual paragraphs contain cluewords
class-conditional probability based 1-DNF
filtering 70k irrelevant paragraphs
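
One possible reading of the clueword filter is sketched below: words are selected by comparing their class-conditional frequencies in relevant and irrelevant paragraphs, and a paragraph is kept if it contains any selected word (a disjunction of single-word tests, i.e. a 1-DNF). The scoring formula, thresholds and function names are assumptions; the slides give only the one-line description.

```python
import re
from collections import Counter
from typing import Iterable, Set, Tuple


def tokenize(text: str) -> Set[str]:
    """Lower-cased set of alphabetic tokens in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def learn_cluewords(labelled: Iterable[Tuple[str, bool]],
                    min_ratio: float = 0.8, min_count: int = 2) -> Set[str]:
    """Select words far more frequent in relevant than in irrelevant paragraphs."""
    rel_df, irr_df = Counter(), Counter()
    n_rel = n_irr = 0
    for text, relevant in labelled:
        if relevant:
            n_rel += 1
            rel_df.update(tokenize(text))
        else:
            n_irr += 1
            irr_df.update(tokenize(text))
    cluewords = set()
    for word, count in rel_df.items():
        if count < min_count:
            continue
        p_rel = count / n_rel                             # P(word | relevant)
        p_irr = irr_df[word] / n_irr if n_irr else 0.0    # P(word | irrelevant)
        if p_rel / (p_rel + p_irr) >= min_ratio:
            cluewords.add(word)
    return cluewords


def is_relevant(paragraph: str, cluewords: Set[str]) -> bool:
    """1-DNF: true if the paragraph contains at least one clueword."""
    return not cluewords.isdisjoint(tokenize(paragraph))
```

Because the rule is a plain disjunction, adding cluewords can only raise recall, which fits the high-recall requirement of the long-tail setting.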

11
Slot detection

It is not NER
just affiliation-related entities
surface features are insufficient
Standard procedure (CRF)
with domain-specific lists as extra features
domain-specific segmentation
70% phrase-level F-measure, leave-one-researcher-out
(37% by lists/regexp)
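
The idea of adding domain-specific lists to ordinary surface features can be sketched as a per-token feature function. The two word lists and all feature names are hypothetical; the slides do not say what the real lists or the CRF feature set contain. Dictionaries of this shape are the kind of input that CRF toolkits accepting per-token feature maps expect.

```python
from typing import Dict, List

# Hypothetical domain lists (assumption, not the authors' resources).
ORG_WORDS = {"university", "institute", "department", "labs", "laboratory"}
POSITION_WORDS = {"professor", "lecturer", "postdoc", "student", "researcher"}


def token_features(tokens: List[str], i: int) -> Dict[str, object]:
    """Surface features plus list-membership flags for token i."""
    tok = tokens[i]
    low = tok.lower()
    return {
        "lower": low,
        "is_capitalised": tok[:1].isupper(),
        "is_year": tok.isdigit() and len(tok) == 4,
        "in_org_list": low in ORG_WORDS,            # domain list feature
        "in_position_list": low in POSITION_WORDS,  # domain list feature
        "prev_lower": tokens[i - 1].lower() if i > 0 else "<s>",
        "next_lower": tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>",
    }


def sentence_features(tokens: List[str]) -> List[Dict[str, object]]:
    """Feature dictionaries for every token of one sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]
```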

12
Subject detection

Sometimes information about supervisors, colleagues
Hypothesis: paragraphs are homogeneous
Two procedures
NER for person names (trained on CoNLL)
personal pronouns
70% accuracy on the gold standard and on predicted labels too
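
The pronoun procedure can be sketched as follows, relying on the homogeneity hypothesis to label a whole paragraph at once. The pronoun lists, the tie-breaking rule and the return labels are assumptions. A first-person paragraph is taken to be about the homepage owner; a third-person paragraph still needs the person-name check (not shown) to tell the owner from a supervisor or colleague.

```python
import re

FIRST_PERSON = {"i", "my", "me", "mine", "myself"}
THIRD_PERSON = {"he", "she", "his", "her", "him", "hers"}


def paragraph_subject(paragraph: str) -> str:
    """Return 'first_person', 'third_person' or 'none' for the paragraph."""
    tokens = re.findall(r"[a-z]+", paragraph.lower())
    first = sum(tok in FIRST_PERSON for tok in tokens)
    third = sum(tok in THIRD_PERSON for tok in tokens)
    if first == 0 and third == 0:
        return "none"
    # Ties go to first person, since the page belongs to its owner.
    return "first_person" if first >= third else "third_person"
```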

13
Collecting information tuples

affiliation is the head
Heuristic: assign each year and position_type to the nearest affiliation
90% accuracy using the gold-standard labels
70% accuracy using the labels predicted by the system
(FPs count as misclassified)
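
The nearest-affiliation heuristic can be sketched as below. The slot representation (token offset, label, text), the class name and the choice of token distance as the measure of "nearest" are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Slot = Tuple[int, str, str]  # (token offset, label, text); assumed layout


@dataclass
class AffiliationTuple:
    affiliation: str
    offset: int
    years: List[str] = field(default_factory=list)
    position_types: List[str] = field(default_factory=list)


def collect_tuples(slots: List[Slot]) -> List[AffiliationTuple]:
    """Attach every year and position_type slot to the closest affiliation."""
    heads = [AffiliationTuple(text, pos) for pos, label, text in slots
             if label == "affiliation"]
    if not heads:
        return []  # no head, so nothing can be assembled
    for pos, label, text in slots:
        if label == "affiliation":
            continue
        # On equal distance, min() keeps the earlier affiliation.
        nearest = min(heads, key=lambda h: abs(h.offset - pos))
        if label == "year":
            nearest.years.append(text)
        elif label == "position_type":
            nearest.position_types.append(text)
    return heads
```

Because each slot is attached to exactly one head, a false-positive slot from the previous step always ends up in some tuple, which is consistent with counting FPs as misclassified.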

14
Problematic issues

I am a Ph.D. Student working under the supervision of Prof. NAME
Hewlett-Packard Labs in Palo Alto
Ph.D. from MIT in Physics
Department of Computer Science, Waterloo University
BASELINE
I lead the Distributed Systems Group
In-domain name detection
Enumeration detection is important (syntactic parsers?)

15
Conclusions

Information from homepages of researchers
Special nature of the tasks
long tail
small labeled corpus
lack of domain-specific parsers
Several well defined subtasks
Basic solutions for each subtask

16

Thank you!
www.inf.u-szeged.hu/rgai/homepagecorpus
rfarkas_at_inf.u-szeged.hu
