Title: Web Content Management
Web Content Management: Language Tools
- Information Extraction
Textual Information
- User-generated content (UGC): unstructured textual information
- Structured information must be extracted from it for data processing, management, and mining: information extraction
Information Extraction
Filling database fields from parts of the text
October 14, 2002, 4:00 a.m. PT For years,
Microsoft Corporation CEO Bill Gates railed
against the economic philosophy of open-source
software with Orwellian fervor, denouncing its
communal licensing as a "cancer" that stifled
technological innovation. Today, Microsoft
claims to "love" the open-source concept, by
which software code is made public to encourage
improvement and development by outside
programmers. Gates himself says Microsoft will
gladly disclose its crown jewels--the coveted
code behind the Windows operating system--to
select customers. "We can be open source. We
love the concept of shared source," said Bill
Veghte, a Microsoft VP. "That's a super-important
shift for us in terms of code access." Richard
Stallman, founder of the Free Software
Foundation, countered saying
Name               Title     Organization
Bill Gates         CEO       Microsoft
Bill Veghte        VP        Microsoft
Richard Stallman   founder   Free Software Foundation
Information extraction = segmentation + classification + association + clustering
Named entity extraction: Microsoft Corporation, CEO, Bill Gates, Microsoft, Gates, Microsoft, Bill Veghte, Microsoft, VP, Richard Stallman, founder, Free Software Foundation
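The four stages (segmentation, classification, association, clustering) can be illustrated with a toy Python sketch on a shortened excerpt of the article. Everything here is an assumption for illustration: the hand-written `LEXICON` stands in for a trained entity classifier, association simply pairs each title with the first person and organization in the same sentence, and clustering maps a mention to the longest same-type mention that contains all of its words.

```python
import re

# Hypothetical hand-written lexicon standing in for a trained entity classifier.
LEXICON = {
    "Microsoft Corporation": "ORG",
    "Microsoft": "ORG",
    "Free Software Foundation": "ORG",
    "Bill Gates": "PERSON",
    "Gates": "PERSON",
    "Bill Veghte": "PERSON",
    "Richard Stallman": "PERSON",
    "CEO": "TITLE",
    "VP": "TITLE",
    "founder": "TITLE",
}

# Longest entries first so "Microsoft Corporation" wins over "Microsoft".
MENTION_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in sorted(LEXICON, key=len, reverse=True)) + r")\b"
)

# Shortened excerpt of the article above.
TEXT = (
    "For years, Microsoft Corporation CEO Bill Gates railed against the economic "
    "philosophy of open-source software. Today, Microsoft claims to love the "
    "open-source concept. Gates himself says Microsoft will gladly disclose its "
    "crown jewels to select customers. \"We love the concept of shared source,\" "
    "said Bill Veghte, a Microsoft VP. Richard Stallman, founder of the Free "
    "Software Foundation, countered saying"
)


def segment(sentence):
    """Segmentation: find the mention strings in one sentence."""
    return MENTION_RE.findall(sentence)


def classify(mention):
    """Classification: assign an entity type to a mention."""
    return LEXICON[mention]


def associate(sentences):
    """Association: link each TITLE to the first PERSON and ORG in its sentence."""
    rows = []
    for mentions in sentences:
        people = [m for m in mentions if classify(m) == "PERSON"]
        orgs = [m for m in mentions if classify(m) == "ORG"]
        for m in mentions:
            if classify(m) == "TITLE" and people and orgs:
                rows.append((people[0], m, orgs[0]))
    return rows


def cluster(mentions):
    """Clustering: map each mention to the longest same-type mention containing its words."""
    canon = {}
    for m in mentions:
        same = [c for c in mentions
                if classify(c) == classify(m) and set(m.split()) <= set(c.split())]
        canon[m] = max(same, key=len)
    return canon


def extract(text):
    sentences = [segment(s) for s in re.split(r"(?<=[.!?])\s+", text)]
    canon = cluster({m for s in sentences for m in s})
    return [tuple(canon[x] for x in row) for row in associate(sentences)]


for row in extract(TEXT):
    print(row)
```

Because of the clustering rule, "Microsoft" is merged into "Microsoft Corporation", so the printed rows use the long form where the table above keeps the short one.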
Information Extraction from Content
Overview of Topics
- Topics in information extraction
  - Entity tagging
  - Relation extraction
  - Event extraction
- Scaling information extraction
  - Scaling to large data volumes
  - Open scaling problems
Topics in Information Extraction
- Extraction of entities and relations
  - Entities: named and generic
  - Relations: links between entities
  - Events: composed of tuples drawn from several relations
- Extraction issues
  - Preprocessing: sentence splitting, syntactic parsing
  - Creating rules or extracting patterns: manually, with machine learning, or hybrid
  - Applying the patterns or rules to extract new information
  - Post-processing and integration of the information
Entity Tagging
- Detecting mentions of entities in text (e.g., person names, locations, etc.)
- Manual vs. machine learning techniques
- The best approach depends on the entity type and the domain
  - Closed class (e.g., geographic regions, disease names): hand-built dictionaries
  - Syntactic (e.g., phone numbers, postal codes): regular expressions
  - Semantic (e.g., person names): a combination of context, syntactic features, dictionaries, etc.
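The first two entity classes above can be handled without learning. A minimal sketch, assuming a tiny hand-built disease dictionary (closed class) and US-style phone and ZIP code patterns (syntactic class); both the word list and the regular expressions are illustrative choices, not rules from the slide, and the semantic class (person names) is deliberately left out because it needs context.

```python
import re

# Closed class: a hypothetical hand-built dictionary of disease names.
DISEASES = ["mad cow disease", "malaria", "pneumonia", "cholera", "ebola"]
DISEASE_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(d) for d in sorted(DISEASES, key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)

# Syntactic classes: illustrative US-style patterns (other countries need other rules).
PHONE_RE = re.compile(r"\(?\b\d{3}\)?[-. ]\d{3}[-. ]\d{4}\b")
ZIP_RE = re.compile(r"\b\d{5}(?:-\d{4})?\b")


def tag(text):
    """Return (start, end, label, surface) spans, sorted by position."""
    spans = []
    for label, rx in (("PHONE", PHONE_RE), ("ZIP_CODE", ZIP_RE), ("DISEASE", DISEASE_RE)):
        spans += [(m.start(), m.end(), label, m.group()) for m in rx.finditer(text)]
    return sorted(spans)


sample = "Call (650) 555-0100 or write to Palo Alto, CA 94304 about the Ebola and malaria reports."
for _, _, label, surface in tag(sample):
    print(label, surface)
```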
Entity Extraction Example
Input: Ronald Fagin, Combining Fuzzy Information from Multiple Systems, Proc. of ACM SIGMOD, 2002
Segment s_i   Sequence                                            Label(s_i)
s1            Ronald Fagin                                        Author
s2            Combining Fuzzy Information from Multiple Systems   Title
s3            Proc. of ACM SIGMOD                                 Conference
s4            2002                                                Year
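The table above can be reproduced by a hand-written segmenter and labeler. A minimal sketch, assuming fields are separated by commas and that a few hypothetical rules (a four-digit year, the token "Proc.", author list first) are enough; an author list that itself contains commas would break it, which is one reason learned sequence models are used instead.

```python
import re

CITATION = ("Ronald Fagin, Combining Fuzzy Information from Multiple Systems, "
            "Proc. of ACM SIGMOD, 2002")


def segment(citation):
    """Segmentation: split on commas (assumes no field contains a comma)."""
    return [s.strip() for s in citation.split(",")]


def label(index, seg):
    """Classification: hypothetical hand-written rules for each segment."""
    if re.fullmatch(r"(?:19|20)\d{2}", seg):
        return "Year"
    if re.search(r"\bProc\.|\bConference\b|\bWorkshop\b", seg):
        return "Conference"
    if index == 0:
        return "Author"  # assume the author list comes first
    return "Title"


def label_citation(citation):
    return [(f"s{i + 1}", seg, label(i, seg)) for i, seg in enumerate(segment(citation))]


for row in label_citation(CITATION):
    print(row)
```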
Manual Methods
- Effective in certain cases (e.g., recognizing prices, postal codes, conference names, etc.)
- Scalability issues
  - Laborious work
  - Domain-specific
  - Corpus-specific
  - Rule matching is expensive
- Example system: IBM Avatar
Machine Learning Techniques
- Effective when large training corpora are available
- Capture complex patterns that are hard to encode manually
  - e.g., whether a review is positive or negative
- Non-local dependencies
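The review-polarity example above can be sketched with a learned classifier. This assumes multinomial Naive Bayes with add-one smoothing, a choice made here for brevity since the slide names no algorithm, and a tiny invented training set; being a bag-of-words model, it does not capture the non-local dependencies mentioned in the last bullet.

```python
import math
from collections import Counter

# Tiny invented training set; a real system needs a large labeled corpus.
TRAIN = [
    ("great phone excellent battery", "pos"),
    ("excellent screen great value", "pos"),
    ("terrible battery poor screen", "neg"),
    ("poor value terrible support", "neg"),
]


def tokenize(text):
    return text.lower().split()


def train_nb(docs):
    """Count documents per class and word occurrences per class."""
    class_docs = Counter(label for _, label in docs)
    word_counts = {c: Counter() for c in class_docs}
    for text, label in docs:
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_docs, word_counts, vocab


def predict_nb(model, text):
    """Pick the class with the highest log-probability (add-one smoothing)."""
    class_docs, word_counts, vocab = model
    n_docs = sum(class_docs.values())
    best, best_score = None, -math.inf
    for c in class_docs:
        total = sum(word_counts[c].values())
        score = math.log(class_docs[c] / n_docs)
        for w in tokenize(text):
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = c, score
    return best


model = train_nb(TRAIN)
print(predict_nb(model, "great battery"))  # pos
print(predict_nb(model, "poor screen"))    # neg
```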
Representation Models (Cohen and McCallum, 2003)
Relation Extraction
Disease Outbreaks relation:
Date        Disease Name      Location
Jan. 1995   Malaria           Ethiopia
July 1995   Mad Cow Disease   U.K.
Feb. 1995   Pneumonia         U.S.
May 1995    Ebola             Zaire
Relation Extraction Techniques
- Machine learning
  - Supervised training of the system on manually annotated data
  - Semi-supervised training of the system by bootstrapping from seed examples
- Hybrid or interactive systems
  - Experts interact with machine learning algorithms to iteratively create rules and patterns
  - Interactions include annotating examples, modifying rules, or a combination of both
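Bootstrapping from seed examples can be sketched in a few lines: learn patterns from sentences that contain a known tuple, apply them to find new tuples, and repeat. This is a simplified illustration under stated assumptions: the corpus is invented, a "pattern" is just the exact text between the two arguments, and arguments are capitalized words. Practical systems also score patterns and tuples to keep the extraction from drifting, which this sketch omits.

```python
import re

# Invented toy corpus; each string stands for one sentence.
CORPUS = [
    "Malaria was reported in Ethiopia.",
    "Ebola was reported in Zaire.",
    "Ebola cases confirmed in Zaire.",
    "Cholera cases confirmed in Sudan.",
]


def learn_patterns(tuples, corpus):
    """A pattern is the exact text between the two arguments of a known tuple."""
    patterns = set()
    for sentence in corpus:
        for disease, location in tuples:
            i, j = sentence.find(disease), sentence.find(location)
            if i >= 0 and j > i + len(disease):
                patterns.add(sentence[i + len(disease):j])
    return patterns


def apply_patterns(patterns, corpus):
    """Match capitalized words on both sides of every pattern."""
    found = set()
    for middle in patterns:
        rx = re.compile(r"\b([A-Z]\w*)" + re.escape(middle) + r"([A-Z]\w*)")
        for sentence in corpus:
            found.update(rx.findall(sentence))
    return found


def bootstrap(seeds, corpus, max_iter=5):
    tuples = set(seeds)
    for _ in range(max_iter):
        new = apply_patterns(learn_patterns(tuples, corpus), corpus) - tuples
        if not new:
            break  # no new tuples: the process has converged
        tuples |= new
    return tuples


print(sorted(bootstrap({("Malaria", "Ethiopia")}, CORPUS)))
```

Starting from the single seed, the first round learns " was reported in " and finds (Ebola, Zaire); that tuple yields " cases confirmed in ", which in turn finds (Cholera, Sudan).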
Event Extraction
- Similar to relation extraction, but:
  - Events may be nested
  - Greater complexity
  - Often requires coreference resolution, disambiguation, and inference
- e.g., an integrated disease outbreak event
Challenges in Event Extraction
- The information is spread across many documents
- Missing or incorrect values
- Combining tuples into complex events
- No unique key for grouping duplicates while keeping similar but distinct entities apart
- Ambiguity: distinct entities share the same name
Scaling Information Extraction
Dimensions of scaling:
- Data size
  - Applying rules/patterns is expensive
  - Need an efficient way to select relevant documents
- Document accessibility
  - Documents may be accessible only through an interface
- Dynamic data
- Heterogeneous sources
  - Learning patterns for each source is expensive
  - Many rules are required
- Domain diversity
  - Extracting information from every domain
Efficient Information Extraction
[Figure: Text Database → Extraction System → Output Tuples]
- Document retrieval
- Document processing
- Tuple extraction
- 80/20 rule: cheap, simple rules extract most of the instances
- Train a classifier to discard irrelevant documents without processing them
- Share common annotations (entity tags) across multiple extraction tasks
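The second bullet can be sketched as a cheap filter in front of the expensive extractor. Assumptions: a hypothetical keyword test stands in for the trained document classifier, a single regular expression stands in for the extraction system, and the documents are invented. The counter shows how many documents actually reach the extractor.

```python
import re

# Hypothetical keyword list used as a cheap stand-in for a trained document classifier.
KEYWORDS = ("outbreak", "epidemic", "cases")

# Stand-in for the expensive extraction system: a single regular expression.
OUTBREAK_RE = re.compile(r"\b(?:outbreak|cases) of ([A-Z]\w*) in ([A-Z]\w*)")

DOCS = [
    "An outbreak of Ebola in Zaire was confirmed.",
    "The stock market closed higher today.",
    "Officials counted new cases of Cholera in Sudan.",
    "The football season starts in August.",
]


def is_promising(doc):
    """Cheap filter: keep only documents that mention an outbreak-related keyword."""
    text = doc.lower()
    return any(k in text for k in KEYWORDS)


def run(docs):
    processed, tuples = 0, []
    for doc in docs:
        if not is_promising(doc):
            continue  # discarded without running the extractor
        processed += 1
        tuples.extend(OUTBREAK_RE.findall(doc))
    return tuples, processed


tuples, processed = run(DOCS)
print(tuples, processed)  # [('Ebola', 'Zaire'), ('Cholera', 'Sudan')] 2
```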
Iterative Set Expansion
[Figure: Text Database → Extraction System → Output Tuples, with a query-generation loop]
- Query the database with tuples (e.g., [Ebola AND Zaire])
- Process the retrieved documents
- Extract tuples (e.g., <Malaria, Ethiopia>)
- Augment the tuple set and generate new queries
- Execution time: $T_{\text{exec}} = |\text{Retrieved Docs}| \cdot (R + P) + |\text{Queries}| \cdot Q$, where $R$ is the document retrieval time, $P$ the document processing time, and $Q$ the query answering time
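The loop and its cost model can be sketched together. Assumptions: `DOC_TUPLES` is a hypothetical stand-in for both the search index and the extractor (a document "contains" the query terms if one of its tuples has them), and the values of R, P, and Q are invented. Each retrieved document is charged R + P once and each issued query is charged Q, matching the execution-time formula on the slide.

```python
# Hypothetical per-operation costs in seconds: R = retrieve a document,
# P = process a document, Q = answer a query.
R, P, Q = 0.5, 2.0, 0.1

# Toy stand-in: the tuples the extractor would find in each document.
DOC_TUPLES = {
    "d1": {("Malaria", "Ethiopia"), ("Ebola", "Zaire")},
    "d2": {("Ebola", "Zaire"), ("Cholera", "Sudan")},
    "d3": {("H5N1", "Vietnam")},
}


def search(query):
    """Keyword AND query: documents with a tuple containing all query terms."""
    terms = set(query.split(" AND "))
    return [d for d, ts in DOC_TUPLES.items() if any(terms <= set(t) for t in ts)]


def extract(doc_id):
    return DOC_TUPLES[doc_id]


def iterative_set_expansion(seeds, max_queries=10):
    tuples, seen_docs, pending = set(seeds), set(), list(seeds)
    n_queries = 0
    while pending and n_queries < max_queries:
        t = pending.pop(0)
        n_queries += 1
        for doc in search(" AND ".join(t)):  # e.g. "Ebola AND Zaire"
            if doc in seen_docs:
                continue
            seen_docs.add(doc)
            for new in extract(doc) - tuples:
                tuples.add(new)
                pending.append(new)  # each new tuple becomes a future query
    cost = len(seen_docs) * (R + P) + n_queries * Q
    return tuples, cost


tuples, cost = iterative_set_expansion([("Malaria", "Ethiopia")])
print(sorted(tuples), round(cost, 2))
```

Starting from <Malaria, Ethiopia>, the run issues 3 queries and retrieves 2 documents (cost 2 · 2.5 + 3 · 0.1 = 5.3 s), yet never finds <H5N1, Vietnam> because no query leads to d3.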
Reachability via Queries
[Figure: reachability graph over the tuples <SARS, China>, <Ebola, Zaire>, <Malaria, Ethiopia>, <Cholera, Sudan>, <H5N1, Vietnam>; an edge t1 → t2 means that t1 retrieves a document d1 that contains t2]
- Upper limit on recall: determined by the size of the largest connected component
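The recall limit can be computed directly on a reachability graph. The edges of the original figure are not recoverable, so the edges below are hypothetical. The sketch follows directed edges t1 → t2 with breadth-first search and reports the best recall any single seed can achieve; the slide phrases the same limit in terms of the largest connected component.

```python
from collections import deque

TUPLES = [("SARS", "China"), ("Ebola", "Zaire"), ("Malaria", "Ethiopia"),
          ("Cholera", "Sudan"), ("H5N1", "Vietnam")]

# Hypothetical edges t1 -> t2: querying with t1 retrieves a document containing t2.
EDGES = [
    (("Malaria", "Ethiopia"), ("Ebola", "Zaire")),
    (("Ebola", "Zaire"), ("Cholera", "Sudan")),
    (("Cholera", "Sudan"), ("Ebola", "Zaire")),
    (("SARS", "China"), ("H5N1", "Vietnam")),
]


def reachable(seed, edges):
    """Breadth-first search along directed edges starting from one seed tuple."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
    seen, queue = {seed}, deque([seed])
    while queue:
        for nxt in adj.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def recall_upper_limit(tuples, edges):
    """Best recall any single seed can reach: largest reachable set / all tuples."""
    return max(len(reachable(t, edges)) for t in tuples) / len(tuples)


print(recall_upper_limit(TUPLES, EDGES))  # 0.6
```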
Reachability Limits
[Figure: User-Provided Seed Tuples → Seed Sampling → Information Extraction → Classifier Training → Query Generation]
- Obtain a document sample with likely-negative and likely-positive examples
- Label the sample documents using the extraction system as an oracle
- Train classifiers to recognize useful documents
- Generate queries from the classifier rules
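The labeling and query-generation steps can be sketched without a full classifier. Assumptions: the sample documents are invented, `toy_extractor` is a hypothetical stand-in for the real extraction system acting as the oracle, and words are ranked by their document frequency among useful documents minus that among useless ones, a simple substitute for the rules of a trained classifier.

```python
# Invented sample documents (already lower-cased).
SAMPLE = [
    "outbreak of ebola reported in zaire",
    "cholera outbreak reported in sudan",
    "stock prices reported higher",
    "new football stadium in sudan",
]


def toy_extractor(doc):
    """Stand-in for the real extraction system used as the oracle."""
    return [w for w in doc.split() if w in {"ebola", "cholera"}]


def label_with_oracle(docs, extract):
    """A document is 'useful' if the extraction system finds something in it."""
    return [(doc, bool(extract(doc))) for doc in docs]


def generate_queries(labeled, k=2):
    """Rank words by document frequency among useful minus useless documents."""
    useful = [set(d.split()) for d, ok in labeled if ok]
    useless = [set(d.split()) for d, ok in labeled if not ok]

    def score(word):
        pos = sum(word in d for d in useful) / max(len(useful), 1)
        neg = sum(word in d for d in useless) / max(len(useless), 1)
        return pos - neg

    vocab = set().union(*useful, *useless)
    return sorted(vocab, key=lambda w: (-score(w), w))[:k]


print(generate_queries(label_with_oracle(SAMPLE, toy_extractor)))  # ['outbreak', 'cholera']
```

"outbreak" appears in every useful document and in no useless one, so it ranks first; ties are broken alphabetically.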