Information Retrieval - PowerPoint PPT Presentation

About This Presentation
Title:

Information Retrieval

Description:

Information Retrieval – PowerPoint PPT presentation

Number of Views:120
Slides: 78
Provided by: iamsuchao

Transcript and Presenter's Notes

Title: Information Retrieval


1
Information Extraction
2
????
  1. ????(IE)??????
  2. ??????????
  3. ??????????
  4. ??????????
  5. ???????????

3
1.????(IE)??????
  • ??CLEF????
  • A Co-operative Clinical E-Science Framework
    (CLEF)
  • Funded by the UK Medical Research Council
  • Descriptive information
  • Clinical histories
  • Radiology reports
  • Pathology reports
  • Annotations on genomic and image databases
  • Technical literature
  • Web-based resources, ...

4
????
ROYAL MARSDEN NHS TRUST - PATIENT CASE NOTE
324A621FMRS Dorothy Smith
DOB 12/05/44
21, Park Crescent
Basingstoke B12 Q13
16 Dec 1992
Seen in General Surgical

This lady who has had a mastectomy and left open capsulotomy and
removal of her prosthesis was seen by me in the clinic today on behalf
of Mr Peterson. She has extensive bony lymphoedema in her left arm
which does not seem to be getting any better although she is more or
less reconciled to the problem. The original problem was that she
complained of shooting pain in the direction of ulna nerve and although
there does not seem to be any evidence of local, regional or distant
recurrence the pain itself warrants management in a pain clinic. Mrs
Smith could be seen in the pain clinic at the Marsden but as this would
involve a lot of travelling would like to be treated nearer her home. I
wonder whether it would be possible for you to investigate if there is
a pain clinic available at Basingstoke as I am sure Dotty could be
treated and benefit from its management. I have otherwise arranged for
her to be seen in the clinic again in a year's time. There are no signs
of recurrence at this time.

Mr Thomas Partridge
5
????????????
NHS TRUST - PATIENT CASE NOTE

DOB 1944
CLEF-RMH-Entry-Key 52A4F6DB2B46E AB 1992
Seen in General Surgical

This lady who has had a mastectomy and left open capsulotomy and
removal of her prosthesis was seen by me in the clinic today on behalf
of XXXXXXXXXXX. She has extensive bony lymphoedema in her left arm
which does not seem to be getting any better although she is more or
less reconciled to the problem. The original problem was that she
complained of shooting pain in the direction of ulna nerve and although
there does not seem to be any evidence of local, regional or distant
recurrence the pain itself warrants management in a pain clinic.
XXXXXXXXX could be seen in the pain clinic at the XXXXXXX but as this
would involve a lot of travelling would like to be treated nearer her
home. I wonder whether it would be possible for you to investigate if
there is a pain clinic available at XXXXXXXXXXX as I am sure XXXXX
could be treated and benefit from its management. I have otherwise
arranged for her to be seen in the clinic again in a year's time. There
are no signs of recurrence at this time.

5213A4F612F1
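
In the de-identified note above, each identifying string appears to be replaced by a run of X characters of the same length (for example, "Mrs Smith" becomes nine X's). The following is a minimal Python sketch of that masking step only. It assumes the identifiers are already known; the function name `mask_identifiers` and the identifier list are hypothetical, and a real de-identification system would also have to find the identifiers and handle the header fields.

```python
import re

def mask_identifiers(text, identifiers):
    """Replace every known identifier with X's of the same length."""
    if not identifiers:
        return text
    # Longest first, so "Mrs Smith" is masked as a whole before "Smith" alone.
    ordered = sorted(identifiers, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(s) for s in ordered))
    return pattern.sub(lambda m: "X" * len(m.group()), text)

note = "Mrs Smith could be seen in the pain clinic at the Marsden"
# The identifier list is an assumption made for this illustration.
masked = mask_identifiers(note, ["Mrs Smith", "Marsden", "Smith"])
print(masked)
```

Keeping the mask the same length as the original string preserves character offsets, which helps if annotations are stored as offsets into the text.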
??????????????????????????
Interventions
Problems
Problem Site
Locations
Time
6
????????????

????????
????????
Interventions
Problems
Problem Site
Locations
Time
7
????

???????????? What happened, and why? What was done, and why?
????
caused_by
???????
8
????
?????????????
CLEF-RMH-Entry-Key 52A4F6DB2B46E
Maria Sklodowska-Curie
12.10.20 Coryza chest NAD reassure
13.10.20 URTI wheezy amoxycillin
20.10.20 Anxiety lump under arm staging scan
24.10.21 PEFR 300
10.11.21 PEFR 400 CXR requested
12.11.21 CXR Basal Consolidation erythromycin
27.11.21 Chest clear
07.03.30 Depression recurrence Paroxetine
19.04.30 WCC OK
01.06.31 rpt Rx paroxetine
18.10.31 Pain L arm coproxamol
03.03.31 Viral URTI PEFR 350 salbutamol
04.03.34 WCC Abnormal
30.05.34 BP, ECG NAD
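
The notes above are a run of terse entries, each starting with a dd.mm.yy date. As a rough illustration of the first step an extraction system might take with such text, the Python sketch below splits a note into (date, text) pairs. The function name `split_entries` and the date pattern are assumptions made for this example, not part of CLEF.

```python
import re

# Assumed date format: two digits, dot, two digits, dot, two digits.
DATE = re.compile(r"(\d{2}\.\d{2}\.\d{2})")

def split_entries(notes):
    """Split a run of dated notes into (date, text) pairs."""
    # With a capturing group, split() returns
    # [prefix, date1, text1, date2, text2, ...].
    parts = DATE.split(notes)
    return [(date, " ".join(text.split()))
            for date, text in zip(parts[1::2], parts[2::2])]

sample = "12.10.20 Coryza chest NAD reassure13.10.20 URTI wheezy amoxycillin"
entries = split_entries(sample)
for date, text in entries:
    print(date, "|", text)
```

Interpreting the abbreviations (NAD, URTI, PEFR, CXR) as problems, investigations and interventions is the harder part and is not attempted here.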
9
  • ?????????????????
  • Natural language processing (NLP)
  • Human language technology (HLT)
  • Computational linguistics (CL)
  • Knowledge engineering (KE)
  • Knowledge management (KM)
  • Semantic Web
  • Agent-based computing
  • Web intelligence

10
  • ??????????????(Knowledge Technologies)????
  • ????
  • ????
  • ????????
  • ???????
  • ????
  • ????
  • ????????
  • ????

11
  • ??????
  • ??KDD?Data Mining???????????(????????)????????
  • ????????(NLP)?????(Text Mining)?????????????????(?
    Word?HTML??PDF??)????????????????????????????????
    ,????????????????

12
?????????????????????
13
1. Information Extraction (IE)
  • ????(Information Extraction)??????,??????????????
    ???

14
1.????(IE)??????
  • Hamish Cunningham
  • Information Extraction (IE) is a technology based
    on analysing natural language in order to extract
    snippets of information.
  • ?????????/?????
  • ????????
  • ?????????????(??)
  • ????????????
  • ???????
  • ??????????????????
  • ???????,???????????

15
  • Douglas E. Appelt?
  • ???????????
  • ??????????(???)????????(??)?????????
  • ??????????????????????,??????????????????????

16
  • ?????????????????????,?????????

17
  • ??????????????????(?),??????(?)?

18
  • ?????????????
  • ?????
  • ????????????????????????????,?????????(bags of
    words),????????????????????????????????,?????????
    ????????????????
  • ????????????????,??????????????,??????????????,???
    ???????????????????

19
????
  1. ????(IE)??????
  2. ?????????????
  3. ??????????
  4. ??????????
  5. ???????????
  6. ???????????

20
2.?????????????
  • IE??????????????
  • MUC (Message Understanding Conference)
  • MET (Multilingual Entity Task Evaluation)
  • ACE (Automatic Content Extraction)
  • DUC (Document Understanding Conferences)
  • TDT ......

21
2.1MUC
  • MUC??IE,??TREC??IR
  • ??????MUC?Message Understanding
    Conference?Message Understanding Competition
  • 20??80??????????DARPA(Defense Advanced Research
    Projects Agency) ??

22
2.1MUC
  • MUC???????????????????,????????????,???????????
    ???????????
  • ????7?
  • ???MUC 1-2??????????????
  • 20??90?????MUC 3-7????????????,???????????????????
    ?????
  • MUC??????????????????????????????????????????

23
2.2MET
  • MET Multilingual Entity Task Evaluation
  • ??DARPA??????????
  • MET????????????????????????????????
  • MET-1?MET-2?????1996??1998???

24
2.3ACE
  • ACE (Automatic Content Extraction)
  • ????????????(NSA),???????????(NIST),???????(CIA)??
    ???
  • ??????????????
  • ?????????
  • ??ASR(???????)???????
  • ????OCR(??????)???????,
  • ????
  • ??????????????,????????????????????
  • ?????????????????,???????????

25
2.3ACE
  • ????5?
  • ACE Phase-1(1999.7-2000.12)?????????????(EDT,
    Entity Detection and Tracking) ?
  • ACE Phase2(2001-??)???EDT RDC???RDC?Relation
    Detection and Characterization?ACE????????????????
    ????,???????????,?????????????????????

26
2.4 DUC
  • DUC,Document Understanding Conferences
  • ??DARPA?TIDES (Translingual Information
    Detection, Extraction, and Summarization
    program)???????????????????????
  • ??2000?,?????DUC 01-06,DUC 2007??????
  • ??,??????????????????NIST????

27
2.5 TDT
28
????
  1. ????(IE)??????
  2. ?????????????
  3. ??????????
  4. ??????????
  5. ???????????
  6. ???????????

29
3.??????????
  • MUC??????????????????????????,????????????????
  • NE: Named Entity Recognition
  • MET: Multi-lingual Entity Task
  • TE: Template Element
  • CO: Coreference
  • TR: Template Relation
  • ST: Scenario Template

30
3.1 NE
  • NE(Named Entity Recognition)??????
  • ???????,??????????????????,?????????????
  • MUC??????????,???,??,??,??,????????????(?????,???)
    ,???????????????
  • NE???????????,?????????????????????,??????????????
    ?????????????,NE?????????????

31
3.1NE
  • The shiny red rocket was fired on Tuesday. It is
    the brainchild of Dr. Big Head. Dr. Head is a
    staff scientist at We Build Rockets Inc.
  • NE: entities are "rocket", "Tuesday", "Dr. Head"
    and "We Build Rockets"
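
To make the NE step concrete, here is a toy Python sketch that finds the four entities listed above by plain lookup in a small hand-written list. The entity type labels and the name `find_entities` are assumptions made for this illustration; real NE systems combine such lists with rules or statistical models.

```python
import re

TEXT = ("The shiny red rocket was fired on Tuesday. It is the brainchild of "
        "Dr. Big Head. Dr. Head is a staff scientist at We Build Rockets Inc.")

# Hypothetical lookup list; the type labels are illustrative only.
GAZETTEER = {
    "rocket": "Artifact",
    "Tuesday": "Date",
    "Dr. Head": "Person",
    "We Build Rockets": "Organization",
}

def find_entities(text, gazetteer):
    """Return (start offset, phrase, type) for every exact, case-sensitive match."""
    hits = []
    for phrase, etype in gazetteer.items():
        for match in re.finditer(re.escape(phrase), text):
            hits.append((match.start(), phrase, etype))
    return sorted(hits)

entities = find_entities(TEXT, GAZETTEER)
for start, phrase, etype in entities:
    print(start, phrase, etype)
```

Note that exact lookup does not find "Dr. Big Head"; linking that mention to "Dr. Head" is the job of coreference resolution.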

32
3.2 MET
  • MET(Multi-lingual Entity Task)?????????????
  • MET?????????????????,??????????????????,??????????
    ??????????????

33
3.3 TE
  • TE????(Template Element)????
  • TE????????????????????????????????????????????????
    ??????,?????????????????,??????

34
3.3 TE
  • ?MUC???,TE?????????????????????,??????????????????
    ??????
  • ??????????????,?????????????????????????????????

35
3.3 TE
  • The shiny red rocket was fired on Tuesday. It is
    the brainchild of Dr. Big Head. Dr. Head is a
    staff scientist at We Build Rockets Inc.
  • NE: entities are "rocket", "Tuesday", "Dr. Head"
    and "We Build Rockets"
  • TE: the rocket is "shiny red" and Dr. Head's
    brainchild.

36
3.4 CO
  • CO ??(Co-reference)????
  • CO?????NE?TE???,????????????????????
  • ??
  • ?????????????Tony Blair, the Prime Minister
  • ??????????????????

37
3.4 CO
  • ?MUC?,CO???????,?????????TE?ST(???)?????
  • CO????????????????????????????
  • ??
  • ???????????????
  • ?????????????????
  • ???????????
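
Coreference resolution groups mentions that refer to the same entity. As a toy Python sketch of one narrow case, name aliases, the code below treats two person mentions as the same if they share a title and a final name, which is enough to link "Dr. Head" and "Dr. Big Head" from the rocket example. The heuristic, the title list and the function names are assumptions for illustration; resolving pronouns such as "it" needs different machinery.

```python
# Hypothetical list of titles recognised by this sketch.
TITLES = {"Dr.", "Mr.", "Mrs.", "Ms."}

def name_key(mention):
    """Reduce a person mention to (title, last word)."""
    words = mention.split()
    title = words[0] if words[0] in TITLES else None
    return (title, words[-1])

def corefer(a, b):
    """Guess whether two person mentions name the same individual."""
    return name_key(a) == name_key(b)

print(corefer("Dr. Head", "Dr. Big Head"))
print(corefer("Dr. Head", "Mr. Head"))
```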

38
3.4 CO
  • The shiny red rocket was fired on Tuesday. It is
    the brainchild of Dr. Big Head. Dr. Head is a
    staff scientist at We Build Rockets Inc.
  • NE: entities are "rocket", "Tuesday", "Dr. Head"
    and "We Build Rockets"
  • TE: the rocket is "shiny red" and Head's
    "brainchild".
  • CO: "it" refers to the rocket; "Dr. Head" and
    "Dr. Big Head" are the same

39
3.5 TR
  • TR????(Template Relation)
  • TR???TE??????????????????
  • TR?MUC-7????????,????????????????
  • ??
  • ??????????(employee_of)
  • ????????????(product_of)
  • ????????????(location_of)
  • etc

40
3.5 TR
  • The shiny red rocket was fired on Tuesday. It is
    the brainchild of Dr. Big Head. Dr. Head is a
    staff scientist at We Build Rockets Inc.
  • NE: entities are "rocket", "Tuesday", "Dr. Head"
    and "We Build Rockets"
  • CO: "it" refers to the rocket; "Dr. Head" and
    "Dr. Big Head" are the same
  • TE: the rocket is "shiny red" and Head's
    "brainchild".
  • TR: Dr. Head works for We Build Rockets Inc.
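
A relation such as employee_of can be approximated with a surface pattern over the text. The Python sketch below is one hypothetical pattern ("PERSON is a ROLE at ORG Inc.") that happens to fit the rocket example; the regular expression and the function name `extract_employment` are assumptions for illustration and would not generalise far.

```python
import re

TEXT = ("The shiny red rocket was fired on Tuesday. It is the brainchild of "
        "Dr. Big Head. Dr. Head is a staff scientist at We Build Rockets Inc.")

# Hypothetical surface pattern: "<Dr. Name> is a/an <role> at <Capitalised Words> Inc."
EMPLOYMENT = re.compile(
    r"(?P<person>Dr\. [A-Z]\w+(?: [A-Z]\w+)*) is an? "
    r"(?P<role>[\w ]+?) at (?P<org>(?:[A-Z]\w+ )+Inc\.)"
)

def extract_employment(text):
    """Return (relation, person, organisation) triples matched by the pattern."""
    return [("employee_of", m.group("person"), m.group("org"))
            for m in EMPLOYMENT.finditer(text)]

relations = extract_employment(TEXT)
print(relations)
```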

41
3.6 ST
  • ST ????(Scenario Template)
  • ST???????????????????????????????????
  • ST?????????????????,??????????????????????,???????
    ????,?????????????

42
3.6 ST
  • The shiny red rocket was fired on Tuesday. It is
    the brainchild of Dr. Big Head. Dr. Head is a
    staff scientist at We Build Rockets Inc.
  • NE: entities are "rocket", "Tuesday", "Dr. Head"
    and "We Build Rockets"
  • CO: "it" refers to the rocket; "Dr. Head" and
    "Dr. Big Head" are the same
  • TE: the rocket is "shiny red" and Head's
    "brainchild".
  • TR: Dr. Head works for We Build Rockets Inc.
  • ST: a rocket launching event occurred with the
    various participants.
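
One way to picture how the five result types fit together is as nested records: entities (NE) carry attributes (TE) and aliases (CO), relations (TR) link entities, and a scenario template (ST) ties everything to an event. The Python sketch below fills such records by hand for the rocket example. The class names, slot names and type labels are all hypothetical and are not the MUC template definitions.

```python
from dataclasses import dataclass, field

@dataclass
class Entity:                       # NE: a named entity
    name: str
    etype: str
    attributes: list = field(default_factory=list)  # TE: descriptive attributes
    aliases: list = field(default_factory=list)     # CO: coreferent mentions

@dataclass
class LaunchEvent:                  # ST: a scenario template (slots are assumed)
    vehicle: Entity
    date: str
    designer: Entity
    organisation: Entity

rocket = Entity("rocket", "Artifact", attributes=["shiny red"], aliases=["it"])
head = Entity("Dr. Head", "Person", aliases=["Dr. Big Head"])
company = Entity("We Build Rockets Inc.", "Organization")

# TR: relations between entities, stored as simple triples.
relations = [("employee_of", head.name, company.name)]

event = LaunchEvent(vehicle=rocket, date="Tuesday",
                    designer=head, organisation=company)
print(event)
```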

43
????
  1. ???????(IE)
  2. ??????????
  3. ??????????
  4. ??????????
  5. ???????????
  6. ???????????

44
4 ??????????
  • 4.1GATE
  • 4.2????
  • 4.3????
  • 4.4????
  • 4.5????

45
4.1 GATE
  • GATE (General Architecture for Text Engineering)
  • 1995??,University of Sheffield
  • ?Java???????????
  • ?????Unicode
  • GATE?????????XML? RTF?Email?HTML?SGML???????

46
4.1 GATE
  • Gate?????,??????????????????
  • ?????
  • ????????
  • ???????

47
?????
  • ???????????????,???????????
  • Format detection
  • Tokenisation
  • Word segmentation
  • Sentence splitting
  • Part-of-speech (POS) tagging
  • ???????,?????????????????????,????????????,???????
    ??????????????????
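
Two of the preprocessing steps listed above, tokenisation and sentence splitting, can be sketched in a few lines of Python. This is a deliberately naive illustration and not how GATE implements them; the abbreviation list and both function names are assumptions.

```python
import re

# Hypothetical abbreviation list: a full stop after these does not end a sentence.
ABBREVIATIONS = {"Dr.", "Mr.", "Mrs.", "Inc."}

def tokenise(text):
    """Split text into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)

def split_sentences(text):
    """Split on sentence-final punctuation, skipping known abbreviations."""
    sentences, current = [], []
    for word in text.split():
        current.append(word)
        if word[-1] in ".!?" and word not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:  # flush a trailing fragment, e.g. text ending in "Inc."
        sentences.append(" ".join(current))
    return sentences

TEXT = ("The shiny red rocket was fired on Tuesday. It is the brainchild of "
        "Dr. Big Head. Dr. Head is a staff scientist at We Build Rockets Inc.")
sentences = split_sentences(TEXT)
tokens = tokenise(sentences[0])
print(sentences)
print(tokens)
```

Whitespace-based splitting like this does not carry over to Chinese, where word boundaries are not marked by spaces; that is why word segmentation appears as a separate step in the list.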

48
??????
  • ??????????????????,???????????????
  • ??????,??????????????????????????????,????????????
    ?????(?Ltd.??????)????????????
  • ????????????????ANNIE??,??JAPE (Java Annotation Patterns
    Engine) ?????,????????????????
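
The slide above mentions key words such as "Ltd." as cues in the lookup lists. As a rough Python analogue of a rule that uses such a cue (this is not JAPE syntax, and the pattern is an assumption made for illustration), the sketch below marks a run of capitalised words followed by a company designator as an organisation name.

```python
import re

# Hypothetical rule: one or more capitalised words followed by a company designator.
ORG_RULE = re.compile(r"(?:[A-Z][A-Za-z]*\s)+(?:Ltd\.|Inc\.)")

def find_organisations(text):
    """Return every span that matches the designator rule."""
    return ORG_RULE.findall(text)

orgs = find_organisations(
    "Dr. Head is a staff scientist at We Build Rockets Inc.")
print(orgs)
```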

49
????
  • ???????????????????????????,??????????????????????
    ?????????????,?????????

51
4.1 GATE
  • GATE???
  • 1)??????????????,??????????
  • 2)????????????????,??????????????????????????????
  • 3)????????????????????????????,??????????????????
    ?

53
4.1 GATE
  • GATE?????
  • ?????????????????????E-science????????????????????
    ??????????????????? ?E-science????????????????????

54
4 ??????????
  • ??GATE??,????IE???IE??
  • KIM
  • ArtEquAKT
  • Amilcare
  • Armadillo
  • BioRAT
  • ANP(Arizona Noun Phraser)
  • DELOS WP5 Knowledge Extraction and Semantic
    Interoperability
  • TAKE Toolkit for Agent-based Knowledge
    Extraction
  • SKIFA Distributed Knowledge Extraction Framework
    Based on Semantic Web Services
  • BioMeKe BioMedical Knowledge Extraction project

55
????
  1. ???????(IE)
  2. ??????????
  3. ??????????
  4. ??????????
  5. ???????????

56
5.???????????
  • ??????????
  • GATE????????????
  • ?GATE?????,????????????????????,??????????????

57
5.???????????
  • ??????????

58
5.??????????
  • ?????????????
  • Chinese tokenizing
  • Chinese gazetteers
  • Chinese named entity recognition

59
???????
[Diagram; recoverable labels: Chi Tokenizing, Chi Gazetteer, Chi Rules, Chi IE]
61
?????
  • ?????????
  • ???ICTCLAS???(C)??
  • ??????HMM(????????)
  • ??????N?????????
  • ????
  • Utility           ??????
  • Unknown      ????????
  • Tag              HMM????
  • Segment       ??????
  • Result          ??????
  • Data            ??????
  • res              Windows?????
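
The slide above names HMM as the basis of the segmenter but the transcript does not show how it is applied. For readers unfamiliar with HMM decoding, the following Python sketch shows generic Viterbi decoding on a toy two-state model. It is not ICTCLAS code; the states, observations and all probabilities are invented for the example.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence for the observations."""
    # best[t][s] = (probability of the best path ending in s at step t, previous state)
    best = [{s: (start_p[s] * emit_p[s].get(obs[0], 0.0), None) for s in states}]
    for o in obs[1:]:
        row = {}
        for s in states:
            prob, prev = max((best[-1][p][0] * trans_p[p][s], p) for p in states)
            row[s] = (prob * emit_p[s].get(o, 0.0), prev)
        best.append(row)
    # Backtrack from the most probable final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for row in reversed(best[1:]):
        state = row[state][1]
        path.append(state)
    return path[::-1]

# Toy model: every number below is made up for illustration.
states = ("Noun", "Verb")
start_p = {"Noun": 0.8, "Verb": 0.2}
trans_p = {"Noun": {"Noun": 0.3, "Verb": 0.7},
           "Verb": {"Noun": 0.6, "Verb": 0.4}}
emit_p = {"Noun": {"rocket": 0.6, "fired": 0.1},
          "Verb": {"rocket": 0.1, "fired": 0.7}}

tags = viterbi(["rocket", "fired"], states, start_p, trans_p, emit_p)
print(tags)
```

In a segmenter or tagger the hidden states would be segmentation or part-of-speech labels and the probabilities would be estimated from an annotated corpus.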

63
3.3 ????????
??????
  • ????????????
  • ??????(???????????????????????)?125M

64
3.3 ????????
  • ???95?????,74?????(?????????????????????)
  • ??30??????????(txt?SQL)

65
3.3 ????????
??????? ??
???? 1968
???? 401
???? 2610
?? 455
??????? 1505
???? 257
???? 156
?????? 112
???? 1443
??????? ??
???? 1033
?? 874
?? 5815
?? 4377
???? 1211
?95?????
66
3.3 ????????
??????? ??
???? 110
????? 1309
????? 140
?????? 1241
?????? 288
?????? 147
???? 222
??? 2189
?????? 1003
??????? ??
???? 331
?? 416
????? 210
?? 654
??????? 912
????30????????????37??????????74?????
67
??????
  • GATE??????????????,??JAPE??????
  • ?????????JAPE??

68
??JAPE??
69
??JAPE??
70
?????
71
?????????????
  • ?????
  • ????? ??
  • ??? ??
  • ??

72
  • ?????
  • ?????
  • ???????????? ??

73
??????
  • ??
  • ???????
  • 1.???????yahoo
  • 2.??????
  • 3.???????

74
???????????
  • 1.???????google baidu yahoo
  • ??????
  • 2.????????????????????
  • ???????????????

75
????????
76
???????
77
??????????