XG Multimedia Semantic News Use Case - PowerPoint PPT Presentation

1
XG Multimedia Semantic News Use Case
  • Thierry Declerck, DFKI GmbH
  • Language Technology Lab

2
Automatic Semantic Analysis of Metadata Associated
with News Videos of Broadcasting Companies
  • On-going work in the projects K-Space and MESH

3
Metadata of News Broadcasters
  • We analysed the metadata available from various
    broadcasters.
  • Their data consists of audio/video material and
    textual metadata. This is a very valuable data
    set, since the textual metadata also includes
    manually annotated scene descriptions.
  • This data set can be used to build a training
    corpus for the automated alignment of video, audio
    and text data.
  • The next slides show an abstraction over the
    various types of metadata provided.

4
The Metadata Labels
  • <DOC filename=0324000-3_Journal_ENG_F4001C_26122003_2000>
  • <TYPE>Earthquake Iran</TYPE>
  • <SERIES>Journal F 4001 C</SERIES>
  • <SEG sid=integer>
  • <TITLE></TITLE>
  • <DESCRIPTION></DESCRIPTION>
  • <SCENES> </SCENES>
  • <KEYWORDS></KEYWORDS>
  • </SEG>
  • </DOC>
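
A minimal sketch of how such a record could be read, assuming a well-formed rendering of the schematic template above (attribute values quoted, `sid="1"` chosen for illustration) and filled with values from the following slides. The function name `read_record` and the output dictionary layout are assumptions of this sketch, not part of the broadcasters' format.

```python
import xml.etree.ElementTree as ET

# Assumed well-formed version of the schematic record; the quoting of
# attribute values and the concrete sid values are illustrative only.
RECORD = """\
<DOC filename="0324000-3_Journal_ENG_F4001C_26122003_2000">
  <TYPE>Earthquake Iran</TYPE>
  <SERIES>Journal F 4001 C</SERIES>
  <SEG sid="1">
    <TITLE>TdT Erdbeben /Iran/Zerstörungen in Bam/Trümmer/Ruinen/Opfer</TITLE>
    <DESCRIPTION>Ein schweres Erdbeben hat im Iran die Stadt Bam fast völlig zerstört.</DESCRIPTION>
    <SCENES>
      <SCENE sid="3">Schuttberge</SCENE>
      <SCENE sid="4">zerstörte Häuser</SCENE>
    </SCENES>
    <KEYWORDS>Naher Osten Iran Erdbeben</KEYWORDS>
  </SEG>
</DOC>
"""


def read_record(xml_text):
    """Return the document-level labels and one dictionary per segment."""
    root = ET.fromstring(xml_text)
    segments = []
    for seg in root.iter("SEG"):
        segments.append({
            "sid": seg.get("sid"),
            "title": (seg.findtext("TITLE") or "").strip(),
            "description": (seg.findtext("DESCRIPTION") or "").strip(),
            "scenes": [(s.get("sid"), (s.text or "").strip())
                       for s in seg.iter("SCENE")],
            "keywords": (seg.findtext("KEYWORDS") or "").strip(),
        })
    return {
        "filename": root.get("filename"),
        "type": root.findtext("TYPE"),
        "series": root.findtext("SERIES"),
        "segments": segments,
    }


record = read_record(RECORD)
```

Keeping the scene identifiers next to the scene text preserves the link that a later video/text alignment step would need.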

5
The Title Tag
  • <TITLE>
  • TdT Erdbeben /Iran/Zerstörungen in
    Bam/Trümmer/Ruinen/Opfer
    (earthquake / Iran / destruction in Bam / rubble /
    ruins / victims)
  • </TITLE>
  • Extract: Erdbeben (keyword for the disaster
    ontology) and the location Iran (via named-entity
    detection). The role of the other terms is still
    unclear.

6
The Description Tag
  • <DESCRIPTION>
  • Ein schweres Erdbeben hat im Iran die Stadt Bam
    fast völlig zerstört. (A severe earthquake has
    almost completely destroyed the city of Bam in
    Iran.)
  • </DESCRIPTION>
  • Linguistic and semantic analysis:
  • [Subj-NP Ein schweres <noun-disaster>Erdbeben</noun-disaster>]
    [V hat] [LOC-PP im <ne-country>Iran</ne-country>]
    [OBJ-NP die Stadt <ne-city>Bam</ne-city>]
    [ADV fast völlig] [V zerstört].
  • Extraction:
  • Who (causation): Erdbeben (earthquake)
  • What_action: zerstören (destroy)
  • What: Stadt Bam (city of Bam). Here the system
    can infer that Bam is located in Iran.
  • Where: Iran
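
The extraction step can be illustrated with a small slot-filling sketch. It assumes the analysis is available as a string in which each constituent is bracketed with its label; this bracket notation, the function name `fill_slots` and the slot names are assumptions of the sketch, not the output format of the actual system. The verb is kept in its surface form, since lemmatization (zerstört to zerstören) is not modelled here.

```python
import re

# Assumed serialisation of the analysis: [LABEL text with <tag>inline</tag> marks]
ANNOTATED = (
    "[Subj-NP Ein schweres <noun-disaster>Erdbeben</noun-disaster>] "
    "[V hat] [LOC-PP im <ne-country>Iran</ne-country>] "
    "[OBJ-NP die Stadt <ne-city>Bam</ne-city>] "
    "[ADV fast völlig] [V zerstört]"
)

CHUNK = re.compile(r"\[(\S+) ([^\]]*)\]")      # (label, body) of each chunk
TAG = re.compile(r"<([\w-]+)>(.*?)</\1>")      # (tag, text) of each inline tag


def fill_slots(annotated):
    """Map labelled chunks to who / what_action / what / where slots."""
    slots, entities = {}, {}
    for label, body in CHUNK.findall(annotated):
        plain = TAG.sub(r"\2", body)           # chunk text without inline tags
        tagged = dict(TAG.findall(body))       # e.g. {"ne-country": "Iran"}
        entities.update(tagged)
        if label == "Subj-NP":
            slots["who"] = tagged.get("noun-disaster", plain)
        elif label == "V":
            slots["what_action"] = plain       # the last verb chunk wins
        elif label == "OBJ-NP":
            slots["what"] = plain
        elif label == "LOC-PP":
            slots["where"] = tagged.get("ne-country", plain)
    # A city in the object and a country in the locative phrase of the
    # same sentence license the located_in inference mentioned above.
    if "ne-city" in entities and "ne-country" in entities:
        slots["inferred"] = (entities["ne-city"], "located_in",
                             entities["ne-country"])
    return slots


slots = fill_slots(ANNOTATED)
```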

7
The Scenes Tag
  • <SCENES>
  • <SCENE sid="1">Bam Menschen sitzen zwischen
    Trümmern auf Boden</SCENE>
  • <SCENE sid="2">verzweifelte Menschen sitzen am
    Strassenrand</SCENE>
  • <SCENE sid="3">Schuttberge</SCENE>
  • <SCENE sid="4">zerstörte Häuser</SCENE>
  • <SCENE sid="5">rauchende Trümmer</SCENE>
  • </SCENES>
  • Descriptions of the sequences of images displayed.
    Extracting related entities: people among ruins,
    desperate people, destroyed houses, smoking ruins,
    etc. All of these terms can be seen as consequences
    of the earthquake. Importantly, they also describe
    what can be seen in the video.

8
The Keywords Tag
  • <KEYWORDS>
  • Naher Osten Iran Erdbeben
  • </KEYWORDS>
  • The pattern of this tag's content (region, then
    country, then event) allows us to infer that Iran
    is located in the Near East.
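
A minimal sketch of this inference, assuming the keyword string starts with a known region followed by a known country. The two gazetteers and the function name `infer_located_in` are hypothetical and only large enough for this example.

```python
# Hypothetical gazetteers; a real system would use much larger resources.
REGIONS = ("Naher Osten",)
COUNTRIES = ("Iran",)


def infer_located_in(keywords):
    """Return (country, 'located_in', region) if the pattern matches."""
    text = " ".join(keywords.split())          # normalise whitespace
    for region in REGIONS:
        if not text.startswith(region + " "):
            continue
        rest = text[len(region) + 1:]
        for country in COUNTRIES:
            if rest == country or rest.startswith(country + " "):
                return (country, "located_in", region)
    return None


relation = infer_located_in("Naher Osten Iran Erdbeben")
```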

9
Linguistic Knowledge Structures
  • Multiple layers and levels
  • Low-level linguistic features (tokenization,
    morphology, ...)
  • Semantic properties of terms and phrases
  • Named Entities
  • Relation Extraction (incl. Grammatical Relations)
  • Semantic linking to domain ontologies
  • Can involve several abstraction layers connected
    through reasoning/mapping processes
  • Semantic linking to other media analysis
  • Associated with the domain ontology of MESH
    (natural disasters in the news)

10
Association of Ontologies
11
Semantic annotation of Text extracted from Images
  • (Thierry Declerck, DFKI; Andreas Cobet, TUB)

12
Background
  • The data: the German broadcast news programme
    Tagesthemen
  • Extract text from the key frames of shots and
    annotate the extracted terms semantically
  • Analysis of the position of the text and of the
    kind of text extracted. Six cases detected so far.

13
Case 1: Above the picture, just a normal phrase,
mostly a nominal phrase (NP)
14
Case 2: Below the picture, name of a person and
of the function of this person
15
Case 3: Below the picture, name of a person and
of a city/country
16
Case 4: Above the picture, just a normal nominal
phrase, and below the picture, name of a person,
17
Case 5: Below the picture, the word Bericht (or
similar) and name of a person (=> journalist)
18
Case 6: A location name. No picture of a
specific human
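
The six cases can be read as a small rule table. The sketch below assumes that an earlier step has already labelled the text found above and below the picture with the labels `NP`, `PERSON`, `FUNCTION`, `LOCATION` and `REPORT_WORD`; these labels and the function `classify_overlay` are assumptions of the sketch. The condition "no picture of a specific human" in case 6 depends on image analysis and is not modelled.

```python
def classify_overlay(above=(), below=()):
    """Return the case number (1-6) for a key frame, or None if no rule fits.

    `above` and `below` are collections of labels for the text found
    above and below the picture.
    """
    above, below = set(above), set(below)
    if above == {"NP"} and not below:
        return 1                                # plain nominal phrase on top
    if not above and below == {"PERSON", "FUNCTION"}:
        return 2                                # person plus function
    if not above and below == {"PERSON", "LOCATION"}:
        return 3                                # person plus city/country
    if above == {"NP"} and "PERSON" in below:
        return 4                                # phrase on top, person below
    if not above and below == {"REPORT_WORD", "PERSON"}:
        return 5                                # "Bericht" plus journalist
    if (above | below) == {"LOCATION"}:
        return 6                                # location name only
    return None


case = classify_overlay(below={"REPORT_WORD", "PERSON"})
```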
19
Cross-Media Ontologies
  • The next slides are courtesy of Paul Buitelaar,
    Michael Sintek and Malte Kiesel (DFKI GmbH), from
    the SmartWeb project. Paper: "Feature
    Representation for Cross-Lingual, Cross-Media
    Semantic Web Applications", presented at ESWC 2006.

20
Semiotic Triangle
  • See (Ogden & Richards, 1923), based on:
  • Structural linguistics (de Saussure, 1916)
  • Philosophical work by Peirce (mostly 19th
    century)

21
Semiotic Triangle: the real world
... actual goalkeepers in the real world ...
22
Semiotic Triangle: concepts
... actual goalkeepers in the real world ...
23
Semiotic Triangle: words
... actual goalkeepers in the real world ...
goalkeeper (EN) Torwart (DE) doelman (NL) ...
24
Semiotic Triangle: images
... actual goalkeepers in the real world ...
25
Features
  • Multilingual Features
  • Terms with Linguistic Info and Context Models
  • Example: goalkeeper
  • part-of-speech: noun
  • morphology: goal-keeper
  • context (Google hit statistics): gets: 420000,
    holds: 212000, shoots: 55900, ...
  • Multimedia Features
  • Images with Feature Models
  • Example: goalkeeper
  • color: 111111
  • shape: human
  • texture: keypatch-set 223

26
Representation Proposal
  • Attach multilingual and multimedia features to
    classes and properties (and also instances)
  • use of meta-classes ClassWithFeats and
    PropertyWithFeats with properties lingFeat and
    imgFeat (with ranges LingFeat and ImgFeat)
  • The classes LingFeat and ImgFeat are used for
    complex feature descriptions

[Diagram: the meta-classes feat:ClassWithFeats and
feat:PropertyWithFeats (rdfs:subClassOf rdfs:Class
and rdfs:Property), each with the properties
feat:lingFeat and feat:imgFeat; the classes
if:ImgFeat (if:color, if:texture) and lf:LingFeat
(lf:term, lf:lang)]
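
The proposal can be pictured with plain Python data classes, using the goalkeeper example from the Features slide. This is only an illustration of the structure (a class carrying linguistic and image feature descriptions), not an RDF/OWL encoding; the Python attribute names, the `terms` helper and the class name `Goalkeeper` are assumptions of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class LingFeat:
    """Linguistic feature description (cf. lf:term, lf:lang)."""
    term: str
    lang: str
    pos: str = ""


@dataclass
class ImgFeat:
    """Image feature description (cf. if:color, if:texture)."""
    color: str
    texture: str
    shape: str = ""


@dataclass
class ClassWithFeats:
    """An ontology class with attached feature descriptions."""
    name: str
    ling_feats: list = field(default_factory=list)
    img_feats: list = field(default_factory=list)

    def terms(self, lang):
        """All terms attached to this class for one language."""
        return [f.term for f in self.ling_feats if f.lang == lang]


goalkeeper = ClassWithFeats(
    name="Goalkeeper",
    ling_feats=[
        LingFeat("goalkeeper", "en", pos="noun"),
        LingFeat("Torwart", "de", pos="noun"),
        LingFeat("doelman", "nl", pos="noun"),
    ],
    img_feats=[ImgFeat(color="111111", texture="keypatch-set 223",
                       shape="human")],
)
```

Attaching both kinds of features to the same class is what lets a term found in text and a region found in an image point to the same concept.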
27
Representation: Simplified Example
28
Representation: LingInfo Ontology
[Diagram: is-a hierarchy of the LingInfo classes]
29
Example Instance: Fußballspielers (of the
football player)
30
Features: Interacting Layers
31
Translating XBRL Into Description Logic
  • Thierry Declerck and Hans-Ulrich Krieger
  • DFKI GmbH

32
Motivation
  • Towards large, intelligent, web-based financial
    information and decision support systems in the
    MUSING project
  • So far, a prototype based on XBRL (eXtensible
    Business Reporting Language), as developed within
    the eTEN project WINS
  • There we experienced the limitations of the XBRL
    schema, due to the lack of reasoning support over
    XML-based data and information extracted from
    documents.
  • Hence the need to translate XBRL into an ontology

33
XBRL Example: Header Metadata
  • <?xml version="1.0" encoding="iso-8859-1"
    standalone="no"?>
  • <group xmlns="http://www.xbrl.org/2001/instance"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:t="http://www.xbrl.org/german/ap/ci/2002-02-15"
    xmlns:ISO4217="http://www.iso.org/4217"
    xsi:schemaLocation="http://www.xbrl.org/german/ap/ci/2002-02-15 german_ap.xsd">
  • <numericContext id="cn0" precision="8"
    cwa="false">
  • <entity>
  • <identifier scheme="http://www.xbrl.de/xbrl/sg2">001</identifier>
  • </entity>
  • <period>
  • <startDate>2001-01-01</startDate>
  • <endDate>2001-12-31</endDate>
  • </period>
  • <unit>
  • <measure>ISO4217:EUR</measure>
  • </unit>
  • ...
  • </nonNumericContext>
  • ...
  • <t:genInfo.doc.author nonNumericContext="c2">XBRL
    Deutschland e.V.</t:genInfo.doc.author>
  • <t:genInfo.doc.author.city nonNumericContext="c2">Düsseldorf</t:genInfo.doc.author.city>
  • <t:genInfo.doc.author.compName
    nonNumericContext="c2"/>

34
XBRL: Financial Data
  • ...
  • <t:bs.ass numericContext="cn0">1338066</t:bs.ass>
  • <t:bs.ass.accountingConvenience
    numericContext="cn0">0</t:bs.ass.accountingConvenience>
  • <t:bs.ass.accountingConvenience.changeDem2Eur
    numericContext="cn0">0</t:bs.ass.accountingConvenience.changeDem2Eur>
  • <t:bs.ass.accountingConvenience.startUpCost
    numericContext="cn0">0</t:bs.ass.accountingConvenience.startUpCost>
  • <t:bs.ass.currAss numericContext="cn0">749385</t:bs.ass.currAss>
  • <t:bs.ass.currAss.cashEquiv
    numericContext="cn0">259760</t:bs.ass.currAss.cashEquiv>
  • <t:bs.ass.currAss.inventory
    numericContext="cn0">209343</t:bs.ass.currAss.inventory>
  • <t:bs.ass.currAss.inventory.advPaymPaid
    numericContext="cn0">0</t:bs.ass.currAss.inventory.advPaymPaid>
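
A minimal sketch of reading such facts, assuming a trimmed, well-formed fragment of the instance above that declares only the `t` namespace. Reading the dotted element names as a hierarchy, where the parent is the name without its last segment, is an assumption of this sketch; the functions `read_facts` and `parent_of` are illustrative.

```python
import xml.etree.ElementTree as ET

NS_T = "http://www.xbrl.org/german/ap/ci/2002-02-15"

# Trimmed fragment; the real instance declares further namespaces and contexts.
FRAGMENT = f"""\
<group xmlns:t="{NS_T}">
  <t:bs.ass numericContext="cn0">1338066</t:bs.ass>
  <t:bs.ass.currAss numericContext="cn0">749385</t:bs.ass.currAss>
  <t:bs.ass.currAss.cashEquiv numericContext="cn0">259760</t:bs.ass.currAss.cashEquiv>
  <t:bs.ass.currAss.inventory numericContext="cn0">209343</t:bs.ass.currAss.inventory>
</group>
"""


def read_facts(xml_text):
    """Map local element names in the t namespace to integer values."""
    facts = {}
    for el in ET.fromstring(xml_text):
        ns, _, local = el.tag.rpartition("}")   # tag looks like "{uri}local"
        if ns == "{" + NS_T and el.text:
            facts[local] = int(el.text)
    return facts


def parent_of(name):
    """Parent in the dotted hierarchy, or None for a top-level name."""
    head, sep, _ = name.rpartition(".")
    return head if sep else None


facts = read_facts(FRAGMENT)
has_part = sorted((parent_of(n), n) for n in facts if parent_of(n))
```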

35
XBRL to OWL-XBRL
  • XBRL taxonomies make use of XML in order to
    describe the structure of an XBRL document as
    well as to define new datatypes and properties
    relevant to XBRL.
  • This makes it possible to check whether a concrete
    (business) document conforms to the syntactic
    structure defined in the schema.
  • But there is a need for languages and tools that
    go beyond the syntactic expressive power of XML
    Schema.
  • OWL, the Web Ontology Language, is the new
    emerging language for the Semantic Web; it
    originates from the DAML+OIL standardization. OWL
    still makes use of constructs from RDF and RDFS,
    such as rdf:resource, rdfs:subClassOf, or
    rdfs:domain.
  • Two important variants, OWL Lite and OWL DL,
    restrict the expressive power of RDFS, thereby
    ensuring decidability.
  • What makes OWL unique (compared to RDFS or
    even XML Schema) is that it can describe
    resources in more detail and that it comes with a
    well-defined model-theoretic semantics,
    inherited from description logic.

36
Actual Experiment with the Sesame DB
  • The basic idea during our (manual) effort was
    that even though we are developing an XBRL
    taxonomy in OWL using Protégé, the information
    that is stored on disk is still RDF at the
    syntactic level. We were thus interested in RDF
    database systems which make sense of the
    semantics of OWL and RDFS constructs such as
    rdfs:subClassOf or owl:equivalentClass.
  • Current experiment with Sesame, an open-source
    middleware framework for storing and retrieving
    RDF data. Sesame partially supports the semantics
    of RDFS and OWL constructs via entailment rules
    that compute "missing" RDF triples (the deductive
    closure).
  • From an RDF point of view, an additional 62,598
    triples were generated through Sesame's deductive
    closure.

37
Example of an Entailment Rule: hasPart Relation
  • <rule name="owl-transitiveProp">
  • <!-- note: ?p, ?x, ?y, and ?z are variables -->
  • <premise>
  • <subject var="?p"/>
  • <predicate uri="rdf:type"/>
  • <object uri="owl:TransitiveProperty"/>
  • </premise>
  • <premise>
  • <subject var="?x"/>
  • <predicate var="?p"/>
  • <object var="?y"/>
  • </premise>
  • <premise>
  • <subject var="?y"/>
  • <predicate var="?p"/>
  • <object var="?z"/>
  • </premise>
  • <consequent>
  • <subject var="?x"/>

38
A Concrete Example of a Deduced Relation
  • Since we have classified hasPart (as well as
    partOf) as a transitive OWL property, the rule on
    the previous slide will fire, making implicit
    knowledge explicit and producing new triples such
    as
  • <t_bs, hasPart, t_bs.ass.defTax>
  • although only
  • <t_bs, hasPart, t_bs.ass>
  • <t_bs.ass, hasPart, t_bs.ass.defTax>
  • can be found in the original XBRL specification.
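
The effect of the transitivity rule can be reproduced with a few lines of Python. This is a naive fixed-point computation over a set of triples, written only to illustrate the deduction above; it is not how Sesame implements its inferencer, and the function name `deductive_closure` is an assumption of this sketch.

```python
RDF_TYPE = "rdf:type"
OWL_TRANSITIVE = "owl:TransitiveProperty"


def deductive_closure(triples):
    """Apply the transitivity rule until no new triple is produced."""
    closed = set(triples)
    while True:
        # Properties ?p declared as owl:TransitiveProperty (first premise).
        transitive = {s for (s, p, o) in closed
                      if p == RDF_TYPE and o == OWL_TRANSITIVE}
        # (?x ?p ?y) and (?y ?p ?z) yield (?x ?p ?z).
        new = {(x, p, z)
               for (x, p, y) in closed if p in transitive
               for (y2, p2, z) in closed if p2 == p and y2 == y}
        if new <= closed:
            return closed
        closed |= new


base = {
    ("hasPart", RDF_TYPE, OWL_TRANSITIVE),
    ("t_bs", "hasPart", "t_bs.ass"),
    ("t_bs.ass", "hasPart", "t_bs.ass.defTax"),
}
closure = deductive_closure(base)
deduced = closure - base
```

Here `deduced` holds exactly the triples that were not stated explicitly, which corresponds to the "additional triples" counted for the full taxonomy.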

39
Translating the Base Taxonomy
  • In the GermanAP Commercial and Industrial (German
    Accounting Principles) taxonomy
    (http://www.xbrl-deutschland.de/xe news2.htm),
    the file xbrl-instance.xsd specifies the XBRL
    base taxonomy using XML Schema. It makes use of
    XML Schema datatypes, such as xsd:string or
    xsd:date, but also defines simple types
    (simpleType), complex types (complexType),
    elements (element), and attributes (attribute).
    Element and attribute declarations are used to
    restrict the usage of elements and attributes in
    XBRL XML documents.
  • Since OWL only knows the distinction between
    classes and properties, the correspondence
    between XBRL and OWL description primitives is
    not a one-to-one mapping.

40
Business Intelligence in MUSING
  • Next-generation Business Intelligence: the MUSING
    European R&D project (MUlti-industry,
    Semantic-based next generation business
    INtelliGence). Towards a new generation of
    Business Intelligence (BI) tools and modules
    founded on semantic-based knowledge and content
    systems, enhancing the technological foundations
    of knowledge acquisition and reasoning in BI
    applications.

41
Application Domains in MUSING
  • The breakthrough impact of MUSING on
    semantic-based BI will be measured in three
    strategic, vertical domains:
  • Finance, through development and validation of
    next-generation (Basel II and beyond)
    semantic-based BI solutions, with particular
    reference to Credit Risk Management
  • Internationalisation (i.e., the process that
    allows an enterprise to evolve its business from
    a local to an international dimension, here
    expressly focusing on the information acquisition
    work concerning international partnerships,
    contracts, investments), through development and
    validation of next-generation semantic-based
    internationalisation platforms
  • Operational Risk Management, through development
    and validation of semantic-driven knowledge
    systems for measurement and mitigation tools,
    with particular reference to IT operational risks
    faced by IT-intensive organisations.

42
Processing of Quantitative Data
  • Typical input: financial reports in PDF

43
PDF to XBRL (OWL-XBRL)
  • Mapping from PDF to HTML/XML
  • Detection in the HTML/XML of relevant layout
    information that helps in reconstructing the
    logical units of the original PDF documents
    (title, header/footer, footnotes, tables, free
    text)
  • Mapping of terms found in the XML version of the
    document to XBRL labels, disambiguating where
    needed.
  • Checking whether all the lines of the PDF
    documents are XBRL compliant. Non-compliant
    information is saved in a log file. Towards an
    XBRL checker for balance sheets delivered in
    proprietary formats.
  • Generation of the results of the PDFtoXBRL
    procedure in a multilingual setting
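
The term-mapping and compliance-checking steps can be sketched as follows, assuming the balance-sheet lines have already been recovered as plain text of the form "label amount". The label table, the English label wording, the line format and the function `map_lines` are all hypothetical; only the element names are taken from the XBRL example earlier in the deck. Amounts are assumed to be whole numbers, so separators are simply dropped.

```python
import re

# Hypothetical table from report terms to XBRL element names.
LABELS = {
    "total assets": "t:bs.ass",
    "current assets": "t:bs.ass.currAss",
    "cash and cash equivalents": "t:bs.ass.currAss.cashEquiv",
    "inventories": "t:bs.ass.currAss.inventory",
}

# A line is a term followed by one amount, e.g. "Current assets 749,385".
LINE = re.compile(r"^\s*(?P<term>.+?)\s+(?P<value>-?\d[\d.,]*)\s*$")


def map_lines(lines):
    """Return (facts, log): mapped amounts and the lines that did not map."""
    facts, log = {}, []
    for line in lines:
        match = LINE.match(line)
        element = LABELS.get(match.group("term").lower()) if match else None
        if element is None:
            log.append(line)                    # non-compliant: keep for review
            continue
        # Whole-number amounts assumed: strip thousands separators.
        facts[element] = int(re.sub(r"[.,]", "", match.group("value")))
    return facts, log


facts_from_pdf, unmapped = map_lines([
    "Total assets 1,338,066",
    "Current assets 749,385",
    "Goodwill adjustments 12",
    "Notes to the accounts",
])
```

Returning the unmapped lines instead of discarding them mirrors the log file mentioned above.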

44
Processing of Qualitative Data
  • Example: "TURNOVER, INCOME, GROWTH: State of
    revenues, if depurated from sales related to
    Consip contract award, which remarkably affected
    the turnover in 2003, would have, on the
    contrary, recorded an increase of 3,23 against
    that microinformatics market which recorded an
    increase of 3,2 (Sirmi, january 2005)."
  • Task: identifying relevant expressions and
    classifying them

45
Integration of Data
  • The challenge: merging data and information
    extracted from various types of documents, also
    in various languages, and, in the XG use case,
    especially integrating information from news
    wires.