Building Data Mining Controls - PowerPoint PPT Presentation

Slides: 13
Provided by: xd

1
Building Data Mining Controls
  • Xiaodong Li

2
Definitions
  • eLAB: combination of informatics and high-throughput functional analysis
  • dWEB: web-based portal for managing research projects, communication, data sharing and resource usage
  • iGENE: a platform that helps integrate existing algorithms into desktop tools
  • User controls (controls): reusable toolboxes with a GUI for building software applications

3
iGENE implementation
  • Data manipulation controls
  • Data mining controls
  • Data visualization controls
    • Pathway visualization controls
    • Sequence visualization controls
    • Array visualization controls
    • Chart engine
  • Data computation libraries
    • Linear algebraic equations
    • Interpolation and extrapolation
    • Integration
    • Evaluation
    • Root-finding and nonlinear sets of equations
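The computation libraries are only named on the slide. As a rough illustration of two of the listed categories, here is a minimal Python sketch of root-finding by bisection and integration by the composite trapezoidal rule. The function names `bisect` and `trapezoid` are hypothetical, and this is not the iGENE code.

```python
from typing import Callable


def bisect(f: Callable[[float], float], lo: float, hi: float,
           tol: float = 1e-10, max_iter: int = 200) -> float:
    """Root-finding by bisection; f(lo) and f(hi) must have opposite signs."""
    f_lo, f_hi = f(lo), f(hi)
    if f_lo == 0:
        return lo
    if f_hi == 0:
        return hi
    if f_lo * f_hi > 0:
        raise ValueError("f(lo) and f(hi) must bracket a root")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_mid == 0 or (hi - lo) / 2 < tol:
            return mid
        # Keep the half-interval whose endpoints still differ in sign.
        if f_lo * f_mid < 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


def trapezoid(f: Callable[[float], float], a: float, b: float, n: int = 1000) -> float:
    """Integration by the composite trapezoidal rule with n equal panels."""
    if n < 1:
        raise ValueError("n must be at least 1")
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


if __name__ == "__main__":
    print(bisect(lambda x: x * x - 2, 0.0, 2.0))  # approximately sqrt(2)
    print(trapezoid(lambda x: x * x, 0.0, 1.0))   # approximately 1/3
```

Bisection is slow but always converges once a sign change is bracketed. This robustness is the usual reason to offer it next to faster methods such as Newton's.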

4
Data mining
  [Diagram of the data mining controls. Labels: Access, Split, Filter, Restriction, MySQL, Partition, Merge, WebQuery, Oracle, Compare, System, Text file, MSSQL, Join, Extractor, Select, Module, DBI, Load, Excel file, Transfer, Library, Delimited, XML file, Database, File, Tagged, Web Crawler, Sub-control, Regex, iMiner, Text Parser, iWorm, DTE]
  http://10.112.64.92/OncologyMicroarrayCore/Bioinformatics/eLAB_RoseDesign.html

5
Information retrieval from free text
  • Features
    • Support extracting data from multiple files with different schemas
    • Support nested parsing up to five levels deep
    • Support regular-expression, delimited and tagged extraction algorithms
    • Automatically generate parent-to-child relationships and one-to-many references for the database schema

6
Information retrieval from free text
  • Application
    • Upload genome data into a database
  • Example
    • Loading human/mouse/rat UniGene data, including sequences and annotations, in 5 hours, automatically generating the mapping between UniGene IDs and GenBank accession IDs
    • Loading the human genome sequence in 2 hours, including contig ordering, sequences, gene annotation, ID mapping and extraction of features such as promoters, 5' UTRs, exons, introns, SNPs, STSs, repeats, etc.
  • Demo
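The UniGene example loads a one-to-many mapping from cluster IDs to GenBank accessions. The minimal Python sketch below shows that loading step with the standard `sqlite3` module. It assumes the same simplified, made-up `TAG value` / `//` record layout as the previous sketch. The table names `cluster` and `cluster_accession` and the `ACC=` field pattern are hypothetical; they are not the schema the original tool generated.

```python
import re
import sqlite3

ACC_RE = re.compile(r"ACC=([A-Za-z0-9_.]+)")

SCHEMA = """
CREATE TABLE IF NOT EXISTS cluster (
    unigene_id TEXT PRIMARY KEY,
    title      TEXT
);
CREATE TABLE IF NOT EXISTS cluster_accession (
    unigene_id TEXT NOT NULL REFERENCES cluster(unigene_id),
    accession  TEXT NOT NULL,
    PRIMARY KEY (unigene_id, accession)
);
"""


def load_clusters(conn: sqlite3.Connection, text: str) -> None:
    """Parse 'TAG value' records and bulk-load a parent table plus a one-to-many
    UniGene ID -> GenBank accession mapping table in a single transaction."""
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    clusters, mappings = [], []
    uid, title = None, ""
    for line in text.splitlines():
        tag, _, value = line.strip().partition(" ")
        value = value.strip()
        if tag == "ID":
            if uid is not None:  # previous record had no terminator
                clusters.append((uid, title))
            uid, title = value, ""
        elif tag == "TITLE":
            title = value
        elif tag == "SEQUENCE" and uid is not None:
            match = ACC_RE.search(value)
            if match:
                mappings.append((uid, match.group(1)))
        elif tag == "//" and uid is not None:
            clusters.append((uid, title))
            uid = None
    if uid is not None:  # last record had no terminator
        clusters.append((uid, title))
    with conn:  # commits on success, rolls back if any insert fails
        conn.executemany("INSERT INTO cluster VALUES (?, ?)", clusters)
        conn.executemany("INSERT INTO cluster_accession VALUES (?, ?)", mappings)


# Made-up records for illustration only.
RECORDS = """\
ID Hs.1001
TITLE example gene A
SEQUENCE ACC=AB000001.1; CLONE=IMAGE:1
SEQUENCE ACC=AB000002.1
//
ID Hs.1002
TITLE example gene B
SEQUENCE ACC=AB000003.1
//
"""

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_clusters(conn, RECORDS)
    for row in conn.execute(
            "SELECT unigene_id, accession FROM cluster_accession ORDER BY accession"):
        print(row)
```

The sketch batches inserts with `executemany` inside a single transaction instead of committing row by row. This is usually what makes bulk loads of this size feasible.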

7
XML parser
  • Features
    • Pull-based, SAX-style streaming parser for fast, memory-efficient data extraction
    • Automatically generate the XML schema and display it hierarchically
    • Automatically convert the hierarchical structure into a relational structure while maintaining the referencing relationships
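The following minimal Python sketch covers the three features together, using the standard library's `xml.etree.ElementTree.XMLPullParser`. It streams the XML in chunks and records, for each tag, the child tags seen under it (a rough inferred schema). Each element becomes a row with a surrogate `_id` and a `_parent_id` reference. The function name, the `_id`/`_parent_id`/`_text` column names and the sample XML are hypothetical. Text that follows a child element (the "tail") is ignored for brevity.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, List, Set, Tuple


def xml_to_relational(chunks: Iterable[str]) -> Tuple[Dict[str, Set[str]], Dict[str, List[dict]]]:
    """Stream XML through a pull parser and return (schema, tables).

    schema maps every element tag to the set of child tags seen under it.
    tables maps every tag to rows holding a surrogate key (_id), a reference to
    the enclosing element's row (_parent_id), the attributes and any text."""
    parser = ET.XMLPullParser(events=("start", "end"))
    schema: Dict[str, Set[str]] = {}
    tables: Dict[str, List[dict]] = {}
    stack: List[Tuple[str, dict]] = []  # (tag, row) for each open element
    next_id = 0

    def consume() -> None:
        nonlocal next_id
        for event, elem in parser.read_events():
            if event == "start":
                next_id += 1
                schema.setdefault(elem.tag, set())
                parent = stack[-1] if stack else None
                if parent is not None:
                    schema[parent[0]].add(elem.tag)
                row = {"_id": next_id,
                       "_parent_id": parent[1]["_id"] if parent else None,
                       **elem.attrib}
                tables.setdefault(elem.tag, []).append(row)
                stack.append((elem.tag, row))
            else:  # "end": the element's leading text is now complete
                _, row = stack.pop()
                text = (elem.text or "").strip()
                if text:
                    row["_text"] = text
                elem.clear()  # release the finished subtree to limit memory use

    for chunk in chunks:
        parser.feed(chunk)
        consume()
    parser.close()
    consume()
    return schema, tables


if __name__ == "__main__":
    doc = ('<experiment id="E1"><sample name="s1"><value>1.5</value></sample>'
           '<sample name="s2"/></experiment>')
    schema, tables = xml_to_relational([doc[:30], doc[30:]])
    print(schema)
    print(tables)
```

Feeding the document in arbitrary chunks shows the streaming property: the caller decides when to pull events, and nothing requires the whole file in memory at once.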

8
XML parser
  • Application
    • Data sharing between applications
    • Data structures for applications
  • Example
    • Implementing the MIAME specification
    • Building a knowledge base
  • Demo 1, Demo 2

9
Web crawler
  • Features
    • Allow users to define a path to crawl the web and extract information
    • Support nested web pages/links
    • Allow page content filtering and decision-making based on predefined rules
    • Support cookie setting and pre-authentication
    • Support HTTP pipelining and timed web requests
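The crawler features map onto a small amount of code. The minimal Python sketch below uses only the standard library. It assumes the user-defined "path" is a list of regular expressions, one per link level, and that the filtering rule is a `keep(url, html)` predicate. It also interprets "timed" requests as a per-request timeout plus a fixed delay between requests. All names are hypothetical. Cookies come from `urllib.request.HTTPCookieProcessor`. Pre-authentication means calling `fetch(login_url, data=...)` once so the jar holds the session cookie. HTTP pipelining is not shown, because `urllib` does not support it.

```python
import re
import time
import urllib.request
from http.cookiejar import CookieJar
from typing import Callable, Dict, List, Optional, Sequence
from urllib.parse import urljoin

LINK_RE = re.compile(r"""href=["']([^"']+)["']""", re.IGNORECASE)


def make_fetcher(timeout: float = 10.0,
                 headers: Optional[Dict[str, str]] = None) -> Callable[..., str]:
    """Build fetch(url, data=None) backed by one cookie jar, so a session cookie
    obtained by POSTing a login form is sent on later requests."""
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(CookieJar()))
    if headers:
        opener.addheaders = list(headers.items())

    def fetch(url: str, data: Optional[bytes] = None) -> str:
        # data=None sends GET; bytes send POST (e.g. a login form).
        with opener.open(url, data=data, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")

    return fetch


def crawl(start: str, path: Sequence[str], fetch: Callable[[str], str],
          keep: Callable[[str, str], bool] = lambda url, html: True,
          delay: float = 0.0) -> Dict[str, str]:
    """Follow a user-defined path: at depth i only links matching the regex
    path[i] are followed. Pages reached at the end of the path are returned
    as {url: html} if the keep(url, html) rule accepts them."""
    frontier: List[str] = [start]
    seen = {start}
    results: Dict[str, str] = {}
    for depth in range(len(path) + 1):
        next_frontier: List[str] = []
        for url in frontier:
            html = fetch(url)
            if delay:
                time.sleep(delay)  # fixed pause between requests
            if depth == len(path):
                if keep(url, html):
                    results[url] = html
                continue
            for href in LINK_RE.findall(html):
                link = urljoin(url, href)
                if link not in seen and re.search(path[depth], link):
                    seen.add(link)
                    next_frontier.append(link)
        frontier = next_frontier
    return results
```

Passing `fetch` in as a parameter keeps the crawl logic testable offline. For example, `crawl("http://example.test/", [r"/list", r"/item/\d+"], pages.__getitem__)` walks an in-memory dict of pages instead of the network.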

10
Web crawler
  • Application
  • Example
    • Building a protein-protein interaction database from the literature
    • Building a customized news portal
    • Stealing proprietary data from public web sites (such as TransFac)?
    • Building applications that rely on computation from public-domain services with a web interface (such as BLAST at NCBI)
  • Demo
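The slide does not say how interactions were extracted from the crawled literature. One simple, commonly used baseline is sentence-level co-occurrence: two protein names mentioned in the same sentence count as a candidate interaction. The minimal Python sketch below implements that baseline. It is a substitute technique for illustration, not the author's method. The protein names and sentences are made up.

```python
import itertools
import re
from collections import Counter
from typing import Iterable, Set


def cooccurring_pairs(abstracts: Iterable[str], proteins: Set[str]) -> Counter:
    """Count protein pairs mentioned together in the same sentence.
    A crude co-occurrence heuristic, not true relation extraction."""
    counts: Counter = Counter()
    for text in abstracts:
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            tokens = set(re.findall(r"[A-Za-z0-9-]+", sentence))
            found = sorted(tokens & proteins)  # sorted so (a, b) is canonical
            for a, b in itertools.combinations(found, 2):
                counts[(a, b)] += 1
    return counts


if __name__ == "__main__":
    # Made-up text and hypothetical protein names.
    abstracts = ["PRA interacts with PRB in vivo. PRD was not tested.",
                 "PRB binds PRA and PRC."]
    print(cooccurring_pairs(abstracts, {"PRA", "PRB", "PRC", "PRD"}))
```

Pair counts like these are usually only a first filter. Frequent pairs would still need review or a stronger extraction method before they go into an interaction database.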

11
Excel files
  • Features
    • Dynamically determine the worksheet count and the column and row ranges in each worksheet
    • Automatically split header content from table content at run time
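The two features can be sketched once a worksheet has been read into a list of row tuples. With the openpyxl library, for example, you would loop over `wb.sheetnames` and read `wb[name].iter_rows(values_only=True)`. The minimal Python sketch below then finds the used range and separates the free-form header area from the table. The splitting rule is a hypothetical heuristic: the column-header row is taken to be the first row with the largest number of filled cells. All names and sample values are made up.

```python
from typing import Any, List, Sequence, Tuple

Row = Sequence[Any]


def _filled(cell: Any) -> bool:
    return cell is not None and str(cell).strip() != ""


def used_range(rows: Sequence[Row]) -> Tuple[int, int]:
    """Return (n_rows, n_cols) of the area that actually contains data."""
    n_rows = max((i + 1 for i, r in enumerate(rows) if any(_filled(c) for c in r)),
                 default=0)
    n_cols = max((j + 1 for r in rows for j, c in enumerate(r) if _filled(c)),
                 default=0)
    return n_rows, n_cols


def split_header(rows: Sequence[Row]) -> Tuple[List[list], list, List[list]]:
    """Split a worksheet into (preamble rows, column-header row, data rows).
    Heuristic: the column-header row is the first row whose filled-cell count
    equals the maximum within the used range."""
    n_rows, n_cols = used_range(rows)
    # Pad or trim every row to the used width so all rows line up.
    trimmed = [list(r[:n_cols]) + [None] * (n_cols - len(r)) for r in rows[:n_rows]]
    widths = [sum(_filled(c) for c in r) for r in trimmed]
    if not widths:
        return [], [], []
    h = widths.index(max(widths))
    return trimmed[:h], trimmed[h], trimmed[h + 1:]


if __name__ == "__main__":
    sheet = [("Array run summary", None, None), ("Notes: example only", None), (),
             ("Gene", "Signal", "Call"), ("g1", 120.5, "P"), ("g2", 3.1, "A"),
             (None, None, None)]
    print(used_range(sheet))
    print(split_header(sheet))
```

The heuristic fails on sheets whose notes are as wide as the table. A production version would more likely combine several signals, such as cell types and formatting.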

12
DBI
  • Features
    • Provide an integrated GUI for different databases
    • Provide the same API for different databases
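"Same API for different databases" is also the idea behind Python's DB-API 2.0 (PEP 249). The minimal sketch below uses that interface as an analogy. It is not the DBI control itself, and the function names are hypothetical. One function runs the same parameterized query through any compliant driver and adapts only the placeholder syntax declared by the driver's `paramstyle`. The demo uses `sqlite3`.

```python
import sqlite3
from typing import Any, List, Tuple, Union


def placeholder(paramstyle: str, name: str, position: int) -> str:
    """Parameter marker for each DB-API 2.0 paramstyle (PEP 249)."""
    return {
        "qmark": "?",
        "numeric": f":{position}",
        "named": f":{name}",
        "format": "%s",
        "pyformat": f"%({name})s",
    }[paramstyle]


def select_where(conn: Any, driver: Any, table: str, column: str,
                 value: Any) -> List[Tuple]:
    """Run the same parameterized SELECT through any DB-API 2.0 connection.
    `driver` is the driver module (it exposes `paramstyle`); `table` and `column`
    are pasted into the SQL, so they must come from trusted code, not user input."""
    style = driver.paramstyle
    sql = f"SELECT * FROM {table} WHERE {column} = {placeholder(style, 'v', 1)}"
    params: Union[dict, tuple] = {"v": value} if style in ("named", "pyformat") else (value,)
    cur = conn.cursor()
    try:
        cur.execute(sql, params)
        return [tuple(row) for row in cur.fetchall()]
    finally:
        cur.close()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE gene (symbol TEXT, chrom TEXT)")
    conn.executemany("INSERT INTO gene VALUES (?, ?)", [("A1", "1"), ("B2", "2")])
    print(select_where(conn, sqlite3, "gene", "chrom", "2"))  # [('B2', '2')]
```

Swapping in another PEP 249 driver module and its connection should leave `select_where` unchanged. SQL dialect differences beyond placeholders, such as types and quoting, are not handled here.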