AI - PowerPoint PPT Presentation

Provided by: LarryH64

Transcript and Presenter's Notes

Title: AI


1
AI & Molecular Biology: A Growing Success Story
2
How has AI been successful in molecular biology?
  • Wide, daily use of AI-based tools by biologists
  • Thriving AI/MolBio community
  • Intelligent Systems for Molecular Biology (ISMB)
    conference now 11 years old, with >1,000
    attendees
  • Significant scientific publications, e.g.
  • Successful businesses based on AI techniques
  • http://www.medicalscientists.com

3
Medical Scientists, Inc.
  • Predictive modeling in health care cost domain
  • Patented multistrategy constructive induction
    algorithm
  • Privately held and profitable.

4
MolBio even creeping into mainstream AI
  • KDD cup competition last two years involved
    learning in molecular biology domains
  • TREC launched genomics track this year.
  • AI Magazine special issue on MolBio in Spring '04

5
Why success in biology?
  • Big open questions, e.g.
  • Drug design, engineering novel organisms,
    evolution
  • Rich sources of new information about life, e.g.
  • Genome sequencing
  • Expression array chips
  • No common sense issues
  • Everything anyone knows about MolBio is written
    down
  • Significant community investment
  • Biologists built a gene ontology, construct
    curated knowledge bases, and are eager
    consumers of software

6
The Irony of AI & MolBio
  • Human understanding of the overwhelming
    complexity of our own genome will require
    partnership with biognostic machines

7
What is a biognostic machine?
  • From the Greek βίος (life) and γνωστικός (knowing)
  • Two kinds of biognostic machines
  • Instruments that produce data about living
    things in molecular detail and with genomic
    breadth
  • Bioinformatics systems that bring to bear
    existing knowledge in the computational analysis
    of data

8
Biognostic instruments
  • Gene chips read out the expression of each gene
    in a tissue sample
  • 10,000 genes/chip and dozens of chips per study
  • High throughput SNP genotyping automation
  • Finds millions of tiny genetic differences among
    people

9
Drinking from a firehose
  • 150 published genomes, 19 Eukaryotes (human,
    mouse, wheat, rice, fruit fly, etc.); 798 ongoing
    projects (243 Eukaryotes)
  • 12,661,480 articles in MedLine; 12,824 new in the
    last week; 372 journals provide free full text
    (>100,000 full-text articles)

10
What AI technologies are used in bioinformatics?
  • Some of the key AI technologies that have been
    broadly adopted in computational biology
  • Hidden Markov Models
  • Ontologies and related knowledge-based
    computation
  • Clustering, e.g. Self-Organizing Maps
  • Supervised learning, e.g. Support Vector Machines
  • Information extraction / natural language parsing

11
HMMs in molecular biology
  • HMMs (trained with EM) are the main mechanism
    used to represent patterns in DNA and protein
    sequences
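
To make the HMM idea concrete, here is a minimal sketch of Viterbi decoding over a DNA string. It is not the training step the slide mentions (EM): the parameters are fixed by hand, and the two states, their names, and every probability are illustrative assumptions rather than values from the talk.

```python
import math

# Toy two-state HMM over DNA. All states and probabilities are assumptions
# chosen for illustration; a real model would be trained (e.g. with EM).
STATES = ("AT_rich", "GC_rich")
START = {"AT_rich": 0.5, "GC_rich": 0.5}
TRANS = {
    "AT_rich": {"AT_rich": 0.9, "GC_rich": 0.1},
    "GC_rich": {"AT_rich": 0.1, "GC_rich": 0.9},
}
EMIT = {
    "AT_rich": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
    "GC_rich": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
}


def viterbi(seq):
    """Return the most probable hidden-state path for seq (log-space Viterbi)."""
    # score[s] = best log-probability of any path that ends in state s
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    back = []  # one dict per later position: state -> best predecessor state
    for symbol in seq[1:]:
        prev, score, pointers = score, {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: prev[p] + math.log(TRANS[p][s]))
            pointers[s] = best
            score[s] = (prev[best] + math.log(TRANS[best][s])
                        + math.log(EMIT[s][symbol]))
        back.append(pointers)
    # Trace back from the best final state to recover the full path.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]
```

Calling `viterbi("AAAAAAAAGGGGGGGG")` labels each base with the state that best explains it, which is the same mechanism used (with far richer models) to mark up domains and motifs in sequences.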

12
The Gene Ontology
  • Actively developed, community curated ontology
    http://geneontology.org
  • About 12,000 defined concepts, in a DAG with two
    link types (part-of, is-a) under three roots
  • Cellular component
  • Biological process
  • Molecular function.
  • Used as annotations for genes (>80,000 so far),
    HMMs of domain patterns, etc.
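
A DAG with two link types can be sketched as a small parent table. The term names below are placeholders, not real Gene Ontology identifiers, and the propagation rule (an annotation to a term implies annotation to its ancestors) is stated here as an assumption about how such annotations are typically used.

```python
# Toy fragment of a GO-style DAG: term -> list of (link_type, parent).
# Term names are illustrative placeholders, not real GO identifiers.
PARENTS = {
    "cellular_component": [],
    "cell": [("is_a", "cellular_component")],
    "organelle": [("is_a", "cellular_component")],
    "intracellular": [("part_of", "cell")],
    "nucleus": [("part_of", "intracellular"), ("is_a", "organelle")],
    "nucleolus": [("part_of", "nucleus")],
}


def ancestors(term):
    """Return every term reachable from `term` via is_a or part_of links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _link_type, parent in PARENTS[current]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def propagate(gene_to_terms):
    """Expand each gene's direct annotations to include all ancestor terms."""
    return {
        gene: set(terms).union(*(ancestors(t) for t in terms))
        for gene, terms in gene_to_terms.items()
    }
```

Because a term can have more than one parent (here `nucleus` has two), a plain tree walk is not enough; the `seen` set keeps shared ancestors from being visited twice.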

13
(No Transcript)
14
A closer look at a biognostic instrument
  • Gene expression arrays (gene chips)
  • Produces 10,000 measurements/chip, generally
    10s-100s of chips/experiment
  • Huge computational challenges
  • Many novel statistical and data management issues
  • Interpretation of results can be overwhelming;
    must transcend one-gene-at-a-time methods.
  • Linking data to prior knowledge is crucial.

15
What is gene expression?
  • Not all of the genes in a genome are used in all
    circumstances
  • In order for a gene to play a role in a cell, it
    must be expressed.
  • A gene is expressed when the protein it encodes
    is synthesized
  • Transcription of DNA to mRNA is the first step in
    protein production
  • Measuring abundance of mRNA assays the level of
    gene expression

16
Expression is central because...
  • Differentiation: All cells in a body have the
    same genome. Expression is what differentiates,
    e.g. brain cells from liver.
  • Physiology: Cells do their business (dividing,
    sending signals, digesting, etc.) largely via
    changes in expression
  • Response to stimuli: Environmental changes (like
    drugs or disease) often cause changes in
    expression
  • Disease markers and drug targets: changes in
    expression associated with disease can be
    diagnostic markers and/or suggest novel
    pharmaceutical approaches.

17
Laboratory robotics, too
  • One form of expression array places controlled
    quantities (and shapes) of thousands of different
    DNA sequences on glass slides

18
Statistical challenges!
  • Many basic tools for analysis of expression data
    (normalization, statistical tests, visualization,
    clustering) are open source in the R language,
    see http://bioconductor.org
  • Novel approaches still needed, e.g. for multiple
    testing corrections, finding gene-gene
    interaction terms, etc.
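
As one example of a multiple testing correction, the sketch below implements the Benjamini-Hochberg adjustment in plain Python. The slide names no specific procedure, so the choice of Benjamini-Hochberg is an assumption; it is shown only because testing 10,000 genes at once makes uncorrected p-values misleading.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Genes whose adjusted value falls below a chosen threshold (say 0.05) are then reported with that expected false discovery rate, rather than with a per-gene error rate.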

19
Clustering approaches
  • Gene expression changes are coordinated, so
    levels should cluster meaningfully, but
  • Clusters change with situation (biclustering)
  • Expression levels have complex correlational
    structure
  • Distance measures unknown
  • Approaches include
  • SOMs (Slonim)
  • PRMs (Koller & Friedman)
  • Trajectory clustering
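
Since the right distance measure is an open question, the sketch below shows just one candidate: correlation distance between expression profiles, which treats genes as similar when they rise and fall together regardless of absolute level. It is not one of the approaches listed above (SOMs, PRMs, trajectory clustering), and the function names are made up for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (non-constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)  # raises ZeroDivisionError on a flat profile


def correlation_distance(x, y):
    """0 for perfectly co-expressed genes, 2 for perfectly anti-correlated."""
    return 1.0 - pearson(x, y)


def nearest_gene(profiles, gene):
    """Return the other gene whose profile is closest to `gene`."""
    others = (g for g in profiles if g != gene)
    return min(others, key=lambda g: correlation_distance(profiles[gene], profiles[g]))
```

A clustering method built on this distance would group a low-abundance gene with a high-abundance one if their patterns match, which Euclidean distance would not do.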

20
Discrimination tasks
  • Given expression array results from e.g. tumors
    that were successfully treated vs. not, develop a
    predictive model
  • High dimensionality, interactions, but
  • Feature selection
  • Support vector machines
  • Interesting kernels!
  • Meet FDA regulations?

21
Understanding expression changes in context
  • Long lists of differentially expressed genes are
    difficult to interpret meaningfully
  • Much knowledge about structure, function and
    interactions of genes
  • Hundreds of public databases:
    http://nar.oupjournals.org/
  • Best information in the literature.
  • Key computational challenge: Bring prior
    knowledge to bear on understanding expression
    (and other high-throughput) data

22
Data integration
  • Just tracking down all of the information about a
    list of genes isn't easy
  • Dozens of general and hundreds of specialized
    data sources available (many public & free)
  • No universal IDs: sometimes heuristic key
    matching is necessary to link data sources
  • Inference is often required (e.g. about the
    applicability of information from a different
    species).
  • Rapid change as new information becomes available
  • Errors and inconsistencies abound.
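
Heuristic key matching might look like the sketch below: normalize gene symbols before joining two sources, and keep every candidate match so ambiguity stays visible. The normalization rule and the record layout are assumptions; real sources need source-specific rules and curated synonym tables.

```python
import re


def normalize_symbol(symbol):
    """Heuristic join key: uppercase, then drop anything that is not A-Z or 0-9."""
    return re.sub(r"[^A-Z0-9]", "", symbol.upper())


def link_records(source_a, source_b):
    """Join two {gene_symbol: record} dicts on the normalized symbol.

    Returns {symbol_in_a: (record_a, [matching records from b])}.
    """
    index = {}
    for symbol, record in source_b.items():
        index.setdefault(normalize_symbol(symbol), []).append(record)
    linked = {}
    for symbol, record in source_a.items():
        matches = index.get(normalize_symbol(symbol), [])
        linked[symbol] = (record, matches)  # more than one match needs review
    return linked
```

The heuristic trades precision for recall: it links spelling variants such as `Abc-1` and `ABC1`, but it can also merge symbols that are genuinely different, which is one source of the errors and inconsistencies noted above.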

23
Semantic interpretation tools
  • Mapping gene lists to the Gene Ontology

24
Literature-based approaches
  • Many active areas of research
  • Information extraction to transform the
    biomedical literature into more computationally
    useful form
  • Information retrieval and presentation: making
    large collections of relevant documents
    comprehensible
  • Document meta-analysis: finding potential
    linkages among biomolecules from patterns of use
    in documents.
  • Great resources
  • PubMed & NLM indexers (e.g. GeneRIFs)
  • Growing full text repositories
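
Document meta-analysis in its simplest form counts how often two gene names appear in the same document. The sketch below does only that, with naive tokenization and hypothetical gene names; real systems need proper gene-name recognition and a statistical test on the counts.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents, gene_names):
    """Count, for each pair of gene names, the documents mentioning both."""
    wanted = {name.upper() for name in gene_names}
    counts = Counter()
    for text in documents:
        # Naive tokenization: strip commas and periods, split on whitespace.
        tokens = set(text.upper().replace(",", " ").replace(".", " ").split())
        present = sorted(wanted & tokens)  # sorted so each pair has one key
        counts.update(combinations(present, 2))
    return counts
```

Pairs with unexpectedly high counts become candidate gene-gene linkages to check against expression data or curated databases.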

25
Meta-analysis for gene-gene interactions
26
Towards The Biological Knowledge-base
  • Inferential potential of a unified knowledge-base
    transcends human ability
  • Even heroic bioscientists can't keep up with
    flood of information as disciplinary boundaries
    break down.
  • Integrated database search isn't enough
  • Semantic issues in integration
  • Meta-analysis
  • Making a compelling story from disparate bits of
    evidence
  • A grand challenge for AI

27
Minsky, AI & Common Sense
  • Marvin Minsky in the August '03 Wired on "Why AI
    is brain dead":
  • "There is no computer that has common sense.
    We're only getting the kinds of things that are
    capable of making an airline reservation."
  • "The elderly segment of the population is growing
    to the point where there won't be enough doctors,
    nurses, and nurses' aides. We should be working
    to get robots to pick up the slack."
  • I think Marvin has the right diagnosis, but the
    wrong prescription

28
But AI isn't psychology
  • AI should be about general principles of
    intelligence; people are just one example
  • Turing test: Is this program indistinguishable
    from a person?
  • Human idiosyncrasies as the sine qua non of
    intelligence?
  • My alternative approach: Is this a mind worth
    wanting to know?
  • Also an approach to the other minds problem

29
Pharmacology as a test of intelligence?
  • Making a contribution to inventing a new drug as
    a test for computational theories of intelligence
  • Lots of existing, declarative background
    knowledge
  • Clear metric for success: FDA approval
  • Credit assignment exists (but note Hollywood
    accounting)
  • and improvements in human health riding on it
  • Reasonable incremental tasks
  • Passing graduate pharmacology exams
  • Making contributions to subtasks

30
Pharmacology 101
  • Find a target: a naturally occurring molecule to
    be enhanced or inhibited
  • Find a lead: a drug-like molecule that
    interacts specifically with the target
  • Optimize: find a compound in the same family as
    the lead that is specific and effective enough to
    be a drug
  • ADMET: absorption, distribution, metabolism,
    excretion, and toxicity

31
Biognosticopoeia
  • Our first steps
  • Integrate human-curated databases
  • Exploit 10Ms years of effort
  • Requires dynamic and heuristic approaches
  • Extend GO to many other relationships
  • IE from literature using DMAP
  • Explicit representation of procedural computation
    tasks
  • IBM p690 w/ 8x Power4 processors 64GB RAM Lisp
    Machine

32
Come visit!
  • The UCHSC Center for Computational Pharmacology,
    http://compbio.uchsc.edu
  • International Society for Computational
    Biology, http://iscb.org
  • Medical Scientists, Inc.,
    http://medicalscientists.com
  • Larry Hunter, Larry.Hunter_at_uchsc.edu