Virtual Organizations: Building Interdisciplinary Collaborations - PowerPoint PPT Presentation

Slides: 48
Provided by: ncsa6
Learn more at: http://www.sura.org
Transcript and Presenter's Notes

Title: Virtual Organizations: Building Interdisciplinary Collaborations


1
Virtual Organizations: Building Interdisciplinary Collaborations
  • Dan Reed
  • reed_at_renci.org
  • Chancellor's Eminent Professor
  • Vice Chancellor for IT
  • University of North Carolina at Chapel Hill
  • Director, Renaissance Computing Institute

2
Acknowledgments
  • Funding agencies
  • NIH
  • Carolina Center for Exploratory Genetic Analysis
    (CCEGA)
  • NSF
  • TeraGrid Science Gateways
  • State of North Carolina
  • RENCI and ancillary Bioportal support
  • RENCI staff
  • Alan Blatecky, Kevin Gamiel, Xiaojun Guan
  • Clark Jefferies, Howard Lander
  • John Magee, Ruth Marinshaw, Jeff Tilson
  • Lavanya Ramakrishnan
  • And a host of others

3
21st Century Challenges
  • The threefold way
  • theory and scholarship
  • experiment and measurement
  • computation and analysis
  • Supported by
  • distributed, multidisciplinary teams
  • multimodal collaboration systems
  • distributed, large scale data sources
  • leading edge computing systems
  • distributed experimental facilities
  • Socialization and community
  • multidisciplinary groups
  • geographic distribution
  • new enabling technologies
  • creation of 21st century IT infrastructure
  • sustainable, multidisciplinary communities
  • "Come as you are" response

[Figure: Theory, Experiment, Computation]
4
Exemplar 21st Century Challenges
  • Population growth in sensitive areas
  • severe weather sensitivity
  • national impact
  • geobiology and environment
  • economics and finance
  • sociology and policy
  • Economics and health care
  • longitudinal public health data
  • environmental interactions
  • genetic susceptibility
  • heart disease, cancer, Alzheimer's
  • privacy and insurance
  • public policy and coordination

5
Mean Onset of Alzheimer's Disease
  • apolipoprotein (apo)
  • apoE2, apoE3 and apoE4 alleles
  • on chromosome 19
  • apoE4 allele
  • 40% to 60% of Alzheimer's patients
  • not the only cause of Alzheimer's
  • apo gene inheritance
  • 25% inherit 1 copy of apoE4 allele
  • Alzheimer's risk increases 4X
  • 2% inherit 2 copies of apoE4 allele
  • Alzheimer's risk increases 10X
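
The inheritance figures above can be checked against the patient share with a little arithmetic. The sketch below is illustrative only. It assumes the slide's numbers are percentages (25% with one apoE4 copy, 2% with two), that the 4X and 10X risks are relative to people with no apoE4 copy, and that the remaining 73% share a common baseline risk. None of these assumptions is stated on the slide.

```python
# Genotype frequency and relative risk per group, as read from the slide.
# Assumption: risks are relative to people carrying no apoE4 copy.
groups = {
    "no apoE4 copy": (0.73, 1.0),     # remaining 73% of the population
    "one apoE4 copy": (0.25, 4.0),
    "two apoE4 copies": (0.02, 10.0),
}


def share_of_cases(groups):
    """Return each group's share of all cases from (frequency, relative risk)."""
    weights = {name: freq * risk for name, (freq, risk) in groups.items()}
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


shares = share_of_cases(groups)
apoe4_share = shares["one apoE4 copy"] + shares["two apoE4 copies"]
print(f"Estimated share of patients carrying apoE4: {apoe4_share:.0%}")
```

Under these assumptions the estimate is about 62%, which is close to the upper end of the range quoted for Alzheimer's patients.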

[Figure: proportion of each genotype (2/3, 2/4, 3/3, 3/4, 4/4) unaffected, 0 to 1.0, versus age at onset, 60 to 85]
Source: Alan Roses, GSK
6
Big Questions
Protein sequence and regulation
DNA sequence
Sequence Annotation
Data integration
Network analysis
Pathway simulations
Multi-protein machines
Organs, Organisms and Ecologies
Metabolic pathways and regulatory networks
Bacteria and cells
7
Genetics and Disease Susceptibility
Phenotype 1 Phenotype 2 Phenotype 3
Phenotype 4
Ethnicity Environment
Age Gender
Identify Genes
Pharmacokinetics
Metabolism
Endocrine
Biomarker Signatures
Physiology
Proteome
Transcriptome
Immune
Morphometrics
Predictive Disease Susceptibility
Source: Terry Magnuson, UNC
8
PITAC Report Contents
  • Computational Science: Ensuring America's
    Competitiveness
  • A Wake-up Call: The Challenges to U.S.
    Preeminence and Competitiveness
  • Medieval or Modern? Research and Education
    Structures for the 21st Century
  • Multi-decade Roadmap for Computational Science
  • Sustained Infrastructure for Discovery and
    Competitiveness
  • Research and Development Challenges
  • Two key appendices
  • Examples of Computational Science at Work
  • Computational Science Warnings: A Message Rarely
    Heeded
  • Available at www.nitrd.gov

9
Life Science Lessons from Astronomy
  • Historically, discoveries accrued to those
  • with access to unique data
  • who built next generation telescopes
  • Two things changed
  • growing costs and complexity of telescopes
  • emergence of whole sky surveys
  • The result: virtual astronomy
  • discovering significant patterns
  • analysis of rich image/catalog databases
  • understanding complex astrophysical systems
  • integrated data/large numerical simulations

10
International Virtual Observatory
[Diagram: Cluster Galaxy Morphology Analysis Portal workflow]
1. User selects a cluster (web browser on the user's machine)
2. Look up cluster in internally stored catalog of clusters
3. X-ray and optical images retrieved via SIA interface
4. User launches distributed analysis
5. Initial galaxy catalog generated via Cone Search
6. Image cutout pointers merged into catalog
7. Morphological parameters calculated on grid for each galaxy
8. User downloads final table and images for analysis and visualization
Services: Chandra SIA, Skyview SIA, DSS SIA, CNOC SIA, NED Cone Search, CADC CNOC Cone Search, Morphology Calculation Service
Source: Ray Plante, NCSA
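
The catalog and image retrieval steps in this workflow rely on two simple HTTP query conventions, Cone Search and Simple Image Access (SIA). A minimal sketch of how a portal might build those queries and merge the results follows. The base URLs, coordinates, galaxy identifiers and the `merge_cutouts` helper are hypothetical; only the query parameter names (`RA`, `DEC`, `SR`, `POS`, `SIZE`, `FORMAT`) come from the Virtual Observatory conventions, and no network request is made.

```python
from urllib.parse import urlencode


def cone_search_url(base_url, ra_deg, dec_deg, radius_deg):
    """Build a Cone Search query: catalog sources within a radius of (RA, Dec)."""
    params = {"RA": ra_deg, "DEC": dec_deg, "SR": radius_deg}
    return base_url + "?" + urlencode(params)


def sia_url(base_url, ra_deg, dec_deg, size_deg):
    """Build a Simple Image Access (version 1 style) query for images of a region."""
    params = {"POS": f"{ra_deg},{dec_deg}", "SIZE": size_deg, "FORMAT": "image/fits"}
    return base_url + "?" + urlencode(params, safe=",/")


def merge_cutouts(catalog, cutouts):
    """Attach an image cutout pointer to each catalog row, matched by galaxy id."""
    return [dict(row, cutout=cutouts.get(row["id"])) for row in catalog]


# Hypothetical cluster position (degrees) and toy service responses.
catalog_query = cone_search_url("http://example.org/cone", 10.5, -3.25, 0.1)
image_query = sia_url("http://example.org/sia", 10.5, -3.25, 0.2)
catalog = [{"id": "g1", "ra": 10.51}, {"id": "g2", "ra": 10.48}]
cutouts = {"g1": "http://example.org/cutout/g1.fits"}
merged = merge_cutouts(catalog, cutouts)
print(catalog_query)
print(image_query)
print(merged)
```

Galaxies without a matching cutout keep a `None` pointer, so a later analysis step can skip them rather than fail.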
11
The Bioinformatics Challenge
  • Challenge
  • the rise of quantitative biology
  • burgeoning bioinformatics data
  • complex analysis and modeling problems
  • education and training in new technologies
  • Reality
  • diverse tools with idiosyncratic interfaces
  • steep learning curves
  • software development by diverse groups
  • distributed databases with diverse metadata
  • Need
  • integrated, easy-to-use toolset with standard
    interfaces
  • extensible mechanisms that hide idiosyncrasies
  • tool and bioinformatics training
  • The solution
  • bioinformatics infrastructure and coupled
    training

12
Need Simple, Easy-To-Use Tools
  • "Genome. Bought the book. Hard to read."
  • Eric Lander

13
Web and Social Processes
  • Google
  • it's a search engine, it's a verb, ...
  • Blogs
  • published self-expression
  • Instant Messenger
  • social networks
  • Wireless messaging
  • semi-synchronous
  • Internet commerce
  • the dot.com boom/bust
  • eBay, Amazon
  • Spam, phishing, ...
  • anti-social behavior

14
Benefits of Standards
  • Interoperability
  • Separation of concerns
  • Reuse
  • Independence
  • Dependability
  • Sharing
  • Commonality
  • Shared knowledge base
  • knowledge reuse
  • simplification (one hopes)

15
Grids of All Flavors
16
What's a Grid/Web Service?
Web: uniform access to documents (http://)
Grid/Web Services: flexible, high-performance access to resources and services for distributed communities
Resources: software catalogs, computers, sensors and instruments, colleagues, data archives
17
Grid History: I-Way at SC95
  • A prototype national infrastructure
  • 17 sites, connected by
  • vBNS and six other ATM networks
  • 60 applications
  • Features
  • I-POPs for site access
  • Kerberos authentication
  • manual scheduling
  • distributed communication libraries
  • Experiences
  • led to Globus Grid toolkit
  • Concurrent industry needs
  • led to web services for B2B interoperation

18
Web Services: Commercial Grids
  • From browser-centric to service-centric
  • from human-computer to computer-computer
  • structured negotiation and response
  • Workflow creation and management
  • end-to-end service negotiation
  • inter-organizational interaction
  • Prerequisites
  • metadata standard for service descriptions
  • standard communication mechanisms
  • resource discovery and registration

19
eBay Web Services Architecture
  • Over 40% of eBay's listings are now made via API calls

Source: IBM
20
Web Services: A Definition
  • A web service is designed to support
    interoperable machine-to-machine interaction over
    a network. It has an interface described in a
    machine-processable format (specifically WSDL).
    Other systems interact using its description
    using SOAP-messages, using HTTP with an XML
    serialization ....
  • W3C Working Draft, August 2003

[Diagram: SOAP, WSDL and UDDI interactions]
  • SOAP (Simple Object Access Protocol)
  • WSDL (Web Services Description Language)
  • UDDI (Universal Description, Discovery and
    Integration)

21
Technology Push
Source: Gartner Group
22
European myGrid Architecture
Source: www.mygrid.org
23
The Bioinformatics Challenges
  • Complex, multilevel models
  • integration and in silico designs
  • Information visualization
  • complexity and scale
  • Data models and ontologies
  • community definition
  • Data federation, storage and management
  • shared access and support
  • User access portals
  • web-based tool and service interfaces
  • Packaging, distribution and deployment
  • community building

24
Multilevel Cellular Models
  • Signaling networks
  • environmental triggers and behavior
  • e.g., cell lifecycle
  • different pathways in each tissue type
  • Metabolic networks
  • measurable products in pathway
  • many systems are steady state
  • negative feedback leads to stabilization
  • Protein interaction networks
  • localization of proteins that interact for
    function
  • protein-protein interactions for specific actions
  • Gene regulatory networks
  • many things affect gene product concentration
  • nucleic-nucleic, protein-nucleic interactions
  • Computing, physics, engineering and biology
  • control theory, mathematical models, phase spaces
  • from biological cartoons to predictive models
  • e.g., microRNAs and gene expression controls
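
The point that negative feedback leads to stabilization can be illustrated with a toy model. The sketch below is an assumption-laden example, not a model from the talk: a single product x inhibits its own production, dx/dt = production / (1 + x / half_max) - decay * x, integrated with a simple Euler scheme. All parameter names and values are made up for illustration.

```python
def simulate_feedback(production=10.0, half_max=1.0, decay=1.0,
                      x0=0.0, dt=0.01, steps=5000):
    """Euler integration of dx/dt = production / (1 + x / half_max) - decay * x."""
    x = x0
    trajectory = [x]
    for _ in range(steps):
        # Production falls as x rises (negative feedback); decay grows with x.
        dxdt = production / (1.0 + x / half_max) - decay * x
        x += dt * dxdt
        trajectory.append(x)
    return trajectory


low_start = simulate_feedback(x0=0.0)
high_start = simulate_feedback(x0=8.0)
print(f"final level from x0=0: {low_start[-1]:.3f}")
print(f"final level from x0=8: {high_start[-1]:.3f}")
```

Both runs settle at the same concentration regardless of the starting level, which is the steady-state behavior the slide describes.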

25
Biological Models
  • Simulation and prediction
  • structures and dynamics
  • Reasoning and discovery
  • reverse engineering

[Figure axes: temporal scale (seconds) versus spatial scale (nm^3)]
26
Biophysical and Environmental Modeling
Airway/flow
Mucus
Disease, Environment and Medicine
Cilia
Cell biochemistry and structure
Proteomics
Genomics
Source: Ric Boucher, UNC
27
Data Heterogeneity and Complexity
Genomic, proteomic, transcriptomic, metabolomic,
protein-protein interactions, regulatory
bio-networks, alignments, disease, patterns and
motifs, protein structure, protein
classifications, specialist proteins (enzymes,
receptors), ...
Proteome
Source: Carole Goble (Manchester)
28
Sensor Data Overload
Source: Chris Johnson, Utah, and Art Toga, UCLA
Source: Robert Morris, IBM
  • High resolution brain imaging
  • 4.5 petabytes (PB) per brain

29
RENCI: What Is It?
  • Statewide objectives
  • create broad benefit in a competitive world
  • engage industry, academia, government and
    citizens
  • Four target areas
  • public benefit
  • supporting urban planning, disaster response, ...
  • economic development
  • helping companies and people with innovative
    ideas
  • research engagement across disciplines
  • catalyzing new projects and increasing success
  • building multidisciplinary partnerships
  • education and outreach
  • providing hands-on experiences and broadening
    participation
  • Mechanisms and approaches
  • partnerships and collaborations
  • infrastructure as needed to accomplish goals

30
Carolina Center for Exploratory Genetic Analysis
(CCEGA)
Interoperable Data Management
Faculty, Staff and Students
Driving Problems
Promoting Mutual Awareness
Experimental Genetics Portal
Analysis Techniques
Statistical and Computational Techniques
Extant Data Models
Virtuous Cycle
Interdisciplinary Research and Education
31
CCEGA Participants
  • Coordination team
  • Dan Reed, RENCI
  • Terry Magnuson, CCGS
  • Alan Blatecky, RENCI
  • Kirk Wilhelmsen, CCGS
  • Eleven departments/institutes
  • Biostatistics
  • Cancer Center
  • Genetics
  • Computer Science
  • Epidemiology
  • Genetics
  • Health Science Library
  • Information and Library Science
  • Pharmacy
  • RENCI
  • Statistics
  • Campus-wide support
  • from many sources
  • Project participants
  • Brad Hemminger, Information and Library Science
  • James Evans, Genetics
  • Kevin Gamiel, RENCI
  • Xiaojun Guan, RENCI
  • Barrie Hays, Health Science Library
  • Clark Jefferies, RENCI
  • Ethan Lange, Genetics
  • Andrew Nobel, Statistics
  • Karen Mohlke, Genetics
  • Kari North, Epidemiology
  • Susan Paulsen, Computer Science
  • Fernando Manuel Pardo, Genetics
  • Charles Perou, Cancer Center
  • Lavanya Ramakrishnan, RENCI
  • Jan Prins, Computer Science
  • Patrick Sullivan, Genetics
  • Lisa Susswein, Cancer Center
  • David Threadgill, Genetics

32
Data: From Lab and Clinic to Analysis
  • Independent data management
  • data security
  • version control
  • redundancy
  • controlled access

[Diagram labels: Laboratory, Clinical, ELSI, Integration Informatics, Analysis]
  • NIH CCEGA
  • Carolina Center for Exploratory Genetic Analysis

Source: Brad Hemminger, UNC
33
Data Management and Information Viz
Published Domain Literature
Taxonomy Annotation
Ontology Annotation
..
DB Schema Ontology Annotation
Annotated Domain Literature
Information Mining Module
Information Visualization Module
34
From SNPs to HapMap
  • Single Nucleotide Polymorphisms (SNPs)
  • about one in 1,200 bases differs across individuals
  • SNPs act as markers to locate genes
  • Common groups of SNPs are shared
  • i.e., form a haplotype
  • HapMap data sources
  • 90 Yoruba individuals (30 trios) from Nigeria
    (YRI)
  • 90 individuals (30 trios) of European descent
    from Utah (CEU)
  • 45 Han Chinese individuals from Beijing (CHB)
  • 45 Japanese individuals from Tokyo (JPT)
  • 3,500,000 SNPs typed
  • basis for association studies for disease
    identification

35
CCEGA HapMap Simulator
  • Synthetic data
  • disease models
  • model testing
  • mining bakeoffs

36
Carolina Bioportal
  • Three overlapping target groups
  • undergraduate education
  • graduate education and research
  • academic/industrial research
  • Features
  • access to common bioinformatics tools
  • extensible toolkit and infrastructure
  • OGCE and National Middleware Initiative (NMI)
  • leverages emerging international standards
  • remotely accessible or locally deployable
  • packaged and distributed with documentation
  • National reach and community
  • TeraGrid deployment
  • science gateway
  • Education and training
  • hands-on workshops
  • clusters, Grids, portals and bioinformatics

38
Distributed Grid and Web Services
Launch, configure and control
Grid Portals
Open Grid Service Infrastructure (web service
component model)
Online instruments
Source: Dennis Gannon, Indiana
39
Bioportal Architecture

Bioportal

Interface Generator
HTML Files
PISE
Application XML Description
Application Processing
www.ncbioportal.org
Velocity Files
User Profile
Job Submission
Remote File Access
Job Records
Authentication, Grid Credential
Application Databases
Command Files
OGCE User Databases
Job History Database
Application Processing
MyProxy
GridFTP
Gatekeeper
Local cluster
  • OGCE toolkit
  • used by cyberinfrastructure projects
  • LEAD, NEES, PACI, DOE, TeraGrid

40
Putting the Technologies Together
NC Bioportal
OGCE Toolkit (Grid middleware)
PISE (XML Wrapper)
Tomcat (Apache servlet container)
Chef (collaboration/standard portlets)
Jakarta Jetspeed (enterprise portal)
Bio Applications
Velocity (template engine)
Turbine (web app framework)
Grid Portlets, CoG
VMC
Databases
41
Community Software Toolkit Lessons
  • NSF PACI Alliance "In a Box" toolkits
  • cluster software (aka OSCAR)
  • Grid infrastructure (aka NMI)
  • Access Grid for distributed collaboration
  • tiled display walls for visualization
  • Distribution materials
  • software and training materials
  • CDs and web
  • Community workshops and training
  • Linux Clusters Institute
  • MSI HPC workshops
  • hands-on training
  • Lowering the entry barrier
  • usage and deployment
  • Bioportal distribution
  • workshops, tutorials
  • training materials
  • road shows

Bioportal Distribution
42
NC Bioportal: What's Next
  • Engagement
  • workshops, experiences and deployments
  • Infrastructure
  • dynamic job scheduling across multiple sites
  • migration to OGCE 2.0
  • fully automated database updates
  • workflow construction and processing
  • Portal tool suite
  • expanded applications and databases
  • phylogeny, morphology, microarray analysis, ...
  • Training materials
  • additional modules based on user feedback
  • workshop materials packaged for self-study
  • Leverage national presence
  • TeraGrid/NCSA bioinformatics portal

43
The Vision of Grid/Web Services
  • "Behold, the people is one, and they have
    all one language; and this they begin to do: and
    now nothing will be restrained from them, which
    they have imagined to do."
  • Book of Genesis

Pieter Bruegel, The Tower of Babel (1563)
We're Not There Yet ...
44
Interdisciplinary Collaborations
  • Appropriate reward structures
  • well-matched time constants
  • Intellectual equality
  • balanced recognition of contributions
  • Research/infrastructure distinctions
  • timelines and people needs differ
  • Confidentiality and openness
  • academic/industry collaboration perspectives
  • Intellectual property
  • background IP and differential disciplinary
    models

45
Some Thoughts on the Future
  • Grids/web services are not a panacea
  • we have seen this movie before
  • standards debates can be endless
  • make new mistakes, not the same old ones
  • code is shifted from modules to interfaces
  • Danger of Death by CS Abstraction
  • all problems can be solved by another level of
    indirection
  • Appropriate decomposition is a challenge
  • performance, usability, flexibility
  • Generality and extensibility really matter
  • incremental aggregation and interoperability
  • data management and federation
  • Better questions, not just private capabilities
  • limited by creativity not resources

46
The Cambrian Explosion
  • Most phyla appear
  • sponges, archaeocyathids, brachiopods
  • trilobites, primitive mollusks, echinoderms
  • Indeed, most appeared quickly!
  • Tommotian and Atdabanian
  • as little as five million years
  • Lessons for computing
  • it doesn't take long when conditions are right
  • raw materials and environment
  • leave fossil records if you want to be
    remembered!

47
Thanks for the Invitation!