Title: Mopping up the Flood of Data with Web Services
1Mopping up the Flood of Data with Web Services
- Gary Wiggins
- Indiana University
- School of Informatics
- wiggins_at_indiana.edu
2Overview of the Talk
- Data Mining and Knowledge Discovery
- DMKD in Bioinformatics
- DMKD in Chemistry
- Public Chemistry Databases for DMKD
- Overview of Web Services
- NIH-funded Projects Underway or Planned at Indiana University
- Educational Opportunities at IU
3Data Mining and Knowledge Discovery (DMKD)
- Techniques began to be used around 1989
- Rapid growth in the mid-1990s, with the DMKD field emerging around 1995
- Built on DM tools such as Machine Learning
4Data Mining
- One of the steps in Knowledge Discovery
- Concerned with the actual extraction of knowledge from data
- Efficient and scalable methods for mining interesting patterns and knowledge and discovering hidden facts contained in large databases
5Data Mining Techniques
- Efficient classification methods
- Clustering
- Outlier analysis
- Frequent, sequential, and structured pattern analysis
- Visualization and spatial/temporal analysis tools
6Knowledge Discovery (KD)
- "KD is a nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns from large collections of data." --Fayyad et al., as quoted by Cios and Kurgan
- The KD process involves
- Understanding and preparation of the data
- Data Mining (DM)
- Verification and application of the discovered
knowledge
7Framework for KD Process
- Steps range from very few, e.g.,
- Data collection and understanding
- Data mining
- Implementation
- To multi-step models, e.g., Cios and Kurgan's six-step DMKD process model
8Cios and Kurgan's Six-Step DMKD Process Model
- Understanding the problem domain
- Understanding the data
- Preparation of the data
- 50% or more of effort spent on this step
- Data mining
- Evaluation of the discovered knowledge
- Using the discovered knowledge
9General Data Mining/Data Analysis Systems
- SAS Enterprise Miner
- SPSS
- Insightful S-Plus
- IBM DB2 Intelligent Miner
- Microsoft SQLServer 2005
- SGI MLC and MineSet Tree Visualizer
- Inxight VizServer
10Trends: Major Conferences
- Knowledge Discovery and Data Mining (KDD) 2005
- http://www.informatik.uni-trier.de/~ley/db/conf/kdd/kdd2005.html
- International Conference on Machine Learning (ICML) 2006
- http://www.icml2006.org/icml2006/technical/accepted.html
- SIAM Conference on Data Mining 2006
- http://www.siam.org/meetings/sdm06/proceedings.htm
11 12th Annual SIGKDD International Conference on Knowledge Discovery and Data Mining, Philadelphia, August 20-23, 2006
- Areas of Interest on the Research Track
- Applications of data mining (biomedicine, business, e-commerce, defense)
- Data and result visualization
- Data warehousing
- Data mining for community generation, social network analysis and graph-structured data
- Foundations of data mining
- Interactive and online data mining
- KDD framework and process
- Mining data streams
- Mining high-dimensional data
- Mining sensor data
- Mining text and semi-structured data
- Mining multi-media data
- Novel data mining algorithms
- Privacy and data mining
- Robust and scalable statistical methods
- Pre-processing and post-processing for data mining
- Security issues
- Spatial and temporal data mining
12Trends in DMKD
- OLAP (On-Line Analytical Processing)
- Data warehousing
- Association rules
- High Performance DMKD systems
- Visualization techniques
- Applications of DM
- More recently
- Database products that incorporate DM tools
- New developments in design and implementation of the DMKD process
- Information visualization products as end-user queries
- XML
13XML: the Key to DM and KD?
- Or simply a data exchange protocol?
- Allows for the description and storage of structured or semi-structured data and their relationships
- Can be used to exchange data in a platform-independent way
- BUT: only one paper at the major conferences listed earlier dealt with XML
14XML helps
- Standardize communication between diverse DM tools and databases (I/O procedures)
- Build standard data repositories sharing data between different DM tools that work on different software platforms
- Implement communication protocols between DM tools
- Provide a framework for integration of and communication between different DMKD steps
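A minimal sketch of the platform-independent exchange described on the XML slides: one tool serializes its mining results to XML and another reads them back. The element and attribute names (`results`, `record`, `compound`, `score`) are invented for illustration and are not taken from PMML or any other standard named in the talk.

```python
import xml.etree.ElementTree as ET


def records_to_xml(records):
    """Serialize a list of result dicts into an XML string."""
    root = ET.Element("results")
    for rec in records:
        item = ET.SubElement(root, "record", compound=rec["compound"])
        ET.SubElement(item, "score").text = str(rec["score"])
    return ET.tostring(root, encoding="unicode")


def xml_to_records(xml_text):
    """Parse the XML produced by records_to_xml back into dicts."""
    root = ET.fromstring(xml_text)
    return [
        {"compound": item.get("compound"), "score": float(item.find("score").text)}
        for item in root.findall("record")
    ]


# Hypothetical results produced by one DM tool and consumed by another.
records = [{"compound": "A-1", "score": 0.82}, {"compound": "B-7", "score": 0.35}]
xml_text = records_to_xml(records)
round_tripped = xml_to_records(xml_text)
```

Because both sides agree only on the XML layout, the producing and consuming tools can run on different platforms and be written in different languages.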
15Predictive Model Markup Language (PMML) and Other Tools
- In conjunction with XML, PMML enables the automation of sharing of discovered knowledge between different domains and tools
- XML-RPC
- SOAP (Simple Object Access Protocol)
- UDDI
- OLAP
- OLE DB-DM
16Discovery Informatics Definition
- "Discovery Informatics is the study and practice
of employing the full spectrum of computing and
analytical science and technology to the singular
pursuit of discovering new information by
identifying and validating patterns in data."
--William W. Agresti in 2003
17Discovery Informatics
- Discovery and Application of Information
- Data Mining and Machine Learning are two aspects
of Discovery Informatics.
18Overview of the Talk
- Data Mining and Knowledge Discovery
- DMKD in Bioinformatics
- DMKD in Chemistry
- Public Chemistry Databases for DMKD
- Overview of Web Services
- NIH-funded Projects Underway or Planned at Indiana University
- Educational Opportunities at IU
19Trends: Bioinformatics Conferences
- International Conference on Intelligent Systems for Molecular Biology (ISMB) 2006
- http://ismb2006.cbi.cnptia.embrapa.br/papers.html
- Research in Computational Molecular Biology (RECOMB) 2006
- http://www.informatik.uni-trier.de/~ley/db/conf/recomb/recomb2006.html
- Pacific Symposium on Biocomputing (PSB) 2006
- http://helix-web.stanford.edu/psb06/
20Main Areas of Research in Bioinformatics
- Sequence alignment
- Alternative splicing
- Microarray analysis
- Functional analysis
- Analysis of single nucleotide polymorphisms (SNPs)
- Natural language text analysis
21DMKD Sessions at Major Bioinformatics Conferences
- Databases and Data Integration
- Text Mining and Information Extraction
- Semantic Webs
22Data Mining in Bioinformatics (Bajcsy)
- Data cleaning, data preprocessing, and semantic integration of heterogeneous, distributed biomedical databases
- Existing data mining tools for biodata analysis
- Development of advanced, effective, and scalable data mining methods in biodata analysis
23Preprocessing of Biodata
- Integration of multiple microarray gene experiments must resolve inconsistent labels of genes to form a coherent data store.
- Focus on quantitative quality metrics based on analytical and statistical data descriptors and on relationships among variables.
24Semantic Integration of Heterogeneous Biomedical
Databases
- Combine multiple sources into a coherent data store
- Find semantically equivalent real-world entities from several biomedical sources
- Problems
- Different labels for the same concept: gene_id vs. g_id
- Time asynchronization: same gene analyzed at multiple development stages
25Approaches for Semantic Integration of Biodata
- Construction of integrated biodata warehouses or biodatabases
- Construction of a federation of heterogeneous distributed biodatabases
- Must build up mapping rules or semantic ambiguity resolution rules across multiple databases
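The mapping-rule idea can be sketched as follows, using the gene_id vs. g_id mismatch from the previous slide. The source names, the other field names, and the sample values are hypothetical; a real integration would also need rules for units, identifiers, and development stage.

```python
# Hypothetical mapping rules: source-specific field name -> shared field name.
MAPPING_RULES = {
    "source_a": {"gene_id": "gene", "expr": "expression"},
    "source_b": {"g_id": "gene", "expression_level": "expression"},
}


def normalize(record, source):
    """Rename a record's fields according to the rules for its source."""
    rules = MAPPING_RULES[source]
    return {rules.get(field, field): value for field, value in record.items()}


def integrate(sources):
    """Merge records from several sources into one store keyed by gene."""
    store = {}
    for source, records in sources.items():
        for record in records:
            unified = normalize(record, source)
            store.setdefault(unified["gene"], []).append({"source": source, **unified})
    return store


# Made-up records describing the same gene under different labels.
sources = {
    "source_a": [{"gene_id": "BRCA1", "expr": 2.4}],
    "source_b": [{"g_id": "BRCA1", "expression_level": 2.1}],
}
store = integrate(sources)
```

After integration both records sit under the same key, so downstream mining code sees one entity rather than two differently labeled ones.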
26Existing Data Mining Tools for Biodata Analysis-I
- Sequence Analysis, e.g.,
- NCBI/BLAST, ClustalW, HMMER, PHYLIP, MEME, TRANSFAC, MDScan, Vector NTI, Sequencher, MacVector
- Structure Prediction and Visualization, e.g.,
- RasMol, Raster3D, Swiss-Model, Scope, MolScript, Cn3D
27Existing Data Mining Tools for Biodata Analysis-II
- Genome Analysis, e.g.,
- CAP3, Paracel GenomeAssembler, GenomeScan, GeneMark, GenScan, X-Grail, ORF Finder, GeneBuilder
- Pathway Analysis and Visualization, e.g.,
- KEGG, EcoCyc/MetaCyc, GenMapp
- Microarray Analysis, e.g.,
- ScanAlyze/Cluster/TreeView, Scanalytics MicroArray Suite, Profiler, Silicon Genetics
28Biospecific Data Analysis Software Systems
- Agilent GeneSpring
- Spotfire
- Invitrogen VectorNTI
29Text Mining in Bioinformatics
- Techniques have progressed from simple recognition of terms to extraction of interaction relationships in complex sentences.
- Search objectives have broadened to a range of problems, e.g.,
- Improving homology search
- Identifying cellular location
- Deriving genetic network technologies
30Current Work in Biomedical Text Mining (Cohen and
Hersh)
- Text mining operates at a finer level of granularity than information retrieval and text summarization.
- TM examines relationships between specific kinds of information contained within and between documents.
- Areas of active research
- Named entity recognition (genes, proteins, etc.)
- Text classification
- Synonym and abbreviation extraction
- Relationship extraction
- Hypothesis generation
- Integrated frameworks
31Systems Biology
- Requires a shift in focus from genes and proteins to the system's structure and dynamics
- Four key properties
- System structures
- System dynamics
- Control method
- Design method
- Systems Biology Markup Language (SBML) and CellML
32iSpecies.org
33Overview of the Talk
- Data Mining and Knowledge Discovery
- DMKD in Bioinformatics
- DMKD in Chemistry
- Public Chemistry Databases for DMKD
- Overview of Web Services
- NIH-funded Projects Underway or Planned at Indiana University
- Educational Opportunities at IU
34Data Mining in Chemistry
- "Modern experimentation (whether classical or high-throughput) should be based on the productive interplay of statistical techniques (design-of-experiments), molecular modeling as well as cheminformatics." --Ulrich S. Schubert
35Session on Integration of Informatics and
Knowledge Management Informatics
- Integration of Informatics at the Systems Level and at the Data Level: Chris L. Waller, Ph.D., Director, World Wide Chemistry Informatics, Pfizer Global Research & Development
- Integrated Knowledge Management at Bayer HealthCare Pharmacophore Informatics: William J. Scott, Ph.D., Team Leader, Department for Chemistry Research, Bayer Pharmaceuticals Corporation
- Building a Knowledge Enabled Organization: Cory R. Brouwer, Ph.D., Associate Director, Knowledge Management Informatics, Pfizer Global Research & Development
- Knowledge Management: Building a Knowledge Enabled Organization: Victor Lobanov, Ph.D., Principal Scientist, MDI, Johnson & Johnson Pharmaceutical R&D
- 10th Annual Cheminformatics Conference, May 23-16, 2006, Philadelphia
36Impact of HTS and Combinatorial Chemistry Research
- Most impact in
- the pharmaceutical industry
- medical research
- catalyst research
- More recently
- polymer and materials research.
37Diversity of Data Mining in Chemistry
- On 5/7/2006 there were 4072 references to either "datamining" or "data mining" in Chemical Abstracts.
- 3416 different index terms were assigned to those records.
- 2772 used 1-5 times (81%)
- 298 used 6-10 times (9%)
- 103 used 11-15 times (3%)
- 71 used 16-20 times (2%)
- 38 used 21-25 times (1%)
- 24 used 26-30 times (1%)
- 110 used 31-480 times (3%)
- Most frequent co-term: bioinformatics, with 480 hits or 12% of the occurrences
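The counts on this slide are internally consistent: the seven usage buckets add up to the 3416 index terms, and each parenthesized figure is that bucket's share of the total as a rounded percentage. The short check below recomputes them from the slide's own numbers; only the variable names are new.

```python
total_references = 4072  # records matching "datamining" or "data mining"

# Number of index terms per usage bucket, as listed on the slide.
term_usage = {
    "1-5": 2772,
    "6-10": 298,
    "11-15": 103,
    "16-20": 71,
    "21-25": 38,
    "26-30": 24,
    "31-480": 110,
}
total_terms = sum(term_usage.values())


def percent(part, whole):
    """Share of the whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


shares = {bucket: percent(count, total_terms) for bucket, count in term_usage.items()}
co_term_share = percent(480, total_references)  # "bioinformatics" co-occurrence
```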
38SFS graph
39Components of the Semantic Web for Chemistry
- XML: eXtensible Markup Language
- RDF: Resource Description Framework
- RSS: Rich Site Summary
- Dublin Core allows metadata-based newsfeeds
- OWL for ontologies
- BPEL4WS for workflow and web services
- Murray-Rust et al. Org. Biomol. Chem. 2004, 2, 3192-3203.
40Chemical Markup Language (CML)
- Much of the semantics in a chemical article can be supported by CML
- Molecules
- Structures
- Reactions and reaction schemes
- Spectra (including annotations)
- Physicochemical data
- XML dictionaries and lexicons provide linguistic and semantic support for markup
- Will lead to quicker authoring and higher quality of embedded structures and data through machine validation
41Key Factors in the Success of the Chemical
Semantic Web
- Institutional Repositories: services deployed and supported at an institutional level to offer dissemination management, stewardship, and where appropriate, long-term preservation of both the intellectual work created by an institutional community and the records of the intellectual and cultural life of the institutional community
- Open Access Movement
42Knowledge-Driven Bioinformatics Enhanced with
Chemistry
43Text Mining (Banville)
- In the pharmaceutical field, it is ideally the marriage of biological and chemical information that needs to be the ultimate focus of text data mining applications.
- Problems
- Lack of universal publication standards for identifying each unique chemical entity
- Selective indexing policies of A&I services
- Need to understand how chemical structures link to biological processes
44OSCAR3 Service
- Open source Java application under development by Peter Murray-Rust's group at Cambridge (not published yet)
- Extracts chemical information from either a paragraph of experimental data or a full paper (e.g., melting points, infrared and NMR data, and mass spectral information)
- Produces an XML instance highlighting the chemical information with an Extensible Stylesheet Language (XSL) file
- At IU, we are attaching a SOAP input/output engine for a web service based on OSCAR3.
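To give a feel for what extracting data from an experimental paragraph involves, here is a toy sketch that pulls melting points out of free text with a regular expression. It is not OSCAR3 and does not reflect how OSCAR3 works internally; the pattern, function name, and sample sentence are all assumptions made for illustration.

```python
import re

# Toy pattern for melting points such as "mp 123-125 °C" or "mp 88 °C".
# An assumption for illustration; real experimental text is far more varied.
MP_PATTERN = re.compile(
    r"\bmp\s+(\d+(?:\.\d+)?)(?:\s*-\s*(\d+(?:\.\d+)?))?\s*°C"
)


def extract_melting_points(text):
    """Return (low, high) melting point ranges found in the text, in °C."""
    ranges = []
    for match in MP_PATTERN.finditer(text):
        low = float(match.group(1))
        # A single value is reported as a zero-width range.
        high = float(match.group(2)) if match.group(2) else low
        ranges.append((low, high))
    return ranges


# Made-up sentence in the style of an experimental section.
paragraph = "White solid, mp 123-125 °C; IR (KBr) 1710 cm-1."
found = extract_melting_points(paragraph)
```

A production extractor would need many more patterns (units, abbreviations, NMR and mass spectral conventions), which is the gap a dedicated tool is meant to fill.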
45OSCAR at Work in the Future
46Semantic Scholars Grid I
[Diagram. Components: Gatherer, Indexer, Analyzer, Local MD Store, Local Harvest Store. Labeled actions: query and get list; fetch MD and documents; index all local MD; run a filter such as OSCAR on harvested MD and documents; store new MD.]
47Semantic Scholars Grid II
[Diagram. Components: Updater, Plug-in, Community Tools, SSG Viewer, Local MD Store. Labeled actions: synchronize SSG and foreign MD; update and view foreign MD; update local MD; control foreign interactions; view all MD; access Community Tools (Instant Citation Index, etc.).]
48Chemical Datamining Software
- SureChem
- http://surechem.reeltwo.com/
- CLiDE
- Recognizes structures, reactions, and text
- http://www.simbiosys.ca/clide/
- OSCAR
- OSCAR1 to check experimental data
- http://www.ch.cam.ac.uk/magnus/checker.html
- http://www.rsc.org/Publishing/ReSourCe/AuthorGuidelines/AuthoringTools/ExperimentalDataChecker/
- CSR (Chemical Structure Reconstruction)
- http://www.scai.fraunhofer.de/uploads/media/MZ-ERCIM05_04.pdf
- MDL DocSearch: combines MDL's Isentris platform and EMC's Documentum
49Overview of the Talk
- Data Mining and Knowledge Discovery
- DMKD in Bioinformatics
- DMKD in Chemistry
- Public Chemistry Databases for DMKD
- Overview of Web Services
- NIH-funded Projects Underway or Planned at Indiana University
- Educational Opportunities at IU
50ChemDB: http://cdb.ics.uci.edu/CHEM/Web/
51ChEBI, Chemical Entities of Biological Interest
- Dictionary of molecular entities focused on small chemical compounds
- Features an ontological classification, showing the relationships between molecular entities or classes of entities and their parents and/or children
52Vioxx Entry in ChEBI
53The IUPAC International Chemical Identifier
(InChI)
- Open source, non-proprietary, public-domain identifier for chemicals
- String of characters that uniquely represents a molecular substance
- Independent of the way the chemical structure is drawn
- Enables reliable structure recognition and easy linking of diverse data compilations
- Accepts as input MOLfiles (or SDfiles) and CML files
- Download the program to your computer at
- http://www.iupac.org/inchi/license.html
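The "easy linking of diverse data compilations" point can be illustrated with a small sketch: because the identifier does not depend on how a structure was drawn or named, two collections can be joined on the InChI string alone. The two InChI strings below are the standard InChIs for water and ethanol; the record layouts, names, and values are hypothetical.

```python
# Standard InChI strings for water and ethanol.
WATER = "InChI=1S/H2O/h1H2"
ETHANOL = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"

# Two hypothetical compilations that name the same substance differently.
assay_data = [{"name": "EtOH", "inchi": ETHANOL, "active": False}]
supplier_data = [
    {"catalog_name": "ethyl alcohol", "inchi": ETHANOL, "price": 12.0},
    {"catalog_name": "water", "inchi": WATER, "price": 1.0},
]


def link_by_inchi(left, right):
    """Pair up records from two compilations that share an InChI string."""
    index = {}
    for rec in right:
        index.setdefault(rec["inchi"], []).append(rec)
    return [(l, r) for l in left for r in index.get(l["inchi"], [])]


linked = link_by_inchi(assay_data, supplier_data)
```

The names "EtOH" and "ethyl alcohol" never have to be reconciled; the shared identifier does the matching.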
54Generation of InChI for Vioxx with wInChI
55Vioxx Entry in PubChem Compounds Found with InChI
56Vioxx Bioassay Data in PubChem
57Vioxx PubChem Link to External Sources of
Information
58PubChem Link to Elsevier MDL
- DiscoveryGate: www.discoverygate.com
- provides access to integrated scientific content from databases, journal articles, patent publications and reference works
- information providers include Elsevier, Thomson-Derwent, FIZ CHEMIE, the U.S. FDA, Prous Science and Thieme
- MDL Compound Index (the master list of substances included in DiscoveryGate data sources) now exceeds 14 million unique chemical structures with the addition of 5 million chemical structures from the PubChem database.
59The Elsevier MDL/NIH Link via PubChem and
DiscoveryGate
- Cross-indexes PubChem to the Compound Index hosted on Elsevier MDL's DiscoveryGate platform
- MDL added 5 million structures from PubChem to their index, resulting in over 14 million unique chemical structures
- Links go both ways
- Can move from biological data in PubChem to bioactivity, chemical sourcing, synthetic methodology, and EHS data in DiscoveryGate sources
60Elsevier MDL's xPharm
- Comprehensive set of records linking
- Agents (compounds) (2300)
- Targets (600)
- Disorders (450)
- Principles that govern their interactions (180)
- Answers questions such as
- What targets are associated with control of blood pressure?
- What adverse effects are associated with monoamine oxidase inhibitors?
61Web Guide for Essential Cheminformatics Resources
- http://www.chembiogrid.org
- http://www.indiana.edu/~cheminfo/cicc/
62ChemBioGrid Chemical Databases
63Overview of the Talk
- Data Mining and Knowledge Discovery
- DMKD in Bioinformatics
- DMKD in Chemistry
- Public Chemistry Databases for DMKD
- Overview of Web Services
- NIH-funded Projects Underway or Planned at Indiana University
- Educational Opportunities at IU
64Web Services Overview
- What are Web Services?
- A distributed invocation system built on Grid computing
- Independent of platform and programming language
- Built on existing Web standards
- A service oriented architecture with
- Interfaces based on Internet protocols
- Messages in XML (except for binary data attachments)
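As an illustration of "messages in XML", the sketch below builds and reads a SOAP 1.1 style request envelope using only the Python standard library. The envelope namespace is the SOAP 1.1 one; the service namespace, the `findSimilar` operation, and its parameters are hypothetical and do not describe any service mentioned in this talk.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:chem-service"  # hypothetical service namespace


def build_request(operation, params):
    """Wrap an operation call and its parameters in a SOAP envelope."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def parse_request(xml_text):
    """Recover the operation name and string parameters from an envelope."""
    envelope = ET.fromstring(xml_text)
    body = envelope.find(f"{{{SOAP_NS}}}Body")
    call = body[0]  # first child of Body is the operation element
    operation = call.tag.split("}", 1)[1]
    params = {child.tag.split("}", 1)[1]: child.text for child in call}
    return operation, params


message = build_request("findSimilar", {"smiles": "CCO", "cutoff": 0.8})
operation, params = parse_request(message)
```

Parameter values come back as strings because XML text carries no type information; in practice a WSDL description tells the client and server how to interpret them.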
65Web Services for Chemistry: Problems
- Performance and scalability
- Proprietary data
- Competition from high-performance desktop applications
- --Geoff Hutchison, "it's a puzzle" blog, 2005-01-05
- ALSO
- Lack of a substantial body of trustworthy Open Access databases
- Non-standard chemical data formats (over 40 in regular use and requiring normalization to one another)
66DM Internet Toolbox Architecture
67Overview of the Talk
- Data Mining and Knowledge Discovery
- DMKD in Bioinformatics
- DMKD in Chemistry
- Public Chemistry Databases for DMKD
- Overview of Web Services
- NIH-funded Projects Underway or Planned at Indiana University
- Educational Opportunities at IU
68Indiana University Planned Projects: http://www.chembiogrid.org
- Application of a Grid-based distributed data architecture to chemistry
- Development of tools for HTS data analysis and virtual screening
- Database for quantum mechanical simulation data
- Chemical prototype projects
- Novel routes to enzymatic reaction mechanisms
- Mechanism-based drug design
- Data-inquiry-based development of new methods in natural product synthesis
69Web Services for Chemistry at IU
70NCI Developmental Therapeutics Program (DTP)
- Downloadable data
- In vitro 60 cell line results
- in vitro anti-HIV results
- Yeast assay
- 200,000 chemical structures
- molecular targets
- microarray data
- Or search the database at
- http://dtp.nci.nih.gov/docs/dtp_search.html
71IU Database of NIH DTP Data
- Contains over 200,000 chemical structures tested in 60 cellular assays from different human tumor cell lines
- Also includes microarray assay profiles for the untreated cell lines (14,000 datapoints)
- A local PostgreSQL database containing the data that is exposed as a web service
- Using workflows and complex SQL queries, we can do advanced data mining that exploits the chemical, biological and genomic information for particular audiences (chemists, biologists, etc.)
72Mining the NIH DTP database
- 14,000 gene expression values across 60 cell lines: cell lines can be clustered based on gene expression similarity
- 200,000 compounds: compounds can be clustered based on similarity of profile across cell lines, or by chemical structure fingerprint similarity
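Both kinds of similarity mentioned here can be sketched in a few lines. The slide does not say which measures are used, so the choices below are assumptions: the Tanimoto coefficient for structure fingerprints and Pearson correlation for activity profiles across cell lines. The sample fingerprints and profiles are made up.

```python
from math import sqrt


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between fingerprints given as sets of 'on' bits."""
    if not fp_a and not fp_b:
        return 0.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)


def pearson(xs, ys):
    """Pearson correlation between two equally long activity profiles."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up fingerprints (indices of set bits) for two compounds.
structure_similarity = tanimoto({1, 2, 3}, {2, 3, 4})

# Made-up activity profiles of two compounds across three cell lines.
profile_similarity = pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

A matrix of such pairwise values is the usual input to a clustering routine, whether the rows are compounds or cell lines.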
73Use of Taverna at IU
- A protein implicated in tumor growth is supplied to the docking program (in this case HSP90 taken from the PDB 1Y4 complex)
- A 2D structure is supplied for input into the similarity search (in this case, the extracted bound ligand from the PDB 1Y4 complex)
- The workflow employs our local NIH DTP database service to search 200,000 compounds tested in human tumor cellular assays for similar structures to the ligand.
- Client portlets are used to browse these structures
- Similar structures are filtered for drugability, and are automatically passed to the OpenEye FRED docking program for docking into the target protein.
- Once docking is complete, the user visualizes the high-scoring docked structures in a portlet using the Jmol applet.
- Correlation of docking results and biological fingerprints across the human tumor cell lines can help identify potential mechanisms of action of DTP compounds
74Taverna Workflow
[Screenshot with callouts: workflow definition; available web services (WSDL); visual depiction of workflow.]
75Taverna in Action
76CGL Contributions to CICC
- Build Web/Grid services for connecting
- Data sources
- Applications (simulation, data mining, data assimilation, imaging, etc.)
- Computing resources
- Information services
- Third party tool evaluation
- Workflow (Taverna)
- Grid tools: Globus and Condor (for interacting with TeraGrid)
- Building standards-based Web portal environments
- OGCE grid portal project
- JSR 168 Java standards
- This activity will begin in earnest over the summer.
77Digital Chemistry (BCI) Clustering Service Methods
78Local Web Service Methods for WWMM of PMR's Group
79More Services
80ToxTree
- An in silico toxicology prediction suite
- Based on the CDK toolkit
- Built on CML
- Released as OpenSource under the GPL
- Standalone PC software
- User Manual: http://ecb.jrc.it/DOCUMENTS/QSAR/TOXTREE/toxTree_user_manual.pdf
81ToxTree Service
- An open source Java application by Nina Jeliazkova
- Estimates toxic hazard by applying a decision tree approach.
- Encodes the Cramer scheme
- (Cramer G. M., R. A. Ford, R. L. Hall, Estimation of Toxic Hazard - A Decision Tree Approach, J. Cosmet. Toxicol., Vol. 16, pp. 255-276, Pergamon Press, 1978)
- Could be applied to datasets from various compatible file types.
- We are converting this GUI application to a text-based web service
82Overview of the Talk
- Data Mining and Knowledge Discovery
- DMKD in Bioinformatics
- DMKD in Chemistry
- Public Chemistry Databases for DMKD
- Overview of Web Services
- NIH-funded Projects Underway or Planned at Indiana University
- Educational Opportunities at IU
83Chemoinformatics Education at IU
- School of Informatics degree programs
- BS, MS, PhD
- Programs offered at both the Indianapolis (IUPUI)
and Bloomington (IUB) campuses
84Other Educational Activities
- Graduate Certificate Program in Chemical Informatics (4 courses by Distance Education)
- I571 Chemical Information Technology (3 cr.)
- I572 Computational Chemistry and Molecular Modeling (3 cr.)
- I573 Programming Techniques for Chemical and Life Science Informatics (3 cr.)
- I553 Independent Study in Chemical Informatics (3 cr.)
- I571 as CIC Courseshare offering w. Michigan
- Experiments with teleconferencing as a distance education tool
85PhD in Informatics
- Began in August 2005
- Tracks
- bioinformatics; chemical informatics; health informatics; human-computer interaction design; social and organizational informatics
- Under development
- complex systems, networks, modeling and simulation; cybersecurity; discovery and application of information; logical and mathematical foundations; music informatics
86Graduate Enrollment Chemo-, Laboratory, Bio-,
Health Informatics
87Software/DBs Used in the Program
- Company: Products and/or (Target Area)
- ArgusLab (Molecular modeling)
- Digital Chemistry: Toolkit (Clustering)
- Cambridge Cryst Data Ctr: Cambridge Structural DB; GOLD
- CambridgeSoft: ChemDraw Ultra
- Chemical Abstracts Service: SciFinder Scholar
- Chemaxon: Marvin (and other software)
- Daylight Chemical Info System: Toolkit
- FIZ Karlsruhe: Inorganic Crystal Structure DB
- IO-Informatics: Sentient
- MDL: CrossFire Beilstein and Gmelin
- OpenEye: Toolkit (and other software)
- Sage Informatics: ChemTK
- Serena Software: PCMODEL
- Spotfire: DecisionSite
- STN International: STN Express with Discover (Anal Ed)
- Wavefunction: Spartan
88Closing quote
- "The future of chemistry depends on the automated analysis of chemical knowledge, combining disparate data sources in a single resource, . . . which can be analysed using computational techniques to assess and build on these data."
- Townsend et al. Org. Biomol. Chem. 2004, 2, 3299.
89We all need help when overloaded!
90Bibliography
- Agresti, William W. "Discovery informatics." Communications of the ACM 2003, 46(8), 25-28.
- Bajcsy, Peter; Han, Jiawei; Liu, Lei; Yang, Jiong. "Survey of bio-data analysis from a data mining perspective." Chapter 2 in Wang, Jason T. L.; Zaki, Mohammed J.; Toivonen, Hannu T. T.; Shasha, Dennis (eds.), Data Mining in Bioinformatics. London, Springer Verlag, 2005, pp. 9-39.
- Banville, Debra L. "Mining chemical structural information from the drug literature." Drug Discovery Today 2006, 11(1/2), 35-42.
- Cios, Krzysztof J.; Kurgan, Lukasz A. "Trends in data mining and knowledge discovery." Chapter 1 in Pal, N.R.; Jain, L.C.; Teodoresku, N. (eds.), Knowledge Discovery in Advanced Information Systems. N.Y., Springer Verlag, 2002, pp. 1-26.
- Cohen, Aaron M.; Hersh, William R. "A survey of current work in biomedical text mining." Briefings in Bioinformatics March 2005, 6(1), 57-71.
- Corbett, Peter T.; Murray-Rust, Peter; Day, Nick E.; Townsend, Joe A.; Rzepa, Henry S. "Chemistry publications in CML." Abstracts of Papers, 231st ACS National Meeting, Atlanta, GA, United States, March 26-30, 2006, CINF-055.
91Bibliography
- Fayyad, U.M.; Piatetsky-Shapiro, G.; Smyth, P.; Uthurusamy, R. Advances in Knowledge Discovery and Data Mining. AAAI/MIT Press, 1996. (quoted by Cios and Kurgan)
- Gardner, Stephen P. "Ontologies and semantic data integration." Drug Discovery Today 2005, 10(14), 1001-1007.
- Guha, R.; Howard, M.T.; Hutchison, G.R.; Murray-Rust, P.; Rzepa, H.; Steinbeck, C.; Wegner, J.; Willighagen, E.L. "The Blue Obelisk: Interoperability in chemical informatics." Journal of Chemical Information and Modeling 2006, Web Release Date 22-Feb-2006, DOI 10.1021/ci050400b.
- Holliday, Gemma L.; Murray-Rust, Peter; Rzepa, Henry S. "Chemical Markup, XML, and the World Wide Web. 6. CMLReact, an XML Vocabulary for Chemical Reactions." Journal of Chemical Information and Modeling 2006, 46(1), 145-157.
- Jónsdóttir, S.O.; Jorgensen, F.S.; Brunak, S. "Prediction methods and databases within chemoinformatics: emphasis on drugs and drug candidates." Bioinformatics 2005 May 15, 21(10), 2145-60.
92Bibliography
- Karthikeyan, M.; Krishnan, S.; Pankey, Anil Kumar. "Harvesting chemical information from the Internet using a distributed approach: ChemXtreme." Journal of Chemical Information and Modeling. DOI 10.1021/ci050329.
- Krallinger, Martin; Alonso-Allende Erhardt, Ramon; Valencia, Alfonso. "Text-mining approaches in molecular biology and biomedicine." Drug Discovery Today 2005, 10(6), 439-445.
- Scherf, Uwe; Ross, Douglas T.; Waltham, Mark; Smith, Lawrence H.; Lee, Jae K.; Tanabe, Lorraine; Kohn, Kurt W.; Reinhold, William C.; Myers, Timothy G.; Andrews, Darren T.; Scudiero, Dominic A.; Eisen, Michael B.; Sausville, Edward A.; Pommier, Yves; Botstein, David; Brown, Patrick O.; Weinstein, John N. "A gene expression database for the molecular pharmacology of cancer." Nature Genetics 2000, 24, 236-244.
- Schubert, Ulrich S. "Materials informatics from data to knowledge towards integrated escience approaches." QSAR & Combinatorial Science 2005, 24(1), 5. (NB: Entire issue is devoted to this topic.)
- SIAM International Conference on Data Mining (5th: 2005: Newport Beach, CA). Data Mining Proceedings. Kargupta, Hillol et al., eds. SIAM, 2005.
- Torr-Brown, Sheryl. "Advances in knowledge management for pharmaceutical research and development." Current Opinion in Drug Discovery & Development 2005, 8(3), 316-322.
93Web 2.0
- Social Software allows group interactions
- Enables groups to form and organize themselves
- Examples
- Wikis
- Blogs
- RSS (now found on chemistry.org)
- Podcasting/Coursecasting
- Webcasting/Webinars
- Flickr
- Jybe
- FURL
94FURL (Frame Uniform Resource Locator)
- For archiving and sharing of web pages
- Furler can capture the pages for a discussion group
- Tracks useful pages for a discussion
- http://www.furl.net/home.jsp
95Jybe (Join Your Browser with Everyone)
- Collaboration and communication in real time with IE and Firefox
- Screen-sharing AND editing
- Privacy protected: must be invited
- Upload documents to convert to HTML
- http://www.jybe.com