Title: Bioinformatics for Microarray Studies at IBS
1Bioinformatics for Microarray Studies at IBS
- Pei-Ing Hwang, Ph.D.
- Mar. 24, 2005
2Different aspects for life science research
genomics
transcriptomics
proteomics
3Building blocks for DNA or RNA
- DNA A, T, G, C
- RNA A, U, G, C
4DNA deoxyribonucleic acid
5Why microarray?
- Gene Expression
- To simultaneously study multiple genes
- To obtain an overview of gene expression at the transcriptional level under specific experimental conditions
- To study gene interaction networks from the transcriptional aspect
- Genome
- SNP detection
- To find recombination sites in the chromosome/genome
- Hopefully to discover the gene responsible for a genetic disease
6Outline
- Introduction to Microarray experiments
- Experiences at IBS for the cDNA arrays
- Data generated with microarray
- DNA annotation
- Data Analysis
- Data Management
7About Microarray Technology-1
- Up to hundreds of thousands of spots in a fixed area on a glass slide or a membrane
- One species of DNA molecule per spot
- A spot is also called a feature
- DNA fixed on the chip or membrane is also called a probe
- The sequence and/or function of each DNA species on the spot is known.
8About Microarray Technology-2
- Making use of hybridization method
- A pairs with T (DNA) or U (RNA)
- G pairs with C
- Image processing
- Data analysis
- Result interpretation from biology aspect
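The base-pairing rule that hybridization relies on can be sketched in a few lines of Python. This is only an illustration: the function names `reverse_complement` and `hybridizes` are hypothetical, and real hybridization tolerates mismatches and depends on experimental conditions, which this sketch ignores.

```python
# Watson-Crick pairing: A pairs with T (or U in RNA), G pairs with C.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(dna.upper()))


def hybridizes(probe_dna: str, target: str) -> bool:
    """True if the target (DNA or RNA) is perfectly complementary to the probe.

    Hypothetical helper: only exact, full-length complementarity is checked.
    """
    target_as_dna = target.upper().replace("U", "T")  # treat RNA U like DNA T
    return target_as_dna == reverse_complement(probe_dna)


print(reverse_complement("ATGC"))   # GCAT
print(hybridizes("ATGC", "GCAU"))   # True
```

A labeled target binds a spot only when its sequence complements the probe fixed there, which is why the known probe sequence identifies the transcript being measured.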
9Types of Microarray
- Types of DNA immobilized on the solid support
- cDNA vs. oligonucleotides
- Manufacturing methods
- Printing vs. photolithography
- Solid support
- Glass slides
- Membrane
- Nucleotide labeling (slide scanning condition)
- One color vs. two colors
10GeneChip Array Manufacturing
Figure 1. Affymetrix uses a unique combination of
photolithography and combinatorial chemistry to
manufacture GeneChip Arrays.
11Microarray printing machine
http://arrayit.com/Products/MicroarrayI/NanoPrint/Nano-Print-new-600.jpg
12Procedure for one-channel array
13Experimental Procedure for 2-channel Microarray
14Data Analyses
- Feature intensity acquisition
- Image analyses
- To identify differentially expressed genes
- Normalization (global, local, print-tip, between-array, etc.)
- Clustering or Classification
- Analyses from biology aspect
- Significant genes
- Transcriptional regulation study
- Cellular pathway or network finding
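As a rough illustration of the normalization and differential-expression steps listed above, the sketch below applies global median normalization to two-channel log ratios and then flags features by a simple fold-change cutoff. The function names, the cutoff, and the intensities are all made up for illustration; the slides do not say which methods or thresholds were used at IBS, and a real analysis would normally add a statistical test.

```python
import math
import statistics


def log_ratios(red, green):
    """log2(red/green) for each feature of a two-channel array."""
    return [math.log2(r / g) for r, g in zip(red, green)]


def global_median_normalize(ratios):
    """Shift all log ratios so that their median becomes zero."""
    center = statistics.median(ratios)
    return [m - center for m in ratios]


def differentially_expressed(names, ratios, threshold=1.0):
    """Names whose |normalized log2 ratio| reaches the threshold (2-fold at 1.0)."""
    return [n for n, m in zip(names, ratios) if abs(m) >= threshold]


# Made-up intensities for five features (not real data).
names = ["g1", "g2", "g3", "g4", "g5"]
red = [400.0, 200.0, 1600.0, 200.0, 50.0]
green = [100.0, 100.0, 100.0, 100.0, 100.0]

normalized = global_median_normalize(log_ratios(red, green))
print(normalized)                                   # [1.0, 0.0, 3.0, 0.0, -2.0]
print(differentially_expressed(names, normalized))  # ['g1', 'g3', 'g5']
```

Global normalization assumes that most genes do not change between the two samples, so the median log ratio should be zero; local and print-tip variants apply the same idea within subsets of spots.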
15Experiences at IBS for the cDNA arrays
16About IBS tomato arrays
- 13000 spots/features per chip
- 1 clone per spot
- cDNA clones from a dozen different cDNA libraries
- At least two different protocols were followed and six different vectors were used
- More than ten technicians involved
17Bioinformatics for Microarray at IBS (contd)
- IBS tomato EST database construction
- Installation, management and maintenance of data analysis software
- Reference information searching
- Batch Submission of EST sequences
18Bioinformatics Needs for Microarray Studies at IBS
- Pre-arraying data management
- cDNA info collection, vector trimming, sequence annotation, EST submission, etc.
- Array information management
- Gene set characterization, data storage, data retrieval
- Post-hybridization data analysis and management
- Array data analyses, storage of the scanning results, biology-oriented bioinformatics analyses
19Bioinformatics Service Work for Microarray studies at IBS
- Data pre-processing for the cDNAs
- Clone id assignment
- Sequence trimming
- gene annotation
- Function classification
- Data sheet preparation for commercial software to analyze microarray data
- GAL file preparation for GenePix Pro
- Master Gene List preparation for GeneSpring
20[Workflow diagram. Labels: cDNA clones; sequencing; vector trimming, assembly, function annotation; database; PCR; GenePix; feature intensities, normalization; data analysis (normalization, variance, clustering); Spotfire, GeneSpring; biological meaning; pathway analysis; transcription network; gene-gene interaction]
21Pre-array Bioinformatics
- Clone id generation
- Vector Trimming
- Sequence assembly
- Seq annotation (BLAST)
- EST submission to NCBI
- Database construction
[Diagram labels: clones from labs; sequencing; raw EST seq; data processing and management]
22Clone id generation
- Data centralization following sequencing
- Rules for re-arraying
- 96 well plate to/from 384 well
- PCR from 96 well and spotting from 384 well
- Order of A1, A2, B1, B2
2396 or 384 well
[Figure: 96-well and 384-well plates]
2496-well to 384 well plates
[Figure: plate layout with position labels A1, A2, B1, B2]
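The re-arraying between 96-well and 384-well plates can be sketched as a coordinate mapping. The sketch assumes the common quadrant-interleaving convention, in which four 96-well plates are merged into one 384-well plate and each quadrant (A1, A2, B1, B2) is named after the 384-well position that receives well A1 of its source plate. The slides do not spell out the exact rule used at IBS, so the offsets and the function names `well_96_to_384` and `well_384_to_96` are assumptions.

```python
import string

# Quadrant name -> (row offset, column offset) inside the 384-well plate.
# Assumed convention: the quadrant is named after the 384-well position
# that receives well A1 of the 96-well plate.
QUADRANT_OFFSETS = {"A1": (0, 0), "A2": (0, 1), "B1": (1, 0), "B2": (1, 1)}
ROWS_384 = string.ascii_uppercase[:16]  # rows A..P


def well_96_to_384(well_96: str, quadrant: str) -> str:
    """Map a 96-well position (e.g. 'H12') to its 384-well position."""
    row_96 = ord(well_96[0].upper()) - ord("A")  # 0..7
    col_96 = int(well_96[1:]) - 1                # 0..11
    if not (0 <= row_96 < 8 and 0 <= col_96 < 12):
        raise ValueError(f"not a 96-well position: {well_96}")
    row_off, col_off = QUADRANT_OFFSETS[quadrant]
    return f"{ROWS_384[2 * row_96 + row_off]}{2 * col_96 + col_off + 1}"


def well_384_to_96(well_384: str):
    """Inverse mapping: return (quadrant, 96-well position)."""
    row = ROWS_384.index(well_384[0].upper())
    col = int(well_384[1:]) - 1
    quadrant = f"{'AB'[row % 2]}{col % 2 + 1}"
    return quadrant, f"{ROWS_384[row // 2]}{col // 2 + 1}"


print(well_96_to_384("A1", "B2"))   # B2
print(well_96_to_384("H12", "B2"))  # P24
print(well_384_to_96("P24"))        # ('B2', 'H12')
```

Keeping the mapping in one place lets a clone id be traced from the 96-well PCR plate to the 384-well spotting plate and back.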
25Data collection
- Raw sequencing data obtained from the sequencing company
- Both ABI and text files organized and stored by lab and by date
- Clone info confirmed with each sequence contributor
- Clone ids matched with raw sequences
26Processing the sequencing data
- cDNA library procedures confirmed with each lab
- Vector/linker/primer trimming (Seqclean)
- Function annotation
- Blast against different database
- Gene Ontology annotation
- Sequence Assembly (Phrap)
27Procedure to generate cDNA clones
28IBS tomato EST Database
- Cloning information
- Sequencing data
- Vector/adaptor Trimming information
- EST assembly
- Function annotation
- Cross Reference
29The Tomato Database Entity-Relationship model
[Entity-relationship diagram; entities and attributes as labeled on the slide]
- Trimmed Sequence: 1. Seq id, 2. Trimmed Sequence, 3. Method, 4. Trim set
- Untrimmed Sequence: 1. Seq id, 2. Trimmed Sequence
- Assembly Information: 1. Contig_id, 2. Contig Sequence, 3. BLAST Result, 4. Position, 5. Component seq id
- NCBI BLAST Result: 1. Seq id, 2. NCBI_id, 3. E-Value, 4. Description, 5. Identity, 6. Other result
- TAIR Result: 1. Seq id, 2. At number, 3. E-Value, 4. Description, 5. Identity, 6. Other result
- TIGR Result: 1. Seq id, 2. TC number, 3. E-Value, 4. Description, 5. Identity, 6. Other result
- Lab info: 1. Seq id, 2. Comment, 3. Primer, 4. Biotech, 5. Sender, 6. Collect From
- ID MAP: 1. Seq id, 2. Clone_id, 3. Contig id, 4. Lab_id1, 5. Lab_id2, 6. NCBI_sbmt_id93, 7. NCBI_sbmt_id94, 8. dbEST_accn_no, 9. note
- Gene Ontology: 1. TC number, 2. EC number, 3. Process (GO_id, Description), 4. Function (GO_id, Description), 5. Component (GO_id, Description)
- cDNA Library Information: 1. Clone_id(3)(4), 2. Name, 3. Date made, 4. Developmental stage, 5. Cloning sites, 6. Description, 7. Library, 8. Host, 9. Species, 10. Vector, 11. Antibiotic, 12. Authors, 13. Tissue, 14. Primer
- Connector labels: Seq_id, Clone_id, TOM 3, TOM 4, TC number, with cardinalities 1 and n
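A small part of this entity-relationship model can be sketched as relational tables. The example uses SQLite only because it ships with Python; the slides do not name the database engine behind the tomato EST database. The table and column names are adapted from the diagram labels, only a subset of attributes is included, and every inserted value is a placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cdna_library (          -- "cDNA Library Information" (subset)
    clone_id TEXT PRIMARY KEY,
    name     TEXT,
    species  TEXT,
    vector   TEXT,
    tissue   TEXT
);
CREATE TABLE id_map (                -- "ID MAP" (subset)
    seq_id    TEXT PRIMARY KEY,
    clone_id  TEXT REFERENCES cdna_library(clone_id),
    contig_id TEXT
);
CREATE TABLE ncbi_blast_result (     -- "NCBI BLAST Result" (subset)
    seq_id      TEXT REFERENCES id_map(seq_id),
    ncbi_id     TEXT,
    e_value     REAL,
    description TEXT,
    identity    REAL
);
""")

# Placeholder rows, not real data.
conn.execute("INSERT INTO cdna_library VALUES (?, ?, ?, ?, ?)",
             ("clone0001", "example library", "tomato", "example vector", "fruit"))
conn.execute("INSERT INTO id_map VALUES (?, ?, ?)",
             ("seq0001", "clone0001", "contig0001"))
conn.execute("INSERT INTO ncbi_blast_result VALUES (?, ?, ?, ?, ?)",
             ("seq0001", "hit0001", 1e-50, "placeholder description", 98.5))

# Follow Clone_id -> Seq_id -> BLAST result, as the connectors in the diagram do.
rows = conn.execute("""
    SELECT m.clone_id, m.contig_id, b.ncbi_id, b.e_value
    FROM id_map AS m
    JOIN ncbi_blast_result AS b ON b.seq_id = m.seq_id
    WHERE m.clone_id = ?
""", ("clone0001",)).fetchall()
print(rows)
```

The ID MAP entity plays the role of a join table here: it ties each sequence id to its clone and contig, so annotation tables only need to carry the sequence id.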
30Information to be further analyzed
- Gene set characterization
- Number of unique genes on the array
- Number of known/unknown genes
- Coordinates of each spotted sequence
- Statistics about spotted cDNA
- grouped by function/pathway
- grouped by sequence similarity
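The gene set characterization listed above can be sketched as a simple summary over spot records. The sketch assumes that spots sharing a contig id represent the same gene and that a gene counts as "known" when at least one of its spots carries an annotation; both rules, the record layout, and the function name `characterize_gene_set` are assumptions made for illustration.

```python
from collections import Counter


def characterize_gene_set(spots):
    """Summarize an array from (spot_id, contig_id, annotation) records."""
    annotation_by_gene = {}
    for _spot_id, contig_id, annotation in spots:
        # Keep the first non-empty annotation seen for each contig.
        if not annotation_by_gene.get(contig_id):
            annotation_by_gene[contig_id] = annotation
    known = sum(1 for a in annotation_by_gene.values() if a)
    return {
        "unique_genes": len(annotation_by_gene),
        "known": known,
        "unknown": len(annotation_by_gene) - known,
        "spots_per_gene": Counter(contig for _s, contig, _a in spots),
    }


# Placeholder records, not real data.
spots = [
    ("spot1", "contig1", "placeholder kinase"),
    ("spot2", "contig1", None),
    ("spot3", "contig2", None),
    ("spot4", "contig3", "placeholder transporter"),
]

summary = characterize_gene_set(spots)
print(summary["unique_genes"], summary["known"], summary["unknown"])  # 3 2 1
```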
31Post-hybridization data analysis and management
32Post-hybridization data analysis
- Software for Microarray Analysis At IBS
- GenePix Pro 5.0: image processing
- GeneSpring: microarray data analysis
- Spotfire: microarray data analysis and data storage
- TransPath: pathway searching
33Image Processing
- GenePix Pro5.0
- GAL (GenePix Array List) file
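A GAL file is a tab-delimited text file that tells the image-analysis software which clone sits at each block, column and row of the array. The sketch below writes a simplified GAL-style file. The header layout follows the commonly documented ATF structure as best recalled here, and the block geometry records that a real GAL file needs are deliberately left out, so the output should be checked against the GenePix documentation before use. The function name `make_gal` and all feature values are hypothetical.

```python
def make_gal(features, block_count):
    """Build simplified GAL-style text.

    features: iterable of (block, column, row, name, clone_id) tuples.
    Block geometry header records are omitted in this sketch.
    """
    header = [
        "Type=GenePix ArrayList V1.0",
        f"BlockCount={block_count}",
        "BlockType=0",
    ]
    columns = ["Block", "Column", "Row", "Name", "ID"]
    # Second line: number of header records, number of data columns.
    lines = ["ATF\t1.0", f"{len(header)}\t{len(columns)}"]
    lines += [f'"{record}"' for record in header]
    lines.append("\t".join(f'"{c}"' for c in columns))
    for block, column, row, name, clone_id in features:
        lines.append(f'{block}\t{column}\t{row}\t"{name}"\t"{clone_id}"')
    return "\n".join(lines) + "\n"


# Placeholder features, not real clones.
gal_text = make_gal(
    [(1, 1, 1, "placeholder gene", "clone0001"),
     (1, 2, 1, "unknown", "clone0002")],
    block_count=1,
)
print(gal_text)
```

Generating the file from the clone id table, rather than editing it by hand, keeps the spot annotation consistent with the re-arraying records.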
34From multi-well plate to microarray
35GAL online
36GeneSpring at IBS
- for microarray data analyses
- standalone software
- providing statistical methods for data analysis
- Some bioinformatics
- providing visualization
- licensed annually
- rigid format requirement for input data
- requiring installation of a master gene list
(master table) prior to data analysis
37Master table for GeneSpring
- Master table contains information of
- Id
- Source of DNA
- Gene name
- Gene function annotation (from Blast results)
- GO annotation
- Each array needs its own master table
- The format of the master table may vary with the version of the software.
38To generate master table for GeneSpring
- Batch BLAST against three sequence databases
- Parsing BLAST results
- Incorporating EC number, GO number and other related data from the best BLAST matches
- Integrating all required data from various files and generating the master table
- Checking
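The parsing and integration steps above can be sketched as follows, assuming the BLAST results are available in the standard 12-column tabular format. Choosing the best hit by lowest E-value (ties broken by bit score) is an assumption, as are the output column names, the function names, and the GO lookup table; the actual master table layout depends on the GeneSpring version.

```python
import csv
import io

BLAST_COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                 "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(blast_tabular):
    """Return {query id: best hit}, preferring low E-value, then high bit score."""
    best = {}
    reader = csv.DictReader(blast_tabular, fieldnames=BLAST_COLUMNS, delimiter="\t")
    for hit in reader:
        key = (float(hit["evalue"]), -float(hit["bitscore"]))
        if hit["qseqid"] not in best or key < best[hit["qseqid"]][0]:
            best[hit["qseqid"]] = (key, hit)
    return {query: hit for query, (_key, hit) in best.items()}


def master_table(clone_ids, hits, go_by_subject):
    """One row per clone; clones without a BLAST hit get empty annotation."""
    rows = []
    for clone_id in clone_ids:
        hit = hits.get(clone_id)
        subject = hit["sseqid"] if hit else ""
        rows.append({
            "Id": clone_id,
            "Best hit": subject,
            "E-value": hit["evalue"] if hit else "",
            "GO annotation": go_by_subject.get(subject, ""),
        })
    return rows


# Placeholder BLAST output and GO lookup, not real results.
blast_text = (
    "clone1\tsubjA\t95.0\t200\t10\t0\t1\t200\t1\t200\t1e-30\t250\n"
    "clone1\tsubjB\t99.0\t200\t2\t0\t1\t200\t1\t200\t1e-80\t390\n"
    "clone2\tsubjC\t90.0\t150\t15\t0\t1\t150\t1\t150\t1e-10\t100\n"
)
go_lookup = {"subjB": "placeholder GO term"}

table = master_table(["clone1", "clone2", "clone3"],
                     best_hits(io.StringIO(blast_text)), go_lookup)
for row in table:
    print(row)
```

The final checking step still matters: clones without any hit, such as `clone3` here, end up with empty annotation fields and should be reviewed before the table is loaded.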
39Spotfire
- for microarray data analyses
- server-client software
- linked to Oracle database for data storage
- providing various statistical methods for data analysis
- capable of establishing links to more bioinformatics tools
- can record analysis procedure
- more flexible format requirement for input data
40One color array for Arabidopsis
- Affymetrix ATH1 chip
- Annotation information provided by company and
available on internet
41Bioinformatics support at Affymetrix
42Projects for now and the near future
- Infrastructure build-up
- Microarray data management system
- Platform for Bioinformatics analyses
- Plant Signaling Pathway Database
43Team
44Thank you!