Title: Microarray experiments: Database and Analysis Tools.
1Microarray experiments: Database and Analysis Tools.
Kate Milova, cDNA Microarray Facility, March 24, 2005
2Outline.
- Microarray platforms and services at AECOM
- cDNA
- Long Oligo
- Affymetrix
- Database (cDNA and Long Oligo) structure and content
- Printing information
- Chip layout
- Annotation
- Annotation algorithms and data mining
- On-line Analysis Tools
- Normalization
- Signal filtering
- Data sets comparison
- Statistical packages and Analysis software
- Summary
3Microarray Platforms at AECOM.
4How to choose a microarray platform.
5Before starting your microarray experiment.
6cDNA Microarray Facility. Home page.
Standard and Custom Arrays. Description and Prices
Hybridization, labeling, bioinformatics, workshops
Database for cDNA and Long Oligo Arrays. Analysis Pipeline
AECOM cDNA microarray facility. Supported publications
Useful links to analysis tools
7Database for Analysis of Microarrays at AECOM.
Contents.
Gene Annotation
- Accession
- Clone ID
- Clone end
- Vector name
- Clone name
- UniGene cluster ID
- Best blast hit
- Main blast parameters (score, E-value, identity, blast date, etc.)
- Gene ID
- Gene symbol
- Gene synonyms
- Chromosome
- Map location
- GO IDs
- GO Annotation
Printing Information
- Chip name
- Spot information (accession, clone ID, or bacterial control)
- Spot location
- Library name
- Clone location on 384 plate
- Clone location on 96 plate
Chip layout
- Chip name
- Species
- Number of spots
- Number of controls
- Number of pen domains
- Number of slides
- Printing pattern
- Distance between spots
- Number of rows
- Number of columns
- Printing date
- Master chip
8Annotation sources NCBI.
UniGene ID ← Accession
UniGene
UniGene ID ← Blast against UniGene clusters
Entrez Gene
UniGene ID → Gene ID → GO ID
NCBI
Blast Software
Blast Search
RefSeq NT databases → Annotation
9Annotation sources NCBI.
- NCBI → UniGene → UniGene ID
- UniGene ID for cDNA arrays is obtained from the UniGene source file for each particular accession number of the clone.
- NCBI → UniGene → Blast
- UniGene ID for Long Oligo arrays is obtained from blast results.
- Blast search was done with the set of oligo sequences against UniGene clusters, with cutoffs of 99% for sequence identity and 90% for overlap.
- The UniGene ID for an oligo hitting multiple UniGene clusters is marked as an ambiguous cluster ID.
NCBI
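The cutoff rule for Long Oligo arrays can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the input layout, the cluster IDs, and the reading of "overlap" as the percentage of the oligo covered by the alignment are all hypothetical and are not taken from the facility's code.

```python
from collections import defaultdict

MIN_IDENTITY = 99.0  # percent sequence identity cutoff (from the slide)
MIN_OVERLAP = 90.0   # percent of the oligo covered by the alignment (assumed meaning)

def assign_unigene(hits, oligo_lengths):
    """Assign one UniGene cluster ID per oligo.

    hits: iterable of (oligo_id, cluster_id, pct_identity, align_len).
    oligo_lengths: dict mapping oligo_id to oligo length in bases.
    """
    passing = defaultdict(set)
    for oligo_id, cluster_id, pct_identity, align_len in hits:
        overlap = 100.0 * align_len / oligo_lengths[oligo_id]
        if pct_identity >= MIN_IDENTITY and overlap >= MIN_OVERLAP:
            passing[oligo_id].add(cluster_id)

    result = {}
    for oligo_id in oligo_lengths:
        clusters = passing.get(oligo_id, set())
        if not clusters:
            result[oligo_id] = None          # no hit passed the cutoffs
        elif len(clusters) == 1:
            result[oligo_id] = next(iter(clusters))
        else:
            result[oligo_id] = "Ambiguous"   # oligo hits multiple clusters
    return result

# Hypothetical example data (cluster IDs are made up for illustration).
example_lengths = {"oligo1": 70, "oligo2": 70, "oligo3": 70}
example_hits = [
    ("oligo1", "Mm.1001", 100.0, 70),
    ("oligo2", "Mm.2002", 99.5, 68),
    ("oligo2", "Mm.3003", 99.0, 70),
    ("oligo3", "Mm.4004", 97.0, 70),
]
assignments = assign_unigene(example_hits, example_lengths)
```

In this example the second oligo passes the cutoffs against two clusters and is therefore flagged as ambiguous, while the third has no passing hit and stays unassigned.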
10Annotation sources NCBI.
- UniGene ID → Gene ID
- All information retrieved from the Entrez Gene project is based on the UniGene cluster ID and the corresponding Gene ID.
- The Gene ID to UniGene cluster ID connection can be ambiguous.
- A parsing filter was used to eliminate ambiguous Gene IDs.
- Gene ID → GO ID
- For each Gene ID, the corresponding Gene Ontology IDs were retrieved from the Entrez Gene source file.
- There might be a few or more than 10 different GO IDs for a Gene ID. All of them are collected.
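The two mapping steps above might look roughly like this in Python. The sketch assumes that "ambiguous" means a UniGene cluster linked to more than one Gene ID, which is one possible reading of the slide; the function names and example IDs are hypothetical.

```python
def unambiguous_gene_ids(unigene_to_genes):
    """Keep only clusters linked to exactly one Gene ID (assumed filter rule)."""
    return {
        cluster: next(iter(genes))
        for cluster, genes in unigene_to_genes.items()
        if len(genes) == 1
    }

def collect_go_ids(gene_id, gene_to_go):
    """Collect every GO ID listed for a Gene ID, dropping repeats."""
    return sorted(set(gene_to_go.get(gene_id, [])))

def annotate_clusters(unigene_to_genes, gene_to_go):
    """Map each unambiguous cluster to (Gene ID, list of GO IDs)."""
    return {
        cluster: (gene_id, collect_go_ids(gene_id, gene_to_go))
        for cluster, gene_id in unambiguous_gene_ids(unigene_to_genes).items()
    }

# Hypothetical example data for illustration only.
example_unigene_to_genes = {
    "Mm.1001": {"11111"},
    "Mm.2002": {"22222", "33333"},  # ambiguous: two Gene IDs
}
example_gene_to_go = {
    "11111": ["GO:0004713", "GO:0006468", "GO:0004713"],
}
cluster_annotation = annotate_clusters(example_unigene_to_genes, example_gene_to_go)
```

Here the second cluster is dropped by the filter, and the repeated GO ID for the first gene is collected only once.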
11Annotation sources NCBI.
- Blast Software package is installed on the microarray server.
- This software allows formatting databases and running batch homology searches for any combination of custom databases and query sequences.
- RefSeq NT databases. Annotation
- Loaded, formatted and periodically updated on the microarray server.
- When databases are updated, we run a blast search of cDNA and Long Oligo sequences.
- Blast results are parsed using our algorithm for annotation extraction.
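As an illustration of the parsing step, the sketch below picks the best hit per query from BLAST tabular output. It assumes the standard 12-column tab-separated layout (as produced by `-m 8` in legacy `blastall` or `-outfmt 6` in BLAST+) and ranks hits by bit score. The facility's own parser is not described in the slides, so the function name, the ranking rule, and the example IDs are assumptions.

```python
def best_hits(lines):
    """Return the highest-scoring hit per query from BLAST tabular output.

    Expected columns: query, subject, % identity, alignment length,
    mismatches, gap opens, q.start, q.end, s.start, s.end, E-value, bit score.
    """
    best = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        query = fields[0]
        hit = {
            "subject": fields[1],
            "identity": float(fields[2]),
            "length": int(fields[3]),
            "evalue": float(fields[10]),
            "score": float(fields[11]),
        }
        if query not in best or hit["score"] > best[query]["score"]:
            best[query] = hit
    return best

# Hypothetical tabular output; sequence names are made up.
example_output = [
    "\t".join(["clone1", "refseq_A", "95.00", "400", "20", "0",
               "1", "400", "101", "500", "1e-150", "520"]),
    "\t".join(["clone1", "refseq_B", "99.50", "400", "2", "0",
               "1", "400", "11", "410", "0.0", "780"]),
    "\t".join(["oligo7", "refseq_C", "100.00", "70", "0", "0",
               "1", "70", "301", "370", "2e-32", "139"]),
]
annotation = best_hits(example_output)
```

The fields kept for each hit correspond to the main blast parameters stored in the database (score, E-value, identity).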
12Annotation Extraction Algorithm
Raw Data
Sequences
Homology search against RefSeq NT
Alignment quality check
90
80
13Annotation Extraction Algorithm
EDMUSDFMUSKULUSDETRIKENGLLCLONEJF
FPROTEINRFTYROSINEMNWZMKINASEJHMIW
Linguistic Filter
OUT
14Annotation sources Gene Ontology.
Gene Ontology
Biological process
Molecular function
Cellular compartment
- Gene Ontology.
- Multiple GO IDs for each Gene ID are retrieved in the previous step from Entrez Gene (if available).
- Gene Ontology annotation for all GO IDs is kept in three different information fields: biological process, molecular function and cellular compartment. For each of the fields, all available annotation was prefiltered with a redundancy check and concatenated.
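A minimal sketch of how the three annotation fields could be built is shown below. The separator, the function name, and the lookup table are assumptions made for illustration, and the redundancy check is taken to mean simply dropping repeated terms within a field.

```python
FIELDS = ("biological process", "molecular function", "cellular compartment")

def build_go_fields(go_ids, go_terms, sep="; "):
    """Concatenate GO term names per field, skipping repeated terms.

    go_ids: GO IDs collected for one Gene ID.
    go_terms: dict mapping GO ID to (field, term name).
    """
    collected = {field: [] for field in FIELDS}
    for go_id in go_ids:
        if go_id not in go_terms:
            continue  # no annotation available for this GO ID
        field, term = go_terms[go_id]
        if term not in collected[field]:  # redundancy check
            collected[field].append(term)
    return {field: sep.join(terms) for field, terms in collected.items()}

# Illustrative lookup table and GO ID list for a single gene.
example_go_terms = {
    "GO:0004713": ("molecular function", "protein tyrosine kinase activity"),
    "GO:0005524": ("molecular function", "ATP binding"),
    "GO:0006468": ("biological process", "protein phosphorylation"),
    "GO:0005737": ("cellular compartment", "cytoplasm"),
}
example_go_ids = ["GO:0004713", "GO:0006468", "GO:0004713", "GO:0005524", "GO:0005737"]
go_fields = build_go_fields(example_go_ids, example_go_terms)
```

Keeping one concatenated string per field matches the database search options, which query function and cellular compartment as separate fields.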
15cDNA Microarray Facility. Database.
16Database Search.
- Database Annotation Search with
- Accession
- Gene annotation
- Gene symbol synonyms
- UniGene cluster ID
- Chromosome number
- Gene ID
- GO ID
- Function
- Cellular compartment
17Microarray Data Analysis Pipeline.
18Pipeline LOWESS Normalization.
19Pipeline LOWESS Normalization.
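The slides do not give the details of the pipeline's LOWESS step, so the following is a generic sketch of intensity-dependent normalization for two-channel data, not the facility's implementation. It assumes the usual M = log2(R/G) and A = 0.5 log2(R·G) values, fits a locally weighted linear trend of M on A with tricube weights in a single pass (no robustness iterations), and subtracts that trend. The function names and the `frac` parameter are hypothetical.

```python
import math

def lowess_fit(x, y, frac=0.4):
    """Locally weighted linear fit of y on x, evaluated at every x."""
    n = len(x)
    k = min(n, max(3, int(math.ceil(frac * n))))  # points per local window
    fitted = []
    for xi in x:
        dist = [abs(xj - xi) for xj in x]
        h = sorted(dist)[k - 1] or max(dist) or 1.0  # window half-width
        # Tricube weights: 1 at the target point, 0 at the window edge.
        w = [(1.0 - min(d / h, 1.0) ** 3) ** 3 for d in dist]
        sw = sum(w)
        mx = sum(wi * xj for wi, xj in zip(w, x)) / sw
        my = sum(wi * yj for wi, yj in zip(w, y)) / sw
        sxx = sum(wi * (xj - mx) ** 2 for wi, xj in zip(w, x))
        sxy = sum(wi * (xj - mx) * (yj - my) for wi, xj, yj in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (xi - mx))
    return fitted

def lowess_normalize(red, green, frac=0.4):
    """Return log2 ratios with the intensity-dependent trend removed."""
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    trend = lowess_fit(a, m, frac)
    return [mi - ti for mi, ti in zip(m, trend)]

# Synthetic spots with a dye bias that grows linearly with intensity:
# M = 0.1 * A + 0.2 for every spot, so no gene is truly regulated.
log_green = [float(v) for v in range(6, 16)]
green = [2.0 ** lg for lg in log_green]
red = [2.0 ** ((1.05 * lg + 0.2) / 0.95) for lg in log_green]
normalized = lowess_normalize(red, green)
```

On this synthetic data the fitted trend absorbs the whole dye bias, so the normalized log ratios come out at essentially zero. Real data would typically need a smaller window relative to the number of spots and some handling of zero or negative background-corrected intensities.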
20Pipeline Filtering.
21Pipeline Data set Comparison.
22Statistical packages and Analysis software.
- Microarray Analysis Software
- GeneTraffic: client-server systems for microarray data analysis. Iobion
- GeneSpring: cutting-edge tools for expression analysis. Agilent Technologies
- GeneSifter. GeneSifter
- BASE. Lund University
- Data Mining
- PathwayAssist: Interaction Explorer Software. Stratagene
- Pathways Analysis. Ingenuity
- Tools for Statistical Analysis
- SAM: Significance Analysis of Microarrays. Stanford
- R statistical package
- S-PLUS. Insightful
23Summary
- Multiple microarray platforms are available at AECOM
- Affymetrix
- cDNA arrays
- Long Oligo
- Custom arrays
- Data analysis and annotation
- Database for Analysis of Microarrays contains all information about our arrays, cDNA and oligo sets
- Sequence annotation is updated and integrated into the database
- Web interface of the database makes it easy to search for a particular gene, synonyms, map location, function, etc.
- Easy-to-use web-based analysis pipeline: get your results in just 5 minutes. List of up- and down-regulated genes with full gene annotation.
- We are here for help and consultation!
24BACKUPS
25cDNA Microarray Facility. Services.
26cDNA Microarray Facility. Arrays.
27cDNA Microarray Facility. Publications.
28Gene Correspondence Tables.
29Gene Correspondence Tables.