Title: To understand the structures of existing biological databases,
2Purpose
- To understand the definition of homology and its importance in supporting the theory of evolution
- To understand the structures of existing biological databases and be able to process their data
- To use sequence alignment tools such as BLAST, and to understand their underlying scoring mechanisms and statistical significance
- To construct an easily manageable database system with a user-friendly interface
3Step 1 What is homology?
- Read related information regarding the origin and meaning of homology
- Recommended reading: Homology in Biology, Jonathan Wells
- Recommended reading: Icons of Evolution?, Alan D. Gishlick
- Before actually starting to work, you should be able to answer the following questions:
- What's the difference between similarity and homology?
- What were biologists' views on the meaning of homology?
- And what is your point of view? (If you have any)
4Definition of Homology
Homologous sequences. Orthologs and Paralogs are
two types of homologous sequences. Orthology
describes genes in different species that derive
from a common ancestor. Orthologous genes may or
may not have the same function. Paralogy
describes homologous genes within a single
species that diverged by gene duplication.
(Definition from NCBI Education)
5Step 2 BLAST
- Know how to use BLAST, understand the BLAST algorithm
- Use the web-based BLAST at NCBI, play with it
- We will use blastp in the project
- Download the BLAST software from the NCBI FTP site and install it
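The statistical significance that BLAST reports can be sketched with the Karlin-Altschul formulas: the bit score S' = (lambda * S - ln K) / ln 2 and the expected number of chance hits E = K * m * n * exp(-lambda * S), which equals m * n * 2^(-S'). The following is a minimal sketch, not BLAST's own code. The function names are hypothetical, the lambda and K values are illustrative assumptions (the real values depend on the scoring matrix and gap costs, and BLAST prints them in its report), and the effective-length corrections BLAST applies are ignored.

```python
import math

# Illustrative parameters only (assumption): the real lambda and K depend on
# the scoring matrix and gap costs and are printed in the BLAST report.
LAMBDA = 0.267
K = 0.041


def bit_score(raw_score, lam=LAMBDA, k=K):
    """Convert a raw alignment score S into a normalized bit score S'."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(raw_score, query_len, db_len, lam=LAMBDA, k=K):
    """Expected number of chance alignments scoring at least raw_score."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


def e_value_from_bits(bits, query_len, db_len):
    """Same expectation, computed from the bit score instead."""
    return query_len * db_len * 2.0 ** (-bits)


# Hypothetical example: raw score 100, query of 250 residues,
# database of 5,000,000 residues.
print("bit score:", bit_score(100))
print("E-value  :", e_value(100, 250, 5_000_000))
```

A higher raw score gives a higher bit score and a smaller E-value, which is why hits with small E-values are treated as more likely to reflect true homology.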
6Step 3 Review Biological Databases
- You have to integrate the following databases:
- HomoloGene (from NCBI)
- Homophila
- Improved homology prediction (? construct a database like euGenes by hand)
- The following databases are optional for inclusion:
- FlyBase
- MGD
- RGD
- SGD
- DHMHD (The Dysmorphic Human-Mouse Homology Database)
- Cancer Immunity
7To complete your project
- You must give us:
- A search function so that we can search homologies by criteria such as organism, gene name, gene symbol, homology group, etc.
- When we give a query, you must display the corresponding homologies and their sources (e.g., this homology is from HomoloGene, this one is from RGD, etc.)
- A summary table (like in euGenes) describing the number of predicted homologies between different organisms
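The three deliverables above (search by criteria, source display, and a summary table) can be sketched with an in-memory SQLite database. This is only one possible design. The table layout, column names, function names, and sample rows are all hypothetical placeholders rather than real database records, and the sketch assumes each organism pair is always stored in the same order.

```python
import sqlite3

# Hypothetical schema: one row per predicted homology, tagged with its source.
SCHEMA = """
CREATE TABLE homology (
    group_id   TEXT,
    organism_a TEXT, gene_a TEXT,
    organism_b TEXT, gene_b TEXT,
    source     TEXT
)
"""

# Made-up illustrative rows, not real database records.
SAMPLE_ROWS = [
    ("G1", "human", "GENE_A", "mouse", "Gene_a", "HomoloGene"),
    ("G1", "human", "GENE_A", "rat", "Gene_a", "RGD"),
    ("G2", "human", "GENE_B", "fly", "gene-b", "Homophila"),
    ("G3", "human", "GENE_C", "mouse", "Gene_c", "HomoloGene"),
]


def build_db(rows=SAMPLE_ROWS):
    """Create the in-memory database and load the rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO homology VALUES (?, ?, ?, ?, ?, ?)", rows)
    return conn


def search(conn, organism=None, gene=None, group_id=None):
    """Return matching homologies with their source; None means 'any'."""
    clauses, params = [], []
    if organism is not None:
        clauses.append("(organism_a = ? OR organism_b = ?)")
        params += [organism, organism]
    if gene is not None:
        clauses.append("(gene_a = ? OR gene_b = ?)")
        params += [gene, gene]
    if group_id is not None:
        clauses.append("group_id = ?")
        params.append(group_id)
    where = " WHERE " + " AND ".join(clauses) if clauses else ""
    sql = ("SELECT group_id, organism_a, gene_a, organism_b, gene_b, source "
           "FROM homology" + where + " ORDER BY group_id, source")
    return conn.execute(sql, params).fetchall()


def summary(conn):
    """Count predicted homologies for each organism pair."""
    sql = ("SELECT organism_a, organism_b, COUNT(*) FROM homology "
           "GROUP BY organism_a, organism_b ORDER BY organism_a, organism_b")
    return conn.execute(sql).fetchall()


conn = build_db()
for row in search(conn, organism="mouse"):
    print(row)          # each row ends with the source database
for row in summary(conn):
    print(row)          # (organism_a, organism_b, number of homologies)
```

Keeping the source as a column on every row lets the same query answer both "which homologies match?" and "where did each one come from?".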
8Bioinformatics Computational Molecular Biology Final Project
- Microarray Analysis
- Presenter: Elizabeth Tseng
9Purpose
- To understand what a microarray is and how it is used
- To obtain an overview of the various analysis tools used for microarrays
- To become familiar with the usage of microarray tools
- To construct a user-friendly platform for microarray analysis
10Step 1 What is microarray?
- You already know from the previous courses that a microarray is an array of DNA or protein samples that can be hybridized with probes to study patterns of gene expression.
- Types of microarray include cDNA (spotted) and Affymetrix
- Definition of gene expression: the term describes the transcription of the information contained within the DNA into mRNA molecules that are then translated into the proteins that perform most of the critical functions of cells.
- How it works: by using an array containing many DNA samples, scientists can determine in a single experiment the expression levels of numerous genes within a cell by measuring the amount of mRNA bound to each site on the array. With the aid of a computer, the amount of mRNA bound is measured, generating a profile of gene expression in the cell.
- Ratio of expression (e.g., tumor/normal, or red/green) indicates whether or not the gene at that spot is more highly expressed
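The expression ratio in the last bullet can be sketched in a few lines. Taking log2 of the ratio is a common convention assumed here (it is not stated on the slide), and the function names and the two-fold cutoff are hypothetical choices for illustration.

```python
import math


def log2_ratio(red, green):
    """log2(red/green); positive means higher expression in the red channel."""
    if red <= 0 or green <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(red / green)


def call_spot(red, green, fold=2.0):
    """Label a spot using a hypothetical fold-change cutoff."""
    lr = log2_ratio(red, green)
    cutoff = math.log2(fold)
    if lr >= cutoff:
        return "up"
    if lr <= -cutoff:
        return "down"
    return "unchanged"


# Hypothetical intensities for three spots: (red, green)
for red, green in [(800, 200), (100, 400), (300, 250)]:
    print(red, green, call_spot(red, green))
```

On the log2 scale a two-fold increase and a two-fold decrease are symmetric (+1 and -1), which is why the log ratio is usually preferred over the raw ratio.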
11Step 2 Why do we need microarray?
- A microarray can contain a VERY LARGE number of genes
- A microarray is small in size
- A microarray may be used to assay gene expression within a single sample or to compare gene expression in two different cell types or tissue samples
GREEN represents Control DNA, RED represents Sample DNA, YELLOW represents a combination of Control and Sample DNA, BLACK represents areas where neither the Control nor Sample DNA hybridized to the target DNA.
(Steps 1 and 2 from NCBI Education)
12Step 3 Microarray Analysis
- (pre-Analysis) Standardization
- Remove background noise
- Normalize intensity
- Example: SNOMAD
- (pre-Analysis) Picking out the significant genes (e.g., SAM)
- Comparison of gene expression
- Clustering
- Hierarchical clustering
- K-means
- SOM (Self-Organizing Maps)
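The pre-analysis and clustering steps above can be sketched in plain Python. This is a simplified illustration, not how SNOMAD, SAM, or any of the listed tools work internally. It assumes background subtraction with a floor, global median centering of log2 ratios as the normalization, and a bare-bones k-means that starts from the first k profiles. All function names and data are hypothetical.

```python
import math
import statistics


def background_correct(foreground, background, floor=1.0):
    """Subtract local background; clamp at a floor so logs stay defined."""
    return [max(f - b, floor) for f, b in zip(foreground, background)]


def median_normalized_log_ratios(red, green):
    """log2(red/green) per spot, shifted so the median log ratio is 0."""
    logs = [math.log2(r / g) for r, g in zip(red, green)]
    center = statistics.median(logs)
    return [x - center for x in logs]


def sq_dist(p, q):
    """Squared Euclidean distance between two expression profiles."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(profiles, k, iterations=20):
    """Tiny k-means; the first k profiles are used as starting centers."""
    centers = [list(p) for p in profiles[:k]]
    labels = [0] * len(profiles)
    for _ in range(iterations):
        # Assignment step: each profile goes to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in profiles]
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:
                centers[c] = [statistics.mean(col) for col in zip(*members)]
    return labels


# Hypothetical log-ratio profiles of four genes across two arrays.
profiles = [[2.0, 2.1], [-2.0, -1.9], [1.9, 2.0], [-2.1, -2.0]]
print(kmeans(profiles, k=2))
```

Hierarchical clustering and SOM follow the same idea of grouping genes with similar profiles, but they build a tree or a grid of prototypes instead of a fixed number of centers.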
13Actual image
Pseudogram
Source: MeV
15Step 4 Review microarray tools
- SNOMAD - Standardization and NOrmalization of MicroArray Data http://pevsnerlab.kennedykrieger.org/snomadinput.html
- SAM - Significance Analysis of Microarrays http://www-stat.stanford.edu/~tibs/SAM/index.html
- PAM - Prediction Analysis for Microarrays http://www-stat.stanford.edu/~tibs/PAM/
- Eisen's Cluster and TreeView http://rana.lbl.gov/EisenSoftware.htm
- EMBL-EBI - Expression Profiler http://www.ebi.ac.uk/microarray-srv/EP/cgi-bin/ep_ui.pl
- NCI BRB ArrayTools http://linus.nci.nih.gov/BRB-ArrayTools.html
- TIGR - TM4 http://www.tigr.org/software/tm4/
- Whitehead Institute - GeneCluster 2 http://www-genome.wi.mit.edu/cancer/software/genecluster2/gc2.html
- dChip http://www.dchip.org/
- Genesis http://genome.tugraz.at/
- BioConductor http://www.bioconductor.org
16To complete the project
- You must construct a website that can do the following:
- For each input microarray file, perform statistical analysis (e.g., t-test, Pearson correlation) to determine the significance of each expression profile
- For a given Affymetrix CEL file, draw a probe-intensity image.
- Compare expression levels between two probe sets.
- Use PCA (Principal Component Analysis) to perform clustering
- Hierarchical clustering
probe-intensity sample image
profile comparison sample
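The two statistics named in the requirements can be sketched without external libraries. The slide only says "t-test", so the Welch (unequal variance) form used here is an assumption, and the function names and data are hypothetical. Turning the t statistic into a p-value needs the t distribution, which is not included in this sketch.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t statistic for two groups of expression values."""
    va, vb = statistics.variance(a), statistics.variance(b)
    standard_error = math.sqrt(va / len(a) + vb / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


def pearson(x, y):
    """Pearson correlation coefficient between two expression profiles."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    return cov / (sx * sy)


# Hypothetical expression values of one gene in two conditions.
tumor = [5.0, 6.0, 7.0]
normal = [1.0, 2.0, 3.0]
print("t statistic:", welch_t(tumor, normal))

# Hypothetical profiles of two probe sets across four arrays.
print("correlation:", pearson([1, 2, 3, 4], [2, 4, 6, 8]))
```

A large absolute t value suggests a gene differs between conditions, while a correlation near 1 or -1 suggests two probe sets rise and fall together or in opposition.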