Title: EGGSViz: Visualization and Exploration of Gene Clusters
1EGGSViz: Visualization and Exploration of Gene Clusters
- Ankita Bhan
- Advisor: Prof. Sun Kim
- Co-advisor: Prof. Yves Brun
- Indiana University, Bloomington
2Outline
- Background
- Motivation
- Problem Description
- Features
- Workflow 1
- Workflow 2
- Workflow 3
- Exploration Feature 1
- Exploration Feature 2
- Future Work
3Background
- Functionally related genes co-evolve, probably due to selection pressure during evolution.
- This leads to conservation of gene clusters across genomes (especially in microbial genomes).
4Motivation
- Microbial genomes contain an abundance of genes with conserved proximity.
- Genes with conserved proximity are often co-transcribed as operons, or co-regulated as part of a larger biochemical network.
5Motivation
- Gene clusters: definition
- A gene cluster is a group of genes in microbial genomes with conserved proximity; such genes are candidates for being co-transcribed as operons or co-regulated as part of a biochemical pathway.
6EXAMPLE OF A GENE CLUSTER
7THE FUNCTIONAL CATEGORY CODES FROM TIGR
8Problem Description
- Predicting gene clusters: each genome is interpreted as a sequence of gene families, and a set of families is modeled as a gene cluster.
- Visualizing the clusters of multiple genomes in a single display window is challenging because the clusters are scattered.
- Adding genes from a new, unknown genome to a known cluster of genes is challenging.
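To make the "genome as a sequence of families" idea concrete, here is a minimal sketch. It is not the prediction method behind EGGSViz, which the slides do not describe; it only illustrates conserved proximity. The function names, the `max_gap` parameter, and the family labels are all hypothetical.

```python
from collections import defaultdict


def nearby_family_pairs(genome, max_gap=2):
    """Unordered pairs of distinct families at most max_gap positions apart."""
    pairs = set()
    for i, family in enumerate(genome):
        for other in genome[i + 1 : i + 1 + max_gap]:
            if other != family:
                pairs.add(frozenset((family, other)))
    return pairs


def conserved_clusters(genomes, max_gap=2):
    """Group families whose proximity is conserved in every genome."""
    # Keep only the family pairs that are close together in all genomes.
    conserved = set.intersection(
        *(nearby_family_pairs(g, max_gap) for g in genomes)
    )
    neighbors = defaultdict(set)
    for pair in conserved:
        a, b = tuple(pair)
        neighbors[a].add(b)
        neighbors[b].add(a)
    # Connected components of the conserved-pair graph are candidate clusters.
    clusters, seen = [], set()
    for start in neighbors:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            family = stack.pop()
            if family not in component:
                component.add(family)
                stack.extend(neighbors[family] - component)
        seen |= component
        clusters.append(component)
    return clusters


# Three toy genomes, each written as a sequence of (hypothetical) family labels.
genomes = [
    ["A", "B", "C", "X", "D"],
    ["Y", "C", "A", "B", "D"],
    ["B", "A", "Z", "C", "Q"],
]
clusters = conserved_clusters(genomes)
```

In this toy input, families A, B, and C stay close together in all three genomes even though their order changes, so they form one candidate cluster.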
9Features
- Simultaneous visualization of all gene clusters on a genome scale, with zoom in/out
- Detailed view of an individual cluster
- Color coding according to COG functional categories
- Multiple sequence alignment of the genes in a cluster
- Selection of clusters from user-supplied "genes of interest"
- Adding a new genome, searching it for instances of clusters, and saving the search results
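The color-coding feature can be pictured as a lookup from a gene's functional category code to a display color. The sketch below uses a few one-letter COG category codes; the colors, the fallback color, and the `color_for` helper are assumptions for illustration and are not taken from EGGSViz.

```python
# A few one-letter COG functional category codes with short descriptions.
COG_CATEGORIES = {
    "J": "Translation, ribosomal structure and biogenesis",
    "K": "Transcription",
    "L": "Replication, recombination and repair",
    "C": "Energy production and conversion",
    "E": "Amino acid transport and metabolism",
    "G": "Carbohydrate transport and metabolism",
    "S": "Function unknown",
}

# Hypothetical palette: these colors are illustrative, not the tool's scheme.
COG_COLORS = {
    "J": "#e41a1c",
    "K": "#377eb8",
    "L": "#4daf4a",
    "C": "#984ea3",
    "E": "#ff7f00",
    "G": "#a65628",
    "S": "#999999",
}
DEFAULT_COLOR = "#cccccc"  # used for genes without a recognized category


def color_for(cog_code):
    """Return a display color for a COG code such as "K" or "EG".

    Genes assigned to several categories are colored by the first letter.
    """
    if not cog_code:
        return DEFAULT_COLOR
    return COG_COLORS.get(cog_code[0].upper(), DEFAULT_COLOR)
```

Coloring multi-category genes by their first letter is one simple choice; a viewer could equally split the gene glyph into several colored segments.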
10Features: Illustration of how EGGSViz works
- Workflow 1: main display of EGGSViz and a broader view of the clusters.
- Workflow 2: detailed view of an individual cluster.
- Workflow 3: further details of each gene in the Detail View window.
- Exploration feature 1: visualizing the genes of interest.
- Exploration feature 2: adding a new genome and saving the search.
11Workflow 1
- Displays all clusters on a genome scale in a single display screen.
- Divided dynamic zoom (from low to high resolution) of genomic regions.
- Displays the cluster number and size (number of genes) of each cluster in the main window.
- Highlights an individual cluster and shows the annotation information for every gene and cluster.
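Zooming a genome-scale display comes down to changing which genomic interval is mapped onto the screen. The following sketch shows one common way to do that; the function names, the centered-zoom behavior, and the pixel width are assumptions, since the slides do not say how EGGSViz implements its zoom.

```python
def zoom_view(view_start, view_end, factor, genome_length):
    """Shrink (factor > 1) or grow (factor < 1) the visible interval
    around its center, clamped to the genome boundaries."""
    center = (view_start + view_end) / 2
    half_width = (view_end - view_start) / (2 * factor)
    return max(0, center - half_width), min(genome_length, center + half_width)


def gene_to_pixels(gene_start, gene_end, view_start, view_end, width_px):
    """Map a gene's genomic coordinates to horizontal pixel coordinates."""
    scale = width_px / (view_end - view_start)  # pixels per base
    return (gene_start - view_start) * scale, (gene_end - view_start) * scale


# Zoom in 2x on a 1000-base genome shown in a 500-pixel-wide window.
view = zoom_view(0, 1000, factor=2, genome_length=1000)
gene_px = gene_to_pixels(300, 400, view[0], view[1], width_px=500)
```

After the 2x zoom the visible interval is 250 to 750, so a gene spanning bases 300 to 400 occupies pixels 50 to 150.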
12DISPLAY WINDOW
13OPEN CLUSTER FILE
14CLUSTER FILE LOADED
15GENOMES ZOOMED OUT
16GENOMES ZOOMED IN
17MAIN DISPLAY WITH CONNECT CLUSTERS OPTION
18CLUSTER OF INTEREST HIGHLIGHTED
19COLOR REVERSE OPTION
20CONNECT GENES OPTION
21Workflow 2
- How to choose a cluster?
- Detailed view of the individual cluster of our choice.
- Color Reverse and Disconnect Genes options.
22HOW TO GO TO THE DETAIL VIEW WINDOW
23THE DETAIL VIEW WINDOW FOR CLUSTER 1 CONTAINING
30 GENES
24GENE SIZE ZOOMED IN TO A SCALE OF 15
25GENE SIZE ZOOMED IN TO A SCALE OF 25
26DETAILED VIEW WITH THE COLOR REVERSE OPTION
27DETAIL VIEW WITH DISCONNECT GENES OPTION
28DETAIL VIEW WITH THE GENE ANNOTATION
29DETAIL VIEW WITH THE GENE ANNOTATION
30Workflow 3
- Double-clicking a gene redirects to the sequence window.
- The sequence window retrieves the sequences of the genes related in the synteny.
- The ClustalW button runs a multiple alignment of the sequences.
- More features are to be added to connect this information to web services.
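The sequence-retrieval and alignment step can be sketched as writing the selected genes to a FASTA file and handing it to ClustalW. How EGGSViz actually invokes ClustalW is not stated in the slides; the helper names, the file names, and the `clustalw2` executable name are assumptions (the binary is installed under different names on different systems).

```python
import subprocess
from pathlib import Path


def to_fasta(sequences):
    """Format a {gene_id: sequence} mapping as FASTA text."""
    return "".join(f">{gene_id}\n{seq}\n" for gene_id, seq in sequences.items())


def clustalw_command(infile, outfile, executable="clustalw2"):
    """Build a ClustalW command line for a full multiple alignment."""
    return [executable, f"-INFILE={infile}", f"-OUTFILE={outfile}", "-ALIGN"]


def align(sequences, workdir, executable="clustalw2"):
    """Write the sequences to disk, run ClustalW, and return the alignment text.

    Requires a ClustalW executable on the PATH.
    """
    workdir = Path(workdir)
    infile = workdir / "genes.fasta"
    outfile = workdir / "genes.aln"
    infile.write_text(to_fasta(sequences))
    subprocess.run(clustalw_command(infile, outfile, executable), check=True)
    return outfile.read_text()


# Hypothetical gene identifiers and sequences, for illustration only.
example = {"gene1": "ATGGCT", "gene2": "ATGGTT"}
fasta_text = to_fasta(example)
command = clustalw_command("genes.fasta", "genes.aln")
```

Building the command separately from running it keeps the external dependency in one place, so the rest of the code can be exercised without ClustalW installed.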
31HIGHLIGHTING A CLUSTER
32DETAILED VIEW OF CLUSTER 14 IN COLOR REVERSE
33HIGHLIGHTING AN INDIVIDUAL GENE
34RETRIEVING SEQUENCES OF THE HIGHLIGHTED SET OF
GENES
35MULTIPLE ALIGNMENT OF SEQUENCES
36Exploration Feature 1
- A feature for exploring genes of interest is provided.
- The Input button on the main window opens a window that asks for the genes of interest.
- Submitting the query returns the complete details of those genes, and the genes are highlighted in the Detail View window.
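The genes-of-interest query amounts to finding which clusters contain any of the submitted genes. A minimal sketch, assuming clusters are stored as a mapping from cluster number to a list of gene identifiers; the data layout, function name, and gene names are hypothetical.

```python
def clusters_containing(clusters, genes_of_interest):
    """Map each cluster number to the queried genes found in that cluster."""
    wanted = set(genes_of_interest)
    hits = {}
    for cluster_id, genes in clusters.items():
        found = [gene for gene in genes if gene in wanted]
        if found:
            hits[cluster_id] = found
    return hits


# Hypothetical clusters and query, for illustration only.
clusters = {
    1: ["geneA", "geneB", "geneC"],
    14: ["geneD", "geneE"],
    20: ["geneF"],
}
hits = clusters_containing(clusters, ["geneB", "geneE", "geneZ"])
```

Here `geneZ` belongs to no cluster and is silently dropped; a viewer would more likely report unmatched genes back to the user.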
37INPUT BUTTON ON THE MAIN DISPLAY WINDOW
38FEEDING IN THE GENES OF OUR INTEREST
39INFORMATION RETRIEVED ON USING THE SUBMIT BUTTON
40CLUSTER -> GENES INVOLVED TABLE REDIRECTS TO
DETAIL VIEW WINDOW
41REDIRECTED TO DETAIL VIEW WINDOW
42GENES OF INTEREST HIGHLIGHTED
43GENES OF INTEREST HIGHLIGHTED IN YELLOW
44GENES OF INTEREST HIGHLIGHTED
45Exploration Feature 2
- Import an additional new genome into the cluster in the Detail View window.
- Connect that genome to the present cluster and predict clusters by connecting to a web server.
- A Hidden Markov Model is used to predict similar genes from the 4th genome.
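The slides do not describe the states or parameters of the Hidden Markov Model, so the following is only a generic illustration of how an HMM can label genes in a new genome. It assumes two hypothetical states ("background" and "cluster"), made-up probabilities, and a 0/1 observation per gene (1 if the gene's family occurs in the known cluster), and decodes the most likely state path with the Viterbi algorithm.

```python
import math

STATES = ("background", "cluster")
# All probabilities below are invented for illustration.
START = {"background": 0.5, "cluster": 0.5}
TRANS = {
    "background": {"background": 0.9, "cluster": 0.1},
    "cluster": {"background": 0.1, "cluster": 0.9},
}
# Observation 1: the gene's family belongs to the known cluster; 0: it does not.
EMIT = {
    "background": {0: 0.9, 1: 0.1},
    "cluster": {0: 0.1, 1: 0.9},
}


def viterbi(observations):
    """Most likely state path for a sequence of 0/1 observations (log space)."""
    if not observations:
        return []
    score = {
        s: math.log(START[s]) + math.log(EMIT[s][observations[0]]) for s in STATES
    }
    back = []  # back[t][s] = best previous state for state s at step t + 1
    for obs in observations[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = (
                score[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][obs])
            )
            pointers[s] = prev
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Hypothetical family labels for a known cluster and a newly added genome.
cluster_families = {"famA", "famB", "famC"}
new_genome = ["famX", "famY", "famA", "famB", "famC", "famZ", "famW"]
observations = [1 if fam in cluster_families else 0 for fam in new_genome]
path = viterbi(observations)
```

With these toy parameters, the three consecutive genes whose families match the known cluster are labeled "cluster" and the flanking genes "background".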
46SELECTING THE CLUSTER OF YOUR CHOICE
47DETAILED VIEW OF THE HIGHLIGHTED CLUSTER
48ON CLICKING THE BROWSE BUTTON
49ON SELECTING THE NEW GENOME
50DETAILS OF THE GENES OF THE NEW GENOME
51Future Work
- Saving the results of the clusters generated after adding the 4th genome, for faster and more efficient lookup.
- Extending cluster prediction and visualization beyond 3 genomes.
- Analyzing the gene clusters phylogenetically and visualizing the results.
52Acknowledgements
- Special thanks to Justin Choi, Center of Genomics and Bioinformatics, Indiana University
- Prof. Sun Kim, School of Informatics, Indiana University
- Prof. Yves Brun, Department of Biology, Indiana University
- Kwangmin Choi, Youngik Yang, School of Informatics, Indiana University
- Pamela Bonner and the Brun Lab, Department of Biology, Indiana University
- Faculty and staff, School of Informatics, Indiana University