EUROPEAN MULTIMEDIA BIOINFORMATICS EDUCATIONAL RESOURCE (EMBER): a new tutorial on sequence analysis and biocomputing
Viorica Ghita, Valérie Ledent, Robert Herzog, Terri Attwood, Ioannis Selimas, Marc Brugman
Belgian EMBnet Node (BEN), Laboratoire de Bioinformatique, Université Libre de Bruxelles, Campus de la Plaine, Bât. NO, Bd du Triomphe, 1050 Bruxelles, Belgium
UMBER, the University of Manchester Specialist Node of EMBnet, School of Biological Sciences, Oxford Road, Manchester M13 9PL, United Kingdom
University of Amsterdam, Mauritskade 61, 1092 AD Amsterdam, The Netherlands
EMBER is a new tutorial on sequence analysis and biocomputing, developed by several EMBnet teams within an EC framework. The course can be used both by independent learners and as teaching material for academic courses, and is organised into chapters of gradually increasing difficulty. Each chapter has several sections: AIM, INFO (presenting the theoretical aspects of the subjects covered), INSTRUCTIONS (presenting practical online exercises), Quiz and References.
- The tutorial is addressed to a wide variety of researchers (Master's and PhD students, post-docs, junior and senior researchers) in molecular biology and bioinformatics departments, and covers broad areas of analysis, such as:
  - DNA analysis: DNA translation (chapter 1), similarity searches (chapter 2), multiple alignments (chapter 4), restriction mapping (chapter 13), determination of gene structure through intron/exon prediction (chapter 10), and inference of protein-coding sequence through open reading frame (ORF) analysis (chapter 10);
  - Protein analysis: retrieving protein sequences from databases (chapter 1); classifying proteins into families (chapter 3); searching primary and secondary protein databases (chapter 3); finding the best alignment between two or more proteins (chapter 4); computing amino acid composition, molecular weight, isoelectric point and other parameters (chapter 5); computing hydrophobicity/hydrophilicity profiles and locating membrane-spanning segments (chapter 5); predicting elements of secondary structure (chapter 5); visualising protein structures in 3D (chapter 6); predicting a protein's 3D structure from its sequence (chapters 7 and 8); and finding evolutionary relationships between proteins (chapter 12);
  - Genome analysis: analysing genomic sequences, locating genes in a genome, displaying genomes, and parsing a eukaryotic genome sequence with GenScan (chapter 10), etc.
- The tutorial presents a wide variety of tools and websites for many types of analysis:
  - similarity search tools (BLAST, PSI-BLAST);
  - protein family analysis through database searches (PROSITE, eMOTIF, BLOCKS, PRINTS, Pfam);
  - multiple alignment tools (Clustal, DIALIGN, T-COFFEE, CINEMA, Jalview);
  - physicochemical parameter and profile prediction (ProtParam and ProtScale);
  - transmembrane helix prediction (MEMSAT, TMpred);
  - secondary structure prediction (Jpred, NNPREDICT);
  - 3D structure comparison and visualisation (RasMol, QuickPDB, Cn3D);
  - homology modelling (SWISS-MODEL, Geno3D);
  - fold recognition (GenTHREADER, 3D-PSSM);
  - phylogenetic analysis (PHYLIP);
  - sequence retrieval (SRS), etc.
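DNA translation (chapter 1) and ORF analysis (chapter 10) both rest on reading a sequence codon by codon. The sketch below is not code from the tutorial; it translates DNA with the standard genetic code and reports ATG-initiated ORFs in all six reading frames. The function names, the ATG-only start rule and the `min_len` threshold are assumptions chosen for illustration; real ORF finders also handle alternative start codons and genetic codes.

```python
from itertools import product

# Standard genetic code (NCBI table 1): codons enumerated in TCAG order,
# so the 64-letter string lines up with TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate codon by codon; codons with non-ACGT characters become 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def reverse_complement(dna: str) -> str:
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(dna: str, min_len: int = 30) -> list[tuple]:
    """Return (strand, frame, start, end, protein) for each ATG...stop ORF.

    Coordinates are 0-based on the strand scanned; `end` includes the stop
    codon. min_len (hypothetical default) is measured in nucleotides.
    ATGs nested inside an open ORF are ignored, and ORFs that run off
    the end of the sequence without a stop codon are not reported.
    """
    orfs = []
    for strand, seq in (("+", dna.upper()), ("-", reverse_complement(dna))):
        for frame in range(3):
            start = None  # position of the currently open ATG, if any
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if start is None:
                    if codon == "ATG":
                        start = i
                elif CODON_TABLE.get(codon) == "*":
                    if i + 3 - start >= min_len:
                        orfs.append((strand, frame, start, i + 3,
                                     translate(seq[start:i])))
                    start = None
    return orfs
```

For example, `find_orfs("ATGAAATTTTAA", min_len=9)` returns `[('+', 0, 0, 12, 'MKF')]`: one ORF in forward frame 0, running from the start codon through the TAA stop.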
Figure 1. EMBER presentation page, here showing chapter 3 of the tutorial, which gives a detailed presentation of the most important secondary databases: PROSITE, eMOTIF, PRINTS, BLOCKS, Pfam and InterPro. The information is supported by numerous web links, illustrative animations and practical exercises.
Figure 2. Chapter 4 of the tutorial presents the most important tools for multiple sequence alignment, with detailed information on manual and automatic multiple alignment tools, exercises, and links to various software packages and alignment databases.
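Progressive aligners such as ClustalW begin by aligning every pair of sequences before building a guide tree and the final multiple alignment. As a minimal sketch of that pairwise step, the code below implements Needleman-Wunsch global alignment with a linear gap penalty. The match, mismatch and gap scores are hypothetical defaults; protein aligners normally use a substitution matrix such as BLOSUM62 together with affine gap penalties.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -1) -> tuple[int, str, str]:
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a

    # Trace back from the bottom-right cell, re-deriving each choice
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

With these scores, `needleman_wunsch("ACGT", "AGT")` returns `(2, 'ACGT', 'A-GT')`: three matches and one gap.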
Figure 3. Chapter 5 of the tutorial presents tools for computing physicochemical parameters (molecular weight, theoretical pI, amino acid composition, atomic composition, extinction coefficient, hydropathy, chain flexibility, solvent-accessible surface area, etc.), software tools for predicting the transmembrane topology of proteins, and several secondary structure prediction programs.
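Hydropathy profiles of the kind ProtScale plots are sliding-window averages of a per-residue scale. A minimal sketch, assuming the Kyte-Doolittle scale, a 19-residue window and the often-quoted rule of thumb that a window mean above 1.6 marks a candidate membrane-spanning segment; the function names are hypothetical, and dedicated predictors such as TMpred or MEMSAT use considerably more information than this.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every full window; entry k covers seq[k:k + window].

    Assumes seq contains only the 20 standard one-letter codes
    (any other character raises KeyError).
    """
    values = [KD[aa] for aa in seq.upper()]
    return [sum(values[k:k + window]) / window
            for k in range(len(values) - window + 1)]


def candidate_tm_windows(seq: str, window: int = 19,
                         threshold: float = 1.6) -> list[int]:
    """Start positions of windows whose mean hydropathy exceeds threshold."""
    return [k for k, h in enumerate(hydropathy_profile(seq, window))
            if h > threshold]
```

For a toy sequence of 20 leucines followed by 20 lysines, `candidate_tm_windows` flags windows 0 to 6, i.e. only the windows dominated by the hydrophobic leucine stretch.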
Figure 4. Chapter 6 (Fold classification) gives a detailed presentation of the Protein Data Bank, the principal repository of biological macromolecular structures, and of several structure classification resources (CATH, SCOP, EC->PDB), as well as the visualisation and comparison of protein 3D structures with various molecular structure viewers: RasMol, QuickPDB, Deep View and Cn3D.
Figure 7. The Human Genome case study chapter proposes a complex analysis that applies advanced bioinformatics tools to a concrete research problem. Using a genomic fragment of human chromosome 6, students are invited to find potential genes with the GeneMark and GENSCAN programs. They can then compare the results and assess their reliability using GeneQuiz, an integrated system for large-scale biological sequence analysis, and the current human genome annotation in Ensembl.
Figure 6. In the Sickle cell haemoglobin case study chapter, users compare the sickle cell and normal β-globin sequences to reveal the nature of the sickle cell mutation. The exercise integrates several database searches and multiple tools (SRS, ClustalW, Restriction Map), as well as an advanced RasMol session driven by script files, to visualise the mutant haemoglobin and the interactions between mutant β chains and the amino acid side chains in the vicinity of the mutated Val6 residue. In this representation, the two central mutant β chains are shown as white and orange wireframes. The side chains of the central Val6 mutations and the porphyrin prosthetic groups are shown as space-filling (CPK) models, with the porphyrin groups in blue and the mutant Val6 residues in red. The side chains in the vicinity of Val6 at the interface between the two haemoglobin molecules are highlighted in yellow.
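The core of the sickle cell exercise, comparing the two β-globin sequences and checking how the mutation changes a restriction map, can be sketched in a few lines. The sequence below is only the first nine codons of the normal β-globin coding sequence, written out for illustration (in the tutorial the sequences are retrieved with SRS), and the sickle allele is derived from it by the well-known GAG to GTG substitution at codon 6. The recognition sites CTNAG (DdeI) and CCTNAGG (MstII) are the classic diagnostic sites destroyed by this mutation; the helper names are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Illustrative fragment: initiator ATG plus codons 1-8 of normal beta-globin.
# Retrieve the full database entry for real work.
NORMAL = "ATGGTGCATCTGACTCCTGAGGAGAAG"
# Sickle allele: A -> T at the middle base of codon 6 (GAG -> GTG)
SICKLE = NORMAL[:19] + "T" + NORMAL[20:]


def translate(dna: str) -> str:
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def codon_differences(ref: str, alt: str) -> list[tuple]:
    """List (codon number, ref codon, alt codon, ref aa, alt aa) per change.

    Numbering follows the haemoglobin convention: the initiator ATG is
    codon 0, so the first valine is residue 1.
    """
    diffs = []
    for number, i in enumerate(range(0, len(ref) - 2, 3)):
        a, b = ref[i:i + 3], alt[i:i + 3]
        if a != b:
            diffs.append((number, a, b, CODON_TABLE[a], CODON_TABLE[b]))
    return diffs


def find_sites(dna: str, site: str) -> list[int]:
    """0-based start positions of a recognition site; 'N' matches any base."""
    k = len(site)
    return [i for i in range(len(dna) - k + 1)
            if all(s == "N" or s == d for s, d in zip(site, dna[i:i + k]))]


print(translate(NORMAL), translate(SICKLE))   # MVHLTPEEK MVHLTPVEK
print(codon_differences(NORMAL, SICKLE))      # [(6, 'GAG', 'GTG', 'E', 'V')]
for name, site in (("DdeI", "CTNAG"), ("MstII", "CCTNAGG")):
    print(name, find_sites(NORMAL, site), find_sites(SICKLE, site))
# DdeI [16] []
# MstII [15] []
```

The single Glu6Val change is exactly what the RasMol session visualises, and the loss of both restriction sites is why restriction mapping can distinguish the two alleles.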
Figure 5. Different protein structure viewers presented in the tutorial, displaying the ubiquitin-like signalling protein Nedd8 (PDB ID 1NDD): (A) Deep View, (B) RasMol, (C) QuickPDB and (D) Cn3D. (A) shows the classical ball-and-stick mode, (B) cartoon mode, (C) a wireframe α-carbon trace with a small section of the structure highlighted in blue, and (D) a hybrid display with amino acid chains in cartoon mode and non-amino-acid atoms in space-filling mode.
EMBER EMBnet teams: University of Manchester (United Kingdom), Swiss Institute of Bioinformatics (Switzerland), University of Nijmegen (The Netherlands), University of the Western Cape (South Africa), European Bioinformatics Institute (United Kingdom), Instituto Gulbenkian de Ciência (Portugal), Université Libre de Bruxelles (ULB, Belgium), Institute for Marine Biosciences (Canada), Research Institute for Genetic Engineering and Biotechnology (Turkey) and Expert Center for Taxonomic Identification (The Netherlands). The project coordinator is Professor Terri Attwood of the University of Manchester; the principal authors include Ioannis Selimas of the Manchester group and Marc Brugman of the Expert Center for Taxonomic Identification.