Title: Function preserves sequences
1. Function preserves sequences
Christophe Roos - MediCel ltd, christophe.roos_at_medicel.fi
Mutations change sequences
Molecular evolution
Part 3: sequence databases and comparisons
Similarity is the result of conservation or convergent evolution; it has a reason for being.
2. The public biological databases
- EMBL, GenBank or DDBJ for DNA
- emblnew for daily updates, merged into the main DB 4x/year
- SwissProt or PIR for proteins
- Trembl, tremblnew, remtrembl
- PDB for structures
- In flat file format, yet quite informative and convertible
- Fasta format is a universal sequence format: the first line starts with > followed by free text. The second line has the start of the sequence (50 or 60 characters per line). Use the first line for the name or the Accession Number (AC).
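The Fasta layout described above can be read with a few lines of code. The following is a minimal sketch, not a reference parser: the function name parse_fasta and the toy record are invented for illustration, and it assumes each record's header line starts with > and that every following line up to the next header belongs to that record's sequence.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return {header text: sequence} from Fasta-formatted lines."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # Header line: everything after ">" is free text (name or AC).
            header = line[1:].strip()
            records[header] = ""
        elif header is not None:
            # Sequence line: append to the current record.
            records[header] += line
    return records


# Toy record, invented for illustration only.
example_lines = [
    ">toy_seq1 hypothetical example record",
    "GATCGGAATAG",
    "GACGGATTAG",
]

records = parse_fasta(example_lines)
for name, sequence in records.items():
    print(name, len(sequence))
```

Keeping the whole header as the key leaves it to the caller to decide whether the first word is a name or an Accession Number.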
3. Database homes
- The European database home is the European Bioinformatics Institute (EBI) in Hinxton, Cambridge, UK: http://www.ebi.ac.uk
- Access through the Sequence Retrieval System, SRS
- The American database home is the National Center for Biotechnology Information (NCBI) in Bethesda, Maryland, near Washington DC: http://www.ncbi.nlm.nih.gov
- Access through Entrez
- Both centers exchange their data on a daily basis; however, there are differences in annotations, consistency, speed and quality.
- There is also a Japanese database provider, DDBJ.
4. A look at one entry from EMBL (part 1/3)
5. A look at one entry from EMBL (part 2/3)
6. A look at one entry from EMBL (part 3/3)
The feature table of the entry contains several linked items, such as exon assembly (mRNA) and coding sequence (CDS). There are also cross-references to other databases.
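To make the structure of such an entry concrete, here is a minimal sketch that groups the lines of an EMBL-style flat file by their two-letter line code (for example ID, AC, DE, FT for the feature table, DR for cross-references, SQ for the sequence header), with // closing the entry. The function name group_by_line_code and the toy entry, including its identifiers and the database name TOYDB, are invented for illustration and do not come from a real record.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_line_code(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group the content of flat-file lines by their two-letter line code."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("//"):
            break  # end of the entry
        code = line[:2]
        if code == "XX" or not code.strip():
            continue  # spacer lines, blank lines and raw sequence lines
        # Content starts after the code and its padding (column 6 onward).
        groups[code].append(line[5:].strip())
    return dict(groups)


# Toy entry, invented for illustration only (not a real EMBL record).
toy_entry = [
    "ID   TOY0001; hypothetical entry",
    "XX",
    "AC   TOY0001;",
    "DE   Made-up description line",
    "FT   mRNA            join(1..10,21..30)",
    "FT   CDS             3..28",
    "DR   TOYDB; TOY0001.",
    "SQ   Sequence 30 BP;",
    "     gatcggaata ggatcggaat aggatcggaa        30",
    "//",
]

groups = group_by_line_code(toy_entry)
feature_keys = [item.split()[0] for item in groups["FT"]]
print(feature_keys)
```

In this sketch the feature table (FT) and the cross-references (DR) come back as separate lists, which mirrors how the slide treats them as distinct parts of the entry.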
7. A look at one entry from SwissProt
The eyeless gene: a master regulatory gene in eye formation
8. The effect of the eyeless gene
- The eyeless gene is a master regulatory gene in eye formation
- When it is absent, no eyes are formed
- When it is present where it should not be, it induces eye formation
[Figure panels: normal; overexpressed in antennae and wings; absent]
9. A look at one entry from SwissProt
Part 2: the annotations about the function and location
10. A look at one entry from SwissProt
Part 3: the feature table and the amino acid sequence
11. A look at one entry from SwissProt
The eyeless gene is also called PAX6 and can be found in several groups of species: birds, mammals, reptiles, fish, invertebrates
12. Sequence comparison
- Why?
- Function by analogy: if sequences are conserved, their function is probably also conserved.
- Functional domains: if some parts of the sequences are more conserved than other parts, there must be an underlying biological reason for it.
- Establishing relationships/differences in function: by quantifying sequence relationships it is possible to estimate the function of novel genes.
- Establishing relationships between species
13. Sequence comparison: how?
- Compare two sequences of similar length
- Compare two sequences of very different length
- Compare several sequences
- Allow gaps or not?
- Scoring: yes/no or good/intermediate/bad?
- The best or all above a threshold?
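Several of these choices (gaps allowed or not, similar or very different lengths) can be illustrated with a small dynamic-programming sketch. The code below computes only the best alignment score, either global (end to end, in the style of Needleman-Wunsch) or local (best matching region, in the style of Smith-Waterman). The function name alignment_score and the scoring values (match +1, mismatch -1, a flat -2 per gap position) are assumptions chosen for illustration, not values given in these slides.

```python
def alignment_score(a: str, b: str, match: int = 1, mismatch: int = -1,
                    gap: int = -2, local: bool = False) -> int:
    """Best alignment score of a and b with a flat per-position gap penalty.

    local=False: global score, both sequences aligned end to end.
    local=True:  local score, best-scoring pair of subsequences.
    """
    # Row 0 of the score table: b aligned against an empty prefix of a.
    prev = [0 if local else j * gap for j in range(len(b) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0 if local else i * gap]
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diagonal,            # align a[i-1] with b[j-1]
                        prev[j] + gap,       # gap in b
                        curr[j - 1] + gap)   # gap in a
            if local:
                score = max(score, 0)        # a local alignment may restart
            curr.append(score)
        best = max(best, max(curr))
        prev = curr
    return best if local else prev[-1]


# A short motif against a much longer sequence.
print(alignment_score("CGGA", "GATCGGAATAG"))              # global: -10
print(alignment_score("CGGA", "GATCGGAATAG", local=True))  # local: 4
```

With sequences of very different length, the global score is dominated by gap penalties, while the local score reports only the region that matches well; this is one reason the choice between the two matters.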
14. Sequence comparison metrics
GA-CGGATTAG
GATCGGAATAG
[Figure labels on the alignment: gap, match, mismatch]
- The scoring matrix
- The score for a match
- The penalty for a mismatch
- The penalty for the insertion of a gap (gap-open)
- The penalty for elongating a gap (gap-length)
- Local or global similarities?
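As a worked example of these metrics, the sketch below scores an alignment that has already been written out with - for gap positions. The function name score_alignment and the parameter values (match +1, mismatch -1, gap-open -3, gap-length -1 per additional position) are assumptions for illustration; real scoring matrices assign a separate value to each pair of residues.

```python
def score_alignment(top: str, bottom: str, match: int = 1, mismatch: int = -1,
                    gap_open: int = -3, gap_extend: int = -1) -> int:
    """Score a gapped alignment column by column.

    The first position of a gap costs gap_open; each further position
    of the same gap costs gap_extend.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    previous_gap = None  # which row the previous column's gap was in, if any
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            row = "top" if x == "-" else "bottom"
            score += gap_extend if previous_gap == row else gap_open
            previous_gap = row
        else:
            score += match if x == y else mismatch
            previous_gap = None
    return score


# The alignment from the slide: 9 matches, 1 mismatch, 1 gap of length 1.
print(score_alignment("GA-CGGATTAG", "GATCGGAATAG"))  # 9 - 1 - 3 = 5
```

Separating gap-open from gap-length makes one long gap cheaper than several short ones, which fits the idea that a single insertion or deletion event can span several positions.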