PTGL: Protein Topology Graph Library
Patrick May¹, Ina Koch²
¹ ZIB Berlin, BCB Junior Research Group, Takustr. 7, 14195 Berlin, Germany
² Technical University of Applied Sciences (TFH) Berlin, Dept. Bioinformatics, Seestr. 64, 13347 Berlin, Germany
¹ patrick.may_at_zib.de, ² ina.koch_at_tfh-berlin.de
Konrad-Zuse-Zentrum für Informationstechnik Berlin
Introduction
The experimental exploration of protein structures
is a basic topic of post-genomic research.
Theoretical analysis based on a mathematically
unique description of protein structures, which
makes it possible to search for similarities
between proteins at different abstraction levels,
has therefore become increasingly important. The
simplest representations of protein topology are
schematic diagrams of protein folds that illustrate
the secondary structure elements (SSEs) and their
spatial neighbourhoods. Protein topologies are used
in the protein classification databases CATH (1)
and SCOP (2). The only database that generates
topology diagrams automatically and allows
searching for patterns of secondary structures is
TOPS (3).
PTGL (Protein Topology Graph Library) aims to
provide an efficient online database for searching
protein topologies and for offering different
topology representations. We present a
mathematically unique graph-theoretical description
of protein topology using four linear notations
with a new type of schematic representation.
Methods (4)
- protein graph G = (V, E)
- - undirected labelled graph for each chain of a PDB (5) entry
- - vertices V: SSEs with labels h (helix) and e (strand)
- - edges E: spatial neighbourhoods between SSEs with labels p (parallel), a (antiparallel), and m (mixed)
- - G is defined as Alpha-Beta (Fig.2), Alpha, and Beta graph (Fig.2)
- folding graph
- - connected components of a protein graph
- - bifurcated structure: vertices v with vertex degree deg(v) > 2
- - non-bifurcated structure: Hamiltonian path
- - barrel structure: non-bifurcated closed Hamiltonian path
- linear notations
- - the adjacent (ADJ), the reduced (RED), the KEY, and the sequence (SEQ) notation (Fig.1)
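
The graph definitions above can be illustrated with a minimal Python sketch. It is not the PTGL implementation (which the poster states is written in C); the class name `ProteinGraph` and its methods are hypothetical, and the classification rule assumes a simple graph, where a connected component with no vertex of degree above 2 is either a path or a closed cycle.

```python
from collections import defaultdict

SSE_LABELS = {"h", "e"}        # helix, strand
EDGE_LABELS = {"p", "a", "m"}  # parallel, antiparallel, mixed


class ProteinGraph:
    """Hypothetical sketch: undirected labelled graph of one protein chain."""

    def __init__(self, sse_labels):
        # sse_labels holds one label per SSE in sequence order, e.g. "eeeeh".
        for label in sse_labels:
            if label not in SSE_LABELS:
                raise ValueError(f"unknown SSE label: {label!r}")
        self.labels = list(sse_labels)
        self.adj = defaultdict(dict)  # vertex -> {neighbour: edge label}

    def add_edge(self, u, v, label):
        """Record a spatial neighbourhood between SSEs u and v."""
        if label not in EDGE_LABELS:
            raise ValueError(f"unknown edge label: {label!r}")
        self.adj[u][v] = label
        self.adj[v][u] = label

    def folding_graphs(self):
        """Return the connected components as sorted lists of vertex indices."""
        seen, components = set(), []
        for start in range(len(self.labels)):
            if start in seen:
                continue
            seen.add(start)
            stack, component = [start], []
            while stack:
                u = stack.pop()
                component.append(u)
                for w in self.adj[u]:
                    if w not in seen:
                        seen.add(w)
                        stack.append(w)
            components.append(sorted(component))
        return components

    def classify(self, component):
        """Classify one folding graph by its vertex degrees."""
        degrees = [len(self.adj[v]) for v in component]
        if any(d > 2 for d in degrees):
            return "bifurcated"
        if len(component) >= 3 and all(d == 2 for d in degrees):
            return "barrel"  # closed Hamiltonian path
        return "non-bifurcated"  # open Hamiltonian path


if __name__ == "__main__":
    # Made-up example: four strands closed into a ring plus one isolated helix.
    g = ProteinGraph("eeeeh")
    for u, v in [(0, 1), (1, 2), (2, 3), (3, 0)]:
        g.add_edge(u, v, "a")
    for fg in g.folding_graphs():
        print(fg, g.classify(fg))
```

In this made-up example the four strands form one folding graph classified as a barrel, and the isolated helix forms a second, trivially non-bifurcated folding graph.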
Conclusion
- mathematically unique description of protein topologies on secondary structure level by methods of applied graph theory
- a web-based database for protein topologies
- applications: protein structure comparison and prediction (threading)
Implementation
- input: PDB (5), DSSP (6)
- graph definition: C
- database: PostgreSQL (8 tables, 2 stored procedures, 21 indices)
- Perl-DBI/DBD, C-libpq
- web application: Perl, HTML, CGI, BLAST (7) (Fig.4)
- automatic generation of topology diagrams: C, PostScript
- 4 query browsers
- - SearchKey: simple keyword query form (boolean operators)
- - SearchFields: more customizable query form for searching in selected fields and tables, e.g. search for barrel structures
- - SearchTopos: query form for two or more selected proteins
- - SearchSequence: a BLAST (7) sequence search against the sequences stored in the PTGL database
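
As an illustration of the kind of field-based search that SearchFields describes (for example, a search for barrel structures), the following Python sketch uses the standard `sqlite3` module as a stand-in for PostgreSQL. The poster does not give the PTGL schema, so everything here is an assumption: the tables `chain` and `folding_graph`, their columns, the helper `find_barrels`, and the sample rows with placeholder identifiers are all hypothetical.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE chain (
            chain_id   INTEGER PRIMARY KEY,
            pdb_id     TEXT NOT NULL,
            chain_name TEXT NOT NULL
        );
        CREATE TABLE folding_graph (
            fg_id      INTEGER PRIMARY KEY,
            chain_id   INTEGER NOT NULL REFERENCES chain(chain_id),
            graph_type TEXT NOT NULL,     -- 'alpha', 'beta' or 'albe'
            is_barrel  INTEGER NOT NULL   -- 1 if closed Hamiltonian path
        );
        CREATE INDEX idx_fg_type_barrel
            ON folding_graph (graph_type, is_barrel);
        """
    )
    # Placeholder rows; the identifiers are made up, not real PDB entries.
    conn.executemany(
        "INSERT INTO chain VALUES (?, ?, ?)",
        [(1, "0aaa", "A"), (2, "0bbb", "A")],
    )
    conn.executemany(
        "INSERT INTO folding_graph VALUES (?, ?, ?, ?)",
        [(1, 1, "beta", 1), (2, 2, "beta", 0), (3, 2, "alpha", 0)],
    )
    conn.commit()
    return conn


def find_barrels(conn, graph_type):
    """Return (pdb_id, chain_name) pairs whose graph of this type has a barrel."""
    rows = conn.execute(
        """
        SELECT DISTINCT c.pdb_id, c.chain_name
        FROM chain AS c
        JOIN folding_graph AS fg ON fg.chain_id = c.chain_id
        WHERE fg.graph_type = ? AND fg.is_barrel = 1
        ORDER BY c.pdb_id, c.chain_name
        """,
        (graph_type,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print(find_barrels(conn, "beta"))
```

The query is parameterized rather than assembled by string concatenation, which is the usual choice when a web form supplies the search values.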
References
(1) C.A. Orengo et al., Structure 28:1093 (1997)
(2) A. Murzin et al., J. Mol. Biol. 247:536 (1995)
(3) D. Gilbert et al., Computers & Chemistry 26:20 (2001)
(4) I. Koch, ISBN 3-89685-446-1 (1997)
(5) F.C. Bernstein et al., J. Mol. Biol. 112:535 (1977)
(6) W. Kabsch, C. Sander, J. Mol. Biol. 114:181 (1977)
(7) S.F. Altschul et al., J. Mol. Biol. 215(3):403 (1990)
This work was done within the context of the