Title: Benchmarking domain assignment in proteins
1. Benchmarking domain assignment in proteins
Stella Veretnik, Ilya Shindyalov, Nickolai Alexandrov and Phil Bourne
March 5, 2003
SPAM meeting
2. Background
Assignment of domains remains a difficult and unresolved problem.
Several manually curated databases of domains exist (SCOP, CATH, authors' assignments). These are referred to as expert assignments.
There are many more automatic methods of domain assignment.
There is only partial agreement (up to 80%) between any automatic method and an expert assignment.
There is only partial agreement (80%) between different expert assignment methods.
It is therefore difficult to benchmark algorithms for domain assignment, since the result depends on which expert assignment is used as the reference.
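To make the notion of partial agreement concrete, here is a minimal sketch that computes the fraction of chains on which two assignment sources report the same number of domains. Comparing by domain count alone is an assumption made for illustration (the slides do not state the agreement criterion), and the function name, chain identifiers and counts are hypothetical toy values, not data from this study.

```python
def agreement_fraction(a, b):
    """Fraction of shared chains for which two sources give the same domain count.

    `a` and `b` map a chain identifier to a number of domains.
    Agreement by count only is an illustrative assumption.
    """
    shared = sorted(set(a) & set(b))
    if not shared:
        raise ValueError("the two sources have no chains in common")
    same = sum(1 for chain in shared if a[chain] == b[chain])
    return same / len(shared)


# Hypothetical toy data (not from the benchmark dataset).
expert = {"c1": 1, "c2": 2, "c3": 2, "c4": 3, "c5": 1}
automatic = {"c1": 1, "c2": 2, "c3": 1, "c4": 3, "c5": 1}

print(agreement_fraction(expert, automatic))
```

Swapping in a different expert source as the first argument changes the score for the same automatic method, which is the benchmarking difficulty described above.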
3. Performance of domain assignment methods using authors' assignments as reference
Figure 1. Evaluation of domain assignment by different methods. Authors' assignments [6] are used as the reference. Evaluation is performed on the entire dataset.
4. Improvement of assignments by automatic methods using the manual consensus dataset
Figure 2. Fraction of misassigned chains in the entire dataset vs. the manual consensus dataset (manual consensus requires agreement among the authors' assignment, CATH and SCOP). Excluding the 20% of chains without manual consensus reduces misassignment of chains by the DALI and PDP methods by 62% and by the DomainParser method by 73%. The effect is biased toward misassignments of the undercut type.
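The comparison in Figure 2 can be sketched roughly as follows: label each chain as correct, undercut or overcut relative to a reference, then compute the misassigned fraction on the whole dataset and again on the subset of chains with manual consensus. This sketch assumes that assignments are compared by domain count only, and that "undercut" means fewer domains than the reference and "overcut" means more. The function names and the toy data are hypothetical.

```python
from collections import Counter


def classify(predicted, reference):
    """Label a prediction relative to the reference domain count (assumed definition)."""
    if predicted < reference:
        return "undercut"
    if predicted > reference:
        return "overcut"
    return "correct"


def misassigned_fraction(predictions, reference, chains):
    """Return (fraction of misassigned chains, Counter of labels) over `chains`."""
    chains = list(chains)
    if not chains:
        raise ValueError("no chains to evaluate")
    labels = Counter(classify(predictions[c], reference[c]) for c in chains)
    wrong = labels["undercut"] + labels["overcut"]
    return wrong / len(chains), labels


# Hypothetical toy data (not from the benchmark dataset).
reference = {"c1": 1, "c2": 2, "c3": 3, "c4": 2, "c5": 1}
predictions = {"c1": 1, "c2": 1, "c3": 3, "c4": 3, "c5": 1}
consensus_chains = {"c1", "c3", "c4", "c5"}  # chains assumed to have manual consensus

all_frac, all_labels = misassigned_fraction(predictions, reference, reference)
cons_frac, cons_labels = misassigned_fraction(predictions, reference, consensus_chains)
print(all_frac, dict(all_labels))
print(cons_frac, dict(cons_labels))
```

In this toy case the only undercut chain lies outside the consensus subset, so restricting the evaluation removes it, which mirrors the direction of the bias reported in the caption.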
5. Manual vs. automatic consensus: do they overlap?
Chains with manual consensus: 375 (80% of entire dataset)
Chains with automatic consensus: 374 (80% of entire dataset)
Chains with consensus (automatic or manual): 424 (90.6% of entire dataset)
Automatic consensus only: 46 chains (10.9% of chains with consensus)
Manual consensus only: 47 chains (11.1% of chains with consensus)
Automatic consensus and manual consensus disagree: 3 chains (0.7% of chains with consensus)
Figure 3. Distribution of chains with manual and automatic consensus. An equal fraction (80%) of the 468-chain entire dataset reaches manual or automatic consensus. Among the chains with consensus, 78% (331 chains) have both automatic and manual consensus, and the two are identical in the vast majority (99.1%) of cases. The remaining 22% of chains, which have only one type of consensus, are evenly split between those with manual and those with automatic consensus.
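The breakdown on this slide can be re-derived with a few lines of arithmetic. The sketch below uses only the counts quoted above (468, 424, 46, 47 and 3); the variable names and the `pct` helper are illustrative. It does not use the per-type totals of 375 and 374, since the slide does not say how they relate to the breakdown.

```python
total_chains = 468          # entire dataset
with_any_consensus = 424    # automatic or manual consensus
automatic_only = 46
manual_only = 47
disagree = 3                # both consensuses exist but differ

# Chains that have both kinds of consensus, and those where the two agree.
both = with_any_consensus - automatic_only - manual_only
agree = both - disagree


def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


print("both consensuses:", both)
print("identical consensuses:", agree)
print("any consensus, % of dataset:", pct(with_any_consensus, total_chains))
print("manual only, % of consensus chains:", pct(manual_only, with_any_consensus))
print("identical, % of chains with both:", pct(agree, both))
```

The derived values of 331 chains with both consensuses and 328 chains on which they agree match the figures quoted in the Figure 3 and Figure 7 captions.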
6. Misassignments among manual assignment methods measured against an automatic consensus
Figure 4. Manual (expert) assignment methods viewed from the standpoint of the automatic consensus. Chains for which the assignment methods reach an automatic but no manual consensus (46 chains in total) exhibit interesting tendencies: the authors' assignments and CATH habitually predict more domains than the automatic consensus, while SCOP predicts fewer domains.
7. Automatic assignments disagree; manual methods reach consensus (2 domains)
A. 1png, DomainParser: 1 domain
B. 1png, PDP: 2 domains (manual consensus: 2 domains)
C. 1png, DALI: 3 domains
Figure 5. Automatic methods produce three different assignments for 1png (peptide-N(4)-(N-acetyl-beta-D-glucosaminyl) asparagine amidase), while manual methods reach consensus. A. DomainParser: 1 domain. B. PDP: 2 domains. C. DALI: 3 domains. Manual consensus: 2 domains.
8. Manual methods disagree; automatic methods reach consensus (2 domains)
Figure 6. Chain 1tnrr (human 55 kDa tumor necrosis factor (TNF) receptor) has three different assignments by experts, whereas automatic methods reach a consensus. A. Authors' assignment: 4 domains predicted. B. SCOP assignment: 3 domains predicted. C. CATH and automatic consensus assignment: 2 domains predicted.
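The two cases in Figures 5 and 6 can be expressed with a small consensus check. This is a minimal sketch that assumes consensus means all methods in a group report the same number of domains; the actual criterion may also take domain boundaries into account, which the slides do not specify. The domain counts are those quoted in the two captions, with the per-method values behind each stated consensus filled in from that consensus. The function name is hypothetical.

```python
def consensus(counts):
    """Return the shared domain count if every method agrees, otherwise None.

    `counts` maps a method name to its predicted number of domains.
    Agreement by domain count only is an illustrative assumption.
    """
    values = set(counts.values())
    return values.pop() if len(values) == 1 else None


# 1png (Figure 5): automatic methods disagree, manual consensus is 2 domains.
png_automatic = {"DomainParser": 1, "PDP": 2, "DALI": 3}
png_manual = {"Authors": 2, "CATH": 2, "SCOP": 2}  # implied by the stated manual consensus

# 1tnrr (Figure 6): manual methods disagree, automatic consensus is 2 domains.
tnrr_manual = {"Authors": 4, "SCOP": 3, "CATH": 2}
tnrr_automatic = {"DomainParser": 2, "PDP": 2, "DALI": 2}  # implied by the stated automatic consensus

print("1png automatic:", consensus(png_automatic))
print("1png manual:", consensus(png_manual))
print("1tnrr manual:", consensus(tnrr_manual))
print("1tnrr automatic:", consensus(tnrr_automatic))
```

Returning `None` rather than a majority vote keeps chains without full agreement out of the consensus set, in line with the requirement that manual consensus needs agreement among the authors' assignment, CATH and SCOP.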
9. Consensus among all methods (automatic and manual)
Figure 7. Some of the 328 cases where manual and automatic consensus agree. A. 8adh: 2 domains. B. 3cd4: 2 domains. C. 1btc: 1 domain. D. 1dhk: 1 domain. E. 1fcdc: 2 domains.