OEChem in PubChem: Cleaning Up PDB Small Molecules

1
OEChem in PubChem: Cleaning Up PDB Small Molecules
  • Paul Thiessen
  • NCBI

2
Outline
  • Introduction
  • Legacy data
  • Small molecules from PDB
  • OEChem perception from 3D
  • Successes
  • Successes, with some tweaking
  • Outright failures
  • Molecule classification
  • Bound vs. free ligands
  • Blacklist
  • Other projects

3
Crosslinks
  • Specialized legacy data sets
  • Open BioCyc
  • KEGG
  • PDB (MMDB)
  • <5% of structures in PubChem, but 100% of
  • Direct links from chemicals to
  • Literature (PubMed, except MeSH)
  • Sequence (GenBank/GenPept)
  • Protein structure (MMDB)
  • Enzyme activity (EC)

4
Sample
  • ATP from KEGG

5
PDB Molecules
  • What is a small molecule in PDB?
  • Any non-biopolymer
  • A protein of < 10 residues
  • A nucleotide of < 5 residues
  • Don't distinguish bound vs. co-crystallized
  • Whatever PDB puts in a HET residue
  • Hydrolyze covalent bonds (no R-groups)
  • Ignore uninteresting molecules
  • Solvents
  • Monoatomics
  • Blacklisted
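
The selection rules on this slide can be sketched as a small filter. This is only an illustration, not PubChem's actual code: the `Component` record, its field names, and the example solvent and blacklist entries are hypothetical, since the slides do not give the real lists.

```python
from dataclasses import dataclass

# Hypothetical example sets; the real solvent list and blacklist are not in the slides.
SOLVENTS = {"HOH"}
BLACKLIST = {"SO4", "GOL"}


@dataclass
class Component:
    name: str           # residue or HET name
    kind: str           # "protein", "nucleotide", or "het"
    n_residues: int     # chain length for biopolymers, 1 for HET groups
    n_heavy_atoms: int  # number of non-hydrogen atoms


def is_small_molecule(c: Component) -> bool:
    """Size rules from the slide: < 10 residues (protein), < 5 (nucleotide)."""
    if c.kind == "protein":
        return c.n_residues < 10
    if c.kind == "nucleotide":
        return c.n_residues < 5
    return True  # whatever PDB puts in a HET residue


def is_interesting(c: Component) -> bool:
    """Drop solvents, blacklisted molecules, and monoatomics."""
    if c.name in SOLVENTS or c.name in BLACKLIST:
        return False
    return c.n_heavy_atoms > 1


def keep(c: Component) -> bool:
    return is_small_molecule(c) and is_interesting(c)
```

For example, `keep(Component("ATP", "het", 1, 31))` is true, while a lone zinc ion or a sulfate is dropped.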

6
The Problem
  • PDB lacks chemical detail
  • No bond orders
  • No charges
  • Mostly no hydrogens

7
Filling in Detail
  • So how do we go from this ... to this?

8
OEChem
  • The answer: OEChem
  • OEChem 1.3 by OpenEye
  • Chemical information
  • Manipulation of chemical structure
  • Perception from 3D coordinates
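
The slides do not describe how OEChem perceives structure from coordinates. As a rough illustration of the general idea, a common first step is to infer connectivity from interatomic distances and covalent radii. The sketch below assumes approximate radii and an arbitrary 0.4 Å tolerance; the function name is hypothetical and this is not OEChem's algorithm.

```python
import math
from itertools import combinations

# Approximate single-bond covalent radii in angstroms (illustrative values).
COVALENT_RADIUS = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66, "P": 1.07, "S": 1.05}


def perceive_bonds(atoms, tolerance=0.4):
    """Return index pairs closer than the sum of covalent radii plus a tolerance.

    atoms: list of (element, (x, y, z)) tuples. The tolerance is an assumed value.
    """
    bonds = []
    for (i, (elem_i, pos_i)), (j, (elem_j, pos_j)) in combinations(enumerate(atoms), 2):
        cutoff = COVALENT_RADIUS[elem_i] + COVALENT_RADIUS[elem_j] + tolerance
        if math.dist(pos_i, pos_j) <= cutoff:
            bonds.append((i, j))
    return bonds


# Carbon dioxide laid out along the x axis: O, C, O
co2 = [("O", (-1.16, 0.0, 0.0)), ("C", (0.0, 0.0, 0.0)), ("O", (1.16, 0.0, 0.0))]
bonds = perceive_bonds(co2)
```

This yields connectivity only. Bond orders, charges, and hydrogens still have to be assigned afterwards.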

9
Perception Results
  • OEChem sets bond orders, hydrogens, and charges
  • Now we can go from PDB to the deposited structure

10
Standardization
  • The next step (also with OEChem's help)
  • Deposited structure has bond orders and (implicit) hydrogens
  • Standardized structure includes stereochemistry (perceived from 3D coordinates)
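
One generic way to read tetrahedral handedness off 3D coordinates is the sign of a signed volume, which flips under mirror reflection. The sketch below shows that idea only; it is not necessarily how OEChem does it, the function names are hypothetical, and ranking the neighbors (needed to turn the sign into an R/S label) is left out.

```python
def signed_volume(a, b, c, d):
    """Six times the signed volume of tetrahedron a, b, c, d (scalar triple product)."""
    u = [a[i] - d[i] for i in range(3)]
    v = [b[i] - d[i] for i in range(3)]
    w = [c[i] - d[i] for i in range(3)]
    # u . (v x w)
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            - u[1] * (v[0] * w[2] - v[2] * w[0])
            + u[2] * (v[0] * w[1] - v[1] * w[0]))


def handedness(neighbors):
    """Return +1 or -1 for four neighbor positions in a fixed order, 0 if coplanar."""
    vol = signed_volume(*neighbors)
    return (vol > 0) - (vol < 0)


original = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (0, 0, 0)]
mirrored = [(-x, y, z) for (x, y, z) in original]  # reflect through the yz plane
```

Here `handedness(original)` and `handedness(mirrored)` have opposite signs. When a hydrogen is implicit and only three neighbors have coordinates, the central atom's position can stand in as the fourth point.
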
11
Success
  • Some complex molecules come out right from 3D coordinates alone
  • Vancomycin

12
Failure
  • Some don't (noise in PDB)

13
Spatial Overlap
  • Sometimes PDB overlaps molecules
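
A simple way to flag such cases is to look for atoms of different molecules that sit implausibly close together. This is a minimal sketch under assumptions: the 1.0 Å cutoff and the function name are hypothetical, and the slides do not say how overlaps were actually detected.

```python
import math


def overlapping_pairs(mol_a, mol_b, min_dist=1.0):
    """Return (i, j, distance) for atoms of two molecules closer than min_dist angstroms.

    mol_a, mol_b: lists of (x, y, z) coordinates. min_dist is an assumed cutoff.
    """
    hits = []
    for i, p in enumerate(mol_a):
        for j, q in enumerate(mol_b):
            d = math.dist(p, q)
            if d < min_dist:
                hits.append((i, j, d))
    return hits


mol_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
mol_b = [(0.2, 0.0, 0.0), (5.0, 0.0, 0.0)]
hits = overlapping_pairs(mol_a, mol_b)
```

Only the first atom of each molecule is reported here, since they are 0.2 Å apart.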

14
Success (Mostly)
  • Some need a little massaging
  • Pattern match to fix heme bonds, oxidation state

15
Hard Cases
  • Without H, bond order, and charge ?

16
Diatomics
  • Diatomics are also hard to distinguish
  • Especially when bound to a metal (heme)
  • O=O or HO-OH?
  • Similarly for:
  • Cyanide (vs. aminomethane)
  • Carbon monoxide (vs. methanol)
  • Nitrogen (vs. hydrazine)
  • Nitrogen monoxide (vs. hydroxylamine)
  • Other special functional groups
  • Isocyanides (M N-C-R)
  • Azide (N-=N+=N-)
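
With two heavy atoms and no hydrogens, the interatomic distance is about the only clue left. The sketch below picks whichever candidate has the closer typical bond length. The reference lengths are approximate textbook values added here, not taken from the slides, and the function name is hypothetical. Coordinate error in a deposited structure may be comparable to the differences involved, which is part of why these cases are hard.

```python
# Approximate typical bond lengths in angstroms (textbook values, not from the slides).
CANDIDATES = {
    ("O", "O"): [("O=O", 1.21), ("HO-OH", 1.47)],
    ("C", "N"): [("cyanide", 1.16), ("aminomethane", 1.47)],
    ("C", "O"): [("carbon monoxide", 1.13), ("methanol", 1.43)],
    ("N", "N"): [("nitrogen", 1.10), ("hydrazine", 1.45)],
    ("N", "O"): [("nitrogen monoxide", 1.15), ("hydroxylamine", 1.45)],
}


def guess_diatomic(elem_a, elem_b, distance):
    """Return the candidate whose reference bond length is closest to the observed distance."""
    key = tuple(sorted((elem_a, elem_b)))
    options = CANDIDATES[key]
    return min(options, key=lambda option: abs(option[1] - distance))[0]
```

For instance, `guess_diatomic("O", "O", 1.25)` returns `"O=O"`, but an observed distance near 1.34 Å would be close to a coin toss.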

17
Carbocations
  • Sometimes we get charged carbons

18
Carbanions
  • OEChem won't mix implicit/explicit H

19
Resonance
  • Sometimes OEChem chooses an unlikely resonance
    form

20
Errors?
  • Not always clear why OEChem does what it does

21
Bad Kekulization
  • Some cases fail to Kekulize properly

22
OEChem's Valence Model
  • Phosphorus with valence 4 can be +/- 1

23
Valence Model (cont'd)
  • Oxygen +1 is allowed valence 3, 5
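
A valence model of this kind can be pictured as a lookup from (element, formal charge) to the valences it accepts. The sketch below encodes only the two rules from these slides, reading them as "phosphorus with valence 4 may carry charge +1 or -1" and "oxygen with charge +1 may have valence 3 or 5". The table layout and function name are hypothetical, and OEChem's real model covers far more cases.

```python
# (element, formal charge) -> set of allowed valences.
# Entries are limited to what the two valence-model slides state.
ALLOWED_VALENCES = {
    ("P", +1): {4},
    ("P", -1): {4},
    ("O", +1): {3, 5},
}


def is_allowed(element, charge, valence):
    """Return True if the table lists this valence for the element and charge."""
    return valence in ALLOWED_VALENCES.get((element, charge), set())
```

Anything missing from the table comes back as not allowed, which here only reflects that the table is incomplete.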

24
PDB's HET
  • PDB sometimes splits a molecule into separate HETs
  • Bonds get hydrolyzed

25
PDB's HET (cont'd)
  • This can be a bit arbitrary


26
Statistics
  • Latest PDB data set
  • 22,075 substances
  • 6,791 unique substances (by connectivity)
  • Ignored 7,042 blacklisted molecules
  • 3,075 sulfate ions
  • 979 glycerols
  • 882 phosphate ions
  • 567 acetate ions
  • 323 mercaptoethanols
  • 314 ethylene glycols
  • PubChem data is freely available!
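
A quick arithmetic check on these figures, assuming the six named counts are a breakdown of the blacklisted molecules (the slide layout suggests this but does not say so outright):

```python
substances = 22_075
unique = 6_791
blacklisted = 7_042
listed = {
    "sulfate": 3_075,
    "glycerol": 979,
    "phosphate": 882,
    "acetate": 567,
    "mercaptoethanol": 323,
    "ethylene glycol": 314,
}

listed_total = sum(listed.values())
print(f"unique / substances: {unique / substances:.1%}")
print(f"listed blacklist entries: {listed_total} of {blacklisted} "
      f"({listed_total / blacklisted:.1%})")
```

So roughly 31% of substances are unique by connectivity, and the six named species account for 6,140 of the 7,042 ignored molecules, about 87%.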

27
PCView
28
Thanks To
  • Steve Bryant
  • PubChem Team
  • NCBI SWG
  • OpenEye
  • CUP-VI Organizers