Title: A Unified Platform for NMR: CCPN
2 The CCPN Project
- Collaborative Computing Project for NMR
- Started in 1999
- Collaborators in several countries
- Developers at the University of Cambridge and the EBI in the UK
- Unifying platform for NMR software
- Similar to CCP4 (X-ray)
- Main goals
- Data standards and data exchange
- Software integration
- Software development and distribution
- Meetings to determine and disseminate best practice
- Open source access
3 CCPN Provides
- The CCPN data model standard and libraries
- UML model
- APIs in multiple languages
- Data exchange with existing software
- Format conversion via data model
- Deposition into NMR-STAR 2 and 3
- New NMR applications
- FormatConverter
- Analysis
- Clouds
- Processing
- A platform for community development
- ARIA 2.1, QUEEN, Recoord
- Open source access to software
- Yearly conference (UK)
4 People
- Cambridge
- Ernest Laue
- Wayne Boucher
- Rasmus Fogh
- Tim Stevens
- Dan O'Donovan
- Wolfgang Rieping
- Alan da Silva
- EBI, Hinxton
- Kim Henrick
- John Ionides
- Wim Vranken
- Anne Pajon
5 Traditionally Anarchic and Lossy Data Links
[Diagram: Task 1, Task 2 and Task 3 connected to one another through many separate Convert steps]
6 A Unified NMR Data standard
7 NMR Software
- Problem: heterogeneous development
- Lots of proprietary data formats
- Lots of stand-alone programs
- Data is lost along the way
- Dedicated converters needed
- Not ideal for structural genomics projects
- Solution: unity
- Data standards
- Ease of transfer between programs
- Completeness, integrity, deposition, data mining
- Libraries
8 Structural Biology Pipeline
[Diagram: sample preparation → NMR machine → data processing → spectrum analysis → structure calculation → repository database]
9 CCPN Approach
- Data model rather than data format
- Format independent
- Language independent
- Scientifically descriptive (NMR)
- Library (API) for in-memory manipulation
- Create, update, delete and query objects
- One for each language
- Error checking
- I/O modules load/store data from/to disk
- One for each (storage format, language) pair
- Bookkeeping
10 Data Models
- Abstract description of data
- Independent of any file format
- An interconnected collection of objects (classes)
- Describes organisational hierarchy
- Chain → Residue → Atom
- Describes inheritance (subclasses)
- e.g. distance or dihedral constraint
- Describes attributes
- Describes rules of connectivity
- e.g. a bond must have exactly two atoms
[Diagram: Chain (code) → Residue (three-letter code, sequence number) → Atom (name, element) → Coordinate (x, y, z)]
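To make these modelling ideas concrete, here is a minimal Python sketch, assuming plain dataclasses rather than CCPN's UML-generated classes. It models the Chain → Residue → Atom hierarchy, one inheritance example (distance and dihedral constraints sharing a parent class) and one connectivity rule (a bond has exactly two atoms). All class and attribute names are hypothetical simplifications.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Coordinate:
    x: float
    y: float
    z: float


@dataclass
class Atom:
    name: str                       # e.g. "CA"
    element: str                    # e.g. "C"
    coordinate: Optional[Coordinate] = None


@dataclass
class Residue:
    code3: str                      # three-letter code, e.g. "ALA"
    seq_num: int                    # sequence number
    atoms: List[Atom] = field(default_factory=list)


@dataclass
class Chain:
    code: str                       # chain code, e.g. "A"
    residues: List[Residue] = field(default_factory=list)


class Bond:
    """Connectivity rule from the model: a bond has exactly two atoms."""

    def __init__(self, atoms):
        atoms = tuple(atoms)
        if len(atoms) != 2:
            raise ValueError(f"a bond must have exactly two atoms, got {len(atoms)}")
        self.atoms: Tuple[Atom, Atom] = atoms


class Constraint:
    """Abstract parent; subclasses inherit from it and add their own attributes."""


@dataclass
class DistanceConstraint(Constraint):
    atoms: Tuple[Atom, Atom]
    upper_limit: float              # angstroms


@dataclass
class DihedralConstraint(Constraint):
    atoms: Tuple[Atom, Atom, Atom, Atom]
    lower_angle: float              # degrees
    upper_angle: float              # degrees
```

Note that the rule lives in the model, so every program using these classes gets the same check, independent of how the data is later stored.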
11 Coordinates
12 UML Example
13 Application View
[Diagram: User → GUI → Application 1, Application 2, Application 3 → API → in-memory representation (Python, Java, C, Perl) → I/O → data store (XML, SQL)]
14 Data Model Contents
[Diagram: areas covered by the data model, including Applications, Citations, NMR, Samples, Nuclei and Isotopes, Structure, Targets, Molecule, Compound, Sequence, Source, Preparation, Molecular System, Project Tracking, Crystallisation and X-ray Crystallography]
15 CCPN Packages
- Groupings of related data
- e.g. NMR, X-ray, Molecular description
- Connections between packages
- e.g. NMR loads Nucleus (isotope) information
- Allows lazy loading
- Only load relevant data
- Only load when a link is queried
- Save only modified data
- Reference packages
- Chemical compound, Reference chemical shifts
[Diagram: links between the Molecule, ChemComp, People, MolSystem, Nucleus, Sample, Coordinates and Nmr packages]
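The lazy-loading behaviour described on this slide can be sketched as follows. This is a hypothetical illustration, not CCPN's implementation: `LazyProject`, its loader functions and the dictionary package contents are invented for the example. A package is read only on first access, and only packages changed since loading are written back.

```python
from typing import Callable, Dict, List, Set


class LazyProject:
    """Hypothetical package-level lazy loading (not the CCPN API)."""

    def __init__(self, loaders: Dict[str, Callable[[], dict]]):
        self._loaders = loaders            # package name -> function that reads it
        self._loaded: Dict[str, dict] = {}
        self._modified: Set[str] = set()
        self.load_count = 0                # number of packages actually read

    def package(self, name: str) -> dict:
        # Read from storage on first access only; later calls reuse memory.
        if name not in self._loaded:
            self._loaded[name] = self._loaders[name]()
            self.load_count += 1
        return self._loaded[name]

    def set_value(self, name: str, key: str, value) -> None:
        # Any change marks the whole package as needing to be saved.
        self.package(name)[key] = value
        self._modified.add(name)

    def save(self, writer: Callable[[str, dict], None]) -> List[str]:
        # Write back only the packages modified since loading.
        saved = sorted(self._modified)
        for name in saved:
            writer(name, self._loaded[name])
        self._modified.clear()
        return saved
```

With loaders for "Nmr", "Nucleus" and "ChemComp", touching only "Nmr" reads one package and saves one package; "ChemComp" is never read at all.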
16 CCPN API
- Classes for developers
- Mainly getters and setters
- More than just code stubs
- Constraints (e.g. cardinality) enforced
- Links are the hard part
- Mostly (>99%) auto-generated from UML
- Some helper functions and constraints hand-coded
- Currently around 360k lines in Python and 650k lines in Java
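Why links are "the hard part" shows up even in a tiny hand-written sketch of what a generated setter has to do. The classes below are hypothetical (they borrow the `spectrum.peakLists` naming from the Python API example but are not the real CCPN classes): the setter enforces a 1..1 cardinality and keeps both ends of a one-to-many link consistent.

```python
class ApiError(Exception):
    """Raised when a setter would violate a model constraint."""


class Spectrum:
    def __init__(self, name: str):
        self.name = name
        self._peakLists = []              # other end of the PeakList.spectrum link

    @property
    def peakLists(self):
        # Read-only tuple so callers cannot bypass the link bookkeeping.
        return tuple(self._peakLists)


class PeakList:
    """Link rules: a PeakList has exactly one Spectrum (1..1);
    a Spectrum has any number of PeakLists (0..*)."""

    def __init__(self, spectrum: Spectrum, serial: int):
        if not isinstance(serial, int) or serial < 1:
            raise ApiError("serial must be a positive integer")
        self.serial = serial
        self._spectrum = None
        self.setSpectrum(spectrum)

    def getSpectrum(self) -> Spectrum:
        return self._spectrum

    def setSpectrum(self, spectrum: Spectrum) -> None:
        if spectrum is None:
            raise ApiError("a PeakList must belong to exactly one Spectrum")
        if spectrum is self._spectrum:
            return
        # Update both ends of the link: detach from the old parent first.
        if self._spectrum is not None:
            self._spectrum._peakLists.remove(self)
        spectrum._peakLists.append(self)
        self._spectrum = spectrum
```

Writing this bookkeeping by hand for hundreds of classes is error-prone, which is the motivation for generating it from the UML model.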
17 Code Generation Overview
18 Python API
- Find the number of assigned peaks in a spectrum:

    count = 0
    for peakList in spectrum.peakLists:
        for peak in peakList.peaks:
            for peakDim in peak.peakDims:
                if peakDim.peakDimContribs:  # this dimension has an assignment
                    count += 1
                    break  # count each peak only once

- PINE peak list output
- for peak in peakList...
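Because the counting loop needs a real CCPN project to run, the sketch below wraps it in a function and exercises it on hypothetical stand-in objects built with `SimpleNamespace`. These provide only the `peakLists`, `peaks`, `peakDims` and `peakDimContribs` attributes the loop reads; a real spectrum would come from a loaded project instead.

```python
from types import SimpleNamespace as NS


def count_assigned_peaks(spectrum) -> int:
    """Count peaks with at least one assigned dimension."""
    count = 0
    for peakList in spectrum.peakLists:
        for peak in peakList.peaks:
            for peakDim in peak.peakDims:
                if peakDim.peakDimContribs:  # a resonance is assigned here
                    count += 1
                    break  # count each peak once, however many dims are assigned
    return count


def make_peak(*contribs_per_dim):
    # Hypothetical stand-in: one list of contributions per peak dimension.
    return NS(peakDims=[NS(peakDimContribs=list(c)) for c in contribs_per_dim])


spectrum = NS(peakLists=[
    NS(peaks=[
        make_peak(["H1"], ["N1"]),   # both dimensions assigned: counted once
        make_peak([], []),           # unassigned: not counted
        make_peak([], ["N2"]),       # one dimension assigned: counted
    ]),
])

print(count_assigned_peaks(spectrum))  # prints 2
```

An equivalent, more compact form is `sum(1 for pl in spectrum.peakLists for peak in pl.peaks if any(pd.peakDimContribs for pd in peak.peakDims))`; the explicit loop mirrors the `break` in the slide.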
19 New Core API technology
- Reduce the burden of adding new languages and formats
- Languages (Python, Java, C, Perl)
- Storage formats (XML, SQL)
[Diagram: code layered by dependency. Language- and format-independent code holds most of the logic; the remaining layers are language-dependent only, format-dependent only, or both. A new language needs only the language-dependent layers; a new format needs only the format-dependent layers.]
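The layering idea can be illustrated with storage formats alone. In this hypothetical Python sketch (the registry, `store` and `Residue` are invented for illustration, and JSON stands in for a second format because SQL would need a database), the format-independent `store` function is written once, and supporting a new format means registering one small writer.

```python
import json
import xml.etree.ElementTree as ET
from typing import Callable, Dict

# Format-dependent layer: one small writer per storage format.
WRITERS: Dict[str, Callable[[str, dict], str]] = {}


def writer(fmt: str):
    """Register a storage-format writer (hypothetical plug-in mechanism)."""
    def register(func):
        WRITERS[fmt] = func
        return func
    return register


@writer("xml")
def to_xml(class_name: str, attrs: dict) -> str:
    elem = ET.Element(class_name, {k: str(v) for k, v in attrs.items()})
    return ET.tostring(elem, encoding="unicode")


@writer("json")
def to_json(class_name: str, attrs: dict) -> str:
    return json.dumps({class_name: attrs}, sort_keys=True)


# Format-independent layer: attribute extraction and checks written once.
def store(obj, fmt: str) -> str:
    attrs = {k: v for k, v in vars(obj).items() if not k.startswith("_")}
    if not attrs:
        raise ValueError("nothing to store")
    if fmt not in WRITERS:
        raise ValueError(f"no writer registered for {fmt!r}")
    return WRITERS[fmt](type(obj).__name__, attrs)


class Residue:
    def __init__(self, code3: str, seqCode: int):
        self.code3 = code3
        self.seqCode = seqCode
```

Here `store(Residue("ALA", 1), "xml")` and `store(Residue("ALA", 1), "json")` share all logic except the final writer call.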
20 CcpNmr Applications
21 Main CcpNmr Applications
- Format Converter
- Conversion to and from legacy formats
- Analysis
- Graphical analysis (e.g. assignment) program
- Processing (coming soon)
- Azara processing wrapped in the data model
22 NMR Applications
[Diagram: CcpNmr Processing, validation (QUEEN), CcpNmr Analysis, ISD, ARIA 2.1 and CcpNmr FormatConverter all connected through the CCPN Data Model, which also holds reference data (isotopes, nuclei, chemical compounds, chemical shifts, experiment prototypes); the FormatConverter links the model to NMR-STAR 3.0 and multiple legacy formats]
23 Licenses
- GPL
- Data model
- Scripts which produce APIs
- LGPL
- Generic libraries
- Widget libraries
- Format Converter
- CCPN
- Analysis
24 CcpNmr Format Converter
- Import/export of data formats to the Data Model
- For harvesting/deposition purposes
- Allow people to use or try out the data model
- Interaction with existing programs
- Fully or partially handles:
- Ansig, Auremol, Autoassign, Azara, Bruker, Charmm, CNS/XPLOR/ARIA, Concoord, Diana/Dyana/Cyana, Discover, Fasta, Felix, MARS, Module, .mol, mol2, Molmol, Monte, NmrDraw, NMRPipe, NMR-STAR (v2.1.1, v3.0), NmrView, PDB, Pipp, Pistachio, Pronto, Sparky, Talos, Varian, XEasy
- Sequences, chemical compounds, coordinates, NMR measurements, constraints and peak lists, processing and acquisition parameters
25 Format Converter - The NMR Translator
[Diagram: peaks, chemical shifts and acquisition parameters from XEasy, NmrView, Bruker, Varian, ... → format-specific readers → generic peak, chemical shift and acquisition parameter converters → data model entry → CCPN Data Model → format-specific writers → chemical shifts, peaks and processing parameters for XEasy, NmrView, NMRPipe, Azara, ...]
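The reader → generic converter → writer design can be sketched in a few lines. Both text formats here are hypothetical and deliberately simple (they are not the XEasy, NmrView or any other real format), and `GenericPeak` stands in for the data model entry.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class GenericPeak:
    """Format-neutral peak, standing in for the data model entry."""
    position: Tuple[float, ...]   # chemical shift (ppm) in each dimension
    height: float


def read_format_a(text: str) -> List[GenericPeak]:
    """Reader for a hypothetical format: '<serial> <shift_1> ... <shift_n> <height>'."""
    peaks = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        shifts = tuple(float(f) for f in fields[1:-1])
        peaks.append(GenericPeak(shifts, float(fields[-1])))
    return peaks


def write_format_b(peaks: List[GenericPeak]) -> str:
    """Writer for a hypothetical CSV format: '<shift_1>,...,<shift_n>,<height>'."""
    rows = [",".join(f"{s:.3f}" for s in p.position) + f",{p.height:g}" for p in peaks]
    return "\n".join(rows)


def convert(text: str, reader, writer) -> str:
    # Any reader pairs with any writer because both use GenericPeak.
    return writer(reader(text))
```

The design choice is the same as on the slide: with a shared intermediate, N formats need N readers and N writers instead of N×(N-1) direct converters, and nothing is lost as long as the intermediate is at least as expressive as every format.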
26 CCPNGrid
- UK e-Science pilot project
27 CcpNmr Analysis
- NMR Assignment Program
- Replace ANSIG and Sparky
- Demonstrates CCPN approach
- Modern interface and scripting
- Scalable and extensible
- Operating Systems
- Linux, Sun, SGI, OSX, Windows
28 CcpNmr Analysis Development
- Developers
- Wayne
- C code
- Spectrum display
- Myself
- User interface
- Assignment
- Data Analysis
- Dan
- Windows version
- Languages
- Python
- Data model interaction
- Tk graphical interface
- Scripting
- C
- OpenGL/Tk contours
- Structure display
- Mathematical operations
29 CcpNmr Analysis
30 CcpNmr Analysis Highlights
- Read many spectrum formats
- Felix, Azara, NMRPipe, NmrView, UCSF, Bruker
- Multiple, superimposable N-dimensional spectra
- 2, 3, 4 and 5 dimensions
- Sampled dimensions (pseudo-3D)
- Resonance-based data model assignment
- Comprehensive molecular descriptions
- Complexes, conjugates, unusual residues, organic compounds etc.
- Streamlined assignment productivity
- Link related peak lists quickly, peak matching for backbone etc.
- Structure generation interface
- Create restraints for ARIA (direct link with 2.1), CYANA, HADDOCK etc.
- Inbuilt structure viewer
- Violation analysis with direct peak links
- Data analyses
- Relaxation rates, HetNOE, chemical shift changes
- Python macro access to everything
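As an example of the data analyses listed above, a relaxation rate can be estimated by fitting an exponential decay to peak intensities, and a heteronuclear NOE is the ratio of saturated to reference intensities. This sketch assumes a single-exponential decay I(t) = I0 * exp(-R * t) fitted by linear least squares on ln(I); it is not necessarily how CcpNmr Analysis fits its curves, and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def fit_relaxation_rate(times: Sequence[float],
                        intensities: Sequence[float]) -> Tuple[float, float]:
    """Fit I(t) = I0 * exp(-R * t) via a straight-line fit of ln(I) against t.

    Assumes positive intensities; returns (R, I0).
    """
    if len(times) != len(intensities) or len(times) < 2:
        raise ValueError("need at least two (time, intensity) pairs")
    ys = [math.log(i) for i in intensities]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("need at least two distinct time points")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, ys))
    slope = sxy / sxx                   # slope of ln(I) against t is -R
    intercept = mean_y - slope * mean_t  # intercept is ln(I0)
    return -slope, math.exp(intercept)


def het_noe(saturated: float, reference: float) -> float:
    """Heteronuclear NOE: saturated over unsaturated (reference) intensity."""
    return saturated / reference
```

A log-linear fit weights points unevenly when the noise is constant in intensity; a nonlinear least-squares fit avoids this at the cost of needing an optimiser and starting values.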
31 New CcpNmr Analysis Features
- Windows version!
- Major upgrade for new data model
- Hopefully fairly transparent, except for macro writers!
- Precalculated contour files
- Movable peak annotations
- Pseudo-3Ds from AZARA (sampled data dims)
- Synthetic peak lists
- NOEs from structure, HSQC from correlated shifts etc.
- Carbohydrate functionality
- Separate base and sugar atom assignment options
- Automatically link sugar resonances to ring atoms (not yet released)
- Follow chemical shift changes
- Titrations, temperature etc
- Network anchored constraints
- Improvements for solid state
- Horizontal strips, experiment prototypes etc
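A synthetic NOE peak list of the kind mentioned above can be sketched from proton coordinates and shifts. Assumptions for this illustration: a fixed 5 Å distance cutoff, symmetric cross peaks, and isolated-spin-pair intensities proportional to r^-6, normalised to the strongest peak. The function name and inputs are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]


def synthetic_noe_peaks(protons: Dict[str, Coord], shifts: Dict[str, float],
                        cutoff: float = 5.0) -> List[Tuple[float, float, float]]:
    """Predict 2D NOESY cross peaks as (shift_1, shift_2, relative intensity)."""
    raw = []
    for a, b in combinations(sorted(protons), 2):
        r = math.dist(protons[a], protons[b])   # distance in angstroms
        if 0 < r < cutoff:
            intensity = r ** -6                  # isolated spin pair approximation
            raw.append((shifts[a], shifts[b], intensity))
            raw.append((shifts[b], shifts[a], intensity))  # symmetry-related peak
    if not raw:
        return []
    top = max(i for _, _, i in raw)
    return [(s1, s2, i / top) for s1, s2, i in raw]
```

Real synthetic lists would also need spin diffusion, degenerate shifts and experiment-specific dimensions to be handled; this only shows the distance-to-peak step.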
32 Movable Peak Annotation Labels
33 Synthetic Peak Lists
34 Following Chemical Shift Changes
35 Resonances and Assignment
- Resonances
- The centre of the NMR data model
- Connect to peaks
- Different peaks may be caused by the same thing.
- Connect to atoms
- A connection to NMR-equivalent atoms; need not be set if anonymous
- Have chemical shifts
- May have different shifts under different conditions
[Diagram: the Resonance at the centre, linked to experiments (spectra, conditions), constraints (distance, dihedral), measurements (chemical shift, relaxation, coupling), peaks (dimensions), annotation (spin system, connectivity, residue type), structures (coordinates) and the molecule (atoms, residues, chains)]
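The resonance-centred design can be sketched as a single class. The `Resonance` below is a hypothetical simplification, not the CCPN class: atoms are plain labels, peak dimensions are (peak id, dimension) tuples, and experimental conditions are represented by named shift lists.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass(eq=False)
class Resonance:
    """Hypothetical, simplified resonance (not the CCPN class)."""
    isotope: str                                            # e.g. "1H"
    atoms: Set[str] = field(default_factory=set)            # empty = anonymous
    shifts: Dict[str, float] = field(default_factory=dict)  # shift list -> ppm
    peak_dims: List[Tuple[str, int]] = field(default_factory=list)

    @property
    def is_anonymous(self) -> bool:
        # A resonance may exist, and be assigned to peaks, before any atom link.
        return not self.atoms

    def assign_to_peak_dim(self, peak_id: str, dim: int) -> None:
        # Many peak dimensions can share one resonance.
        self.peak_dims.append((peak_id, dim))

    def shift_change(self, list_a: str, list_b: str) -> Optional[float]:
        """Shift difference between two conditions, if both are recorded."""
        if list_a in self.shifts and list_b in self.shifts:
            return self.shifts[list_b] - self.shifts[list_a]
        return None
```

Keeping peaks linked to resonances rather than directly to atoms means that a later atom assignment (or reassignment) propagates to every peak at once.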
36 Constraints HADDOCK Style
37 Network Anchoring
[Diagram: an ambiguous NOE with candidate assignments between Hx1/Hx2 and Hy1/Hy2/Hy3. A candidate with no other connecting peaks has no support; a candidate bridged by Ha and Hb through other NOEs receives network support.]
38 Network Anchoring
Dense initial network
39 Network anchoring reliability
Ambiguous too!
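The network anchoring idea in the diagrams can be reduced to a count of bridging resonances. This is a simplified, hypothetical scoring scheme (not the exact CcpNmr or CYANA formula): a candidate assignment a-b of an ambiguous NOE gains support for every resonance c such that other peaks offer a-c and c-b as candidates.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set, Tuple

Pair = FrozenSet[str]


def network_support(peaks: List[List[Tuple[str, str]]]) -> Dict[Tuple[int, Pair], int]:
    """Score every candidate (a, b) of every peak by its bridging resonances.

    peaks[i] lists the candidate (resonance, resonance) pairs for peak i.
    """
    pair_peaks: Dict[Pair, Set[int]] = defaultdict(set)   # pair -> peaks offering it
    neighbours: Dict[str, Set[str]] = defaultdict(set)    # resonance -> partners
    for i, candidates in enumerate(peaks):
        for a, b in candidates:
            pair_peaks[frozenset((a, b))].add(i)
            neighbours[a].add(b)
            neighbours[b].add(a)

    def linked_elsewhere(x: str, y: str, peak: int) -> bool:
        # True if some peak other than `peak` offers x-y as a candidate.
        return bool(pair_peaks.get(frozenset((x, y)), set()) - {peak})

    scores = {}
    for i, candidates in enumerate(peaks):
        for a, b in candidates:
            bridges = {
                c for c in neighbours[a] & neighbours[b]
                if c not in (a, b)
                and linked_elsewhere(a, c, i) and linked_elsewhere(c, b, i)
            }
            scores[(i, frozenset((a, b)))] = len(bridges)
    return scores
```

Mirroring the diagram, an ambiguous peak with candidates Hx1-Hy1 and Hx2-Hy3, plus peaks Hx2-Ha, Ha-Hy3, Hx2-Hb and Hb-Hy3, gives Hx2-Hy3 a support of 2 and Hx1-Hy1 none. As the reliability slide notes, the supporting peaks may themselves be ambiguous.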
40 Future CcpNmr Analysis Features
- RDCs and couplings
- Now easier using new Peak model (no more SubPeaks)
- COSY peak picker
- Full structure ensemble support
- Isotope labelling schemes (with FMP Berlin)
- Isotopomer templates
- Editable molecule labelling
- Edit molecular sequence with assignments
- More fitting functions
- GUI profiles
- Used by multiple projects
- Store colour preferences etc
42 The CLOUDS Protocol
- Assignment-free structure determination
- Per Kraulis, Thérèse Malliavin, Irvin D. Kuntz
- CLOUDS by Miguel Llinas, Alex Grishaev
- Spatial distribution of anonymous resonances generated with NOEs
A network of distance constraints between
anonymous atoms is sufficient to generate a low
resolution protein structure.
43 Assignment from Family of Clouds
Yeast Hho1p GI: linker histone H1 globular domain I (J. O. Thomas, K. Stott)
A family of five Clouds
44 Original CLOUDS problems
- Too few restraints for most proteins
- Protocols needed clean 2D spectra; no 3D protocols
- Poor resonance disambiguation
- Distance calibration only for 2D NOESY
Example: yeast Hho1p GI, linker histone H1 globular domain I (J. O. Thomas, K. Stott; 92 residues)
[Figure panels: an ideal family of Clouds; backbone of ideal Clouds; backbone of real Clouds, which are globally inconsistent]
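Distance calibration from NOE intensities commonly uses the isolated spin pair approximation, I / I_ref = (r_ref / r)^6. The sketch below applies that relation generically; it is not the original CLOUDS calibration, and the reference pair, the function name and the 6 Å cap are assumptions for illustration.

```python
from typing import List, Sequence


def calibrate_distances(intensities: Sequence[float], ref_intensity: float,
                        ref_distance: float, max_distance: float = 6.0) -> List[float]:
    """Convert NOE intensities to distances: r = r_ref * (I_ref / I) ** (1/6).

    Weak, zero or negative intensities are clipped to max_distance.
    """
    distances = []
    for intensity in intensities:
        if intensity <= 0:
            distances.append(max_distance)
            continue
        r = ref_distance * (ref_intensity / intensity) ** (1.0 / 6.0)
        distances.append(min(r, max_distance))
    return distances
```

The sixth-root dependence means a 64-fold drop in intensity only doubles the distance, which is why NOE-derived restraints are usually used as upper bounds rather than exact distances.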
45 Beyond CLOUDS: CcpNmr Nexus
- Entirely new code!
- No FORTRAN
- Dedicated 3D/4D protocols
- Use any existing assignments
- Use backbone experiments, where available
- Better peak-to-resonance linking: network anchoring
- Incorporation of covalent information
- Iterative cycle
46 The CcpNmr Nexus protocol
Experiments used: 15N HSQC-NOESY, 13C HSQC-NOESY, HNcoCACB, HNCACB, HNCA, HNcoCA
[Diagram: iterative cycle linking NOESY peaks, network anchoring, restraints, Xplor-NIH, 1H structure, threader, assignments, violation analysis and covalent connections]
47 Iterative improvement
[Figure: panels A, B and C]
48 Improved Network Anchoring
[Diagram: candidate assignments between Hx1/Hx2 and Hy1/Hy2/Hy3, with Ha and Hb bridging the network to give support]
49 Nexus Threader
- Fully automated
- Accepts any initial typing and assignment information
- Optimised assignments from clouds using dynamic programming and Monte-Carlo sampling
- Works with any backbone triple-resonance experiments
- Works with backbone and side chains
- Only add well-fitting backbone, then generate a new, better structure
Globally inconsistent structure: local connectivity still threadable!
50 CcpNmr Threader
- Assign spin systems to residues
- Score chemical shift match for residue type
- Score backbone experiment peak matches
- Score inter-atomic distances
- HN, HA, HB distances
- i-3 to i+3
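The residue-type scoring step of the threader can be illustrated as follows. This is a much simpler stand-in than the dynamic programming and Monte Carlo search described above: it exhaustively slides one already-linked fragment of spin systems along the sequence and scores CA/CB shift agreement. The shift averages are rough illustrative values rather than curated statistics, the 2 ppm tolerance is an assumption, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

# Rough illustrative 13C averages (ppm); None marks an absent atom (Gly CB).
TYPICAL = {
    "G": {"CA": 45.1, "CB": None},
    "A": {"CA": 52.5, "CB": 19.1},
    "S": {"CA": 58.3, "CB": 63.8},
    "V": {"CA": 62.2, "CB": 32.9},
    "L": {"CA": 55.1, "CB": 42.4},
}
SIGMA = 2.0  # assumed shift tolerance (ppm)


def type_score(spin_system: Dict[str, float], res_type: str) -> float:
    """Gaussian log-likelihood, up to a constant, of the observed shifts."""
    score = 0.0
    for atom, observed in spin_system.items():
        expected = TYPICAL[res_type].get(atom)
        if expected is None:
            return -math.inf  # e.g. an observed CB cannot belong to glycine
        score -= 0.5 * ((observed - expected) / SIGMA) ** 2
    return score


def place_fragment(fragment: List[Dict[str, float]],
                   sequence: str) -> Tuple[Optional[int], float]:
    """Best start index in `sequence` for a linked run of spin systems."""
    best_start, best = None, -math.inf
    for start in range(len(sequence) - len(fragment) + 1):
        total = sum(type_score(ss, sequence[start + k])
                    for k, ss in enumerate(fragment))
        if total > best:
            best_start, best = start, total
    return best_start, best
```

Linking spin systems first (from backbone experiments or cloud distances) is what makes this work: a single spin system is often ambiguous, but a run of three rarely fits more than one place in the sequence.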
51 Growing Residue Templates
- Residue fragments
- ANY
- ALF
- BET
- BBR
- GAM
- DEL
- PHY
- IVT
- ASX
- GLX
- Fit from 1H structure
- Xplor topology files
- Grow through iterations
- ANY → ALF → BBR → VAL
- etc.
52 Iterative improvement
[Figure: panels A, B and C]
- Network anchoring
- Improves with bonds!
- Generate structure
- Thread assign
- Grow new bonds
- Repeat
53 What can actually be done
- 92-residue test protein
- Helix, sheet and disordered tail
- 1135 15N NOESY peaks
- 2731 13C NOESY peaks
- First structure: protons only
- Ensemble of 10
- Generated in 15 minutes
- Correctly fit all 86 amide protons
- Second structure: backbone linked
- Ensemble of 10
- Generated in 38 minutes
- Correctly fit 33/85 alpha protons
54 Development within the CCPN framework
55 CCPN Interface Schemes
- Via FormatConverter: application ↔ proprietary memory ↔ formatted file ↔ Format Converter ↔ CCPN XML/SQL
- In-memory (custom) conversion: application ↔ proprietary model ↔ CCPN Data Model ↔ CCPN XML/SQL
- Direct API access: application ↔ CCPN Data Model and CcpNmr functions ↔ CCPN XML/SQL
56 CcpNmr Analysis Function Library
- Assignment
- Constraints
- Data Analysis
- Shift differences
- Hetero NOE
- Relaxation rates
- Experiments, spectra
- Peaks
- Structure
- Spectrum Windows
- assignResToDim(peakDim, resonance)
- Assign a resonance to peak dimension
- Checks
- Any atoms belong to a valid molecule
- Isotopes match dimension
- Shift is within tolerances
- Whether the aliasing is changed
- Creates
- Covalent links between resonances
- Peak annotation label
- Updated chemical shift value
- Data Model objects
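The checks listed for `assignResToDim(peakDim, resonance)` can be illustrated with stand-in objects. The function below is hypothetical and covers only three of the listed checks (isotope match, shift tolerance and aliasing); it does not reproduce the real function's logic, and the stub classes are invented for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PeakDimStub:
    """Hypothetical stand-in for a peak dimension (not the CCPN class)."""
    isotope: str          # e.g. "15N"
    position: float       # peak position in ppm
    window_min: float     # spectral window limits in ppm
    window_max: float


@dataclass
class ResonanceStub:
    isotope: str
    shift: Optional[float] = None   # current chemical shift, if known


def check_assignment(peak_dim: PeakDimStub, resonance: ResonanceStub,
                     tolerance: float) -> List[str]:
    """Return warnings; an empty list means the assignment looks consistent."""
    problems = []
    if resonance.isotope != peak_dim.isotope:
        problems.append(f"isotope mismatch: {resonance.isotope} vs {peak_dim.isotope}")
    if resonance.shift is not None:
        width = peak_dim.window_max - peak_dim.window_min
        if width <= 0:
            raise ValueError("spectral window must have positive width")
        # Whole spectral widths separating the observed peak from the shift.
        k = round((resonance.shift - peak_dim.position) / width)
        unaliased = peak_dim.position + k * width
        if abs(unaliased - resonance.shift) > tolerance:
            problems.append("shift outside tolerance")
        if k != 0:
            problems.append(f"aliasing would change by {k} spectral width(s)")
    return problems
```

For a 15N dimension spanning 100 to 135 ppm, a peak at 105.0 ppm assigned to a resonance at 140.1 ppm is within tolerance once unaliased by one spectral width, so only the aliasing warning is returned.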
57 CcpNmr Graphical Widgets
- A library for any developer to use
- ColorList, PulldownMenu, ScrolledMatrix, LabelFrame, CheckButton, Button, Label, Entry, ButtonList
58 Aftercare
- www.ccpn.ac.uk
- Downloads
- Data Model documentation
- Analysis documentation
- Tutorials
- Mailing List
- http://www.jiscmail.ac.uk/lists/CCPNMR.html
- Quick response
- Bugs
- Requests