Title: X-ray Crystallography Workshop
1. X-ray Crystallography Workshop
- DAY 3
- Software
- Data processing from crystal
- Lunch
- Lecture on Molecular Replacement
- Everybody run CCP4 programs
2. Before we solve the structure, we need to
convert the scaled output data (intensities) to
amplitudes
- Scaled data are h, k, l, intensity (I), and the standard
deviation of the intensity, σ(I)
- A CCP4 program called Truncate does the
following:
- Analyses the data by calculating a Wilson plot
(calculates an absolute scale and temperature
factor for a set of observed intensities, using
the theory of A. J. C. Wilson). We tell the program
the approximate number of amino acids in
the asymmetric unit, and it uses a formula to find
the average $f^2$. If the atoms are randomly
distributed through the asymmetric unit, then
$\mathrm{scale} \cdot \langle F_{obs}^2 \rangle$ should equal
$\langle f^2 \rangle \exp[-2B(\sin\theta/\lambda)^2]$.
By fitting a least-squares line through
$\ln(\langle f^2 \rangle / \langle F_{obs}^2 \rangle)$ vs.
$2(\sin\theta/\lambda)^2$ the program derives the scale
(from the intercept) and the B value (from the slope).
For real structures the assumption
that the atoms are randomly distributed is
obviously incorrect. The effect of this is most
obvious in the low-resolution reflections: the
Wilson plot will deviate from a straight line
from about 3.0-4.0 Å downwards (toward lower
resolution). Although all
the points on the Wilson plot are plotted, the
scale and B are determined only from a limited
resolution range.
- Truncates the data by a method devised by
French and Wilson based on Bayesian statistics.
This has the effect of forcing all negative
observations to be positive, and inflating the
weakest reflections (less than about 3 sd),
because an observation significantly smaller than
the average intensity is likely to be
underestimated. The F's are calculated using the
prior knowledge of Wilson's distributions for
acentric or centric data (calculated in shells of
reciprocal space in a first pass through the
data) and the mean intensity and standard
deviation values. The F's output are all positive
and follow Wilson's distribution.
- Analyses the cumulative intensity distribution of
the data to test for twinning or anisotropy in
the data
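The Wilson-plot fit described above can be sketched in a few lines of Python. This is only an illustration, not what Truncate actually runs: it assumes the model scale·⟨Fobs²⟩ = ⟨f²⟩·exp(−2B(sin θ/λ)²), uses made-up resolution shells, and holds ⟨f²⟩ constant across shells (real scattering factors fall off with resolution). The function names `wilson_fit` and `synthetic_shells` are hypothetical.

```python
import math

def wilson_fit(shells):
    """Least-squares line through ln(<f^2>/<Fobs^2>) vs 2*(sin(theta)/lambda)^2.

    shells: list of (s_squared, mean_f2, mean_fobs2), one per resolution shell,
            with s_squared = (sin(theta)/lambda)^2.
    Returns (scale, B): the intercept is ln(scale) and the slope is B.
    """
    xs = [2.0 * s2 for s2, _, _ in shells]
    ys = [math.log(f2 / fobs2) for _, f2, fobs2 in shells]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), slope

def synthetic_shells(scale, b, d_values, mean_f2=50.0):
    """Made-up shells that obey the assumed model exactly (no noise)."""
    shells = []
    for d in d_values:
        s2 = 1.0 / (4.0 * d * d)  # Bragg's law: sin(theta)/lambda = 1/(2d)
        fobs2 = mean_f2 * math.exp(-2.0 * b * s2) / scale
        shells.append((s2, mean_f2, fobs2))
    return shells

# Only shells at 3.0 A or better are used, mirroring the limited range used for the fit.
shells = synthetic_shells(scale=4.0, b=20.0, d_values=[3.0, 2.5, 2.2, 2.0, 1.8])
scale, b = wilson_fit(shells)
print(f"scale = {scale:.2f}, B = {b:.1f} A^2")
```

With real data the points scatter around the line, and the low-resolution shells would pull the fit off, which is why they are left out.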
3. Example of a Wilson Plot
- Every atom's diffracting power is further reduced
at higher resolution by any atomic vibration
(i.e. the temperature factor)
- The deviations from the scattering of a single
atom in protein crystals come from the non-random
distribution of atoms in the unit cell,
including things like alpha-helices and beta
sheets
4. The CCP4 programs use a data file format called
mtz (filename.mtz), a binary file format
- Truncate outputs an mtz file with h, k, l, F, σ(F),
plus the original intensity for each reflection,
including anomalous data if we choose to
(remember that h, k, l and -h, -k, -l are NOT equal in the
case of heavy atoms)
- This is the file we will use for the Molecular
Replacement solution for lysozyme
5. The phase problem can be solved in several
different ways
- If you have the structure of the same or a
closely related protein:
- Determine the orientation of the model in the
unit cell of the new structure
- Determine the three angles and the translations
that will place the model correctly
- Need some kind of target score to evaluate this
placement
- The original target uses the Patterson function
(see http://www-structmed.cimr.cam.ac.uk/Course/MolRep/molrep.html)
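One reason a Patterson-based target lets the orientation be found before the translation is that the Patterson function depends only on the vectors between atoms. Those vectors do not change when the molecule is shifted, but they do change when it is rotated. A toy 2D sketch with a made-up four-atom "molecule" (all names and coordinates are illustrative, and this is not how any real program scores a search):

```python
import math

def interatomic_vectors(atoms):
    """All vectors between pairs of atoms (where Patterson peaks would sit), sorted."""
    vecs = []
    for i, (xi, yi) in enumerate(atoms):
        for j, (xj, yj) in enumerate(atoms):
            if i != j:
                # Round so tiny floating-point differences do not matter.
                vecs.append((round(xj - xi, 6), round(yj - yi, 6)))
    return sorted(vecs)

def rotate(atoms, degrees):
    """Rotate 2D coordinates about the origin."""
    a = math.radians(degrees)
    c, s = math.cos(a), math.sin(a)
    return [(c * x - s * y, s * x + c * y) for x, y in atoms]

def translate(atoms, dx, dy):
    """Shift 2D coordinates by (dx, dy)."""
    return [(x + dx, y + dy) for x, y in atoms]

model = [(0.0, 0.0), (1.5, 0.0), (2.0, 1.2), (0.5, 2.0)]
moved = translate(model, 7.3, -4.1)
turned = rotate(model, 40.0)

same_after_translation = interatomic_vectors(moved) == interatomic_vectors(model)
same_after_rotation = interatomic_vectors(turned) == interatomic_vectors(model)
print(same_after_translation, same_after_rotation)
```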
6. Remember that the equation for electron density
includes the amplitude and the phase, which
comes from the positions of all atoms in the
unit cell relative to the origin
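To see why the phase depends on where the atoms sit relative to the origin, here is a minimal sketch that sums a structure factor over three made-up atoms and then shifts all of them by a quarter of the cell along x. The scattering factors are held constant (no resolution dependence) and the coordinates are invented for illustration.

```python
import cmath
import math

def structure_factor(hkl, atoms):
    """F(hkl) = sum over atoms of f * exp(2*pi*i*(h*x + k*y + l*z)), fractional coordinates."""
    h, k, l = hkl
    total = 0j
    for f, (x, y, z) in atoms:
        total += f * cmath.exp(2j * math.pi * (h * x + k * y + l * z))
    return total

# (scattering factor, fractional coordinates); illustrative values only
atoms = [(6.0, (0.10, 0.20, 0.30)),
         (7.0, (0.40, 0.15, 0.25)),
         (8.0, (0.70, 0.60, 0.05))]

shift = (0.25, 0.0, 0.0)  # move the whole "molecule" relative to the origin
shifted = [(f, (x + shift[0], y + shift[1], z + shift[2])) for f, (x, y, z) in atoms]

hkl = (1, 2, 3)
F_before = structure_factor(hkl, atoms)
F_after = structure_factor(hkl, shifted)

amp_before, phase_before = abs(F_before), math.degrees(cmath.phase(F_before))
amp_after, phase_after = abs(F_after), math.degrees(cmath.phase(F_after))
print(f"|F|   {amp_before:.3f} -> {amp_after:.3f}")
print(f"phase {phase_before:.1f} -> {phase_after:.1f} degrees")
```

The amplitude is unchanged while the phase moves by 360° × (h × 0.25) = 90° for this reflection. The diffraction experiment measures only the amplitude, which is why a correctly placed model is needed to supply the phases.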
7. The software we will use for
molecular replacement is called PHASER
- Link to PHASER website: http://www-structmed.cimr.cam.ac.uk/phaser/
- This is part of the CCP4 software collection
8. PHASER uses a different target function than the
older programs, and takes a more
probabilistic approach
- If you are very mathematically inclined, here it
is
- Randy Read's paper
9. Let's go back to the main window that opens
when you start CCP4i
- You see three sections
- The left-hand one is where you select the program
you want to run
- The middle gives a list of jobs that have been or
are being run under the current project, and
their status (running, finished)
- The right-hand section gives a list of tools
10. To run a particular program, select it from
the menu on the left-hand side
- Select the Data Reduction menu option
- Select the import scaled option; this will open a
new window
- These steps are the same for running any CCP4
program
11.
- Now we enter the required data in the yellow
boxes
- The Browse button allows you to select a file
rather than having to type the name in
- We will NOT use anomalous data
- This is the window for running the Truncate
program discussed previously
12. Now let's run PHASER with the output mtz from
that last step, but first
- We need a model for lysozyme that we can use as
our search model for molecular replacement
- We can go to the PDB to retrieve a model
- http://www.rcsb.org/pdb/home/home.do
- We used this one (1W6Z) successfully in our
Spring Crystallography course
13. It's a good idea when you use a solved
structure to edit it to get rid of things like
waters and ions that may not be in your structure
in the same place
- Use a text editor to remove the lines after the
protein atoms (the protein atoms are the lines
starting with ATOM). For example, they used
holmium (HO) to solve the
structure, and they also modeled some Cl ions
(which we should also be able to do)
- This will be your search model file
- More on pdb files ???
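The same clean-up can be done with a short script instead of a text editor. This is a minimal sketch that assumes a search model only needs the ATOM records (plus TER and END) and that waters and ions are stored as HETATM records, as in standard PDB files. The function name and the example lines are made up for illustration and are not taken from 1W6Z.

```python
def keep_protein_atoms(pdb_lines):
    """Keep only ATOM, TER and END records; HETATM (waters, ions) and headers are dropped."""
    wanted = {"ATOM", "TER", "END"}
    return [line for line in pdb_lines if line[:6].strip() in wanted]

# Illustrative records only (coordinates are invented).
example = [
    "HEADER    HYDROLASE",
    "ATOM      1  N   LYS A   1       3.287  10.092  10.329  1.00 12.00           N",
    "ATOM      2  CA  LYS A   1       2.445  10.457   9.182  1.00 12.00           C",
    "TER",
    "HETATM 1002 HO    HO A 201      10.000  11.000  12.000  1.00 20.00          HO",
    "HETATM 1003 CL    CL A 202      14.000   9.000   8.000  1.00 20.00          CL",
    "HETATM 1004  O   HOH A 301       5.000   6.000   7.000  1.00 25.00           O",
    "END",
]

cleaned = keep_protein_atoms(example)
for line in cleaned:
    print(line)
```

To use it on a real file, read the lines with `open(path).read().splitlines()` and write the result back out under a new name, so the original download stays untouched.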
14. Now let's go back to the PHASER window
- You select your data file on the "MTZ in" line
- You put your model file on the "PDB 1" line (I
want you to use a special version of the file
where I jumbled up the coordinates, i.e. moved the
molecule around)
- What about these other items???
- Mode - automated search
- Resolution range - only need to 2.5 Å
- Component - fill in
- Sequence identity - 1 for us
- Search details - ensemble1
- We will use the defaults for this simple problem
15. When you have filled in all the blanks, you click
on the Run or Run and view com file button
- This shows you the script that the GUI has
written into the tmp (for temporary) directory
- These were the scripts that we used back before
the GUI to run each program individually from the
Unix terminal; now you are spared all that
editing
- Click continue and it will execute the program
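For a feel of what such a script contains, the sketch below assembles a keyword file of the kind Phaser reads. The keywords follow the Phaser keyword documentation as best recalled, so treat the com file the GUI shows you as the authoritative version. The file names, the column labels F and SIGF, and the molecular weight of about 14300 Da for lysozyme are assumptions made for this example.

```python
def phaser_script(mtz, pdb, root, identity=1.0, mol_weight=14300):
    """Build a keyword script for an automated molecular replacement run (illustrative)."""
    lines = [
        "MODE MR_AUTO",                                            # automated search
        f"HKLIN {mtz}",                                            # data from Truncate
        "LABIN F=F SIGF=SIGF",                                     # assumed column labels
        f"ENSEMBLE ensemble1 PDBFILE {pdb} IDENTITY {identity}",   # search model
        f"COMPOSITION PROTEIN MW {mol_weight} NUM 1",              # asymmetric unit contents
        "SEARCH ENSEMBLE ensemble1 NUM 1",                         # look for one copy
        f"ROOT {root}",                                            # prefix for output files
    ]
    return "\n".join(lines) + "\n"

# Hypothetical file names; only the root "workshop_2" matches the output file named later.
script = phaser_script("workshop_truncate.mtz", "lysozyme_model.pdb", "workshop_2")
print(script)
```

The GUI fills in the same kind of information from the boxes on the PHASER window: mode, data file, model file, sequence identity and search details.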
16. When PHASER is finished running, you can look at
the various output files
- Click on the right-hand side of the main CCP4
window; you can look at the log file and the
output files.
- First, let's look at the workshop_2.sol
17.
- The one solution is shown on the last line: the
three angles that the search model is rotated
through to match the data, and the three
translations (as fractions of the unit cell)
- The scores are shown on the line above: LLG is
large and positive, Z values are way above the
benchmark of 7
- We can also go through the log file
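To make the six numbers concrete, this sketch applies three angles and a fractional translation to a couple of coordinates. It assumes a z-y-z Euler angle convention, rotation about the coordinate origin, and a unit cell with orthogonal axes; check the PHASER documentation for the convention the program really uses. The angles, translation, cell edges and coordinates below are invented for illustration.

```python
import math

def euler_zyz_matrix(alpha, beta, gamma):
    """Rotation matrix R = Rz(alpha) * Ry(beta) * Rz(gamma), angles in degrees."""
    a, b, g = (math.radians(v) for v in (alpha, beta, gamma))
    ca, sa = math.cos(a), math.sin(a)
    cb, sb = math.cos(b), math.sin(b)
    cg, sg = math.cos(g), math.sin(g)
    return [
        [ca * cb * cg - sa * sg, -ca * cb * sg - sa * cg, ca * sb],
        [sa * cb * cg + ca * sg, -sa * cb * sg + ca * cg, sa * sb],
        [-sb * cg, sb * sg, cb],
    ]

def place(coords, euler, frac, cell):
    """Rotate each (x, y, z) in Angstroms, then add the translation frac * cell edge."""
    r = euler_zyz_matrix(*euler)
    t = [f * edge for f, edge in zip(frac, cell)]  # valid for orthogonal axes only
    placed = []
    for x, y, z in coords:
        placed.append(tuple(r[i][0] * x + r[i][1] * y + r[i][2] * z + t[i]
                            for i in range(3)))
    return placed

model = [(1.0, 2.0, 3.0), (4.0, 0.5, -2.0)]      # two made-up atoms
placed = place(model, euler=(30.0, 45.0, 60.0),
               frac=(0.25, 0.50, 0.10), cell=(79.0, 79.0, 38.0))
for atom in placed:
    print(tuple(round(v, 3) for v in atom))
```

The rotation and translation move the model as a rigid body, so distances between atoms are unchanged. Only the position and orientation in the cell differ.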
18. What happens when we use a model that has all the
side chains replaced by Ala?
- The scores are lower but basically it finds the
same solution - the backbone is so similar that
the side chains are not so important.
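A poly-Ala model like this can be made by keeping only the backbone atoms and CB of each residue and renaming the residue to ALA. The sketch below assumes standard fixed-column PDB records (atom name in columns 13-16, residue name in columns 18-20) and leaves glycine alone because it has no CB. The helper `atom_line` and its dummy coordinates exist only to build example input.

```python
KEEP = {"N", "CA", "C", "O", "CB"}  # backbone atoms plus the alanine CB

def to_poly_ala(pdb_lines):
    """Drop side-chain atoms beyond CB and rename non-glycine residues to ALA."""
    out = []
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            out.append(line)
            continue
        atom_name = line[12:16].strip()
        res_name = line[17:20]
        if atom_name not in KEEP:
            continue
        if res_name != "GLY":
            line = line[:17] + "ALA" + line[20:]
        out.append(line)
    return out

def atom_line(serial, name, res_name, res_seq, xyz=(0.0, 0.0, 0.0)):
    """Format a minimal fixed-column ATOM record (dummy coordinates, for illustration)."""
    x, y, z = xyz
    return (f"ATOM  {serial:5d} {' ' + name:<4s} {res_name:3s} A{res_seq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}  1.00 20.00")

lys = [atom_line(i + 1, n, "LYS", 1)
       for i, n in enumerate(["N", "CA", "C", "O", "CB", "CG", "CD", "CE", "NZ"])]
gly = [atom_line(i + 10, n, "GLY", 2) for i, n in enumerate(["N", "CA", "C", "O"])]

poly_ala = to_poly_ala(lys + gly)
for line in poly_ala:
    print(line)
```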
19. What happens when we use a model that is related
but not 100% identical?
- Try bob-white quail lysozyme
- These numbers LOOK different but it's actually
almost superimposed onto the solution with
hen-egg-white lysozyme - the original model for
bob-white quail is in an entirely different unit
cell. I could have tried to superimpose the
models first, but this molecule has only side
chain differences and no insertions or main chain
differences
20. What happens when we use a model that is not
related closely enough?
- Try T4 lysozyme (phage)
- The scores are a lot lower, and you get more than
one top solution
- Negative LLG values are a pretty good sign of a bad
solution, as are low Z scores
- If you look at the log file for this run, you
will see that it is long and shows many, many tries
to fit SOMETHING
21. PHASER outputs a pdb file that contains the
search model rotated and translated by the
appropriate amounts indicated in the solution
file.
- It also outputs an mtz file with the original
structure factor amplitudes, but also the
coefficients for two types of electron density
maps
- 2Fo-Fc
- Fo-Fc
2Fo-Fc (think of it as Fo + (Fo-Fc)) uses phases
calculated from the model and amplitudes equal to
twice the measured data minus the calculated data.
Gives you the model electron density PLUS the
differences between the REAL data and the
CALCULATED data.
Fo-Fc (difference map) uses
phases calculated from the model and amplitudes
from the REAL minus the CALCULATED data. Tells
you where you either need atoms (positive
difference density) or where you need to get rid
of atoms (negative density).
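For a single reflection, the two sets of map coefficients can be written out as below. This is a simplified, unweighted sketch: real programs usually apply statistical weights to Fo and Fc before forming these coefficients, and that is ignored here. The amplitudes and phase are made-up numbers.

```python
import cmath
import math

def map_coefficients(f_obs, f_calc, phase_calc_deg):
    """Return (2Fo-Fc, Fo-Fc) as complex numbers that both carry the model phase."""
    phase = cmath.exp(1j * math.radians(phase_calc_deg))
    two_fo_fc = (2.0 * f_obs - f_calc) * phase
    fo_fc = (f_obs - f_calc) * phase
    return two_fo_fc, fo_fc

# Measured amplitude 120, model amplitude 100, model phase 45 degrees (illustrative).
two_fo_fc, fo_fc = map_coefficients(120.0, 100.0, 45.0)
print(f"2Fo-Fc amplitude {abs(two_fo_fc):.1f}, phase {math.degrees(cmath.phase(two_fo_fc)):.1f}")
print(f"Fo-Fc  amplitude {abs(fo_fc):.1f}, phase {math.degrees(cmath.phase(fo_fc)):.1f}")
```

When Fo is smaller than Fc the Fo-Fc coefficient changes sign, which is the same as flipping its phase by 180°. Summed over all reflections, that is what produces negative difference density where the model has atoms the data do not support.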