Title: Getting started with GEMSA
Getting started with GEM-SA
This talk
- Starting GEM-SA program
- Creating input and output files
- Explanation of the menus, toolbars, etc.
- Description of the project window
Starting GEM-SA
- Double-click the GEM-SA icon to start
- The main window appears, with
- Menu
- Toolbar
- Sensitivity analysis output grid
- Log window
[Screenshot of the GEM-SA main window, labelled: menu, toolbar, sensitivity analysis output grid, log window]
Toolbar icons
- New project
- Open project
- Save project
- Print output report
- Edit project
- Generate input design points
- Rescale an input
- Standardise design
- Copy input design to clipboard
- Convert input to integer
- Run the analysis
- Help
Sensitivity analysis output grid
- This will report the sensitivity results after the analysis is complete
- One line for each input parameter
- One line for each pair of inputs, if joint effects are selected
Log window output
- Tells us
- Which training data are being loaded/saved
- Transformations applied to the data
- Fitted Gaussian process parameters
- Summary of the uncertainty analysis
Creating a GEM project
- To build the emulator we first need 3 files
- Data file of code inputs
- Data file of code outputs
- GEM-SA project file
Restrictions on input/output data
- Single output
- Multiple outputs must be treated individually
- Max 30 input parameters
- Max 400 training points
- The data files are plain text files
- One line for each point
- Input file can be space or tab delimited
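Before loading a project it can save time to check the data files against these restrictions. The Perl sketch below is one way to do that; the sub names are hypothetical, and the checks simply mirror the limits listed above (one value per output line, at most 30 inputs, at most 400 points, matching point counts).

```perl
use strict;
use warnings;

# Limits quoted on the slide above.
my $MAX_INPUTS = 30;
my $MAX_POINTS = 400;

# Parse plain-text data: one line per point, values separated by spaces or tabs.
sub parse_table {
    my @rows;
    for my $line (@_) {
        next if $line =~ /^\s*$/;            # skip blank lines
        push @rows, [ split ' ', $line ];    # ' ' splits on any run of spaces/tabs
    }
    return @rows;
}

# Return a list of problems (empty if the training data look usable).
sub check_training {
    my ($inputs, $outputs) = @_;             # array refs of row array refs
    my @problems;
    my $n = scalar @$inputs;
    my $d = $n ? scalar @{ $inputs->[0] } : 0;
    push @problems, "no training points"                  if $n == 0;
    push @problems, "more than $MAX_POINTS training points" if $n > $MAX_POINTS;
    push @problems, "more than $MAX_INPUTS inputs"          if $d > $MAX_INPUTS;
    push @problems, "ragged input rows" if grep { @$_ != $d } @$inputs;
    push @problems, "outputs must have exactly one value per line"
        if grep { @$_ != 1 } @$outputs;
    push @problems, "input and output files have different numbers of points"
        if @$outputs != $n;
    return @problems;
}

# Example with in-memory data; in practice the lines would be read from the two files.
my @x = parse_table("0.1 0.5\n", "0.9\t0.2\n");
my @y = parse_table("3.2\n", "1.7\n");
my @problems = check_training(\@x, \@y);
print @problems ? join("\n", @problems) . "\n" : "training data OK\n";
```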
Generating a new input design
- Designs can be generated using the Generate input design points toolbar icon or the menu Input → Generate
- The design dialog appears
Generating a new input design (continued)
- Click OK and fill in the required range for each input
- Click OK again
Editing input designs
- If you select a column, you can rescale the values of that input or round them to integers
- Designs can be loaded into or saved from this window using the Inputs menu
- Use the Copy input design to clipboard toolbar icon to copy the points to the clipboard for use in other programs
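Rescaling and rounding are simple column transformations. The Perl sketch below shows the arithmetic, assuming a linear map from an old range to a new one and round-half-away-from-zero; the sub names are illustrative and are not GEM-SA functions.

```perl
use strict;
use warnings;

# Linearly map each value from [$lo, $hi] to [$new_lo, $new_hi].
sub rescale_column {
    my ($values, $lo, $hi, $new_lo, $new_hi) = @_;
    return map { $new_lo + ($_ - $lo) * ($new_hi - $new_lo) / ($hi - $lo) } @$values;
}

# Round each value to the nearest integer (halves rounded away from zero).
sub round_column {
    return map { $_ >= 0 ? int($_ + 0.5) : -int(-$_ + 0.5) } @_;
}

my @column   = (0.0, 0.25, 0.5, 1.0);          # one input column of a [0, 1] design
my @scaled   = rescale_column(\@column, 0, 1, 1, 6);
my @integers = round_column(@scaled);
print "@scaled\n";     # 1 2.25 3.5 6
print "@integers\n";   # 1 2 4 6
```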
Types of design
- GEM-SA can generate 2 types of design
- LP-τ
- Maximin Latin Hypercube designs
- Both have good space-filling properties
- Ensure all regions of the input space are well
represented
LP-τ design
- Very quick to generate
- Deterministic set of uniform points
- Increasing the sample size just adds points to the smaller design
- Making it useful for sequential analysis
- Only have to generate the extra runs
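The nested property is easy to see in code. The Perl sketch below generates an LP-τ (Sobol') sequence restricted to two inputs, using the standard direction numbers for the first two Sobol' dimensions; it is an illustration under those assumptions, and GEM-SA's own generator (which handles up to 30 inputs) may order or skip points differently.

```perl
use strict;
use warnings;

my $BITS = 30;

# Direction numbers m_k, k = 1..$BITS.
# Input 1: m_k = 1 for all k (the base-2 van der Corput sequence).
# Input 2: primitive polynomial x + 1 with m_1 = 1, so m_k = 2*m_{k-1} XOR m_{k-1}.
my @m1 = (1) x $BITS;
my @m2 = (1);
push @m2, (2 * $m2[-1]) ^ $m2[-1] while @m2 < $BITS;

# Scale to integers V_k = m_k * 2^($BITS - k); array index 0 holds k = 1.
my @v1 = map { $m1[$_] << ($BITS - 1 - $_) } 0 .. $BITS - 1;
my @v2 = map { $m2[$_] << ($BITS - 1 - $_) } 0 .. $BITS - 1;

# Point n: XOR together the V_k for every bit that is set in n.
sub lptau_point {
    my ($n) = @_;
    my ($x1, $x2) = (0, 0);
    for my $i (0 .. $BITS - 1) {
        next unless ($n >> $i) & 1;
        $x1 ^= $v1[$i];
        $x2 ^= $v2[$i];
    }
    return ($x1 / 2**$BITS, $x2 / 2**$BITS);
}

# A design of size N is simply points 1..N of the sequence.
sub lptau_design {
    my ($size) = @_;
    return map { [ lptau_point($_) ] } 1 .. $size;
}

my @small = lptau_design(4);
my @large = lptau_design(8);   # the first 4 rows are exactly the rows of @small
printf "%.4f %.4f\n", @$_ for @large;
```

Because a larger design only appends points, moving from 4 to 8 runs means running the simulator for the 4 new rows only.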
Maximin Latin hypercube design
- Maximin Latin Hypercube designs
- Maximise the minimum distance amongst all pairs of points
- Can take a long time to generate
- Univariate projections are equally spaced
- Each input has all its range represented
- Good when only a few inputs are active
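A crude version of the maximin idea is to generate many random Latin hypercubes and keep the one whose smallest pairwise distance is largest. The Perl sketch below does exactly that; GEM-SA's own search method is not described here, and the sub names and number of tries are illustrative assumptions. Scoring each candidate compares every pair of points, so the cost grows with the square of the design size, one reason these designs can be slow to generate.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# One random Latin hypercube: n points in d dimensions on [0, 1]. Each column is a
# random permutation of the n strata, with each point at the centre of its stratum,
# so every univariate projection is equally spaced.
sub random_lhs {
    my ($n, $d) = @_;
    my @cols = map { [ map { ($_ + 0.5) / $n } shuffle(0 .. $n - 1) ] } 1 .. $d;
    return map { my $i = $_; [ map { $_->[$i] } @cols ] } 0 .. $n - 1;
}

# Smallest Euclidean distance between any pair of points (needs at least 2 points).
sub min_distance {
    my @pts = @_;
    my $best;                                  # smallest squared distance so far
    for my $i (0 .. $#pts - 1) {
        for my $j ($i + 1 .. $#pts) {
            my $s = 0;
            $s += ($pts[$i][$_] - $pts[$j][$_])**2 for 0 .. $#{ $pts[$i] };
            $best = $s if !defined $best || $s < $best;
        }
    }
    return sqrt($best);
}

# Keep the candidate with the largest minimum distance (the maximin criterion).
sub maximin_lhs {
    my ($n, $d, $tries) = @_;
    my ($best_design, $best_score);
    for (1 .. $tries) {
        my @design = random_lhs($n, $d);
        my $score  = min_distance(@design);
        ($best_design, $best_score) = (\@design, $score)
            if !defined $best_score || $score > $best_score;
    }
    return @$best_design;
}

srand(42);                                     # fixed seed so the example is repeatable
my @design = maximin_lhs(10, 3, 200);
printf "%.3f %.3f %.3f\n", @$_ for @design;
```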
Creating output points from these inputs
- This is the tricky part
- Each row of the input design must be used to generate a single output, e.g. using
- Spreadsheet
- Simple, but requires the output to have a known functional form
- Script
- Only needs the code's executable
- Loop through the inputs, modifying the code's input file each time
- Modify the code itself to loop through the points
- Messy, and needs the source code
Example using a spreadsheet
- Copy the input design to the clipboard using the Copy input design to clipboard toolbar icon
- Open Excel and paste the inputs
- Create the output formula in the final column
- Copy the formula for all rows of the design
- Copy, then Paste Special (values) into a new sheet
- Save as a text file
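When the output has a closed-form formula, the spreadsheet steps can equally be done in a few lines of code. In the Perl sketch below, `model` is a hypothetical formula standing in for the spreadsheet column, and the design is held in memory rather than read from a file.

```perl
use strict;
use warnings;

# Hypothetical model with a known functional form (stands in for the spreadsheet formula).
sub model {
    my ($x1, $x2, $x3) = @_;
    return $x1 + 2 * $x2 * $x3;
}

# One output per input row, in the same order, ready to be written one value per line.
sub outputs_for {
    my @out;
    for my $line (@_) {
        next if $line =~ /^\s*$/;              # skip blank lines
        push @out, model(split ' ', $line);    # space- or tab-delimited inputs
    }
    return @out;
}

# In practice the rows would come from the design file and the results would be
# printed to the output file.
my @design  = ("0.1 0.2 0.3\n", "0.5\t0.5\t0.5\n");
my @outputs = outputs_for(@design);
print "$_\n" for @outputs;
```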
Example using a script
- Read base input file
- Read training inputs file
- Loop through training file lines
- Replace target inputs using training line
- Write new base input file
- Run code
- Calculate single output and add to training output file
    use strict;
    use warnings;

    my $pftchangeline = 21;                # change line 21 within the input file for each run
                                           # (@lines is zero-based, so this is the file's 22nd line)
    my @pftchangecols = (11, 14, 23, 19);  # columns within $pftchangeline to modify
    my @pftinlh       = (0, 1, 2, 3);      # ordering of these parameters within the training inputs

    # get the initial (fixed) input file used by sdgvm and store the input lines in @lines
    open(my $basein, '<', 'input.dat') or die "Cannot read input.dat: $!";
    my @lines = <$basein>;
    close $basein;

    open(my $lhfile, '<', 'training_inputs.txt') or die "Cannot read training_inputs.txt: $!";
    my @newpftpoints = split(' ', $lines[$pftchangeline]);

    while (<$lhfile>) {                    # assigns each line in turn to $_
        chomp;
        my @lhpoints = split;              # split $_ on whitespace
        # copy this training point into the target columns and rebuild the line
        @newpftpoints[@pftchangecols] = @lhpoints[@pftinlh];
        $lines[$pftchangeline] = join(' ', @newpftpoints) . "\n";

        open(my $infile, '>', 'inputfile.dat') or die "Cannot write inputfile.dat: $!";
        print $infile @lines;
        close $infile;

        # run sdgvm0 with the modified input file
        system('sdgvm0', 'inputfile.dat') == 0 or die "sdgvm0 failed: $?";
        # now do something with the output files ...
    }
    close $lhfile;
The project window
- Appears whenever you
- Load a project
- Edit a project
- Create new project
- This window has 3 tabs
- Options
- Files
- Simulations
[Screenshot of the project window, annotated: What are the input names? How many inputs?]
[Screenshot of the project window, annotated: What should be calculated, and how? Which joint effects should be calculated?]
[Screenshot of the project window, annotated: What prior mean for the output? Are the inputs uncertain?]
[Screenshot of the project window, annotated: What kind of prediction? What kind of cross validation?]
[Screenshot of the project window, annotated: names for the input files; names for the output files]
[Screenshot of the project window, annotated: MCMC control parameters; how many realisations of predictions, main effects and joint effects to generate; how many points are used to calculate main effects and joint effects]
Input parameter names
- This window appears if you press the Names button
- Giving names is optional, but useful later when looking at GEM-SA output
- Ordering can be changed using the arrows
Selecting joint effects
- If you select Calculate joint effects, individual items in the joint effects window can be highlighted for inclusion in joint effect calculations
- You need to unselect the default (all inputs) first
- Unless you want to consider all pairs
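With d inputs there are d(d-1)/2 possible pairs, so selecting all of them grows the output grid quickly. The Perl sketch below enumerates the pairs and counts grid lines, assuming one line per input plus one per selected pair, as described for the sensitivity analysis output grid; the sub name is illustrative.

```perl
use strict;
use warnings;

# All unordered pairs (i, j), i before j, of the given input names.
sub all_pairs {
    my @names = @_;
    my @pairs;
    for my $i (0 .. $#names - 1) {
        push @pairs, [ $names[$i], $names[$_] ] for $i + 1 .. $#names;
    }
    return @pairs;
}

my @inputs = qw(x1 x2 x3 x4);
my @pairs  = all_pairs(@inputs);
printf "%d inputs, %d pairs, %d grid lines\n",
    scalar @inputs, scalar @pairs, @inputs + @pairs;   # 4 inputs, 6 pairs, 10 grid lines
```

At the 30-input limit, all pairs would mean 435 joint effects, which is why highlighting only the pairs of interest is usually worthwhile.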
Other checkboxes
- Sum effects
- Use this if you want the main effects of the 2 inputs to be included in the realisations of the joint effect of a pair
- This does not change the sensitivity measure, which computes joint sensitivity indices separately from the component main effects
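In the standard variance-based notation (an assumption here: the slides do not define the effects formally), the main effect of input i, the pure joint (interaction) effect of a pair, and their sum are:

```latex
\begin{aligned}
z_i(x_i) &= \mathrm{E}[f(X) \mid x_i] - \mathrm{E}[f(X)] \\
z_{ij}(x_i, x_j) &= \mathrm{E}[f(X) \mid x_i, x_j] - z_i(x_i) - z_j(x_j) - \mathrm{E}[f(X)] \\
z_i(x_i) + z_j(x_j) + z_{ij}(x_i, x_j) &= \mathrm{E}[f(X) \mid x_i, x_j] - \mathrm{E}[f(X)]
\end{aligned}
```

Ticking Sum effects would then correspond to reporting realisations of the last line rather than of z_ij alone, while the joint sensitivity index, based on the variance of the interaction term, is unaffected.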
Other checkboxes
- Code has numerical error
- Use this if your code has numerical errors which you want to smooth out
- The variance of the error will be estimated as part of the fitting process
- Can make the fitting process quite unstable, so avoid if possible!
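One common way to express this option, assuming the Gaussian correlation function often used for such emulators (GEM-SA's exact parameterisation may differ), is to add a "nugget" term to the covariance of the emulated output:

```latex
\operatorname{Cov}\big(f(x), f(x')\big)
  = \sigma^2 \Big[ \exp\Big\{ -\sum_{k=1}^{d} b_k \,(x_k - x'_k)^2 \Big\}
  + \nu \,\mathbb{1}[x = x'] \Big]
```

Here the b_k are roughness parameters and ν is the relative error variance, estimated alongside them. With ν > 0 the emulator no longer has to pass exactly through every training output, which is what smooths out the numerical noise.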
Other checkboxes
- Use MCMC for emulator parameters
- For serious Bayesians only!
- Takes into account uncertainty in the fitting of the emulator
- Slows down the computation substantially, usually with minimal effect on the results
- Auto-tune Metropolis algorithm
- Use only with MCMC
Input uncertainty options
- All unknown, product normal
- Inputs are independent, normally distributed
- All unknown, uniform
- Inputs are independent, distributed uniformly between the min and max values of the training data
- All known
- No uncertainty analysis required
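To make the two "all unknown" options concrete, the Perl sketch below draws one random input vector under each assumption. It only illustrates what the distributions mean, not how GEM-SA uses them internally; the sub names are hypothetical, and the normal draws use the Box-Muller transform.

```perl
use strict;
use warnings;
use List::Util qw(min max);
use constant PI => 4 * atan2(1, 1);

# One standard normal draw via the Box-Muller transform.
sub std_normal {
    my $u1 = 1 - rand();                 # in (0, 1], avoids log(0)
    my $u2 = rand();
    return sqrt(-2 * log($u1)) * cos(2 * PI * $u2);
}

# "All unknown, product normal": independent N(mean_k, sd_k^2) for each input.
sub sample_product_normal {
    my ($means, $sds) = @_;
    my @draw;
    push @draw, $means->[$_] + $sds->[$_] * std_normal() for 0 .. $#$means;
    return \@draw;
}

# "All unknown, uniform": independent U(min_k, max_k), with the limits taken
# from the training inputs.
sub sample_uniform_from_training {
    my @train = @_;                      # rows of training inputs
    my @draw;
    for my $k (0 .. $#{ $train[0] }) {
        my @col = map { $_->[$k] } @train;
        my ($lo, $hi) = (min(@col), max(@col));
        push @draw, $lo + ($hi - $lo) * rand();
    }
    return \@draw;
}

srand(1);
my @train = ([0.1, 5], [0.7, 2], [0.4, 9]);
my $xn = sample_product_normal([0.5, 5], [0.1, 2]);
my $xu = sample_uniform_from_training(@train);   # x1 in [0.1, 0.7], x2 in [2, 9]
print "@$xn\n@$xu\n";
```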
Input uncertainty options (continued)
- Some known, rest product normal
- Some input values will be fixed (in the dialog window or in a prediction file)
- Others will be given normal input parameters
Prior mean options
- If you believe the output is roughly a linear function of its inputs, select a linear term for each input
- Otherwise a single value will be used to represent the prior overall level of the output
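In symbols, for d inputs the two choices of prior mean m(x) are shown below. The notation is ours rather than GEM-SA's, and the β coefficients are assumed to be fitted along with the other emulator parameters.

```latex
\begin{aligned}
\text{single value:} \quad & m(x) = \beta_0 \\
\text{linear term for each input:} \quad & m(x) = \beta_0 + \sum_{k=1}^{d} \beta_k x_k
\end{aligned}
```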
Input normal parameters
- Window appears if you click OK having selected normal inputs
Input fixed and normal parameters
- Window appears if you click OK having selected some fixed inputs, rest normal
- For fixed inputs, tick the box and enter the fixed value in the first text box
Selecting prediction type
- Predictions can be
- Correlated realisations of outputs at the prediction inputs
- Similar to main effect outputs
- Marginal means and variances of outputs at the prediction inputs
- Faster to compute, especially with many prediction points
- Easy to interpret
Selecting cross validation type
- Choice of none, leave-one-out or leave final 20 out
- Leave-one-out
- Hyper-parameters are estimated from all the data and then held fixed while each omitted point is predicted
- Leave final 20 out
- Hyper-parameters are estimated using the reduced data subset
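The two schemes split the training points differently. The Perl sketch below builds the index sets (zero-based point numbers) for each; it is only an illustration of the splits, with hypothetical sub names, and does not fit an emulator.

```perl
use strict;
use warnings;

# Leave-one-out: each training point is held out once. Per the slide, the
# hyper-parameters come from a single fit to all n points and are then held fixed.
sub leave_one_out_splits {
    my ($n) = @_;
    my @splits;
    for my $k (0 .. $n - 1) {
        push @splits, { train => [ grep { $_ != $k } 0 .. $n - 1 ], test => [$k] };
    }
    return @splits;
}

# Leave final m out (m = 20 by default): fit on the first n - m points, with
# hyper-parameters re-estimated from this subset, and predict the last m.
sub leave_final_out_split {
    my ($n, $m) = @_;
    $m //= 20;
    die "need more than $m training points\n" unless $n > $m;
    return +{ train => [ 0 .. $n - $m - 1 ], test => [ $n - $m .. $n - 1 ] };
}

my @loo  = leave_one_out_splits(5);
my $last = leave_final_out_split(100);
printf "LOO: %d held-out points, %d training points each; final 20: train %d, test %d\n",
    scalar @loo, scalar @{ $loo[0]{train} },
    scalar @{ $last->{train} }, scalar @{ $last->{test} };
```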