Title: Maximum Entropy
 1Maximum Entropy
- RESM 575 
 - Spring 2009 
 - Lecture 13
 
  2Maximum entropy
(Phillips et al. 2008)
- History 
 - E. T. Jaynes 1957 
 - Thermodynamics 
 - Inference and information theory 
 
  3The Maximum Entropy Method
- Origins: Jaynes 1957, statistical mechanics
- Recent use: machine learning, e.g. automatic language translation
- To estimate an unknown distribution:
  - Determine what you know (constraints)
  - Among distributions satisfying constraints
  - Output the one with maximum entropy
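The three steps above can be sketched on a toy problem that is not from the lecture: a six-sided die whose only known property is its mean. The function name `maxent_die` and the bisection search are assumptions made for illustration; the sketch relies on the standard result that the maximum entropy distribution under a mean constraint has exponential form.

```python
import math

def maxent_die(target_mean, faces=(1, 2, 3, 4, 5, 6)):
    """Maximum entropy distribution over `faces` with a prescribed mean.

    The solution has the form p(x) proportional to exp(lam * x); we search
    for the lam whose mean matches the constraint.
    """
    def dist(lam):
        weights = [math.exp(lam * x) for x in faces]
        z = sum(weights)
        return [w / z for w in weights]

    def mean(lam):
        return sum(p * x for p, x in zip(dist(lam), faces))

    lo, hi = -50.0, 50.0  # mean(lam) increases monotonically with lam
    for _ in range(200):  # plain bisection on lam
        mid = (lo + hi) / 2
        if mean(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    return dist((lo + hi) / 2)

uniform = maxent_die(3.5)  # constraint carries no information: stays uniform
skewed = maxent_die(4.5)   # constraint pulls probability toward high faces
```

With a target mean of 3.5 the constraint adds nothing beyond what a fair die already satisfies, so the result is uniform; a target of 4.5 shifts probability toward the higher faces while staying as spread out as the constraint allows.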
 
 5What is it?
- Maxent is a general-purpose method for making predictions or inferences from incomplete information.
- Its origins lie in statistical mechanics (Jaynes, 1957), and it remains an active area of research with an annual conference, Maximum Entropy and Bayesian Methods, that explores applications in diverse areas such as astronomy, portfolio optimization, image reconstruction, statistical physics and signal processing.
  6Like other Bayesian models
- Uses prior information 
 - Maxent is an alternative to methods of inference 
of classical statistics 
  7Maximum Entropy Principle
 The fact that a certain probability distribution maximizes entropy subject to certain constraints representing our incomplete information, is the fundamental property which justifies the use of that distribution for inference; it agrees with everything that is known but carefully avoids assuming anything that is not known (Jaynes, 1990).
 8Why?
- Maxent was introduced as a general approach for presence-only modeling of species distributions, suitable for all existing applications involving presence-only datasets.
  9Modeling species distributions
[Figure: Yellow-throated Vireo occurrence points and environmental variables]
 10Estimating a probability distribution
- Given:
  - Map divided into cells
  - Environmental variables, with values in each cell
  - Occurrence points: samples from an unknown distribution
- Our task is to estimate the unknown probability distribution
- Note:
  - The distribution sums to 1 over the whole map
  - Most probability values will be very small
  - Different from estimating probability of presence
 
  11Entropy
- More entropy = more spread out, closer to uniform distribution
- 2nd law of thermodynamics:
  - Without external influences, a system moves to increase entropy
- Maximum entropy method:
  - Apply constraints to remove external influences
  - Species spreads out to fill areas with suitable conditions
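As a small numeric illustration of "more entropy = more spread out", the sketch below computes Shannon entropy for two distributions over a hypothetical four-cell map. The cell values are made up for the example.

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

n_cells = 4
uniform = [1 / n_cells] * n_cells        # fully spread out
concentrated = [0.85, 0.05, 0.05, 0.05]  # mass piled into one cell

h_uniform = entropy(uniform)             # equals ln(4), the maximum for 4 cells
h_concentrated = entropy(concentrated)   # strictly smaller
```

The uniform distribution attains the largest entropy possible for a given number of cells; any concentration of probability lowers it.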
 
  12Using Maxent for Species Distributions
- Features 
 - Constraints 
 - Regularization
 
  13Features impose constraints
Feature = environmental variable, or function thereof.
Find the distribution p of maximum entropy such that, for all features f, mean(f) = sample average of f.
 14Features
- Environmental variables or functions thereof.
- Maxent has these classes of features (others are possible):
  - Linear: variable itself
  - Quadratic: square of variable
  - Product: product of two variables
  - Binary (indicator): membership in a category
  - Threshold
  - Hinge
 
[Two plots, each with feature value from 0 to 1 on the vertical axis and the environmental variable on the horizontal axis]
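The feature classes listed above can be written as plain functions of the environmental variable. This is a minimal sketch: the slide does not define the threshold and hinge features precisely, so the forms used here (a step at a knot, and a ramp from 0 at the knot to 1 at the variable's maximum) and all function and parameter names are assumptions.

```python
def linear(v):
    """Linear feature: the variable itself."""
    return v

def quadratic(v):
    """Quadratic feature: the square of the variable."""
    return v * v

def product(v1, v2):
    """Product feature: the product of two variables."""
    return v1 * v2

def indicator(category, target):
    """Binary feature: 1 if the cell belongs to the target category."""
    return 1.0 if category == target else 0.0

def threshold(v, knot):
    """Threshold feature (assumed form): 1 above the knot, 0 otherwise."""
    return 1.0 if v > knot else 0.0

def hinge(v, knot, v_max):
    """Hinge feature (assumed form): 0 up to the knot, then linear up to 1 at v_max."""
    if v <= knot:
        return 0.0
    return (v - knot) / (v_max - knot)
```

For example, `hinge(5, knot=3, v_max=7)` evaluates to 0.5 and `threshold(5, knot=3)` to 1.0.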
 15Constraints
Each feature type imposes constraints on the output distribution:
- Linear features: mean
- Quadratic features: variance
- Product features: covariance
- Threshold features: proportion above threshold
- Hinge features: mean above threshold
- Binary features (categorical): proportion in each category
 16Regularization
[Figure: sample average and true mean, plotted against temperature and precipitation]
Find the distribution p of maximum entropy such that mean(f) lies in a confidence region of the sample average of f.
 17The Maxent distribution
 is always a Gibbs distribution:
$q_\lambda(x) = \exp\left(\sum_j \lambda_j f_j(x)\right) / Z$
- $Z$ is a scaling factor so the distribution sums to 1
- $f_j$ is the jth feature
- $\lambda_j$ is a coefficient, calculated by the program
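A Gibbs distribution of this form is straightforward to evaluate once the coefficients are known. The sketch below assumes a hypothetical three-cell map with two features per cell and made-up coefficients; it is not how the Maxent program computes its output, only an illustration of the formula.

```python
import math

def gibbs_distribution(features, lambdas):
    """Return q(x) = exp(sum_j lambda_j * f_j(x)) / Z for every cell x.

    features[i][j] is the value of feature j in cell i.
    """
    scores = [sum(l * f for l, f in zip(lambdas, row)) for row in features]
    top = max(scores)  # subtracting a constant only rescales Z; avoids overflow
    weights = [math.exp(s - top) for s in scores]
    z = sum(weights)   # scaling factor so the distribution sums to 1
    return [w / z for w in weights]

# Hypothetical feature values and coefficients, for illustration only.
cells = [[0.1, 0.5], [0.4, 0.2], [0.9, 0.7]]
q = gibbs_distribution(cells, lambdas=[2.0, -1.0])
flat = gibbs_distribution(cells, lambdas=[0.0, 0.0])  # all-zero coefficients
```

With all coefficients at zero every cell gets the same weight, so the result is the uniform distribution.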
 18Maxent is penalized maximum likelihood
Log likelihood: $\mathrm{LogLikelihood}(q_\lambda) = \frac{1}{m} \sum_{i=1}^{m} \ln q_\lambda(x_i)$, where $x_1, \ldots, x_m$ are the occurrence points.
Maxent maximizes the regularized likelihood: $\mathrm{LogLikelihood}(q_\lambda) - \sum_j \beta_j |\lambda_j|$, where $\beta_j$ is the width of the confidence interval for $f_j$.
Similar to the Akaike Information Criterion (AIC).
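The penalized objective can be evaluated directly for a candidate set of coefficients. This sketch assumes the penalty takes the absolute value of each coefficient (an L1 penalty); the cell values, presence indices, and function names are hypothetical.

```python
import math

def gibbs(features, lambdas):
    """Gibbs distribution over cells; features[i][j] is feature j in cell i."""
    scores = [sum(l * f for l, f in zip(lambdas, row)) for row in features]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    z = sum(weights)
    return [w / z for w in weights]

def regularized_log_likelihood(features, presence_idx, lambdas, betas):
    """Average log likelihood of the occurrence cells minus the penalty."""
    q = gibbs(features, lambdas)
    m = len(presence_idx)
    log_lik = sum(math.log(q[i]) for i in presence_idx) / m
    penalty = sum(b * abs(l) for b, l in zip(betas, lambdas))  # assumed L1 form
    return log_lik - penalty

# Hypothetical map: one feature, three cells, one occurrence in the last cell.
cells = [[0.0], [1.0], [2.0]]
baseline = regularized_log_likelihood(cells, [2], [0.0], [0.1])
fitted = regularized_log_likelihood(cells, [2], [1.0], [0.1])
over_penalized = regularized_log_likelihood(cells, [2], [1.0], [5.0])
```

A small penalty width lets a non-zero coefficient improve the objective, while a large one makes the uniform distribution (all coefficients zero) preferable; this is the trade-off the regularization controls.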
 19Output
- When Maxent is applied to presence-only species distribution modeling, the pixels of the study area make up the space on which the Maxent probability distribution is defined.
- Pixels with known species occurrence records constitute the sample points, and the features are
  - climatic variables,
  - elevation,
  - soil category,
  - vegetation type or other environmental variables, and functions thereof.
  20To note
- Sometimes both presence and absence occurrence data are available for the development of models, in which case general-purpose statistical methods can be used
  - (for an overview of the variety of techniques currently in use, see Corsi et al., 2000; Elith, 2002; Guisan and Zimmermann, 2000; Scott et al., 2002).
  21Opportunity
- However, while vast stores of presence-only data exist (records etc.), absence data are rarely available
  - Poorly sampled areas, remote, difficult
- Absence data may be of questionable value in many situations
 23Background
- 16 modeling methods 
 - 226 well-surveyed species in 6 regions of the world
  24The authors used three statistics, the area under 
the Receiver Operating Characteristic curve 
(AUC), correlation (COR) and Kappa, to assess the 
agreement between the presence-absence records 
and the predictions. 
 27Maximum Entropy
- Only useful when applied to testable information (information for which one can determine whether a given distribution is consistent with it).
- Given testable information, the maximum entropy procedure consists of seeking the probability distribution which maximizes information entropy, subject to the constraints of the information.
- This constrained optimization problem is typically solved using the method of Lagrange multipliers.
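The Lagrange multiplier step can be sketched as follows, assuming equality constraints on the feature means (the unregularized case) and writing $\bar f_j$ for the sample average of feature $f_j$ and $\mu$ for the multiplier on the normalization constraint; both symbols are introduced here for illustration.

```latex
\begin{aligned}
\mathcal{L}(p, \lambda, \mu)
  &= -\sum_x p(x) \ln p(x)
     + \sum_j \lambda_j \Big( \sum_x p(x) f_j(x) - \bar f_j \Big)
     + \mu \Big( \sum_x p(x) - 1 \Big) \\
\frac{\partial \mathcal{L}}{\partial p(x)}
  &= -\ln p(x) - 1 + \sum_j \lambda_j f_j(x) + \mu = 0 \\
p(x)
  &= e^{\mu - 1} \exp\Big( \sum_j \lambda_j f_j(x) \Big)
   = \frac{\exp\big( \sum_j \lambda_j f_j(x) \big)}{Z},
  \qquad Z = \sum_x \exp\Big( \sum_j \lambda_j f_j(x) \Big)
\end{aligned}
```

Setting the derivative to zero forces the exponential form, and the normalization constraint fixes $e^{\mu - 1} = 1/Z$, which is why the solution is a Gibbs distribution.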
 29Output format
[Figure panels: raw output; cumulative output]
 30Cumulative output format
- Gives estimate of omission rate
- A pixel p has cumulative value c:
  - Total probability of pixels with lower probability than p is c
- Set a threshold of c:
  - Binary model with presence if cumulative value > c
  - Omission rate is c if test data drawn from Maxent distribution
  - Predict omission rate of c for real test data
- Example thresholds:
  - 5 (light red)
  - 20 (dark red)
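The cumulative value described above can be computed from raw probabilities as in this sketch. It follows the slide's wording (total probability of pixels with lower probability) and expresses the result on a 0 to 100 scale to match the example thresholds; the scale, the strict comparison in `binary_map`, and the function names are assumptions.

```python
def cumulative_output(raw):
    """Cumulative value per pixel, as a percentage of total probability.

    Quadratic-time for clarity; a sort would be used on real grids.
    """
    total = sum(raw)
    return [100.0 * sum(r for r in raw if r < p) / total for p in raw]

def binary_map(cumulative, threshold):
    """Presence wherever the cumulative value exceeds the threshold."""
    return [c > threshold for c in cumulative]

# Hypothetical raw output for a four-pixel map.
raw = [0.1, 0.2, 0.3, 0.4]
cumulative = cumulative_output(raw)          # roughly [0, 10, 30, 60]
presence = binary_map(cumulative, threshold=20)
```

On a map this coarse the omission rate only roughly tracks the threshold; the correspondence tightens as the number of pixels grows and each pixel carries a small share of the probability.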
 
  31Logistic output format
- Estimates probability of presence
  - Between 0 and 1
  - Scaled so that a typical presence has value 0.5
- Defined as
  - $c\, q_\lambda(x) / (1 + c\, q_\lambda(x))$
  - where $c = \exp(H(q_\lambda))$ and $H$ is the entropy of $q_\lambda$
- Probability of presence depends on sampling details
  - Site size
  - Observation time
- These details should correspond to collection effort for occurrence points
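The logistic transform can be sketched as below, taking the constant c to be the exponential of the entropy of the raw distribution. The input values are hypothetical, and the function name is an assumption.

```python
import math

def logistic_output(raw):
    """Map raw Maxent probabilities (summing to 1) to values in (0, 1)."""
    h = -sum(q * math.log(q) for q in raw if q > 0)  # entropy of the raw output
    c = math.exp(h)
    return [c * q / (1 + c * q) for q in raw]

flat = logistic_output([0.25, 0.25, 0.25, 0.25])
peaked = logistic_output([0.7, 0.1, 0.1, 0.1])
```

For a uniform raw distribution over N pixels, c equals N and every pixel maps to 0.5, which illustrates the scaling in which a typical presence has value 0.5.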
  32Response curves
- Show how predicted probability of presence depends on each variable
- Simple features → simpler model
  - Easier interpretation
- Complex features → complex model
  - Better fit to data
- Linear + quadratic (top)
- Threshold features (middle)
- All feature types (bottom)
 
 33Effect of regularization multiplier = 0.2
- Smaller confidence intervals
- Lower entropy
- Less spread-out
 34Effect of regularization multiplier = 5
- Larger confidence intervals
- Higher entropy
- More spread-out
 35Effect of regularization: over-fitting
- Regularization multiplier = 1.0 (not over-fit)
- Regularization multiplier = 0.2 (clearly over-fit)
 36- Sage grouse distribution model 
 - MAXENT software package 
 - Consistently superior to alternative methods 
 - Robust to collinearity between explanatory variables
 - Accepts continuous and categorical variables
 - Stable distribution with limited training data 
 - Evaluates relative variable importance
 
  37West Virginia Conservation Prioritization using 
Species Distribution Modeling
- Michael Dougherty 
 - West Virginia Division of Natural Resources
 
The Conservation Fund 
 38- Project Goals
  - Develop statewide conservation prioritization map based on the distribution of
    - Species of Greatest Conservation Need (SGCN)
    - Habitats of Concern
    - Existing public land
- The Challenge
  - Develop distribution models for 500 state-tracked species
  - Species include plants, herps, birds, bats, mammals, aquatics
  - Modeling process must be defensible, transparent, and repeatable
 39- Occurrence data
  - 1. State Natural Heritage Program Biotics database
    - Biologists collect Source Features
    - Source Features are grouped into Element Occurrences (EOs)
    - EOs represent known populations
    - Species identification is accurate and spatial accuracy documented
    - Use of EOs seems to greatly reduce spatial autocorrelation
  - 2. Community Ecologists Vegetation Plots Database
  40- Predictor Variables 
 - Developed a broad range of predictor variables 
 -  Climate 
 -  Landcover 
 -  Terrain 
 -  Ecoregions 
 -  Geology 
 -  Soils 
 -  Disturbances 
 
 41- Workflow Overview
  - Build an array of workstations to run models
  - Develop R scripts to automate running the maxent models by iterating through all 500 species
  - Develop web-based map viewer to assist biologists in reviewing maxent model results
  - Perform patch and connectivity analysis using FunConn
  - (TBD) Assign weights to patches and connectors
 
 42- Scripting Steps
  - Developed R script to perform variable pre-selection using boosted regression trees to reduce the number of variables to an appropriate number (30)
  - Developed R script to produce the maxent batch files and perform file management
  - Developed R script to harvest maxent results, a Python script to store grids in an ArcSDE database, and publish results to a website
  - (TBD) Develop R scripts to perform functional connectivity analysis
  - (TBD) Perform layer weighting to produce conservation prioritization index
 48Occurrence localities
- CSV file format. Each line has:
  - Species name
  - X coordinate
  - Y coordinate
- Multiple species can be in 1 file.

Example:
species,longitude,latitude
bradypus_variegatus,-65.4,-10.3833
bradypus_variegatus,-65.3833,-10.3833
bradypus_variegatus,-65.1333,-16.8
bradypus_variegatus,-63.6667,-17.45
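A file in this layout can be read with Python's standard `csv` module. The sketch below groups points by species, since one file may hold several; the function name `read_occurrences` is an assumption, and the sample text is the example from the slide.

```python
import csv
import io
from collections import defaultdict

SAMPLE = """species,longitude,latitude
bradypus_variegatus,-65.4,-10.3833
bradypus_variegatus,-65.3833,-10.3833
bradypus_variegatus,-65.1333,-16.8
bradypus_variegatus,-63.6667,-17.45
"""

def read_occurrences(handle):
    """Return {species: [(x, y), ...]} from an occurrence CSV with a header row."""
    points = defaultdict(list)
    for row in csv.DictReader(handle):
        xy = (float(row["longitude"]), float(row["latitude"]))
        points[row["species"]].append(xy)
    return dict(points)

occurrences = read_occurrences(io.StringIO(SAMPLE))
```

To read from disk instead, pass an open file handle, for example `open(path, newline="")`, in place of the `io.StringIO` wrapper.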
 49Environmental variables
- ESRI ASCII raster grid file format.
  - One file per environmental variable
  - All files must have exactly the same bounds, cell size
  - Coordinate system must be same as for occurrence localities
- Alternative: Diva .grd format.
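The requirement that all grids share the same bounds and cell size can be checked before a run by comparing file headers. This sketch assumes the common six-line ESRI ASCII header (`ncols`, `nrows`, `xllcorner`, `yllcorner`, `cellsize`, `NODATA_value`); the two tiny grids and the function names are hypothetical.

```python
import io

GEOMETRY_KEYS = ("ncols", "nrows", "xllcorner", "yllcorner", "cellsize")

def read_header(handle, n_lines=6):
    """Read the key/value header lines of an ESRI ASCII grid into a dict."""
    header = {}
    for _ in range(n_lines):
        key, value = handle.readline().split()
        header[key.lower()] = float(value)
    return header

def same_geometry(a, b):
    """True if two headers agree on grid size, origin, and cell size."""
    return all(a[k] == b[k] for k in GEOMETRY_KEYS)

# Hypothetical 2x2 grids, for illustration only.
GRID_A = """ncols 2
nrows 2
xllcorner -70.0
yllcorner -20.0
cellsize 0.5
NODATA_value -9999
1 2
3 4
"""
GRID_B = GRID_A.replace("cellsize 0.5", "cellsize 0.25")

header_a = read_header(io.StringIO(GRID_A))
header_b = read_header(io.StringIO(GRID_B))
```

Here `same_geometry(header_a, header_b)` is false because the cell sizes differ, so the two layers could not be used together as they stand.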
 
  50Samples with data (SWD) format
- Environmental data given with samples in a .csv format file
- Example:
species,longitude,latitude,cld,dtr,ecoreg,frs,h_dem,pre,pre_l10,pre_l1,pre_l4,pre_l7,tmn,tmp,tmx,vap
bradypus_variegatus,-65.4,-10.3833,76.0,104.0,10.0,2.0,121.0,46.0,41.0,84.0,54.0,3.0,192.0,266.0,337.0,279.0
bradypus_variegatus,-65.3833,-10.3833,76.0,104.0,10.0,2.0,121.0,46.0,40.0,84.0,54.0,3.0,192.0,266.0,337.0,279.0
bradypus_variegatus,-65.1333,-16.8,57.0,114.0,10.0,1.0,211.0,65.0,56.0,129.0,58.0,34.0,140.0,244.0,321.0,221.0
bradypus_variegatus,-63.6667,-17.45,57.0,112.0,10.0,3.0,363.0,36.0,33.0,71.0,27.0,13.0,135.0,229.0,307.0,202.0
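An SWD file carries the environmental values alongside each sample, so it can be read into per-sample records without touching any grids. This is a minimal sketch using two of the example rows; the function name `read_swd` and the record layout are assumptions, and every variable is read as a number even though some, such as `ecoreg`, may be categorical.

```python
import csv
import io

SWD = """species,longitude,latitude,cld,dtr,ecoreg,frs,h_dem,pre,pre_l10,pre_l1,pre_l4,pre_l7,tmn,tmp,tmx,vap
bradypus_variegatus,-65.4,-10.3833,76.0,104.0,10.0,2.0,121.0,46.0,41.0,84.0,54.0,3.0,192.0,266.0,337.0,279.0
bradypus_variegatus,-65.1333,-16.8,57.0,114.0,10.0,1.0,211.0,65.0,56.0,129.0,58.0,34.0,140.0,244.0,321.0,221.0
"""

def read_swd(handle):
    """Return (variable_names, records) from a samples-with-data CSV."""
    reader = csv.reader(handle)
    header = next(reader)
    names = header[3:]  # columns after species, x, y are environmental variables
    records = []
    for row in reader:
        records.append({
            "species": row[0],
            "xy": (float(row[1]), float(row[2])),
            "values": dict(zip(names, (float(v) for v in row[3:]))),
        })
    return names, records

names, records = read_swd(io.StringIO(SWD))
```

The same reader works for background data in SWD format, where the first column holds a label such as `background` instead of a species name.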
  51Background data in SWD format
- Environmental data at (typically) random points in study area
- Useful:
  - when environmental grids are huge
    - Maxent needs only a small random sample (10,000)
  - when doing non-uniform sampling
- Example:
species,longitude,latitude,cld,dtr,ecoreg,frs,h_dem,pre,pre_l10,pre_l1,pre_l4,pre_l7,tmn,tmp,tmx,vap
background,-61.775,6.175,60.0,100.0,10.0,0.0,747.0,55.0,24.0,57.0,45.0,81.0,182.0,239.0,300.0,232.0
background,-66.075,5.325,67.0,116.0,10.0,3.0,1038.0,75.0,16.0,68.0,64.0,145.0,181.0,246.0,331.0,234.0
background,-59.875,-26.325,47.0,129.0,9.0,1.0,73.0,31.0,43.0,32.0,43.0,10.0,97.0,218.0,339.0,189.0
background,-68.375,-15.375,58.0,112.0,10.0,44.0,2039.0,33.0,67.0,31.0,30.0,6.0,101.0,181.0,251.0,133.0
background,-68.525,4.775,72.0,95.0,10.0,0.0,65.0,72.0,16.0,65.0,69.0,133.0,218.0,271.0,346.0,289.0