Title: A Predictive Model of Inquiry to Enrollment
1A Predictive Model of Inquiry to Enrollment
- Cullen F. Goenner, PhD
- Department of Economics
- University of North Dakota
- cullen.goenner_at_und.nodak.edu
- www.business.und.edu/goenner
- Kenton Pauls
- Director of Enrollment Services
- University of North Dakota
- kenton.pauls_at_mail.und.nodak.edu
2Issues Facing Enrollment Managers
- Finding new markets
- Increasing Tuition
- Declining population (ND)
- Increasing competition
- Need to attract a particular type of student
- Diversity/Quality
- Data driven analysis
- Accountability
3Questions we will answer today
- What is predictive modeling?
- How does one build a predictive model?
- How can predictive modeling be used by
institutions of higher education to improve
enrollment?
4What is Predictive Modeling?
- Predictive modeling uses statistical/econometric
methods to quantitatively predict the future
behavior of individuals.
- Steps include
- Data collection on the subject of interest
- Build the model based on data analysis
- Predictions made out of sample
- Model validation/testing
5College Choice
- 3-stage process (Hossler and Gallagher, 1987)
- Predisposition/aspiration for higher education
- Encouragement, coursework, and interest.
- Search of potential schools
- Counselors, campus contacts, program
availability
- Selection
- SES, Ability, Fit, Geography
6Factors Influencing Choice
- Economic perspective
- Education an investment in human capital
- Cost vs Benefit calculus
- Psychological perspective
- Need of self to find sense of belonging and
fulfillment of needs.
- Sociological perspective
- Social interaction dictated by societal/family
norms.
7Existing Empirical Work
- Search Choice
- Applications
- DesJardins, Dundar, and Hendel (1999)
- Weiler (1994)
- Interest: SAT scores sent
- Toutkoushian (2001)
8Existing Models of Enrollment Choice
- Model a student's binary choice to enroll at a
particular college while controlling for the
student's characteristics.
- Logistic models used
- Conditional on students having
- Applied
- Bruggink and Gambhir (1996)
- Thomas, Dawes, and Reznik (2001)
- Admitted
- DesJardins (2002)
- Leppel (1993)
9Our Predictive Model
- Builds on the models of DesJardins (2002) and
Thomas, Dawes, and Reznik (2001)
- Focus here is on predicting enrollment of
students who inquired of our institution.
- Inquiry model is relevant because
- It is the time of information exchange and opinion formation
- It allows for early intervention in a student's
decision-making process (target marketing)
10Inquiry Model Challenges
- Data collection
- Data are already collected on those who apply or
are admitted, but typically not on inquiries.
- Quality of data
- Applicants provide detailed data describing
themselves (demographic data, test scores, HSGPA,
etc.), which are not available for most student
inquiries.
11Types of Inquiries We Recorded
- Return of information card
- Attendance of college fair
- Campus visit
- Contact via e-mail
- Contact via phone
- Referral from faculty, coach, or alumni
- ACT automatically submitted
12How these data were captured
- Enrollment Services Prospective Student Network
relational database (ESPSN)
- Customized system
- SQL 2000/Visual Basic
13Information Collected From Information Request Card
- Name
- High School attended
- Interested Major (if any)
- Address
- Lacks the demographic data typical of
application records and used in most predictive
models.
14Geodemography
- Process of attaching demographic characteristics
to geographic characteristics.
- Notion is that "birds of a feather flock
together," i.e., individuals living in the same
neighborhood will tend to have similar behavior
patterns.
- Ex.: Neighborhoods are homogeneous in terms of
household income, occupations, family size, and
purchases.
15Implementation
- US Census data aggregated to zip code level
- Geodemographic variables considered for our
model specification
- College age demographic
- Population
- Average Income
- White demographic
- Median age
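A minimal sketch of the geodemographic join, assuming each inquiry record carries a zip code and that zip-level census aggregates are available as a lookup table. All field names and values below are hypothetical; they are not taken from ESPSN or from Census files.

```python
# Hypothetical zip-level census aggregates (illustrative values only).
ZIP_DEMOGRAPHICS = {
    "58201": {"population": 35000, "avg_income": 41000.0, "median_age": 28.5},
    "55401": {"population": 9000, "avg_income": 62000.0, "median_age": 33.1},
}


def attach_geodemographics(inquiry, zip_table):
    """Return a copy of the inquiry with zip-level variables appended.

    Inquiries whose zip code is missing from the table get None for each
    geodemographic field, so they can be handled explicitly later.
    """
    fields = ("population", "avg_income", "median_age")
    demo = zip_table.get(inquiry.get("zip"), {})
    enriched = dict(inquiry)
    for field in fields:
        enriched[field] = demo.get(field)
    return enriched


inquiry = {"name": "A. Student", "zip": "58201", "major": "Economics"}
enriched = attach_geodemographics(inquiry, ZIP_DEMOGRAPHICS)
```

The original inquiry record is left unchanged, so the same card data can be re-joined if the census table is updated.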
16Building the model
- Binary choice model: model whether students who
inquire of UND enroll or do not enroll.
- 15,827 students made inquiries for Fall 2003
enrollment. Of these students, 2,067 actually
enrolled.
- Logistic regression model used.
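As a rough illustration of how a logistic model turns an inquiry's characteristics into an enrollment probability, the sketch below evaluates the logistic function on a linear index. The variable names and coefficient values are hypothetical and are not the estimates from this study.

```python
import math


def enrollment_probability(features, coefficients, intercept):
    """Logistic model: P(enroll) = 1 / (1 + exp(-(intercept + x . b)))."""
    index = intercept + sum(coefficients[name] * value
                            for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-index))


# Hypothetical coefficients, for illustration only (not the estimated model).
coefficients = {"campus_visit": 1.2, "num_contacts": 0.4, "distance_100mi": -0.3}
student = {"campus_visit": 1, "num_contacts": 3, "distance_100mi": 2.5}
p = enrollment_probability(student, coefficients, intercept=-2.5)
```

In practice the coefficients would be estimated by maximum likelihood from the inquiry data rather than set by hand.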
17Candidate Control Variables
- Type and Frequency of Contact
- Geographic
- Academic
- Geodemographic
- Interaction Effects
18Contact Variables
19Geographic Variables
20Academic/Geodemographic
21Interaction Terms
22Model Specification
- Researchers typically assume their model
specification is the true model that generates
the data.
- It is difficult to justify a priori the choice of
variables to include in the model, given that each
is by design theoretically relevant.
- With $k$ candidate variables there are $2^k$ different
linear models one could consider.
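To make the size of the model space concrete, the sketch below enumerates every subset of a small set of candidate variables and counts them. The variable names are hypothetical; the count for 30 candidates is plain arithmetic.

```python
from itertools import combinations


def all_specifications(candidates):
    """Yield every subset of the candidate variables (including the empty one)."""
    for size in range(len(candidates) + 1):
        for subset in combinations(candidates, size):
            yield subset


# Hypothetical variable names, for illustration.
candidates = ["distance", "campus_visit", "avg_income"]
specs = list(all_specifications(candidates))

print(len(specs))   # 2**3 = 8 specifications
print(2 ** 30)      # 1073741824 specifications with 30 candidates
```

Exhaustive enumeration is only feasible for a handful of variables, which is why the model space has to be pruned in practice.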
23- Consider the case in which several models
$M_1, \dots, M_K$ are theoretically possible.
- Basing inference on the results of a single model
is risky.
- Bayesian model averaging (BMA) allows us to
account for this type of uncertainty.
24BMA
- The posterior distribution of the parameters
given the data in the presence of model uncertainty is
a weighted average of the posterior distributions under each of the K
models, with weights equal to the posterior model
probabilities $P(M_k \mid D)$.
- $P(\beta \mid D) = \sum_{k=1}^{K} P(\beta \mid D, M_k)\, P(M_k \mid D)$ (1)
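25- Posterior model probability is
- $P(M_k \mid D) = \dfrac{P(D \mid M_k)\, P(M_k)}{\sum_{l=1}^{K} P(D \mid M_l)\, P(M_l)}$ (2)
- where $P(D \mid M_k)$ is the likelihood and $P(M_k)$ is the
prior probability that model $M_k$ is the true
model, given that one of the K models is the true
model.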
26Posterior Model Probability
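- Assuming a non-informative prior, $P(M_1) = \dots = P(M_K) = 1/K$,
the prior terms cancel:
- $P(M_k \mid D) = \dfrac{P(D \mid M_k)}{\sum_{l=1}^{K} P(D \mid M_l)}$ (3)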
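One common way to compute posterior model probabilities in practice is to approximate each model's likelihood of the data by exp(-BIC/2). The sketch below assumes that approximation, which is not stated on the slide, and uses hypothetical BIC values; with no priors supplied, every model gets prior weight 1/K.

```python
import math


def posterior_model_probabilities(bics, priors=None):
    """Approximate P(M_k | D) from BIC values.

    Assumes P(D | M_k) is approximated by exp(-BIC_k / 2). With no priors
    given, every model gets the same prior weight 1/K.
    """
    k = len(bics)
    if priors is None:
        priors = [1.0 / k] * k
    best = min(bics)  # subtract the minimum BIC for numerical stability
    weights = [math.exp(-(b - best) / 2.0) * p for b, p in zip(bics, priors)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical BIC values for three candidate models.
pmp = posterior_model_probabilities([1002.3, 1000.0, 1010.8])
```

The model with the lowest BIC receives the largest posterior probability, and the probabilities sum to one by construction.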
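27- The posterior mean and variance summarize the
effects of the parameters on the dependent
variable. Raftery (1995) reports
- $E[\beta \mid D] \approx \sum_k \hat\beta(k)\, P(M_k \mid D)$
- $\mathrm{Var}[\beta \mid D] \approx \sum_k \left[\mathrm{Var}(k) + \hat\beta(k)^2\right] P(M_k \mid D) - E[\beta \mid D]^2$ (9)
- where $\hat\beta(k)$ and $\mathrm{Var}(k)$ are the MLE and its variance under model k,
and the summation is over models that include $\beta$.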
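A minimal sketch of the model-averaged mean and variance of a single coefficient, assuming the usual BMA form: the mean is the probability-weighted sum of the per-model estimates, and the variance is the weighted sum of (variance + estimate squared) minus the squared mean. The numbers in the example are hypothetical.

```python
def bma_mean_and_variance(estimates, variances, pmps):
    """Model-averaged mean and variance of one coefficient.

    estimates[k], variances[k]: estimate and its variance under model k.
    pmps[k]: posterior model probability of model k.
    Only models that include the coefficient should be passed in.
    """
    mean = sum(b * p for b, p in zip(estimates, pmps))
    second_moment = sum((v + b * b) * p
                        for b, v, p in zip(estimates, variances, pmps))
    return mean, second_moment - mean * mean


# Hypothetical estimates from two models with equal posterior probability.
mean, variance = bma_mean_and_variance([1.0, 2.0], [0.04, 0.09], [0.5, 0.5])
```

The averaged variance exceeds either model's own variance here because it also reflects disagreement between the two models' estimates.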
28BMA Implementation
- The S-Plus function bic.logit performs BMA on
logistic regression models.
- 30 regressors imply a summation in equation (1)
over more than 1 billion models ($2^{30}$).
- To manage the summation we use Occam's window.
29Occam's Window
- Exclude models that predict the data
sufficiently less well than the best
model. Predictions are based on the PMP of each model.
Models in the set A are included.
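The pruning rule can be sketched as follows, assuming the symmetric form of Occam's window: keep a model only if the best model's posterior probability is at most some multiple of its own. The cutoff of 20 and the probabilities below are assumptions for illustration; the slide does not state the value used.

```python
def occams_window(pmps, cutoff=20.0):
    """Return indices of models kept by a symmetric Occam's window.

    A model is kept when the best model's posterior probability is at most
    `cutoff` times its own. The default of 20 is an assumed value.
    """
    best = max(pmps)
    return [k for k, p in enumerate(pmps) if p > 0 and best / p <= cutoff]


def renormalize(pmps, kept):
    """Rescale the kept models' probabilities so they sum to one."""
    total = sum(pmps[k] for k in kept)
    return {k: pmps[k] / total for k in kept}


# Hypothetical posterior model probabilities.
pmps = [0.50, 0.30, 0.15, 0.04, 0.01]
kept = occams_window(pmps)   # 0.50 / 0.01 = 50 > 20, so the last model is dropped
weights = renormalize(pmps, kept)
```

Averaging then proceeds over the retained models only, using the renormalized weights.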
30Results
- 26 models supported by the data
- Model with highest PMP receives 21% of the total.
- Variables that receive strong support for
inclusion include
- Geographic: Distance, HY State, HY School,
Competitor distance
- Geodemographic: College Age, Average Income
- Contacts: Number, Campus visit, Referral
31(No Transcript)
32Out of Sample Predictive Performance
- Split the data into two equal parts
- First part used to build/estimate the model
- Second part used to test the model's predictions.
- Outcome (enrollment) is binary, while our model
generates a probability estimate.
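A minimal sketch of the two-part split, assuming the halves are formed by random assignment (the slide does not say how the split was made). The record IDs are stand-ins; only the count of 15,827 inquiries comes from the presentation.

```python
import random


def split_in_half(records, seed=0):
    """Randomly split records into an estimation half and a holdout half."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    midpoint = len(shuffled) // 2
    return shuffled[:midpoint], shuffled[midpoint:]


# Stand-in record IDs; 15,827 is the number of Fall 2003 inquiries.
estimation, holdout = split_in_half(range(15827))
```

Fixing the seed keeps the split reproducible, so the same holdout set is used each time the model is re-evaluated.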
33What is a successful prediction?
- Greene (2001): there is no correct choice for the
probability cutoff. A typical value is 0.5.
- Tradeoff in cutoff choice
- A lower cutoff increases the share of inquiries
who actually enroll that are predicted to enroll
(sensitivity), at the expense of more inquiries
predicted to enroll who do not enroll (false
positive rate).
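The cutoff tradeoff can be seen on a small example. The sketch below classifies hypothetical holdout scores at two cutoffs and computes sensitivity and specificity for each; the outcomes and scores are made up for illustration.

```python
def classify(probabilities, cutoff=0.5):
    """Predict enrollment (1) when the estimated probability meets the cutoff."""
    return [1 if p >= cutoff else 0 for p in probabilities]


def sensitivity_specificity(actual, predicted):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical holdout outcomes (1 = enrolled) and model scores.
actual = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.70, 0.35, 0.15, 0.40, 0.25, 0.10, 0.05, 0.02]

at_half = sensitivity_specificity(actual, classify(scores, 0.5))
at_fifth = sensitivity_specificity(actual, classify(scores, 0.2))
```

On this toy data, lowering the cutoff from 0.5 to 0.2 raises sensitivity from one third to two thirds while specificity falls from 1.0 to 0.6.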
34Predictive Performance Classification
35Predictive performance
- 89% of observations correctly classified
- Specificity: 97%
- Sensitivity: 36%
- ROC curve describes the relation between sensitivity
and 1 - specificity (false positive rate)
- Area under ROC curve: 0.87
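Two small illustrations follow. The first computes the area under the ROC curve by pairwise comparison on hypothetical scores. The second checks that the reported figures are mutually consistent, assuming the holdout half has the same enrollment rate as the full sample (2,067 of 15,827); that assumption is not stated in the presentation.

```python
def roc_auc(actual, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen enrollee has a higher
    score than a randomly chosen non-enrollee (ties count one half).
    """
    positives = [s for a, s in zip(actual, scores) if a == 1]
    negatives = [s for a, s in zip(actual, scores) if a == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical holdout outcomes (1 = enrolled) and model scores.
actual = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.70, 0.35, 0.15, 0.40, 0.25, 0.10, 0.05, 0.02]
auc = roc_auc(actual, scores)

# Consistency check on the reported figures under the stated assumption:
# accuracy = rate * sensitivity + (1 - rate) * specificity.
base_rate = 2067 / 15827
accuracy = base_rate * 0.36 + (1 - base_rate) * 0.97
```

Under that assumption the implied overall accuracy rounds to 0.89, in line with the share of observations reported as correctly classified.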
36Another Predictive Performance Method
37- 79% of enrolled students found within 22% of the entire
population (scores > 0.2)
- Focused efforts without compromising enrollment
numbers
- Efficiency implications
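The capture figures on this slide are of the following kind: what share of all inquiries score above a threshold, and what share of eventual enrollees fall in that group. The sketch below computes both on hypothetical data; the function name and the data are illustrative only.

```python
def capture_summary(actual, scores, threshold=0.2):
    """Share of the population above the threshold, and share of enrollees in it."""
    selected = [a for a, s in zip(actual, scores) if s > threshold]
    share_of_population = len(selected) / len(actual)
    share_of_enrolled = sum(selected) / sum(actual)
    return share_of_population, share_of_enrolled


# Hypothetical outcomes (1 = enrolled) and model scores.
actual = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.70, 0.35, 0.15, 0.40, 0.25, 0.10, 0.05, 0.02]
population_share, enrolled_share = capture_summary(actual, scores)
```

A large gap between the two shares is what makes targeting efficient: most enrollees are reached while contacting only a fraction of the inquiry pool.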
38Practical Applications
- Effective regional market segmentation
- Targeted tele-counseling efforts
- Special projects
39Regional Market Segmenting
- Target Marketing and Segmentation
- Prospect names purchased based on zip code.
- Establish a predictive score for all zip codes
in US based on census-level data
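A minimal sketch of zip-level scoring, assuming a logistic score built only from census-level variables and a simple top-n selection of zip codes. The coefficients, variable names, and zip codes are placeholders, not the values used for the actual name purchases.

```python
import math


def zip_score(census_row, coefficients, intercept):
    """Predicted enrollment propensity for a zip code from census-level data."""
    index = intercept + sum(coefficients[k] * census_row[k] for k in coefficients)
    return 1.0 / (1.0 + math.exp(-index))


def top_zips(census_by_zip, coefficients, intercept, n):
    """Return the n zip codes with the highest scores, best first."""
    scored = {z: zip_score(row, coefficients, intercept)
              for z, row in census_by_zip.items()}
    return sorted(scored, key=scored.get, reverse=True)[:n]


# Hypothetical coefficients and placeholder zip codes/values.
coefficients = {"college_age_share": 4.0, "avg_income_10k": 0.2}
census_by_zip = {
    "00001": {"college_age_share": 0.10, "avg_income_10k": 6.0},
    "00002": {"college_age_share": 0.06, "avg_income_10k": 4.0},
    "00003": {"college_age_share": 0.12, "avg_income_10k": 5.0},
}
key_zips = top_zips(census_by_zip, coefficients, intercept=-3.0, n=2)
```

Because the score depends only on census data, it can be computed for zip codes from which no inquiry has yet been received.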
40What the data indicated (WA)
41Where enrolled students came from (WA)
42- 83% of enrolled WA students fell within
top-scoring zips over three years
- Direct mail name purchases
- Prior years: very open search criteria
- MN, CO, SD, MT
- This year: much more restrictive, to get deeper
into broader markets
- Only key zips
- CO, WA, OR, AZ, IL, MN, etc.
43WA Search Names - 2003
44WA Search Names - 2004
45Targeted Tele-Counseling Efforts
- Student calling program
- Top 20% of all model scores identified
- Fluid number excluding applicants
- Prompt student to take action
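One way to read the calling-list rule is sketched below: rank all scored prospects, take the top share, then drop anyone who has already applied, so the list shrinks as applications come in. This ordering of steps, the 20% share as a parameter, and the record fields are assumptions for illustration.

```python
def calling_list(prospects, top_share=0.20):
    """Return the non-applicants among the top-scoring share of prospects.

    The cut is taken over all scored prospects first, then applicants are
    removed, so the list shrinks as prospects apply.
    """
    ranked = sorted(prospects, key=lambda p: p["score"], reverse=True)
    n_top = max(1, int(len(ranked) * top_share))
    return [p for p in ranked[:n_top] if not p["applied"]]


# Hypothetical prospects: ten scored inquiries, one of whom has applied.
prospects = [{"id": i, "score": (10 - i) / 10, "applied": i == 0}
             for i in range(10)]
to_call = calling_list(prospects)
```

Re-running the function after each application update keeps callers focused on high-scoring students who have not yet acted.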
46Special Projects
- Limited funds but targeted initiatives
- Focus on as many of the top-scoring students as possible
- Postcards, brochures, etc.
47Possible Future Research
- Cluster analysis for better market segmentation
- Study of marginal effects
48Thank You!