Title: Mark Hamner
1 Predicting Real-Time Percent Enrollment Increase
Mark Hamner, Texas Woman's University, Department of Mathematics and Computer Science
Preet Ahluwalia, Credit Risk Analyst, AmeriCredit
2 Texas Woman's University: Denton, Dallas, Houston
Year 2005 Facts
- Total Enrollment: 11,344
  - Undergrad: 6,266
  - Graduate (Masters): 4,369
  - Doctoral: 709
- Campus Enrollment
  - Denton: 9,157
  - Dallas: 921
  - Houston: 1,266
- 59 academic programs (19 doctoral)
3 Outline
Problem Definition: Predicting Student Enrollment at Time t Using Historical Data
- Enrollment Process for Newly Enrolled Students
- The Predictive Problem
- Logistic Prediction Model
  a. Data Issues and Programming Solutions
- Quadratic Prediction Model
  a. Exploratory Analysis to Identify Patterns
- Combine for Overall Prediction Results
4 Enrollment
- Enrollment predictions can be broken into two fundamental pieces:
  - Newly Enrolled Students
  - Re-Enrolling/Continuing Students
- The focus of this paper is the prediction of Newly Enrolled students.
5 New Students Enrollment Process
6 Idea Behind Enrollment Prediction at Time t
7 Enrollment Prediction at Time t
- Let time t denote the prediction date.
- For applicants before t, we will have data.
- For applicants after time t (denoted by t'), we will not have data.
- Total Enrollment = Enroll_t + Enroll_t'
8 Weekly Partition of Prediction Interval
- The prediction interval will be broken up into weekly intervals.
- The diagram below illustrates prediction at Week 5.
- At Week 5 we have 35 more days of applicant data than at Week 0.
[Diagram: Total Enroll = Enroll_t + Enroll_t', where Enroll_t' is enrollment from applicants after week t]
9 Enroll_t
- $P_t = \{1, 2, \ldots, N_t\}$: finite set of applicants at week t, with $k \in P_t$
- Enrollment is a dichotomous response variable $y_k$: $y_k = 1$ (student enrolled), $y_k = 0$ (student did not enroll)
- Enrollment of all applicants at week t: $\text{Enroll}_t = \sum_{k \in P_t} y_k$
10 Model Dichotomous Variable
- For each $y_k$, $k \in P_t$, let $\pi_k$ represent the probability that $y_k = 1$.
- There exists applicant information for each individual: $x_k = (x_{1k}, x_{2k}, \ldots, x_{pk})$ = (Distance_k, SAT_k, ..., Major_Ratio_k)
- Use logistic regression to model $\pi_k$.
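11 Logistic Regression Model
- The probability of student k enrolling is
  $\pi_k = \frac{e^{L_k}}{1 + e^{L_k}}$
- where $L_k = \beta_0 + \beta_1 \text{Distance}_k + \beta_2 \text{SAT}_k + \cdots + \beta_p \text{Major\_Ratio}_k$
- Distance_k, SAT_k, ..., Major_Ratio_k are the predictor variables.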
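As a rough illustration of the logistic model on this slide, the sketch below computes one applicant's enrollment probability from the linear predictor L_k. The function name, the coefficient values, and the applicant record are hypothetical placeholders chosen for the example; they are not the fitted values from the study.

```python
import math

def enrollment_probability(intercept, coefficients, predictors):
    """Return exp(L_k) / (1 + exp(L_k)) for one applicant."""
    # L_k = beta_0 + sum over predictors of beta_j * x_jk
    linear_predictor = intercept + sum(
        coefficients[name] * predictors[name] for name in coefficients
    )
    # 1 / (1 + exp(-L_k)) is algebraically the same as exp(L_k) / (1 + exp(L_k))
    return 1.0 / (1.0 + math.exp(-linear_predictor))

# Hypothetical coefficients and applicant data (assumptions, not study results)
coefficients = {"distance": -0.004, "sat": 0.001, "major_ratio": 1.5}
applicant = {"distance": 50.0, "sat": 1000.0, "major_ratio": 0.4}

pi_k = enrollment_probability(-1.0, coefficients, applicant)
print(f"Estimated probability of enrolling: {pi_k:.3f}")
```

With these made-up numbers the linear predictor is 0.4, so the probability lands a little above one half; a linear predictor of 0 gives exactly 0.5.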
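12 Predict Enroll_t
- Let $Y = (y_1, \ldots, y_{N_t})^{\top}$ be the random vector of responses, with $E(Y) = \pi = (\pi_1, \ldots, \pi_{N_t})^{\top}$, where $\pi_k$ is the probability that applicant k enrolls.
- Thus, $\text{Enroll}_t = \mathbf{1}^{\top} Y$ and $E(\text{Enroll}_t) = \mathbf{1}^{\top} \pi$.
- Note: $\mathbf{1}$ is an $N_t \times 1$ vector of ones.
- Estimated Enroll_t is $\widehat{\text{Enroll}}_t = \mathbf{1}^{\top} \hat{\pi} = \sum_{k \in P_t} \hat{\pi}_k$.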
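To make the estimate concrete: multiplying a vector of ones by the fitted probabilities simply adds them up. The sketch below assumes a short list of fitted probabilities; the values and the function name are hypothetical and only illustrate the arithmetic.

```python
def estimated_enrollment(fitted_probabilities):
    """Estimated Enroll_t: the sum of fitted enrollment probabilities."""
    return sum(fitted_probabilities)

# Hypothetical fitted probabilities for five applicants on file at week t
fitted = [0.9, 0.25, 0.6, 0.1, 0.75]

estimate = estimated_enrollment(fitted)
print(f"Estimated Enroll_t: {estimate:.2f} of {len(fitted)} applicants")
```

The estimate need not be a whole number, since it is an expected count rather than a tally of individual yes/no predictions.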
13 Logistic Model
- Predictor variables: Distance, DOB, Major_Ratio, SAT_M, SAT_V, Gender, Personal, etc.
- What variables will get picked for model building?
14 Programming and Variable Selection
- Use SAS to create possibly significant variables and dummy code categorical variables
  - Examples: Major_Ratio, Ethnic, etc.
- Backward Selection
  - Slightly different variables are selected for FTIC, Transfer, and Graduate.
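The study does this step in SAS; the Python sketch below only illustrates the idea of dummy coding a categorical variable into 0/1 indicator columns. The function name, the category labels, and the choice to drop one reference level are assumptions made for the example, not details taken from the study.

```python
def dummy_code(values, reference=None):
    """Dummy code a categorical variable into 0/1 indicator columns.

    One level (the reference) gets no column of its own, so it is
    represented by all indicators being 0. Dropping a reference level
    is an assumption of this sketch.
    """
    levels = sorted(set(values))
    if reference is None:
        reference = levels[0]
    kept = [level for level in levels if level != reference]
    return [{f"is_{level}": int(value == level) for level in kept} for value in values]

# Hypothetical category labels for four applicants
ethnic = ["A", "B", "C", "A"]

rows = dummy_code(ethnic)
for label, row in zip(ethnic, rows):
    print(label, row)
```

Each applicant ends up with one indicator per non-reference level, which is the form a regression procedure needs before any variable selection can run.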
15 FTIC Variable Selection
Variable Name     | Variable Type | Variable Description
------------------|---------------|---------------------------------------------------------------------------
Twelve            | Response      | 1 if enrolled, 0 otherwise
Distance?         | Explanatory   | Continuous variable
SAT_M, SAT_V, ACT | Explanatory   | Continuous variables: SAT Math score, SAT Verbal score, ACT score
Give ACT?         | Explanatory   | 1 if score provided, 0 otherwise
Program Ratio?    | Explanatory   | Continuous variable
Major Ratio?      | Explanatory   | Continuous variable
Date of Birth     | Explanatory   | Continuous variable
Gender?           | Explanatory   | 1 if female, 0 if male
Apply Early?      | Explanatory   | 1 if applied before January 1, 0 otherwise
E1 through E7     | Explanatory   | Dummy variables for ethnicity
Personal?         | Explanatory   | Discrete variable: number of key information items available for a student
16 Case Study: Logistic Model Prediction
- Applicant data for 2003 is used to predict 2004 FTIC enrollment by weekly time intervals.
- The logistic model does not predict enrollment from applicants who apply after week t.
17 Enrollment after Week t
- Total Enrollment = Enroll_t + Enroll_t', where Enroll_t' is enrollment from applicants after week t
- At any week t, we need to predict Enroll_t'.
- Identify historical relationships that may be helpful.
18 Applicant Versus Enrolled by Year
Both applications and enrollment have been increasing.
Notice that enrollment yield is decreasing.
Is the increase in enrollment matching the increase in applications?
19 Applicant Yield By Strata
- Enrollment yield from applicant data is decreasing for each stratum.
- How does this affect the yearly increase in enrollment?
20 Percent Increase: Applicant vs. Enrolled
- Applicant increase is not a viable indicator of
enrollment increase
- What patterns are reliable to model?
21 Cumulative FTIC Enrollment by Week
Notice the parallel lines, which imply equal slopes!
At any week t, we can relate Enroll_t to Total Enrollment (Week 17).
Thus, (Total Enroll - Enroll_t) should be very similar from year to year.
22 Relationship Between Enrollment and Total Enrollment
By definition, (Total Enroll - Enroll_t) = Enroll_t', the enrollment from applicants after week t.
Model Enroll_t' and smooth out the consistent patterns by week.
23 Enroll_t' Model
Use the 2003 model of Enroll_t' (enrollment from applicants after week t) to predict Enroll_t' for 2004.
This gives the estimate of Enroll_t' (R² = 0.9857).
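The outline calls this second model quadratic, but the fitted coefficients are not listed here. As a hedged sketch of the general technique, the code below fits a quadratic in the week number by ordinary least squares and reports R². The weekly values are synthetic numbers generated from a made-up curve purely for illustration, and all function names are hypothetical.

```python
def fit_quadratic(weeks, values):
    """Least-squares fit of values ~ b0 + b1*week + b2*week**2."""
    # Normal equations (X'X) b = X'y for the design columns [1, w, w^2]
    power_sums = [sum(w ** p for w in weeks) for p in range(5)]
    rhs = [sum(v * w ** p for w, v in zip(weeks, values)) for p in range(3)]
    a = [[power_sums[i + j] for j in range(3)] + [rhs[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting on the 3x4 augmented matrix
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(3):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [a[i][3] / a[i][i] for i in range(3)]

def predict_remaining(coeffs, week):
    """Evaluate the fitted quadratic at a given week."""
    return coeffs[0] + coeffs[1] * week + coeffs[2] * week * week

def r_squared(weeks, values, coeffs):
    """Share of variance in values explained by the fitted quadratic."""
    fitted = [predict_remaining(coeffs, w) for w in weeks]
    mean = sum(values) / len(values)
    ss_res = sum((v - f) ** 2 for v, f in zip(values, fitted))
    ss_tot = sum((v - mean) ** 2 for v in values)
    return 1.0 - ss_res / ss_tot

# Synthetic "enrollment still to come" by week (NOT real data):
# it shrinks to 0 at week 17, the week the slides treat as the final total.
weeks = list(range(18))
remaining = [2 * (17 - w) ** 2 for w in weeks]

coeffs = fit_quadratic(weeks, remaining)
print("coefficients:", [round(c, 4) for c in coeffs])
print("R^2:", round(r_squared(weeks, remaining, coeffs), 4))
print("predicted remaining at week 5:", round(predict_remaining(coeffs, 5), 2))
```

In the approach described on the slides, a curve fitted on one year's weekly values would then be evaluated at week t of the following year to estimate the enrollment still to come.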
24 Predict 2004 Enroll_t'
25 Predict 2004 FTIC Total Enroll
- Total Enrollment = Enroll_t + Enroll_t'
Note: 2004 FTIC actual total is 687.
26 Predict 2005 FTIC Total Enroll
- Total Enrollment = Enroll_t + Enroll_t'
Note: 2005 FTIC actual total is 765.
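The final step adds the two pieces, enrollment estimated from applicants on file at week t and enrollment expected from applicants after week t, and can then be expressed as a percent increase. The sketch below uses the two actual FTIC totals quoted on these slides (687 and 765); the week-t component estimates are hypothetical, and measuring the increase against the prior year's actual total is an assumption about how the percent increase is defined.

```python
def total_enrollment(enroll_to_date, enroll_remaining):
    """Total = estimate from applicants on file at week t + estimate for later applicants."""
    return enroll_to_date + enroll_remaining

def percent_increase(previous_total, new_total):
    """Percent change from the previous year's total to the new total."""
    return 100.0 * (new_total - previous_total) / previous_total

# Actual FTIC totals reported on the slides
actual_2004 = 687
actual_2005 = 765
actual_increase = percent_increase(actual_2004, actual_2005)
print(f"Actual 2004 to 2005 increase: {actual_increase:.1f}%")  # about 11.4%

# Hypothetical week-t component estimates (assumptions, not study results)
predicted_2005 = total_enrollment(enroll_to_date=450.0, enroll_remaining=300.0)
predicted_increase = percent_increase(actual_2004, predicted_2005)
print(f"Predicted total: {predicted_2005:.0f}, predicted increase: {predicted_increase:.1f}%")
```

Rerunning the same calculation each week with updated component estimates would give a running, real-time view of the expected percent increase.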
27 - END -
Thank you! Any Questions?