Title: Craig S Wright,
A QUANTITATIVE TIME SERIES ANALYSIS OF MALWARE
AND VULNERABILITY TRENDS
- By
- Craig S Wright,
- DTh LLM (Cand.) MNSA MMIT CISA CISM CISSP ISSMP
ISSAP G7799 GCFA CCE MSDBA AFAIM MACS
- And a partridge in a pear tree
Who Am I
Craig S Wright, DTh LLM (Cand.) MNSA MMIT CISA
CISM CISSP ISSMP ISSAP G7799 GCFA CCE MSDBA
AFAIM MACS And a partridge in a pear tree
- Senior IS Audit Manager - BDO
- My Specialties
- ISMS, ISO/IEC 17799 Consulting and Audit/Review
- Digital Forensics
- Information Security Design and Review
- Threat/Risk Analysis and Review
- Information Risk Management (AS/NZS 4360)
- Data Mining
- Neural Networks
- Anomaly Detection Systems
- CAATs
- Technology-Related Business Continuity Planning
(BCP) and Disaster Recovery Planning (DRP)
- Cryptography
Today's Presentation
- To protect computer systems and network
architecture effectively against attack, we
need to understand the threats and be able to
create predictive models for them.
A Quantitative Time Series Analysis of Malware
and Vulnerability Trends
- Introduction and objectives
- The creation of quantitative risk models in
Information Systems security is a field in its
infancy.
- The prediction of threats is often said to be
too difficult due to a shortage of data and the
costs associated with collecting and analysing
data for a site.
Research Design / Methods / Data Collection
- It has been argued that three main problems
exist within the analytical process involved in
Information Systems security (Valentino, 2003):
- utilising all available information sources,
- verifying the validity of a suspected computer
system intrusion, and
- following a standard process.
Research Data Sources
- The Wildlist organisation
- Virus Bulletin
- Vendor Virus bulletins
- Vendor vulnerability announcements
- CERT
ARIMA techniques for time-series analysis
- Three sets of data have been collected for
analysis. These consist of:
- the reported monthly virus incidents (Virus.No),
- the numbers of infections/incidents associated
with the most prevalent malware in the month
(Top.Mth), and
- the Wildlist collated monthly data for malware
reported in the wild (Wild.Lst).
Initial observations
- Visual analysis alone is sufficient to see that
trends in malicious code incidents have increased
significantly over the last 3 years in a
non-linear manner.
Wildlist Trends
- It is clear that there is a trend and that the
variance increases with the mean.
A logarithmic transform was selected for the
three datasets
- There is a clear trend in all three sets of
data, with the number of malicious code incidents
increasing over time. The trends are all roughly
linear (particularly the Wildlist data), but it
is difficult to be sure in the presence of the
other features.
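The slides do not show how the log transform was chosen. A minimal sketch of one common check, assuming hypothetical synthetic counts rather than the Wildlist data: split the series into yearly blocks and compare each block's mean with its standard deviation. If the spread rises with the level in the raw data but stays roughly constant after taking logs, the transform has stabilised the variance. The helper names `block_stats` and `log_transform` are illustrative only.

```python
import math
import statistics

def block_stats(series, block=12):
    """Mean and sample standard deviation of consecutive, non-overlapping blocks."""
    stats = []
    for start in range(0, len(series) - block + 1, block):
        chunk = series[start:start + block]
        stats.append((statistics.mean(chunk), statistics.stdev(chunk)))
    return stats

def log_transform(series):
    """Natural log of a strictly positive series (monthly counts must be > 0)."""
    if any(x <= 0 for x in series):
        raise ValueError("log transform needs strictly positive values")
    return [math.log(x) for x in series]

# Hypothetical monthly counts: ~5% growth a month with +/-10% multiplicative noise.
counts = [100 * 1.05 ** t * (1 + 0.1 * (-1) ** t) for t in range(36)]

raw = block_stats(counts)
logged = block_stats(log_transform(counts))
for (m_raw, s_raw), (m_log, s_log) in zip(raw, logged):
    print(f"raw mean {m_raw:8.1f} sd {s_raw:7.1f} | log mean {m_log:.3f} sd {s_log:.4f}")
```

With this synthetic series the raw block standard deviation grows by a factor of 1.05^12 (about 1.8) from one year to the next, while the logged blocks all have the same spread.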
Analysis of Wildlist Data
- A time plot of the first difference (d1) of the
logged Wildlist data shows that the series is
stationary after taking one difference. There
appears to be no seasonality in this time series.
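Taking one difference of the logged series is what produces the d1 series referred to above. A minimal sketch, assuming a plain list of monthly log counts (the values below are hypothetical): for a series growing at a constant monthly rate, the differenced logs are flat at the log growth rate, which is the simplest case of a trend being removed by differencing. The hypothetical `difference` helper also accepts `lag=12`, the usual seasonal difference for monthly data, which the slide reports is not needed here.

```python
import math

def difference(series, lag=1):
    """Return x[t] - x[t - lag]; lag=1 gives the d1 series, lag=12 a seasonal difference."""
    if lag < 1 or lag >= len(series):
        raise ValueError("lag must be between 1 and len(series) - 1")
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

# Hypothetical log counts with a linear trend in the logs (constant 5% monthly growth).
log_counts = [math.log(100 * 1.05 ** t) for t in range(24)]
d1 = difference(log_counts)
print(len(d1), min(d1), max(d1))  # 23 values, all close to log(1.05)
```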
Wildlist ACF
Wildlist Partial ACF
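The ACF and PACF plots themselves are not reproduced here. A minimal sketch of the quantities such plots display, assuming the input is the differenced log series as a plain list: the sample ACF uses the divisor n, the PACF is obtained from the Durbin-Levinson recursion, and lags are flagged against the usual approximate 95% band of plus or minus 1.96/sqrt(n). The function names are illustrative, not those of any package used in the original analysis.

```python
import math

def acf(x, max_lag):
    """Sample autocorrelations r_1..r_max_lag (divisor n, as in most packages)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev) / n
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / n / c0
            for k in range(1, max_lag + 1)]

def pacf(x, max_lag):
    """Sample partial autocorrelations phi_kk from the Durbin-Levinson recursion."""
    r = acf(x, max_lag)
    phi = []  # coefficients of the order-(k-1) autoregression
    out = []
    for k in range(1, max_lag + 1):
        num = r[k - 1] - sum(phi[j] * r[k - 2 - j] for j in range(k - 1))
        den = 1 - sum(phi[j] * r[j] for j in range(k - 1))
        phi_kk = num / den
        phi = [phi[j] - phi_kk * phi[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        out.append(phi_kk)
    return out

def significant_lags(values, n):
    """1-based lags whose estimate lies outside the approximate 95% band +/-1.96/sqrt(n)."""
    bound = 1.96 / math.sqrt(n)
    return [k + 1 for k, v in enumerate(values) if abs(v) > bound]
```

Typical use would be `significant_lags(pacf(d1, 12), len(d1))`, where `d1` is the differenced log series.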
Inspection of the ACF/PACF Plots
- The ACF/PACF plots suggested that either an AR(1)
or MA(1) model for the differenced series
may be suitable.
- For the log-transformed, differenced values
(d1), the ACF decreases exponentially to
zero and the PACF is significant at lag 1.
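The reading of these plots rests on the textbook ACF/PACF signatures of the two candidate models. A short sketch of those theoretical patterns, assuming the MA(1) is written X_t = Z_t + theta * Z_{t-1}: an AR(1) has an ACF that decays geometrically and a PACF that cuts off after lag 1, while an MA(1) has an ACF that cuts off after lag 1. With short, noisy samples the two can be hard to tell apart, which is consistent with both models being carried forward. The parameter value 0.6 is hypothetical.

```python
def ar1_acf(phi, max_lag):
    """Theoretical ACF of a causal AR(1): rho_k = phi**k, decaying geometrically."""
    return [phi ** k for k in range(1, max_lag + 1)]

def ar1_pacf(phi, max_lag):
    """Theoretical PACF of an AR(1): phi at lag 1, zero beyond."""
    return [phi] + [0.0] * (max_lag - 1)

def ma1_acf(theta, max_lag):
    """Theoretical ACF of X_t = Z_t + theta * Z_{t-1}: nonzero only at lag 1."""
    return [theta / (1 + theta ** 2)] + [0.0] * (max_lag - 1)

# Hypothetical parameter value, for comparison only.
rows = zip(ar1_acf(0.6, 5), ar1_pacf(0.6, 5), ma1_acf(0.6, 5))
for k, (a, p, m) in enumerate(rows, start=1):
    print(f"lag {k}: AR(1) acf {a:.3f} pacf {p:.3f} | MA(1) acf {m:.3f}")
```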
Model Comparison
Model Selection
- Overfitting either model (adding a further term)
gave coefficients for the added terms that were not
significant at the 5% level (p < 0.05).
- The diagnostic plots for each model showed no
significant values in the residual plots, and
we could see no evidence of inadequacy in either
model.
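The model checks here rely on diagnostic plots. A numerical companion often reported alongside them is the Ljung-Box portmanteau test on the residual autocorrelations; the presentation does not say whether it was used, so the sketch below is an assumption. The critical value 16.92 is the 95% point of a chi-square distribution with 9 degrees of freedom, which suits 10 lags less one fitted coefficient (AR(1) or MA(1)); `looks_adequate` is a hypothetical helper name.

```python
def ljung_box(residuals, max_lag):
    """Ljung-Box statistic Q = n(n+2) * sum_{k=1}^{h} r_k^2 / (n - k)."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [e - mean for e in residuals]
    c0 = sum(d * d for d in dev) / n
    q = 0.0
    for k in range(1, max_lag + 1):
        r_k = sum(dev[t] * dev[t + k] for t in range(n - k)) / n / c0
        q += r_k * r_k / (n - k)
    return n * (n + 2) * q

# For an adequate ARMA(p, q) fit, Q is approximately chi-square with
# max_lag - p - q degrees of freedom: 10 lags and one coefficient gives 9 df.
CHI2_95_DF9 = 16.92

def looks_adequate(residuals, max_lag=10, critical=CHI2_95_DF9):
    """True when Q falls below the 5% critical value (no evidence of inadequacy)."""
    return ljung_box(residuals, max_lag) < critical
```

If a different number of lags or coefficients is used, the critical value must be taken from the chi-square distribution with the matching degrees of freedom.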
Comparison of forecasts
- To see whether there was any important difference
between the models in terms of the aim of the analysis
(forecasting), forecasts and forecast intervals
were computed for the last 5 months, up to
May 2006.
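A minimal sketch of how such forecasts and intervals can be produced for the AR(1)-on-d1 candidate, i.e. an ARI(1,1) model on the log series. It assumes conditional least-squares estimation and Gaussian innovations; the original analysis may have used a different estimator, and `fit_ar1` and `forecast_ari11` are hypothetical names. The interval width uses the psi weights of the integrated process, psi_j = 1 + phi + ... + phi^j, so the forecast error variance at horizon h is sigma^2 times the sum of psi_j^2 for j = 0..h-1.

```python
import math

def fit_ar1(w):
    """Conditional least-squares AR(1) fit to a differenced series w.
    Returns (mu, phi, sigma2)."""
    mu = sum(w) / len(w)
    d = [v - mu for v in w]
    num = sum(d[t] * d[t - 1] for t in range(1, len(d)))
    den = sum(d[t - 1] ** 2 for t in range(1, len(d)))
    phi = num / den
    resid = [d[t] - phi * d[t - 1] for t in range(1, len(d))]
    sigma2 = sum(e * e for e in resid) / len(resid)
    return mu, phi, sigma2

def forecast_ari11(log_levels, horizon, z=1.96):
    """Forecast an ARI(1,1) model fitted to log_levels.
    Returns (point, lower, upper) tuples on the original (exponentiated) scale."""
    w = [log_levels[t] - log_levels[t - 1] for t in range(1, len(log_levels))]
    mu, phi, sigma2 = fit_ar1(w)
    y, w_last = log_levels[-1], w[-1]
    out = []
    psi_sq_sum = 0.0
    for h in range(1, horizon + 1):
        w_hat = mu + phi ** h * (w_last - mu)    # h-step forecast of the difference
        y += w_hat                               # cumulate back to the log level
        psi = sum(phi ** i for i in range(h))    # psi_{h-1} of the integrated model
        psi_sq_sum += psi ** 2
        half = z * math.sqrt(sigma2 * psi_sq_sum)
        out.append((math.exp(y), math.exp(y - half), math.exp(y + half)))
    return out
```

Exponentiating the log-scale forecast gives a median rather than a mean forecast on the count scale, and the 1.96 multiplier assumes Gaussian innovations.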
Comparison of forecasts
- ARI models were tested.
- No significant differences were found between
the two models, and all observed values for the
forecast period fell within the predicted confidence intervals.
Analysis of Virus Incidents
- The analysis is focused on the overall pattern of
malware incidents reported monthly. A side
comparison of the number of incidents which are
attributable to the most prevalent malware
varietals has also been undertaken.
Analysis of Virus Incidents
- It is clear from the plot of the two variables
alone that the most prevalent malware varietals
follow a similar pattern to the total number of
incidents, and that the two series are becoming
more closely correlated over time.
- This would indicate that individual computer
viruses and worms are each having a greater impact.
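The claim that the two series are becoming more closely correlated is based on the plot. One hypothetical way to quantify it is a rolling correlation between Top.Mth and Virus.No over a trailing window; the 12-month window and the helper names are assumptions, not part of the original analysis.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def rolling_correlation(x, y, window=12):
    """Correlation over each trailing window of `window` months."""
    if len(x) != len(y):
        raise ValueError("series must be the same length")
    return [pearson(x[i - window:i], y[i - window:i])
            for i in range(window, len(x) + 1)]
```

Both series trend upwards, and correlations between trending series are inflated by the shared trend alone, so applying this to the differenced logs rather than the raw counts would give a fairer measure.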
Analysis of Virus Incidents
- The trend is thus that fewer malicious
code types are causing more damage.
- In the past, a large number of virus types were
generally active at any given time.
- The trend is towards greater effects from specific
malicious code samples.
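If fewer malicious code types are causing more of the damage, the share of monthly incidents attributable to the most prevalent malware (Top.Mth divided by Virus.No) should rise over time. A minimal sketch under that assumption, using an ordinary least-squares slope against the month index; a positive slope is consistent with the trend described, though it is not a formal test. The helper names are hypothetical.

```python
def top_share(top_mth, virus_no):
    """Monthly share of incidents attributed to the most prevalent malware."""
    return [t / v for t, v in zip(top_mth, virus_no)]

def ols_slope(y):
    """Least-squares slope of y against t = 0, 1, ..., n-1."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    num = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den
```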
ACF
PACF
Model Comparison
ARI (5, 1) Model
Model ARI (5, 1) Parameter Estimates
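The parameter estimate table is not reproduced here. An ARI(5, 1) model is an AR(5) fitted to the once-differenced log series, so a minimal sketch of the fit is: difference, then estimate the AR coefficients. The sketch assumes Yule-Walker estimation via the Durbin-Levinson recursion; the software used in the presentation may have used maximum likelihood instead, so its estimates could differ. `yule_walker` and `fit_ari` are hypothetical names.

```python
def yule_walker(x, p):
    """Yule-Walker AR(p) estimates via Durbin-Levinson.
    Returns ([phi_1, ..., phi_p], innovation variance)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    gamma = [sum(dev[t] * dev[t + k] for t in range(n - k)) / n for k in range(p + 1)]
    r = [g / gamma[0] for g in gamma[1:]]
    phi, v = [], gamma[0]
    for k in range(1, p + 1):
        num = r[k - 1] - sum(phi[j] * r[k - 2 - j] for j in range(k - 1))
        den = 1 - sum(phi[j] * r[j] for j in range(k - 1))
        phi_kk = num / den
        phi = [phi[j] - phi_kk * phi[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        v *= 1 - phi_kk ** 2  # innovation variance shrinks with each added lag
    return phi, v

def fit_ari(log_levels, p):
    """ARI(p, 1): difference the log series once, then fit AR(p) to d1."""
    d1 = [log_levels[t] - log_levels[t - 1] for t in range(1, len(log_levels))]
    return yule_walker(d1, p)
```

For the model on this slide the call would be `fit_ari(log_counts, 5)`, with `log_counts` being the logged monthly incident series.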
The residual plot of the ARI (5, 1) model for the
fitted versus actual values shows no
recognisable pattern
Tests of the model
- The residual plot of the ARI (5, 1) model for the
fitted versus actual values shows no
recognisable pattern. A normal Q-Q plot of the
residuals shows that the residuals are close to
normal, though slightly skewed.
- None of the values appear to be extreme outliers,
however, so none have been excluded.
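The residual checks described above can be backed by a few numbers. A minimal sketch, assuming the residuals are available as a plain list: the points of a normal Q-Q plot (using plotting positions (i - 0.5)/n, one common convention), the moment coefficient of skewness, and a flag for standardised residuals beyond 3 in absolute value. The threshold of 3 and the helper names are assumptions, not the criteria used in the presentation.

```python
import statistics
from statistics import NormalDist

def skewness(x):
    """Moment coefficient of skewness m3 / m2**1.5 (0 for a symmetric sample)."""
    n = len(x)
    m = sum(x) / n
    m2 = sum((v - m) ** 2 for v in x) / n
    m3 = sum((v - m) ** 3 for v in x) / n
    return m3 / m2 ** 1.5

def qq_points(x):
    """(theoretical normal quantile, ordered residual) pairs for a normal Q-Q plot."""
    n = len(x)
    z = NormalDist()
    return [(z.inv_cdf((i - 0.5) / n), v) for i, v in enumerate(sorted(x), start=1)]

def flag_outliers(x, k=3.0):
    """Indices whose standardised value exceeds k in absolute size."""
    m, s = statistics.mean(x), statistics.stdev(x)
    return [i for i, v in enumerate(x) if abs(v - m) / s > k]
```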
Prediction
The ARI (5, 1) model supports predictions for
the 5-month period, with all the observed values
falling within the confidence limits
Forecast Values
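The forecast table is not reproduced here. A minimal sketch of the two steps involved, assuming the AR coefficients and the mean of the differences are already estimated (any values passed in are hypothetical): a recursion that produces log-scale point forecasts for an ARI(p, 1) model such as the ARI(5, 1), and a check of how many held-out observations fall inside given forecast limits. Building the limits themselves needs the model's psi weights, which this sketch omits; to compare with observed counts, exponentiate the log forecasts.

```python
def forecast_ari_points(log_levels, phi, mu, horizon):
    """Log-scale point forecasts for an ARI(p, 1) model whose differences follow
    w_t - mu = sum_i phi[i] * (w_{t-1-i} - mu) + Z_t."""
    w = [log_levels[t] - log_levels[t - 1] for t in range(1, len(log_levels))]
    if len(w) < len(phi):
        raise ValueError("need at least p differences to start the recursion")
    hist = [v - mu for v in w]  # centred differences, extended with forecasts
    y = log_levels[-1]
    out = []
    for _ in range(horizon):
        d_hat = sum(p * hist[-1 - i] for i, p in enumerate(phi))
        hist.append(d_hat)
        y += mu + d_hat  # add the forecast difference back onto the log level
        out.append(y)
    return out

def interval_coverage(observed, lower, upper):
    """Return (number inside [lower, upper], indices of observations outside)."""
    misses = [i for i, (o, lo, hi) in enumerate(zip(observed, lower, upper))
              if not lo <= o <= hi]
    return len(observed) - len(misses), misses
```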
Findings
- The threat is not abating!
- It also seems that the industry is not keeping up
with the threat.
- Further research should be conducted into why this
is occurring, in order to assess future threat levels.
Where this can lead
- The results demonstrate that time series analysis
is a valid method of predicting trends in
malicious code incidents.
- The results have applications to operational risk
in general, and further development of models and
risk engines is warranted by the findings.
Further Research
- Further research into frequency domain analysis
is expected to aid in the determination of
patterns in past threat frequencies.
- Analysis of vulnerability data using stochastic
point-process models, to gain more insight into
the mechanistic nature of the time series and how
it is affected by the changing nature and
evolution of malware varietals, would also be
expected to produce significant findings.
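Frequency domain analysis is proposed here as future work. Its basic tool is the periodogram, which shows cyclical patterns in a series as peaks at their frequencies. A minimal sketch, assuming a hypothetical monthly series with an 8-month cycle rather than any of the datasets above: the raw periodogram is computed directly at the Fourier frequencies k/n. In practice the raw periodogram is noisy and is usually smoothed before interpretation.

```python
import cmath
import math

def periodogram(x):
    """Raw periodogram I(f_k) = |sum_t (x_t - mean) exp(-2 pi i f_k t)|^2 / n
    at the Fourier frequencies f_k = k / n, k = 1..n//2 (cycles per month)."""
    n = len(x)
    mean = sum(x) / n
    out = []
    for k in range(1, n // 2 + 1):
        f = k / n
        s = sum((v - mean) * cmath.exp(-2j * math.pi * f * t) for t, v in enumerate(x))
        out.append((f, abs(s) ** 2 / n))
    return out

# Hypothetical series with an 8-month cycle (3 cycles in 24 months).
series = [math.cos(2 * math.pi * 3 * t / 24) for t in range(24)]
peak_f, peak_power = max(periodogram(series), key=lambda fp: fp[1])
print(f"peak at {peak_f:.4f} cycles/month (period {1 / peak_f:.1f} months)")
```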
To Conclude
- It is feasible to use ARIMA models to forecast
short-term malware trends.
- The numbers of incidents are modelled, and the
incident data are input into the software package
for future analysis.
- Monthly trend patterns may be derived from
statistical procedures.
Thank You
Bibliography, or A Day in the Life of an Academic Junkie
Berman (1992) Sojourns and Extremes of Stochastic Processes, Wadsworth.
Box, P., Jenkins, G. (1976) Time-Series Analysis, Rev. Ed., Holden-Day, US.
Bridwell, L.M., Tibbet, P. (2000) Sixth Annual ICSA Labs Computer Virus Prevalence Survey 2000, ICSA Labs, US.
Brillinger, David (1975) Time Series Data Analysis and Theory (context) Priestley
Brockwell, P.J., Davis, R.A. (1991) ITSM: An Interactive Time Series Modelling Package for the PC, Springer-Verlag, New York.
Brockwell, P.J., Davis, R.A. (1991) Time Series: Theory and Methods, Springer-Verlag.
Brockwell, P.J., Davis, R.A. (1996) Introduction to Time Series and Forecasting, Springer.
Brown, Lawrence D. (2003) Estimation and Prediction in a Random Effects Point-process Model Involving Autoregressive Terms, Statistics Department, U. of Penn.
Butler, S.A. (2001) Improving Security Technology Selections with Decision Theory, Emerald.
Cox, D. R., Isham, V. (1985) Point Processes, Chapman and Hall.
Cox, D., Miller, H. (1965) The Theory of Stochastic Processes, Chapman and Hall, London.
Chatfield, C. (1996) The Analysis of Time Series: An Introduction, 5th Ed., Chapman and Hall.
Chen, Z., Gao, L., Kwiat, K. (2003) Modeling the Spread of Active Worms, in IEEE INFOCOM.
Coulthard, A., Vuori, T. A. (2002) Computer Viruses: A Quantitative Analysis, Logistics Information Management, Volume 15, Number 5/96, pp. 400-409.
Figueiredo, Daniel R., Liu, Benyuan, Misra, Vishal, Towsley, Don (200) On the Autocorrelation Structure of TCP Traffic, Department of Computer Science, University of Massachusetts, Amherst, MA 01003-9264, USA, 2002, Elsevier Science B.V.
Forgionne, G.A. (1999) Management Science, Wiley Custom Services, USA.
Giles, K.E. (2004) On the Spectral Analysis of Backscatter Data, in GMP - Hawaii 2004, URL http//www.mts.jhu.edu/ priebe/FILES/-gmp hawaii04.pdf.
Garetto, M., Gong, W., Towsley, D. (2003) Modeling Malware Spreading Dynamics, in Proc. of INFOCOM 2003, San Francisco, April 2003.
Harder, Uli, Johnson, Matt W., Bradley, Jeremy T., Knottenbelt, William J. (200x) Observing Internet Worm and Virus Attacks with a Small Network Telescope, Department of Computing, Imperial College London, South Kensington Campus, London SW7 2AZ, United Kingdom, Electronic Notes in Theoretical Computer Science.
Hipel, K. W., McLeod, A. I. (1994) Time Series Modelling of Water Resources and Environmental Systems, Elsevier, Amsterdam.
Kephart, J. O., White, S. R. (1993) Measuring and Modeling Computer Virus Prevalence, Proc. of the 1993 IEEE Computer Society Symposium on Research in Security and Privacy, pp. 2-15, May 1993.
Leadbetter, M.R., Lindgren, G., Rootzen, H. (1983) Extremes and Related Properties of Random Sequences and Processes, Springer, Berlin.
Pouget, F., Dacier, M., Pham, V.H. (200) Understanding Threats: A Prerequisite to Enhance Survivability of Computing Systems, Institut Eurécom, B.P. 193, 06904 Sophia Antipolis, France.
Rohloff, K., Basar, T. (2005) Stochastic Behaviour of Random Constant Scanning Worms, in Proc. of IEEE Conference on Computer Communications and Networks 2005 (ICCCN 2005), San Diego, CA, Oct. 2005.
Spafford, Eugene (1989) The Internet Worm: Crisis and Aftermath, Communications of the ACM, 32(6), pp. 678-687, June 1989.
Shumway, R. H., Stoffer, D.S. (2000) Time Series Analysis and its Applications, Springer-Verlag, New York.
Tong (1990) Non-linear Time Series: A Dynamical Systems Approach, Oxford Univ. Press.
Valentino, Christopher C. (2003) Smarter Computer Intrusion Detection Utilizing Decision Modelling, Department of Information Systems, The University of Maryland, Baltimore County, Baltimore, MD, USA.
Yegneswaran, V., Barford, P., Ullrich, J. (2003) Internet Intrusions: Global Characteristics and Prevalence, SIGMETRICS 2003.
Zou, C. C., Gong, W., Towsley, D. (2003) Worm Propagation Modelling and Analysis under Dynamic Quarantine Defense, in ACM WORM 03, October 2003.
Zou, C. C., Gong, W., Towsley, D., Gao, L. (2005) The Monitoring and Early Detection of Internet Worms, IEEE/ACM Transactions on Networking, 13(5), pp. 961-974, October 2005.
Zou, C. C., Gong, W., Towsley, D. (2003) Monitoring and Early Warning for Internet Worms, UMass ECE Technical Report TR-CSE-03-01, 2003.
Zou, C. C., Gong, W., Towsley, D. On the Performance of Internet Worm Scanning Strategies, to appear in Journal of Performance Evaluation.