Empirical Research Methods in Computer Science

1
Empirical Research Methods in Computer Science
  • Lecture 1, Part 1
  • October 12, 2005
  • Noah Smith
  • http://nlp.cs.jhu.edu/nasmith/erm

2
Empiricism
  • empeiros: experienced
  • (peira: trial or test)

cf. rationalism
3
Exploration Experiment
  • Exploratory Data Analysis (lecture 5)
  • Hypothesis Testing (lectures 1,2)

explore: visualize, summarize, model
experiment: confirm, yes/no?
4
Computer What?
  • Theory
      • Algorithms, Computation
  • Practice
      • Software Engineering, Application Areas
  • Systems
      • OS, Architecture

5
Who cares?
  • anyone who wants to do research
  • anyone who wants to follow research
  • (i.e., read papers)
  • anyone who wants to be able to make smart
    decisions / draw conclusions
  • anyone who likes thinking critically

6
Basic Research Questions
7
Basic Research Questions
int foo() ...
8
Why bother?
int foo() ... (code snippet shown six times on the slide)
9
Variation → Statistics
  • determinism isn't good enough any more!

int foo() ...
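
One way to see the variation this slide refers to is to time the same function several times. The sketch below is an illustration only: `foo` is a stand-in workload (the slide's `foo` body is not shown), and `time_runs` is a hypothetical helper name.

```python
import statistics
import time


def time_runs(func, runs=30):
    """Call func() `runs` times and return the wall-clock time of each call."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        times.append(time.perf_counter() - start)
    return times


def foo():
    # Stand-in workload (assumption): the same deterministic computation every call.
    return sum(i * i for i in range(10000))


times = time_runs(foo)
print("min    :", min(times))
print("median :", statistics.median(times))
print("max    :", max(times))
```

The computation is identical on every call, yet the measured times usually differ, which is why a single measurement is rarely enough.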
10
Statistics, in this Course
  • Nonparametric tests
  • Sampling
  • Later
      • Parametric tests (when and why)

11
Warning
  • Theory (complexity analysis, etc.) is important,
    too!
  • Many phenomena aren't surprising if you know your
    math.

12
Goals
  • Know how to look for the interesting experiments
  • Know how to construct experiments
  • Know how to analyze the results
  • Be critical of all claims
  • Develop an aesthetic for good empirical work!

13
Empiricism is FUN!
Especially in computer science!
14
Basic Course Information
  • instructors: Noah and David
  • n,dasmith_at_cs.jhu.edu
  • Wednesdays 4-5:15 pm
  • no class Thanksgiving week
  • homeworks (65%), final exam (30%)

15
About Us
  • Combined 19 years of experience in CS, 36 years
    programming
  • Autodidact empiricists
  • Research interests in statistical modeling and
    machine learning (Eisner/Yarowsky lab)
  • NEB 332

16
Plan
  • Hypothesis testing, statistics (2)
  • Case study: runtime (2)
  • Exploratory data analysis (1)
  • Parametric testing, modeling (1-2)
  • Statistical analysis of computer programs (1)

17
MO
  • Come to class.
  • Send us feedback anytime.
  • What do you want to know?
  • Bring us papers.

18
Empirical Research Methods in Computer Science
  • Lecture 1, Part 2
  • October 12, 2005
  • David Smith

19
Terminological Prelude
  • Populations
      • Population distributions
      • All possible files. How big?
  • Samples
      • Sampling distributions
      • Files on my system
  • Statistics
      • Functions of data
      • Size of my files
  • Models
      • Parameters
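
To make the terms concrete, here is a minimal sketch that treats the files under one directory as a sample and computes a few statistics (functions of the data) on their sizes. The helper names `file_sizes` and `summarize` are hypothetical, and which directory to scan is left to the reader.

```python
import os
import statistics


def file_sizes(root):
    """Return the size in bytes of every regular file found under `root`."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                sizes.append(os.path.getsize(os.path.join(dirpath, name)))
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return sizes


def summarize(sizes):
    """Statistics are just functions of the sample."""
    return {
        "n": len(sizes),
        "mean": statistics.mean(sizes),
        "median": statistics.median(sizes),
        "max": max(sizes),
    }


if __name__ == "__main__":
    sample = file_sizes(os.path.expanduser("~"))  # "files on my system"
    if sample:
        print(summarize(sample))
```

The population ("all possible files") is never observed directly; only the sample is, so any claim about the population has to be inferred from numbers like these.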

20
And now for some data
21
Abnormality
22
Abnormality
23
The Bootstrap
  • Simulates the sampling distribution
  • Proposed by Efron in 1979
  • Anticipated by permutation tests, jackknife,
    cross-validation
  • From original sample of size n, draw B samples of
    size n with replacement and calculate the
    statistic on each

24
Sampling Distributions
[Figure: plotted sampling distributions, each labeled µ]
25
Bootstrapping the Mean
26
What's Going On?
  • Why is bootstrap distribution normal?
  • Remember, this is a mean
  • Linearity of Expectation
  • Central Limit Theorem
  • Closed form standard error for means
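
The three facts in the bullets above can be written out as follows. This is a sketch under the usual textbook assumptions, which the slide does not state: the observations X_1, ..., X_n are independent and identically distributed with mean µ and finite variance σ², and s denotes the sample standard deviation.

```latex
\[
\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad
\mathbb{E}[\bar{X}] = \frac{1}{n}\sum_{i=1}^{n}\mathbb{E}[X_i] = \mu
\quad\text{(linearity of expectation)}
\]
\[
\operatorname{Var}(\bar{X}) = \frac{\sigma^2}{n}
\;\Longrightarrow\;
\mathrm{SE}(\bar{X}) = \frac{\sigma}{\sqrt{n}} \approx \frac{s}{\sqrt{n}}
\quad\text{(closed-form standard error)}
\]
\[
\frac{\bar{X}-\mu}{\sigma/\sqrt{n}} \xrightarrow{\;d\;} \mathcal{N}(0,1)
\quad\text{as } n \to \infty
\quad\text{(central limit theorem)}
\]
```

The finite-variance assumption matters for heavy-tailed data: convergence to the normal shape can be slow when a few very large values dominate the sample.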

27
More Heavy Tails
28
Sampling Still Normal
29
Bivariate Data
30
Compression Performance
31
Bootstrapping Correlation
32
Error, Confidence, Testing
  • Standard error from sampling distribution
  • Confidence intervals: bounding error probability
    (e.g., to 5%)
  • Hypothesis testing: how likely is a particular
    statistic under our assumptions?
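
As a rough illustration of the first two bullets, the sketch below estimates a standard error and a percentile confidence interval from resampled statistics. The percentile method is one common choice and is an assumption here, not necessarily the method used in the lecture; the function name and the compression ratios are likewise invented for the example.

```python
import random
import statistics


def bootstrap_se_and_ci(sample, statistic, B=2000, alpha=0.05, seed=0):
    """Return (standard error, lower bound, upper bound) for `statistic`,
    using B resamples drawn with replacement and the percentile method."""
    rng = random.Random(seed)
    n = len(sample)
    boot = sorted(statistic(rng.choices(sample, k=n)) for _ in range(B))
    se = statistics.stdev(boot)
    lower = boot[int((alpha / 2) * B)]
    upper = boot[min(B - 1, int((1 - alpha / 2) * B))]
    return se, lower, upper


# Hypothetical compression ratios (compressed size / original size).
ratios = [0.31, 0.42, 0.28, 0.55, 0.37, 0.61, 0.33, 0.47, 0.39, 0.52]

se, lower, upper = bootstrap_se_and_ci(ratios, statistics.mean)
print("standard error:", se)
print("95% interval  :", (lower, upper))
```

With `alpha=0.05` the interval is built so that, under the bootstrap approximation, roughly 5% of resampled statistics fall outside it.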

33
Hypothesis Testing
  • One-sample
      • Are these data normal/Poisson/...?
  • Two-sample
      • Are these two samples from the same
        distribution?
  • Paired-sample
      • Is this technique better than that?
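
For the paired-sample question, one nonparametric option is a sign-flip permutation test on the per-item differences. The sketch below is an assumption about how such a test could look, not the lecture's own procedure; the function name and the running times are invented for illustration.

```python
import random


def paired_permutation_test(a, b, trials=10000, seed=0):
    """Two-sided sign-flip permutation test on paired measurements.

    Null hypothesis: within each pair the two techniques are exchangeable,
    so each difference is equally likely to have either sign.
    Returns an estimated p-value.
    """
    rng = random.Random(seed)
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs))
    hits = 0
    for _ in range(trials):
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(flipped) >= observed:
            hits += 1
    # The +1 terms count the observed arrangement itself, so p is never 0.
    return (hits + 1) / (trials + 1)


# Hypothetical running times in milliseconds on the same ten inputs.
technique_a = [105, 98, 110, 120, 101, 99, 130, 115, 108, 112]
technique_b = [100, 97, 104, 111, 100, 95, 121, 109, 103, 106]

print("p-value:", paired_permutation_test(technique_a, technique_b))
```

A small p-value says that a total difference this large would be unusual if the two techniques were interchangeable on each input.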

34
Your First Assignment
  • Data compression
  • Three-way tradeoff
      • Compression
      • Speed
      • Loss
  • Degenerate cases (cat, echo, ...)
  • Unknown distribution of input
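
A small harness can make the three-way tradeoff measurable. The sketch below is only an illustration, not the assignment's specification: `measure`, `identity`, and `discard` are hypothetical names, `zlib` stands in for any real compressor, and reading `cat` as "copy the input unchanged" and `echo` as "emit nothing useful" is an interpretation of the degenerate cases.

```python
import time
import zlib


def measure(compress, decompress, data):
    """Report compression ratio, compression time, and whether the round trip is lossless."""
    start = time.perf_counter()
    packed = compress(data)
    seconds = time.perf_counter() - start
    return {
        "ratio": len(packed) / len(data),   # smaller is better
        "seconds": seconds,                 # speed of the compression step
        "lossless": decompress(packed) == data,
    }


def identity(data):
    # Degenerate case in the spirit of `cat`: very fast, lossless, no compression.
    return data


def discard(_data):
    # Degenerate case in the spirit of `echo`: perfect ratio, total loss.
    return b""


# Hypothetical input; the real input distribution is unknown.
data = b"the quick brown fox jumps over the lazy dog " * 500

print("zlib    :", measure(zlib.compress, zlib.decompress, data))
print("identity:", measure(identity, identity, data))
print("discard :", measure(discard, discard, data))
```

Each degenerate case wins outright on one or two axes and fails on another, which is why no single number can rank compressors.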