CS 178H Introduction to Computer Science Research

Provided by: Raymond

1
CS 178H Introduction to Computer Science Research
  • What is CS Research?

2
What is CS Research?
  • Discovery of new knowledge of computing through
    mathematical analysis and experimental evaluation
    of algorithms and computer software.

3
Epistemology (definitions from Wikipedia)
  • Epistemology (from Greek ἐπιστήμη - episteme,
    "knowledge" + λόγος, "logos") or theory of
    knowledge is the branch of philosophy concerned
    with the nature and scope (limitations) of
    knowledge. It addresses the questions:
  • "What is knowledge?"
  • "How is knowledge acquired?"
  • "What do people know?"
  • "How do we know what we know?"

4
Rationalism
  • Rationalism is "any view appealing to reason as a
    source of knowledge or justification" (Lacey
    286). In more technical terms it is a method or a
    theory "in which the criterion of the truth is
    not sensory but intellectual and deductive"
    (Bourke 263).
  • Originated with Socrates (469-399 BC) and
    Plato (428/427-348/347 BC).

5
Empiricism
  • Empiricism is a theory of knowledge which asserts
    that knowledge arises from experience. Empiricism
    emphasizes the role of experience and evidence,
    especially sensory perception, in the formation
    of ideas.
  • Originated with Aristotle (384-322 BC).

6
Rationalism in CS (Theoretical CS)
  • Programs are formal mathematical objects.
  • Therefore, important properties of
    algorithms/software can be proven mathematically.
  • Termination
  • Correctness (satisfies a formal specification)
  • Computational Complexity (time and space
    requirements)
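As a small illustration of these three properties (binary search is our own choice of example, not one from the slides), the comments below state the loop invariant that gives correctness, the shrinking quantity that gives termination, and the halving argument that gives the O(log n) bound. The code only sketches what a formal proof would establish; it is not itself a proof.

```python
def binary_search(items: list[int], target: int) -> int:
    """Return an index i with items[i] == target, or -1 if target is absent.

    Precondition: items is sorted in non-decreasing order.
    """
    lo, hi = 0, len(items)  # search the half-open interval [lo, hi)
    while lo < hi:
        # Invariant (correctness): if target is in items, it lies in items[lo:hi].
        # Variant (termination): hi - lo is a non-negative integer that
        # strictly decreases on every iteration, so the loop must stop.
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            lo = mid + 1  # target, if present, is right of mid
        else:
            hi = mid      # target, if present, is left of mid
    # Complexity: the interval at least halves each iteration,
    # so there are O(log n) iterations in total.
    return -1


print(binary_search([2, 3, 5, 7, 11], 7))  # 3
```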

7
Theoretical CS Research
  • Algorithm Design and Analysis
  • Design a new (more efficient) algorithm for some
    well-defined problem (e.g. sorting,
    longest-common-subsequence)
  • Mathematically prove the correctness and improved
    complexity of the new algorithm.
  • Theoretical Analysis
  • Form a mathematical conjecture about a
    computational problem (e.g. graph isomorphism is
    NP-complete)
  • Mathematically prove the conjecture as a theorem.
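The longest-common-subsequence problem mentioned above has a standard dynamic-programming algorithm, sketched here only to show the kind of object whose properties are proven: correctness follows by induction on prefix lengths (each table entry is the LCS length of two prefixes), and the two nested loops give O(mn) time and space. This is the textbook algorithm, not a new result.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of a longest common subsequence of a and b in O(len(a) * len(b)) time."""
    m, n = len(a), len(b)
    # table[i][j] holds the LCS length of the prefixes a[:i] and b[:j]
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                # Matching last characters extend an LCS of the shorter prefixes.
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                # Otherwise drop the last character of one string or the other.
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[m][n]


print(lcs_length("ABCBDAB", "BDCABA"))  # 4 (for example "BCBA")
```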

8
Limits of Rationalism in CS
  • Sometimes software is too complex to analyze
    theoretically.
  • Sometimes correctness cannot be characterized
    formally and depends on natural or human
    behavior.
  • Protein folding
  • Handwriting/speech recognition
  • Sometimes software behavior on real data depends
    on unknown natural properties of this data.
  • Locality affecting paging performance
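The paging example can be illustrated with a small simulation. Everything here is an assumption chosen for illustration (100 pages, 10 frames, LRU replacement, two synthetic reference traces), not data from a real workload. The same replacement algorithm incurs very different fault counts depending on how much locality the input has, a property of the data rather than of the code.

```python
import random
from collections import OrderedDict


def lru_page_faults(references: list[int], frames: int) -> int:
    """Count page faults when serving `references` with an LRU cache of `frames` pages."""
    cache = OrderedDict()  # keys are resident pages, ordered from least to most recently used
    faults = 0
    for page in references:
        if page in cache:
            cache.move_to_end(page)  # mark as most recently used
        else:
            faults += 1
            if len(cache) >= frames:
                cache.popitem(last=False)  # evict the least recently used page
            cache[page] = None
    return faults


rng = random.Random(0)  # fixed seed so the run is repeatable
num_pages, frames, length = 100, 10, 10_000

# Synthetic trace with no locality: every page equally likely at every step.
uniform = [rng.randrange(num_pages) for _ in range(length)]

# Synthetic trace with strong locality: references stay inside a 5-page
# working set that jumps to a new location every 500 steps.
local = []
base = 0
for step in range(length):
    if step % 500 == 0:
        base = rng.randrange(num_pages - 5)
    local.append(base + rng.randrange(5))

print(lru_page_faults(uniform, frames), lru_page_faults(local, frames))
```

With these settings the local trace should fault at most a few times per working-set change, while the uniform trace faults on most references; real programs fall somewhere in between, which is why their behavior has to be measured.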

9
Empiricism in CS (Experimental CS)
  • Behavior of software can be studied
    experimentally.
  • Anecdotal evidence (running a few sample cases)
    is insufficient.
  • Collect data (e.g. accuracy, run-time) on running
    programs many times on large, real-world
    benchmark collections.
  • Verify hypotheses about behavior using controlled
    experiments.
  • Statistically analyze results for significance.
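A minimal sketch of the "run many times on a benchmark collection" step: the helper below times a program on every input several times and summarizes the measurements. The names `benchmark` and `collection` are hypothetical, and the random integer lists merely stand in for a real-world benchmark collection; an actual study would also record accuracy, not just run time.

```python
import random
import statistics
import time
from typing import Callable, Sequence


def benchmark(program: Callable[[list[int]], object],
              inputs: Sequence[list[int]],
              repeats: int = 5) -> dict:
    """Run `program` on every input `repeats` times; summarize run times in seconds."""
    samples = []
    for data in inputs:
        for _ in range(repeats):
            start = time.perf_counter()
            program(list(data))  # pass a copy so one run cannot affect the next
            samples.append(time.perf_counter() - start)
    return {
        "runs": len(samples),
        "mean": statistics.mean(samples),
        "stdev": statistics.stdev(samples) if len(samples) > 1 else 0.0,
    }


# Hypothetical stand-in for a benchmark collection: 20 random lists.
rng = random.Random(42)
collection = [[rng.randrange(10**6) for _ in range(5_000)] for _ in range(20)]
print(benchmark(sorted, collection))
```

Repeating each run and reporting spread as well as the mean is what makes the later significance analysis possible; a single timing is exactly the anecdotal evidence the slide warns against.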

10
Scientific Method (steps from Wikipedia)
  • 1) Define the question
  • 2) Gather information and resources (observe)
  • 3) Form hypothesis
  • 4) Perform experiment and collect data
  • 5) Analyze data
  • 6) Interpret data and draw conclusions that serve
    as a starting point for a new hypothesis
  • 7) Publish results
  • 8) Retest (frequently done by other scientists)

11
1) Define the question
  • Example from my research: Search Query
    Disambiguation from Short Sessions
  • Can a web search engine disambiguate queries?

[Figure: the ambiguous query "scrubs" entered in a search box, followed by "?"]
12
2) Gather information and resources
  • Obtained web search session data from Microsoft
  • Find instances of ambiguous queries
  • Find contextual clues that might help
    disambiguate queries

13
Context can Aid Disambiguation
14
3) Form Hypothesis
  • Previous queries and clicks in a session can help
    disambiguate queries by relating them to previous
    sessions involving the same query (where we know
    what result was clicked).

15
4) Perform Experiment and Collect Data
  • Build system that uses prior context and previous
    session data to predict clicked results for new
    user.
  • Reorder results from existing search engine based
    on predicted probability of clicking on a result.
  • Should reduce number of results user needs to
    examine before finding a relevant one.
  • Test on unseen data and compare predictions to
    actual results clicked.
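The reordering and evaluation steps above can be sketched as follows, assuming some model has already produced a predicted click probability for each result. The probabilities are made up, and `rerank` and `results_examined` are hypothetical helper names; in the study the predictions come from the MLN, which this sketch does not implement.

```python
def rerank(results: list[str], click_prob: dict[str, float]) -> list[str]:
    """Order results by predicted click probability, highest first.

    Python's sort is stable, so ties keep the original engine order.
    """
    return sorted(results, key=lambda r: -click_prob.get(r, 0.0))


def results_examined(ranking: list[str], clicked: str) -> int:
    """Results a user scanning from the top reads, up to and including the clicked one."""
    return ranking.index(clicked) + 1


# Hypothetical engine ordering and model predictions for the query "scrubs".
engine_order = ["scrubs-tv.com", "ebay.com", "scrubs.com", "hospitallink.com"]
predicted = {"scrubs.com": 0.6, "hospitallink.com": 0.2,
             "scrubs-tv.com": 0.15, "ebay.com": 0.05}

new_order = rerank(engine_order, predicted)
print(new_order)  # ['scrubs.com', 'hospitallink.com', 'scrubs-tv.com', 'ebay.com']
print(results_examined(engine_order, "scrubs.com"),
      results_examined(new_order, "scrubs.com"))  # 3 1
```

Comparing the model's ranking with the held-out clicks on unseen sessions, rather than on the sessions used to build the model, is what keeps the comparison honest.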

16
Using Relational Information with a Markov Logic
Network (MLN)
[Figure: example search sessions; labels include "huntsville school", "scrubs", scrubs.com, hospitallink.com, scrubs-tv.com, and ebay.com]
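For reference, the general form of a Markov logic network: each first-order formula $F_i$ carries a real-valued weight $w_i$, $n_i(x)$ is the number of true groundings of $F_i$ in a possible world $x$, and $Z$ normalizes over all worlds. The particular formulas and weights of the query-disambiguation system are not given on these slides, so this shows only the standard model:

```latex
P(X = x) = \frac{1}{Z} \exp\left( \sum_{i} w_i \, n_i(x) \right),
\qquad
Z = \sum_{x'} \exp\left( \sum_{i} w_i \, n_i(x') \right)
```

Worlds that satisfy more groundings of high-weight formulas receive higher probability, which is how relational clues (earlier queries and clicks in a session) can shift the predicted probability of each click.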
17
Controlled Experiment
  • Performance of experimental system must be
    compared to some baseline or control.
  • Controls are necessary to demonstrate the system
    is improving over some naïve method (strawman) or
    current best system for a problem.
  • For example, in the old joke, someone claims that
    they are snapping their fingers "to keep the
    tigers away" and justifies this behavior by
    saying "See, it's working!" While this
    "experiment" does not falsify the hypothesis
    "snapping fingers keeps the tigers away", it does
    not really support the hypothesis either: not
    snapping your fingers keeps the tigers away just
    as well (Wikipedia, "Experiment").

18
Control for Query Disambiguation
  • Simple control is to order results from search
    engine randomly.
  • Another baseline is to just use ordering from
    existing (non-personalized) search engine.
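Both controls are simple enough to state as code. In this sketch (hypothetical helper names) each function returns a baseline ordering of one query's result list, so the same evaluation can then be applied to the baselines and to the experimental system. The random control is seeded so that an experiment can be rerun exactly.

```python
import random


def random_baseline(results: list[str], seed: int) -> list[str]:
    """Control 1: a uniformly random permutation of the engine's results."""
    shuffled = list(results)  # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)
    return shuffled


def engine_baseline(results: list[str]) -> list[str]:
    """Control 2: keep the existing (non-personalized) engine ordering."""
    return list(results)


results = ["scrubs-tv.com", "ebay.com", "scrubs.com", "hospitallink.com"]
print(random_baseline(results, seed=0))
print(engine_baseline(results))
```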

19
Performance Metrics
  • Need a quantitative measure of system performance
    (runtime or accuracy).
  • Compare quantitative performance of experimental
    system to baseline control system.
  • To measure the accuracy of the ordering of web
    search results, we measure AUC-ROC:
  • Percentage of irrelevant results not seen by the
    user before finding a relevant result (if the user
    scans results from the top)
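One way to compute this metric, assuming each ranked result is labeled relevant (clicked) or irrelevant, is to count correctly ordered (relevant, irrelevant) pairs; for a ranking this equals AUC-ROC, and with a single relevant result it is exactly the fraction of irrelevant results the user never sees. The function name is hypothetical and the study's exact evaluation code is not shown on the slides.

```python
def ranking_auc(relevance: list[bool]) -> float:
    """AUC-ROC of a ranked list; relevance[k] is True if the k-th result is relevant.

    Computed as the fraction of (relevant, irrelevant) pairs in which the
    relevant result appears above the irrelevant one.
    """
    relevant_seen = 0
    correct_pairs = 0
    for is_relevant in relevance:
        if is_relevant:
            relevant_seen += 1
        else:
            # every relevant result already passed is ranked above this one
            correct_pairs += relevant_seen
    num_rel = sum(relevance)
    num_irrel = len(relevance) - num_rel
    if num_rel == 0 or num_irrel == 0:
        raise ValueError("AUC needs at least one relevant and one irrelevant result")
    return correct_pairs / (num_rel * num_irrel)


# Relevant result in position 2 of 4: 2 of the 3 irrelevant results are never seen.
print(ranking_auc([False, True, False, False]))  # about 0.667
```

A random ordering scores 0.5 in expectation, which is why it serves as the naive control.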

20
5) Analyze Data
  • Do results support the hypothesis?
  • Are differences statistically significant?
  • Use a statistical test to determine if observed
    differences are unlikely to be due only to random
    variation, i.e. p-value < .05 (the probability of
    a difference at least this large if the null
    hypothesis were true).
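The slides do not say which statistical test was used. As one simple example (our choice, an assumption), an exact two-sided sign test on paired per-query scores, system vs. baseline, needs only the standard library. Ties are dropped, as is conventional for the sign test; the function name and the scores in the usage line are hypothetical.

```python
from math import comb


def sign_test(system: list[float], baseline: list[float]) -> float:
    """Two-sided exact sign test p-value for paired scores (e.g. per-query AUC).

    Null hypothesis: on each query the system is equally likely to beat
    or lose to the baseline.
    """
    wins = sum(s > b for s, b in zip(system, baseline))
    losses = sum(s < b for s, b in zip(system, baseline))
    n = wins + losses  # ties carry no information and are dropped
    if n == 0:
        return 1.0
    k = min(wins, losses)
    # P(X <= k) for X ~ Binomial(n, 1/2), doubled for a two-sided test
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical per-query AUCs: 3 wins, 1 loss is far from significant.
print(sign_test([0.81, 0.77, 0.90, 0.65], [0.70, 0.78, 0.85, 0.60]))  # 0.625
```

With many queries a paired t-test or Wilcoxon signed-rank test is more powerful; the sign test is shown because its assumptions are minimal.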

21
Results (AUC-ROC)
[Chart: AUC-ROC results; a marker indicates a statistically significant improvement over the previous result]



22
6) Interpret data and draw conclusions that serve
as a starting point for a new hypothesis
  • Is random ordering the best baseline to compare
    to?
  • What if we just order results by popularity
    (i.e. how many people clicked on a particular
    result after submitting a given ambiguous query)?
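The popularity baseline is easy to sketch: count how often each result was clicked after the same query in earlier sessions, and sort by that count. The click log and the `popularity_ranking` name below are invented for illustration.

```python
from collections import Counter


def popularity_ranking(click_log: list[tuple[str, str]], query: str,
                       results: list[str]) -> list[str]:
    """Order `results` for `query` by past click counts after that same query.

    `click_log` holds (query, clicked_result) pairs from earlier sessions.
    Ties, including never-clicked results, keep the engine's original order.
    """
    counts = Counter(url for q, url in click_log if q == query)
    return sorted(results, key=lambda url: -counts[url])  # Counter returns 0 for unseen URLs


# Invented click log: two of three earlier "scrubs" clicks went to the TV show site.
log = [("scrubs", "scrubs-tv.com"), ("scrubs", "scrubs.com"),
       ("scrubs", "scrubs-tv.com"), ("ebay", "ebay.com")]
print(popularity_ranking(log, "scrubs", ["scrubs.com", "ebay.com", "scrubs-tv.com"]))
# ['scrubs-tv.com', 'scrubs.com', 'ebay.com']
```

Unlike the random control, this baseline already exploits previous session data, so beating it is a much stronger result.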

23
New Baseline Results
24
Refine System
  • Develop MLN that incorporates popularity
    information.
  • Rerun experiment to obtain results for revised
    version and verify the hypothesis that it
    performs better than the popularity baseline.

25
Results for Revised System
26
7) Publish Results
  • Paper submitted to the international data mining
    conference.
  • KDD-09, Paris, June 28 - July 1, 2009