Problem Description - PowerPoint PPT Presentation

Slides: 6
Provided by: www2CsUh8
Learn more at: https://www2.cs.uh.edu

1
Problem Description: Using Machine Learning to
Make Money at Horse Races
  • Columns: Pos, Draw, Btn, Horse, Wgt, Jockey, Trainer, Age, SP,
    Comments, Raceid
  • Example row 1: Pos 1; Draw 4; Btn (blank); Horse Timocracy;
    Wgt 10-0; Jockey S Drowne; Trainer A B Haynes; Age 5; SP 4/9 f;
    Comments "led after 1f, ridden 2f out, stayed on well and
    in command final furlong opened 4/5 touched 4/5
    800-1100 400-550 400-650 (x3) 400-750
    (x4) 500-1000 (x4) 300-600 200-400 (x2)";
    Raceid 372966
  • Example row 2: Pos 2; Draw 5; Btn 1¾; Horse Bussell Along (IRE);
    Wgt 9-3; Jockey S Sanders; Trainer Stef Higgins; Age 4; SP 8/1;
    Comments "held up early, headway on outside
    to chase leaders 4f out, effort and hung left
    from 2f out, went 2nd 1f out, no chance with
    winner opened 10/1 touched 10/1";
    Raceid 372966
  • Task: Given a training set, learn a function
    which predicts the winner, i.e. selects a horse to bet
    10 on, from a given set of entries.
  • Performance Measures:
  • Accuracy in picking the winner of a race (simple
    version)
  • Return of placing a bet of 10 on a horse in the
    race (advanced version; it solves the real problem
    of trying to make money on the track)
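
The two performance measures above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, the SP is assumed to be fractional odds in the form "a/b" (optionally followed by a marker such as "f"), and the return is counted as net winnings on a stake of 10, excluding the stake itself.

```python
from fractions import Fraction

STAKE = 10  # stake per race, as on the slide (currency unit not stated)


def parse_odds(sp: str) -> Fraction:
    """Parse fractional odds such as '4/9', '8/1' or '4/9 f' (assumed format)."""
    token = sp.split()[0]  # drop a trailing marker such as 'f'
    num, den = token.split("/")
    return Fraction(int(num), int(den))


def bet_return(picked: str, winner: str, sp: str, stake: int = STAKE) -> Fraction:
    """Net result of one win bet: stake * odds if the pick wins, else -stake."""
    if picked == winner:
        return stake * parse_odds(sp)
    return Fraction(-stake)


def evaluate(bets):
    """bets: list of (picked_horse, winning_horse, sp_of_picked_horse).

    Returns (accuracy, total_net_return).
    """
    hits = sum(1 for picked, winner, _ in bets if picked == winner)
    total = sum(bet_return(p, w, sp) for p, w, sp in bets)
    return hits / len(bets), total


# Two hypothetical bets on the race shown above: one on the winner, one on the runner-up.
accuracy, total = evaluate([
    ("Timocracy", "Timocracy", "4/9 f"),
    ("Bussell Along (IRE)", "Timocracy", "8/1"),
])
```

Using exact fractions avoids rounding drift when many small returns are summed over a test set.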

Links: http://www.racingpost.com
http://www.drf.com/
http://www.shrp.com/
http://socialmediaseo.net/2010/05/01/kentucky-derby-2010/
http://www.equibase.com/premium/eqbPDFChartPlus.cfm?RACE=11&BorP=P&TID=CD&CTRY=USA&DT=05/01/2010&DAY=D&STYLE=EQB
2
Problem Description 2
  • This is an individual project
  • In general, the problem is a ranking problem: one
    approach is to learn a function that assigns a
    score to the horses in a race and to pick the horse
    with the highest score. But it can also be viewed
    as a classification or prediction problem.
  • The datasets will be very basic, containing only
    a few attributes, but you are allowed to create
    additional attributes by computing statistics from
    the datasets or by extracting information from other
    sources (e.g. the percentage of races won by a
    jockey).
  • Basically, the project tries to predict the
    future. We will likely use races from a single race
    track. You are given a true temporal sequence of
    race datasets DS1 (races in Jan./Feb.), DS2 (races in
    March/April), ..., DS6 (the true test set; you are not
    allowed to peek into this one, only Chun-sheng
    has access to this dataset), which serve as
    training sets, validation sets, test sets, and
    sources of new feature generation in the project.
  • Students have freedom in which approaches to
    use; there are many of them. Ad hoc approaches are
    welcome. Likely every student will use a
    different approach, and some will be quite
    complicated while others use simpler approaches.
  • The goal is to get something running: students
    who use a well-tuned simple approach will get
    better grades than students who use a very
    complicated, sophisticated approach which does
    not run at all.
  • Deliverables: You will demo your system, write a
    medium-sized report, and Chun-sheng will test
    your system with a test set of his own.
  • You are allowed to use any software/tool in the
    project; you just have to mention what you used
    in your report.
  • In general, the submission deadline is Wed., March
    23, 11 p.m., but the idea is that you spend at most 5
    weeks on the project!
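
One possible way to use the temporal sequence of datasets described above is a walk-forward scheme: train on everything up to some point and validate on the next dataset. The sketch below is an assumption about how this could be organized, not a prescribed procedure; the function name is hypothetical, and the names DS3 to DS5 are assumed, since the slide only names DS1, DS2 and DS6.

```python
def walk_forward_splits(datasets):
    """Yield (training_names, validation_name) pairs in temporal order.

    `datasets` lists dataset names from oldest to newest. The last one is
    treated as the held-out true test set and is never yielded.
    """
    visible = datasets[:-1]  # everything except the true test set
    for i in range(1, len(visible)):
        yield visible[:i], visible[i]


# DS3..DS5 are assumed names for the datasets elided on the slide.
names = ["DS1", "DS2", "DS3", "DS4", "DS5", "DS6"]
splits = list(walk_forward_splits(names))
```

Each split only ever trains on datasets that precede the validation dataset, so derived features such as jockey statistics can be built from the training side without looking into the future.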

3
Data (Wolverhampton (UK))
  • http://maps.google.com/maps?rlz=1T4ADRA_enUS403US403&um=1&ie=UTF-8&q=wolverhampton+racecourse&fb=1&gl=us&hq=wolverhampton+racecourse&cid=0,0,4040866765308641574&ei=T0RdTfnEFYP98AaOqoGMCw&sa=X&oi=local_result&ct=image&resnum=2&ved=0CC4QnwIwAQ
  • http://www.racingpost.com/horses2/cards/meeting_popup.sd?crs_id=513&action_date=2011-02-17&selected_tab=COURSE_MAP
  • http://www.wolverhampton-racecourse.co.uk/
  • http://www.racingpost.com/horses2/cards/meeting_popup.sd?crs_id=513&action_date=2011-02-17&selected_tab=MEETING_INFO
  • There is a chance that we will still change the race
    track, but the dataset formats likely will not
    change.
  • Data fields:
  • RaceID: ID that identifies the race
  • Pos.: Finish position of the horse
  • Draw: The start stall that a horse has been
    allocated
  • Dist.: The distance a horse has finished behind
    the winner
  • Horse: The name of the horse
  • Wt: The weight that the horse carries
  • Jockey: The name of the jockey
  • Trainer: The name of the trainer
  • Age: The age of the horse
  • SP: The official starting price of the horse
    (optional)
  • RaceHeader: Metadata of the race
  • RaceDetail: Metadata of the race
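
As an illustration of the preprocessing such records may need, the following sketch maps one raw row onto a typed record. The class, the function names and the dictionary keys are assumptions made for this example. It also assumes that Pos. is numeric and that the weight is written as stones-pounds (e.g. "9-3", at 14 pounds per stone), as in the example rows on the first slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Runner:
    race_id: int
    pos: int
    draw: int
    dist: str          # distance behind the winner, kept as text (e.g. '1¾')
    horse: str
    wt_lb: int         # weight carried, converted to pounds
    jockey: str
    trainer: str
    age: int
    sp: Optional[str]  # starting price is optional in the data


def weight_to_pounds(wt: str) -> int:
    """Convert 'stones-pounds' (assumed format, e.g. '10-0') to pounds."""
    stones, pounds = wt.split("-")
    return int(stones) * 14 + int(pounds)


def make_runner(row: dict) -> Runner:
    """Build a Runner from a dict keyed by the (assumed) field names."""
    return Runner(
        race_id=int(row["RaceID"]),
        pos=int(row["Pos"]),
        draw=int(row["Draw"]),
        dist=row.get("Dist", ""),
        horse=row["Horse"],
        wt_lb=weight_to_pounds(row["Wt"]),
        jockey=row["Jockey"],
        trainer=row["Trainer"],
        age=int(row["Age"]),
        sp=row.get("SP") or None,
    )


# The second example row from the first slide.
runner = make_runner({
    "RaceID": "372966", "Pos": "2", "Draw": "5", "Dist": "1¾",
    "Horse": "Bussell Along (IRE)", "Wt": "9-3", "Jockey": "S Sanders",
    "Trainer": "Stef Higgins", "Age": "4", "SP": "8/1",
})
```

Real result files may contain non-numeric finishing positions or missing values, which this sketch does not handle.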

4
Problem Description 3
  • More on alternative approaches:
  • Treat it as a regression/prediction problem: e.g.
    assign 1/n to the n-th finisher in the race (or
    use the percentage of the prize money allocated
    to the place the horse took in the race over the
    total prize money), then learn a prediction
    function f. If h1,...,hn are the horses entered
    into a race, use a decision-making policy that
    uses f(h1),...,f(hn) and possibly the prior odds
    o(h1),...,o(hn); e.g. bet on the horse i which has the
    maximum value for
  • o(hi) / ( f(hi) / Σj f(hj) ), where the sum runs over j = 1,...,n
  • Treat the problem as a classification problem, in
    which the classification algorithm picks the
    horse to win/the horse to bet on
  • Treat the problem as a ranking problem: you can
    learn a similar function as in the first approach,
    except that the objective function minimizes the
    number of rank violations. For example,
    http://olivier.chapelle.cc/pub/err.pdf describes such a function, and
    http://www.cs.cornell.edu/people/tj/svm_light/svm_rank.html
    gives the code for a ranking support vector machine. This approach is
    also connected to a Yahoo! contest, Learning to
    Rank: http://learningtorankchallenge.yahoo.com/index.php
  • Honesty Rule: Datasets are temporally ordered,
    D1 < ... < DN (< means prior to). When learning from
    dataset Di, you are only allowed to use knowledge
    from datasets D1,...,Di, but not from later datasets
    Di+1,...,DN.
  • When using prior statistics, you will have to
    deal with a cold start problem: there will be new
    horses and jockeys. Usually, you should initialize
    those using average values or other values, but
    not 0.
  • Your approach should focus on horses which win,
    and not on horses that place second and third a
    lot, due to the performance measures used in the
    project (see first slide).
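
The honesty rule and the cold start advice above can be combined in a small statistics helper. The sketch below is one possible reading, with hypothetical function names: win rates are computed only from results in earlier datasets (D1,...,Di), and a jockey who has not been seen before receives the overall average win rate instead of 0. The jockey labels and results in the example are made up for illustration.

```python
from collections import defaultdict


def jockey_win_rates(past_results):
    """Compute per-jockey win rates from earlier datasets only.

    past_results: iterable of (jockey, finish_position) pairs taken from
    D1..Di. Returns (rates, overall), where `overall` is the average win
    rate over all rides and serves as the cold-start default.
    """
    rides = defaultdict(int)
    wins = defaultdict(int)
    for jockey, pos in past_results:
        rides[jockey] += 1
        if pos == 1:
            wins[jockey] += 1
    rates = {j: wins[j] / rides[j] for j in rides}
    overall = sum(wins.values()) / sum(rides.values()) if rides else 0.0
    return rates, overall


def lookup(rates, overall, jockey):
    """Return the jockey's prior win rate, or the average for a new jockey."""
    return rates.get(jockey, overall)


# Made-up results for two hypothetical jockeys, A and B.
past = [("A", 1), ("B", 2), ("A", 3), ("B", 4)]
rates, overall = jockey_win_rates(past)
known = lookup(rates, overall, "A")
unseen = lookup(rates, overall, "C")  # C never rode in the earlier datasets
```

The same pattern would apply to trainers or horses; other defaults, such as a smoothed estimate, are equally compatible with the advice on the slide.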

5
Problem Description 4
  • You can use any tools and software packages, such
    as WEKA or regression packages.
  • Your system should have 3 (2) modules:
  • A Preprocessing module that formats the dataset
    and adds additional information to the dataset
    (optional)
  • A Learning module which takes a dataset and
    creates a model (and also reports some training
    statistics)
  • A Testing module that uses a model, picks
    horses, and creates a detailed report of using
    the model for the races in the test set, e.g.
  • Race 337: Bet on Sally and lost 10
  • Race 338: Bet on Enforcer and lost 10
  • Race 339: Bet on Caregiver and won 5
    (odds were ½)
  • Race 340: Bet on Trailer and won 30
    (odds were 3/1)
  • Race 341: Bet on Lateentry and lost 10
  • Total: Won 5 total, 1 per race
  • Comment: There should be a way to deliver your
    model for testing (e.g. to Chun-sheng).
  • I suggest you build the non-horse-picking parts of the
    system first (e.g. use choosing the horse with
    the second highest odds as your initial
    horse-picking function), and then focus on learning
    good horse-picking functions in the project.
  • There is a transfer learning aspect to the
    project: e.g. you could use the model for other
    race tracks in the UK and US.
  • We might allow the use of some basic statistics
    for jockeys and trainers (but not for horses) for
    the Wolverhampton race track, such as
    http://www.racingpost.com/horses2/cards/meeting_popup.sd?crs_id=513&action_date=2011-02-17&selected_tab=MEETING_INFO
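
The testing module and the suggested initial picker could be sketched as follows. This is a minimal illustration, not a required design: the function names and the input layout are assumptions, the stake of 10 and the report wording follow the example report above, and all odds in the usage example other than Trailer's 3/1 are made up.

```python
from fractions import Fraction

STAKE = 10  # stake per race, as in the example report


def pick_second_highest_odds(entries):
    """Baseline picker: the entry with the second highest odds.

    entries: list of (horse, odds as Fraction). Falls back to the only
    entry if the race has a single runner.
    """
    ranked = sorted(entries, key=lambda e: e[1], reverse=True)
    return ranked[1] if len(ranked) > 1 else ranked[0]


def report(races, picker=pick_second_highest_odds, stake=STAKE):
    """races: list of (race_id, entries, winner). Returns report lines."""
    lines = []
    total = Fraction(0)
    for race_id, entries, winner in races:
        horse, odds = picker(entries)
        if horse == winner:
            gain = stake * odds
            lines.append(
                f"Race {race_id}: Bet on {horse} and won {gain} (odds were {odds})"
            )
        else:
            gain = Fraction(-stake)
            lines.append(f"Race {race_id}: Bet on {horse} and lost {stake}")
        total += gain
    lines.append(f"Total: {total} overall, {total / len(races)} per race")
    return lines


# Hypothetical entries; only Trailer's odds of 3/1 come from the slide.
races = [
    (340, [("Trailer", Fraction(3, 1)), ("Longshot", Fraction(10, 1)),
           ("Fav", Fraction(1, 2))], "Trailer"),
    (341, [("Lateentry", Fraction(5, 1)), ("Other", Fraction(8, 1))], "Other"),
]
lines = report(races)
```

Because the picker is passed in as a parameter, a learned horse-picking function can later replace the baseline without touching the reporting code.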