Project Presentation - PowerPoint PPT Presentation

About This Presentation
Title:

Project Presentation

Description:

1. 9/23/09. Project Presentation. Team Members: Anna Tinnemore. Gabriel Neer. Yow-Ren Chiang. Lin572 Advanced Statistic Methods in NLP. 2. 9/23/09. PART 3 ... – PowerPoint PPT presentation

Provided by: annatin

Transcript and Presenter's Notes

Title: Project Presentation


1
Project Presentation
Lin572 Advanced Statistical Methods in NLP
  • Team Members
  • Anna Tinnemore
  • Gabriel Neer
  • Yow-Ren Chiang

2
PART 3
  • MaxEnt
  • (yipee!)

3
The Good Stuff
  • Simple feature templates and extraction
  • Elegant data structures for storage and easy
    access
  • Pretty good results!

4
The Bad Stuff
  • Hmmm. . . .

5
Features
  • A few short loops collected the most relevant
    context features
  • No long-winded feature templates
  • Easy-access hashes
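
The slides do not list the actual feature templates, so the following is only a minimal sketch of what "a few short loops" over the context might look like for a tagger. The sub name extract_features, the feature names (curW, prevW, nextW, prev2W, next2W), the two-word window, and the BOS/EOS boundary markers are all assumptions made for illustration, not the team's code.

```perl
use strict;
use warnings;

# Hypothetical helper: collect context features for the word at position
# $i of a sentence (an array ref of words). Returns a hash ref mapping
# feature string to 1, i.e. an "easy-access hash" in the slide's terms.
sub extract_features {
    my ($words, $i) = @_;

    # Assumed feature names for each offset in a two-word window.
    my %name = (-2 => 'prev2W', -1 => 'prevW', 1 => 'nextW', 2 => 'next2W');

    my %feat = ("curW=$words->[$i]" => 1);
    for my $offset (sort { $a <=> $b } keys %name) {
        my $j = $i + $offset;
        my $w = $j < 0           ? 'BOS'          # before sentence start
              : $j > $#{$words}  ? 'EOS'          # past sentence end
              :                    $words->[$j];
        $feat{"$name{$offset}=$w"} = 1;
    }
    return \%feat;
}

my $feats = extract_features([qw(the dog barks)], 1);
print join(' ', sort keys %$feats), "\n";
```

Storing each feature as a hash key keeps lookup and de-duplication trivial, which is one plausible reading of "easy-access hashes".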

6
Decent Results
  • Accuracy in the mid-nineties (percent), increasing
    with the size of the training data
  • Result

7
PART 4
  • Task 2
  • Bagging

8
Tie Function
  • use Tie::File;
  • use Fcntl;
  • for my $bag_num (1 .. $B) {
  • # The Nth bag from file foo.txt becomes foo.txt-bagN, etc.
  • my $bag_name = "$file_name-bag$bag_num";
  • open (BAG, ">$bag_name")
  • or die "Can't open $bag_name for writing: $!";
  • for (@lines) {
  • # Pick random line of file.
  • my $line = $lines[rand @lines];
  • print BAG "$line\n"; # Output to the bag.
  • }
  • }
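
The bagging loop on this slide can be sketched as a self-contained script. This version reads the file into a plain array instead of tying it with Tie::File, and uses lexical file handles; the sub names bootstrap_sample and write_bags are hypothetical, and the output naming follows the "file-bagN" pattern shown on the slide.

```perl
use strict;
use warnings;

# Hypothetical helper: draw a bootstrap sample with as many items as the
# input, picking lines uniformly at random with replacement.
sub bootstrap_sample {
    my ($lines) = @_;
    my $n = scalar @$lines;
    return [ map { $lines->[ int rand $n ] } 1 .. $n ];
}

# Hypothetical helper: write $num_bags bootstrap samples of $file to
# "$file-bag1", "$file-bag2", and so on.
sub write_bags {
    my ($file, $num_bags) = @_;
    open(my $in, '<', $file) or die "Can't open $file for reading: $!";
    chomp(my @lines = <$in>);
    close $in;

    for my $bag_num (1 .. $num_bags) {
        my $bag_name = "$file-bag$bag_num";
        open(my $bag, '>', $bag_name)
            or die "Can't open $bag_name for writing: $!";
        print {$bag} "$_\n" for @{ bootstrap_sample(\@lines) };
        close $bag;
    }
}
```

Calling write_bags('foo.txt', 10) would produce foo.txt-bag1 through foo.txt-bag10, each with the same number of lines as foo.txt. Because sampling is with replacement, some lines repeat within a bag and others are left out.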

9
Combination
  • VOTING!!

10
Step 1
  • Loop through file and remember words. Keep them
    grouped by sentence.
  • while (<FILE>) {
  • foreach (@word_tags) {
  • my @wordtag = split /\//;
  • push (@words, ($wordtag[0]));
  • }
  • push (@sentences, (\@words));
  • }
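
The slide omits the declarations around this loop, so here is one way Step 1 might be completed. The sub name read_sentences and the one-sentence-per-line input format are assumptions. Two details differ from the slide on purpose: the word array is declared inside the loop so that each sentence gets its own reference, and tokens are split on the last slash so that a word such as "1/2" keeps its internal slash.

```perl
use strict;
use warnings;

# Hypothetical helper: read "word/tag" tokens, one sentence per line,
# and return an array ref of sentences, each an array ref of words.
sub read_sentences {
    my ($fh) = @_;
    my @sentences;
    while (my $line = <$fh>) {
        my @words;    # fresh array per sentence, so every reference is distinct
        foreach my $token (split ' ', $line) {
            # Keep everything before the last slash as the word.
            my ($word) = $token =~ m{^(.*)/[^/]*$};
            push @words, defined $word ? $word : $token;
        }
        push @sentences, \@words;
    }
    return \@sentences;
}

# Demo on an in-memory file handle.
my $text = "The/DT dog/NN barks/VBZ\nIt/PRP runs/VBZ\n";
open(my $fh, '<', \$text) or die "Can't open in-memory file: $!";
my $sentences = read_sentences($fh);
print join(' ', @$_), "\n" for @$sentences;
```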

11
Step 2
  • Go through file and for each word, increase the
    count of its tag
  • for (@ARGV) {
  • my $tag_index = 0;
  • while (<FILE>) {
  • foreach (@word_tags) {
  • my @wordtag = split /\//;
  • my $tag = $wordtag[1];
  • $tags[$tag_index]->{$tag}++;
  • $tag_index++;
  • }
  • }
  • }
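
To make the data structure in Step 2 concrete, the sketch below counts votes over strings instead of files. The sub name count_votes and the sample taggings are made up for illustration; the array of per-position tag hashes mirrors the slide's @tags.

```perl
use strict;
use warnings;

# Hypothetical helper: given several taggings of the same text (one string
# per bag, "word/tag" tokens separated by whitespace), return an array ref
# holding one hash of tag counts per token position.
sub count_votes {
    my (@outputs) = @_;
    my @tags;
    for my $output (@outputs) {
        my $tag_index = 0;    # restart at the first token for every output
        for my $token (split ' ', $output) {
            my ($tag) = $token =~ m{/([^/]*)$};    # text after the last slash
            $tags[$tag_index]{$tag}++ if defined $tag;
            $tag_index++;
        }
    }
    return \@tags;
}

my $votes = count_votes(
    "The/DT dog/NN barks/VBZ",
    "The/DT dog/NN barks/NNS",
    "The/DT dog/VB barks/VBZ",
);
printf "%s=%d\n", $_, $votes->[2]{$_} for sort keys %{ $votes->[2] };
```

This relies on every bag's output containing the same tokens in the same order, since votes are matched purely by position.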

12
Step 3
  • Go through the sentences and print out each
    word/tag pair.
  • my $tag_index = 0;
  • foreach my $sent (@sentences) {
  • foreach my $word (@$sent) {
  • my $tag = max_tag($tags[$tag_index]);
  • $tag_index++; print "$word/$tag ";
  • }
  • print "\n";
  • }

13
Finding the Best Tag
  • Find the tag with the highest count.
  • sub max_tag {
  • my $tag_hash = shift;
  • (my $tag) = keys %$tag_hash;
  • my $tag_count = $tag_hash->{$tag};
  • foreach (keys %$tag_hash) {
  • if ($tag_hash->{$_} > $tag_count) {
  • $tag = $_;
  • $tag_count = $tag_hash->{$tag};
  • }
  • }
  • return $tag;
  • }
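
One property of max_tag worth noting: when two tags have the same count, the winner depends on the order in which keys returns them, and Perl does not guarantee that order. The variant below is a sketch of one way to make ties repeatable; the name max_tag_stable and the alphabetical tie-breaking policy are assumptions, not something the slides specify.

```perl
use strict;
use warnings;

# Variant of max_tag: return the tag with the highest count, breaking
# ties alphabetically (an assumed policy, chosen only for repeatability).
sub max_tag_stable {
    my ($tag_hash) = @_;
    my ($best) = sort {
        $tag_hash->{$b} <=> $tag_hash->{$a}    # higher count first
            or $a cmp $b                       # then alphabetical order
    } keys %$tag_hash;
    return $best;
}

my $clear_winner = max_tag_stable({ NN => 2, VB => 1 });
my $tie_winner   = max_tag_stable({ NN => 1, VB => 1 });
print "$clear_winner $tie_winner\n";
```

Sorting costs more than a single pass over the keys, but with only a handful of candidate tags per position the difference is unlikely to matter.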

14
Procedure
  • Creating Bootstrap samples
  • Treating the file as an array of lines.
  • N random array indices are selected and each
    corresponding line is output to a file
  • Combine_tool.pl
  • opens the file corresponding to its first
    argument
  • reads in all words, aggregated by sentence
  • An array of tag hashes is created.
  • For each file in its arg list, opens that file
    and reads the tags sequentially
  • The hash item corresponding to the tag in the
    appropriate index of the tag array is incremented
  • For each index, the hash label with the highest
    count is chosen as the correct tag
  • Re-associate the tags with their words
  • Print out the word/tag pairs

15
Result