Computer Systems Lab TJHSST Current Projects 2004-2005 First Period

Slides: 133
Provided by: tjh5
Learn more at: https://www.tjhsst.edu

1
Computer Systems Lab
TJHSST
Current Projects 2004-2005
First Period
2
Current Projects, 1st Period
  • Caroline Bauer: Archival of Articles via RSS and
    Datamining Performed on Stored Articles
  • Susan Ditmore: Construction and Application of a
    Pentium II Beowulf Cluster
  • Michael Druker: Universal Problem Solving Contest
    Grader

3
Current Projects, 1st Period
  • Matt Fifer: The Study of Microevolution Using
    Agent-based Modeling
  • Jason Ji: Natural Language Processing Using
    Machine Translation in Creation of a
    German-English Translator
  • Anthony Kim: A Study of Balanced Search Trees:
    Brainforming a New Balanced Search Tree
  • John Livingston: Kernel Debugging User-Space API
    Library (KDUAL)

4
Current Projects, 1st Period
  • Jack McKay: Saber-what? An Analysis of the Use of
    Sabermetric Statistics in Baseball
  • Peden Nichols: An Investigation into
    Implementations of DNA Sequence Pattern Matching
    Algorithms
  • Robert Staubs: Part-of-Speech Tagging with
    Limited Training Corpora
  • Alex Volkovitsky: Benchmarking of Cryptographic
    Algorithms

5
Archival of Articles via RSS and Datamining
Performed on Stored Articles
RSS (Really Simple Syndication, encompassing
Rich Site Summary and RDF Site Summary) is a web
syndication protocol used by many blogs and news
websites to distribute information; it saves
people from having to visit several sites
repeatedly to check for new content. At this
point in time there are many RSS newsfeed
aggregators available to the public, but none of
them performs any sort of archival of information
beyond the RSS metadata. The purpose of this
project is to create an RSS aggregator that will
archive the text of the actual articles linked to
in the RSS feeds in some kind of linkable,
searchable database, and, if all goes well,
implement some sort of datamining capability as
well.
6
Archival of Articles via RSS, and Datamining
Performed on Stored Articles
Caroline Bauer
  • Abstract
  • RSS (Really Simple Syndication, encompassing
    Rich Site Summary and RDF Site Summary) is a web
    syndication protocol used by many blogs and news
    websites to distribute information; it saves
    people from having to visit several sites
    repeatedly to check for new content. At this
    point in time there are many RSS newsfeed
    aggregators available to the public, but none of
    them performs any sort of archival of information
    beyond the RSS metadata. As the linked articles
    may move or be eliminated at some time in the
    future, anyone who wants to be sure of accessing
    them later has to archive them; furthermore,
    should one want to link such collected articles,
    it is far easier to do so if they are archived.
    The purpose of this project is to create an RSS
    aggregator that will archive the text of the
    actual articles linked to in the RSS feeds in
    some kind of linkable, searchable database, and,
    if all goes well, implement some sort of
    datamining capability as well.

7
Archival of Articles via RSS, and Datamining
Performed on Stored Articles
Caroline Bauer
  • Introduction
  • This paper is intended to be a detailed summary
    of all of the author's findings regarding the
    archival of articles in a linkable, searchable
    database via RSS.
  • Background: RSS
  • RSS stands for Really Simple Syndication, a
    syndication protocol often used by weblogs and
    news sites. Technically, RSS is an XML-based
    communication standard that encompasses Rich Site
    Summary (RSS 0.9x and RSS 2.0) and RDF Site
    Summary (RSS 0.9 and 1.0). It enables people to
    gather new information by using an RSS aggregator
    (or "feed reader") to poll RSS-enabled sites for
    new information, so the user does not have to
    manually check each site. RSS aggregators are
    often extensions of browsers or email programs,
    or standalone programs; alternatively, they can
    be web-based, so the user can view their "feeds"
    from any computer with Web access.
  • Archival Options Available in Existing RSS
    Aggregators
  • Data Mining
  • Data mining is the searching out of information
    based on patterns present in large amounts of
    data. //more will be here.
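
As a rough illustration of what an aggregator does after it polls a feed, the sketch below parses an RSS 2.0 document and lists each item's title and link. It is only a sketch under stated assumptions: the sample feed, the `example.com` URLs, and the function name `parse_items` are invented for illustration, and fetching the feed over the network and handling RSS 1.0 (RDF) namespaces are left out.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written RSS 2.0 feed (hypothetical content, for illustration only).
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example News</title>
    <item>
      <title>First story</title>
      <link>http://example.com/articles/1</link>
    </item>
    <item>
      <title>Second story</title>
      <link>http://example.com/articles/2</link>
    </item>
  </channel>
</rss>"""


def parse_items(feed_xml):
    """Return a list of (title, link) pairs for every <item> in an RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        items.append((title, link))
    return items


if __name__ == "__main__":
    for title, link in parse_items(SAMPLE_FEED):
        print(title, "->", link)
```

An archiving aggregator would then follow each link and store the article text, which is the step existing feed readers skip.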

8
Archival of Articles via RSS, and Datamining
Performed on Stored Articles
Caroline Bauer
  • Purpose
  • The purpose of this project is to create an RSS
    aggregator that, in addition to serving as a feed
    reader, obtains the text of the documents linked
    in the RSS feeds and places it into a database
    that is both searchable and linkable. In addition
    to this, the database is intended to reach an
    implementation wherein it performs some manner of
    data mining on the information contained therein;
    the specifics of this have yet to be determined.
  • Development
  • Results
  • Conclusions
  • Summary
  • References
  • 1. "RSS (protocol)." Wikipedia. 8 Jan. 2005. 11
    Jan. 2005 <http://en.wikipedia.org/wiki/RSS_%28protocol%29>.
  • 2. "Data mining." Wikipedia. 7 Jan. 2005. 12 Jan.
    2005 <http://en.wikipedia.org/wiki/Data_mining>.
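
The "searchable and linkable" database described in the Purpose could take many forms; the slides do not say which one the project uses. The following is a minimal sketch, assuming an SQLite table keyed by article URL (so each stored article can be linked to) with a simple substring search over the stored text. The table layout and the function names are hypothetical.

```python
import sqlite3


def open_archive(path=":memory:"):
    """Create (or open) the archive and make sure the articles table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS articles ("
        "url TEXT PRIMARY KEY, title TEXT, body TEXT)"
    )
    return conn


def archive_article(conn, url, title, body):
    """Store the full article text; re-archiving a URL replaces the old copy."""
    conn.execute(
        "INSERT OR REPLACE INTO articles (url, title, body) VALUES (?, ?, ?)",
        (url, title, body),
    )
    conn.commit()


def search(conn, term):
    """Return the URLs of archived articles whose text contains the term."""
    rows = conn.execute(
        "SELECT url FROM articles WHERE body LIKE ? ORDER BY url",
        ("%" + term + "%",),
    )
    return [row[0] for row in rows]
```

A plain `LIKE` match treats `%` and `_` in the search term as wildcards and does no ranking, so a real archive would likely want a full-text index instead.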

9
Construction and Application of a Pentium II
Beowulf Cluster
I plan to construct a supercomputing cluster of
about 15-20 or more Pentium II computers with the
OpenMosix kernel patch. Once constructed, the
cluster could be configured to transparently aid
workstations with computationally expensive jobs
run in the lab. This project would not only
increase the computing power of the lab, but it
would also be an experiment in building a
low-level, low-cost cluster with a stripped-down
version of Linux, useful to any facility with old
computers it would otherwise deem outdated.
10
Construction and Application of a Pentium II
Beowulf Cluster
Susan Ditmore
  • Text version needed
  • (your pdf file won't copy to text)

11
Universal Problem Solving Contest Grader
Michael Druker
(poster needed)
12
Universal Problem Solving Contest Grader
Michael Druker
  • Steps so far
  • Creation of directory structure for the grader,
    the contests, the users, the users' submissions,
    and the test cases.
  • Starting of the main grading script itself.
  • Refinement of directory structure for the grader.
  • Reading of material on the bash scripting language
    to be able to write the various scripts that will
    be necessary.

13
Universal Problem Solving Contest Grader
Michael Druker
  • Current program
  • #!/bin/bash
  • CONDIR="/afs/csl.tjhsst.edu/user/mdruker/techlab/code/new/"
  • # syntax is "grade contest user program"
  • contest=$1
  • user=$2
  • program=$3
  • echo "contest name is " $1
  • echo "user's name is " $2
  • echo "program name is " $3

14
Universal Problem Solving Contest Grader
Michael Druker
  • Current program continued
  • # get the location of the program and the test
    # data
  • # make sure that the contest, user, program are
    # valid
  • PROGDIR=$CONDIR"contests/"$contest"/users/"$user
  • echo "user's directory is" $PROGDIR
  • if [ -d "$PROGDIR" ]
  • then echo "good input"
  • else echo "bad input, directory doesn't exist"
  • exit 1; fi
  • exit 0
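
For comparison, the same input check can be sketched in Python. This is not the grader's code: it only mirrors the path layout visible in the script (`contests/<contest>/users/<user>` under a root directory), and the function names and the throwaway directory names used in the demonstration are hypothetical.

```python
import tempfile
from pathlib import Path


def user_directory(contest_root, contest, user):
    """Build <root>/contests/<contest>/users/<user>, mirroring PROGDIR in the script."""
    return Path(contest_root) / "contests" / contest / "users" / user


def check_input(contest_root, contest, user):
    """Return True when the user's directory exists for that contest."""
    return user_directory(contest_root, contest, user).is_dir()


if __name__ == "__main__":
    # Build a throwaway directory tree so the check can be tried anywhere.
    with tempfile.TemporaryDirectory() as root:
        user_directory(root, "fall", "alice").mkdir(parents=True)
        print(check_input(root, "fall", "alice"))  # existing user
        print(check_input(root, "fall", "bob"))    # missing user
```

Building the path from separate components, rather than by string concatenation, avoids depending on whether the root ends in a slash.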

15
Study of Microevolution Using Agent-Based
Modeling in C++
The goal of the project is to create a program
that uses an agent-environment structure to
imitate a very simple natural ecosystem: one that
includes a single type of species that can move,
reproduce, kill, etc. The
"organisms" will contain genomes (libraries of
genetic data) that can be passed from parents to
offspring in a way similar to that of animal
reproduction in nature. As the agents interact
with each other, the ones with the
characteristics most favorable to survival in the
artificial ecosystem will produce more children,
and over time, the mean characteristics of the
system should start to gravitate towards the
traits that would be most beneficial. This
process, the optimization of physical traits of a
single species through passing on heritable
advantageous genes, is known as microevolution.
16
THE STUDY OF MICROEVOLUTION USING AGENT-BASED
MODELING
Matt Fifer
  • Abstract
  • The goal of the project is to create a program
    that uses an agent-environment structure to
    imitate a very simple natural ecosystem: one that
    includes a single type of species that can move,
    reproduce, kill, etc. The "organisms" will
    contain genomes (libraries of genetic data) that
    can be passed from parents to offspring in a way
    similar to that of animal reproduction in nature.
    As the agents interact with each other, the ones
    with the characteristics most favorable to
    survival in the artificial ecosystem will produce
    more children, and over time, the mean
    characteristics of the system should start to
    gravitate towards the traits that would be most
    beneficial. This process, the optimization of
    physical traits of a single species through
    passing on heritable advantageous genes, is known
    as microevolution.

17
THE STUDY OF MICROEVOLUTION USING AGENT-BASED
MODELING
Matt Fifer
  • Purpose
  • One of the most controversial topics in science
    today is the debate of creationism vs. Darwinism.
    Advocates for creationism believe that the world
    was created according to the description detailed
    in the 1st chapter of the book of Genesis in the
    Bible. The Earth is approximately 6,000 years
    old, and it was created by God, followed by the
    creation of animals and finally the creation of
    humans, Adam and Eve. Darwin and his followers
    believe that from the moment the universe was
    created, all the objects in that universe have
    been in competition. Everything - from the
    organisms that make up the global population, to
    the cells that make up those organisms, to the
    molecules that make up those cells - has beaten all
    of its competitors in the struggle for resources
    commonly known as life.

18
THE STUDY OF MICROEVOLUTION USING AGENT-BASED
MODELING
Matt Fifer
  • This project will attempt to model the day-to-day
    war between organisms of the same species.
    Organisms, or agents, that can move, kill, and
    reproduce will be created and placed in an
    ecosystem. Each agent will include a genome that
    codes for its various characteristics. Organisms
    that are more successful at surviving or more
    successful at reproducing will pass their genes
    to their children, making future generations
    better suited to the environment. The competition
    will continue, generation after generation, until
    the simulation terminates. If evolution has
    occurred, the characteristics of the population
    at the end of the simulation should be markedly
    different than at the beginning.

19
THE STUDY OF MICROEVOLUTION USING AGENT-BASED
MODELING
Matt Fifer
  • Background
  • Two of the main goals of this project are the
    study of microevolution and the effects of
    biological mechanisms on this process. Meiosis,
    the formation of gametes, controls how genes are
    passed from parents to their offspring. In the
    first stage of meiosis, prophase I, the strands
    of DNA floating around the nucleus of the cell
    are wrapped around histone proteins to form
    chromosomes. Chromosomes are easier to work with
    than the strands of chromatin, as they are
    packaged tightly into an "X" structure (two ">"s
    connected at the centromere). In the second
    phase, metaphase I, chromosomes pair up along the
    equator of the cell, with homologous chromosomes
    being directly across from each other.
    (Homologous chromosomes code for the same traits,
    but come from different parents, and thus code
    for different versions of the same trait.) The
    pairs of chromosomes, called tetrads, are
    connected and exchange genetic material.

20
THE STUDY OF MICROEVOLUTION USING AGENT-BASED
MODELING
Matt Fifer
  • This process, called crossing over, results in
    both of the chromosomes being a combination of
    genes from the mother and the father. Whole genes
    swap places, not individual nucleotides. In the
    third phase, anaphase I, fibers from within the
    cell pull the pair apart. When the pairs are
    pulled apart, the two chromosomes are put on
    either side of the cell. Each pair is split
    randomly, so for each pair, there are two
    possible outcomes. For instance, the paternal
    chromosome can either move to the left or right
    side of the cell, with the maternal chromosome
    moving to the opposite end. In telophase I, the
    two sides of the cell split into two individual
    cells. Thus, for each cell undergoing meiosis,
    there are 2^n possible gametes, where n is the
    number of chromosome pairs. With crossing
    over, there is an almost unlimited number of
    combinations of genes in the gametes. This large
    number of combinations is the reason for the
    genetic diversity that exists in the world
    today, even within a single species. For example, there are
    6 billion humans on the planet, and none of them
    is exactly the same as another one.
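
The 2^n count from independent assortment can be checked by brute force for small n. The sketch below ignores crossing over and simply enumerates every way of taking one chromosome from each of n homologous pairs; the labels "M" (maternal) and "P" (paternal) and the function name are assumptions made for illustration.

```python
from itertools import product


def possible_gametes(n_pairs):
    """List every way to pick one chromosome ('M' or 'P') from each homologous pair."""
    return ["".join(choice) for choice in product("MP", repeat=n_pairs)]


if __name__ == "__main__":
    for n in (1, 2, 3):
        gametes = possible_gametes(n)
        print(n, len(gametes), gametes)
```

Each added pair doubles the list, which is why the count grows as 2^n.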

21
THE STUDY OF MICROEVOLUTION USING AGENT-BASED
MODELING
Matt Fifer
  • Procedure
  • This project will be implemented with a matrix
    of agents. The matrix, initialized with only
    empty spaces, will be seeded with organisms by an
    Ecosystem class. Each agent in the matrix will
    have a genome, which will determine how it
    interacts with the Ecosystem. During every step
    of the simulation, an organism will choose
    whether to (1) do nothing, (2) move to an empty
    adjacent space, (3) kill an organism in a
    surrounding space, or (4) reproduce with an
    organism in an adjacent space. The likelihood of
    the organism performing any of these tasks is
    determined by the organism's personal variables,
    which will be coded for by the organism's genome.
    While the simulation is running, the average
    characteristics of the population will be
    measured. In theory, the mean value of each of
    the traits (speed, agility, strength, etc.)
    should either increase with time or gravitate
    towards a particular, optimum value.
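
One way to turn genome-coded tendencies into a per-step decision is a weighted random choice. The sketch below is an assumption about how such a choice could work, not the project's code: the action names, the order of the weights, and `choose_action` are all hypothetical stand-ins for the organism's personal variables.

```python
import random

# Actions an organism can take in one simulation step (names are illustrative).
ACTIONS = ["idle", "move", "kill", "reproduce"]


def choose_action(weights, rng=random):
    """Pick one action with probability proportional to its weight.

    `weights` lists one non-negative number per entry of ACTIONS, standing in
    for the tendencies that the organism's genome codes for.
    """
    return rng.choices(ACTIONS, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so repeated runs give the same picks
    print([choose_action([1, 2, 1, 4], rng) for _ in range(10)])
```

An action with weight zero is never chosen, so a gene that drives a tendency to zero switches that behavior off entirely.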

22
THE STUDY OF MICROEVOLUTION USING AGENT-BASED
MODELING
Matt Fifer
  • At its most basic level, the program written to
    model microevolution is an agent-environment
    program. The agents, or members of the Organism
    class, contain a genome and have abilities that
    are dependent upon the genome. Here is the
    declaration of the Organism class:
  • class Organism {
    public:
        Organism();  //constructors
        Organism(int ident, int row2, int col2);
        Organism(Nucleotide *mDNA, Nucleotide *dDNA, int ident,
                 bool malefemale, int row2, int col2);
        ~Organism();  //destructor
        void printGenome();
        void meiosis(Nucleotide *gamete);
        Organism reproduce(Organism mate, int ident, int r, int c);
        int Interact(Organism *neighbors, int nlen);
        ...

23
THE STUDY OF MICROEVOLUTION USING AGENT-BASED
MODELING
Matt Fifer
  •     //assigns a gene a numeric value
        int Laziness();  //accessor functions
        int Rage();
        int SexDrive();
        int Activity();
        int DeathRate();
        int ClausIndex();
        int Age();
        int Speed();
        int Row();
        int Col();
        int PIN();
        bool Interacted();
        bool Gender();
        void setPos(int row2, int col2);
        void setInteracted(bool interacted);
    private:
        void randSpawn(Nucleotide *DNA, int size);  //randomly generates a genome
        Nucleotide *mom, *dad;  //genome
        int ID, row, col, laziness, rage, sexdrive, activity,
            deathrate, clausindex, speed;  //personal characteristics
        double age;
        bool male, doneStuff;
        ...

24
THE STUDY OF MICROEVOLUTION USING AGENT-BASED
MODELING
Matt Fifer
  • The agents are managed by the environment class,
    known as Ecosystem. The Ecosystem contains a
    matrix of Organisms.
  • Here is the declaration of the Ecosystem class:
  • class Ecosystem {
    public:
        Ecosystem();  //constructors
        Ecosystem(double oseed);
        ~Ecosystem();  //destructor
        void Run(int steps);  //the simulation
        void printMap();
        void print(int r, int c);
        void surrSpaces(Organism *neighbors, int r, int c,
                        int friends);  //the neighbors of any cell
    private:
        Organism **Population;  //the matrix of Organisms
    };

25
THE STUDY OF MICROEVOLUTION USING AGENT-BASED
MODELING
Matt Fifer
  • The simulation runs for a predetermined number of
    steps within the Ecosystem class. During every
    step of the simulation, the environment class
    cycles through the matrix of agents, telling each
    one to interact with its neighbors. To aid in the
    interaction, the environment sends the agent an
    array of the neighbors that it can affect. Once
    the agent has changed (or not changed) the array
    of neighbors, it sends the array back to the
    environment which then updates the matrix of
    agents. Here is the code for the Organism
    function that enables it to interact with its
    neighbors:

26
THE STUDY OF MICROEVOLUTION USING AGENT-BASED
MODELING
Matt Fifer
  • int Organism::Interact(Organism *neighbors, int nlen)
    //returns 0 if the organism hasn't moved, 1 if it has
    {
        fout << row << " " << col << " ";
        if(!ID) {  //This Organism is not an organism
            fout << "Not an organism, cannot interact!" << endl;
            return 0;
        }
        if(doneStuff) {  //This Organism has already interacted once this step
            fout << "This organism has already interacted!" << endl;
            return 0;
        }
        doneStuff = true;
        int loop;
        for(loop = 0; loop < GENES * CHROMOSOMES * GENE_LENGTH; loop++) {
            if(rand() % RATE_MAX < MUTATION_RATE)
                mom[loop] = (Nucleotide)(rand() % 4);
            if(rand() % RATE_MAX < MUTATION_RATE)
            ...
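
The mutation loop at the end of that function can be illustrated in isolation. The sketch below is a Python stand-in, not a translation of the project's code: it represents a genome as a string over "ACGT" (an assumption; the original uses a Nucleotide type whose values are not shown) and re-draws each position with a given probability.

```python
import random

NUCLEOTIDES = "ACGT"  # assumed alphabet; the original Nucleotide values are not shown


def mutate(genome, rate, rng=random):
    """Return a copy of the genome where each base is re-drawn with probability `rate`."""
    mutated = []
    for base in genome:
        if rng.random() < rate:
            # A re-drawn base may happen to equal the old one.
            mutated.append(rng.choice(NUCLEOTIDES))
        else:
            mutated.append(base)
    return "".join(mutated)


if __name__ == "__main__":
    rng = random.Random(7)  # fixed seed so repeated runs give the same result
    print(mutate("ACGTACGTACGT", 0.25, rng))
```

Because the replacement base is drawn from all four values, roughly a quarter of the "mutations" leave the position unchanged, so the effective mutation rate is somewhat lower than `rate`.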

27
THE STUDY OF MICROEVOLUTION USING AGENT-BASED
MODELING
Matt Fifer
  • The Organisms, during any simulation step, can
    either move, kill a neighbor, remain idle,
    reproduce, or die. The fourth option,
    reproduction, is the most relevant to the
    project. As explained before, organisms that are
    better at reproducing or surviving will pass
    their genes to future generations. The most
    critical function in reproduction is the meiosis
    function, which determines what traits are passed
    down to offspring. The process is completely
    random, but an organism with a "good" gene has
    about a 50% chance of passing that gene on to its
    child. Here is the meiosis function, which
    determines what genes each organism sends to its
    offspring:
  • void Organism::meiosis(Nucleotide *gamete)
    {
        int x, genect, chromct, crossover;
        Nucleotide *chromo = new Nucleotide[GENES * GENE_LENGTH],
                   *chromo2 = new Nucleotide[GENES * GENE_LENGTH];
        Nucleotide *gene = new Nucleotide[GENE_LENGTH],
                   *gene2 = new Nucleotide[GENE_LENGTH];
        ... (more code)
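
Since the rest of the meiosis function is not shown, the following is only a sketch of the general technique the text describes: crossing over that swaps whole genes, followed by a random choice of one chromosome from each pair. The data layout (a list of chromosomes, each a list of genes) and the single cut point are assumptions, not the project's implementation.

```python
import random


def meiosis(mom, dad, rng=random):
    """Build one gamete from an organism's maternal and paternal genome copies.

    `mom` and `dad` are lists of chromosomes; each chromosome is a list of genes.
    """
    gamete = []
    for maternal, paternal in zip(mom, dad):
        # Crossing over: swap whole genes after a random cut point.
        cut = rng.randint(0, len(maternal))
        recombinant_a = maternal[:cut] + paternal[cut:]
        recombinant_b = paternal[:cut] + maternal[cut:]
        # Independent assortment: either recombinant is equally likely.
        gamete.append(rng.choice([recombinant_a, recombinant_b]))
    return gamete


if __name__ == "__main__":
    mom = [["m1", "m2", "m3"], ["m4", "m5"]]
    dad = [["d1", "d2", "d3"], ["d4", "d5"]]
    print(meiosis(mom, dad, random.Random(3)))
```

Every gene position in the gamete comes from one parent or the other with equal probability, which matches the roughly 50% chance of passing on any particular gene.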

28
THE STUDY OF MICROEVOLUTION USING AGENT-BASED
MODELING
Matt Fifer
  • The functions and structures above are the most
    essential to the running of the program and the
    actual study of microevolution. At the end of
    each simulation step, the environment class
    records the statistics for the agents in the
    matrix and puts the numbers into a spreadsheet
    for analysis. The spreadsheet can be used to
    observe trends in the mean characteristics of the
    system over time. Using the spreadsheet created
    by the environment class, I was able to create
    charts that would help me analyze the evolution
    of the Organisms over the course of the
    simulation.
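
The per-step statistics could be recorded as rows of a CSV file, which any spreadsheet program can open. The sketch below assumes each organism is represented as a dictionary of trait values; the trait names and function names are illustrative only, and an empty population is not handled.

```python
import csv
import io


def mean_traits(population, traits):
    """Average each named trait over the population (a non-empty list of dicts)."""
    return {t: sum(org[t] for org in population) / len(population) for t in traits}


def write_step(writer, step, population, traits):
    """Append one spreadsheet row: step number, population size, then the trait means."""
    means = mean_traits(population, traits)
    writer.writerow([step, len(population)] + [means[t] for t in traits])


if __name__ == "__main__":
    traits = ["rage", "sexdrive"]
    population = [{"rage": 2, "sexdrive": 8}, {"rage": 4, "sexdrive": 6}]
    out = io.StringIO()  # stands in for a file opened with newline=""
    writer = csv.writer(out)
    writer.writerow(["step", "population"] + traits)
    write_step(writer, 0, population, traits)
    print(out.getvalue())
```

Writing one row per step keeps the file in the shape a chart needs: steps down the rows, one column per characteristic.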

29
THE STUDY OF MICROEVOLUTION USING AGENT-BASED
MODELING
Matt Fifer
  • The first time I ran the simulation, I set the
    program so that there was no mutation in the
    agents' genomes. Genes were strictly created at
    the outset of the program, and those genes were
    passed down to future generations. If
    microevolution were to take place, a gene that
    coded for a beneficial characteristic would have
    a higher chance of being passed down to a later
    generation. Without mutation, however, if one
    organism possessed a characteristic that was far
    superior to the comparable characteristics of
    other organisms, that gene should theoretically
    allow that organism to "dominate" the other
    organisms and pass its genetic material to many
    children, in effect exterminating the genes that
    code for less beneficial characteristics.

30
THE STUDY OF MICROEVOLUTION USING AGENT-BASED
MODELING
Matt Fifer
  • For example, if an organism was created that had
    a 95% chance of reproducing in a given simulation
    step, it would quickly pass its genetic material
    to a lot of offspring, until its gene was the
    only one left coding for reproductive tendency,
    or libido.

31
THE STUDY OF MICROEVOLUTION USING AGENT-BASED
MODELING
Matt Fifer
  • As you can see from Figure 1, the average
    tendency to reproduce increases during the
    simulation. The tendency to die decreases to
    almost nothing. The tendency to remain
    still, since it has relatively little effect on
    anything, stays almost constant. The tendency to
    move to adjacent spaces, thereby spreading one's
    genes throughout the ecosystem, increases to be
    almost as likely as reproduction. The tendency to
    kill one's neighbor decreases drastically,
    probably because it does not positively benefit
    the murdering organism. In Figure 2, we can see
    that the population seems to stabilize at about
    the same time as the average characteristics.
    This would suggest that there was a large amount
    of competition among the organisms early in the
    simulation, but the competition quieted down as
    one dominant set of genes took over the
    ecosystem.

32
THE STUDY OF MICROEVOLUTION USING AGENT-BASED
MODELING
Matt Fifer
  • Figure 4
  • These figures show the results from the
    second run of the program, when mutation was
    turned on. As you can see, many of the same
    trends exist, with reproductive tendency
    skyrocketing and tendency to kill plummeting.
    Upon reevaluation, it seems that perhaps the
    tendencies to move and remain idle do not really
    affect an agent's ability to survive, and thus their
    trends are more subject to fluctuations that
    occur in the beginning of the simulation. One
    thing to note about the mutation simulation is
    the larger degree of fluctuation in both
    characteristics and population. The population
    stabilizes at about the same number, but swings
    between simulation steps are more pronounced. In
    Figure 3, the stabilization that had occurred in
    Figure 1 is largely not present.

33
THE STUDY OF MICROEVOLUTION USING AGENT-BASED
MODELING
Matt Fifer
  • Conclusion
  • The goal of this project at the outset was to
    create a system that modeled trends and processes
    from the natural world, using the same mechanisms
    that occur in that natural world. While this
    project by no means definitively proves the
    correctness of Darwin's theory of evolution over
    the creationist theory, it demonstrates some of
    the basic principles that Darwin addressed in his
    book, The Origin of Species. Darwin addresses two
    distinct processes--natural selection and
    artificial selection. Artificial selection, or
    selective breeding, was not present in this
    project at all. There was no point in the program
    where the user was allowed to pick organisms that
    survived. Natural selection, though it is a
    stretch because nature was the inside of a
    computer, was simulated. Natural selection,
    described as the "survival of the fittest," is
    when an organism's characteristics enable it to
    survive and pass those traits to its offspring.

34
THE STUDY OF MICROEVOLUTION USING AGENT-BASED
MODELING
Matt Fifer
  • In this program, "nature" was allowed to run its
    course, and at the end of the simulation, the
    organisms with the best combination of
    characteristics had triumphed over their
    predecessors. "Natural" selection occurred as
    predicted.
  • All of the information in this report was either
    taught in A.P. Biology last year or, to a small
    degree, drawn from Charles Darwin's The Origin of
    Species. I created all of the code and all of the
    charts in this paper. For my next draft, I will
    be sure to include more outside information that
    I have found in the course of my research.

35
Using Machine Translation in a German-English
Translator
This project attempts to take the
beginning steps towards the goal of creating a
translator program that operates within the scope
of translating between English and German.
36
Natural Language Processing Using Machine
Translation in Creation of a German-English
Translator
Jason Ji
  • Abstract
  • The field of machine translation - using
    computers to provide translations between human
    languages - has been around for decades, and the
    dream of an ideal machine providing a perfect
    translation between languages has been around
    still longer. This project attempts to take the
    beginning steps towards that goal, creating a
    translator program that operates within an
    extremely limited scope to translate between
    English and German. There are several different
    strategies for machine translation, and this
    project will look into them - but the strategy
    taken in this project will be the researcher's
    own, with the general guideline of "thinking as a
    human."

37
Natural Language Processing Using Machine
Translation in Creation of a German-English
Translator
Jason Ji
  • For if humans can translate between languages,
    there must be something to how we do it, and
    hopefully that something - that thought process
    - can be transferred to the machine and
    provide quality translations.
  • Background
  • There are several methods of varying difficulty
    and success for machine translation. The best
    method to use depends on what sort of system is
    being created. A bilingual system translates
    between one pair of languages; a multilingual
    system translates between more than two languages.

38
Natural Language Processing Using Machine
Translation in Creation of a German-English
Translator
Jason Ji
  • The easiest translation method to code, yet
    probably the least successful, is known as the direct
    approach. The direct approach does what it sounds
    like it does: it takes the input language (known as
    the "source language"), performs morphological
    analysis (whereby words are broken down and
    analyzed for things such as prefixes and past
    tense endings), performs a bilingual dictionary
    look-up to determine the words' meanings in the
    target language, performs a local reordering to
    fit the grammar structure of the target language,
    and produces the target language output. The
    problem with this approach is that it is
    essentially a word-for-word translation with some
    reordering, resulting often in mistranslations
    and incorrect grammar structures.
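
The direct approach can be sketched in a few lines. Everything specific here is an assumption made for illustration: the toy dictionary covers only one example sentence, lowercasing and punctuation stripping stand in for real morphological analysis, and the single reordering rule (move the past participle to the end of the clause) is just one example of a local reordering.

```python
import string

# Toy bilingual dictionary (hypothetical entries chosen for one example sentence).
DICTIONARY = {
    "i": "ich",
    "have": "habe",
    "seen": "gesehen",
    "the": "den",
    "dog": "Hund",
}
# Target-language words that this sketch moves to the end of the clause.
PARTICIPLES = {"gesehen"}


def direct_translate(sentence):
    """Word-for-word translation followed by one local reordering rule."""
    # Stand-in for morphological analysis: lowercase and strip punctuation.
    words = [w.strip(string.punctuation).lower() for w in sentence.split()]
    # Bilingual dictionary look-up; unknown words pass through unchanged.
    translated = [DICTIONARY.get(w, w) for w in words]
    # Local reordering: German places the past participle last.
    verbs = [w for w in translated if w in PARTICIPLES]
    others = [w for w in translated if w not in PARTICIPLES]
    return " ".join(others + verbs)


if __name__ == "__main__":
    print(direct_translate("I have seen the dog."))
```

The weakness the slide describes shows up immediately: mapping "the" to "den" is only right for a masculine noun in the accusative case, so the same table would mistranslate most other sentences.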

39
Natural Language Processing Using Machine
Translation in Creation of a German-English
Translator
Jason Ji
  • Furthermore, when creating a multilingual system,
    the direct approach would require several
    different translation algorithms - one or two for
    each language pair. The indirect approach
    involves some sort of intermediate representation
    of the source language before translating into
    the target language. In this way, linguistic
    analysis of the source language can be performed
    on the intermediate representation. Translating
    to the intermediary also enables semantic
    analysis, as the source language input can be
    examined more carefully to detect idioms, etc., which can
    be stored in the intermediary and then
    appropriately used to translate into the target
    language.

40
Natural Language Processing Using Machine
Translation in Creation of a German-English
Translator
Jason Ji
  • The transfer method is similar, except that the
    transfer is language dependent - that is to say,
    the French-English intermediary transfer would be
    different from the English-German transfer. An
    interlingua intermediary can be used for
    multilingual systems.
  • Theory
  • Humans fluent in two or more languages are at the
    moment better translators than the best machine
    translators in the world. Indeed, a person with
    three years of experience in learning a second
    language will already be a better translator than
    the best machine translators in the world as
    well.

41
Natural Language Processing Using Machine
Translation in Creation of a German-English
Translator
Jason Ji
  • Yet for humans and machines alike, translation is
    a process, a series of steps that must be
    followed in order to produce a successful
    translation. It is interesting to note, however,
    that the various methods of translation for
    machines - the various processes - become less
    and less like the process for humans as they
    become more complicated. Furthermore, it was
    interesting to notice that as the method of
    machine translation becomes more complicated, the
    results are sometimes less accurate than the
    results of simpler methods that better model the
    human rationale for translation.

42
Natural Language Processing Using Machine
Translation in Creation of a German-English
Translator
Jason Ji
  • Therefore, the theory is that an algorithm that
    attempts to model the human translation process
    would be more successful than other, more
    complicated methods currently in
    development. This theory is not entirely plausible for
    full-scale translators because of the sheer
    magnitude of data that would be required. Humans
    are better translators than computers in part
    because they have the ability to perform semantic
    analysis, because they have the necessary
    semantic information to be able to, for example,
    determine the difference in a word's definition
    based on its usage in context. Creating a
    translator with a limited scope of vocabulary
    would require less data, leaving more room for
    semantic information to be stored along with
    definitions.

43
Natural Language Processing Using Machine
Translation in Creation of a German-English
Translator
Jason Ji
  • A limited-scope translator may seem of little use at
    first glance, but even humans fluent in any
    language, including their native language, don't
    know the entire vocabulary of the language. A
    language has hundreds of thousands of words, and
    no human knows even half of them all. A computer
    with a vocabulary of commonly used words that
    most people know, along with information to avoid
    semantic problems, would therefore be still
    useful for nonprofessional work.
  • Development
  • On the most superficial level, a translator is
    more user-friendly for an average person if it is
    GUI-based, rather than simply text-based. This
    part of the development is finished. The program
    presents a GUI for the user.

44
Natural Language Processing Using Machine
Translation in Creation of a German-English
Translator
Jason Ji
  • A JFrame opens with two text areas and a
    Translate button. The text areas are labeled
    "English" and "German". The input text is typed
    into the English text area, the "Translate"
    button is clicked, and the translator, once
    finished, outputs the translated text into the
    German text area. Although typing into the German
    text area is possible, its contents do not affect
    the translation process. The first problem in
    creating a machine translator is recognizing the
    words that are entered into the system. One or
    more sentences are entered into the translator,
    and a string containing the entire input is
    passed to the translate() function.

45
  • The system loops through the string, finding all
    space (' ') characters and punctuation characters
    (comma, period, etc.), and records their
    positions. (It is important to note the position
    of each punctuation mark, as well as what kind of
    punctuation mark it is, because the existence and
    position of punctuation marks alter the meaning
    of a sentence.)
  • The number of words in the sentence is taken to
    be the number of spaces plus one. By recording
    the position of each space, the string can then
    be broken up into words. The first word starts at
    the beginning of the string; every other word
    starts one position after the preceding space and
    ends at the next space (or at the end of the
    string). This means that punctuation at the end
    of any given word is placed into the String with
    that word, but this is not

46
  • a problem: the location of each punctuation mark
    is already recorded, and the dictionary look-up
    of each word first checks that the last character
    of the word is a letter; if not, it simply
    disregards the last character. The next problem
    is the biggest of all: the actual translation
    itself. No code has been written for this yet,
    but development of pseudocode has begun. As
    previously mentioned, translation is a process.
    In order to write a translator program that
    follows the human translation process, the human
    process must first be recognized and broken down
    into programmable steps. This is no easy task.
    Humans with five years of experience

47
  • in learning a language may already translate a
    given text so quickly, apart from the time spent
    looking up unfamiliar words, that the process
    goes by too fast to take note of fully. The basic
    process is not entirely determined yet, but there
    is some progress on it. The method used to
    determine the process has been as follows: given
    a random sentence to translate, the sentence is
    first translated by a human, and then the steps
    taken are noted. Each successive sentence is more
    difficult to translate than the last.
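
The word-splitting step described earlier (split the input on spaces, then ignore a trailing non-letter before dictionary look-up) can be sketched as follows. This is a minimal illustration in C, not the project's own code: the name split_words, the fixed buffer sizes, and the assumption that words are separated by exactly one space are all hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_WORDS 64      /* hypothetical limit on words per input */
#define MAX_WORD_LEN 32   /* hypothetical limit on word length */

/* Split `text` on single spaces into `words` and return the word count
 * (number of spaces plus one). A trailing non-letter such as a comma or
 * period is dropped from each word, mirroring the check made before
 * dictionary look-up. */
size_t split_words(const char *text, char words[][MAX_WORD_LEN],
                   size_t max_words)
{
    assert(text != NULL);
    size_t count = 0;
    size_t start = 0;                 /* start index of the current word */
    size_t len = strlen(text);

    for (size_t i = 0; i <= len && count < max_words; i++) {
        if (i == len || text[i] == ' ') {
            size_t end = i;           /* one past the word's last character */
            /* Disregard a final character that is not a letter. */
            if (end > start && !isalpha((unsigned char)text[end - 1]))
                end--;
            size_t n = end - start;
            if (n >= MAX_WORD_LEN)    /* truncate overly long words */
                n = MAX_WORD_LEN - 1;
            memcpy(words[count], text + start, n);
            words[count][n] = '\0';
            count++;
            start = i + 1;            /* next word begins after the space */
        }
    }
    return count;
}
```

Calling split_words on "I ate an apple." yields the four words I, ate, an, apple. A fuller version would also record the position and kind of each punctuation mark, as the slides describe.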

48
  • For example, the sentence "I ate an apple" is
    translated via the following process:
    1) Find the subject and the verb. (I ate)
    2) Determine the tense and form of the verb.
       (ate: past, Imperfekt form)
       a) Translate subject and verb. (Ich ass)
          (Note: "ass" is a real German verb form.)
    3) Determine what the verb requires. (ate: "eat"
       requires a direct object)
    4) Find what the verb requires in the sentence.
       (the direct object comes after the verb and
       article: apple)
    5) Translate the article and the direct object.
       (ein Apfel)
    6) Consider the gender of the direct object and
       change the article if necessary. (der Apfel:
       ein becomes einen)
    Result: Ich ass einen Apfel.
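
The last step of this example depends on the gender of the direct object. A minimal sketch of that single rule follows, written in C as an illustration and not taken from the project. The enum Gender and the function name are hypothetical, and only the indefinite article in the accusative case is covered.

```c
#include <assert.h>

enum Gender { MASCULINE, FEMININE, NEUTER };

/* Return the German indefinite article for a direct object (accusative
 * case). Only the masculine form differs from the nominative:
 * "ein" becomes "einen". */
const char *accusative_indefinite_article(enum Gender g)
{
    switch (g) {
    case MASCULINE: return "einen";   /* der Apfel -> einen Apfel */
    case FEMININE:  return "eine";
    case NEUTER:    return "ein";
    }
    assert(0 && "unknown gender");
    return "";
}
```

Because "Apfel" is masculine, the function returns "einen" for it, which gives the object phrase "einen Apfel" in the example sentence.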

49
  • References
  • (I'll put these in proper bibliomumbo
    jumbographical order later!)
  • 1. http://dict.leo.org (dictionary)
    2. "An Introduction To Machine Translation"
       (available online at
       http://ourworld.compuserve.com/homepages/WJHutchins/IntroMT-TOC.htm)
    3. http://www.comp.leeds.ac.uk/ugadmit/cogsci/spchlan/machtran.htm
       (some info on machine translation)
    4.

50
A Study of Balanced Search Trees
This project investigates four different balanced
search trees for their advantages and disadvantages,
and thus ultimately their efficiency. Run time and
memory space management are the two main aspects
under study. Statistical analysis is provided to
distinguish subtle differences, if there are any. A
new balanced search tree is suggested and compared
with the four balanced search trees.
51
A Study of Balanced Search Trees: Brainforming a
New Balanced Search Tree
Anthony Kim
  • Abstract
  • This project investigates four different balanced
    search trees for their advantages and
    disadvantages, and thus ultimately their
    efficiency. Run time and memory space management
    are the two main aspects under study. Statistical
    analysis is provided to distinguish subtle
    differences, if there are any. A new balanced
    search tree is suggested and compared with the
    four balanced search trees under study. The
    balanced search trees are implemented in C,
    making extensive use of pointers and structs.

52
  • Introduction
  • Balanced search trees are important data
    structures. A normal binary search tree has some
    disadvantages, specifically its dependence on the
    order of the incoming data, which significantly
    affects its structure and hence its performance.
    The height of a search tree is the maximum
    distance from the root of the tree to a leaf. An
    optimal search tree is one that minimizes its
    height for a given number of data items. To
    reduce height, and thus improve efficiency,
    balanced search trees have been developed that
    rebalance themselves into near-optimal structures
    that allow quicker access to the data stored in
    them. For example, a red-black tree is a balanced
    binary tree that balances according to the color
    pattern of its nodes (red or black) by means of
    rotation functions.

53
  • Rotation functions are a hallmark of nearly all
    balanced search trees: they rotate, or adjust
    subtree heights, around a pivot node. Many
    balanced trees have been suggested and developed:
    red-black tree, AVL tree, weight-balanced tree,
    B-tree, and more.
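
A rotation around a pivot node can be sketched in a few lines of C. This is a minimal illustration under assumed names (struct Node, rotate_left, rotate_right), not the project's code. Both functions preserve the left-to-right ordering of keys while moving one child up a level.

```c
#include <assert.h>
#include <stddef.h>

struct Node {
    int key;
    struct Node *left;
    struct Node *right;
};

/* Left rotation around `pivot`: the right child moves up to become the
 * subtree root and `pivot` becomes its left child. Returns the new root. */
struct Node *rotate_left(struct Node *pivot)
{
    assert(pivot != NULL && pivot->right != NULL);
    struct Node *up = pivot->right;
    pivot->right = up->left;   /* keys between pivot and up stay between them */
    up->left = pivot;
    return up;
}

/* Right rotation: the mirror image of rotate_left. */
struct Node *rotate_right(struct Node *pivot)
{
    assert(pivot != NULL && pivot->left != NULL);
    struct Node *up = pivot->left;
    pivot->left = up->right;
    up->right = pivot;
    return up;
}
```

The caller is expected to store the returned pointer where the pivot used to hang, for example `parent->left = rotate_left(parent->left);`.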

54
  • Background Information
  • Search Tree Basics
  • This project requires a good understanding of
    binary trees and general search tree basics. A
    binary tree has nodes and edges. Nodes are the
    elements in the tree, and edges represent the
    relationship between two nodes. Each node in a
    binary tree is connected by edges to between zero
    and two child nodes. In a general search tree, a
    node can have more than two children, as in the
    case of a B-tree. A node is called a parent, and
    the nodes connected by edges below it are called
    its children. A node with no children is called a
    leaf node. An easy way to visualize a binary tree
    is as a real tree drawn upside down on paper,
    with the root at the top and the branches at the
    bottom.

55
  • The topmost ancestor in a binary tree is called
    the root. From the root, the tree branches out to
    its immediate children and subsequent
    descendants. Each node's children are designated
    the left child and the right child. One property
    of a binary search tree is that the value stored
    in the left child is less than or equal to the
    value stored in the parent. The right child's
    value, on the other hand, is greater than the
    parent's.
    ($\text{Left} \le \text{Parent} < \text{Right}$)

56
  • 3.2 Search Tree Functions
  • Several main functions go along with binary trees
    and general search trees: insertion, deletion,
    search, and traversal. In insertion, a value
    entered into the search tree is first compared
    with the root. If the value is less than or equal
    to the root's, the insertion function proceeds to
    the left child of the root and compares again.
    Otherwise the function proceeds to the right
    child and compares the value with that node's.
    When the function reaches the end of the tree,
    that is, when the child it should proceed to does
    not exist (for example, below a leaf node), a new
    node is created at that position with the
    inserted value. The deletion function works
    similarly to find a node with the value of
    interest (by going left and right accordingly).

57
  • Then the function deletes the node and fixes the
    tree (changing parent-child relationships, etc.)
    to keep the property of a binary search tree or
    that of a general search tree. The search
    function, which is basically data retrieval, is
    also similar. While traversing down the tree
    (starting from the root), two cases are possible.
    If the value of interest is encountered during
    the traversal, the function replies that the
    value is in the tree. If the traversal ends at a
    leaf node without encountering the value, the
    function reports that it is not. There are three
    kinds of traversal functions to show the
    structure of a tree: preorder, inorder, and
    postorder.
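
Insertion and search as described can be sketched in C as follows. This is a minimal, unbalanced illustration, not the project's code; the names struct Node, insert, and search are assumptions, and deletion is omitted because the slides do not spell out how the tree is repaired afterward.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct Node {
    int key;
    struct Node *left, *right;
};

/* Insert `key`: go left when key <= the node's key, otherwise go right,
 * and create a new node where the walk falls off the tree.
 * Returns the (possibly new) root of the subtree. */
struct Node *insert(struct Node *root, int key)
{
    if (root == NULL) {
        struct Node *n = malloc(sizeof *n);
        assert(n != NULL);
        n->key = key;
        n->left = n->right = NULL;
        return n;
    }
    if (key <= root->key)
        root->left = insert(root->left, key);
    else
        root->right = insert(root->right, key);
    return root;
}

/* Report whether `key` is stored somewhere in the tree. */
bool search(const struct Node *root, int key)
{
    while (root != NULL) {
        if (key == root->key)
            return true;
        root = (key < root->key) ? root->left : root->right;
    }
    return false;   /* fell off the tree without finding the key */
}
```

Because equal keys go left on insertion, a duplicate ends up in the left subtree of the node that already holds the same value.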

58
  • They are recursive functions that print the data
    in a particular order. For example, in preorder
    traversal, as the prefix "pre" suggests, first
    the value of the node is printed, then the
    function recurses into the left subtree and then
    into the right subtree. Similarly, in inorder
    traversal, as the prefix "in" suggests, first the
    left subtree is output, then the node's value,
    then the right subtree. (Thus the node's value is
    output in the middle of the function.) The same
    pattern applies to postorder traversal, where
    both subtrees are output before the node's value.
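
The three traversals differ only in where the node's own value is emitted. The sketch below is an illustration under assumed names, not the project's code. To make the visiting order easy to inspect, each function appends keys to a caller-supplied array instead of printing them.

```c
#include <assert.h>
#include <stddef.h>

struct Node {
    int key;
    struct Node *left, *right;
};

/* Each traversal appends keys to `out`; `*len` counts the keys written.
 * The caller must provide an array large enough for every node. */
void preorder(const struct Node *n, int *out, size_t *len)
{
    assert(out != NULL && len != NULL);
    if (n == NULL) return;
    out[(*len)++] = n->key;           /* node first */
    preorder(n->left, out, len);
    preorder(n->right, out, len);
}

void inorder(const struct Node *n, int *out, size_t *len)
{
    assert(out != NULL && len != NULL);
    if (n == NULL) return;
    inorder(n->left, out, len);
    out[(*len)++] = n->key;           /* node in the middle */
    inorder(n->right, out, len);
}

void postorder(const struct Node *n, int *out, size_t *len)
{
    assert(out != NULL && len != NULL);
    if (n == NULL) return;
    postorder(n->left, out, len);
    postorder(n->right, out, len);
    out[(*len)++] = n->key;           /* node last */
}
```

For a binary search tree, the inorder traversal produces the keys in sorted order.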

59
  • 3.3 The Problem
  • It is not hard to see that the order of data
    input is important to the structure of a binary
    search tree (or general search tree). In an
    optimal binary tree, the data are input in an
    order that keeps the tree balanced: the size of
    the left subtree is approximately equal to the
    size of the right subtree at each node. In an
    optimal binary tree, insertion, deletion, and
    search run in O(log N) time, with N the number of
    data items in the tree. This follows because each
    comparison and subsequent step (to the left or to
    the right) halves the number of nodes that remain
    to be considered. However, that holds only when
    the input is nicely ordered and the search tree
    is balanced.

60
  • It's also possible that the data are input so
    that only right nodes are added (root -> right ->
    right -> right ...). The search tree then looks
    just like a linear list, and it behaves like one:
    insertion, deletion, and search each take O(N)
    time. This is not efficient. Balanced search
    trees were therefore developed to perform these
    functions efficiently regardless of the order of
    data input.
  • 4 Balanced Search Trees
  • Four major balanced search trees are
    investigated. Three of them, namely the red-black
    tree, height-balanced tree, and weight-balanced
    tree, are binary search trees. The fourth, the
    B-tree, is a search tree whose nodes can have
    more than two children.
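
The degenerate case in which only right nodes are added is easy to reproduce. The sketch below is an illustration under assumed names, not the project's code. It inserts keys into a plain, unbalanced binary search tree and measures the height, counted in edges from the root to the deepest leaf.

```c
#include <assert.h>
#include <stdlib.h>

struct Node {
    int key;
    struct Node *left, *right;
};

/* Plain (unbalanced) binary search tree insertion. */
struct Node *insert(struct Node *root, int key)
{
    if (root == NULL) {
        struct Node *n = malloc(sizeof *n);
        assert(n != NULL);
        n->key = key;
        n->left = n->right = NULL;
        return n;
    }
    if (key <= root->key)
        root->left = insert(root->left, key);
    else
        root->right = insert(root->right, key);
    return root;
}

/* Height in edges: -1 for an empty tree, 0 for a single node. */
int height(const struct Node *n)
{
    if (n == NULL) return -1;
    int l = height(n->left);
    int r = height(n->right);
    return 1 + (l > r ? l : r);
}
```

Inserting the keys 1 through 100 in ascending order gives a chain of height 99, whereas inserting 4, 2, 6, 1, 3, 5, 7 gives a tree of height 2 for seven keys.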

61
  • 4.1 Red-black tree
  • A red-black tree is a special binary search tree
    with a color scheme: each node is either black or
    red. Four properties make a binary tree a
    red-black tree. (1) The root of the tree is
    colored black. (2) All paths from the root to the
    leaves agree on the number of black nodes. (3) No
    path from the root to a leaf may contain two
    consecutive nodes colored red. (4) Every path
    from a node to a leaf (among its descendants) has
    the same number of black nodes. The performance
    of a balanced search tree is directly related to
    its height. For a binary tree, lg(number of
    nodes) is roughly the optimal height. A red-black
    tree with n nodes has height at most
    $2\lg(n + 1)$. The proof is noteworthy, but
    difficult to understand.
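
The four properties can be checked mechanically. The following C sketch is an illustration under assumed names (struct RBNode, check_black_count, is_red_black), not the project's code. It treats missing children (NULL) as black leaves that are not counted, and it checks only the coloring rules, not the ordering of the keys.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum Color { RED, BLACK };

struct RBNode {
    int key;
    enum Color color;
    struct RBNode *left, *right;
};

/* Return the number of black nodes on every path from `n` (inclusive)
 * down to a NULL leaf, or -1 if two paths disagree or a red node has a
 * red child. */
int check_black_count(const struct RBNode *n)
{
    if (n == NULL)
        return 0;
    assert(n->color == RED || n->color == BLACK);
    if (n->color == RED) {
        if ((n->left != NULL && n->left->color == RED) ||
            (n->right != NULL && n->right->color == RED))
            return -1;                    /* property 3 violated */
    }
    int l = check_black_count(n->left);
    int r = check_black_count(n->right);
    if (l < 0 || r < 0 || l != r)
        return -1;                        /* properties 2 and 4 violated */
    return l + (n->color == BLACK ? 1 : 0);
}

/* True when the tree rooted at `root` satisfies the coloring properties. */
bool is_red_black(const struct RBNode *root)
{
    if (root == NULL)
        return true;
    if (root->color != BLACK)
        return false;                     /* property 1 violated */
    return check_black_count(root) >= 0;
}
```

A checker of this kind is mainly useful as a test aid: it can be run after every insertion or deletion to confirm that the rebalancing code kept the tree valid.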

62
  • In order to prove the assertion that a red-black
    tree's height is at most $2\lg(n + 1)$, we
    should first define bh(x). bh(x) is defined to be
    the number of black nodes on any path from, but
    not including, a node x to a leaf. Notice that
    black height (bh) is well defined under
    properties 2 and 4 of red-black trees. It is easy
    to see that the black height of a tree is the
    black height of its root. First we shall prove
    that the subtree rooted at any given node x
    contains at least $2^{bh(x)} - 1$ nodes. We can
    prove this by induction on the height of x. The
    base case is $bh(x) = 0$, which means that x must
    be a leaf (NIL). In that case the subtree rooted
    at x contains $2^0 - 1 = 0$ nodes, as claimed.
    The inductive step is as follows. Say node x has
    positive height and has two children.

63
  • Note that each child has a black-height of either
    bh(x), if it is a red node, or bh(x) - 1, if it
    is a black node. By the inductive hypothesis,
    each child's subtree therefore contains at least
    $2^{bh(x) - 1} - 1$ nodes, so the subtree rooted
    at x contains at least
    $2(2^{bh(x) - 1} - 1) + 1 = 2^{bh(x)} - 1$ nodes.
    The first term is the minimum number of nodes in
    the left and right subtrees combined, and the
    second term (the 1) counts x itself. A little
    algebra leads to the right side of the equation.
    Having proved this, bounding the maximum height
    of a red-black tree is fairly straightforward.
    Let h be the height of the tree. By property 3 of
    red-black trees, at least half of the nodes on
    any simple path from the root to a leaf must be
    black, so the black-height of the root must be at
    least h/2. Then
    $n \ge 2^{bh(\mathrm{root})} - 1 \ge 2^{h/2} - 1$,
    so $n + 1 \ge 2^{h/2}$, hence
    $\lg(n + 1) \ge \lg(2^{h/2}) = h/2$, and finally
    $h \le 2\lg(n + 1)$.

64
  • Therefore we have proved that a red-black tree
    with n nodes has height at most $2\lg(n + 1)$.
  • 4.2 Height Balanced Tree
  • A height-balanced tree takes a different approach
    to bounding the maximum height of a binary search
    tree. For each node, the heights of its left and
    right subtrees are stored. The key idea is to
    balance the tree by rotating around any node
    whose left and right subtree heights differ by
    more than a threshold. It all boils down to the
    following property: (1) At each node, the
    difference between the height of the left subtree
    and the height of the right subtree does not
    exceed the threshold value. A height-balanced
    tree should have height on the order of lg(n),
    with the exact bound depending on the threshold
    value.
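
The height-balance property can be written as a short recursive check. The C sketch below is an illustration under assumed names, not the project's code. It recomputes subtree heights on every call for brevity, whereas the slides describe storing them in each node, and it treats a height difference equal to the threshold as acceptable, which is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct Node {
    int key;
    struct Node *left, *right;
};

/* Height in edges: -1 for an empty tree, 0 for a single node. */
int height(const struct Node *n)
{
    if (n == NULL) return -1;
    int l = height(n->left);
    int r = height(n->right);
    return 1 + (l > r ? l : r);
}

/* True when, at every node, the left and right subtree heights differ
 * by at most `threshold`. */
bool is_height_balanced(const struct Node *n, int threshold)
{
    assert(threshold >= 0);
    if (n == NULL)
        return true;
    int diff = abs(height(n->left) - height(n->right));
    return diff <= threshold
        && is_height_balanced(n->left, threshold)
        && is_height_balanced(n->right, threshold);
}
```

With a threshold of 1 this is the familiar AVL condition. An insertion routine would rotate around the lowest node where the check fails.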

65
  • An intuitive, less rigorous argument is provided.
    Imagine a simple binary tree in the worst-case
    scenario: a line of nodes. If the simple binary
    tree were to be transformed into a
    height-balanced tree, the following process
    should do it. (1) Pick some node near the middle
    of a given strand of nodes so that the threshold
    property on $|\mathrm{leftH}() - \mathrm{rightH}()|$
    is satisfied. (2) Define this node as a parent
    and the resulting two strands (nearly equal in
    length) as its left subtree and right subtree.
    (3) Repeat steps (1) and (2) on the left subtree
    and the right subtree. First note that this
    process will terminate, because at each step the
    given strand is split into two halves, each
    smaller than the original strand. So the number
    of nodes in a given strand decreases.

66
  • This will eventually reach a terminal strand size
    determined by the threshold height difference. If
    a given strand cannot be divided so that the
    threshold height difference holds, that branch of
    the recursion ends. Splitting into two halves
    recursively is analogous to dividing a mass into
    two halves each time. Repeatedly dividing by 2 in
    turn leads to lg(n) levels. So it follows that
    the height of a height-balanced tree should be
    lg(n), or something around that magnitude. It is
    interesting to note that a height-balanced tree
    is roughly a complete binary tree. This is
    because height balancing lets nodes gather near
    the top. There is probably a decent proof for
    this observation, but simple intuition is enough
    to see it.
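
The pick-the-middle process can be sketched directly on a sorted array standing in for the strand of nodes. This C sketch is an illustration under assumed names, not the project's code, and it always splits at the exact middle, recursing down to single nodes.

```c
#include <assert.h>
#include <stdlib.h>

struct Node {
    int key;
    struct Node *left, *right;
};

/* Build a tree from the sorted keys in positions lo..hi-1: the middle
 * key becomes the parent and each half is built the same way. */
struct Node *build_balanced(const int *keys, size_t lo, size_t hi)
{
    if (lo >= hi)
        return NULL;                       /* empty strand: recursion ends */
    size_t mid = lo + (hi - lo) / 2;
    struct Node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->key = keys[mid];
    n->left = build_balanced(keys, lo, mid);
    n->right = build_balanced(keys, mid + 1, hi);
    return n;
}

/* Height in edges: -1 for an empty tree, 0 for a single node. */
int height(const struct Node *n)
{
    if (n == NULL) return -1;
    int l = height(n->left);
    int r = height(n->right);
    return 1 + (l > r ? l : r);
}
```

For 15 sorted keys the resulting tree has height 3, and for 1000 keys it has height 9, in line with the lg(n) estimate.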

67
  • 4.3 Weight Balanced Tree
  • A weight-balanced tree is very similar to a
    height-balanced tree. It is the same idea with a
    different nuance. The overall data structure is
    also similar. Instead of the heights of the left
    and right subtrees, the weights of the left and
    right subtrees are kept. The weight of a tree is
    defined as the number of nodes in that tree. The
    key idea is to balance the tree by rotating
    around any node whose left and right subtree
    weights differ by more than a threshold. Rotating
    around a node shifts the weight balance to a more
    favorable one, specifically one with a smaller
    difference between the weights of the left and
    right subtrees. A weight-balanced tree has the
    following main property: (1) At each node, the
    difference between the weight of the left subtree
    and the weight of the right subtree does not
    exceed the threshold value.
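
The weight property, and the way a rotation improves it, can be sketched in C as follows. This is an illustration under assumed names, not the project's code. Weights are recomputed by counting nodes rather than stored in each node, and a difference equal to the threshold is treated as acceptable, which is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct Node {
    int key;
    struct Node *left, *right;
};

/* Weight of a tree: the number of nodes it contains. */
size_t weight(const struct Node *n)
{
    return n == NULL ? 0 : 1 + weight(n->left) + weight(n->right);
}

/* True when, at every node, the left and right subtree weights differ
 * by at most `threshold`. */
bool is_weight_balanced(const struct Node *n, size_t threshold)
{
    if (n == NULL)
        return true;
    size_t l = weight(n->left);
    size_t r = weight(n->right);
    size_t diff = l > r ? l - r : r - l;
    return diff <= threshold
        && is_weight_balanced(n->left, threshold)
        && is_weight_balanced(n->right, threshold);
}

/* Left rotation around `pivot`; shifts weight from the right subtree
 * to the left. Returns the new subtree root. */
struct Node *rotate_left(struct Node *pivot)
{
    assert(pivot != NULL && pivot->right != NULL);
    struct Node *up = pivot->right;
    pivot->right = up->left;
    up->left = pivot;
    return up;
}
```

For a right-leaning chain of three nodes the weight difference at the root is 2; after one left rotation the middle node is the root and the difference is 0.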

68
  • An approach similar to the one used for the
    height-balanced tree shows that a weight-balanced
    tree has height lg(n). The argument is mostly
    intuitive, built on recursion and induction.
    Transforming a line of nodes, the worst-case
    scenario in a simple binary tree, into a
    weight-balanced tree can be done by the following
    steps. (1) Pick some node near the middle of a
    given strand of nodes so that the threshold
    property on $|\mathrm{leftW}() - \mathrm{rightW}()|$
    is satisfied. (2) Define this node as a parent
    and the resulting two strands (nearly equal in
    length) as its left subtree and right subtree.
    (3) Repeat steps (1) and (2) on the left subtree
    and the right subtree. It is easy to confuse the
    first step for the height-balanced tree with that
    for the weight-balanced tree, but picking the
    middle node surely satisfies both the
    height-balanced and the weight-balanced property.

69
  • The weight-balanced property is perhaps the more
    clearly satisfied of the two, since the middle
    node presumably has the same number of nodes
    before and after its position. This process will
    terminate, because at each step the given strand
    is split into two halves, each smaller than the
    original strand. So the number of nodes in a
    given strand decreases. This will eventually
    reach a terminal strand size determined by the
    threshold weight difference. Splitting into two
    halves recursively is analogous to dividing a
    mass into two halves each time. Repeatedly
    dividing by 2 in turn leads to lg(n) levels. So
    it follows that the height of a weight-balanced
    tree should be lg(n), or something around that
    magnitude. Like a height-balanced tree, a
    weight-balanced tree is roughly a complete binary
    tree.

70
  • A New Balanced Search Tree(?)
  • A new balanced search tree has been developed.
    The binary tree has no theoretical value to
    computer science, but probably has practical
    value. The new balanced search tree will be
    referred to as the median-weight-mix tree; each
    node will have a key, zero to two children, and
    some sort of weight.
  • 5.1 Background
  • The median-weight-mix tree probably serves no
    theoretical purpose because it's not perfect. It
    has no well-defined behavior that obeys a set of
    properties. Rather, it serves a practical
    purpose, most likely in statistics.

71
  • The median-weight-mix tree is based on the
    following assumptions about the data: (1) Given
    a lower bound and an upper bound on the data
    input, random behavior is assumed, meaning data
    points will be evenly distributed over the
    interval. (2) Multiple bells (peaks) are assumed
    to be present in the interval. The first
    assumption is not hard to understand. It is based
    on the idea that nature is random. The data
    points will be scattered about, but evenly, since
    random here means each data value has an equal
    chance of being present in the input set. A
    physical example of this would be rain. In a
    rainfall, raindrops fall randomly onto the
    ground. In fact, one can estimate the total
    amount of rainfall by sampling a small area.

72
  • The amount of rain is measured in the small
    sampling area, and then the total rainfall can be
    calculated by numerical projection, ratio, or
    some other method. The total rainfall would be
    rainfall-in-small-area × area-of-total-area /
    area-of-small-area. The second assumption is
    based on a less apparent observation. Nature is
    not completely random, which means some numbers
    will occur more often than others. When the data
    values and the frequencies of those data values
    are plotted on a 2D plane, a wave is expected.
    There are more hits in some ranges of data values
    (the crests) than in other ranges (the troughs).
    A practical example would be height. One might
    expect a well-defined bell-shaped curve centered
    on the average height. (People tend to be about 5
    feet 10 inches tall.) But this is not true when
    you look at it on a global scale, because there
    are isolated populations around the world. The
    average height of Americans is not necessarily
    the average height of Chinese people. So this
    wave-shaped curve is assumed.

73
  • 5.2 Algorithm
  • Each node will have a key (data number), an
    interval (with the lower and upper bounds of its
    assigned interval), and the weights of its left
    subtree and right subtree. The weight of each
    subtree is calculated from constants R and S.
    Constant R represents the importance of focusing
    on frequency-heavy data points. Constant S
    represents the importance of focusing on
    frequency-weak data points. The ratio R/S
    consequently represents the relative importance
    of frequency-heavy versus frequency-weak data
    points. The tree will then be balanced toward a
    favorable R/S ratio at each node by means of left
    and right rotations.

74
  • 6 Methodology
  • 6.1 Idea
  • Binary search trees can be evaluated in various
    ways because they can serve a number of purposes.
    For this project, a binary search tree was
    developed to take some advantage of the random
    nature of statistics, under some assumptions.
    Therefore it is reasonable to evaluate on this
    basis. With this overall purpose, several
    behaviors of balanced search trees will be
    examined: (1) the time it takes to process a data
    set, (2) the average retrieval time of data, and
    (3) the height of the binary tree. These
    properties are the major ones that outline the
    analysis. Speed is important, and each binary
    tree is timed to check how long it takes to
    process the input data. But the average retrieval
    time is also important because it is the best
    indication of the efficiency of the data
    structure. What is the use of inserting a number
    quickly if it can only be retrieved slowly?
    Lastly, the height of the binary tree is checked
    to see how the theoretical ideas work out in a
    practical situation.

75
  • 6.2 Detail
  • It is worthwhile to note how each behavior is
    measured in C. To measure the time it takes to
    process a data set, the starting time and the
    ending time are recorded with the clock()
    function from the time.h library. The duration is
    then (EndTime - StartTime) / CLOCKS_PER_SEC. The
    average retrieval time is calculated by summing
    the time it takes to look up each data point in
    the tree and dividing this sum by the number of
    data points in the tree. The height of the binary
    tree, the third behavior under study, is
    calculated by a tree traversal (pre-, in-, or
    post-order), simply taking the maximum depth
    visited as each node is scanned. Identical test
    cases will be run against the red-black tree,
    height-balanced tree, weight-balanced tree, and
    median-weight-mix tree. The first category of
    test runs will use gradually increasing numbers
    of randomly generated data points. The second
    category of test runs will be hand manipulated.
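
The timing measurement described for C can be sketched as follows. This is a minimal illustration, not the project's code: the helper names time_seconds and average_seconds and the workload busy_sum are hypothetical, while clock() and CLOCKS_PER_SEC are the standard time.h facilities the slides name.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Run `work(arg)` once and return the processor time it used, in
 * seconds: (end - start) / CLOCKS_PER_SEC. */
double time_seconds(void (*work)(void *), void *arg)
{
    assert(work != NULL);
    clock_t start = clock();
    work(arg);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Average time per retrieval: total time divided by the number of
 * data points looked up. */
double average_seconds(double total_seconds, size_t count)
{
    assert(count > 0);
    return total_seconds / (double)count;
}

/* Hypothetical workload used only for demonstration: sums 0..n-1.
 * In the project this would be building or searching a tree. */
void busy_sum(void *arg)
{
    long n = *(long *)arg;
    volatile long total = 0;
    for (long i = 0; i < n; i++)
        total += i;
}
```

Note that clock() reports processor time consumed by the program rather than wall-clock time, and its resolution is limited, so very short operations are usually timed in bulk and then averaged.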

76
  • Data points will still be randomly generated, but
    following some statistical behavior, such as a
    "wave," a single bell curve, etc. The third
    category of test runs will use real-life data
    points such as heights, ages, and others. Due to
    the immense amount of data, some proportional
    scaling might be used to fit within the memory
    capacity of the balanced binary trees.
  • 7 Result Analysis
  • C code for the balanced search trees will be
    provided, along with tests of the balanced search
    trees for their efficiency. Graphs and tables
    will be provided. Under construction.
  • 8 Conclusion
  • Under construction.
  • 9 Reference
  • Under construction.
    http://newds.zefga.net/snips/Docs/BalancedBSTs.html
  • Appendix A: Other Balanced Search Trees
  • Appendix B: Codes

77
Linux Kernel Debugging API
The purpose of this project is to create an
implementation of much of the kernel API that
functions in user space, the normal environment that
processes run in. The issue with testing kernel code
is that the live kernel runs in kernel space, a
separate area that deals with hardware interaction
and management of all the other processes. Kernel
space debuggers are unreliable and very limited in
scope: a kernel failure can hardly dump useful error
information because there's no operating system left
to write that information to disk.
78
Kernel Debugging User-Space API Library (KDUAL)
John Livingston
  • Abstract
  • The purpose of this project is to create an
    implementation of much of the kernel API that
    functions in user space, the normal environment
    that processes run in. The issue with testing
    kernel code is that the live kernel runs in
    kernel space, a separate area that deals with
    hardware interaction and management of all the
    other processes. Kernel space debuggers are
    unreliable and very limited in scope: a kernel
    failure can hardly dump useful error information
    because there's no operating system left to write
    that information to disk. Kernel development is
    quite likely the most important active project in
    the Linux community.

79
  • Any aid to the development process would be
    appreciated by the entire kernel development
    team, allowing them to do their work faster and
    pass changes along to the end user more quickly.
    This program will make a direct contribution to
    kernel developers and an indirect contribution to
    every future user of Linux.
  • Introduction and Background
  • The Linux kernel is arguably the most complex
    piece of software ever crafted. It must be held
    to the most stringent standards of performance,
    as any malfunction, or worse, security flaw,
    could be potentially fatal for a critical
    application. However, because of the nature of
    the kernel and its close interaction with
    hardware, it's extremely difficult to debug
    kernel code.

80
  • The goal of this pr