CoxR: Open Source Development History Search System - PowerPoint PPT Presentation

About This Presentation

Description:

Analyze past processes/histories kept on existing systems, to help developers to ... Directory/file name. Mailing lists name. Bug class/description. Keywords ...

Slides: 43
Provided by: ksas9

Transcript and Presenter's Notes



1
CoxR: Open Source Development History Search System
Makoto Matsushita, Kei Sasaki, and Katsuro Inoue
Osaka University
2
Contents
  • Background
  • Open-source software development
  • Repository analysis system CoxR
  • Supporting Dynamic Communication System
  • Future research interests

3
Open Source Software Development
  • Open and parallel software development
  • Anybody can join the party at any time
  • Developers live all over the world

[Figure: developers share source code and manuals via CVS, exchange requests and fixes by email (stored in email archives), and submit bug reports and feature enhancement requests to GNATS]
4
Reusing repositories
  • System repositories hold valuable information,
    such as the evolution history of products and
    information about each developer
  • processes performed on products
  • knowledge on requirements and design
  • Analyzing and reusing these contents may help
    reduce the time and effort of the whole software
    development process
  • reuse past approaches to bug fixing
  • understand a project that one is about to join
  • reuse (a part of) products/components
  • However, reusing these contents is difficult in
    several ways

5
Problem 1: Weak relationships between systems
  • Where can I find what I want?

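[Figure: a user says "It seems that the bktr driver has a bug, so I'd like to fix it." The related information is scattered across CVS, GNATS, and the email archive: source code fixes, other files that also need to be changed, a proposed fix for the bktr driver, and discussions on the bktr driver]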
6
Problem 2: Interests may vary
  • Even if the problem is the same, a solution from
    the past is not suitable for everyone
  • knowledge and processes may vary among developers
  • information needs may vary over time

[Figure: Problem: "There's a bug in the bktr driver." Different users want different things: "Maybe similar bugs appeared in other drivers, so let's search for them."; "I'd like to find the authorities on graphics drivers."; "I'd like to have a new version of the bktr driver."]
7
Objective
  • Analyze past processes/histories kept in existing
    systems, to help developers search, understand,
    and reuse such processes
  • Model the information in these systems as a
    development community, using CVS, email, and GNATS
  • Propose an approach for extracting information
    from the development community
  • Build a prototype of the proposed approach

8
Topics
  • Step 1: Modeling information
  • Step 2: Information extraction algorithm
  • Step 3: System implementation

9
Model elements
  • People: developers registered in the CVS, email
    archive, and GNATS databases
  • Knowledge: contents of CVS, email, and GNATS

[Figure: CVS, GNATS, and email archives combined into an integrated model]
10
Extracting people/knowledge
  • CVS
    • Knowledge: file path, revision, tag, date,
      source code, comments
    • People: developer, contributor
  • E-mail
    • Knowledge: Subject, body, Message-Id, Date
    • People: From, To, Cc
  • GNATS
    • Knowledge: file path, PR, date, last modified,
      fix, audit-trail, status, category, bug class,
      description (base and modification information)
    • People: Originator, Responsible
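As a minimal sketch of this extraction step, assume each CVS revision, email message, and GNATS PR has already been parsed into a Python dict. The dict keys and the Element type below are hypothetical, not CoxR's own. The people fields become people elements, and each record yields one knowledge element keyed by its identifier; the other fields listed above (comments, body, audit-trail, and so on) would be attached as attributes in a fuller version.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Element:
    kind: str  # "people" or "knowledge"
    name: str  # developer name, or "<source>:<identifier>" for knowledge

def extract_cvs(rev):
    # rev: dict with hypothetical keys 'author', 'path', 'revision'
    people = {Element("people", rev["author"])}
    knowledge = {Element("knowledge", f'CVS:{rev["path"]} {rev["revision"]}')}
    return people, knowledge

def extract_email(msg):
    # msg: dict with hypothetical keys 'from', 'to', 'cc', 'message_id'
    addresses = [msg["from"], *msg.get("to", []), *msg.get("cc", [])]
    people = {Element("people", a) for a in addresses}
    knowledge = {Element("knowledge", f'E-mail:{msg["message_id"]}')}
    return people, knowledge

def extract_gnats(pr):
    # pr: dict with hypothetical keys 'number', 'originator', 'responsible'
    people = {Element("people", p) for p in (pr["originator"], pr["responsible"])}
    knowledge = {Element("knowledge", f'GNATS:PR{pr["number"]}')}
    return people, knowledge
```

Mapping email addresses to CVS login names, so that one developer becomes one people element across all three sources, is not handled here.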
11
People/Knowledge network
  • We assume that the network has 3 types of edges
  • People-Knowledge
  • People-People
  • Knowledge-Knowledge

Development Community
12
Extracting network edges (1/2)
  • People-Knowledge edge
  • People and knowledge elements that appear in the
    same CVS, email, or GNATS record
  • People-People edge
  • People who appear in the same CVS, email, or
    GNATS record
  • People subscribed to the same lists
  • People working in the same directory
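The two edge rules above can be sketched as co-occurrence grouping. This assumes records are dicts with hypothetical keys people, knowledge, and optional list and path; posting to a list is used as a stand-in for subscribing to it, since subscription data may not be available.

```python
import posixpath
from collections import defaultdict
from itertools import combinations

def cooccurrence_edges(records):
    """Build People-Knowledge and People-People edges from co-occurrence.

    records: list of dicts with hypothetical keys
      'people'    - names appearing in one CVS/email/GNATS record
      'knowledge' - knowledge identifiers in the same record
      'list'      - (optional) mailing list the record was posted to
      'path'      - (optional) file path the record touches
    """
    people_knowledge, people_people = set(), set()
    by_list, by_dir = defaultdict(set), defaultdict(set)
    for r in records:
        # People and knowledge in the same record are linked.
        for p in r["people"]:
            for k in r["knowledge"]:
                people_knowledge.add((p, k))
        # People in the same record are linked to each other.
        people_people.update(combinations(sorted(set(r["people"])), 2))
        if "list" in r:
            by_list[r["list"]].update(r["people"])
        if "path" in r:
            by_dir[posixpath.dirname(r["path"])].update(r["people"])
    # People on the same list, or working in the same directory, are linked.
    for group in [*by_list.values(), *by_dir.values()]:
        people_people.update(combinations(sorted(group), 2))
    return people_knowledge, people_people
```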

13
Extracting network edges (2/2)
  • Knowledge-Knowledge edge
  • Directly connected
  • Revision histories of the same file
  • Files in the same directory
  • Modified at the same time
  • Email threads
  • Email/PR IDs
  • Similar knowledge
  • Source code
  • Keywords
  • Base/modification information in GNATS
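A sketch of three of the directly connected Knowledge-Knowledge rules for CVS revisions: consecutive revisions of the same file, files changed by the same author within a short window (CVS has no atomic multi-file commits, so this grouping is a heuristic), and PR numbers mentioned in log messages. The three-minute window, the PR_REF pattern, and the record keys are assumptions; directory, email-thread, and similarity edges are left out.

```python
import re
from datetime import datetime, timedelta
from itertools import combinations

# Hypothetical pattern for PR references in log messages, e.g. "PR: kern/41437" or "PR41437".
PR_REF = re.compile(r"\bPR\s*[:#]?\s*(?:\w+/)?(\d+)")

def knowledge_edges(revisions, window=timedelta(minutes=3)):
    """revisions: list of dicts with hypothetical keys 'id', 'path', 'author', 'date', 'log'.
    Returns a set of (from_id, to_id, reason) triples."""
    edges = set()
    ordered = sorted(revisions, key=lambda r: r["date"])
    # Revision history: link each revision to the previous one of the same file.
    last_for_path = {}
    for r in ordered:
        if r["path"] in last_for_path:
            edges.add((last_for_path[r["path"]]["id"], r["id"], "same file"))
        last_for_path[r["path"]] = r
    # Modified at the same time: same author, different files, within `window`.
    # (Pairwise comparison is quadratic; fine for a sketch, not for 600k revisions.)
    for a, b in combinations(ordered, 2):
        if (a["author"] == b["author"] and a["path"] != b["path"]
                and b["date"] - a["date"] <= window):
            edges.add((a["id"], b["id"], "same time"))
    # Email/PR IDs: link a revision to every PR number its log message mentions.
    for r in ordered:
        for number in PR_REF.findall(r["log"]):
            edges.add((r["id"], f"PR{number}", "PR ID"))
    return edges

# Made-up sample data, for illustration only.
revs = [
    {"id": "bktr_core.c 1.20", "path": "bktr_core.c", "author": "fjoe",
     "date": datetime(2002, 1, 1, 2, 6, 40), "log": "fix register error, PR: kern/41437"},
    {"id": "bktr_card.c 1.10", "path": "bktr_card.c", "author": "fjoe",
     "date": datetime(2002, 1, 1, 2, 6, 41), "log": "companion change"},
]
for edge in sorted(knowledge_edges(revs)):
    print(edge)
```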

14
Topics
  • Step 1: Modeling information
  • Step 2: Information extraction algorithm
  • Step 3: System implementation

Finding a small network that matches the user's
input
15
Topic community
  • Topic: a reusable process and information
  • Elements related to a topic can be defined as a
    sub-network of the development community
  • A topic community may vary from user to user

[Figure: a topic community (e.g. experts in this area, patches) as a sub-network of the development community]
16
Topic community extraction (1/6)
  • Select the initial knowledge elements
  • Assume that a topic is given by a user
  • Extract knowledge matched to the topic
  • Select initial knowledge elements

[Figure: the user says "I found a register error in the bktr driver while watching TV with the fxtv program." Searchable items: code fragments, directory/file name, mailing list name, bug class/description, keywords, date. Keyword: bktr. Search results:
  • CVS: bktr_core.c 1.20, Comment: "fix register error"
  • E-mail: Subject: "bktr module unloading" (2002)
  • GNATS: Description: "fix bktr option error" (2000)]
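The initial search can be approximated by simple keyword matching over all three sources at once. In this sketch the element layout and the scoring rule (one point per matched topic word) are assumptions, not CoxR's actual ranking.

```python
import re

def search_knowledge(elements, topic_words):
    """Rank knowledge elements by how many topic words they contain.

    elements: list of dicts with hypothetical keys 'source', 'id', 'text'.
    Returns matching elements, best match first (ties keep input order)."""
    words = {w.lower() for w in topic_words}
    scored = []
    for e in elements:
        # Split on anything that is not a letter or digit, so "bktr_core.c" yields "bktr".
        tokens = set(re.findall(r"[a-z0-9]+", e["text"].lower()))
        score = len(words & tokens)
        if score:
            scored.append((score, e))
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable, so ties keep order
    return [e for _, e in scored]
```

For the bktr example, searching with the words bktr, register, and error ranks the CVS comment "fix register error" first, the GNATS description second, and the email subject third.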
17
Topic community extraction (2/6)
  • Select the initial knowledge elements
  • Assume that a topic is given by a user
  • Extract knowledge matched to the topic
  • Select initial knowledge elements

[Figure: the user says "It seems that bktr_core.c rev. 1.20 is good" and selects bktr_core.c from the search results (CVS: bktr_core.c 1.20, Comment: "fix register error"; E-mail: Subject: "bktr module unloading" (2002); GNATS: Description: "fix bktr option error" (2000))]
18
Topic community extraction (3/6)
  • Show related people/knowledge using the network
  • User selects appropriate elements again

[Figure: the user says "I'd like to know the people working on bktr_core.c" and searches for related elements. Search results for bktr_core.c: developer fjoe, contributor phk, contributor roger]
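Showing related elements amounts to a neighbour lookup in the network, grouped by the role on each edge. A sketch, assuming edges are stored as hypothetical (a, b, label) triples with role labels such as developer and contributor:

```python
from collections import defaultdict

def related(edges, node):
    """Group the neighbours of `node` by edge label.

    edges: iterable of (a, b, label) triples, treated as undirected;
    the triple layout and the role labels are hypothetical."""
    grouped = defaultdict(set)
    for a, b, label in edges:
        if a == node:
            grouped[label].add(b)
        elif b == node:
            grouped[label].add(a)
    return {label: sorted(ns) for label, ns in grouped.items()}

edges = [("fjoe", "bktr_core.c", "developer"),
         ("phk", "bktr_core.c", "contributor"),
         ("roger", "bktr_core.c", "contributor")]
print(related(edges, "bktr_core.c"))
# prints: {'developer': ['fjoe'], 'contributor': ['phk', 'roger']}
```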
19
Topic community extraction (4/6)
  • Show related people/knowledge using the network
  • User selects appropriate elements again

[Figure: the user says "Hmm, fjoe is the actual developer, so I want to know more about him." and selects fjoe among the people related to bktr_core.c (developer fjoe, contributor phk, contributor roger)]
20
Topic community extraction (5/6)
  • Search and selection are repeated

[Figure: the user asks "OK, are there any other elements that changed when fjoe changed bktr_core.c?" and searches for elements related to fjoe and bktr_core.c. Search results: bktr_card.c, changed at the same time; variables changed in yuv422_pro()]
21
Topic community extraction (6/6)
  • Search and selection are repeated

[Figure: the user tracks GNATS elements that talk about bktr_card.c. Search results: bktr_card.c, changed at the same time as bktr_core.c by developer fjoe, with variables changed in yuv422_pro(); GNATS PR41437 (closed), Description: Problems (bktr_card.c, yuv422_pro()); an email commenting on the change: "PR41437 causes a register error". These elements form the topic community. The user finally gets information about the changes to bktr_card.c, which helps to fix the register error.]
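The repeated search-and-select loop of slides 16 to 21 can be sketched as a user-guided expansion over the network. Here choose is a hypothetical callback standing in for the user's decision, and the topic community is taken to be the sub-network induced by the selected nodes; this illustrates the idea and is not CoxR's implementation.

```python
def extract_topic_community(adj, initial, choose, max_rounds=20):
    """User-guided expansion of a topic community.

    adj: dict mapping each node to the set of its neighbours (undirected).
    initial: the knowledge element picked from the topic search results.
    choose: hypothetical callback standing in for the user; it receives the current
            community and the candidate neighbours, and returns one candidate or None.
    Returns the selected nodes and the edges among them."""
    community = {initial}
    for _ in range(max_rounds):
        # "Search related elements": everything adjacent to the community so far.
        candidates = set().union(*(adj.get(n, set()) for n in community)) - community
        picked = choose(community, candidates)
        if picked is None or picked not in candidates:
            break
        community.add(picked)
    # The topic community is the sub-network induced by the selected nodes.
    edges = {tuple(sorted((a, b))) for a in community
             for b in adj.get(a, set()) if b in community}
    return community, edges
```

Starting from bktr_core.c and choosing fjoe, then bktr_card.c, then PR41437, as in the walkthrough, yields a four-node community connected by three edges.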
22
Topics
  • Step 1: Modeling information
  • Step 2: Information extraction algorithm
  • Step 3: System implementation

CoxR: a web-based system using FreeBSD data
23
CoxR implementation
  • Using FreeBSD development data from 1994 to 2004
  • System development environment
  • CPU: Pentium 4 1.5 GHz
  • RAM: 512 MB (SDRAM)
  • OS: Debian GNU/Linux
  • System size: about 10,000 LOC

  • CVS: FreeBSD CVS repository (57,822 files,
    618,186 revisions in total)
  • E-mail: committed-changes mailing lists (213,723
    in total)
  • BTS: FreeBSD GNATS PRs (82,350 in total)
24
System overview
[Figure: CoxR system overview. The user sends topic words and selections to the Web Server and receives search results. Components: System Control, Information Extraction, History DB (matched people/knowledge), and Relation DB (Knowledge-Knowledge, People-Knowledge, and People-People relations). Relation extraction (CoxR-C) builds the knowledge and people data from CVS, E-mail, and GNATS.]
25
System evaluation
  • Purpose
  • CoxR provides useful information to developers
    with appropriate search results
  • Process
  • Announce CoxR on the freebsd-hackers and
    freebsd-current mailing lists, which are mainly
    for FreeBSD developers
  • Trace user behavior with the web server log
  • Evaluation period: Jan/31/2005 to Feb/21/2005
  • Total users: 79 (31 unique users)

26
Initial knowledge selection
  • Unfortunately, not all users selected knowledge
    from the topic search results
  • Maybe they were just trying out the CoxR search,
    or the search results were not good for them
  • 18 out of 31 users selected initial knowledge
  • Type of information selected
  • CVS: 12
  • E-mail: 4
  • GNATS: 2
  • Selections: 4 per topic on average (min 1,
    max 9)
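The counts above can be checked with a little arithmetic. Since 12 + 4 + 2 equals the 18 users who selected initial knowledge, this sketch assumes each such user is counted once under the type of information selected; the slide does not say so explicitly.

```python
def selection_summary(unique_users, by_source):
    """Return (users who selected, selection rate, share of each source)."""
    total = sum(by_source.values())
    shares = {src: n / total for src, n in by_source.items()}
    return total, total / unique_users, shares

total, rate, shares = selection_summary(31, {"CVS": 12, "E-mail": 4, "GNATS": 2})
print(total, f"{rate:.0%}", {src: f"{s:.0%}" for src, s in shares.items()})
# prints: 18 58% {'CVS': '67%', 'E-mail': '22%', 'GNATS': '11%'}
```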

27
Topic community search
  • Users actually searched for topic communities
  • 12 out of 18
  • They tended to search for related people and
    knowledge within the same subsystem
  • Network traversals: 2 on average
  • People-People: 1
  • People-Knowledge: 8
  • Knowledge-Knowledge: 13
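Assuming the per-edge counts are totals over the 12 users who searched for a topic community (an assumption; the slide does not state the unit), they add up to 22 traversals, which is consistent with the reported average of about 2 per user.

```python
def traversal_summary(counts, searchers, selectors):
    """Return (total traversals, average per searching user, share of selectors who searched)."""
    total = sum(counts.values())
    return total, total / searchers, searchers / selectors

total, avg, rate = traversal_summary(
    {"People-People": 1, "People-Knowledge": 8, "Knowledge-Knowledge": 13},
    searchers=12, selectors=18)
print(total, round(avg, 2), f"{rate:.0%}")
# prints: 22 1.83 67%
```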

28
Discussions
  • Initial knowledge selections
  • 56% of search results would lead to valuable
    information
  • Searching by keyword, then by developer name
    and/or date, is a typical search pattern
  • Topic community selection
  • 67% of the users who found initial knowledge
    elements successfully found their own topic
    community
  • They tended to trace the Knowledge-Knowledge and
    People-Knowledge edges of the development network

29
Conclusion
  • CoxR, a search system for open-source software
    development
  • CVS, Email, and GNATS
  • Development network, topic community
  • Evaluation with real developers
  • Keywords may have their own information costs
  • Easy to find important keywords
  • Links between similar keywords
  • Developer roles
  • Easy to find people by their roles
  • Reuse topic communities found by others
  • They can serve as suggestions for finding a topic
    community

30
  • Fin

31
CoxR
[Figure: CoxR architecture. A CoxR user accesses CoxR (Web Server). Components: CGI-Main, Data Display Record System, token compare tool, lexical analysis tool, CoDS, SPxR. Databases: CVS Info DB, Fusion info DB, E-mail Info DB, Code DB. Creation tools: CVS info Create tool, E-mail info Create tool, Fusion info Create tool, DB Create tool. Data sources: CVS Repository, E-mail Archive.]
32
Example case
Sending a password
Needs improvements
33
Searching the repositories
Identify similar code
34
Searching similar code
There's evidence of improvement, but it is hard to
understand what actually changed
35
Searching related information
36
Search by revision histories
37
Search by development time
38
Search by keyword "openssh"
Combining search results will make it easy to
find what we need
39
Search similar information
Files committed at the same time (2001/03/20
02:06:40) by the same developer (green)
CoxR finds the actual source code showing how to
hide the password packet length
40
Solutions
Search how to fix
41
Discussions
  • Search similar code: shows actual changes
  • Search related information: understanding how to
    fix the security hole
  • Easy to find what we need, since any kind of
    information, including keywords, time, developer
    name, and code fragments, can be used.
  • Search results are easy to understand, because
    related information is easy to find; this helps to
    grasp not only what changed but also why the
    change happened.

42
Concluding Remarks
  • Implemented CoxR, a search system for both CVS
    revisions and email archives.
  • Using actual open-source development data, CoxR
    provides an easy and quick way to search for
    useful information on software development.
  • Broader experimentation
  • Improvements to the search method (multiple
    searches at one time)
  • Information scoring (define the importance/relation
    level of each piece of information)