Title: TweaXML
A language to manipulate and extract data from XML files
Kaushal Kumar (kk2457), Srinivasa Valluripalli (sv2232)
Contents
- Overview and motivation
- Language features
- XML handling functionalities
- Architectural Design
- Tutorial (with example)
- Lessons learned
- Summary
Overview and Motivation
- TweaXML is a language to parse and extract data from XML files and create new CSV/TXT files in user-defined data formats.
- XML is a universal language and is used to pass data between heterogeneous systems.
- But parsing an XML file programmatically is not straightforward.
- To parse an XML file:
- First you need to learn a general-purpose language (Java, for example).
- Then you need to learn APIs such as the DOM parser and the SAX parser.
- Using these APIs can be complicated.
- TweaXML provides a much simpler language to parse XML files. Moreover, it provides a way to write the extracted data to output files in user-defined formats.
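To give a feel for the bookkeeping a DOM-style API asks for, here is a minimal sketch. It uses Python's standard `xml.dom.minidom` module rather than Java, purely for brevity. The input string and the helper names `text_of` and `final_marks` are illustrative assumptions, not part of TweaXML.

```python
from xml.dom import minidom

# Hypothetical input, shaped like a small student-marks file.
XML_TEXT = """<students>
  <student><name>kaushal</name><final>90</final></student>
  <student><name>Srini</name><final>95</final></student>
</students>"""


def text_of(element):
    """Concatenate the text nodes directly under a DOM element."""
    return "".join(
        child.data
        for child in element.childNodes
        if child.nodeType == child.TEXT_NODE
    )


def final_marks(xml_text):
    """Return (name, final mark) pairs using only DOM-style calls."""
    document = minidom.parseString(xml_text)
    rows = []
    for student in document.getElementsByTagName("student"):
        name_element = student.getElementsByTagName("name")[0]
        final_element = student.getElementsByTagName("final")[0]
        rows.append((text_of(name_element), int(text_of(final_element))))
    return rows


print(final_marks(XML_TEXT))
```

Even for two fields, the caller has to walk element lists, pick out text nodes, and convert types by hand. That is the overhead the slide refers to.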
Language Features
- Carefully chosen set of keywords
- Multiple Types (int, string, node, file, array)
- Several Operators
- Unary Operators (, !)
- Arithmetic Operators (+, -, *, /)
- Comparison (<, <=, >, >=, ==, !=)
- Logical Operators (, )
- node operators (getchild, getvalue)
- file operators (open, create, print, close)
- inbuilt functions (add, subtract, multiply,
divide, length)
Language Features (cont.)
- various types of statements
- Conditional statements (if else)
- Iterative statements (while)
- jump statements (return, continue, break)
- I/O statements (open, create, print, close)
- inbuilt function calls (add, subtract, multiply,
divide, length)
XML Handling functionalities
- Open an XML file to read (open)
- returns the root node of the xml file
- Get the child nodes of a node, using the XPath of the child nodes (getchild)
- returns an array of child nodes
- Get the length of the child nodes array (length)
- Get the value of a node (getvalue)
- returns the value of the node in string format
- add the values of two nodes (add)
- implicit checks of data types
- subtract the values of two nodes (subtract)
- multiply the values of two nodes (multiply)
- divide the values of two nodes (divide)
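The behaviour of these operations can be approximated in Python. This is a rough analogue built on the standard `xml.etree.ElementTree` module, not TweaXML's actual implementation. The function names mirror the TweaXML keywords, while `_to_number` and `_to_text` are hypothetical helpers. The slides do not say how results are formatted, so dropping a trailing `.0` is an assumption.

```python
import xml.etree.ElementTree as ET


def get_child(node, path):
    """Rough analogue of getchild: every child matching the path, as a list."""
    return node.findall(path)


def get_value(node):
    """Rough analogue of getvalue: the node's text as a string."""
    return (node.text or "").strip()


def _to_number(text):
    """The 'implicit check': reject operands that are not numeric."""
    try:
        return float(text)
    except ValueError:
        raise TypeError(f"not a numeric value: {text!r}") from None


def _to_text(number):
    """Format a result as a string, dropping a trailing '.0' (an assumption)."""
    return str(int(number)) if number.is_integer() else str(number)


def add(left, right):
    return _to_text(_to_number(left) + _to_number(right))


def divide(left, right):
    return _to_text(_to_number(left) / _to_number(right))


root = ET.fromstring(
    "<students><student><name>kaushal</name>"
    "<homework1>85</homework1><homework2>85</homework2></student></students>"
)
students = get_child(root, "student")
first = students[0]
h1 = get_value(get_child(first, "homework1")[0])
h2 = get_value(get_child(first, "homework2")[0])
total = add(h1, h2)
print(len(students), total, divide(total, "2"))
```

Values stay strings between calls, as with getvalue, and the numeric check happens inside each arithmetic function, so a non-numeric operand such as a student's name fails immediately.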
File Handling functionalities
- Create an output file to write (create)
- returns the file type
- Write in the file (print)
- close the output file once you are done (close)
Architectural Design
Front end (TweaXMLLexer, TweaXMLParser)
Tree Walker (TweaXmlWalker, TweaXmlCodeGen)
Back End (CodeGen.java)
Run-time Libraries (Apache's DOM Parser)
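The division of labour between these stages can be illustrated with a toy pipeline in Python. Everything here is an assumption made for illustration: the statement form `target = operator operands ;`, the emitted `runtime.` prefix, and the function names `lex`, `parse` and `generate`. The real TweaXML stages are the lexer, parser, walker and code generator listed above.

```python
import re

# One token: a quoted string, an identifier, '=' or ';'.
TOKEN = re.compile(r'\s*("[^"]*"|[A-Za-z_]\w*|=|;)')


def lex(source):
    """Front end, step 1: split source text into tokens."""
    tokens, pos = [], 0
    source = source.rstrip()
    while pos < len(source):
        match = TOKEN.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected text at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(tokens):
    """Front end, step 2: one (target, operator, operands) tuple per statement."""
    tree, statement = [], []
    for token in tokens:
        if token != ";":
            statement.append(token)
            continue
        if len(statement) < 3 or statement[1] != "=":
            raise SyntaxError(f"malformed statement: {statement}")
        tree.append((statement[0], statement[2], statement[3:]))
        statement = []
    if statement:
        raise SyntaxError("missing ';' at end of input")
    return tree


def generate(tree):
    """Tree walker and back end: visit each statement, emit target-language text."""
    lines = []
    for target, operator, operands in tree:
        lines.append(f"{target} = runtime.{operator}({', '.join(operands)});")
    return "\n".join(lines)


program = 'root = open "marks_data.xml"; kids = getchild root "student";'
print(generate(parse(lex(program))))
```

Each stage consumes only the previous stage's output, which is the point of the layered design: the front end knows nothing about the generated code, and the back end never sees raw source text.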
Tutorial - Example
(A TweaXML program that extracts students' performance data and creates a CSV file with the average marks of each student)
Input XML file (marks_data.xml)
<students>
  <student>
    <name>kaushal</name>
    <homework1>85</homework1>
    <homework2>85</homework2>
    <midterm>70</midterm>
    <final>90</final>
  </student>
  <student>
    <name>Srini</name>
    <homework1>80</homework1>
    <homework2>85</homework2>
    <midterm>87</midterm>
    <final>95</final>
  </student>
</students>
TweaXML program
start() {
  file output;
  node rootNode;
  output = create "AvgMarks.csv";
  rootNode = open "marks_data.xml";
  node studentNodes;
  studentNodes = getchild rootNode "student";
  int len;
  len = length studentNodes;
  if (len > 0) {
    int j;
    j = 0;
    while (j < len) {
      node nameNode, homework1Node, homework2Node, midtermNode, finalNode;
      string name, homework1Marks, homework2Marks, midtermMarks, finalMarks;
      nameNode = getchild studentNodes[j] "name";
      homework1Node = getchild studentNodes[j] "homework1";
      homework2Node = getchild studentNodes[j] "homework2";
      midtermNode = getchild studentNodes[j] "midterm";
      finalNode = getchild studentNodes[j] "final";
      name = getvalue nameNode[0];
      homework1Marks = getvalue homework1Node[0];
      homework2Marks = getvalue homework2Node[0];
      midtermMarks = getvalue midtermNode[0];
      finalMarks = getvalue finalNode[0];
      string totalMarks;
      totalMarks = add homework1Marks homework2Marks;
      totalMarks = add totalMarks midtermMarks;
      totalMarks = add totalMarks finalMarks;
      string avgMarks;
      avgMarks = divide totalMarks "4";
      print output name;
      print output "\t";
      print output avgMarks;
      print output "\n";
      j = j + 1;
    }
  }
  close output;
}
Output
Output file (AvgMarks.csv)
kaushal	82.5
Srini	86.75
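As a cross-check of these figures, the same computation can be sketched in Python with the standard library. This is not generated TweaXML output. The names `MARKS_XML`, `FIELDS` and `write_averages` are illustrative, and an in-memory buffer stands in for AvgMarks.csv.

```python
import io
import xml.etree.ElementTree as ET

# Same data as marks_data.xml in the tutorial.
MARKS_XML = """<students>
  <student>
    <name>kaushal</name>
    <homework1>85</homework1>
    <homework2>85</homework2>
    <midterm>70</midterm>
    <final>90</final>
  </student>
  <student>
    <name>Srini</name>
    <homework1>80</homework1>
    <homework2>85</homework2>
    <midterm>87</midterm>
    <final>95</final>
  </student>
</students>"""

FIELDS = ("homework1", "homework2", "midterm", "final")


def write_averages(xml_text, output):
    """Write one 'name<TAB>average' line per student to a file-like object."""
    root = ET.fromstring(xml_text)
    for student in root.findall("student"):
        name = student.findtext("name")
        total = sum(float(student.findtext(field)) for field in FIELDS)
        average = total / len(FIELDS)
        output.write(f"{name}\t{average:g}\n")


buffer = io.StringIO()
write_averages(MARKS_XML, buffer)
print(buffer.getvalue(), end="")
```

The totals are 85 + 85 + 70 + 90 = 330 and 80 + 85 + 87 + 95 = 347, so the averages over four marks are 82.5 and 86.75, matching the output file.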
Lessons Learned
- Start early on the project
- More functionalities could have been added
- More data types could have been provided
- User defined functions could have been added
Summary
- TweaXML provides an easier way to deal with XML files.
- Data can be extracted and written out in user-defined formats.
- No need to learn APIs like DOMParser and SAXParser.
- It's not perfect, but it's highly useful.
- More functionality could have been provided given more time.