Title: Building EDEN NonXML Files
1 - Building EDEN Non-XML Files
- September 2006
Keith Fox, Perot Team John Brewer, Perot Team
2Agenda
- ESS Introduction
- ESS Databases
- File formats
- File Specification Review
- Order of File Submission
- Review of Errors and Edits
- SEA Tools - What do you use?
3ESS Sub-Applications
Access to ESS Home Page
Access to transmit data files
Access to view status of transmitted files.
Access to view submission status by SEA, LEA and
School.
This sub-application is available to view and
verify data at a State, LEA, or School level.
Information on data groups, categories, permitted
codes, and file specifications
This area allows you to change your password as
needed.
4System Components
- EDEN Submission System
- Receives files and checks for errors
- Staging Database
- Performs reasonability edits
- Data Repository
- Responsibility of data transferred to ED
5EDEN Submission System
- Manages transmission of files from the SEAS to
EDEN - Receives files from SEA
- Performs format and validation edits
- Generates submission and error reports
- Emails file receipt and status to submitter
- If a file contains any format or validation
errors, its data does not go beyond this point
6Staging Database
- Contains data from files that passed format and
validation edits - Contents can be viewed from the Education Unit
Profile - Performs reasonability edits on submitted data
- Data requiring explanations from the SEA does not
move past this point - Before leaving this stage, ED will contact SEA
regarding any data quality concerns
7Data Repository
- Contains cleaned and edited files
- Responsibility of the data has transferred from
SEA to ED - SEA will receive notice 96 hours before data is
moved to the Data Repository - ED subject matter experts review the data in the
Data Repository before it is used or made
available to public
8Component Overview
9File Format Options
10File Formats
- Fixed Format (.txt)
- Comma Delimited (.csv)
- Tab Delimited (.tab)
- XML Format (.xml)
11Fixed Format
- Each record has a pre-defined length
- Each field has a beginning position and a
pre-defined length - All fields must be included in a record, whether
used or not - Filler fields and optional fields where no value
is reported should be blank - Unused positions of a field should be blank
12Fixed Format
- The external file name in the header record must
have a file extension of .txt - Numeric fields must not contain commas
- Data in decimal fields must contain the decimal
point (e.g., 1.123)
13Fixed Format Examples
- Header Record Field Content
14Fixed Format Example
15Comma Delimited Format
- Similar to the fixed files order and format
- Each field is separated by a comma
- All fields, including filler fields, must be
accounted for
16Comma Delimited Format
- Except for the last field in a record, each field
whether valued or not, must be designated by a
comma, including filler fields - Example ( indicates CRLF)
- First Field Three Bears School
- Second Field Filler
- Third Field 123 Main Street
- Last Field Filler
- Three Bears School,,123 Main Street,
17Comma Delimited Format
- For valued fields, the comma should immediately
follow the data value (e.g. no extra spaces, Ex.
,200,) - For a comma delimited file, the external file
name in the header must have a file extension of
.csv
18Comma Delimited Format Examples
- Header Record Field Content
19Comma Delimited Format Examples
20Tab Delimited Format
- Almost identical to comma delimited format except
fields are separated by a tab character instead
of a comma - For a tab delimited file, the external file name
in the header record must have a file extension
of .tab
21Tab Delimited Format
- Each field (valued, not valued, and filler
fields) must be designated by a tab character, - except for the last field in a record
- Examples
- gt indicates a tab character
- indicates CRLF)
- First Field Three Bears School
- Second Field Filler
- Third Field 123 Main Street
- Last Field Filler
- Three Bears Schoolgtgt123 Main Streetgt
22Tab Delimited Format Examples
- Header Record Field Content
23Tab Delimited Format Examples
24Extensible Markup Language XML
- XML is a standard for creating documents that
describe and apply a structure to data - XML documents are self-describing making them
readable by both humans and computers - Extensible means there is not a fixed set of
elements like HTML. Authors can define their own
tags - A common format for exchanging data over the web
25Example of XML
- ltFILETRANSMISSION FILELAYOUTTYPELEA MEMBERSHIP
TABLE" - FILEIDLEA MEMBERSHIP 15 OCT 2006"
- SCHOOLYEAR"2006-2007"gt
- ltAGENCY FIPSSTATECODE80" STATEAGENCYIDNUMBER
"01" - STATELEAIDNUMBER"518"gt
- ltTABLETYPE TYPEABBRV"MEMBER"
TOTALINDICATOR"N"gt - ltCATEGORY TYPE"GRADELVMEM"
VALUE"03"/gt - ltCATEGORY TYPE"RACEETHNIC"
VALUE"AS"/gt - ltCATEGORY TYPE"GENDER" VALUE"F"/gt
- ltAMOUNTgt456lt/AMOUNTgt
- ltEXPLANATION/gt
- lt/TABLETYPEgt
- lt/AGENCYgt
- lt/FILETRANSMISSIONgt
26 27File Specifications
- Contain layout of all non-XML file types
- Have 1 specification for each file type
- All share a similar format
- Serve as the primary reference point for
preparing and submitting files - Current specifications are available at
http//www.ed.gov/pbdmi
28File Specification Sections
- Section 1.0 - Purpose
- Section 2.0 - Naming Convention
- Section 3.0 - Guidance for Submitting this file
- Sections 4-6 - Details for SEA, LEA, and School
versions of the file
29Section 1.0 - Purpose
- Provides an overview of what data is submitted in
the file - Lists changes from previous years specification
- Lists other ED collections that request this data
- Clarifies the set of Educational Units from which
the data should be collected - The Category Set table is now located in this
section - Easier to determine requirements of a file at a
glance
30Section 1.0 - Purpose, contd
- Category
- Lists the data groups that make up a specific
table entry - Details which fields should be populated for any
given record in a file
31Section 2.0 - File Naming Convention
- Describes naming convention for consistency
- File names can be no longer than 25 characters
- Not strictly necessary, but strongly recommended
32Section 3.0 - Guidance for Submitting this File
- Helpful hints and frequently asked questions
appear in this section - Information is specific to each particular file
specification
33Sections 4-6 - File Level Specifications
- Sections 4, 5, and 6 provide the technical
details needed to prepare the SEA, LEA, and
School level files - If data is not collected at a given level, that
level is left out - Includes tables listing the data groups and their
permitted values - Consists of 3 subsections
- Header Record Definition
- Data Record Definition
- Examples
34Sections 4-6 - File Level Specifications, contd
- The Header and Data Record tables contain 7
columns - Data Group Name
- Start Position
- Length
- Type
- Mandatory/Optional
- Definition/Comments
- Permitted Values
35Sections 4-6 - File Level Specifications, contd
- Data groups are listed in the order they should
exist within each record - Start position
- Indicates the starting point of a data group
measured in characters from the beginning of a
line - Only applies to fixed-width files
- Length
- The maximum number of characters for a data group
value - Values exceeding this length can cause format
errors - Applies to all non-XML file types
36Sections 4-6 - File Level Specifications, contd
- String
- Can consist of any alphanumerical character
- Leading zeros are significant in numbers
expressed as strings - Number
- Can consist of any integer value -1 or greater
- Leading zeros are not required
- Decimal
- Used to report percentages
- Consists of a digit followed by 4 decimal places
(0.8355 83.55)
37Mandatory vs. Optional
- An M in the Mandatory/Optional column indicates
that the field is mandatory and must contain some
data for each record within the file - An O in the Mandatory/Optional column indicates
that the field is Optional and data is not
required for that field - Optional fields may still be required in
accordance with the category chart of the file
specification
38Zero, Missing, Not Collected
- Zero should be used in a numerical field when
there are no instances of a requested count - -1 can be used to indicate missing or not
collected in a numerical field - MISSING and NOTCOLLECT can be used as permitted
values for most string fields
39Definition
- Definition/Comments- gives a brief explanation of
the data group along with additional information
on preparing the file - Permitted Values- lists the only accepted values
for a given field - If blank, permitted values must be consistent
with the data type
40Examples
- The examples section includes an excerpt of
properly formatted file in each of the 3 non-XML
formats
41Order of File Submissions
42Order of File Submissions
- Timing of 06-07 Submissions
- Importance of Directory Files
- Relationships Among Files
43Timing for 06-07 EDEN Data Submissions
- The Submission time frame for School Year
2006-2007 data begins in January 2007 - SEAs should begin submitting data as soon as the
data is available after January 1, 2007 - SY 2006-07 EDEN Workbook will be provided before
December 1, 2006
44Importance of Directory Files
- Prior to any other data being submitted for an
educational unit, a complete and correct
Directory record must be in place. - Directory files are hierarchical.
45Importance of Directory Files
- Directory and Grades Offered files are not school
year specific, but continue from year to year - They only need to be submitted when data changes
or if data for the education unit does not exist
in the database
46Relationships Among Files
- Grades Offered files should be submitted for any
LEA or School which has students enrolled - Data for an education unit will not be accepted
unless a record for that education unit exists in
the Directory file
47Updating Directory Files
- Submitting a Directory file with only a new or
corrected educational unit is satisfactory, or
the entire Directory file may be resubmitted - If the entire Directory file is submitted, be
sure to include the new and/or corrected
educational unit(s) - Submit a Grades Offered file and Membership file
for any new educational units
48Directory File Identifiers
- State Identifiers are the primary identifiers
used to match a submitted educational unit
Directory record to an educational unit Directory
record in the EDEN database. - NCES Identifiers are used as secondary checks
when a match cannot be determined with the
primary identifiers.
49Primary Identifiers
50Adding/Updating an SEA
51Adding/Updating an LEA/School
52Modifying State Identifiers
- It is not currently possible to modify a states
ID by submitting an updated Directory file. - EDEN will see the modified ID as a new unit and
add it to the system, resulting in duplicate
records. - Investigation is under way to better support
updating primary identifiers.
5306-07 Directory Enhancements
- Improved implementation of Directory data
effective dating functionality - Effective Dates will be required for all
non-future education units - EDEN will now track the history of certain
fields Operational Status, Addresses, and
Contact Information - EDEN will prohibit file submissions against
education units that have an Operational Status
of Closed, Inactive, or Future Agency
54File Submission Priorities
- States should submit the CCD related files first
to support the CCD-EDEN merger - Those files are Directory/Grades Offered,
Membership, NCLB School Year Start Status, Staff
FTE, Graduates and Dropouts - Next states should submit files that can support
the CSPR process. - Those files are the Student Performance, Students
Tested, NCLB end of year Choice, and Supplemental
Services
55EDEN Edits, Errors, and Warnings
56Edits, Errors, and Warnings
- Format Edits Errors
- Validation Edits Errors
- Reasonability Edits Errors
57Format Edits and Errors
- Definition transmitted file must be in an
allowable specified format - Purpose If the transmitted file is not in an
allowable specified format, DRPS cannot interpret
it - Result If the file is acceptably formatted,
Validation Edits are performed. If not, the file
is rejected with a format error
58Format Edits and Errors Format Error Checker
- PSC has developed an EDEN file format checker.
- PSC uses this internally.
- Currently there is only a 2003-05 version.
- Distributed to states last year
- The 2005-06 version is in the final testing phase
and will be released in the near future.
59Format Edits and Errors Format Error Checker,
contd
- Format Checker
- Need Perl interpreter installed
- Free download of Perl interpreter available
- Checks for Format Errors
- Shows Row Number, Record Number and Field Name
- Allows users to save, copy and paste to print
results. - Currently designed for Non-XML files only
- Can be edited with any text editor. We encourage
improvements and sharing amongst other EDEN
participants - This tool is work in progress
- PSC will send updates
60Format Edits and Errors Format Error Checker
Output
- Open the EDEN Format Checker
- Select the file you wish to check format of
- Click Go
- Results are provided
- Showing Record and Field Name
61Validation Edits and Errors
- Definition Identification of invalid data used
in a field where specific permitted values or a
data type has been specified - Purpose To ensure that the data stored in the
Data Repository can be interpreted according to
agreed upon rules - Result Any validation error will cause the
entire file to be rejected. - If a file contains no validation errors
- Any applicable reasonability edits will be
performed. - The submission will be loaded into the Staging
Database.
62Validation Edits and ErrorsValidation Error
Thresholds
- Validation errors will be identified up to a
threshold - Currently 1000 or more errors identified will
result in the termination of validation edit
processing
63Reasonability Edits
- Definition Identification of submissions where
questionable data has been submitted - Purpose To flag for explanation data that
changed significantly from one year to the next - Result
- The submission is loaded into the staging
database - Records where the reasonability edits failed must
be explained - File submissions will not be transferred to the
Data Repository until a reasonable explanation
has been approved
64Errors and Edits
- Questions concerning specific errors?
- Questions concerning problems or concerns with
regards to file submissions?
Open Session
65Recap - Objectives
- Provided a basic understanding of ESS and its
sub-applications. - Provided an understanding of the data file
submission lifecycle. - Provided an in-depth knowledge of the File
Specifications and how to read them. - Provided information on ESS edits, errors, and
warnings.