Building EDEN NonXML Files - PowerPoint PPT Presentation

1 / 65
About This Presentation
Title:

Building EDEN NonXML Files

Description:

Length. The maximum number of characters for a data group value. Values exceeding this length can cause format errors. Applies to all non-XML file types ... – PowerPoint PPT presentation

Number of Views:58
Avg rating:3.0/5.0
Slides: 66
Provided by: rebecc121
Category:

less

Transcript and Presenter's Notes

Title: Building EDEN NonXML Files


1
  • Building EDEN Non-XML Files
  • September 2006 

Keith Fox, Perot Team John Brewer, Perot Team
2
Agenda
  • ESS Introduction
  • ESS Databases
  • File formats
  • File Specification Review
  • Order of File Submission
  • Review of Errors and Edits
  • SEA Tools - What do you use?

3
ESS Sub-Applications
Access to ESS Home Page
Access to transmit data files
Access to view status of transmitted files.
Access to view submission status by SEA, LEA and
School.
This sub-application is available to view and
verify data at a State, LEA, or School level.
Information on data groups, categories, permitted
codes, and file specifications
This area allows you to change your password as
needed.
4
System Components
  • EDEN Submission System
  • Receives files and checks for errors
  • Staging Database
  • Performs reasonability edits
  • Data Repository
  • Responsibility of data transferred to ED

5
EDEN Submission System
  • Manages transmission of files from the SEAS to
    EDEN
  • Receives files from SEA
  • Performs format and validation edits
  • Generates submission and error reports
  • Emails file receipt and status to submitter
  • If a file contains any format or validation
    errors, its data does not go beyond this point

6
Staging Database
  • Contains data from files that passed format and
    validation edits
  • Contents can be viewed from the Education Unit
    Profile
  • Performs reasonability edits on submitted data
  • Data requiring explanations from the SEA does not
    move past this point
  • Before leaving this stage, ED will contact SEA
    regarding any data quality concerns

7
Data Repository
  • Contains cleaned and edited files
  • Responsibility of the data has transferred from
    SEA to ED
  • SEA will receive notice 96 hours before data is
    moved to the Data Repository
  • ED subject matter experts review the data in the
    Data Repository before it is used or made
    available to public

8
Component Overview
9
File Format Options
10
File Formats
  • Fixed Format (.txt)
  • Comma Delimited (.csv)
  • Tab Delimited (.tab)
  • XML Format (.xml)

11
Fixed Format
  • Each record has a pre-defined length
  • Each field has a beginning position and a
    pre-defined length
  • All fields must be included in a record, whether
    used or not
  • Filler fields and optional fields where no value
    is reported should be blank
  • Unused positions of a field should be blank

12
Fixed Format
  • The external file name in the header record must
    have a file extension of .txt
  • Numeric fields must not contain commas
  • Data in decimal fields must contain the decimal
    point (e.g., 1.123)

13
Fixed Format Examples
  • Header Record Field Content

14
Fixed Format Example
15
Comma Delimited Format
  • Similar to the fixed files order and format
  • Each field is separated by a comma
  • All fields, including filler fields, must be
    accounted for

16
Comma Delimited Format
  • Except for the last field in a record, each field
    whether valued or not, must be designated by a
    comma, including filler fields
  • Example ( indicates CRLF)
  • First Field Three Bears School
  • Second Field Filler
  • Third Field 123 Main Street
  • Last Field Filler
  • Three Bears School,,123 Main Street,

17
Comma Delimited Format
  • For valued fields, the comma should immediately
    follow the data value (e.g. no extra spaces, Ex.
    ,200,)
  • For a comma delimited file, the external file
    name in the header must have a file extension of
    .csv

18
Comma Delimited Format Examples
  • Header Record Field Content

19
Comma Delimited Format Examples

20
Tab Delimited Format
  • Almost identical to comma delimited format except
    fields are separated by a tab character instead
    of a comma
  • For a tab delimited file, the external file name
    in the header record must have a file extension
    of .tab

21
Tab Delimited Format
  • Each field (valued, not valued, and filler
    fields) must be designated by a tab character,
  • except for the last field in a record
  • Examples
  • gt indicates a tab character
  • indicates CRLF)
  • First Field Three Bears School
  • Second Field Filler
  • Third Field 123 Main Street
  • Last Field Filler
  • Three Bears Schoolgtgt123 Main Streetgt

22
Tab Delimited Format Examples
  • Header Record Field Content

23
Tab Delimited Format Examples

24
Extensible Markup Language XML
  • XML is a standard for creating documents that
    describe and apply a structure to data
  • XML documents are self-describing making them
    readable by both humans and computers
  • Extensible means there is not a fixed set of
    elements like HTML. Authors can define their own
    tags
  • A common format for exchanging data over the web

25
Example of XML
  • ltFILETRANSMISSION FILELAYOUTTYPELEA MEMBERSHIP
    TABLE"
  • FILEIDLEA MEMBERSHIP 15 OCT 2006"
  • SCHOOLYEAR"2006-2007"gt
  • ltAGENCY FIPSSTATECODE80" STATEAGENCYIDNUMBER
    "01"
  • STATELEAIDNUMBER"518"gt
  • ltTABLETYPE TYPEABBRV"MEMBER"
    TOTALINDICATOR"N"gt
  • ltCATEGORY TYPE"GRADELVMEM"
    VALUE"03"/gt
  • ltCATEGORY TYPE"RACEETHNIC"
    VALUE"AS"/gt
  • ltCATEGORY TYPE"GENDER" VALUE"F"/gt
  • ltAMOUNTgt456lt/AMOUNTgt
  • ltEXPLANATION/gt
  • lt/TABLETYPEgt
  • lt/AGENCYgt
  • lt/FILETRANSMISSIONgt

26
  • File Specifications

27
File Specifications
  • Contain layout of all non-XML file types
  • Have 1 specification for each file type
  • All share a similar format
  • Serve as the primary reference point for
    preparing and submitting files
  • Current specifications are available at
    http//www.ed.gov/pbdmi

28
File Specification Sections
  • Section 1.0 - Purpose
  • Section 2.0 - Naming Convention
  • Section 3.0 - Guidance for Submitting this file
  • Sections 4-6 - Details for SEA, LEA, and School
    versions of the file

29
Section 1.0 - Purpose
  • Provides an overview of what data is submitted in
    the file
  • Lists changes from previous years specification
  • Lists other ED collections that request this data
  • Clarifies the set of Educational Units from which
    the data should be collected
  • The Category Set table is now located in this
    section
  • Easier to determine requirements of a file at a
    glance

30
Section 1.0 - Purpose, contd
  • Category
  • Lists the data groups that make up a specific
    table entry
  • Details which fields should be populated for any
    given record in a file

31
Section 2.0 - File Naming Convention
  • Describes naming convention for consistency
  • File names can be no longer than 25 characters
  • Not strictly necessary, but strongly recommended

32
Section 3.0 - Guidance for Submitting this File
  • Helpful hints and frequently asked questions
    appear in this section
  • Information is specific to each particular file
    specification

33
Sections 4-6 - File Level Specifications
  • Sections 4, 5, and 6 provide the technical
    details needed to prepare the SEA, LEA, and
    School level files
  • If data is not collected at a given level, that
    level is left out
  • Includes tables listing the data groups and their
    permitted values
  • Consists of 3 subsections
  • Header Record Definition
  • Data Record Definition
  • Examples

34
Sections 4-6 - File Level Specifications, contd
  • The Header and Data Record tables contain 7
    columns
  • Data Group Name
  • Start Position
  • Length
  • Type
  • Mandatory/Optional
  • Definition/Comments
  • Permitted Values

35
Sections 4-6 - File Level Specifications, contd
  • Data groups are listed in the order they should
    exist within each record
  • Start position
  • Indicates the starting point of a data group
    measured in characters from the beginning of a
    line
  • Only applies to fixed-width files
  • Length
  • The maximum number of characters for a data group
    value
  • Values exceeding this length can cause format
    errors
  • Applies to all non-XML file types

36
Sections 4-6 - File Level Specifications, contd
  • String
  • Can consist of any alphanumerical character
  • Leading zeros are significant in numbers
    expressed as strings
  • Number
  • Can consist of any integer value -1 or greater
  • Leading zeros are not required
  • Decimal
  • Used to report percentages
  • Consists of a digit followed by 4 decimal places
    (0.8355 83.55)

37
Mandatory vs. Optional
  • An M in the Mandatory/Optional column indicates
    that the field is mandatory and must contain some
    data for each record within the file
  • An O in the Mandatory/Optional column indicates
    that the field is Optional and data is not
    required for that field
  • Optional fields may still be required in
    accordance with the category chart of the file
    specification

38
Zero, Missing, Not Collected
  • Zero should be used in a numerical field when
    there are no instances of a requested count
  • -1 can be used to indicate missing or not
    collected in a numerical field
  • MISSING and NOTCOLLECT can be used as permitted
    values for most string fields

39
Definition
  • Definition/Comments- gives a brief explanation of
    the data group along with additional information
    on preparing the file
  • Permitted Values- lists the only accepted values
    for a given field
  • If blank, permitted values must be consistent
    with the data type

40
Examples
  • The examples section includes an excerpt of
    properly formatted file in each of the 3 non-XML
    formats

41
Order of File Submissions
42
Order of File Submissions
  • Timing of 06-07 Submissions
  • Importance of Directory Files
  • Relationships Among Files

43
Timing for 06-07 EDEN Data Submissions
  • The Submission time frame for School Year
    2006-2007 data begins in January 2007
  • SEAs should begin submitting data as soon as the
    data is available after January 1, 2007
  • SY 2006-07 EDEN Workbook will be provided before
    December 1, 2006

44
Importance of Directory Files
  • Prior to any other data being submitted for an
    educational unit, a complete and correct
    Directory record must be in place.
  • Directory files are hierarchical.

45
Importance of Directory Files
  • Directory and Grades Offered files are not school
    year specific, but continue from year to year
  • They only need to be submitted when data changes
    or if data for the education unit does not exist
    in the database

46
Relationships Among Files
  • Grades Offered files should be submitted for any
    LEA or School which has students enrolled
  • Data for an education unit will not be accepted
    unless a record for that education unit exists in
    the Directory file

47
Updating Directory Files
  • Submitting a Directory file with only a new or
    corrected educational unit is satisfactory, or
    the entire Directory file may be resubmitted
  • If the entire Directory file is submitted, be
    sure to include the new and/or corrected
    educational unit(s)
  • Submit a Grades Offered file and Membership file
    for any new educational units

48
Directory File Identifiers
  • State Identifiers are the primary identifiers
    used to match a submitted educational unit
    Directory record to an educational unit Directory
    record in the EDEN database.
  • NCES Identifiers are used as secondary checks
    when a match cannot be determined with the
    primary identifiers.

49
Primary Identifiers
50
Adding/Updating an SEA
51
Adding/Updating an LEA/School
52
Modifying State Identifiers
  • It is not currently possible to modify a states
    ID by submitting an updated Directory file.
  • EDEN will see the modified ID as a new unit and
    add it to the system, resulting in duplicate
    records.
  • Investigation is under way to better support
    updating primary identifiers.

53
06-07 Directory Enhancements
  • Improved implementation of Directory data
    effective dating functionality
  • Effective Dates will be required for all
    non-future education units
  • EDEN will now track the history of certain
    fields Operational Status, Addresses, and
    Contact Information
  • EDEN will prohibit file submissions against
    education units that have an Operational Status
    of Closed, Inactive, or Future Agency

54
File Submission Priorities
  • States should submit the CCD related files first
    to support the CCD-EDEN merger
  • Those files are Directory/Grades Offered,
    Membership, NCLB School Year Start Status, Staff
    FTE, Graduates and Dropouts
  • Next states should submit files that can support
    the CSPR process. 
  • Those files are the Student Performance, Students
    Tested, NCLB end of year Choice, and Supplemental
    Services

55
EDEN Edits, Errors, and Warnings
56
Edits, Errors, and Warnings
  • Format Edits Errors
  • Validation Edits Errors
  • Reasonability Edits Errors

57
Format Edits and Errors
  • Definition transmitted file must be in an
    allowable specified format
  • Purpose If the transmitted file is not in an
    allowable specified format, DRPS cannot interpret
    it
  • Result If the file is acceptably formatted,
    Validation Edits are performed. If not, the file
    is rejected with a format error

58
Format Edits and Errors Format Error Checker
  • PSC has developed an EDEN file format checker.
  • PSC uses this internally.
  • Currently there is only a 2003-05 version.
  • Distributed to states last year
  • The 2005-06 version is in the final testing phase
    and will be released in the near future.

59
Format Edits and Errors Format Error Checker,
contd
  • Format Checker
  • Need Perl interpreter installed
  • Free download of Perl interpreter available
  • Checks for Format Errors
  • Shows Row Number, Record Number and Field Name
  • Allows users to save, copy and paste to print
    results.
  • Currently designed for Non-XML files only
  • Can be edited with any text editor. We encourage
    improvements and sharing amongst other EDEN
    participants
  • This tool is work in progress
  • PSC will send updates

60
Format Edits and Errors Format Error Checker
Output
  • Open the EDEN Format Checker
  • Select the file you wish to check format of
  • Click Go
  • Results are provided
  • Showing Record and Field Name

61
Validation Edits and Errors
  • Definition Identification of invalid data used
    in a field where specific permitted values or a
    data type has been specified
  • Purpose To ensure that the data stored in the
    Data Repository can be interpreted according to
    agreed upon rules
  • Result Any validation error will cause the
    entire file to be rejected.
  • If a file contains no validation errors
  • Any applicable reasonability edits will be
    performed.
  • The submission will be loaded into the Staging
    Database.

62
Validation Edits and ErrorsValidation Error
Thresholds
  • Validation errors will be identified up to a
    threshold
  • Currently 1000 or more errors identified will
    result in the termination of validation edit
    processing

63
Reasonability Edits
  • Definition Identification of submissions where
    questionable data has been submitted
  • Purpose To flag for explanation data that
    changed significantly from one year to the next
  • Result
  • The submission is loaded into the staging
    database
  • Records where the reasonability edits failed must
    be explained
  • File submissions will not be transferred to the
    Data Repository until a reasonable explanation
    has been approved

64
Errors and Edits
  • Questions concerning specific errors?
  • Questions concerning problems or concerns with
    regards to file submissions?

Open Session
65
Recap - Objectives
  • Provided a basic understanding of ESS and its
    sub-applications.
  • Provided an understanding of the data file
    submission lifecycle.
  • Provided an in-depth knowledge of the File
    Specifications and how to read them.
  • Provided information on ESS edits, errors, and
    warnings.
Write a Comment
User Comments (0)
About PowerShow.com