File and Data Conversion - PowerPoint PPT Presentation

1
File and Data Conversion
  • Jonathan Carter
  • NERSC User Services
  • jcarter_at_nersc.gov
  • 510-486-7514

2
Introduction
  • Converting files and data for use on the IBM SP
  • IBM uses IEEE data representation
  • Industry standard Fortran unformatted file
    structure
  • Tools available on the Cray systems
  • Tools available on the IBM SP

3
Demand for File Conversion
  • Currently: CTSS text files (ctou and rlib will
    be available on the IBM SP)
  • After decommissioning of the Cray systems in
    October 2002: Cray Fortran unformatted files
    and Cray C binary files

4
Tools on the Cray Systems - FFIO
  • Flexible File I/O - general system of specifying
    how data should be written or read
  • Can be used without recompiling or linking
    (Fortran)
  • Can be changed at runtime
  • Various layers available to convert both file
    structure and data
  • Controlled via the assign command

5
assign Command
  • Can specify how I/O is done
  • On a Fortran unit basis: assign -F f77 u:10
  • On a filename basis: assign -F f77 f:filename
  • Common options
  • Clear assigns: assign -R
  • See current assigns in effect: assign -V

6
Fortran Unformatted Sequential-access Files
  • Cray uses a vendor-specific format called COS
    blocked, or simply blocked
  • IBM (and most Unix vendors) use f77 blocking
  • Use the -F f77 option to select the FFIO f77
    blocking layer instead of the default COS blocking
  • assign -F f77 u:10
  • T3E already uses IEEE arithmetic, so -F f77 is
    sufficient
  • Note that the default real and integer data types
    on the T3E are 64 bit
  • SV1 data needs to be converted, so an IEEE
    conversion layer is needed
  • -N ieee performs basic conversion
  • assign -F f77 -N ieee f:filename
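The slides do not spell out the f77 blocking layout. As a rough illustration, the Python sketch below assumes the convention common to Unix Fortran compilers: each record is framed by a 4-byte length marker before and after the data. The big-endian byte order and the function names are assumptions made for this example; they are not part of FFIO or of the IBM runtime.

```python
import io
import struct


def write_f77_record(stream, payload: bytes, marker: str = ">i") -> None:
    """Write one record framed by leading and trailing length markers."""
    length = struct.pack(marker, len(payload))
    stream.write(length + payload + length)


def read_f77_records(stream, marker: str = ">i"):
    """Yield the payload of each record found in an f77-blocked stream."""
    size = struct.calcsize(marker)
    while True:
        head = stream.read(size)
        if not head:
            return  # clean end of file
        if len(head) != size:
            raise ValueError("truncated record marker")
        (length,) = struct.unpack(marker, head)
        payload = stream.read(length)
        tail = stream.read(size)
        # The trailing marker must repeat the leading one.
        if len(payload) != length or tail != head:
            raise ValueError("corrupt record: length markers do not match")
        yield payload


# Round trip: one record with an 8-byte integer and two 8-byte reals,
# followed by a second record holding a short character string.
buf = io.BytesIO()
write_f77_record(buf, struct.pack(">q2d", 3, 1.5, -2.0))
write_f77_record(buf, b"label")
buf.seek(0)
records = list(read_f77_records(buf))
```

A COS blocked file cannot be read this way, which is why the assign options above (or a conversion library) are needed before the data reaches the IBM SP.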

7
Fortran Unformatted Direct-access Files
  • Files are not blocked on Cray or IBM
  • Data conversion layers can be used as in
    sequential-access files for the SV1 machines
  • assign -N ieee u:20
  • T3E files don't need any conversion

8
C Binary Files
  • Files are not blocked on Cray or IBM
  • FFIO conversion layer not easy to use
  • Use library routines such as cry2cri

9
Using FFIO to Convert a File
  • Isolate I/O statements for the file from program
    to make a simple conversion program
  • Pair each read with a write
  • Use assign to have all written data converted, or
    use data conversion routines

10
Tools on the IBM SP - NCARU Library
  • Library developed by the SCD at NCAR
  • Read COS blocked file
  • Convert Cray data to IEEE data
  • Does not use Fortran API, so program modification
    is required
  • Basic calls are crayopen, crayread, crayrew,
    crayback, crayclose
  • Calls to crayread can convert data if record is
    composed of one data type only, otherwise user
    must handle explicitly
  • Conversion routines are ctodpf, ctospf, ctospi
  • Cray Fortran I/O sometimes inserts padding, user
    must handle explicitly
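To give a feel for what a Cray-to-IEEE conversion involves, here is a minimal Python sketch. It assumes the classic Cray 64-bit floating-point layout (1 sign bit, a 15-bit exponent biased by 40000 octal, and a 48-bit mantissa with no hidden bit) stored big-endian. The name cray_to_float is hypothetical: it is not one of the NCARU routines, and it makes no attempt to match their handling of out-of-range values.

```python
import math
import struct

CRAY_EXPONENT_BIAS = 0o40000  # 16384


def cray_to_float(word: bytes) -> float:
    """Convert one 8-byte big-endian Cray floating-point word to a float."""
    (bits,) = struct.unpack(">Q", word)
    sign = -1.0 if bits >> 63 else 1.0
    exponent = (bits >> 48) & 0x7FFF
    mantissa = bits & 0xFFFFFFFFFFFF  # 48 bits, binary point on the left
    if mantissa == 0:
        return sign * 0.0
    try:
        # value = (mantissa / 2**48) * 2**(exponent - bias)
        return sign * math.ldexp(mantissa, exponent - CRAY_EXPONENT_BIAS - 48)
    except OverflowError:
        # The Cray exponent range is wider than that of an IEEE double.
        return sign * math.inf


one = cray_to_float(bytes.fromhex("4001800000000000"))       # 1.0
minus_2_5 = cray_to_float(bytes.fromhex("C002A00000000000"))  # -2.5
```

Because the 48-bit mantissa fits inside the 53-bit significand of an IEEE double, in-range values convert exactly; very small magnitudes underflow to zero.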

11
Using the NCARU Library
  • To use
  • module load ncaru
  • xlf -o a.out b.f $NCARU
  • Limitations
  • 2GB limit for unblocked files
  • Currently no 64 bit address space support
  • Not thread-safe
  • No support for 128 bit data

12
Dealing with Different Files
  • Open using blocked option to crayopen for Fortran
    unformatted sequential access, open with
    unblocked option for Fortran unformatted direct
    access
  • If written on the SV1 use conversion option on
    read, or call conversion routines directly
  • C binary files can be read by the unblocked I/O
    calls or by usual C I/O followed by data
    conversion routines

13
Records with Mixed Data Types
  • Read into a buffer and convert items one by one

      real x(50)
      integer n(50)
      real*8 buffer(100)
      ! open in blocked mode
      ifc = crayopen(filename, 10, 0)
      ! read record without converting
      nwds = crayread(ifc, buffer, 100, 0)
      ! convert data
      call ctospf(buffer, x, 50)
      call ctospi(buffer(51), n, 50)
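The same read-then-convert pattern can be sketched outside Fortran. The Python example below is only an illustration: it assumes the raw record is already in memory and that its 8-byte words are big-endian IEEE values, so that struct can decode them directly. Data written on the SV1 would need a Cray-to-IEEE conversion for the reals instead. The names split_record and layout are hypothetical.

```python
import struct


def split_record(raw: bytes, layout):
    """Split a raw record into typed fields.

    layout lists (name, count, struct_code) entries in record order,
    for example ("x", 50, "d") for fifty 8-byte reals.
    """
    fields = {}
    offset = 0
    for name, count, code in layout:
        fmt = f">{count}{code}"  # ">" means big-endian with no alignment
        fields[name] = struct.unpack_from(fmt, raw, offset)
        offset += struct.calcsize(fmt)
    if offset != len(raw):
        raise ValueError(f"layout covers {offset} bytes, record has {len(raw)}")
    return fields


# Stand-in for a record of 50 reals followed by 50 integers, 8 bytes each.
raw = struct.pack(">50d", *[i * 0.5 for i in range(50)])
raw += struct.pack(">50q", *range(50))

fields = split_record(raw, [("x", 50, "d"), ("n", 50, "q")])
```

The length check at the end is a cheap way to catch a layout that does not match the record, which is the usual symptom of unexpected padding.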
14
Data Padding
  • With Cray Fortran I/O, extra bytes are inserted
    into the user data.
  • In cases where padding occurs, bytes are inserted
    so that any datum of length 8 bytes is at a byte
    offset, which is measured from the beginning of
    the record, that is a multiple of 8 bytes. Then
    the end of the record is padded so that the whole
    record length is a multiple of 8.
  • Padding will only occur if you have used
    character variables whose lengths are not a
    multiple of 8, or have used real*4 or integer*4
    data on the T3E (on the SV1 systems, 8 bytes
    are used).

15
Example
A Fortran record is written on an SV1:
      real a(50)
      integer n(50)
      character*17 label
      write(50) n, a, label
Each element of n and a is 8 bytes long, and
label is 17 bytes long. Within the Fortran
record, n starts at offset 0, a at offset 400,
and label at offset 800. The only padding that
occurs is at the end of the record, where 7 bytes
are added to make the total record length 824
bytes, which is a multiple of 8.

16
Example
A Fortran record is written on an SV1:
      real a(50)
      integer n(50)
      character*17 label
      write(50) label, a, n
Without padding, the alignments are label at
offset 0, a at offset 17, and n at offset 417.
Since a has elements of length 8 bytes, it must
be written at an offset that is a multiple of 8
bytes; therefore a pad of 7 bytes is inserted
between the end of label and the beginning of a.
In the record that is written to the file, the
alignments are label at offset 0, a at offset 24,
and n at offset 424.

17
Example
A Fortran record is written on the T3E:
      real a(40), b(40)
      integer*4 n(13), m(13)
      character*12 label
      write(50) label, n, a, m, b
The data have these lengths: label 12 bytes, n
and m 52 bytes each, and a and b 320 bytes each.
Without padding, the alignments are label at
offset 0, n at offset 12, a at offset 64, m at
offset 384, and b at offset 436. a and b need to
be at offsets that are a multiple of 8 bytes; the
offset of a is already correct, but 4 bytes must
be inserted before b, so that it starts at offset
440.
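The padding rule from the Data Padding slide can be written down as a small calculation. The Python sketch below is one reading of that rule, not code from Cray or NCAR: any item whose elements are 8 bytes long is moved up to the next multiple of 8, and the record end is rounded up the same way. The function name padded_layout and the (name, element size, count) tuples are assumptions for illustration.

```python
def padded_layout(items, word=8):
    """Return ({name: offset}, total_length) for items written in order.

    items is a list of (name, element_size_in_bytes, element_count).
    """
    offsets = {}
    position = 0
    for name, element_size, count in items:
        if element_size == word:
            # Pad so that 8-byte data starts on a multiple of 8.
            position += -position % word
        offsets[name] = position
        position += element_size * count
    # Pad the end so the whole record length is a multiple of 8.
    position += -position % word
    return offsets, position


# The T3E record above: write(50) label, n, a, m, b
t3e_items = [
    ("label", 12, 1),  # character*12
    ("n", 4, 13),      # integer*4 n(13)
    ("a", 8, 40),      # real a(40), 8 bytes per element
    ("m", 4, 13),      # integer*4 m(13)
    ("b", 8, 40),      # real b(40)
]
offsets, total = padded_layout(t3e_items)
# offsets == {'label': 0, 'n': 12, 'a': 64, 'm': 384, 'b': 440}
# total == 760
```

A table of offsets like this is what a reader needs in order to skip the pad bytes when picking items out of a raw record buffer.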
18
crayconv Utility
  • crayconv automatically converts files written on
    the SV1 to IBM compatible format
  • Basic Fortran data types only
  • Sequential access unformatted files only
  • Possible problem if compiler option -Onofastint
    is used, or integer*8 is explicitly declared and
    written: integers over 2^46 are not correctly
    interpreted
  • Pad data not removed
  • Extension to T3E data and direct access
    unformatted files planned

19
More Information
  • http://hpcf.nersc.gov/computers/SP/ffio.html
  • by Mike Stewart
  • http://hpcf.nersc.gov/computers/crayretire.html
  • man ncaru
