Facilitating Standardization and Exchange of Array Design

Transcript and Presenter's Notes


1
Facilitating Standardization and Exchange of
Array Design
  • ADF MAGE-ML Tool

Pierre Marguerite Friday Seminar
15 October 2004
EBI Microarray Informatics Team
2
ADF MAGE-ML Tool
  • Application
    • Stand-alone
    • Platform independent
  • Supports
    • Simple/complex microarray layouts
    • Different microarray applications
      • gene_expression
      • snp_detection
      • comparative_genomic_hybridization
      • binding_site_identification
      • Others (minimal)
  • Respects good practices

3
conversion tool
4
MAGE-ML (MAGE-OM)
(Diagram labels: Array, Description, Biosequence, DesignElement, Array Design)
5
MAGE-ML (next)
6
ADF (previous)
7
Array Design File
(Diagram labels: adh, adr, adc, contacts, Header, Technical Information)
8
Array Design File
(Diagram labels: adr, adc, Reporters, Features, Feature/Reporter)
9
Array Design File
Composite
Map to reporters
Characteristics
10
ADF version differences
  • 3 parts (files) instead of 1
  • As Workbook or text files
  • No Reporter Identifier item
  • No Reporter Group role item
  • New Chromosome item
  • New Chromosome_band item
  • New Species item

11
  • 2 mandatory steps
    • Validation
    • Conversion

12
Validation
  • File format validation
  • File content validation
  • Validation of controlled vocabulary
    • MGED ontology terms
    • Approved databases (tags, accession numbers)
  • Automatic curation (when possible)
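
A minimal sketch of what controlled-vocabulary validation with automatic curation could look like. The vocabulary is the list of application types named earlier in this talk; the curation rule (fix case, spaces and hyphens only) and the function name `curate` are assumptions for illustration, not the tool's actual behaviour.

```python
# Application types taken from the "Supports" slide of this presentation.
APPLICATION_TYPES = (
    "gene_expression",
    "snp_detection",
    "comparative_genomic_hybridization",
    "binding_site_identification",
)


def curate(value, vocabulary):
    """Return (term, note).

    term is the controlled term, or None when no safe correction exists.
    note is None when the value was already valid.
    Assumption: only case, spacing and hyphenation are curated automatically.
    """
    if value in vocabulary:
        return value, None
    # Normalise "Gene Expression" or "SNP-detection" to the underscore form.
    candidate = "_".join(value.strip().lower().replace("-", " ").split())
    if candidate in vocabulary:
        return candidate, f"'{value}' curated to '{candidate}'"
    return None, f"'{value}' is not a controlled term"
```

A value such as "Gene Expression" would be corrected and the correction noted, while an unrelated value is rejected and left for the submitter to fix.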

13
Validation
  • Two levels of checking
    • Relaxed
    • Strict
  • Two execution modes
    • A complete mode
    • A step-by-step mode
  • Error log for correction
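
A minimal sketch of how the checking levels, execution modes and error log could fit together. The slides do not define the semantics, so two assumptions are made here: the relaxed level downgrades findings on non-mandatory rules to warnings while the strict level treats every finding as an error, and the step-by-step mode stops after the first step that produced an error. All names (`Level`, `Finding`, `run_checks`) are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    RELAXED = "relaxed"
    STRICT = "strict"


@dataclass
class Finding:
    step: str        # which check produced the finding
    message: str     # text written to the error log
    mandatory: bool  # True if the violated rule is mandatory


def run_checks(steps, level, step_by_step=False):
    """Run (name, check) pairs in order and return (errors, warnings).

    Assumed semantics: RELAXED turns non-mandatory findings into warnings,
    STRICT reports everything as an error; step_by_step stops after the
    first step that produced at least one error.
    """
    errors, warnings = [], []
    for _name, check in steps:
        failed = False
        for finding in check():
            if finding.mandatory or level is Level.STRICT:
                errors.append(finding)
                failed = True
            else:
                warnings.append(finding)
        if step_by_step and failed:
            break
    return errors, warnings


# Two toy checks standing in for real header and feature validation.
def check_header():
    return [Finding("header", "item order differs from template", mandatory=False)]


def check_features():
    return [Finding("features", "mandatory item missing", mandatory=True)]


STEPS = [("header", check_header), ("features", check_features)]
```

With these toy checks, a relaxed complete run yields one error and one warning, whereas a strict step-by-step run stops after the header step with a single error.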

14
Checking lists (header)
  • File/Data structure checklist
    • Header file is a tab-delimited file
    • Item names are correct or can be identified
      • If an item is not identified, it is skipped
    • All mandatory items are present in the header
  • Data/file content checklist
    • Correct field value format
    • Possible value types
      • "Integer"
      • "Free Text"
      • "Controlled vocabulary"
      • "MGED ontology term"
      • "DatabaseEntry"
      • "Sequence"
      • "Species"
    • Check single vs. multiple values
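
The header checklist can be sketched roughly as follows. The item table is a made-up placeholder (it is not the real ADF item list), the name normalisation is one possible reading of "can be identified", and only the "Integer" value type is checked, to keep the example short.

```python
import csv
import io

# Hypothetical item table: name -> (value type, mandatory).
KNOWN_ITEMS = {
    "array design name": ("free_text", True),
    "number of features": ("integer", True),
    "description": ("free_text", False),
}


def normalise(name):
    """Lower-case and unify underscores/whitespace so near-miss names are identified."""
    return " ".join(name.replace("_", " ").lower().split())


def validate_header(text):
    """Validate a tab-delimited header given as a string; return (values, log)."""
    values, log, seen = {}, [], set()
    rows = csv.reader(io.StringIO(text), delimiter="\t")
    for row_no, row in enumerate(rows, start=1):
        if not row or not row[0].strip():
            continue  # ignore empty lines
        if len(row) < 2:
            log.append(f"line {row_no}: not tab-delimited or value missing")
            continue
        item = normalise(row[0])
        if item not in KNOWN_ITEMS:
            # Unidentified items are skipped, not treated as fatal.
            log.append(f"line {row_no}: unknown item '{row[0]}' skipped")
            continue
        seen.add(item)
        value_type, _mandatory = KNOWN_ITEMS[item]
        value = row[1].strip()
        if value_type == "integer" and not value.isdigit():
            log.append(f"line {row_no}: '{row[0]}' expects an integer, got '{value}'")
            continue
        values[item] = value
    for item, (_type, mandatory) in KNOWN_ITEMS.items():
        if mandatory and item not in seen:
            log.append(f"mandatory item '{item}' is missing")
    return values, log
```

The log collects one message per problem, so a submitter could correct the file in a single pass.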

15
Checking lists (feature reporter)
  • Feature Reporter file
    • File/Data structure checklist
      • Header file is correct (structure and data)
      • FeatureReporter file is a tab-delimited file
      • Header item names are correct (unknown items are skipped)
      • All mandatory items are present. Item cardinalities and dependencies are correct.
      • Database tags are approved and database accession numbers are correct
      • Item order is correct (optional; does not fail the check)
      • Field dependencies are correct
    • Data/file content checklist
      • FeatureReporter file structure must be correct
      • Mandatory fields are present. Field cardinalities and field value multiplicities must be correct.
      • Field values are in the required format
      • Database tags are approved by ArrayExpress and are supplied in lower case and between square brackets
      • Database IDs are correct
      • Ontology terms are correct (MGED ontology)
      • Sequences are valid for the associated polymer type (DNA, RNA, protein)
      • Integer field values are correct
      • Duplicate features must not exist
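
Three of the content checks above lend themselves to a short sketch: the database tag format, sequence validity per polymer type, and duplicate features. The approved-tag set is a placeholder rather than the real ArrayExpress list, the alphabets are simplified (only N is allowed as an ambiguity code), and identifying a feature by a position tuple is an assumption.

```python
import re
from collections import Counter

APPROVED_TAGS = {"embl", "genbank"}  # placeholder, not the real approved list
TAG_PATTERN = re.compile(r"\[([a-z0-9_]+)\]")  # lower case, in square brackets

# Simplified alphabets (assumption): no ambiguity codes other than N.
ALPHABETS = {
    "DNA": set("ACGTN"),
    "RNA": set("ACGUN"),
    "protein": set("ACDEFGHIKLMNPQRSTVWY"),
}


def check_tag(tag):
    """True if the tag is lower case, bracketed, and in the approved set."""
    match = TAG_PATTERN.fullmatch(tag)
    return bool(match) and match.group(1) in APPROVED_TAGS


def check_sequence(sequence, polymer_type):
    """True if every residue belongs to the alphabet of the polymer type."""
    return bool(sequence) and set(sequence.upper()) <= ALPHABETS[polymer_type]


def duplicate_features(positions):
    """Return the sorted list of feature positions that occur more than once."""
    counts = Counter(positions)
    return sorted(pos for pos, n in counts.items() if n > 1)
```

Here "[embl]" would pass while "[EMBL]" and "embl" would not, and a sequence containing U would be rejected for DNA but accepted for RNA.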

16
Checking lists (composite)
  • CompositeSequence
    • File/Data structure checklist
      • Feature Reporter file must be correct (structure and data)
      • CompositeSequence file is a tab-delimited file
      • Header item names are correct (unknown items are skipped)
      • All mandatory items are present. Header item cardinalities and dependencies are correct.
      • Column order is correct (not mandatory)
    • Data/file content checklist
      • Composite file structure must be correct
      • All mandatory fields are present. Field cardinalities are correct.
      • Field values are in the expected format. Field multiplicity is correct (same as Feature/Reporter).
      • Names in map are reporter or composite sequence names
      • No duplicate CompositeSequences (same names)
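
The last two composite checks (map names must resolve, no duplicate names) could be sketched like this. The in-memory representation, a list of (name, mapped names) pairs, and the function name are assumptions made for the example.

```python
def check_composites(composites, reporter_names):
    """Return a list of log messages for a composite table.

    composites: list of (composite_name, [mapped names]) in file order.
    reporter_names: names defined in the Feature/Reporter file.
    """
    log = []
    seen = set()
    for name, _mapped in composites:
        if name in seen:
            log.append(f"duplicate CompositeSequence '{name}'")
        seen.add(name)
    # A map entry may point at a reporter or at another composite sequence.
    known = set(reporter_names) | seen
    for name, mapped in composites:
        for target in mapped:
            if target not in known:
                log.append(f"'{name}' maps to unknown name '{target}'")
    return log
```

Collecting all composite names before resolving the maps lets a composite refer to another composite defined later in the file.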

17
Checking lists
  • Header item names are correct
  • All mandatory items are present
  • All mandatory fields are present
  • No duplicate features
  • Duplicate Reporters (equal names) must have the same characteristics
  • No duplicate CompositeSequences (same names)
  • Names in map are reporter or composite sequence
    names
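
The duplicate-reporter rule is read here as: rows that share a reporter name must carry identical characteristics. Under that interpretation, a consistency check might look like the sketch below; the row representation and the function name are hypothetical.

```python
def inconsistent_reporters(rows):
    """Return reporter names whose repeated rows disagree.

    rows: iterable of (reporter_name, characteristics dict) in file order.
    """
    first_seen = {}
    inconsistent = []
    for name, characteristics in rows:
        if name not in first_seen:
            first_seen[name] = characteristics
        elif characteristics != first_seen[name] and name not in inconsistent:
            inconsistent.append(name)
    return inconsistent
```

A reporter repeated with the same characteristics on several features passes; one repeated with a differing value is reported once.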

20
MGED Ontology / DAML+OIL
21
Approved Databases
22
User modes
23
Implementation - technical choices
  • MAGE-stk
  • JAXB
  • Configuration (default parameters)

Performance: 4000 features in 10 minutes
24
Installer - izpack
http://www.izforge.com/izpack/
25
http://www.ebi.ac.uk/adf/