Title: Biomolecular Sequence Analysis RFP Revised Submission Preliminary Presentation
1 Biomolecular Sequence Analysis RFP Revised Submission (Preliminary Presentation)
- Ira Baron - Millennium
- Steve Chervitz - Neomorphic
- Scott Markel - NetGenics
- 22 March 1999
- (lifesci/99-03-08)
2 Outline
- Submitters & Supporters
- Acknowledgements
- Design Comments
- Bio-objects
- Analysis Machinery
- Open Issues
- Suggested RFPs
- Request for Feedback
3 Submitters & Supporters
- Submitters
- Concept 5
- EBI
- GenomIx
- Millennium
- Neomorphic
- NetGenics
- Oxford Molecular
4 Acknowledgements
- Ira Baron (Millennium)
- Ewan Birney (Sanger Centre)
- Steve Chervitz (Neomorphic)
- Michele Clamp (Sanger Centre)
- Tim Clark (Millennium)
- Mike Dickson (NetGenics)
- Guy Evans (Oxford Molecular)
- Karl Konnerth (Incyte)
- Philip Lijnzaad (EBI)
- Scott Markel (NetGenics)
- Mark Morwood (Millennium)
- Eric Neumann (NetGenics)
- Michael Petrin (Concept 5)
- Keith Robison (Millennium)
- Martin Senger (EBI)
- Ron Zahavi (Concept 5)
- Manfred Zorn (GenomIx)
5 Design Comments
- object immutability
- IDL interfaces don't support mutation of bio-objects
- valuetype
- chosen over structs for extensibility
- valuetypes are restricted to data members only (no methods)
6 Design Comments (cont'd)
- design patterns
- composite pattern
- used for segments on sequences
- allows multiple intervals to be grouped
- list/iterator hybrid
- pattern used in CosPropertyService
    list method(in unsigned long how_many, out iterator remainder)
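The list/iterator hybrid can be sketched in Python as follows. This is only an illustration of the pattern, not the submission's implementation: `list_items` and `RemainderIterator` are hypothetical names standing in for the `list method(in how_many, out remainder)` signature. The caller gets the first `how_many` items directly and an iterator for whatever is left.

```python
from typing import List, Optional, Sequence, Tuple


class RemainderIterator:
    """Hypothetical stand-in for the iterator returned through `out iterator remainder`."""

    def __init__(self, items: Sequence[str]) -> None:
        self._items = list(items)
        self._pos = 0

    def next_n(self, how_many: int) -> List[str]:
        """Return up to how_many further items; an empty list means nothing is left."""
        chunk = self._items[self._pos:self._pos + how_many]
        self._pos += len(chunk)
        return chunk


def list_items(
    items: Sequence[str], how_many: int
) -> Tuple[List[str], Optional[RemainderIterator]]:
    """Return the first how_many items directly plus an iterator over the rest."""
    head = list(items[:how_many])
    rest = items[how_many:]
    # No iterator is created when everything fits in the first batch.
    return head, (RemainderIterator(rest) if rest else None)
```

Small result sets need a single call, while large ones are paged through the iterator instead of being returned in one reply.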
7 Bio-objects
- BioSequence
- NucleicAcidSequence
- AminoAcidSequence
- Regions and annotations on sequences
- Interval, SeqRegion
- Annotation, SeqAnnotation
- Genetic codes
- Sequence alignments
- HitList, DataSource
9 BioSequence
enum Basis { NOT_KNOWN, EXPERIMENTAL, PREDICTED, BOTH };
enum SequenceType { DNA, RNA, AA };

interface BioSequence {
    readonly attribute string name;
    readonly attribute string description;
    readonly attribute string seq;
    readonly attribute unsigned long length;
    readonly attribute Basis basis;

    string seq_interval(in Interval interval);
    SeqAnnotationList get_seq_annotations(in unsigned long how_many,
                                          out SeqAnnotationIterator the_rest);
    unsigned long num_annotations();
};
11 NucleicAcidSequence, AminoAcidSequence
interface NucleicAcidSequence : BioSequence {
    boolean is_circular();
    string reverse_complement();
    string reverse_complement_interval(in Interval interval);
    string translate_seq(in short reading_frame);
    string translate_seq_region(in SeqRegion sr);
};

interface AminoAcidSequence : BioSequence {
};
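As a rough illustration of what `reverse_complement` and `reverse_complement_interval` would compute, here is a Python sketch. It assumes unambiguous DNA bases only and a zero-based `offset`; the slides do not state the coordinate convention, so both are assumptions, and the function signatures are simplified stand-ins for the IDL operations.

```python
# Assumption: plain DNA alphabet only (no RNA, no ambiguity codes).
_DNA_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the result."""
    return seq.translate(_DNA_COMPLEMENT)[::-1]


def reverse_complement_interval(seq: str, offset: int, length: int) -> str:
    """Reverse-complement the slice starting at a zero-based offset (assumed)."""
    return reverse_complement(seq[offset:offset + length])
```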
12 Sequence Factories & Iterators (Optional)
interface BioSequenceFactory {
    BioSequence create_sequence(in string name,
                                in string description,
                                in string residues,
                                in SeqAnnotationList annots)
        raises (InvalidResidue);
};

interface BioSequenceIterator {
    boolean next(out BioSequence seq);
    boolean next_n(in unsigned long how_many,
                   out BioSequenceList seqs);
    void reset();
    void destroy();
};

(also have typed factories and iterators for NAS and AAS)
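A factory of this shape might behave roughly like the Python sketch below. The residue alphabet is an assumption (the slides do not say which residues are valid), annotations are omitted, and the frozen dataclass is only meant to mirror the immutability noted under Design Comments.

```python
from dataclasses import dataclass


class InvalidResidue(ValueError):
    """Raised when residues fall outside the accepted alphabet."""


@dataclass(frozen=True)  # frozen: attributes cannot be reassigned after creation
class BioSequence:
    name: str
    description: str
    seq: str

    @property
    def length(self) -> int:
        return len(self.seq)


# Assumption: a DNA-only alphabet, chosen purely for illustration.
DNA_ALPHABET = frozenset("ACGT")


def create_sequence(name: str, description: str, residues: str,
                    alphabet: frozenset = DNA_ALPHABET) -> BioSequence:
    """Validate the residues first, then build an immutable sequence object."""
    bad = sorted(set(residues.upper()) - alphabet)
    if bad:
        raise InvalidResidue("invalid residues: " + ", ".join(bad))
    return BioSequence(name, description, residues)
```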
13 FuzzyRegionType, SeqRegionOperator
enum FuzzyRegionType {
    EXACT,         // Region is not fuzzy.
    IN,            // Single point in the range [offset, offset + length].
    BETWEEN,       // Single point between adjacent positions defined by
                   // [offset, offset + length] (e.g., a splice site).
    PLUSMINUS,     // Single point in the range defined by
                   // [offset - length, offset + length].
    LESS_THAN,     // Region is < offset; length is unused.
    GREATER_THAN   // Region is > offset; length is unused.
};

enum SeqRegionOperator {
    NOT_APPLICABLE,  // Region has no sub regions or the sub regions
                     // don't need special treatment.
    JOIN,            // Sub regions should be joined end-to-end to
                     // form a contiguous region.
    ORDER,           // Sub region order is important.
    UNEQUAL          // Region is everything except the indicated region(s).
};
15 Interval, SeqRegion
enum StrandType { NOT_KNOWN, NOT_APPLICABLE, PLUS, MINUS, BOTH };

valuetype Interval {
    unsigned long offset;
    long length;
};

valuetype SeqRegion : Interval {
    SeqRegionList sub_regions;
    StrandType strand_type;
    FuzzyRegionType fuzzy_type;
    SeqRegionOperator region_operator;
};
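Because a SeqRegion can hold sub regions, it forms a composite that can be walked recursively. The Python sketch below shows one way a JOIN region could be resolved against a sequence string. It assumes zero-based offsets and ignores strand, fuzziness, and the ORDER and UNEQUAL operators, none of which the slides specify in enough detail to implement; the `extract` helper is hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Interval:
    offset: int
    length: int


@dataclass(frozen=True)
class SeqRegion(Interval):
    sub_regions: Tuple["SeqRegion", ...] = ()
    region_operator: str = "NOT_APPLICABLE"


def extract(seq: str, region: SeqRegion) -> str:
    """Leaf regions slice the sequence; JOIN regions concatenate their children in order."""
    if region.region_operator == "JOIN" and region.sub_regions:
        return "".join(extract(seq, sub) for sub in region.sub_regions)
    # Assumption: offset is zero-based and length counts residues.
    return seq[region.offset:region.offset + region.length]
```

For example, a JOIN of two three-residue sub regions yields the six residues placed end-to-end, which is how a spliced feature made of several exons could be represented.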
16 Annotation, SeqAnnotation
interface Annotation {
    attribute string name;   // type of annotation
    attribute any value;     // the annotation
    attribute string id;     // unique identifier for tracking
    Basis basis;             // basis for annotation
    CosPropertyService::Properties qualifiers;  // alternatively, Tagged data
};

interface SeqAnnotation : Annotation, CosObjectIdentity::IdentifiableObject {
    attribute SeqRegion region;

    SeqAnnotationList get_sub_annotations(in unsigned long how_many,
                                          out SeqAnnotationIterator the_rest);
    void add_sub_annotation(in SeqAnnotation annot);
    void remove_sub_annotation(in SeqAnnotation annot);
    long num_sub_annotations();
};
17 SeqAnnotationIterator
interface SeqAnnotationIterator {
    boolean next(out SeqAnnotation region);
    boolean next_n(in unsigned long how_many,
                   out SeqAnnotationList regions);
    void remove_current();
    void reset();
    void destroy();
};
18 GeneticCode
typedef char Residue;
typedef char Base;
typedef Base Codon[3];

valuetype CodeRule {
    Codon codon;
    Residue residue;
};

typedef CodeRule Coding[64];
typedef string GeneticCodeName;
19 GeneticCode (cont'd)
interface GeneticCode {
    const GeneticCodeName STANDARD = "standard";
    const GeneticCodeName BACTERIAL = "bacterial";
    const GeneticCodeName YEAST_MITOCHONDRIAL = "yeast mitochondrial";
    // others are VERTEBRATE_MITOCHONDRIAL, MOLD_MITOCHONDRIAL,
    // INVERTEBRATE_MITOCHONDRIAL, ECHINODERM_MITOCHONDRIAL,
    // ASCIDIAN_MITOCHONDRIAL, FLATWORM_MITOCHONDRIAL,
    // CILIATE_NUCLEAR, EUPLOTID_NUCLEAR, ALT_YEAST_NUCLEAR,
    // BLEPHARISMA_MACRONUCLEAR

    readonly attribute Coding coding;
    readonly attribute GeneticCodeName name;

    Residue translate_codon(in Codon codon);
    string translate_seq(in NucleicAcidSequence,
                         in short reading_frame);
    string translate_seq_region(in NucleicAcidSequence,
                                in SeqRegion sr);
};
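To make the 64-entry `Coding` table and the translate operations concrete, here is a Python sketch using the standard genetic code. The meaning of `reading_frame` is an assumption: the sketch accepts frames 1 to 3 on the forward strand only, since the slides do not define the value range. Stop codons are rendered as `*`.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base varies fastest).
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {
    "".join(codon): residue
    for codon, residue in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def translate_codon(codon: str) -> str:
    """Look up one codon; RNA input is accepted by mapping U to T."""
    return STANDARD[codon.upper().replace("U", "T")]


def translate_seq(seq: str, reading_frame: int = 1) -> str:
    """Translate complete codons starting at frame 1, 2 or 3 (assumed convention)."""
    if reading_frame not in (1, 2, 3):
        raise ValueError("this sketch only handles forward frames 1, 2 and 3")
    start = reading_frame - 1
    # Drop any trailing bases that do not form a complete codon.
    usable_end = len(seq) - (len(seq) - start) % 3
    return "".join(translate_codon(seq[i:i + 3]) for i in range(start, usable_end, 3))
```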
21 AlignmentElement, AlignmentElementIterator
interface AlignmentElement : CosObjectIdentity::IdentifiableObject {
    readonly attribute Object element;
    readonly attribute SeqRegion conservation;
};

interface AlignmentElementIterator {
    boolean next(out AlignmentElement elt);
    boolean next_n(in unsigned long how_many,
                   out AlignmentElementList elts);
    void reset();
    void destroy();
};
22 Alignment
interface Alignment {
    typedef string AlignType;
    typedef sequence<AlignType> AlignTypeList;

    const AlignType PROTEIN = "PROTEIN";
    const AlignType NON_PROTEIN = "NON_PROTEIN";
    const AlignType SEQUENCE_ERROR = "SEQUENCE_ERROR";

    AlignmentElementList get_aligned_elements();
    AlignmentElementList get_aligned_elements_by_interval(in Interval interval)
        raises (InvalidInterval);
23 Alignment (cont'd)
    unsigned long number_of_columns();
    SeqRegion get_seq_region_by_column(in AlignmentElement elt,
                                       in unsigned long column)
        raises (ObjectNotInAlignment);
    SeqRegion get_seq_region_by_interval(in AlignmentElement elt,
                                         in Interval interval)
        raises (ObjectNotInAlignment, InvalidInterval);
    AlignTypeList get_align_type_by_column(in unsigned long col);
};
24 AlignmentEncoder (Optional)
interface AlignmentEncoder : CosLifeCycle::LifeCycleObject {
    readonly attribute Alignment alignment;

    unsigned long number_of_objects(void);  // number of aligned objects. Delegate
    unsigned long number_of_columns(void);  // Delegate to Alignment
    string get_name(in unsigned long row);  // first object is in row one etc...
    stringList get_all_names(void);         // all the Names
    string get_cell_contents(in unsigned long row, in unsigned long col);
    unsigned long max_width_column(in unsigned long col);
};

interface SimpleAlignmentEncoder : AlignmentEncoder {
    typedef sequence<string> stringList;

    string get_row(in unsigned long row);
    string get_row_Interval(in unsigned long row, in Interval interval);
    stringList get_row_column_Interval(in Interval row, in Interval col);
    stringList get_entire_alignment(void);  // probably the most common!
};
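A simple encoder over gapped text rows could look roughly like the Python sketch below. The in-memory representation, the class name, and 1-based column numbering are assumptions; the slide only states that the first object is in row one.

```python
from typing import List, Sequence


class SimpleAlignmentEncoderSketch:
    """Hypothetical in-memory encoder over equal-length gapped rows."""

    def __init__(self, names: Sequence[str], rows: Sequence[str]) -> None:
        if len(names) != len(rows):
            raise ValueError("one name is required per aligned row")
        if len({len(row) for row in rows}) > 1:
            raise ValueError("all rows must have the same number of columns")
        self._names = list(names)
        self._rows = list(rows)

    @staticmethod
    def _index(position: int, size: int) -> int:
        """Convert a 1-based position to a 0-based index, rejecting out-of-range values."""
        if not 1 <= position <= size:
            raise IndexError(f"position {position} is outside 1..{size}")
        return position - 1

    def number_of_objects(self) -> int:
        return len(self._rows)

    def number_of_columns(self) -> int:
        return len(self._rows[0]) if self._rows else 0

    def get_name(self, row: int) -> str:
        return self._names[self._index(row, len(self._names))]

    def get_row(self, row: int) -> str:
        return self._rows[self._index(row, len(self._rows))]

    def get_cell_contents(self, row: int, col: int) -> str:
        # Assumption: columns are numbered from one, like rows.
        return self.get_row(row)[self._index(col, self.number_of_columns())]

    def get_entire_alignment(self) -> List[str]:
        return list(self._rows)
```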
25 Analysis Machinery
- AnalysisType
- AnalysisState, AnalysisEvent
- AnalysisService
- AnalysisInstance, JobControl
- InputPropertySpec, OutputPropertySpec
- Envelope
- InputEnvelope
- OutputEnvelope
28 AnalysisType
valuetype AnalysisType {
    string type;
    string name;
    string supplier;
    string version;
    string installation;
    string description;
};
29 AnalysisState
enum AnalysisState {
    new,         // The first state.
    created,     // Instance has been created but not yet executed.
    running,     // The analysis instance is running.
    completed,   // The instance has completed execution.
    terminated,  // The instance was terminated by user request.
    error        // The instance terminated due to an error.
};
30 AnalysisEvent, StateChangedEvent
valuetype AnalysisEvent {
    string message;
};

/* StateChangedEvents are generated whenever the AnalysisInstance
   changes state. */
valuetype StateChangedEvent : AnalysisEvent {
    AnalysisState previous_state;
    AnalysisState new_state;
};
31 ProgressEvents
valuetype HeartbeatProgressEvent : AnalysisEvent {
};

valuetype PercentProgressEvent : AnalysisEvent {
    long percentage;     // percent complete
    long timeRemaining;  // estimate of total time remaining in secs
};

valuetype StepProgressEvent : AnalysisEvent {
    long totalSteps;
    long stepsCompleted;
};
32 AnalysisService
interface AnalysisService {
    readonly attribute AnalysisType type;
    readonly attribute InputPropertySpecList input_metadata;
    readonly attribute OutputPropertySpecList output_metadata;

    AnalysisInstance create_analysis(in CosPropertyService::Properties input)
        raises (CosPropertyService::MultipleExceptions);
};
33 AnalysisInstance, JobControl
interface AnalysisInstance : CosLifeCycle::LifeCycleObject {
    readonly attribute AnalysisService service;
    readonly attribute AnalysisState status;
    readonly attribute CosEventChannelAdmin::EventChannel event_channel;
    readonly attribute AnalysisEvent last_event;
    readonly attribute InputEnvelope input_parameters;
    readonly attribute OutputEnvelope results;
    readonly attribute JobControl job_control;
};

interface JobControl {
    readonly attribute TimeBase::UtcT created;
    readonly attribute TimeBase::UtcT elapsed;
    readonly attribute TimeBase::UtcT started;
    readonly attribute TimeBase::UtcT ended;

    void run() raises (NotRunnable);
    void terminate() raises (NotRunning);
    void wait();
};
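The interplay of `status`, `run()`, `terminate()` and StateChangedEvent can be sketched as a small state machine in Python. The allowed transitions are an assumption (run only from `created`, terminate only from `running`); the slides name the states and exceptions but not the transition rules. A plain list stands in for the event channel, and `JobControlSketch` is a hypothetical name.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class AnalysisState(Enum):
    NEW = "new"
    CREATED = "created"
    RUNNING = "running"
    COMPLETED = "completed"
    TERMINATED = "terminated"
    ERROR = "error"


class NotRunnable(Exception):
    """Raised by run() when the instance is not in a runnable state."""


class NotRunning(Exception):
    """Raised by terminate() when the instance is not running."""


@dataclass(frozen=True)
class StateChangedEvent:
    message: str
    previous_state: AnalysisState
    new_state: AnalysisState


class JobControlSketch:
    def __init__(self) -> None:
        self.status = AnalysisState.CREATED
        self.events: List[StateChangedEvent] = []  # stands in for the event channel

    def _change_state(self, new_state: AnalysisState, message: str) -> None:
        # Every state change produces exactly one StateChangedEvent.
        self.events.append(StateChangedEvent(message, self.status, new_state))
        self.status = new_state

    def run(self) -> None:
        if self.status is not AnalysisState.CREATED:  # assumed rule
            raise NotRunnable(f"cannot run from state {self.status.value}")
        self._change_state(AnalysisState.RUNNING, "analysis started")

    def terminate(self) -> None:
        if self.status is not AnalysisState.RUNNING:  # assumed rule
            raise NotRunning(f"cannot terminate from state {self.status.value}")
        self._change_state(AnalysisState.TERMINATED, "terminated by user request")
```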
34 InputPropertySpec, OutputPropertySpec
valuetype InputPropertySpec {
    string name;
    CORBA::TypeCode type;
    boolean mandatory;
    any default_value;
    any possible_values;
};

valuetype OutputPropertySpec {
    string name;
    CORBA::TypeCode type;
};
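One plausible use of this metadata is checking supplied inputs before an analysis is created. The Python sketch below applies defaults and collects every failure into a single exception, loosely mirroring the MultipleExceptions raised by `create_analysis`. The validation rules, the Python-side `MultipleExceptions` class, and the property names in the usage example are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Sequence


@dataclass(frozen=True)
class InputPropertySpec:
    name: str
    type: type
    mandatory: bool = False
    default_value: Any = None
    possible_values: Optional[Sequence[Any]] = None


class MultipleExceptions(Exception):
    """Hypothetical analogue that reports all input failures at once."""

    def __init__(self, failures: List[str]) -> None:
        super().__init__("; ".join(failures))
        self.failures = failures


def resolve_inputs(specs: Sequence[InputPropertySpec],
                   supplied: Dict[str, Any]) -> Dict[str, Any]:
    """Check supplied values against the specs and fill in defaults."""
    resolved: Dict[str, Any] = {}
    failures: List[str] = []
    known = {spec.name for spec in specs}
    for name in supplied:
        if name not in known:
            failures.append(f"{name}: unknown property")
    for spec in specs:
        if spec.name not in supplied:
            if spec.mandatory:
                failures.append(f"{spec.name}: mandatory property missing")
            else:
                resolved[spec.name] = spec.default_value
            continue
        value = supplied[spec.name]
        if not isinstance(value, spec.type):
            failures.append(f"{spec.name}: expected {spec.type.__name__}")
        elif spec.possible_values is not None and value not in spec.possible_values:
            failures.append(f"{spec.name}: value not among possible values")
        else:
            resolved[spec.name] = value
    if failures:
        raise MultipleExceptions(failures)
    return resolved
```

For instance, with a mandatory `query` string and an optional `matrix` whose default is `"BLOSUM62"` (both made-up property names), supplying only `query` resolves `matrix` to its default.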
35 Envelope, InputEnvelope, OutputEnvelope
interface Envelope : CosPropertyService::PropertySet,
                     CosLifeCycle::LifeCycleObject {
    readonly attribute AnalysisInstance instance;
};

interface InputEnvelope : Envelope {
    boolean is_completed() raises (MultipleExceptions);
};

interface OutputEnvelope : Envelope {
};
36 Open Issues
- what level of interoperability is needed?
- how much domain semantics in analysis machinery interfaces?
- need for a controlled vocabulary in IDL or not?
- bio-objects issues
- desire for a simple synchronous call
- Envelopes and life cycle issues
- Envelopes independent of AnalysisInstance?
- life cycle management of Envelope components
37 Open Issues: bio-objects
- two-tree approach
- SeqRegion
- SeqAnnotation
- should specializations of SeqAnnotation be defined?
- Gene
- Exon
- ...
39 Suggested RFPs
- complete meta-data model
- distributed mutation
- computation management
- annotations
- additional bio-objects
- sequence assembly
- HMMs, profiles
40 Request for Feedback
- talk to the submitters this week
- email us at lsr-bsa_at_netgenics.com