Title: Sequence Alignment Algorithms
1Sequence Alignment Algorithms Application to
Bioinformatics Tool Development
- Dr. S. Parthasarathy
- Reader and Head
- Department of Bioinformatics
- Bharathidasan University
- Tiruchirappalli 620 024
- (E-mail partha_at_cnld.bdu.ac.in)
2Plan
- Introduction to Bioinformatics
- Sequence alignment algorithms
- Global alignment: Needleman-Wunsch algorithm
- Local alignment: Smith-Waterman algorithm
- PredictFold: predicting a fold for a protein sequence
- Methodology
- Algorithm, coding and tool development
- Benchmarking
- Conclusions
3Introduction
- Why do we need Bioinformatics?
- What is Bioinformatics?
- Where is Bioinformatics used?
4Why?
- Biological Data Explosion
- How did Biological Data Explosion happen?
- Sequence databases are far larger than structure databases
- Why so?
5Introduction Biological Data Genome Projects
- Latest Revolution
- 26 June 2000: announcement of the completion of the draft of the Human Genome
- Genetic code of human life is cracked by scientists
- Human genome contains 3.2 x 10^9 bp
- Unit of (genome) sequence length
- bp (base pairs)
- Mbp (mega base pairs) = 10^6 bp
- Gbp (giga base pairs) = 10^9 bp
- huge (human genome equivalent) = 3.2 Gbp
- Unit of genetic distance
- centiMorgan (cM): arbitrary unit named for Thomas Hunt Morgan
- (e.g. 1 cM = 0.01 recombinant frequency)
6Introduction Biological Data Genome Projects
16 February 2001
15 February 2001
7Biological Data Recombinant DNA Technology
- Old Revolution
- 1940: Role of DNA as the genetic material was confirmed
- 1953: Discovery of DNA structure by James Watson and Francis Crick
- 1966: Establishment of the Genetic Code
- 1967: DNA ligase was isolated (joins two strands of DNA together): Molecular Glue
- 1970: Isolation of restriction enzyme: Molecular Scissors
- 1972: Recombinant DNA molecules were generated at Stanford University, USA
- 1973: Joining DNA fragments to the plasmid pSC101 isolated from E. coli. They could replicate when introduced into E. coli.
- The discoveries of 1972 and 1973 triggered off the biggest scientific revolution: Genetic Engineering
8Biological Data explosion
- GenBank, NCBI, USA
- 44 Gbp of DNA, 40 million sequences (up to 2004)
- GenBank, National Center for Biotechnology Information, USA
- Protein Data Bank (PDB), RCSB, USA
- 29,000 structures (2004)
- PDB, Research Collaboratory for Structural Bioinformatics, USA
- QUALITY of data: HIGH
- Experimental error in modern genomic sequencing is extremely low
- QUANTITY of data: HUGE
- With recombinant DNA technology and genomic sequencing, the size of sequence databases is increasing very rapidly
- SEQUENCE versus STRUCTURE databases
- Sequence databases are far larger than structure databases
- Leads to Bioinformatics
9What?
- What is Bioinformatics?
- Define Bioinformatics
10Bioinformatics - Definition
[Figure: the word "Bioinformatics" shown with the alignment recurrence F(i,j), the DNA sequence atcggcatgcatcagtcatgcaactg and the peptide strings PEPTIDESE / QSEDITPEP]
Bioinformatics is an integration of mathematical,
statistical and computer methods to analyze
biological data. We use computer programs to
make inference from the biological data, to make
connections among them and to derive useful and
interesting predictions. The marriage of biology
and computer science has created a new field
called Bioinformatics.
- Arthur M. Lesk
11Biology Basic Definitions
- Cell
- It is the building block of living organisms
- Eukaryotic cells or organisms have the nucleus separated from the cytoplasm by a nuclear membrane and the genetic material borne on a number of chromosomes consisting of DNA and protein
- Chromosome
- The physical basis of heredity
- Deeply staining rod-like structures present within the nuclei of eukaryotes
- Contains DNA and protein arranged in a compact manner
- Replicate identically during cell division
- Same number of chromosomes present in cells of a particular species (e.g. human: 22 autosomes, X and Y)
12GenomeBasic Definitions
- Genome
- A complete set of chromosomes inherited from one parent
- Gene
- One of the units of inherited material carried on chromosomes. Genes are arranged in a linear fashion on DNA. Each represents one character, which is recognized by its effect on the individual bearing the gene in its cells. There are many thousands of genes in each nucleus.
- DNA (deoxyribonucleic acid)
- DNA is made up of FOUR bases
- a, t, g, c: adenine, thymine, guanine, cytosine
- Protein
- Protein is made up of TWENTY different amino acids
- A, T, G, C, ...: Alanine, Threonine, Glycine, Cysteine, ...
13Central Dogma
DNA:     CCTGAGCCAACTATTGATGAA
RNA:     CCUGAGCCAACUAUUGAUGAA
Protein: PEPTIDE
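The three strings on this slide can be reproduced with a short sketch. It assumes the DNA shown is the coding strand read in frame from its first base, and it uses the standard genetic code; the function names `transcribe` and `translate` are illustrative and do not come from the slides.

```python
# Standard genetic code, with codons enumerated in the base order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def transcribe(dna: str) -> str:
    """Return the mRNA with the same sequence as the coding DNA strand (T -> U)."""
    return dna.upper().replace("T", "U")


def translate(rna: str) -> str:
    """Translate an mRNA codon by codon, stopping at the first stop codon."""
    protein = []
    usable = len(rna) - len(rna) % 3  # ignore an incomplete trailing codon
    for pos in range(0, usable, 3):
        codon = rna[pos:pos + 3].replace("U", "T")
        residue = CODON_TABLE[codon]
        if residue == "*":  # stop codon
            break
        protein.append(residue)
    return "".join(protein)


dna = "CCTGAGCCAACTATTGATGAA"
rna = transcribe(dna)
print(rna)             # CCUGAGCCAACUAUUGAUGAA
print(translate(rna))  # PEPTIDE
```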
14Genome DataHuman Model Organisms
- Most mapping and sequencing technologies were developed from studies of simpler non-human organisms
- Non-human/model organisms
- Bacterium Escherichia coli: 4.6 Mbp
- Yeast Saccharomyces cerevisiae: 12.1 Mbp
- Fruit fly Drosophila melanogaster: 180.0 Mbp
- Roundworm C. elegans: 95.5 Mbp
- Laboratory mouse Mus musculus: 3.0 Gbp
- Human: more complex genome
- Human Homo sapiens: 3.2 Gbp
15Genome DataHuman (Homo Sapiens)
- Genome 1
- Chromosomes 23
- Genes / DNAs 30,000
- Nucleotides 3.2 x 10^9 bp
16Bioinformatics in Genome Research
- Data collection and interpretation
- Collecting and storing data
- Sequence generated by genome research will be used as the primary information source for human biology and medicine
- The vast amount of data produced will first need to be collected, stored and distributed
- Interpretation of data
- Recognizing where genes begin and end
- Searching a database for a particular DNA sequence may uncover homologous sequences in a known gene from a model organism, revealing insights into the function of the corresponding human gene
17Understanding Gene Function
- Correct protein function depends on the 3-D or folded structure the protein assumes in biological environments
- Understanding protein structure will be essential in determining gene function
[Diagram: Gene, Protein, Structure, Function]
18Where?
- Where is Bioinformatics used?
- What are the uses of Bioinformatics?
- Applications of Bioinformatics
19Bioinformatics Tasks
- Sequence analysis (protein sequences)
- Similarity and homology
- Pairwise local/global alignment
- GCG: SeqLab, SeqWeb
- Scoring matrices: PAM, BLOSUM
- Database search
- BLAST, FASTA
- Multiple alignment
- ClustalW, PRINTS, BLOCKS
- Secondary structure prediction (from sequence)
- Proteins: α-helix, β-sheet, turn or coil
- Protein Folding
20Bioinformatics Tasks
- Structure analysis: experimental determination
- X-ray crystallography: 3-dimensional coordinates, structure
- Nuclear Magnetic Resonance (NMR)
- PDB: Protein Data Bank
- RasMol: molecular viewing software
- High-throughput crystallographic structure determination
- High-flux synchrotron radiation sources (data collection)
- Multiwavelength anomalous diffraction (MAD) method (data interpretation)
- Bioinformatics: structure prediction
- Homology modelling: InsightII, SwissPDBViewer, Biosuite
- ab initio method: Monte Carlo simulation
- Protein structure classification
- SCOP: Structural Classification Of Proteins
- CATH: Class, Architecture, Topology, Homologous superfamily
- FSSP: Fold classification based on Structure-Structure alignment of Proteins, obtained by DALI (Distance-matrix ALIgnment)
21Bioinformatics Tasks
- Protein engineering
- Mutations
- Alter a particular amino acid/base for a desired effect
- Site-directed mutagenesis
- Identify the potential sites where alterations can be made
- Applications
- Agricultural: genetically modified plants, vegetables, GM food
- Pharmaceutical: molecular modelling based drug design
- Medical: gene therapy
- DNA bending
- Application to genomes
- (Ref: M.G. Munteanu, K. Vlahovicek, S. Parthasarathy, I. Simon and S. Pongor, Rod Models of DNA: Sequence-dependent anisotropic elastic modelling of local phenomena, Trends in Biochemical Sciences, 23 (1998) 341-347)
22Bioinformatics TasksGenomics Proteomics
- Genomics is the study of the structure, content, evolution and functions of genes in genomes
- Aims of genomics
- To establish an integrated web-based database and research interface
- To assemble physical, genetic and cytological maps of the genome
- To identify and annotate the complete set of genes encoded within a genome
- To provide the resources for comparison with other genomes
23Proteomics Proteome
- Proteome is the complete collection of proteins in a cell/tissue/organism at a particular time. Unlike genomes, which are stable over the lifetime of the organism, proteomes change rapidly as each cell responds to its changing environment and produces new proteins in different amounts.
- Genome is a more stable entity. An organism has only one genome but many proteomes.
- For an organism, there may be
- one body-wide proteome,
- about 200 tissue proteomes,
- about a trillion (10^12) individual cell proteomes.
24Proteomics Definition
- The study of proteomes that includes determining
the 3D shapes of proteins, their roles inside
cells, the molecules with which they interact,
and defining which proteins are present and how
much of each is present at a given time.
25Proteomics Applications
- To correlate proteins on the basis of their expression profiles.
- To observe patterns in protein synthesis; changes in the observed pattern can be used as an indicator of the state of the cell and its gene expression.
- To characterize bacterial pathogens and to develop novel antimicrobials.
- To identify regions of the bacterial genome that encode pathogenic determinants.
- To develop drugs and in toxicology: structural proteomics
- Proteomics as a tool for plant genetics and breeding
26Systems Biology
- Systems Biology is a new perspective and an emerging field of research in the post-genomic era.
- It aims at a system-level understanding of biological systems.
- It studies whole cells/tissues/organisms not by a traditional reductionist approach but by holistic means, in an iterative attempt to model the complete cell/tissue/organism.
- It views the cell as an integrated and interacting network of genes, proteins and biochemical reactions which give rise to life.
27Systems Biology
28Sequence Alignment Algorithms
- Similarity and Homology
- Sequence Comparison - Issues
- Types of alignments
- Algorithms Used
29Sequence similarity and homology
- Nature is a tinkerer and not an inventor. New sequences are adapted from pre-existing sequences rather than invented de novo. There exists significant similarity between a new sequence and already known sequences, which is fortunate for computational sequence analysis.
- Similarity: measurement of resemblance and differences, independent of the source of resemblance.
- Homology: the sequences and the organisms in which they occur are descended from a common ancestor.
- If two related sequences are homologous, then we can transfer information about structure and/or function, by homology.
303-D Structure and Homology
- 3-D structure patterns (motifs) of proteins are much more evolutionarily conserved than amino acid sequences
- This type of homology search could prove more fruitful
- Particular motifs may serve similar functions in several different proteins, information that would be valuable in genome analysis
- Only a few protein motifs can be recognised at the sequence level
- Development of more analytic capabilities to facilitate grouping protein sequences into motif families will make homology searches more useful
31Sequence ComparisonIssues
- Types of alignment
- Global: end-to-end matching (Needleman-Wunsch)
- Local: portions or subsequences matching (Smith-Waterman)
- Scoring system used to rank alignments
- PAM and BLOSUM matrices
- Algorithms used to find optimal (or good) scoring alignments
- Heuristic
- Dynamic Programming
- Hidden Markov Model (HMM)
- Statistical methods used to evaluate the significance of an alignment score
- Z-score, P-value and E-value
32Substitution Matrices
- PAM (Point Accepted Mutation)
- BLOSUM (BLOcks SUbstitution Matrix)
            Close    Default    Distant
PAM          40       250        500
BLOSUM       90        62         30
33Types of Algorithms
- Heuristic
- A heuristic is an algorithm that will yield reasonable results, even if it is not provably optimal or lacks even a performance guarantee.
- In most cases, heuristic methods can be very fast, but they make additional assumptions and will miss the best match for some sequence pairs.
- Dynamic Programming
- The algorithm for finding optimal alignments given an additive alignment score (discussed in the following slides).
- These types of algorithms are guaranteed to find the optimal scoring alignment or set of alignments.
- HMM: based on probability theory; very versatile.
34Global AlignmentNeedleman-Wunsch Algorithm
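- Formula
\[
F(i,j) = \max
\begin{cases}
F(i-1,j-1) + s(x_i, y_j) & \text{(D)} \\
F(i-1,j) - d & \text{(H)} \\
F(i,j-1) - d & \text{(V)}
\end{cases}
\]
- Each cell F(i,j) is filled from its three neighbours, reached by the moves D, H and V
F(i-1,j-1)        F(i,j-1)
           D         V
F(i-1,j)      H    F(i,j)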
35Global AlignmentNeedleman-Wunsch Algorithm
- Gap penalties
- Linear score: f(g) = -gd
- Affine score: f(g) = -d - (g-1)e
- d: gap open penalty; e: gap extend penalty
- g: gap length
- Trace back
- Take the value in the bottom right corner and trace back to the top left corner (i.e. the alignment always runs end to end).
- Algorithm complexity
- It takes O(nm) time and O(nm) memory, where n and m are the lengths of the sequences.
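A minimal sketch of the global alignment recurrence and trace back described above, assuming a linear gap penalty and a simple match/mismatch score in place of a substitution matrix. The function name `needleman_wunsch` and the default values `match=1`, `mismatch=-1`, `d=2` are illustrative choices, not values from the slides.

```python
def needleman_wunsch(x, y, match=1, mismatch=-1, d=2):
    """Global alignment of x and y with linear gap penalty d.

    Returns (score, aligned_x, aligned_y).
    """
    n, m = len(x), len(y)

    def s(a, b):
        # Stand-in for a substitution matrix such as PAM or BLOSUM.
        return match if a == b else mismatch

    # F[i][j] is the best score for aligning x[:i] with y[:j].
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = -i * d
    for j in range(1, m + 1):
        F[0][j] = -j * d
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),  # align x_i with y_j
                F[i - 1][j] - d,                          # x_i against a gap
                F[i][j - 1] - d,                          # y_j against a gap
            )

    # Trace back from the bottom right corner to the top left corner.
    ax, ay = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s(x[i - 1], y[j - 1]):
            ax.append(x[i - 1])
            ay.append(y[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] - d:
            ax.append(x[i - 1])
            ay.append("-")
            i -= 1
        else:
            ax.append("-")
            ay.append(y[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(ax)), "".join(reversed(ay))


print(needleman_wunsch("ACGT", "AGT"))  # (1, 'ACGT', 'A-GT')
```

The full (n+1) x (m+1) table is kept so that the trace back can be done afterwards, which is where the O(nm) memory comes from.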
36Local AlignmentSmith-Waterman Algorithm
- Same as the global alignment algorithm, with TWO differences.
- F(i,j) takes the value 0 (zero) if all other options have a value less than 0.
- Alignment can end anywhere in the matrix.
- Take the highest value of F(i,j) over the whole matrix and start the trace back from there.
37Local AlignmentSmith-Waterman Algorithm
- Formula
\[
F(i,j) = \max
\begin{cases}
F(i-1,j-1) + s(x_i, y_j) & \text{(D)} \\
F(i-1,j) - d & \text{(H)} \\
F(i,j-1) - d & \text{(V)} \\
0 & \text{(if all other values are} < 0\text{)}
\end{cases}
\]
F(i-1,j-1)        F(i,j-1)
           D         V
F(i-1,j)      H    F(i,j)
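A minimal sketch of the local alignment recurrence, assuming a linear gap penalty and a match/mismatch score instead of a substitution matrix. The name `smith_waterman` and the defaults `match=2`, `mismatch=-1`, `d=2` are illustrative assumptions. It shows the two differences from the global case: the extra 0 option, and a trace back that starts at the highest cell and stops at the first 0.

```python
def smith_waterman(x, y, match=2, mismatch=-1, d=2):
    """Local alignment of x and y with linear gap penalty d.

    Returns (score, aligned_x_segment, aligned_y_segment).
    """
    n, m = len(x), len(y)

    def s(a, b):
        # Stand-in for a substitution matrix or a 3D-1D scoring table.
        return match if a == b else mismatch

    F = [[0] * (m + 1) for _ in range(n + 1)]  # first row and column stay 0
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                0,                                        # start a new alignment
                F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),
                F[i - 1][j] - d,
                F[i][j - 1] - d,
            )
            if F[i][j] > best:
                best, best_pos = F[i][j], (i, j)

    # Trace back from the highest cell until a cell with value 0 is reached.
    ax, ay = [], []
    i, j = best_pos
    while i > 0 and j > 0 and F[i][j] > 0:
        if F[i][j] == F[i - 1][j - 1] + s(x[i - 1], y[j - 1]):
            ax.append(x[i - 1])
            ay.append(y[j - 1])
            i -= 1
            j -= 1
        elif F[i][j] == F[i - 1][j] - d:
            ax.append(x[i - 1])
            ay.append("-")
            i -= 1
        else:
            ax.append("-")
            ay.append(y[j - 1])
            j -= 1
    return best, "".join(reversed(ax)), "".join(reversed(ay))


print(smith_waterman("TTTACGTTT", "GGACGGG"))  # (6, 'ACG', 'ACG')
```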
38Web based server development
- Design the web page to get the data
- Use a cgi-bin or Perl script to parse the submitted data
- Invoke the corresponding program to get the appropriate results
- Send the results either by e-mail or to the web page directly
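The parse-and-invoke steps above can be sketched as follows. This is written in Python rather than the cgi-bin/Perl scripts named on the slide, and the form field name `sequence`, the function names and the `analyse` callback are all hypothetical; it only illustrates one way the submitted data could be cleaned and handed to an analysis program.

```python
from urllib.parse import parse_qs


def parse_submission(body: str) -> str:
    """Extract the 'sequence' field of a URL-encoded form and clean it."""
    fields = parse_qs(body)
    raw = fields.get("sequence", [""])[0]
    # Drop FASTA header lines, then remove all whitespace.
    lines = [line for line in raw.splitlines() if not line.startswith(">")]
    return "".join("".join(lines).split()).upper()


def handle_request(body: str, analyse) -> str:
    """Parse the submission, invoke the analysis and build the reply text."""
    sequence = parse_submission(body)
    if not sequence:
        return "Error: no sequence submitted"
    return f"Result for {len(sequence)} residues: {analyse(sequence)}"


# A submitted form body as a browser would encode it ('%3E' is '>', '%0A' a newline).
body = "sequence=%3Eprobe%0AACDE%0Afgh&email=user%40example.org"
print(handle_request(body, analyse=len))  # Result for 7 residues: 7
```

In a real server the `analyse` callback would run the fold prediction program, and the reply would be rendered as a web page or sent by e-mail.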
39Application to Bioinformatics Tool Development
- To predict a fold to protein sequence
PredictFold
40To predict a fold to protein sequence
PredictFold
- To predict possible folds for a given protein sequence whose structure is not known
- To develop a fold recognition technique/tool that is sensitive in detecting folds of given protein sequences in the twilight zone (sequences sharing less than 25% identity)
- Application of the fold recognition strategy to genomic annotation
41Twilight Zone sequences: Example Cytochrome Sequences
- 256b
>256BA CYTOCHROME B562 (OXIDIZED) - CHAIN A
ADLEDNMETLNDNLKVIEKADNAAQVKDALTKMRAAALDAQKATPPKLED
KSPDSPEMKDFRHGFDILVGQIDDALKLANEGKVKEAQAAAEQLKTTRN
AYHQKYR
>256BB CYTOCHROME B562 (OXIDIZED) - CHAIN B
ADLEDNMETLNDNLKVIEKADNAAQVKDALTKMRAAALDAQKATPPKLED
KSPDSPEMKDFRHGFDILVGQIDDALKLANEGKVKEAQAAAEQLKTTRN
AYHQKYR
- 2ccy
>2CCYA CYTOCHROME C(PRIME) - CHAIN A
QQSKPEDLLKLRQGLMQTLKSQWVPIAGFAAGKADLPADAAQRAENMAMV
AKLAPIGWAKGTEALPNGETKPEAFGSKSAEFLEGWKALATESTKLAAA
AKAGPDALKAQAAATGKVCKACHEEFKQD
>2CCYB CYTOCHROME C(PRIME) - CHAIN B
QQSKPEDLLKLRQGLMQTLKSQWVPIAGFAAGKADLPADAAQRAENMAMV
AKLAPIGWAKGTEALPNGETKPEAFGSKSAEFLEGWKALATESTKLAAA
AKAGPDALKAQAAATGKVCKACHEEFKQD
42ExampleSequences similarity
- lalign output for 256b and 2ccy follows
43ExampleCytochrome Structures
[Figure: cytochrome structures 256b and 2ccy (seq. similarity 24%)]
44Goals
- Exploration of suitable fold recognition techniques that are sensitive in detecting similar folds despite low sequence similarity
- Identification of functional motifs in proteins at sequence (1D) and structure (3D) level
- Development of a protocol that aids in the rapid classification and annotation of genomic data based on functional motifs
45Methodology
- Reduction of 3D structure to a 1D environment string. The environment at each residue position is a function of local secondary structure and extent of exposure to the solvent (based on the 3D-1D profile method developed by Eisenberg et al., 1991).
- Extract residue environment profiles of the available protein structures.
- A scoring matrix is generated from a library of profiles. Each matrix element is the information value of a residue in the given environment.
- A library of environment strings is created for the available protein fold structures.
- The probe sequence is queried against this library to look for the best matches.
46Workflow
47Residue Environments
[Figure labels: secondary structure (Helix, Strand, Coil) and solvent exposure (Buried, Partially buried, Exposed)]
48Residue Environments
- The residue environments are described by
- the area (A) of the residue buried in the protein
- the fraction (f) of side-chain area that is covered by polar atoms (O and N)
- the local secondary structure
49Residue Environments
CLASS              AREA (A), Å²       FRACTION (f)

BURIED 1 (B1)      A > 114            f < 0.45
BURIED 2 (B2)                         0.45 < f < 0.58
BURIED 3 (B3)                         f > 0.58

PARTIAL 1 (P1)     40 < A < 114       f < 0.67
PARTIAL 2 (P2)                        f > 0.67

EXPOSED (E0)       A < 40             f > 0.67
50Residue Environment classes
- We have 6 classes based on the extent of exposure to solvent
- We have 3 classes based on secondary structure: Alpha Helix (A), Beta Sheet (B) and Coil (C)
- Total: 6 x 3 = 18 environments
- B1A, B1B, B1C, B2A, B2B, B2C, B3A, B3B, B3C, P1A, P1B, P1C, P2A, P2B, P2C, E0A, E0B, E0C
- For example
- B1A: Buried 1, Alpha Helix
- P2B: Partially Buried 2, Beta Sheet
- E0C: Exposed 0, Coil
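The assignment of one of the 18 environment labels to a residue can be sketched as below. The thresholds come from the residue environment table; two points are assumptions of this sketch: values falling exactly on a boundary go to the higher class, and every residue with A < 40 is labelled E0 regardless of f (the table lists f > 0.67 for E0), so that each residue gets exactly one class. The function names are illustrative.

```python
def exposure_class(area: float, fraction: float) -> str:
    """Map buried area A (in square angstroms) and polar fraction f to a class."""
    if area > 114:                 # buried classes
        if fraction < 0.45:
            return "B1"
        if fraction < 0.58:
            return "B2"
        return "B3"
    if area >= 40:                 # partially buried classes
        return "P1" if fraction < 0.67 else "P2"
    return "E0"                    # exposed (assumption: f is not checked)


def environment(area: float, fraction: float, secondary: str) -> str:
    """Return the 3-character environment label, e.g. 'B1A'.

    secondary is 'A' (alpha helix), 'B' (beta sheet) or 'C' (coil).
    """
    if secondary not in ("A", "B", "C"):
        raise ValueError(f"unknown secondary structure code: {secondary!r}")
    return exposure_class(area, fraction) + secondary


print(environment(130.0, 0.30, "A"))  # B1A
print(environment(60.0, 0.80, "B"))   # P2B
print(environment(10.0, 0.90, "C"))   # E0C
```

Applying `environment` to every residue of a known structure gives the 1D environment string used to build the fold library.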
51Scoring Table
- The scoring table used in this case is a 20 x 18 matrix, constructed from a statistical analysis of the profile library (consisting of 1200 protein structures) provided by the PROFILES_3D module of Insight II (Accelrys Inc.)
- The scores S_ij are calculated using the formula
\[ S_{ij} = 100 \times \ln\left[\frac{P(i \mid j)}{P_i}\right] \]
- where P(i|j) is the probability of finding residue i in the environment j and P_i is the overall probability of finding residue i in any environment.
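As an illustration of the formula, the scores can be estimated from counts of (residue, environment) observations. The sketch below uses a made-up toy data set, not the PROFILES_3D library, and the function name `scoring_table` is hypothetical. Pairs that are never observed are simply left out, since the logarithm of zero is undefined.

```python
import math
from collections import Counter


def scoring_table(observations):
    """Compute S_ij = 100 * ln(P(i|j) / P_i) from (residue, environment) pairs."""
    observations = list(observations)
    total = len(observations)
    pair_counts = Counter(observations)
    residue_counts = Counter(residue for residue, _ in observations)
    env_counts = Counter(env for _, env in observations)

    table = {}
    for (residue, env), count in pair_counts.items():
        p_residue_given_env = count / env_counts[env]  # P(i|j)
        p_residue = residue_counts[residue] / total    # P_i
        table[(residue, env)] = 100 * math.log(p_residue_given_env / p_residue)
    return table


# Toy data: leucine (L) favours the buried helix, lysine (K) the exposed coil.
toy = [("L", "B1A")] * 3 + [("K", "B1A")] + [("L", "E0C")] + [("K", "E0C")] * 3
table = scoring_table(toy)
print(round(table[("L", "B1A")], 1))  # 40.5, i.e. 100 * ln(0.75 / 0.5)
print(round(table[("K", "B1A")], 1))  # -69.3, i.e. 100 * ln(0.25 / 0.5)
```

A positive score means the residue is found in that environment more often than expected from its overall frequency, and a negative score means less often.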
52Scoring Table
- The scoring table contains a measure of the compatibility of the 20 amino acids with the 18 environmental classes.
- The individual matrix elements are propensities (information values) for the amino acid residues.
53Scoring Table
54Fold Library
1565 Functional forms
Scan PDB to identify all the structures having
these folds
Identify a representative structure with
resolution 2.5 Å or better
Quality of the structure (Occupancy, R-Factor,
Stereochemistry)
968 Chains
55DALI / FSSP Fold Library
- DALI: http://www.ebi.ac.uk/dali
- Touring protein fold space with DALI/FSSP, Lisa Holm and Chris Sander, Nucleic Acids Research, (1998), 26, 316-319
- Mapping the Protein Universe, Lisa Holm and Chris Sander, Science, (1996), 273, 595-602
56Sequence ComparisonDetails
- Type of alignment
- Local: portions or subsequences matching
- Smith-Waterman algorithm
- Scoring table: 3D-1D matrix
- Algorithm used: Dynamic Programming
- Alignment score: Z-score
57Local AlignmentSmith-Waterman Algorithm
- Formula
\[
F(i,j) = \max
\begin{cases}
F(i-1,j-1) + s(x_i, y_j) & \text{(D)} \\
F(i-1,j) - d & \text{(H)} \\
F(i,j-1) - d & \text{(V)} \\
0 & \text{(if all other values are} < 0\text{)}
\end{cases}
\]
F(i-1,j-1)        F(i,j-1)
           D         V
F(i-1,j)      H    F(i,j)
58Gap Penalties
- Gap penalties
- Linear score: f(g) = -gd
- Affine score: f(g) = -d - (g-1)e
- d: gap open penalty; e: gap extend penalty
- g: gap length
- Gap penalty values used are
- d = 500
- e = 50
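A short worked example of the two gap scores with the penalty values given on this slide. The function names are illustrative; the comparison shows why the affine form penalises one long gap much less than the linear form does.

```python
def linear_gap(g: int, d: int = 500) -> int:
    """Linear gap score f(g) = -g*d for a gap of length g."""
    return -g * d


def affine_gap(g: int, d: int = 500, e: int = 50) -> int:
    """Affine gap score f(g) = -d - (g-1)*e: open once, then extend."""
    return -d - (g - 1) * e


for g in (1, 3, 10):
    print(g, linear_gap(g), affine_gap(g))
# 1 -500 -500
# 3 -1500 -600
# 10 -5000 -950
```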
59Local Alignment
- Trace back
- Alignment can end anywhere in the matrix
- Take the highest value of F(i,j) over the whole matrix and start the trace back from there.
- Algorithm complexity
- It takes O(nm) time and O(nm) memory, where n and m are the lengths of the sequences.
60Significance of an Alignment Score
- Statistical methods used to evaluate the significance of an alignment score
- Z-score, P-value and E-value
- Significance of score
- Z-score = (score - mean) / std. dev.
- Measures how unusual our original match is.
- Z ≥ 5 is significant.
- P-value measures the probability that the alignment is no better than random. (Z and P depend on the distribution of the scores.)
- P ≤ 10^-100: exact match.
- E-value is the expected number of sequences that give the same Z-score or better. (E = P x size of the database)
- E ≤ 0.02: sequences probably homologous
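The Z-score and E-value definitions can be sketched as below. The slides do not say how the mean and standard deviation are obtained; this sketch assumes they come from scores of shuffled copies of the probe sequence, which is one common choice. The function names, the `score_fn` parameter and the example numbers are all illustrative.

```python
import random
import statistics


def background_scores(score_fn, probe, target, trials=100, seed=0):
    """Score shuffled copies of the probe against the target (assumed null model)."""
    rng = random.Random(seed)
    scores = []
    for _ in range(trials):
        letters = list(probe)
        rng.shuffle(letters)
        scores.append(score_fn("".join(letters), target))
    return scores


def z_score(score, background):
    """Z = (score - mean) / std. dev. of the background scores."""
    mean = statistics.mean(background)
    std_dev = statistics.stdev(background)
    return (score - mean) / std_dev


def e_value(p_value, database_size):
    """E = P x size of the database."""
    return p_value * database_size


print(z_score(90, [40, 50, 60]))  # 4.0 (mean 50, sample std. dev. 10)
print(e_value(1e-5, 968))         # about 0.0097 for a library of 968 chains
```

Here `score_fn` would be an alignment scoring function that returns a number, such as the score of a local alignment of the probe against a library entry.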
61Benchmarking
- All 968 proteins in the fold library were profiled against each of the other members
- A histogram of the rank of the self score, including the number of sequences for which the self score ranked highest, is shown in the Figure.
62Benchmarking
63Benchmarking
- Report
- 797 retain the self score as the highest score
- 63 report the self score as the second highest score
- About 100 proteins have ranks between 5 and 100.
- Limitations
- Prediction is restricted to the 968 folds in the library
- The algorithm is insensitive to partially folded sequences
- Specific to globular proteins and not applicable to membrane proteins
- Sequences that fold in the presence of cofactors and ligands are not accounted for
64Web based server development
- Design the web page to get the data
- Use a cgi-bin or Perl script to parse the submitted data
- Invoke the corresponding program to get the appropriate results
- Send the results either by e-mail or to the web page directly
- Prepare a user manual to describe the salient features of the server
65Conclusions
- PredictFold: a program to predict possible folds for a new protein sequence based on the 3D-1D profile method
- Benchmarking results show the reliability of the method
- There is a lot of scope for further improvement
66Future Directions
- To update the fold library by including more known folds
- To also use the predicted secondary structure information of the given sequence
- To optimise the source code for efficient, automatic handling of genome sequences
- To combine results from other algorithms (ORF, HMM, etc.) to detect remote homologs
- To develop and maintain a web-based server for fold recognition
67BT versus IT
- Bioinformatics, including Biotechnology (BT), requires a lot of Information Technology (IT) skills for genomic annotation projects
- Bioinformatics is also one of the potential areas for IT professionals
- Genome projects will be the next huge task for IT industries (like the Y2K problem in the past)
- BT will take on IT in the near future
68Conclusions
- Developing Web-based Bioinformatics tools
- Develop/modify useful algorithms
- Generate computer source code
- Create/maintain a Web-based server
- Using existing Web-based tools efficiently
- Ethical issues
- Bioethics and biosafety: always ensure that no bioinformatics tool harmful to the environment or society is developed or used by you
- Cloning of humans, terminator technology, GM food, etc.
69References (latest)
- Arthur M. Lesk, Introduction to Bioinformatics, Oxford University Press, New Delhi (2003).
- D. Higgins and W. Taylor (Eds), Bioinformatics: Sequence, Structure and Databanks, Oxford University Press, New Delhi (2000).
- R. Durbin, S.R. Eddy, A. Krogh and G. Mitchison, Biological Sequence Analysis, Cambridge Univ. Press, Cambridge, UK (1998).
- A. Baxevanis and B.F. Ouellette, Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, (Third Edition) Wiley-Interscience, Hoboken, NJ (2005).
- G. Gibson and S.V. Muse, A Primer of Genome Science, Sinauer Associates, USA (2002).
- N.C. Jones and P.A. Pevzner, An Introduction to Bioinformatics Algorithms, Ane Books, New Delhi (2005).
- Michael S. Waterman, Introduction to Computational Biology, Chapman & Hall (1995).
- J.A. Glasel and M.P. Deutscher (Eds), Introduction to Biophysical Methods for Protein and Nucleic Acid Research, Academic Press, New York (1995).
- D.S.T. Nicholl, An Introduction to Genetic Engineering, (Second Edition) Cambridge Univ. Press, UK (2002).
70References
- 3D-1D Profile method
- J.U. Bowie, R. Lüthy and D. Eisenberg, Science, 253, 164-170 (1991).
- Ostensible Recognition of Folds (ORF) method
- Rajeev Aurora and George D. Rose, Proc. Natl. Acad. Sci. (USA), 95(6), 2818-2823 (1998).
- Superfamily Hidden Markov Model (SHMM) method
- A. Krogh, M. Brown, I.S. Mian, K. Sjolander and D. Haussler, J. Mol. Biol. 235(5), 1501-31 (1994).
71ImportantBioinformatics Resources
- Databases and tools
- NCBI, NIH - www.ncbi.nlm.nih.gov
- EMBL, EBI - www.ebi.ac.uk
- ExPasy, Swiss - www.expasy.org
- DDBJ - www.ddbj.nig.ac.jp
- PDB - www.rcsb.org/pdb
- Software
- Accelrys - www.accelrys.com/products
- GCG, Insight II, Cerius II, Discovery Studio
- TCS - www.atc.tcs.co.in/biosuite/
- BIOSUITE
- Jalaja Technologies - www.jalaja.com
- GENOCLUSTER