Honours Research Project - PowerPoint PPT Presentation

Slides: 35
Provided by: cbbcg

1
Honours Research Project
  • Pair-Wise Feature Based Sequence Alignment of
    Large Genomic Sequences between Same & Different
    Species
  • Allison Speed
  • Supervisor: Matthew Bellgard

2
Sequence
  • A string of genetic characters
  • Contains the genetic information of an organism
  • Genome projects are concerned with sequencing
    the genomes of different species
  • Example: Human Genome Project (HGP)
  • Sequences are allocated a unique identifier, the
    accession number, upon being submitted to a
    genetic database
  • Sequence length is measured by the number of
    characters or base-pairs (bp)
  • Two types of sequences:
  • Nucleotide sequence and
  • Protein sequence

3
Nucleotide Sequence
  • Deoxyribonucleic acid (DNA) is the basic building
    block of life
  • DNA is made up of molecular chemicals called
    nucleotide bases
  • Adenine (A)
  • Guanine (G)
  • Cytosine (C)
  • Thymine (T)
  • These bases are paired together (A-T and G-C) to
    form a DNA strand

Nucleotide sequence: AGTCGCGATCGTGATCGA
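
The pairing rule on this slide (A-T, G-C) means that one strand determines its partner. A minimal Python sketch of that rule, applied to the slide's example sequence; the function names are illustrative and do not come from the presentation.

```python
# Base pairing as stated on the slide: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(seq: str) -> str:
    """Return the base-by-base partner of a DNA sequence."""
    return "".join(PAIRS[base] for base in seq)

def reverse_complement(seq: str) -> str:
    """Return the partner strand read in the opposite direction."""
    return complement(seq)[::-1]

seq = "AGTCGCGATCGTGATCGA"          # example sequence from the slide
print(len(seq), "bp")               # sequence length in base-pairs
print(seq)
print(complement(seq))
```

Applying `reverse_complement` twice returns the original sequence, which is a quick sanity check on the pairing table.
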
4
Protein Sequence
Protein Molecules
  • A triplet of nucleotide bases codes for 1 amino acid
  • Amino acids are represented by 20 different
    characters
  • Compared to the 4 nucleotide bases
  • Amino acids make proteins

Protein sequence: GHILMNPNRSTYWHGHHN
Proteins are required for the structure,
function and regulation of cells, tissues and
organs (Attwood & Parry-Smith 1999, p.207)
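
The triplet-to-amino-acid step can be sketched in a few lines of Python. This is only an illustration: the table below holds a handful of the 64 codons of the standard genetic code, the input sequence is made up, and unknown codons are reported as "X" by assumption.

```python
# A small excerpt of the standard genetic code (DNA codons).
# "*" marks a stop codon. The full table has 64 entries.
CODON_TABLE = {
    "ATG": "M", "GGT": "G", "CAT": "H", "ATT": "I",
    "TGG": "W", "AAA": "K",
    "TAA": "*", "TAG": "*", "TGA": "*",
}

def translate(dna: str) -> str:
    """Translate DNA to protein, three bases at a time, stopping at a stop codon."""
    protein = []
    usable = len(dna) - len(dna) % 3        # ignore a trailing partial codon
    for i in range(0, usable, 3):
        amino_acid = CODON_TABLE.get(dna[i:i + 3], "X")  # "X" = not in this excerpt
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate("ATGGGTCATATTTGGTAA"))      # made-up example sequence
```
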
5
Comparative Genomic Analysis
  • Similarities between different sequences exist
  • Sequences are graphed using a dot plot to view
    this similarity
  • Genetic sequences from the same and different
    species are compared
  • Assists in understanding the functionality &
    evolutionary history of DNA
  • Can infer the functionality of one sequence based
    on the known function of another, similar sequence

6
Comparative Genomics
  • By comparing the human genome with the genomes
    of different organisms, researchers can better
    understand the structure and function of human
    genes and thereby develop new strategies in the
    battle against human disease
  • (Spencer 2002)

7
Sequence Alignment
  • To ensure correct comparison, sequences must
    first be aligned
  • Pair-wise sequence alignment is the matching of
    genetic characters between 2 sequences
  • ATGGTGAGGATTGCCTTTG
  • ATGGTGAGGATTGCCTTTG
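
Once two sequences are aligned, the matching characters can be marked column by column. The sketch below assumes the rows are already aligned and of equal length, with "-" standing for a gap; it only displays and scores an alignment, it does not compute one. The second pair of rows is a made-up example.

```python
def match_line(row_a: str, row_b: str) -> str:
    """Mark identical, non-gap columns of two aligned rows with '|'."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    return "".join(
        "|" if a == b and a != "-" else " " for a, b in zip(row_a, row_b)
    )

def percent_identity(row_a: str, row_b: str) -> float:
    """Share of alignment columns that hold the same character in both rows."""
    return 100.0 * match_line(row_a, row_b).count("|") / len(row_a)

slide_seq = "ATGGTGAGGATTGCCTTTG"            # the pair shown on the slide
for a, b in [(slide_seq, slide_seq), ("ATG-TGA", "ATGGTCA")]:
    print(a)
    print(match_line(a, b))
    print(b)
    print(f"{percent_identity(a, b):.1f}% identity\n")
```
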

8
Large Sequence Alignment
  • Genome projects have generated, and continue to
    generate, vast amounts of sequence data
  • There is a need to analyse the data
  • Resulting in a strong demand for quality
    alignment tools
  • Alignment of large sequences (> 1000 bp) is a
    difficult task
  • An accurate alignment takes time & much
    processing power to produce
  • The need for effective large sequence alignment
    methods is the problem of interest for this
    research project
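
A rough calculation shows why size matters. A full dynamic-programming alignment scores every pair of positions, so its table has one cell per pair. The lengths below are the two human sequences used later in Aim 1; the 4 bytes per cell is an assumption made only to get an order of magnitude.

```python
# Lengths of the two human sequences listed under Aim 1.
LEN_AC004213 = 41_617
LEN_AL022723 = 148_834

# One table cell per pair of positions (i, j).
cells = LEN_AC004213 * LEN_AL022723

# Assumption for illustration only: each cell stores a 4-byte score.
BYTES_PER_CELL = 4
gib = cells * BYTES_PER_CELL / 2**30

print(f"{cells:,} cells, roughly {gib:.1f} GiB at {BYTES_PER_CELL} bytes per cell")
```
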

9
Features
[Diagram: a sequence with two features marked, Gene A and Gene B]
  • A feature is a segment within a sequence that has
    structure
  • Features have biological relevance and provide
    useful information about a sequence
  • Example: genes
  • Features are known prior to sequence alignment
  • Some features interfere with alignment algorithms
  • So normally such features are removed to create a
    more accurate alignment
  • If features are known prior to alignment, can
    they be used to assist the alignment process?

10
FBSA
  • Feature based sequence alignment (FBSA)
  • A new concept to sequence alignment
  • Proposed by Bellgard & Kenworthy (2003)
  • Use biological features to anchor an alignment
    between two large sequences
  • Features which cause problems in other alignment
    methods assist the alignment process of FBSA

11
FBSA Process
[Diagram: Sequence 1 carries Gene A, Gene B, Gene C and Gene D; Sequence 2 carries Gene A, Gene B and Gene C]
  • Identify features
  • Compare sequences and match shared features
  • Align at feature based level
  • Align at nucleotide level
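
The four steps can be sketched as follows. This is not the FBSA program described in the presentation: the `Feature` type, the coordinates, and the use of a longest-common-subsequence match on feature names are all assumptions made for illustration. Shared features become anchors, and the stretches between anchors are what would then be aligned at nucleotide level.

```python
from typing import NamedTuple

class Feature(NamedTuple):
    name: str
    start: int   # hypothetical coordinates, half-open [start, end)
    end: int

def feature_anchors(feats1, feats2):
    """Match shared features in order (longest common subsequence of names)."""
    n, m = len(feats1), len(feats2)
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if feats1[i].name == feats2[j].name:
                lcs[i][j] = 1 + lcs[i + 1][j + 1]
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])
    anchors, i, j = [], 0, 0
    while i < n and j < m:
        if feats1[i].name == feats2[j].name:
            anchors.append((feats1[i], feats2[j]))
            i, j = i + 1, j + 1
        elif lcs[i + 1][j] >= lcs[i][j + 1]:
            i += 1
        else:
            j += 1
    return anchors

def segments_between(anchors, len1, len2):
    """Coordinate ranges between consecutive anchors, left for nucleotide-level alignment."""
    segments, prev1, prev2 = [], 0, 0
    for f1, f2 in anchors:
        segments.append(((prev1, f1.start), (prev2, f2.start)))
        prev1, prev2 = f1.end, f2.end
    segments.append(((prev1, len1), (prev2, len2)))
    return segments

# Gene names follow the slide's diagram; positions and lengths are invented.
seq1 = [Feature("Gene A", 100, 400), Feature("Gene B", 900, 1500),
        Feature("Gene C", 2000, 2600), Feature("Gene D", 3000, 3300)]
seq2 = [Feature("Gene A", 50, 350), Feature("Gene B", 700, 1300),
        Feature("Gene C", 1900, 2500)]

anchors = feature_anchors(seq1, seq2)
segments = segments_between(anchors, 3500, 2800)
print([f1.name for f1, _ in anchors])
print(segments)
```

Each segment pair is independent of the others, so the pairs could be handed to separate workers.
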

12
Advantages to FBSA
  • At the feature-based level, sequences are much
    shorter, so sequence alignment is faster
  • Dependent on feature density
  • The Smith-Waterman algorithm is used to align the
    features
  • Produces accurate alignments
  • Enables parallel processing
  • Features can break sequences up into natural
    partitions for individual analysis and processing
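
To illustrate aligning at the feature level, here is a small Smith-Waterman local alignment that works on lists of feature labels instead of nucleotides. The scoring values (match 2, mismatch -1, linear gap -1) and the single-letter labels are assumptions; the presentation does not state the scoring its FBSA program uses.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Local alignment of two token lists. Returns (score, aligned_a, aligned_b).

    None in an aligned list marks a gap.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until the score drops to zero.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append(None)
            i -= 1
        else:
            out_a.append(None); out_b.append(b[j - 1])
            j -= 1
    return best, out_a[::-1], out_b[::-1]

# Hypothetical feature orderings for two sequences.
features_1 = ["A", "B", "C", "D", "E"]
features_2 = ["B", "C", "X", "D", "E"]
score, aligned_1, aligned_2 = smith_waterman(features_1, features_2)
print(score)
print(aligned_1)
print(aligned_2)
```

The table here is 5 by 5 cells because each sequence is reduced to 5 features, which is the speed advantage the slide describes.
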

13
FBSA Research
  • Kenworthy (2003) demonstrated the pair-wise FBSA
    of two large sequences from the same species
  • Developed FBSA program

14
Honours Project
  • Research Question
  • Can FBSA be further developed to align two large
    sequences between different species?

15
Scope of Research Project
  • 3 aims
  • FBSA of large sequences from the same species,
    such as human
  • Further develop FBSA to align large sequences
    between different species, such as mouse and
    human
  • Develop a prototype of a visual FBSA tool

16
Aim 1
  • Two human sequences from chromosome 6
  • Accession numbers
  • AC004213.1 (41,617 bp)
  • AL022723.4 (148,834 bp)
  • Suitable for FBSA because of their high level of
    similarity
  • Depicted by dot plot

[Dot plot: horizontal axis AL022723.4, vertical axis AC004213.1]
17
Method Taken for Aim 1
[Flow diagram: Sequence 1 & Sequence 2 → RepeatMasker → list of repetitive elements → FBSA program → output]
  • Output: feature-based alignment of Sequence 1 & Sequence 2
  • Output: feature plot of Sequence 1 & Sequence 2
18
Aim 1 Results
  • Feature-based alignment analysed
  • Feature matches were followed to verify a correct
    alignment
  • A feature-based alignment is considerably easier
    to analyse than a nucleotide or protein alignment
  • Features are sizeable chunks of sequence data
    that are more human readable
  • Following an alignment between features is both
    easier and less time consuming
  • Aim 1 successfully completed
  • 1 feature was sufficient to assist the alignment
    process

19
Aim 2: FBSA of Different Species
  • Mouse sequence from chromosome 19
  • Human sequence from chromosome 10
  • Both sequences are 100,000 bp in length
  • Regions of high similarity

20
Method for Aim 2
  • Repeated the FBSA method used in aim 1
  • → Failed
  • Not enough features in common
  • More features needed to be identified
  • Specific areas of similarity were selected for
    additional feature investigation
  • 9 regions identified and extracted for individual
    processing

21
[Figure: the 9 regions of similarity, labelled Region 1 to Region 9]
22
Further Feature Investigation
  • Search for additional features categorized into 3
    stages
  • Stage 1: repetitive elements and predicted genes
  • Stage 2: expressed sequence tags (ESTs) and
    proteins
  • Stage 3: nucleotide matches

23
Stage 1: Repetitive Elements & Predicted Genes
  • The 9 regions were processed for
  • repetitive elements using RepeatMasker and
  • predicted genes using GenScan
  • A feature map for each region was created from
    the output
  • The maps were analysed and any feature matches
    highlighted
  • Although several features were found to match, it
    was clear that further feature investigation was
    needed

[Feature map: Region 1, mouse and human]
24
Stage 2: ESTs and Proteins
  • Region 9 was selected as the initial region for
    additional feature processing
  • It is the largest of the regions and is highly
    conserved
  • Region 9 was searched for
  • ESTs, using blastn against both the mouse and
    human EST databases
  • proteins, using blastx
  • Search results needed to be interpreted
  • Top ten results added to the feature map of
    region 9
  • Highly successful
  • Remaining 8 regions processed for ESTs and
    proteins
  • Search results of poor quality
  • Only 1 EST identified in region 6 of the mouse
    sequence

25
Stage 3: Nucleotide Matches
  • First 8 regions required more features to be
    identified
  • Another method was devised
  • Sequences in each region were aligned
  • Areas of alignment were extracted to create
    sub-regions
  • Each sub-region searched for matches from the
    nucleotide database using blastn

26
Stage 3: Nucleotide Matches (2)
  • From the 8 regions, 16 sub-regions were extracted
    & processed for nucleotide matches
  • Search results included in feature maps
  • Repeated for region 9
  • 24 sub-regions extracted & processed

Region 2: 6 alignments → 6 sub-regions
27
Feature Map of Region 9
28
Aim 2 Results (1)
  • Sequence conservation shown in the dot plot was
    not reflected by the number of repetitive
    elements, proteins, ESTs and predicted proteins
  • But, nucleotide matches in the areas of
    similarity provided the features needed.
  • Nucleotide matches are not typically, in a
    biological sense, a feature
  • Have been treated as a feature to assist
    alignment
  • Better to align nucleotide matches rather than to
    force an alignment between features that do not
    match

29
Aim 2 Results (2)
  • Many features were not shared between the two
    species
  • Feature density does not indicate the
    appropriateness of a sequence for FBSA
  • Two sequences may be rich in features, but at the
    same time share few features.
  • The number of features shared between 2 sequences
    is a more meaningful calculation
  • Aim 2 successfully completed
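
The difference between feature density and shared features can be shown with a toy calculation. The feature labels below are invented for illustration and are not results from the project; only the 100,000 bp length is taken from the slides. Both lists have the same density, yet they share a single feature.

```python
from collections import Counter

def feature_density(features, seq_len):
    """Number of features per 1,000 bp."""
    return 1000 * len(features) / seq_len

def shared_feature_count(features_1, features_2):
    """Number of feature labels the two lists have in common (counting repeats)."""
    return sum((Counter(features_1) & Counter(features_2)).values())

SEQ_LEN = 100_000   # both Aim 2 sequences are 100,000 bp long

# Hypothetical feature labels for each sequence.
mouse = ["B1", "B2", "B1", "L1", "EST_m1", "B4"]
human = ["Alu", "Alu", "L1", "MIR", "Alu", "L2"]

print(feature_density(mouse, SEQ_LEN), feature_density(human, SEQ_LEN))
print(shared_feature_count(mouse, human))
```
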

30
Aim 3: Develop a Visual FBSA Tool
  • Program design began while working on aim 2
  • Challenges in aim 2 prevented further work on aim
    3
  • Became out of scope for the research
  • The program would need to be highly
    user-interactive
  • Process of additional feature investigation is
    human dependent
  • Overcome challenges of feature matching
  • An ideal research project in the future
  • The development of such a program would benefit
    FBSA

31
Future Research
  • Automate FBSA process
  • With use of additional features
  • Investigate feature calculations
  • How much of the sequence needs to be made up of
    shared features for FBSA to be worthwhile?
  • Develop cut-off values to assist user
  • FBSA of sequences from other species
  • Types of features needed for FBSA may vary

32
Conclusion
  • FBSA is a new concept in the pair-wise alignment
    of large genomic sequences with much future
    potential
  • Same-species and between-species FBSA has been
    successfully demonstrated
  • Additional features have been explored and used
  • Future research would be most beneficial to its
    development

33
Thank you
  • Questions?

34
Definitions
  • Chromosome: Structural carrier of hereditary
    characteristics. A certain number of
    chromosomes is characteristic of each species of
    plant & animal. E.g. the potato has 48
    chromosomes
  • Gene: The fundamental physical & functional unit
    of heredity. A gene is an ordered sequence of
    nucleotides located in a particular position on a
    particular chromosome that encodes a specific
    functional product (Attwood & Parry-Smith 1999)
  • Homology: A similar component in two organisms
    (e.g. genes with strongly similar sequences) that
    can be attributed to a common ancestor of the two
    organisms during evolution (Mount, 2001)
  • Repetitive DNA: Sequences of varying lengths that
    occur in multiple copies in the genome; it
    represents much of the human genome (DOE
    Genomics)