Human Genome: sequence, structure, diseases - PowerPoint PPT Presentation

Slides: 36

Provided by: admi1145

Transcript and Presenter's Notes

1
Human Genome: sequence, structure, diseases

Lecture 8
BINF 7580

2
Question

What is the next step after the genome sequence is completed?

The new research challenges in genetics:
Gene number, exact locations, and functions
Gene regulation
DNA sequence organization
Chromosomal structure and organization
Noncoding DNA types, amount, distribution, information content, and functions
Coordination of gene expression, protein synthesis, and post-translational events
Interaction of proteins in complex molecular machines

Protein conservation (structure and function)
Proteomes (total protein content and function) in organisms
Correlation of SNPs (single-base DNA variations among individuals) with health and disease
Disease prediction based on gene sequence variation
Genes involved in complex traits and multigene diseases
Developmental genetics, genomics

For most of these problems, we need to locate a DNA fragment on a chromosome.
4

Chromosomes
Each chromosome contains one long piece of DNA.
Chromosomes are visible in the light microscope.
Banding: the chromosomes look striped. They have dark regions (bands) alternating with light regions (interbands). The dark regions are dark because they contain highly compacted, coiled DNA. The interbands are regions where uncoiled DNA connects the bands.
As long as the DNA in a band remains tightly coiled, it is not available for transcription. A puff in a band marks a site of RNA synthesis (transcription).
Each chromosome has a characteristic length and banding pattern.

5
Identification of chromosomes
The human autosomes are numbered 1-22; the sex chromosomes are X and Y.

Each arm is divided into sub-regions, each identified by a number.
Each sub-region is divided into bands, each identified by a number.

p arm (short arm)
Centromere
q arm (long arm)
Example: 1q2.4 denotes chromosome 1, long arm, second region of that arm, fourth band of that sub-region.
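The naming scheme on this slide can be read mechanically. The sketch below parses only the simplified form used in the example (chromosome, arm, region, dot, band); the function name `parse_location` and the regular expression are illustrative assumptions, not a standard tool.

```python
import re

# Pattern for the simplified notation used on the slide, e.g. "1q2.4":
# chromosome (1-22, X or Y), arm (p or q), region number, ".", band number.
LOCATION = re.compile(r"^(\d{1,2}|X|Y)([pq])(\d+)\.(\d+)$")
ARMS = {"p": "short arm", "q": "long arm"}


def parse_location(text):
    """Split a location such as '1q2.4' into its named parts."""
    match = LOCATION.match(text)
    if match is None:
        raise ValueError(f"not in chromosome/arm/region.band form: {text!r}")
    chromosome, arm, region, band = match.groups()
    return {
        "chromosome": chromosome,
        "arm": ARMS[arm],
        "region": int(region),
        "band": int(band),
    }


print(parse_location("1q2.4"))
```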
6
H.A. Prepare a couple of slides about chromosome analysis and present data about the karyotype (?). How do karyotypes differ between species? What technique is used to visualize all the pairs of chromosomes in an organism in different colors?
[Figure: spectral karyotype of a human female]
7
Nucleotide and Amino Acid Sequence Analysis

Here is a short list of problems.
Sequence comparison: compare two sequences and show the similarities and differences.
The trivial method is to compare the two sequences character by character, allowing for gaps.
The best alignment?
Try every possible alignment between the two sequences and give each aligned position a score according to the scoring matrix.
The alignment with the highest score is the best.

8
Question: How many alignments are possible?
9
Unfortunately, the number of possible alignments of one sequence against another is enormous.
Therefore, the main problem is to make the alignment process feasible in a relatively short time.
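To get a feeling for how fast the number grows, the count can be computed directly. This is a minimal sketch under one counting convention, which is an assumption here: every column of an alignment is a residue pair, a gap in sequence 1, or a gap in sequence 2, and alignments that differ only in the order of adjacent gaps are counted separately. The function name `count_alignments` is illustrative.

```python
def count_alignments(m, n):
    """Count gapped alignments of two sequences of lengths m and n."""
    # counts[i][j] = number of ways to align the first i residues of one
    # sequence with the first j residues of the other.
    # A prefix aligned against nothing has exactly one alignment (all gaps).
    counts = [[1] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            counts[i][j] = (
                counts[i - 1][j - 1]  # last column pairs two residues
                + counts[i - 1][j]    # last column is a gap in the second sequence
                + counts[i][j - 1]    # last column is a gap in the first sequence
            )
    return counts[m][n]


for m, n in [(1, 1), (2, 2), (3, 3), (11, 7)]:
    print(m, n, count_alignments(m, n))
```

Even for sequences of length 11 and 7 the count is far too large to enumerate by hand, which motivates the dynamic programming approach.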
10
Sequence comparison
In bioinformatics, a sequence alignment is a way of arranging the sequences (?) of DNA, RNA, or protein to identify regions of similarity. Similarity may be a consequence of functional, structural, or evolutionary relationships between the sequences.
Sequence alignment: compare two words.
BANANA-
-ANANAS
How many conserved positions?
How many gaps?
The goal of sequence alignment is to find optimal residue-to-residue correspondences. The optimization gives the maximum number of conserved positions occupied by identical or similar residues in all aligned sequences.
To achieve this goal, one sometimes needs to allow for gaps within sequences so that chemically similar amino acids can be aligned to each other.
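The two questions about the BANANA/ANANAS example can be checked with a few lines of code. This is a minimal sketch that counts only identical residues as conserved (similar residues are not scored here); the function name `alignment_stats` is illustrative.

```python
def alignment_stats(row1, row2, gap="-"):
    """Return (conserved positions, gap positions) for two aligned rows."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have the same length")
    columns = list(zip(row1, row2))
    # A column is conserved when both rows carry the same residue.
    conserved = sum(1 for a, b in columns if a == b and a != gap)
    # A column is a gap column when either row carries the gap symbol.
    gaps = sum(1 for a, b in columns if a == gap or b == gap)
    return conserved, gaps


print(alignment_stats("BANANA-", "-ANANAS"))  # (5, 2)
```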
11
In bioinformatics, a computational method, dynamic programming, is used to align two proteins or nucleic acids.
The term dynamic programming describes the process of solving problems where one needs to find the best decisions one after another.
First, we select the best path from Start to A; then we select the best path from A to Finish. The choice of the best path from A to Finish is independent of the choice of path from Start to A.
12
How to determine an optimal path?
The crucial observation: the choice of the best path from A to Finish is independent of the choice of path from Start to A.
If we determine the best of the 6 paths from Start to A and the best of the 6 paths from A to Finish, then the best path from Start to Finish is the best path from Start to A followed by the best path from A to Finish.
Question: How many paths do we need to consider?
Answer: No more than 12 paths (instead of 36).
The algorithm does not guarantee that the path found is the only best one, but it does find one of the optimal solutions.
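The 12-versus-36 argument can be checked numerically. In the sketch below the path costs are made-up numbers and "best" is assumed to mean lowest total cost; neither comes from the slides.

```python
from itertools import product

# Hypothetical costs of the 6 paths Start -> A and the 6 paths A -> Finish.
start_to_a = [7, 4, 9, 5, 6, 8]
a_to_finish = [3, 6, 2, 9, 5, 4]

# Brute force: evaluate all 6 * 6 = 36 combined Start -> A -> Finish paths.
brute_force = min(x + y for x, y in product(start_to_a, a_to_finish))

# Dynamic programming view: the two halves are independent, so it is
# enough to look at 6 + 6 = 12 paths and add the two best costs.
decomposed = min(start_to_a) + min(a_to_finish)

print(brute_force, decomposed)  # 6 6
```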
13
Thus the path is subdivided into a set of steps. The goal is to find the optimal way for each step. Any segment of the true optimal path must itself be an optimal path. This is the main idea of the dynamic programming method. Dynamic programming is typically used when a problem has many possible solutions and an optimal one needs to be found.
14
Dynamic Programming: an example of global sequence alignment
The two sequences to be globally aligned are
G A A T T C A G T T A (sequence 1), M = 11 (length of sequence)
G G A T C G A (sequence 2), N = 7 (length of sequence)
Step 1. Cost.
We have to assign a cost to each comparison. A simple scoring scheme for a residue at position i of sequence 1 and a residue at position j of sequence 2 is:
Si,j = 1 (match score), e.g. A aligned with A
Si,j = 0 (mismatch score), e.g. A aligned with D
Si,j = 0 (gap penalty), i.e. a residue aligned with a gap
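The scoring scheme of Step 1 can be written down as a small function. This is a minimal sketch; the names `MATCH`, `MISMATCH`, `GAP` and `score` are illustrative and do not come from the slides.

```python
# Scoring scheme from Step 1.
MATCH = 1      # identical residues
MISMATCH = 0   # different residues
GAP = 0        # residue aligned with a gap

seq1 = "GAATTCAGTTA"  # sequence 1, length M = 11
seq2 = "GGATCGA"      # sequence 2, length N = 7


def score(a, b):
    """Score for aligning residue a of sequence 1 with residue b of sequence 2."""
    return MATCH if a == b else MISMATCH


print(len(seq1), len(seq2))        # 11 7
print(score(seq1[0], seq2[0]))     # G vs G -> 1
print(score(seq1[1], seq2[1]))     # A vs G -> 0
```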
15
Step 2. The solution for each alignment position is saved in a matrix with M + 1 columns and N + 1 rows, where M and N are the lengths of the sequences to be aligned.

The first row and first column of the matrix can be initially filled with 0.
Why?
[Figure: example alignments for the cells M1,1 and M1,0, with column index i and row index j]
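A minimal sketch of Step 2, storing the matrix as a list of rows. One reading of the "why" question, offered as an interpretation rather than the lecture's answer: the first row and column describe a prefix aligned only against gaps, and with a gap penalty of 0 such an alignment scores 0.

```python
seq1 = "GAATTCAGTTA"  # sequence 1 runs along the columns (index i)
seq2 = "GGATCGA"      # sequence 2 runs along the rows (index j)
M, N = len(seq1), len(seq2)

# N + 1 rows, each with M + 1 columns, all initialised to 0.
# Row 0 and column 0 stand for "nothing aligned yet" in one of the sequences.
matrix = [[0] * (M + 1) for _ in range(N + 1)]

print(len(matrix), "rows x", len(matrix[0]), "columns")  # 8 rows x 12 columns
```

Building each row with its own list avoids the common mistake of `[[0] * (M + 1)] * (N + 1)`, which would make every row the same object.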
16
Matrix fill step
Step 3. Find the maximal score Mi,j for each position i, j in the matrix.
GAATT
GGATC
The question is: how best to align the residues at positions i and j?
For example,
GAA
GGA
or
G A
GGA
or ?
17
To find the score Mi,j for the position i, j, we have to know the scores of the matrix positions to the left (Mi-1,j), above (Mi,j-1), and on the diagonal (Mi-1,j-1) of i, j, in order to check all possible alignments.
Why?
[Figure: example sequences A C T D Q (index i) and F H A S Y (index j)]
Because the positions to the left (Mi-1,j), above (Mi,j-1), and on the diagonal (Mi-1,j-1) are the positions that come before the position Mi,j. We have to select the best previous position to make the next step to Mi,j.
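The fill step can be sketched as follows, using the scores from Step 1 (match 1, mismatch 0, gap 0). Each cell takes the best of its three predecessors: diagonal plus Si,j, left plus the gap penalty, above plus the gap penalty. The function name `fill_matrix` is illustrative, and the zero-filled border is only appropriate because the gap penalty is 0 here.

```python
MATCH, MISMATCH, GAP = 1, 0, 0


def fill_matrix(seq1, seq2):
    """Fill the score matrix; seq1 indexes columns (i), seq2 indexes rows (j)."""
    cols, rows = len(seq1) + 1, len(seq2) + 1
    m = [[0] * cols for _ in range(rows)]
    for j in range(1, rows):
        for i in range(1, cols):
            s = MATCH if seq1[i - 1] == seq2[j - 1] else MISMATCH
            m[j][i] = max(
                m[j - 1][i - 1] + s,  # diagonal: align residue i with residue j
                m[j][i - 1] + GAP,    # left: residue i against a gap in sequence 2
                m[j - 1][i] + GAP,    # above: residue j against a gap in sequence 1
            )
    return m


matrix = fill_matrix("GAATTCAGTTA", "GGATCGA")
print(matrix[-1][-1])  # score of the best global alignment: 6
```

Every cell is computed once from three already known neighbors, so the work grows with M x N instead of with the number of possible alignments.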
18
There are two sequences:
A: ACGCTG
B: CATGT
The best alignment?
Question: Explain the cell in the first row and the first column.
19
A C G... C A T...

20
Question: How do we estimate the gap?
21
Question: How do we calculate the score of this alignment?
22
How do we calculate the scores?
23
Question: How do we estimate the mismatch? 0, -1, 1?
24
Question: How do we estimate the match? 0, 1, 2?
Thus, in this alignment the penalty for a gap is ___ and the score for a mismatch is ___.
25
Explain the score in the cell G3/C1. Check the score for a mismatch against the previous slides.
26
Check the score in the cell G3/A2
27
After filling in all of the values, the score matrix is as follows:
28
The next procedure is the traceback step. The traceback step determines the actual alignment that results in the maximum score. The traceback step begins at the N, M position in the matrix, i.e. the position where both sequences are globally aligned.

29
The algorithm of the traceback
a) The traceback begins with the last cell (G6/T5 in this case).
It takes the current cell and looks at the neighboring cells that could be direct predecessors:
the neighbor to the left (gap in sequence 2),
the diagonal neighbor (match/mismatch), and
the neighbor above (gap in sequence 1).
30
For the current cell there are two possible predecessors with the maximum score 3.
b) If more than one possible predecessor (left and above) with the same maximum score exists, any of them can be chosen. If the diagonal neighbor has the same maximum score, the diagonal is selected to avoid a gap.
Variant 1: select the left cell as the predecessor.
TG
T-
Select the best alignment and compare it with the alignment on the next slide.
31
Question: Does your alignment coincide with this one?
Make another possible alignment (Variant 2) and then compare it with the alignment on the next slide.
32
Variant 2
Question: What are the maximum scores of these two possible alignments?
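The traceback rules from the preceding slides can be sketched in code for the sequences ACGCTG and CATGT. This is one possible implementation, not the lecture's own: a neighbor counts as a predecessor when the current score can be derived from it, the diagonal is preferred on ties, and left is tried before above (an arbitrary choice, since the slides allow either). The names `fill_matrix` and `traceback` are illustrative.

```python
MATCH, MISMATCH, GAP = 1, 0, 0


def score(a, b):
    return MATCH if a == b else MISMATCH


def fill_matrix(seq1, seq2):
    """Score matrix with seq1 along the columns (i) and seq2 along the rows (j)."""
    m = [[0] * (len(seq1) + 1) for _ in range(len(seq2) + 1)]
    for j in range(1, len(seq2) + 1):
        for i in range(1, len(seq1) + 1):
            m[j][i] = max(m[j - 1][i - 1] + score(seq1[i - 1], seq2[j - 1]),
                          m[j][i - 1] + GAP,
                          m[j - 1][i] + GAP)
    return m


def traceback(seq1, seq2, m):
    """Walk back from the last cell to the first and build one optimal alignment."""
    i, j = len(seq1), len(seq2)
    row1, row2 = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and m[j][i] == m[j - 1][i - 1] + score(seq1[i - 1], seq2[j - 1]):
            # Diagonal predecessor: match or mismatch, preferred to avoid a gap.
            row1.append(seq1[i - 1]); row2.append(seq2[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and m[j][i] == m[j][i - 1] + GAP:
            # Left predecessor: gap in sequence 2.
            row1.append(seq1[i - 1]); row2.append("-")
            i -= 1
        else:
            # Predecessor above: gap in sequence 1.
            row1.append("-"); row2.append(seq2[j - 1])
            j -= 1
    return "".join(reversed(row1)), "".join(reversed(row2))


seq_a, seq_b = "ACGCTG", "CATGT"
matrix = fill_matrix(seq_a, seq_b)
aligned_a, aligned_b = traceback(seq_a, seq_b, matrix)
print(aligned_a)
print(aligned_b)
print("score:", matrix[-1][-1])
```

Changing the order in which the left and above neighbors are tried gives another alignment with the same maximum score, which corresponds to the choice between the variants.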
33
H.A. Create an alignment according to this matrix.
H.A. Construct the table (calculate the values of all cells) for the same sequences but with different scores:
Si,j = 2 (match score)
Si,j = -1 (mismatch score)
Si,j = -2 (gap penalty)
Find the optimal alignment and compare it with the previous one.
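For checking a hand-built table, the fill step can be parameterised by the three scores. Note one assumption that goes beyond the slides: once the gap penalty is not 0, the first row and column are usually initialised with cumulative gap penalties instead of zeros, and the sketch below does that. The function name `global_score` is illustrative.

```python
def global_score(seq1, seq2, match=2, mismatch=-1, gap=-2):
    """Best global alignment score for the given match/mismatch/gap scores."""
    cols, rows = len(seq1) + 1, len(seq2) + 1
    m = [[0] * cols for _ in range(rows)]
    # Assumption: a prefix aligned only against gaps pays one penalty per gap.
    for i in range(1, cols):
        m[0][i] = i * gap
    for j in range(1, rows):
        m[j][0] = j * gap
    for j in range(1, rows):
        for i in range(1, cols):
            s = match if seq1[i - 1] == seq2[j - 1] else mismatch
            m[j][i] = max(m[j - 1][i - 1] + s,  # match or mismatch
                          m[j][i - 1] + gap,    # gap in sequence 2
                          m[j - 1][i] + gap)    # gap in sequence 1
    return m[-1][-1]


# The original scheme (1, 0, 0) reproduces the score of the worked example.
print(global_score("GAATTCAGTTA", "GGATCGA", match=1, mismatch=0, gap=0))  # 6
# The homework scheme (2, -1, -2) on the same sequences.
print(global_score("GAATTCAGTTA", "GGATCGA"))
```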
34

Nucleotide Sequence Analysis
HomoloGene: a gene homology tool that compares nucleotide sequences between pairs of organisms in order to identify putative orthologs.
BLAST: a set of programs for sequence similarity searching
Nucleotide-nucleotide BLAST (blastn)
Search for short, nearly exact matches
Translated query vs. protein database (blastx)
Protein query vs. translated database (tblastn)
Immunoglobulin BLAST (IgBlast)

H.A. Look at "A user-friendly introduction to BLAST": http://www.geospiza.com/outreach/BLAST/slide1.html
35
H.A. BLAST, a sequence similarity searching program: prepare a short PowerPoint presentation.
