Human Genome Project - PowerPoint PPT Presentation

Slides: 29
Provided by: t809
Transcript and Presenter's Notes


1
Human Genome Project
2
Basic Strategy
  • Goal: determine the sequence of the roughly 3
    billion base pairs of the human genome. Started
    in 1990.
  • Various side projects: genetic diseases,
    variations between individuals, ethnic variation,
    comparison to other species.
  • Strategy:
  • 1. Physical map: relating specific DNA markers to
    their proper chromosomal positions.
  • 2. Overlapping set of cloned DNAs (contigs)
  • 3. Sequencing and assembly
  • 4. Finding the genes in the sequence
  • 5. Annotation of gene function

3
Genetic mapping
  • Genetic mapping asks where genes lie on the
    chromosomes.
  • Put simply, we need to locate genes within the
    total genome.
  • A genetic map uses recombination (crossing over
    during meiosis) to determine how frequently two
    genes (or markers) are inherited together.
  • Genes → genotypes → phenotypes

4
  • Gene map: linkage map or physical map
  • Linkage map: tells you whether 2 genes are close
    or distantly linked; it gives no location.
  • Physical map: tells you where genes are
    present in the chromosome.

5
Linkage map
  • Genetic linkage is the tendency of genes that are
    located proximal to each other on a chromosome to
    be inherited together during meiosis.
  • Genes whose loci are nearer to each other are
    less likely to be separated onto different
    chromatids during chromosomal crossover, and are
    therefore said to be genetically linked.
  • In other words, the nearer two genes are on a
    chromosome, the lower is the chance of a swap
    occurring between them, and the more likely they
    are to be inherited together.

6
Chromosome Theory of Linkage
  • Morgan, along with Castle, formulated the
    chromosome theory of linkage. It has the
    following postulates:
  • 1. Genes are arranged in a linear manner in
    the chromosomes.
  • 2. Genes which exhibit linkage are located on the
    same chromosome.
  • 3. Genes generally tend to stay in parental
    combination, except in cases of crossing over.
  • 4. The distance between linked genes in a
    chromosome determines the strength of linkage.
    Genes located close to each other show stronger
    linkage than those located far from each
    other, since the former are less likely to be
    separated by crossing over.

7
  • However, crossing over does not occur between
    linked genes in every meiotic event, especially
    when the positions of the genes on the chromosome
    are very near one another.
  • The frequency with which crossing over occurs
    between any two linked genes is roughly
    proportional to the distance between the loci
    along the chromosome.

8
  • 1. At very small distances, crossover is very
    rare, and most gametes are parental.
  • 2. As the distance between two genes increases,
    crossover frequency increases. More recombinant
    gametes, fewer parental gametes.
  • 3. When genetic loci are very far apart on the
    same chromosome, crossing over nearly always
    occurs, and the frequency of recombinant gametes
    approaches 50 percent.
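
The three cases above can be sketched numerically. The example below is only an illustration: it uses Haldane's mapping function, which is not mentioned in the slides and assumes crossovers occur independently (no interference). The function names are made up for this sketch.

```python
import math

def recombination_fraction(recombinant, parental):
    """Observed fraction of recombinant gametes among all gametes counted."""
    total = recombinant + parental
    if total == 0:
        raise ValueError("no gametes counted")
    return recombinant / total

def haldane_recombination(distance_morgans):
    """Expected recombinant fraction for a map distance given in Morgans,
    under Haldane's no-interference model (an assumption of this sketch)."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_morgans))

# Small distances give few recombinants; large distances level off near 0.5.
for d in (0.01, 0.1, 0.5, 2.0):
    print(f"{d:4.2f} Morgans -> {haldane_recombination(d):.3f}")
```

At short distances the expected fraction is close to the distance itself; at long distances it levels off just below 0.5, which is why very distant loci on one chromosome look unlinked.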

9
What is a molecular marker?
  • A DNA sequence used to mark a particular location
    on a particular chromosome.

10
Genetic markers
  • Modern genetic markers: SNPs
  • A genetic marker is a gene or DNA sequence with a
    known location on a chromosome that can be used
    to identify individuals or species.
  • It can be described as a variation (which may
    arise due to mutation or alteration in the
    genomic loci) that can be observed.
  • A genetic marker may be a short DNA sequence,
    such as a sequence surrounding a single base-pair
    change (single nucleotide polymorphism, SNP), or
    a long one, like

11
What are they?
  • Variable sites in the genome
What are their uses?
  • Finding disease genes
  • Testing/estimating relationships
  • Studying population differences
Phenotype     Genotype
Brown eyes    BB or Bb
Blue eyes     bb
12
Physical mapping
  • Cytogenetic mapping
  • A cytogenetic map is the visual appearance of a
    chromosome when stained and examined under a
    microscope.
  • Particularly important are visually distinct
    regions, called light and dark bands, which give
    each of the chromosomes a unique appearance.
  • This feature allows a person's chromosomes to be
    studied in a clinical test known as a karyotype,
    which allows scientists to look for chromosomal
    alterations.

13
Physical Maps
  • A physical map determines where a given DNA
    marker is located on the DNA of the chromosome.
  • Genetic and physical maps are (supposed to be)
    colinear: all the genes appear in the same order
    in both maps. But distances are quite
    different: there is very little recombination
    near the centromeres, so large DNA distances
    there correspond to very short recombination
    distances.
  • Genetic maps using microsatellite (SSR) markers
    were used to develop physical maps: the
    appropriate SSR sites were expected to be found
    on the corresponding cloned DNA.

14
Expressed Sequence Tags
  • Produced by sequencing RNA (as cDNA), which in
    turn is transcribed from genes.
  • The RNA present comes from genes which are
    turned on in the tissue.
  • They are called tags because they are not really
    the complete sequences of genes; they are only
    partially sequenced.

15
Sequence Tagged Sites
  • A sequence tagged site (STS) is a short sequence
    that is unique in the genome.
  • You obtain the sequence information from cloned
    DNA, and then locate it in the genome.
  • Using PCR it is then possible to determine
    whether your STS is present in any other clone or
    cell line.
  • Obtaining STSs: sequencing the ends of large
    cloned DNAs (BACs or YACs, for example).
  • Uniqueness: use the cloned DNA from the STS as a
    probe on a Southern blot of genomic DNA; if the
    STS is unique, only 1 band will hybridize.
  • Repetitive DNA is very common in the human
    genome, and many DNA sequences are not unique.
  • A good source of unique DNA is EST clones: cDNA
    made from messenger RNA.
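
Once sequence is available, the uniqueness test can also be done on a computer instead of on a Southern blot. The sketch below is an illustration only: it uses exact string matching on a made-up toy sequence, and the function names are hypothetical. Real genomes need approximate matching and much more efficient indexing.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq.upper()))

def count_occurrences(genome, candidate):
    """Count matches of candidate on one strand, overlapping ones included."""
    count, start = 0, genome.find(candidate)
    while start != -1:
        count += 1
        start = genome.find(candidate, start + 1)
    return count

def is_unique_sts(genome, candidate):
    """True if candidate occurs exactly once, counting both strands."""
    genome, candidate = genome.upper(), candidate.upper()
    hits = count_occurrences(genome, candidate)
    rc = reverse_complement(candidate)
    if rc != candidate:  # a palindromic site is the same on both strands
        hits += count_occurrences(genome, rc)
    return hits == 1

# Toy sequence with a repeated stretch (ATAT...) and a duplicated TTAGC.
toy_genome = "ATGCGTACGTTAGCATATATATATGGCCTTAGC"
for candidate in ("GCGTACG", "TTAGC", "ATAT"):
    print(candidate, is_unique_sts(toy_genome, candidate))
```

Only the first candidate passes; the other two fall in repeated sequence, which is the reason many DNA sequences cannot serve as STSs.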

16
Somatic Cell Hybrids
  • Human and mouse (or hamster) cultured cells can
    be fused together using polyethylene glycol.
  • The resulting fused cell is a heterokaryon: it
    has 2 nuclei from different species.
  • If the heterokaryon undergoes mitosis, the nuclei
    fuse.
  • Human chromosomes are unstable in a mixed
    nucleus, and most of them are randomly lost. The
    mouse chromosomes all stay.
  • Different cell lines can be established that
    contain different combinations of human
    chromosomes.
  • You can identify which human chromosomes remain
    using chromosome banding techniques.
  • A good way to determine which chromosome a DNA
    sequence is on. Sometimes also for gene products
    or phenotypes.

17
Radiation Hybrids
  • Standard somatic cell fusions contain entire
    human chromosomes. To locate a gene more
    closely, you need to use chromosome fragments.
  • Start by irradiating human cells with a
    controlled dose of X-rays: the chromosomes break
    up. Then, fuse the cells to mouse cells. The human
    chromosome fragments get integrated into the
    mouse chromosomes.
  • Create a panel of mouse/human hybrid cell lines.
  • The current standard panels contain about 100
    cell lines.
  • Each line contains about 32% of the human genome.
  • Average size of human genome fragment: 25 Mbp
  • More radiation: smaller fragments
  • Mapping: the hybrid cell lines contain random
    human chromosome fragments, but closely linked
    sites are usually in the same cell line (same
    basic principle as recombination mapping).
  • Until you have located some of the markers on the
    chromosomes, radiation hybrid mapping only gives
    you information about whether any two sequences
    are close together on the chromosome.
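
The mapping principle can be sketched as a comparison of retention patterns: each marker is scored present or absent in every hybrid line, and markers that agree in most lines are probably close together. The data and function name below are invented for illustration; real radiation hybrid analysis uses statistical models (for example LOD scores) rather than a raw agreement fraction.

```python
def co_retention(marker_a, marker_b):
    """Fraction of hybrid cell lines in which two markers agree
    (both present or both absent)."""
    if len(marker_a) != len(marker_b) or not marker_a:
        raise ValueError("need two non-empty vectors of equal length")
    same = sum(1 for a, b in zip(marker_a, marker_b) if a == b)
    return same / len(marker_a)

# Hypothetical PCR results across 10 hybrid lines: 1 = detected, 0 = not.
sts1 = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]
sts2 = [1, 0, 0, 1, 1, 0, 0, 0, 0, 0]  # differs from sts1 in one line
sts3 = [0, 1, 0, 0, 1, 0, 1, 0, 0, 1]  # mostly unrelated pattern

print(co_retention(sts1, sts2))  # high agreement suggests close linkage
print(co_retention(sts1, sts3))  # low agreement suggests no close linkage
```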

18
Contigs
  • A contig is a contiguous set of partially
    overlapping clones, with no gaps between them.
  • Contigs allow you to build up the sequence of the
    chromosome over much larger regions than any
    single clone.
  • The first reasonably complete physical map of the
    human genome involved contigs generated by YACs
    (yeast artificial chromosomes).
  • Initially, you have a collection of clones with
    no information about how they are ordered on the
    chromosome.
  • Contigs are built up by using PCR to identify
    unique sequences (STS or EST) on each clone, and
    then looking for overlaps between the clones.
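
The overlap search can be sketched as a grouping problem: clones that share an STS are assumed to overlap, and chains of overlapping clones form a contig. The clone and STS names below are hypothetical, and the sketch only groups clones into contigs; it does not work out their order along the chromosome.

```python
def build_contigs(clone_sts):
    """Group clones into contigs: two clones are joined when they share
    at least one STS. Returns a sorted list of sorted clone-name lists."""
    parent = {clone: clone for clone in clone_sts}

    def find(clone):
        # Follow parent links to the representative of this clone's group.
        while parent[clone] != clone:
            parent[clone] = parent[parent[clone]]  # path compression
            clone = parent[clone]
        return clone

    first_clone_with = {}
    for clone, markers in clone_sts.items():
        for sts in markers:
            if sts in first_clone_with:
                # Shared STS: merge the two groups.
                parent[find(clone)] = find(first_clone_with[sts])
            else:
                first_clone_with[sts] = clone

    groups = {}
    for clone in clone_sts:
        groups.setdefault(find(clone), []).append(clone)
    return sorted(sorted(group) for group in groups.values())

# Hypothetical PCR results: which STSs were detected on which clone.
clones = {
    "YAC1": {"sts_a", "sts_b"},
    "YAC2": {"sts_b", "sts_c"},
    "YAC3": {"sts_c", "sts_d"},
    "YAC4": {"sts_x"},
    "YAC5": {"sts_x", "sts_y"},
}
print(build_contigs(clones))
```

Here YAC1, YAC2 and YAC3 chain together through sts_b and sts_c, while YAC4 and YAC5 form a second contig; the gap between the two groups would need more clones or markers to close.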

19
Sequencing Strategy
  • Once a contig map of the genome was obtained, it
    was necessary to sequence each individual clone.
  • Most of the actual human genome sequencing was
    done on BAC clones, which are less prone to
    rearrangement than YAC clones. BACs are about
    100-200 kbp long.
  • Large clones are generally sequenced by shotgun
    sequencing: the large cloned DNA is randomly
    broken up into a series of small fragments (less
    than 1 kb). These fragments are cloned and
    sequenced. A computer program then assembles
    them based on overlaps between the sequences of
    each clone.
  • To ensure that every bit has been covered, you
    need to sequence random clones until you have
    covered each spot 5-10 times on average.
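
The reason for 5-10 times coverage can be sketched with a simple calculation. The example assumes reads land uniformly at random, so the chance that a given base is missed by every read is about e^(-c) at mean coverage c (a Poisson approximation that is not in the slides). The 500 bp read length and 150 kbp clone size are illustrative values, and the function names are made up.

```python
import math

def mean_coverage(num_reads, read_length, target_length):
    """Average number of reads covering each base."""
    return num_reads * read_length / target_length

def fraction_uncovered(coverage):
    """Expected fraction of bases hit by no read, assuming reads land
    uniformly at random (Poisson approximation)."""
    return math.exp(-coverage)

def reads_needed(coverage, read_length, target_length):
    """Number of reads required to reach a given mean coverage."""
    return math.ceil(coverage * target_length / read_length)

bac_length = 150_000  # a BAC within the 100-200 kbp range
read_length = 500     # assumed read length, under 1 kb
for c in (1, 5, 10):
    n = reads_needed(c, read_length, bac_length)
    missed = fraction_uncovered(c) * bac_length
    print(f"{c:2d}x coverage: {n} reads, about {missed:.0f} bases missed")
```

At 1x coverage roughly a third of the clone is expected to be missed, while at 5-10x the expected number of missed bases drops to a small fraction of the clone.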

20
Whole Genome Shotgun Sequencing
  • Why bother with creating a large scale physical
    map: all that YAC and BAC cloning, radiation
    hybrids, STS comparisons, etc.? Why not just
    fragment the whole genome into 1 kb pieces,
    sequence them all, and let the computer assemble
    the whole genome?
  • In practice, the genome is cloned into large
    fragments first, and then each large fragment is
    broken up for shotgun sequencing. But the large
    fragments are not ordered: no physical map or set
    of contigs is created.
  • Requires a lot of overlapping coverage
  • Also requires good software.
  • Very successful for prokaryotic genomes (10 Mbp
    or less).
  • But the human genome is 300 times larger.
  • Big problem: repeat sequence DNA, which is
    everywhere, and especially near the centromere.
    To find overlaps between clones, you need unique
    regions.
  • At the time of the Human Genome Project it was
    unclear whether whole genome shotgun sequencing
    would work with no other information available
    to provide order. It has since been applied to
    many eukaryotic genomes, usually with paired
    reads or map data to help order the assembly.

21
EST (expressed sequence tag)
  • A unique stretch of DNA within a coding region of
    a gene that is useful for identifying full-length
    genes and serves as a landmark for mapping.
  • An EST is a sequence tagged site (STS) derived
    from cDNA.
  • An STS is a short segment of DNA which occurs but
    once in the genome and whose location and base
    sequence are known. STSs are detectable by the
    polymerase chain reaction (PCR), are helpful in
    localizing and orienting mapping and sequence
    data, and serve as landmarks in the physical map
    of the genome.

22
Expressed-sequence tags (ESTs)
  • are cDNA sequences that have been sequenced from
    either the 5' or 3' ends.
  • They may contain all or part of a particular cDNA
    coding sequence,
  • and are useful for identifying unknown genes,
    mapping their positions within a genome,
  • and as a potential source for genetic material
    when a full-length cDNA is not available for a
    specific gene of interest.

23
Gene Detection
  • The best evidence that a given DNA sequence is
    expressed is to find an EST (cDNA copy of mRNA)
    that matches it.
  • Large numbers of EST libraries have been
    constructed and sequenced.
  • The primary result of this was to determine that
    many genes have several different intron splicing
    patterns: sequences are exons in some tissues but
    introns in others.

24
Gene Detection
  • Homology searches, using BLAST, are a good way to
    find genes. If a DNA sequence closely matches a
    sequence from another organism, it has been
    evolutionarily conserved, and that usually means
    that it is an expressed gene.
  • Exon prediction: exons need to be open reading
    frames (no stop codons), and they display
    patterns of nucleotide usage different from
    random DNA. Several different programs exist,
    and they give somewhat varying results.
  • Hypothetical genes are genes whose existence
    has been predicted by computer but which lack
    any experimental or cross-species data to confirm
    them.
  • A conserved hypothetical gene is a sequence
    that matches other species even though there is
    no EST or other experimental evidence for its
    expression.
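
The open reading frame requirement can be sketched with a very simple scanner. This is a deliberate simplification and not how real exon predictors work: it looks only at the forward strand, requires an ATG start codon (internal exons do not have one), and ignores splice signals and nucleotide usage. The function name and toy sequence are made up.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=3):
    """Return (start, end, frame) for each ATG...stop stretch on the
    forward strand. 'end' is exclusive and includes the stop codon.
    min_codons counts codons before the stop, including the ATG."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # a stop codon closes the candidate
    return orfs

toy_seq = "CCATGAAACCCGGGTTTTAGCC"
print(find_orfs(toy_seq))
```

A sequence predicted this way with no supporting EST or cross-species match would be a hypothetical gene in the sense described above.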

25
(No Transcript)
26
Genome annotation
  • The process of identifying the locations of
    genes and all of the coding regions in a genome
    and determining what those genes do.
  • Once a genome is sequenced, it needs to be
    annotated to make sense of it.

27
Gene Annotation
  • There is a big problem: too much information,
    not uniformly coded or maintained. The
    scientific literature contains numerous examples
    of the same gene or protein with several
    different names, and getting common definitions
    of functions is even harder.
  • To counter this, the Gene Ontology Consortium
    (GO) has created a controlled vocabulary of about
    11,000 terms.
  • Every gene product (protein) can be annotated
    in three general categories:
  • molecular function: what the protein actually
    does, such as kinase activity
  • biological process: what cellular process the
    protein participates in, such as signal
    transduction
  • cellular component: where the protein is found in
    the cell, such as integral to the plasma
    membrane
  • Each gene product can have multiple descriptive
    terms.
  • The terms are hierarchical: more specific terms
    are contained within less specific terms.
  • But, a given term can have more than one parent
    and more than one child term.
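
Because a term can have more than one parent, the vocabulary forms a directed acyclic graph rather than a simple tree. The sketch below collects every less specific term above a given term. The term names and parent links are a toy hierarchy invented for illustration; they are not copied from the actual Gene Ontology.

```python
def ancestors(term, parents):
    """Return every term reachable by following parent links upward."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:  # a term may be reached by several paths
                seen.add(parent)
                stack.append(parent)
    return seen

# Toy hierarchy (hypothetical): child term -> list of parent terms.
toy_parents = {
    "protein tyrosine kinase activity": ["protein kinase activity"],
    "protein kinase activity": ["kinase activity",
                                "protein-modifying activity"],
    "kinase activity": ["catalytic activity"],
    "protein-modifying activity": ["catalytic activity"],
    "catalytic activity": ["molecular function"],
}

for name in sorted(ancestors("protein kinase activity", toy_parents)):
    print(name)
```

A gene product annotated with a specific term is implicitly annotated with all of that term's ancestors, which is what makes searching by a less specific term possible.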

28
GO Example