Affymetrix Expression Data - PowerPoint PPT Presentation

1
Affymetrix Expression Data
  • Comics Group
  • 12-05-2003 Nijmegen
  • Tim Hulsen

2
General Information (1)
  • Affymetrix oligo microarrays HG-U133 A and B
    (human) and MG-U74v2 A, B and C (mouse)
  • Updated every two months; releases used here:
    November 2002 and January 2003
  • UniGene-based
  • Probes: 25-mer oligos complementary to the
    sequences of interest
  • Probe pairs: perfect match (PM) probe and
    mismatch (MM) probe; MM differs from PM at the
    13th position

3
General Information (2)
  • Human chips: 3269 samples, 44792 fragments, 115
    tissue categories (114 for the Nov. 2002 release),
    15 SNOMED tissue categories
  • Mouse chips: 859 samples, 36701 fragments, 25
    tissue categories, 12 SNOMED tissue categories
  • Results from all samples within a tissue category
    are combined by generating electronic Northerns
  • For each fragment and each tissue category, the
    following are determined:
  • Median expression value
  • Present call (percentage)

4
Median expression value
  • Expression value = intensity
  • All expression values that have a present call
    are used to determine the median expression value
  • Varies from 0 to 65,000 in human and from 0 to
    97,000 in mouse
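
The median rule above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used for the data set: the function name, the `(intensity, call)` tuple layout, the `"A"` label for non-present calls, and the choice to return 0.0 when no sample has a present call are all assumptions.

```python
from statistics import median


def median_expression(samples):
    """Median intensity over present-called samples.

    `samples` is a list of (intensity, call) tuples for one fragment in one
    tissue category. Only samples whose call is "P" (present) contribute.
    Returning 0.0 when nothing is present is an assumption of this sketch.
    """
    present = [intensity for intensity, call in samples if call == "P"]
    return median(present) if present else 0.0


# Hypothetical samples for one fragment in one tissue category.
example = [(120.0, "P"), (35.0, "A"), (300.0, "P"), (210.0, "P")]
print(median_expression(example))
```

Samples without a present call are ignored entirely, so a single absent sample with a high intensity does not shift the median.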

5
Present Call (Percentage)
  • Normalization/scaling procedures (MAS 5.0) are
    used to assign each fragment an expression
    intensity value with an associated confidence
    level
  • When the confidence level p for expression is
    smaller than 0.05, the expression intensity for
    this specific fragment in this particular sample
    is called present (P)
  • Call values are used to calculate a present call
    percentage (P calls / total calls)
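
As a rough sketch of the present-call rule and the percentage derived from it, assuming the per-sample p-values are already available (the function names and the example p-values are hypothetical, and the MAS 5.0 normalization itself is not reproduced here):

```python
def present_call(p_value, alpha=0.05):
    """A sample is called present (P) when its p-value is smaller than alpha."""
    return p_value < alpha


def present_call_percentage(p_values, alpha=0.05):
    """Percentage of samples in a tissue category that are called present."""
    if not p_values:
        return 0.0  # assumption: an empty category counts as 0 percent
    present = sum(1 for p in p_values if present_call(p, alpha))
    return 100.0 * present / len(p_values)


# Hypothetical p-values for one fragment across four samples of one tissue.
print(present_call_percentage([0.01, 0.20, 0.04, 0.60]))
```

The comparison is strict, so a p-value of exactly 0.05 is not counted as present.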

6
SNOMED category definitions (1)
  • SNOMED: Systematised Nomenclature of Medicine
  • Combines specific categories into more global
    categories, i.e. organ systems
  • In human far more useful than in mouse
    (115 -> 15, 25 -> 12)
  • Categories like cardiovascular system, digestive
    organs, endocrine gland, female genital system,
    male genital system, musculoskeletal system,
    nervous system, respiratory system, etc.

7
SNOMED category definitions (2)
  • Example: category 7, hematopoietic system

8
Annotation provided
For each fragment, if available:
  • title
  • geneSymbol
  • geneAlias
  • exemplarAcc
  • omimId
  • snpId
  • refseqId
  • refseqprotId
  • ncbiNuclId
  • ncbiProtId
  • unigeneAcc
  • unigeneId
  • interproId
  • pfamId
  • swissprotId
  • goId
  • goFunction
  • goProcess
  • goComponent
  • comment

9
Goals & Problems
  • Goal: use the data set to see whether
    co-expression between orthologous/paralogous gene
    pairs is higher than between unrelated gene
    pairs, in human & mouse
  • Problem 1: limited annotation
  • Problem 2: empty expression profiles
  • Problem 3: size of data set

10
Limited annotation (1)
  • For example, for three of the most-used protein
    IDs:
  • ncbiProtId (in red), refseqProtId (in green),
    swissprotId (in blue)
  • Human & Mouse

11
Limited annotation (2)
  • Solutions:
  • Smith-Waterman (SWX) of all Affymetrix sequences
    against the human & mouse IPI sets, for which
    orthologs and paralogs were already defined
    -> IPI ID added to database
  • Smith-Waterman (SWN) of all Affymetrix sequences
    against each other for better orthology/paralogy
    prediction

12
Empty expression profiles
  • Lots of genes have no expression at all in any
    tissue category
  • Useless for correlation calculation: two genes
    with no expression will have a top correlation!
  • For human: 4114 out of 44792 fragments have no
    expression at all in any tissue category -> 40678
    left
  • For mouse: 6791 out of 36701 fragments have no
    expression at all in any tissue category -> 29910
    left
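
The filtering step described above might look like the following sketch. It assumes each fragment's profile is a list with one value per tissue category and that "no expression" means every value is zero; the fragment IDs are made up for illustration.

```python
def drop_empty_profiles(profiles):
    """Keep only fragments with a nonzero value in at least one tissue category.

    `profiles` maps a fragment ID to its list of per-tissue expression values.
    """
    return {
        fragment: values
        for fragment, values in profiles.items()
        if any(v > 0 for v in values)
    }


# Hypothetical profiles over three tissue categories.
profiles = {
    "frag_a": [0.0, 0.0, 0.0],    # empty profile, removed
    "frag_b": [0.0, 512.0, 0.0],  # expressed in one tissue, kept
    "frag_c": [80.0, 95.0, 60.0],
}
kept = drop_empty_profiles(profiles)
print(sorted(kept))

# The counts on the slide are consistent with this kind of removal.
assert 44792 - 4114 == 40678   # human
assert 36701 - 6791 == 29910   # mouse
```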

13
Size of data set
  • Correlation between gene pairs is calculated; the
    number of pairs is (x^2 - x)/2 for x genes ->
    millions of data points
  • The number of gene pairs is already brought down
    by removing the no-expression genes: in human from
    1,003,139,236 to 827,329,503, in mouse from
    673,463,350 to 447,289,095
  • For some quick analyses, sets of e.g. 1000
    randomly selected genes were used -> 499,500 gene
    pairs
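
The pair counts quoted above follow directly from the (x^2 - x)/2 formula, as this short check shows. The random subsetting at the end is only a sketch of how a 1000-gene sample could be drawn; the integer gene IDs and the fixed seed are assumptions.

```python
import random


def n_pairs(x):
    """Number of unordered gene pairs among x genes: (x^2 - x) / 2."""
    return (x * x - x) // 2


# Fragment counts before and after removing empty profiles.
for label, before, after in [("human", 44792, 40678), ("mouse", 36701, 29910)]:
    print(label, n_pairs(before), "->", n_pairs(after))

# Quick-analysis subset: 1000 randomly selected genes.
random.seed(0)                        # fixed seed only for reproducibility
gene_ids = list(range(40678))         # placeholder IDs for the human set
subset = random.sample(gene_ids, 1000)
print(n_pairs(len(subset)))
```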

14
Uncentered Correlation
  • Uncentered: ranges from 0 to 1
  • $UC(X,Y) = \frac{1}{N} \sum_{i=1}^{N} \frac{X_i}{\sqrt{\sum_{j} X_j^2 / N}} \cdot \frac{Y_i}{\sqrt{\sum_{j} Y_j^2 / N}}$
  • Calculated correlations between gene pairs were
    used to see if the co-expression for orthologous
    pairs and/or paralogous pairs is higher than for
    unrelated pairs
  • This was measured by using the KEGG Pathway map
    (release 25)
  • The best, though not completely convincing,
    result was found using PCP (present call
    percentage) rather than ME (median expression)
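
A minimal Python sketch of the uncentered correlation defined above, where each profile is divided by the square root of its mean square and the products are averaged over the N tissue categories. Returning `None` for an all-zero profile is a choice made for this sketch, since the denominator is zero in that case.

```python
import math


def uncentered_correlation(x, y):
    """Uncentered correlation between two expression profiles of equal length."""
    if len(x) != len(y) or not x:
        raise ValueError("profiles must be non-empty and of equal length")
    n = len(x)
    sx = math.sqrt(sum(v * v for v in x) / n)  # sqrt(sum(X^2) / N)
    sy = math.sqrt(sum(v * v for v in y) / n)  # sqrt(sum(Y^2) / N)
    if sx == 0 or sy == 0:
        return None  # undefined for an all-zero profile (sketch's choice)
    return sum((a / sx) * (b / sy) for a, b in zip(x, y)) / n


# Proportional profiles correlate perfectly; disjoint ones give 0.
print(uncentered_correlation([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
print(uncentered_correlation([5.0, 0.0], [0.0, 7.0]))
```

Because expression values are non-negative, the result stays between 0 and 1, unlike the centered Pearson correlation, which can be negative.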

15
Correlation: KEGG Pathway Check
  • Data points at correlation thresholds of 0.9
    and 1.0 were left out because of very low numbers
    (unreliable)
  • Only orthologous conserved gene pairs have a
    higher accuracy when increasing the correlation
    threshold
  • Maybe a combination of PCP and ME should be
    used
  • Another measure could be used: same GO category
    instead of KEGG; GO is already annotated by
    Affymetrix
  • Lots of genes have an expression value in only
    one tissue; this correlation method is not really
    suitable -> mutual information analysis
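
The slides do not define "accuracy" for the pathway check. One plausible reading, used here purely as an assumption, is the fraction of gene pairs at or above a correlation threshold whose two genes share at least one KEGG pathway. The sketch below implements that reading; the gene and pathway IDs are hypothetical, and pairs with an unannotated gene are skipped.

```python
def pathway_accuracy(pairs, pathways, threshold):
    """Fraction of pairs with correlation >= threshold that share a pathway.

    `pairs` is a list of (gene_a, gene_b, correlation) tuples; `pathways`
    maps a gene to the set of pathway IDs it is annotated with. This
    definition of accuracy is an assumption, not taken from the slides.
    """
    selected = [
        (a, b)
        for a, b, r in pairs
        if r >= threshold and pathways.get(a) and pathways.get(b)
    ]
    if not selected:
        return None  # no annotated pairs at this threshold
    hits = sum(1 for a, b in selected if pathways[a] & pathways[b])
    return hits / len(selected)


# Hypothetical data.
pairs = [("g1", "g2", 0.95), ("g1", "g3", 0.85), ("g2", "g3", 0.40)]
pathways = {"g1": {"p1"}, "g2": {"p1", "p2"}, "g3": {"p3"}}
for t in (0.8, 0.9):
    print(t, pathway_accuracy(pairs, pathways, t))
```

Returning `None` when no pairs remain makes the low-count thresholds explicit instead of reporting a misleading 0 or 1.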

16
Mutual Information
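  • For each tissue category: 0 or 1 (ME/PCP value
    below/above a specified threshold)
  • $x_0$ = fraction of 0s, $x_1$ = fraction of 1s
  • $x_{00}$ = fraction of 0-0 pairs, $x_{01}$ =
    fraction of 0-1 pairs, $x_{10}$ = fraction of 1-0
    pairs, $x_{11}$ = fraction of 1-1 pairs
  • Entropy per gene: $-(x_0 \ln x_0 + x_1 \ln x_1)$
  • Entropy per gene pair: $-(x_{00} \ln x_{00} + x_{01} \ln x_{01} + x_{10} \ln x_{10} + x_{11} \ln x_{11})$
  • MI = Entropy(1) + Entropy(2) - Entropy(1,2)
  • $0 \le MI \le \ln 2 \approx 0.693147$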
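
The calculation above can be sketched as follows. The sketch treats the x values as fractions of tissue categories (an assumption, but one that is needed for the stated upper bound of ln 2 to hold) and skips zero fractions, using the convention that 0 ln 0 = 0. Function names are illustrative.

```python
import math
from collections import Counter


def binarize(values, threshold):
    """1 if the ME/PCP value is above the threshold, else 0."""
    return [1 if v > threshold else 0 for v in values]


def entropy(counts, total):
    """-sum(p * ln p) over the observed states, with p = count / total."""
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p)
    return h


def mutual_information(a, b):
    """MI = Entropy(1) + Entropy(2) - Entropy(1,2) for two 0/1 profiles."""
    if len(a) != len(b) or not a:
        raise ValueError("profiles must be non-empty and of equal length")
    n = len(a)
    h_a = entropy(Counter(a).values(), n)
    h_b = entropy(Counter(b).values(), n)
    h_ab = entropy(Counter(zip(a, b)).values(), n)
    return h_a + h_b - h_ab


# Identical balanced profiles reach the maximum, ln 2.
g1 = binarize([10.0, 80.0, 5.0, 90.0], threshold=50.0)
print(g1, mutual_information(g1, g1), math.log(2))
```

Unlike the correlation, this measure also rewards consistent anti-correlation: a profile and its exact complement carry the same mutual information as two identical profiles.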

17
MI: GO Category Check
  • Mutual information check using GO Biological
    Process, 3rd level of specification
  • Horizontal axis shows log(MI)
  • Different lines: different thresholds for
    defining a value as 0 or 1
  • Accuracy indeed seems to be higher for pairs
    with much mutual information, but there is also a
    peak at -9 < log(MI) < -8
  • Orthologous/paralogous pairs not checked yet

18
Future plans
  • Complete mutual information analysis, using both
    KEGG Pathway and GO databases; look at
    orthologous and paralogous gene pairs too
  • Check alternative splicing
  • Speed: license ends at the end of June