Affymetrix Expression Data - PowerPoint PPT Presentation

About This Presentation
Title:

Affymetrix Expression Data

Description:

Affymetrix oligo microarrays: HG-U133 A and B (human) and MG-U74v2 A, B and C (mouse) ... SNOMED: Systematised Nomenclature of Medicine ... – PowerPoint PPT presentation

Number of Views:23
Avg rating:3.0/5.0
Slides: 19
Provided by: Hul95
Category:

less

Transcript and Presenter's Notes

Title: Affymetrix Expression Data


1
Affymetrix Expression Data
  • Comics Group
  • 12-05-2003 Nijmegen
  • Tim Hulsen

2
General Information (1)
  • Affymetrix oligo microarrays HG-U133 A and B
    (human) and MG-U74v2 A, B and C (mouse)
  • Updated every two months releases used here
    november 2002 and january 2003
  • UniGene-based
  • Probes 25mer oligos complementary to the
    sequences of interest
  • Probe pairs perfect match (PM) probe and
    mismatch (MM) probe, MM is different from PM in
    the 13th position

3
General Information (2)
  • Human chips 3269 samples, 44792 fragments, 115
    tissue categories (114 for nov. 2002 release), 15
    SNOMED tissue categories
  • Mouse chips 859 samples, 36701 fragments, 25
    tissue categories, 12 SNOMED tissue categories
  • Results from all samples within a tissue category
    are combined by generating electronic Northerns
  • For each tissue fragment and each tissue category
    is determined
  • Median expression value
  • Present call (percentage)

4
Median expression value
  • Expression value intensity
  • All expression values that have a present call
    are used to determine the median expression value
  • Varies from 0 to 65,000 in human and from 0 to
    97,000 in mouse

5
Present Call (Percentage)
  • Normalization/scaling procedures (MAS 5.0) are
    used to determine an expression intensity value
    with an associated confidence level to each
    fragment
  • When confidence level p for expression is smaller
    than 0.05, the expression intensity for this
    specific fragment in this particular sample is
    called present (P)
  • Call values are used to calculate a present call
    percentage (P calls / total calls)

6
Snomed category definitions (1)
  • SNOMED Systematised Nomenclature of Medicine
  • Combines specific categories into more global
    categories, i.e. organ systems
  • In human far more useful than in mouse
    (115-gt15,25-gt12)
  • Categories like cardiovascular system, digestive
    organs, endocrine gland, female genital system,
    male genital system, musculoskeletal system,
    nervous system, respiratory system, etc.

7
Snomed category definitions (2)
  • Example cat. 7 hematopoietic system

Human Mouse
36. Bone marrow 10. Blood
37. Dendritic reticulum cell 11. Bone marrow
38. Lymph node 12. Mesenteric lymph node
39. Lymphocyte 13. Spleen
40. Monocyte 14. Thymus
41. Segmented neutrophil
42. Spleen
43. Thymus
44. Tonsil
45. White blood cell
8
Annotation provided
For each fragment, if available
  • title
  • geneSymbol
  • geneAlias
  • exemplarAcc
  • omimId
  • snpId
  • refseqId
  • refseqprotId
  • ncbiNuclId
  • ncbiProtId
  • unigeneAcc
  • unigeneId
  • interproId
  • pfamId
  • swissprotId
  • goId
  • goFunction
  • goProcess
  • goComponent
  • comment

9
Goals Problems
  • Goal use data set to see if co-expression
    between orthologous/paralogous gene pairs is
    higher than between unrelated gene pairs, in
    human mouse
  • Problem 1 limited annotation
  • Problem 2 empty expression profiles
  • Problem 3 size of data set

10
Limited annotation (1)
  • For example for three of the most used protein
    ids
  • ncbiProtId (in red), refseqProtId (in green),
    swissprotId (in blue)
  • Human Mouse

11
Limited annotation (2)
  • Solutions
  • Smith-Waterman of (SWX) of all Affymetrix
    sequences to the human mouse IPI sets, for
    which orthologs and paralogs were already defined
    -gt IPI id added to database
  • Smith-Waterman (SWN) of all Affymetrix sequences
    to each other for better orthology/paralogy
    prediction

12
Empty expression profiles
  • Lots of genes have no expression at all in any
    tissue category
  • Useless for correlation calculation two genes
    with no expression will have a top correlation!
  • For human 4114 out of 44792 fragments completely
    no expression in all tissue categories -gt 40678
    left
  • For mouse 6791 out of 36701 fragments completely
    no expression in all tissue categories -gt 29910
    left

13
Size of data set
  • Correlation between gene pairs is calculated the
    number of pairs is (x2-x)/2 for x genes -gt
    millions of data points
  • Number of gene pairs is already brought down by
    the no expression gene removal in human from
    1,003,139,236 to 827,329,503, in mouse from
    673,463,350 to 447,289,095
  • For some quick analyses, sets of e.g. 1000
    randomly selected genes were used -gt 499,500 gene
    pairs

14
Uncentered Correlation
  • Uncentered from 0 to 1
  • UC(X,Y) S( X / ( sqrt ( S( X2 / N ) ) ) ( Y /
    ( sqrt ( S( Y2 / N ) ) ) ) / N
  • Calculated correlations between gene pairs were
    used to see if the co-expression for orthologous
    pairs and/or paralogous pairs is higher than for
    unrelated pairs
  • This was measured by using the KEGG Pathway map
    (release 25)
  • The best, however not completely convincing,
    result was found using PCP and not ME

15
Correlation KEGG Pathway Check
  • Data points above a correlation threshold of 0,9
    and 1,0 were left out because of very low numbers
    (irreliability)
  • Only orthologous conserved gene pairs have a
    higher accuracy when increasing the correlation
    threshold
  • May be a combination of PCP and ME should be
    used
  • Another measure could be used same GO category
    instead of KEGG, GO is already annotated by
    Affymetrix
  • Lots of genes have only an expression value in
    one tissue this correlation method is not really
    suitable -gt mutual information analysis

16
Mutual Information
  • For each tissue category 0 or 1 (ME/PCP value
    below/above a specified threshold)
  • x0 of 0s, x1 of 1s
  • x00 of 0-0 pairs, x01 of 0-1 pairs, x10
    of 1-0 pairs, x11 of 1-1 pairs
  • Entropy per gene -(x0ln(x0)x1ln(x1))
  • Entropy per gene pair -(x00ln(x00)x01ln(x01)x
    10ln(x10)x11ln(x11))
  • MI Entropy(1) Entropy(2) Entropy(1,2)
  • 0ltMIlt0,693147

17
MI GO Category Check
  • Mutual information check using GO Biological
    Process 3rd level of specification
  • Horizontal axis shows log(MI)
  • Different lines different thresholds for
    defining as a 0 or a 1
  • Accuracy indeed seems to be higher for pairs
    with much mutual information, but there is also a
    peak at -9ltlog(MI)lt-8
  • Orthologous/paralogous pairs not checked yet

18
Future plans
  • Complete mutual information analysis, using both
    KEGG Pathway and GO databases look at
    orthologous and paralogous gene pairs too
  • Check alternative splicing
  • Speed license ends at the end of June
Write a Comment
User Comments (0)
About PowerShow.com