Discussion Points for 2nd Pseudogene Call - PowerPoint PPT Presentation

Slides: 8
Provided by: markge1
Learn more at: http://pseudogene.org
Transcript and Presenter's Notes


1
Discussion Points for 2nd Pseudogene Call
  • Mark Gerstein
  • 2005.09.22, 11:00 EST

2
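Intersection of Pseudogenes from Three Groups (Original)
[Venn diagram of three sets: Havana-Gencode (167 pseudogenes); Yale (184 pseudogenes); UCSC retrogenes, 15 expressed (7-8 pseudogenes) and 143 not expressed (all pseudogenes). Region counts as extracted, positions in the diagram not recoverable: 42, 45, 35, 21, 86, 87, 87, 18, 17, 18, 16, 22.]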
86 Havana pseudogenes overlap with any Yale
pseudogene, and 87 Yale pseudogenes overlap with
any Havana pseudogene (likewise for retrogenes).
This is a global result: in some loci, three
Havana pseudogenes may overlap with only one Yale
pseudogene, while in other loci several Yale
pseudogenes overlap with one Havana pseudogene.
Provided by France.
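
The asymmetry in the two overlap counts can be illustrated with a minimal sketch. The coordinates below are invented toy values, not ENCODE data, and the end-inclusive interval convention is an assumption; the point is only that "A overlapping any B" and "B overlapping any A" are counted separately and need not agree.

```python
from typing import List, Tuple

# (start, end), end-inclusive. Hypothetical coordinates for illustration only.
Interval = Tuple[int, int]

def overlaps(a: Interval, b: Interval) -> bool:
    """True if two closed intervals share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]

def count_overlapping(query: List[Interval], target: List[Interval]) -> int:
    """Number of intervals in `query` that overlap any interval in `target`."""
    return sum(any(overlaps(q, t) for t in target) for q in query)

# Toy locus: three Havana calls fall inside one long Yale call.
havana = [(100, 200), (250, 300), (350, 400), (1000, 1100)]
yale = [(90, 410), (5000, 5100)]

havana_hits = count_overlapping(havana, yale)  # Havana calls overlapping any Yale call
yale_hits = count_overlapping(yale, havana)    # Yale calls overlapping any Havana call
print(havana_hits, yale_hits)
```

In this toy case three Havana calls hit Yale but only one Yale call hits Havana, which is the same kind of many-to-one situation the note describes.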
3
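Intersection of Pseudogenes from 4 Groups (Updated)
[Venn diagram of three sets: Havana-Gencode (167 pseudogenes); Yale (164 pseudogenes); UCSC retrogenes (146 not expressed). Region counts as extracted, positions in the diagram not recoverable: 52 (2), 14 (2), 16 (0), 82 (34), 15 (1), 17 (7), 33 (1).]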
  • The numbers in parentheses are pseudogenes from
    GIS.
  • All from http://pseudogene.org/ENCODE/cross-ref
  • Pseudo-exons were merged to form pseudogenes and
    used for this comparison (now a pseudogene has
    only a single start and end)
  • Strand information is ignored
  • There are a total of 229 pseudogenes in the union
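
The preprocessing described in these bullets (merge pseudo-exons into one span per pseudogene, ignore strand, then count the union) could be sketched roughly as follows. The record layout, the identifiers, and the rule of chaining overlapping spans into a single locus are all assumptions made for illustration; the slides do not say how the union of 229 was actually counted.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (pseudogene_id, chrom, start, end, strand): a hypothetical record layout.
Exon = Tuple[str, str, int, int, str]
Span = Tuple[str, int, int]  # (chrom, start, end)

def merge_pseudo_exons(exons: List[Exon]) -> Dict[str, Span]:
    """Collapse each pseudogene's exons into one span, dropping strand."""
    spans: Dict[str, Span] = {}
    for gene_id, chrom, start, end, _strand in exons:
        if gene_id in spans:
            _, s, e = spans[gene_id]
            spans[gene_id] = (chrom, min(s, start), max(e, end))
        else:
            spans[gene_id] = (chrom, start, end)
    return spans

def count_union_loci(spans: List[Span]) -> int:
    """Count loci after chaining spans that overlap on the same chromosome."""
    by_chrom = defaultdict(list)
    for chrom, start, end in spans:
        by_chrom[chrom].append((start, end))
    loci = 0
    for intervals in by_chrom.values():
        intervals.sort()
        current_end = None
        for start, end in intervals:
            if current_end is None or start > current_end:
                loci += 1            # gap before this span: a new locus begins
                current_end = end
            else:
                current_end = max(current_end, end)
    return loci

# Toy input: one two-exon call, one overlapping call on the other strand,
# and one unrelated call on another chromosome.
exons = [
    ("h1", "chr1", 100, 150, "+"),
    ("h1", "chr1", 300, 400, "+"),
    ("y1", "chr1", 350, 500, "-"),
    ("u1", "chr2", 10, 90, "+"),
]
spans = merge_pseudo_exons(exons)
n_loci = count_union_loci(list(spans.values()))
print(spans, n_loci)
```

Because strand is dropped, the "+" and "-" calls on chr1 fall into the same locus here, which matches the stated choice to ignore strand information.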

4
Intersection of Pseudogenes from 4 Groups
Non-processed Consensus
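[Venn diagram of three sets: Havana-Gencode (167 pseudogenes); Yale (164 pseudogenes); UCSC retrogenes (146 not expressed). Region counts as extracted, positions in the diagram not recoverable: 52 (2), 14 (2), 16 (0), 82 (34), 15 (1), 17 (7), 33 (1).]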
Rough agreement is now 82 + 52 - 7 = 127 out of
229 total. What to do with the remaining 102?
                      GENCODE Processed   GENCODE Non-Processed
Yale Processed        7 / 8               5 / 5
Yale Non-Processed    4 / 4               39 / 37
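
The totals on this slide can be checked with a few lines of arithmetic. The seven region counts are taken from the diagram as transcribed (GIS counts in parentheses left out); reading the agreement figure as 82 + 52 - 7 is an assumption, since the operators are missing from the transcript, although it is the combination that yields both 127 and the leftover 102.

```python
# Region counts as they appear in the slide transcript.
region_counts = [52, 14, 16, 82, 15, 17, 33]

union_total = sum(region_counts)   # should match the stated union of 229
agreed = 82 + 52 - 7               # operators assumed, not given in the transcript
remaining = union_total - agreed   # pseudogenes still lacking consensus

print(union_total, agreed, remaining)
```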
5
How to Pick Pseudogenes for RT-PCR?
  • Start with the intersection (127)
  • Duplicated vs. processed: how many of each? (21?)
  • Rank pseudogenes
      • By likelihood of being transcribed according to
        ENCODE evidence
          • ditag, then CAGE, then tiling array
      • By their uniqueness in the genome
          • Good primers
          • Non-cross-hybridizing probes
      • How to get a consistent rank?
  • Who will do RT-PCR?
  • What coordinates to use?
  • (Ignore 1 processed pseudogene already being
    sequenced by the GIS group.)
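
One way to get a consistent rank is a lexicographic sort: evidence types in the stated priority order (ditag, then CAGE, then tiling array), then uniqueness, then a fixed tie-breaker. The sketch below assumes boolean evidence flags and a hypothetical `genome_copies` field as the uniqueness measure; neither is specified on the slide.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Candidate:
    name: str
    ditag: bool          # supported by ditag evidence
    cage: bool           # supported by CAGE evidence
    tiling: bool         # supported by tiling-array evidence
    genome_copies: int   # hypothetical uniqueness measure: 1 means unique

def rank_key(c: Candidate) -> Tuple[bool, bool, bool, int, str]:
    # False sorts before True, so each evidence flag is negated to put
    # supported candidates first. The name is a deterministic tie-breaker.
    return (not c.ditag, not c.cage, not c.tiling, c.genome_copies, c.name)

def rank_candidates(candidates: List[Candidate]) -> List[Candidate]:
    """Order candidates by ditag, then CAGE, then tiling array, then uniqueness."""
    return sorted(candidates, key=rank_key)

candidates = [
    Candidate("A", ditag=False, cage=True, tiling=True, genome_copies=1),
    Candidate("B", ditag=True, cage=False, tiling=False, genome_copies=3),
    Candidate("C", ditag=True, cage=False, tiling=True, genome_copies=1),
]
ranked_names = [c.name for c in rank_candidates(candidates)]
print(ranked_names)
```

Any group running the same key on the same evidence table gets the same order, which is what a shared ranking would need; a weighted score would be an alternative if the strict priority order proves too coarse.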

6
How to generate a consensus for the remaining 102
pseudogenes?
  • Stick with the intersection (127)
  • Develop consistent criteria for identifying
    pseudogenes and apply them uniformly to ENCODE
      • E.g. protein matches with disablements found
        by a pipeline
      • Ignores tricky cases flagged by manual annotation
  • Do a simple union of UCSC, Havana, and Yale,
    giving 229
      • GIS is a subset of the other 3
      • Describe pseudogenes as being identified by
        multiple approaches and then explicitly flag
        each group's unique ones in the final annotation
      • Easy, but perhaps biases stats
  • Do a qualified union
      • Allow each group to question particular
        pseudogenes in another's set
      • Send questions around and then have a call to
        sort out differences
      • Need a way to arbitrate, e.g. we could demand
        an obvious disablement
      • We might learn something!
  • How do we represent this in the browser and in
    stats?
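
The "union with explicit flags" option can be sketched as a labelling pass over the union: each locus carries the set of groups that called it, and the label records whether it is in the consensus or unique to one group. The locus names and the choice of which groups must agree for "consensus" are hypothetical; the slides do not define the 127 intersection precisely.

```python
from typing import Dict, Set

def classify_support(support: Dict[str, Set[str]],
                     required: Set[str]) -> Dict[str, str]:
    """Label each locus in the union by the groups that support it."""
    labels: Dict[str, str] = {}
    for locus, groups in support.items():
        if required <= groups:
            labels[locus] = "consensus"
        elif len(groups) == 1:
            # Called by a single group: flag it explicitly in the annotation.
            labels[locus] = "unique:" + next(iter(groups))
        else:
            labels[locus] = "partial:" + ",".join(sorted(groups))
    return labels

# Toy union of three loci with made-up support sets.
support = {
    "locus1": {"Havana", "Yale", "UCSC"},
    "locus2": {"Yale"},
    "locus3": {"Havana", "UCSC"},
}
# Assumption: agreement between Havana and Yale is what counts as consensus.
labels = classify_support(support, required={"Havana", "Yale"})
print(labels)
```

The "unique" and "partial" labels are also the natural worklist for a qualified union, since those are the calls another group might want to question.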

7
Once we have consensus, how to agree on
pseudogene boundaries?
  • Keep each group's boundaries unchanged
  • If pseudogenes overlap, take the largest region
    (union) or the smallest
  • Develop uniform criteria for assigning
    pseudogene boundaries and apply them to each of
    the pseudogenes in the consensus set
  • Could just take each pseudogene in the consensus
    and have one group realign it against its parent
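
The "largest region or smallest" option amounts to taking either the union span or the common core of the overlapping calls. A minimal sketch, assuming end-inclusive coordinates on one chromosome and invented example values:

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # (start, end), end-inclusive; an assumed convention

def largest_region(calls: List[Interval]) -> Interval:
    """Union span: leftmost start to rightmost end across all calls."""
    return (min(s for s, _ in calls), max(e for _, e in calls))

def smallest_region(calls: List[Interval]) -> Optional[Interval]:
    """Core shared by every call, or None if the calls have no common base."""
    start = max(s for s, _ in calls)
    end = min(e for _, e in calls)
    return (start, end) if start <= end else None

# Toy boundaries for one pseudogene as called by three groups.
calls = [(1000, 1800), (1100, 2000), (900, 1750)]
union_span = largest_region(calls)
core_span = smallest_region(calls)
print(union_span, core_span)
```

The smallest region can be empty when calls chain together without all sharing a base, which is one practical argument for the realignment option instead.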