Title: SUCEST:


1
SUCEST
The sugarcane genome project
  • Felipe Rodrigues da Silva
  • Embrapa Recursos Genéticos e Biotecnologia

2
Volume of publicly available data
3
Volume of publicly available data
http://www.ncbi.nlm.nih.gov/Genbank/genbankstats.html
4
Complete genomes of organisms
http://wit.integratedgenomics.com/GOLD/
5
(No Transcript)
6
(No Transcript)
7
A BIG BOWL
of alphabet soup...
8
Sugarcane
  • Grown in more than 90 countries
  • Covering about 20 million hectares
  • Grass family (Poaceae)

http://apps.fao.org
9
Sugarcane in Brazil
  • 25% of world production
  • 300 million tonnes
  • 5 million hectares planted
  • 14.5 million tonnes of sugar
  • 15.3 billion liters of ethanol
  • 350 mills
  • 50 thousand growers
  • 1.4 million direct jobs
  • 3.6 million indirect jobs

10
Origin and size
  • Saccharum officinarum
  • 2n = 80
  • Saccharum spontaneum
  • 2n = 64 or 2n = 112
  • 10 25

X
S. barberi, S. sinense, S. robustum
11
Genome Project
Structural
  • Complete genome sequencing
  • Genic and intergenic regions
Functional
  • EST (Expressed Sequence Tag)
  • Protein-coding regions (genes)

12
Complete Sequencing
BAC library
Physical map
BAC to be sequenced
Genomic DNA
Shotgun clones
...ATGTTGGGCCACAGTTGACCATTGAAACTG
Sequence
GTTGACCATTGAAACTGACCTTGACGTAACGTGGTA....
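The two fragments on this slide share a suffix/prefix (GTTGACCATTGAAACTG), which is how overlapping shotgun reads are joined into a longer sequence. The sketch below is minimal and assumes error-free reads and exact overlaps. Real assemblers such as phrap or CAP3 instead use quality-aware alignments that tolerate mismatches. The helper `merge_by_overlap` and its `min_overlap` cutoff are hypothetical.

```python
from typing import Optional


def merge_by_overlap(left: str, right: str, min_overlap: int = 10) -> Optional[str]:
    """Join two reads if a suffix of `left` exactly matches a prefix of `right`.

    The longest possible overlap is tried first; None is returned when no
    overlap of at least `min_overlap` bases exists.
    """
    shortest = max(min_overlap, 1)  # an overlap of 0 would mean "no overlap"
    for k in range(min(len(left), len(right)), shortest - 1, -1):
        if left[-k:] == right[:k]:
            return left + right[k:]
    return None


read_a = "ATGTTGGGCCACAGTTGACCATTGAAACTG"
read_b = "GTTGACCATTGAAACTGACCTTGACGTAACGTGGTA"
contig = merge_by_overlap(read_a, read_b)
print(contig)  # ATGTTGGGCCACAGTTGACCATTGAAACTGACCTTGACGTAACGTGGTA
```

Trying the longest overlap first avoids merging on a short, accidental match when a longer, genuine one exists.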
13
EST: Expressed Sequence Tag
14
(No Transcript)
15
GenBank - dbEST, March 1998
  • Total entries 1,528,715
  • Homo sapiens 967,015 (63.4%)
  • Plants (total) 73,087 (4.8%)
  • Mus musculus domesticus (mouse) 306,544
  • Caenorhabditis elegans 72,521
  • Arabidopsis thaliana 36,173
  • Drosophila melanogaster 27,625
  • Oryza sativa (rice) 25,844
  • Rattus sp. (rat) 20,311
  • Brugia malayi (parasitic nematode) 13,641
  • Toxoplasma gondii 10,671
  • Emericella nidulans 5,787
  • Schistosoma mansoni 3,659
  • Trypanosoma brucei rhodesiense 3,519
  • Danio rerio (zebrafish) 3,373
  • Saccharomyces cerevisiae 3,042

16
SUCEST project goals
  • Identify 50,000 unique genes
  • (or sequence 300,000 ESTs)
  • Develop a sugarcane database
  • Make this database available to data-mining
    groups
  • Functional analysis of the ESTs

17
Timeline
  • Date Goal
  • Jul/1999 Distribution of the first clones
  • Dec/1999 20,000 ESTs
  • Jul/2000 60,000 ESTs
  • Dec/2000 100,000 ESTs
  • Jul/2001 140,000 ESTs
  • Dec/2001 180,000 ESTs
  • Jul/2002 220,000 ESTs
  • Dec/2002 260,000 ESTs
  • Jul/2003 300,000 ESTs

18
The cDNA libraries
  • Tissues / organs
  • Root
  • Meristem
  • Stem
  • Seeds
  • Flowers
  • Leaf roll
  • Leaf-root transition zone
  • Lateral bud
  • Calli
  • Immature plantlets
  • Plantlets infected with Herbaspirillum
    rubrisubalbicans
  • Plantlets infected with Gluconacetobacter
    diazotrophicus
  • Varieties
  • SP80-3280
  • SP70-1143
  • SP80-87432
  • RB 845298
  • RB 805028
  • PB5211 X P57150-4

19
The sequencing laboratories
UFSCAR (SC) (1)
IAC (CA) (1)
BIOINFORMATICS UNICAMP (CA)
UMC (MC) (1)
UNICAMP (CA) (1)
USP (SP) (3)
IAC (CO) (1)
UNESP (BT) (2)
UNESP (RC) (1)
UNAERP (RP) (1)
USP (SC) (1)
ABI 377-96
RIO DE JANEIRO
PERNAMBUCO
ALAGOAS
20
EST: Expressed Sequence Tag
266,016 clones
291,689 reads 260,352 clones
21
Sequence cleaning
  • removal of ribosomal sequences
  • removal of vector sequences
  • removal of the poly(A) region
  • quality trimming
  • removal of slippage artifacts
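One way to picture vector removal is as cutting away everything up to the end of a known vector/adapter sequence. The sketch below is minimal and assumes exact matching. The motif `GTCGACCCACGCGTCCG` is an illustrative choice: it appears in the reads on the poly(A) and quality-trimming slides, but the presentation does not confirm it as the vector sequence. The helper `clip_vector` is hypothetical. Production tools instead align against the full vector and tolerate sequencing errors.

```python
# Illustrative vector-end motif (an assumption, see the note above).
VECTOR_END = "GTCGACCCACGCGTCCG"


def clip_vector(read: str, vector_end: str = VECTOR_END) -> str:
    """Return the part of `read` after the last exact occurrence of `vector_end`.

    Reads without the motif come back unchanged. Because only exact matches
    are found, a sequencing error inside the motif will hide it.
    """
    pos = read.rfind(vector_end)  # last occurrence = final vector junction
    if pos == -1:
        return read
    return read[pos + len(vector_end):]


read = "CCGGAATTCCCCG" + VECTOR_END + "CTACAACAACAGCAGCAG"
print(clip_vector(read))  # CTACAACAACAGCAGCAG
```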

22
Poly(A)
AGGGGAGAATTTATGATCCCCTAGTACACCCGGCAGGACCGGTCCGGAAT
TCCCCGGTCGACCCAC GCGTCCGCTACAACAACAGCAGCAGCTTCCATT
TACCTTGTCGGCTGTTGCAACCGCTGCTGCCTA
CCACCAGCAACTACAGCTGCTACCAGTTAACCCATTGGCACTGGCTAACC
CATTGGCTGCTGCCTT CCTGCAGCAGCAACAATTGCTGCCATTCAACCA
GATGTCTTTGATGAACCCTGCCTTGTCGTGGTA
GCAACCCATCGTTGGAGGTGCCATCTTCTAGAATACAAATGAGTTGTACT
TGATAACAATGTTCTT GTGTCGGCGTGTGCAACTTCCCAGAAATAATCA
ATACATTGATTGAGATTTANAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAATATAATTAAAATAAAAAAATTTATAA
AAAAAAAAAAATAATT TTTTTTTATAAAAAATAAATATAAAATAAAAAG
GGGGGGCCGTTTTAAAGGAACAAAGTTTAAGAC
CGGGGGTATGAAAGGGAAAATTTTTTTATATAGGGCCCCAAAATTAAATA
CATGGGCCGGTGTTAA CAACGGCGGGAGGGAAAAAACCTGGGGGTTACC
AATTTAAAGCCGTGGAAAAAATCCCTTTTTTCA
AGTGGGGTAAAAAGAAAAGGCCCCACCCATCGCCCTTCCAAAAATTGCCC
CCCTTAAAGGAAAAAG GACACCCCCTTTTGGGCGCATATAACCGGGGGG
GTGGGGGTACCCCCAAGGGAACTTATATTTTTC
AGGCCTCATAGCCCTTTTTTTTTTTTTTTTTTTTTTTTTCAAGGTAGCGG
GTTTCCCAGGAAAATT AAAAGGGGGGTCCTTTTGGGTAATAATGTTTTN
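The read above ends in a poly(A) stretch followed by low-quality sequence. The sketch below shows minimal poly(A) removal and assumes that a run of 12 or more A's marks the tail. Both that threshold and the helper name `trim_polya` are hypothetical. An ambiguous call, such as the N just before the tail, is left in place for quality trimming to handle.

```python
import re


def trim_polya(read: str, min_run: int = 12) -> str:
    """Cut `read` at the first run of at least `min_run` consecutive A's.

    The run and everything downstream of it are discarded; reads without
    such a run are returned unchanged.
    """
    match = re.search("A{%d,}" % min_run, read)
    return read[:match.start()] if match else read


tail = "GATTGAGATTTAN" + "A" * 13 + "TATAATTAAAATAAAAAAATTTATAA"
print(trim_polya(tail))  # GATTGAGATTTAN
```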

23
Poly(A)

24
Quality trimming
CGGAAGACTGGAGTCGTCGCTGCGGCACCGGTCCGGAATTCCCGGGTCGA
CCCACGCGTCCGGCCG CCGCCACCGCATCCCTTGCAGCCCCAATCCCCC
ACGGCGACCATGGCCGGCGCGCAGGAGTCCCTG
TCCCTGGTGGGCACGATGCGTGGCCACAACGGCGAGGTGACGGCGATCGC
CACCCCGATCGACAAC TCGCCGTTCATCGTCTCCTCCTCCCGCGACAAG
TCCGTGCTGGTGTGGGACCTGCAAAACCCGGTC
CACTCCACCCCGGAATCCGGCGCCACCGCCGACTACGGCGTCCCCTTCCG
CCGCCTCACCGGCCAC TCCCACTTCGTCCAGGACGTCGTCCTCAGCTCC
GACGGCCAGTTCGCCCTCTCCGGCTCCTGGGAC
GGCGAGCTCCGCCTCTGGGACCTCTCCACCGGCGTCACCACCCGCCGCTT
CGTCGGCCACGAGAAG GACGTCCTCTCCGTCGCCTTCTCCGTCGACAAC
CGCCAGATCGTCTCCGCGTCCCGCGACAAGACC
ATCAAGCTCTGGAACACCCTCGGTGAGTGCAAGTACACCATTGGTGGCGA
CCTCGGCGGCGGGGAG GGCCACAACGGGTGGGTCTCCTGCGTCAGGTTC
TTCCCCAACACCTTTCAGGCCACCATTGTCTCC
GGATTCTGGGACCGCACCGTCAGGTCTGGAACCTTACCAACTGCAAGCTG
CGATGCACTCTCGATG CCCACGCGGCTATGTTAACGCCGTCGCC
25
Quality trimming
CGGAAGACTGGAGTCGTCGCTGCGGCACCGGTCCGGAATTCCCGGGTCGA
CCCACGCGTCCGGCCG CCGCCACCGCATCCCTTGCAGCCCCAATCCCCC
ACGGCGACCATGGCCGGCGCGCAGGAGTCCCTG
TCCCTGGTGGGCACGATGCGTGGCCACAACGGCGAGGTGACGGCGATCGC
CACCCCGATCGACAAC TCGCCGTTCATCGTCTCCTCCTCCCGCGACAAG
TCCGTGCTGGTGTGGGACCTGCAAAACCCGGTC
CACTCCACCCCGGAATCCGGCGCCACCGCCGACTACGGCGTCCCCTTCCG
CCGCCTCACCGGCCAC TCCCACTTCGTCCAGGACGTCGTCCTCAGCTCC
GACGGCCAGTTCGCCCTCTCCGGCTCCTGGGAC
GGCGAGCTCCGCCTCTGGGACCTCTCCACCGGCGTCACCACCCGCCGCTT
CGTCGGCCACGAGAAG GACGTCCTCTCCGTCGCCTTCTCCGTCGACAAC
CGCCAGATCGTCTCCGCGTCCCGCGACAAGACC
ATCAAGCTCTGGAACACCCTCGGTGAGTGCAAGTACACCATTGGTGGCGA
CCTCGGCGGCGGGGAG GGCCACAACGGGTGGGTCTCCTGCGTCAGGTTC
TTCCCCAACACCTTTCAGGCCACCATTGTCTCC
GGATTCTGGGACCGCACCGTCAGGTCTGGAACCTTACCAACTGCAAGCTG
CGATGCACTCTCGATG CCCACGCGGCTATGTTAACGCCGTCGC
753 bases
ACGTX < 10; ACGTX > 10 and < 15; ACGTX > 15 and < 20;
ACGTX > 20 and < 25; ACGTX > 25 and < 30; ACGTX > 30
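The key above grades each base by its quality value. The modified Mott algorithm is one common way to turn per-base phred qualities into trim points. It is used here as a swapped-in technique: the presentation does not say which rule SUCEST applied. The 0.01 limit (an error probability equivalent to phred 20) and the name `mott_trim` are assumptions.

```python
from typing import List, Tuple


def mott_trim(quals: List[int], limit: float = 0.01) -> Tuple[int, int]:
    """Return (start, end), half-open, of the highest-scoring quality segment.

    Each base scores `limit - p_error`, where p_error = 10 ** (-q / 10) is the
    phred error probability. The best segment is found with Kadane's
    maximum-subarray scan; (0, 0) means no base passes the limit.
    """
    best_start = best_end = 0
    best_score = 0.0
    run_start, run_score = 0, 0.0
    for i, q in enumerate(quals):
        score = limit - 10 ** (-q / 10)
        if run_score <= 0:
            # A non-positive running total cannot help; restart here.
            run_start, run_score = i, score
        else:
            run_score += score
        if run_score > best_score:
            best_start, best_end, best_score = run_start, i + 1, run_score
    return best_start, best_end


seq = "ACGTACGTAC"
quals = [5, 5, 5, 30, 30, 30, 30, 30, 5, 5]
start, end = mott_trim(quals)
print(seq[start:end])  # TACGT
```

Unlike a simple cut at the first and last good base, this scoring lets a few mediocre bases stay inside a long high-quality stretch while trimming poor ends.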
26
Quality trimming
CGGAAGACTGGAGTCGTCGCTGCGGCACCGGTCCGGAATTCCCGGGTCGA
CCCACGCGTCCGGCCG CCGCCACCGCATCCCTTGCAGCCCCAATCCCCC
ACGGCGACCATGGCCGGCGCGCAGGAGTCCCTG
TCCCTGGTGGGCACGATGCGTGGCCACAACGGCGAGGTGACGGCGATCGC
CACCCCGATCGACAAC TCGCCGTTCATCGTCTCCTCCTCCCGCGACAAG
TCCGTGCTGGTGTGGGACCTGCAAAACCCGGTC
CACTCCACCCCGGAATCCGGCGCCACCGCCGACTACGGCGTCCCCTTCCG
CCGCCTCACCGGCCAC TCCCACTTCGTCCAGGACGTCGTCCTCAGCTCC
GACGGCCAGTTCGCCCTCTCCGGCTCCTGGGAC
GGCGAGCTCCGCCTCTGGGACCTCTCCACCGGCGTCACCACCCGCCGCTT
CGTCGGCCACGAGAAG GACGTCCTCTCCGTCGCCTTCTCCGTCGACAAC
CGCCAGATCGTCTCCGCGTCCCGCGACAAGACC
ATCAAGCTCTGGAACACCCTCGGTGAGTGCAAGTACACCATTGGTGGCGA
CCTCGGCGGCGGGGAG GGCCACAACGGGTGGGTCTCCTGCGT
618 bases
ACGTX < 10; ACGTX > 10 and < 15; ACGTX > 15 and < 20;
ACGTX > 20 and < 25; ACGTX > 25 and < 30; ACGTX > 30
27
BLASTX result
trimmed read
>gi|1346109|sp|P49027|GBLP_ORYSA GUANINE NUCLEOTIDE-BINDING PROTEIN BETA
SUBUNIT-LIKE PROTEIN (GPB-LR) (RWD) pir||T03764 protein RWD - rice
dbj|BAA07404.1| (D38231) RWD [Oryza sativa]
Length = 334

Score = 315 bits (798), Expect = 4e-85
Identities = 150/170 (88%), Positives = 156/170 (91%)
Frame = +1

Query: 109 MAGAQESLSLVGTMRGHNGEVTAIATPIDNSPFIVSSSRDKSVLVWDLQNPVHSTPESGA 288
           MAGAQESL L G M GHN  VTAIATPIDNSPFIVSSSRDKS+LVWDL NPV +  E
Sbjct: 1   MAGAQESLVLAGVMHGHNDVVTAIATPIDNSPFIVSSSRDKSLLVWDLTNPVQNVGEGAG 60

Query: 289 TADYGVPFRRLTGHSHFVQDVVLSSDGQFALSGSWDGELRLWDLSTGVTTRRFVGHEKDV 468
            ++YGVPFRRLTGHSHFVQDVVLSSDGQFALSGSWDGELRLWDLSTGVTTRRFVGH+KDV
Sbjct: 61  ASEYGVPFRRLTGHSHFVQDVVLSSDGQFALSGSWDGELRLWDLSTGVTTRRFVGHDKDV 120

Query: 469 LSVAFSVDNRQIVSASRDKTIKLWNTLGECKYTIGGDLGGGEGHNGWVSC 618
           LSVAFSVDNRQIVSASRD+TIKLWNTLGECKYTIGGDLGGGEGHNGWVSC
Sbjct: 121 LSVAFSVDNRQIVSASRDRTIKLWNTLGECKYTIGGDLGGGEGHNGWVSC 170
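In a BLAST hit, identity is the number of identical aligned positions divided by the alignment length. The minimal sketch below recomputes the 150/170 figure from the Query and Sbjct strings of the trimmed-read hit. It assumes an ungapped alignment, as in this hit. It does not compute Positives, which would also need the substitution matrix (BLOSUM62 is BLAST's default for protein comparisons). The helper `count_identities` is hypothetical.

```python
from typing import Tuple

# Query (translated read) and Sbjct strings of the trimmed-read hit,
# concatenated across the three alignment blocks; this alignment has no gaps.
QUERY = (
    "MAGAQESLSLVGTMRGHNGEVTAIATPIDNSPFIVSSSRDKSVLVWDLQNPVHSTPESGA"
    "TADYGVPFRRLTGHSHFVQDVVLSSDGQFALSGSWDGELRLWDLSTGVTTRRFVGHEKDV"
    "LSVAFSVDNRQIVSASRDKTIKLWNTLGECKYTIGGDLGGGEGHNGWVSC"
)
SBJCT = (
    "MAGAQESLVLAGVMHGHNDVVTAIATPIDNSPFIVSSSRDKSLLVWDLTNPVQNVGEGAG"
    "ASEYGVPFRRLTGHSHFVQDVVLSSDGQFALSGSWDGELRLWDLSTGVTTRRFVGHDKDV"
    "LSVAFSVDNRQIVSASRDRTIKLWNTLGECKYTIGGDLGGGEGHNGWVSC"
)


def count_identities(query: str, sbjct: str) -> Tuple[int, int]:
    """Return (identical positions, alignment length) for two aligned strings."""
    if len(query) != len(sbjct):
        raise ValueError("aligned strings must have equal length")
    # Gap characters never count as identities.
    same = sum(1 for a, b in zip(query, sbjct) if a == b and a != "-")
    return same, len(query)


same, length = count_identities(QUERY, SBJCT)
print(f"Identities = {same}/{length} ({100 * same // length}%)")
# Identities = 150/170 (88%)
```

Integer division reproduces the rounded-down percentage shown in the report.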
28
BLASTX result
full (untrimmed) read
>gi|1346109|sp|P49027|GBLP_ORYSA GUANINE NUCLEOTIDE-BINDING PROTEIN BETA
SUBUNIT-LIKE PROTEIN (GPB-LR) (RWD) pir||T03764 protein RWD - rice
dbj|BAA07404.1| (D38231) RWD [Oryza sativa]
Length = 334

Score = 352 bits (893), Expect(2) = e-100
Identities = 168/192 (87%), Positives = 175/192 (90%)
Frame = +1

Query: 109 MAGAQESLSLVGTMRGHNGEVTAIATPIDNSPFIVSSSRDKSVLVWDLQNPVHSTPESGA 288
           MAGAQESL L G M GHN  VTAIATPIDNSPFIVSSSRDKS+LVWDL NPV +  E
Sbjct: 1   MAGAQESLVLAGVMHGHNDVVTAIATPIDNSPFIVSSSRDKSLLVWDLTNPVQNVGEGAG 60

Query: 289 TADYGVPFRRLTGHSHFVQDVVLSSDGQFALSGSWDGELRLWDLSTGVTTRRFVGHEKDV 468
            ++YGVPFRRLTGHSHFVQDVVLSSDGQFALSGSWDGELRLWDLSTGVTTRRFVGH+KDV
Sbjct: 61  ASEYGVPFRRLTGHSHFVQDVVLSSDGQFALSGSWDGELRLWDLSTGVTTRRFVGHDKDV 120

Query: 469 LSVAFSVDNRQIVSASRDKTIKLWNTLGECKYTIGGDLGGGEGHNGWVSCVRFFPNTFQA 648
           LSVAFSVDNRQIVSASRD+TIKLWNTLGECKYTIGGDLGGGEGHNGWVSCVRF PNTFQ
Sbjct: 121 LSVAFSVDNRQIVSASRDRTIKLWNTLGECKYTIGGDLGGGEGHNGWVSCVRFSPNTFQP 180

Query: 649 TIVSGFWDRTVR 684
           TIVSG WDRTV+
Sbjct: 181 TIVSGSWDRTVK 192
29
Determining the quality threshold
30
Quality trimming
31
Quality trimming
32
Quality trimming
33
Quality trimming
CGGAAGACTGGAGTCGTCGCTGCGGCACCGGTCCGGAATTCCCGGGTCGA
CCCACGCGTCCGGCCG CCGCCACCGCATCCCTTGCAGCCCCAATCCCCC
ACGGCGACCATGGCCGGCGCGCAGGAGTCCCTG
TCCCTGGTGGGCACGATGCGTGGCCACAACGGCGAGGTGACGGCGATCGC
CACCCCGATCGACAAC TCGCCGTTCATCGTCTCCTCCTCCCGCGACAAG
TCCGTGCTGGTGTGGGACCTGCAAAACCCGGTC
CACTCCACCCCGGAATCCGGCGCCACCGCCGACTACGGCGTCCCCTTCCG
CCGCCTCACCGGCCAC TCCCACTTCGTCCAGGACGTCGTCCTCAGCTCC
GACGGCCAGTTCGCCCTCTCCGGCTCCTGGGAC
GGCGAGCTCCGCCTCTGGGACCTCTCCACCGGCGTCACCACCCGCCGCTT
CGTCGGCCACGAGAAG GACGTCCTCTCCGTCGCCTTCTCCGTCGACAAC
CGCCAGATCGTCTCCGCGTCCCGCGACAAGACC
ATCAAGCTCTGGAACACCCTCGGTGAGTGCAAGTACACCATTGGTGGCGA
CCTCGGCGGCGGGGAG GGCCACAACGGGTGGGTCTCCTGCGTCAGGTTC
TTCCCCAACACCTTTCAGGCCACCATTGTCTCC
GGATTCTGGGACCGCACCGTCAGGTCTGGAACCTTACCAACTGCAAGCTG
CGATGCACTCTCGATG CCCACGCGGCTATGTTAACGCCGTCGCC
719 bases
Before   Diff.   Homology   Diff.   After
618      -66     684        35      719
34
Determining the quality threshold
35
Example of slippage
36
All reads        Mean length      Mean bases with phred > 20 per read
291,689 reads    864.5 (186.3)    399.5 (161.3)
37
cluster size (reads) cluster size (reads) HS X phrap X CAP3 X HS total common
1 32202 32202 13731 18535 11634 16838 14296 10744
2 12440 12440 5617 9207 4869 7665 4852 3792
3 6752 6752 2402 5192 2151 4193 1984 1441
4 4225 4225 1239 3329 1145 2709 992 697
5 2856 2856 676 2360 700 1872 521 344
6 2098 2098 442 1806 482 1452 354 231
7 1582 1582 288 1362 317 1115 220 144
8 1245 1245 202 1091 242 862 153 99
9 974 974 156 913 186 720 113 72
10 776 776 105 752 143 634 74 44
11 639 639 76 607 99 511 54 30
12 492 492 71 547 99 429 46 32
13 437 437 47 454 90 400 40 25
14 366 366 42 391 40 341 26 13
15 306 306 31 390 50 295 18 11
16 273 273 25 279 35 275 18 8
17 225 225 15 273 23 235 11 4
18 177 177 11 227 15 191 5 2
19 124 124 6 177 18 176 5 3
>20 1192 1192 40 1814 87 2228 23 12
total 69381 69381 25222 49706 22425 43141 23805 17748
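Each row of the table counts the clusters that contain a given number of reads. Once pairwise overlaps between reads are known, a union-find structure can tally that size distribution. The sketch below is minimal and assumes the overlap pairs are already computed. The table compares HS, phrap and CAP3, and each of these applies its own overlap criteria. `cluster_size_histogram` and the toy overlaps are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def _find(parent: List[int], x: int) -> int:
    """Return the root of x, halving the path on the way up."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def cluster_size_histogram(n_reads: int,
                           overlaps: Iterable[Tuple[int, int]]) -> Dict[int, int]:
    """Map cluster size -> number of clusters of that size.

    Reads are numbered 0..n_reads-1; each pair in `overlaps` links two reads,
    and linked reads end up in the same cluster (transitively).
    """
    parent = list(range(n_reads))
    for a, b in overlaps:
        root_a, root_b = _find(parent, a), _find(parent, b)
        if root_a != root_b:
            parent[root_b] = root_a
    sizes = Counter(_find(parent, i) for i in range(n_reads))
    return dict(Counter(sizes.values()))


# Toy data: reads 0-1-2 chain together, 3-4 pair up, 5 is a singleton.
print(cluster_size_histogram(6, [(0, 1), (1, 2), (3, 4)]))  # {3: 1, 2: 1, 1: 1}
```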
38
Internal discrepancy
39
Internal discrepancy
40
Internal consistency test
41
Internal consistency test
42
Internal consistency test
43
External consistency test
44
External consistency test
45
External consistency test
46
Overall numbers
Total sequences 291,689
cDNA clones sequenced (5' or 3') 260,352
5'-end sequences 259,325
3'-end sequences 32,364
Total high-quality sequences 237,954
Success index (%) 81.6
Average insert size (bp) 1,250
Average sequence size (bp) 864 / 642
Bases with phred quality > 20 per read 399
47
Overall numbers
Total sequences analyzed 237,954
Number of contigs 26,803
Number of singletons 16,338
Number of sugarcane assembled sequences (SAS) 43,141
Number of assembled sequences matching known genes 27,833 (64.5%)
Number of clones with full length inserts 14,409 (
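Several figures on these two slides are derived from others, so recomputing them is a quick consistency check. The minimal sketch below uses only numbers from the slides. The variable names are illustrative.

```python
total_reads = 291_689
reads_5prime = 259_325
reads_3prime = 32_364
high_quality = 237_954
contigs = 26_803
singletons = 16_338
matching_known = 27_833

# 5' and 3' reads together account for every sequence.
assert reads_5prime + reads_3prime == total_reads

# High-quality reads as a share of all reads (consistent with the success index).
success_index = round(100 * high_quality / total_reads, 1)

# Each SAS is either a contig or a singleton.
sas = contigs + singletons

# Share of SAS matching a known gene.
pct_known = round(100 * matching_known / sas, 1)

print(success_index, sas, pct_known)  # 81.6 43141 64.5
```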
48
Library-specific contribution
No. of ESTs SAS contigs singletons contribution contribution
AD1 8,137 1,474 1,200 1,200 3.4
AM1 5,991 841 664 664 1.9
AM2 6,629 982 705 705 2.3
CL6 3,511 595 467 467 1.4
FL1 8,412 1,753 1,465 1,465 4.1
FL3 5,714 840 667 667 1.9
FL4 7,289 1,082 886 886 2.5
FL5 5,115 861 744 744 2.0
FL8 3,362 378 337 337 0.9
HR1 5,070 717 519 519 1.7
LB1 3,699 459 369 369 1.1
LB2 5,402 790 650 650 1.8
LR1 6,653 984 819 819 2.3
LR2 2,329 299 254 254 0.7
LV1 3,068 384 327 327 0.9
RT1 4,227 569 484 484 1.3
RT2 5,819 942 728 728 2.2
RT3 4,356 614 478 478 1.4
RZ1 2,012 205 175 175 0.5
RZ2 3,177 385 301 301 0.9
RZ3 6,528 929 752 752 2.1
SB1 7,407 1,313 1,132 1,132 3.0
SD1 4,459 792 642 642 1.8
SD2 4,099 857 632 632 2.0
ST1 4,359 645 523 523 1.5
ST3 4,519 507 418 418 1.2
  • 47% of SAS are formed by reads from a
    single library
  • 38% of tissue-specific SAS are singletons

49
Functional classification
50
Percentage by organ
51
Tissue-specific SAS
No. of ESTs Best hit Library
360 (Y17556) alpha kafirin Sorghum bicolor SD
103 (A23207) zein zA1 Zea mays SD
42 (AF232008) beta-glucosidase aggregating factor precursor Zea mays RT
24 (AC007789) putative low molecular early light-inducible protein Oryza sativa SD
22 (AP002820) putative peroxidase Oryza sativa RT
19 (X56337) alpha-amylase Oryza sativa CL
18 (AP000374) cyclopropane fatty acid synthase Arabidopsis thaliana FL
52
GenBank - dbEST, March 1998
  • Total entries 1,528,715
  • Homo sapiens 967,015 (63.4%)
  • Plants (total) 73,087 (4.8%)
  • Mus musculus domesticus (mouse) 306,544
  • Caenorhabditis elegans 72,521
  • Arabidopsis thaliana 36,173
  • Drosophila melanogaster 27,625
  • Oryza sativa (rice) 25,844
  • Rattus sp. (rat) 20,311
  • Brugia malayi (parasitic nematode) 13,641
  • Toxoplasma gondii 10,671
  • Emericella nidulans 5,787
  • Schistosoma mansoni 3,659
  • Trypanosoma brucei rhodesiense 3,519
  • Danio rerio (zebrafish) 3,373
  • Saccharomyces cerevisiae 3,042

53
GenBank - dbEST, March 2001
  • Total entries 7,692,809
  • Homo sapiens 3,369,459 (43.8%)
  • Plants (total) 1,099,102 (14.3%)
  • Glycine max (soybean) 160,500
  • Arabidopsis thaliana 113,000
  • Medicago truncatula (barrel medic) 112,458
  • Lycopersicon esculentum (tomato) 107,226
  • Zea mays (maize) 86,999
  • Oryza sativa (rice) 72,657
  • Hordeum vulgare (barley) 68,480
  • Chlamydomonas reinhardtii 64,973
  • Sorghum bicolor 62,642
  • Triticum aestivum (wheat) 58,141
  • Pinus taeda (loblolly pine) 34,896
  • Lotus japonicus 27,078
  • Solanum tuberosum (potato) 26,177
  • Gossypium arboreum 20,978

54
GenBank - dbEST, September 2002
  • Total entries 12,845,578
  • Homo sapiens 4,691,979 (36.5%)
  • Plants (total) 2,279,170 (17.4%)
  • Glycine max (soybean) 284,714
  • Triticum aestivum (wheat) 256,593
  • Hordeum vulgare (barley) 240,882
  • Zea mays (maize) 180,587
  • Arabidopsis thaliana 174,624
  • Medicago truncatula (barrel medic) 170,500
  • Lycopersicon esculentum (tomato) 148,346
  • Chlamydomonas reinhardtii 130,324
  • Oryza sativa (rice) 108,429
  • Solanum tuberosum (potato) 94,420
  • Sorghum bicolor 84,712
  • Lactuca sativa (lettuce) 68,188
  • Pinus taeda (loblolly pine) 60,226
  • Physcomitrella patens 50,250
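Comparing the dbEST snapshots shows that plant ESTs grew much faster than the database as a whole. The minimal sketch below computes growth factors from the counts on the slides. The dictionary layout and the helper `fold_change` are illustrative.

```python
# (total entries, Homo sapiens, plants) from the dbEST snapshots on the slides.
DBEST = {
    "1998-03": (1_528_715, 967_015, 73_087),
    "2001-03": (7_692_809, 3_369_459, 1_099_102),
    "2002-09": (12_845_578, 4_691_979, 2_279_170),
}


def fold_change(before: str, after: str) -> tuple:
    """Growth factor of each count between two snapshots, rounded to 0.1."""
    return tuple(round(b / a, 1) for a, b in zip(DBEST[before], DBEST[after]))


print(fold_change("1998-03", "2002-09"))  # (8.4, 4.9, 31.2)
```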

55
Genetics and Molecular Biology
  1. The libraries that made SUCEST
  2. Bioinformatics of the sugarcane EST project
  3. Trimming and clustering sugarcane ESTs
  4. The sugarcane signal transduction (SUCAST)
    catalogue: prospecting signal transduction in
    sugarcane
  5. In silico characterization and expression
    analyses of sugarcane putative sucrose
    non-fermenting-1 (SNF1) related kinases
  6. Identification of 14-3-3-like protein in
    sugarcane (Saccharum officinarum)
  7. A search for homologues of plant photoreceptor
    genes and their signaling partners in the
    sugarcane expressed sequence tag (Sucest)
    database
  8. Phylogenetic relationships between Arabidopsis
    and sugarcane bZIP transcriptional regulatory
    factors
  9. Identification of sugarcane cDNAs encoding
    components of the cell cycle machinery
  10. Dissecting the sugarcane expressed sequence tag
    (SUCEST) database: unraveling flower-specific
    genes
  11. Molecular chaperone genes in the sugarcane
    expressed sequence database (SUCEST)
  12. Oxidative stress response in sugarcane
  13. In silico differential display of defense-related
    expressed sequence tags from sugarcane tissues
    infected with diazotrophic endophytes
  14. Mechanisms of sugarcane response to herbivory
  15. Base excision repair in sugarcane
  1. Preliminary analysis of microsatellite markers
    derived from sugarcane expressed sequence tags
    (ESTs)
  2. Sequence polymorphism from EST data in sugarcane:
    a fine analysis of 6-phosphogluconate
    dehydrogenase genes
  3. A search for markers of sugarcane evolution
  4. Sugarcane genes related to mitochondrial function
  5. Mitochondrial and chloroplast localization of
    FtsH-like proteins in sugarcane based on their
    phylogenetic profile
  6. Patterns of expression of cell wall related genes
    in sugarcane
  7. Expression of sugarcane genes induced by
    inoculation with Gluconacetobacter diazotrophicus
    and Herbaspirillum rubrisubalbicans
  8. Identifying sugarcane expressed sequences
    associated with nutrient transporters and peptide
    metal chelators
  9. Prospecting sugarcane genes involved in aluminum
    tolerance
  10. N-glycosylation in sugarcane
  11. Sugarcane expressed sequences tags (ESTs)
    encoding enzymes involved in lignin biosynthesis
    pathways
  12. Biosynthesis of secondary metabolites in
    sugarcane
  13. Identification of sugarcane genes involved in the
    purine synthesis pathway
  14. A new member of the chalcone synthase (CHS)
    family in sugarcane
  15. Classification, expression pattern and
    comparative analysis of sugarcane expressed
    sequences tags (ESTs) encoding glycine-rich
    proteins (GRPs)
  16. Identification, classification and expression
    pattern analysis of sugarcane cysteine
    proteinases
  17. Identification of metalloprotease gene families
    in sugarcane
  18. Sugarcane phytocystatins: identification,
    classification and expression pattern analysis
  1. DNA repair-related genes in sugarcane expressed
    sequence tags (ESTs)
  2. Distribution of DNA repair-related ESTs in
    sugarcane
  3. Survey of transposable elements in sugarcane
    expressed sequence tags (ESTs)

56
Genetics and Molecular Biology
http://www.sbg.org.br/revista24_index.htm
57
The SUCEST group
58
Part of the LBI team
59
Part of the LBI team
60
The trimmers
61
Genome Group - CBMEG
62
Genome Group - CBMEG
felipes_at_cenargen.embrapa.br
http://www.lbi.ic.unicamp.br/
63
www.laerte.com.br
64
(No Transcript)
65
(No Transcript)