Title: SUCEST:
1 SUCEST
The sugarcane genome project.
- Felipe Rodrigues da Silva
- Embrapa Recursos Genéticos e Biotecnologia
2 Volume of publicly available data
3 Volume of publicly available data
http://www.ncbi.nlm.nih.gov/Genbank/genbankstats.html
4 Complete genomes of organisms
http://wit.integratedgenomics.com/GOLD/
7 A BIG bowl of alphabet soup...
8 Sugarcane
- Grown in more than 90 countries
- Covering about 20 million hectares
- Grass family (Poaceae)
http://apps.fao.org
9 Sugarcane in Brazil
- 25% of world production
- 300 million tons
- 5 million hectares planted
- 14.5 million tons of sugar
- 15.3 billion liters of ethanol
- 350 processing plants
- 50 thousand growers
- 1.4 million direct jobs
- 3.6 million indirect jobs
10 Origin and size
- Saccharum officinarum
- 2n = 80
- Saccharum spontaneum
- 2n = 64 or 2n = 112
- 10 25
X
S. barberi, S. sinense, S. robustum
11 Genome project
Structural
- Complete genome sequencing
- Genic and intergenic regions
Functional
- EST (Expressed Sequence Tag)
- Protein-coding regions (genes)
12 Complete sequencing
BAC library
Physical map
BAC to be sequenced
Genomic DNA
Shotgun clones
...ATGTTGGGCCACAGTTGACCATTGAAACTG
Sequence
GTTGACCATTGAAACTGACCTTGACGTAACGTGGTA....
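Putting shotgun clones back together starts from overlaps like the one between the two fragments above, which share the 17 bases GTTGACCATTGAAACTG. The sketch below is a minimal illustration under stated assumptions: reads are error-free, the minimum overlap of 10 bases is a hypothetical parameter, and two reads are merged on their longest exact suffix/prefix overlap. Real assemblers such as phrap score overlaps and tolerate sequencing errors, so this is a toy version of the idea, not how the project assembled its data.

```python
def longest_overlap(left: str, right: str, min_len: int = 10) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 when no overlap of at least `min_len` bases exists.
    """
    for k in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def merge_reads(left: str, right: str, min_len: int = 10) -> str:
    """Join two reads on their longest exact overlap (error-free reads assumed)."""
    k = longest_overlap(left, right, min_len)
    if k == 0:
        raise ValueError("reads do not overlap by at least min_len bases")
    return left + right[k:]


# The two fragments shown on the slide, without the trailing dots.
read_a = "ATGTTGGGCCACAGTTGACCATTGAAACTG"
read_b = "GTTGACCATTGAAACTGACCTTGACGTAACGTGGTA"
print(longest_overlap(read_a, read_b))  # 17
print(merge_reads(read_a, read_b))
```

Scanning from the longest possible overlap downward means the first hit is the maximal one, which avoids joining reads on a short, spurious match.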
13 EST (Expressed Sequence Tag)
15 GenBank - dbEST, March 1998
- Total entries 1,528,715
- Homo sapiens 967,015 (63.4%)
- Plants (total) 73,087 (4.8%)
- Mus musculus domesticus (mouse) 306,544
- Caenorhabditis elegans 72,521
- Arabidopsis thaliana 36,173
- Drosophila melanogaster 27,625
- Oryza sativa (rice) 25,844
- Rattus sp. (rat) 20,311
- Brugia malayi (parasitic nematode) 13,641
- Toxoplasma gondii 10,671
- Emericella nidulans 5,787
- Schistosoma mansoni 3,659
- Trypanosoma brucei rhodesiense 3,519
- Danio rerio (zebrafish) 3,373
- Saccharomyces cerevisiae 3,042
16 Goals of the SUCEST project
- Identify 50,000 unique genes
- (or sequence 300,000 ESTs)
- Develop a sugarcane database
- Make this database available to data-mining groups
- Functional analysis of the ESTs
17 Schedule
- Date Goal
- Jul/1999 Distribution of the first clones
- Dec/1999 20,000 ESTs
- Jul/2000 60,000 ESTs
- Dec/2000 100,000 ESTs
- Jul/2001 140,000 ESTs
- Dec/2001 180,000 ESTs
- Jul/2002 220,000 ESTs
- Dec/2002 260,000 ESTs
- Jul/2003 300,000 ESTs
18 The cDNA libraries
- Tissues / organs
- Root
- Meristem
- Stem
- Seeds
- Flowers
- Leaf roll
- Leaf-root transition zone
- Lateral bud
- Calli
- Immature plantlets
- Plantlets infected with Herbaspirillum rubrisubalbicans
- Plantlets infected with Gluconacetobacter diazotrophicus
- Varieties
- SP80-3280
- SP70-1143
- SP80-87432
- RB 845298
- RB 805028
- PB5211 X P57150-4
19 The sequencing laboratories
UFSCAR (SC) (1)
IAC (CA) (1)
BIOINFORMATICA UNICAMP (CA)
UMC (MC) (1)
UNICAMP (CA) (1)
USP (SP) (3)
IAC (CO) (1)
UNESP (BT) (2)
UNESP (RC) (1)
UNAERP (RP) (1)
USP (SC) (1)
ABI 377-96
RIO DE JANEIRO
PERNAMBUCO
ALAGOAS
20 EST (Expressed Sequence Tag)
266,016 clones
291,689 reads, 260,352 clones
21 Sequence cleaning
- removal of ribosomal sequences
- removal of vector sequences
- removal of the poly(A) region
- quality trimming
- elimination of slippage
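The vector-removal step can be pictured as clipping everything up to and including an adapter found near the 5' end of a read. In the sketch below, the adapter string TCGACCCACGCGTCCG is an assumption: it appears close to the start of the example reads on the following slides, but the slides do not say which vector or adapter was used. Production pipelines align reads against the full vector (for instance with cross_match) and tolerate mismatches. This sketch uses exact matching only, and the 120-base search window is a hypothetical parameter.

```python
ADAPTER = "TCGACCCACGCGTCCG"  # hypothetical adapter, taken from the example reads


def clip_adapter(read: str, adapter: str = ADAPTER, search_window: int = 120) -> str:
    """Drop the 5' vector/adapter portion of a read.

    Looks for an exact copy of `adapter` lying entirely within the first
    `search_window` bases and returns the sequence after it; returns the
    read unchanged when the adapter is not found there.
    """
    pos = read.find(adapter, 0, search_window)
    if pos == -1:
        return read
    return read[pos + len(adapter):]


# First 66 bases of the read shown under "Quality trimming".
example = "CGGAAGACTGGAGTCGTCGCTGCGGCACCGGTCCGGAATTCCCGGGTCGACCCACGCGTCCGGCCG"
print(clip_adapter(example))  # GCCG
```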
22 Poly(A)
AGGGGAGAATTTATGATCCCCTAGTACACCCGGCAGGACCGGTCCGGAATTCCCCGGTCGACCCAC
GCGTCCGCTACAACAACAGCAGCAGCTTCCATTTACCTTGTCGGCTGTTGCAACCGCTGCTGCCTA
CCACCAGCAACTACAGCTGCTACCAGTTAACCCATTGGCACTGGCTAACCCATTGGCTGCTGCCTT
CCTGCAGCAGCAACAATTGCTGCCATTCAACCAGATGTCTTTGATGAACCCTGCCTTGTCGTGGTA
GCAACCCATCGTTGGAGGTGCCATCTTCTAGAATACAAATGAGTTGTACTTGATAACAATGTTCTT
GTGTCGGCGTGTGCAACTTCCCAGAAATAATCAATACATTGATTGAGATTTANAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAATATAATTAAAATAAAAAAATTTATAAAAAAAAAAAAATAATT
TTTTTTTATAAAAAATAAATATAAAATAAAAAGGGGGGGCCGTTTTAAAGGAACAAAGTTTAAGAC
CGGGGGTATGAAAGGGAAAATTTTTTTATATAGGGCCCCAAAATTAAATACATGGGCCGGTGTTAA
CAACGGCGGGAGGGAAAAAACCTGGGGGTTACCAATTTAAAGCCGTGGAAAAAATCCCTTTTTTCA
AGTGGGGTAAAAAGAAAAGGCCCCACCCATCGCCCTTCCAAAAATTGCCCCCCTTAAAGGAAAAAG
GACACCCCCTTTTGGGCGCATATAACCGGGGGGGTGGGGGTACCCCCAAGGGAACTTATATTTTTC
AGGCCTCATAGCCCTTTTTTTTTTTTTTTTTTTTTTTTTCAAGGTAGCGGGTTTCCCAGGAAAATT
AAAAGGGGGGTCCTTTTGGGTAATAATGTTTTN
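Removing the poly(A) region can be sketched as cutting the read at the first long run of A's and discarding everything downstream, which in the read above means keeping the sequence up to ...GATTGAGATTT. N is counted as A because base callers often emit N inside homopolymers, as in the ANAAAA... stretch above. The minimum run length of 12 is a hypothetical parameter; the slides do not state the criterion actually used.

```python
import re


def trim_polya(read: str, min_run: int = 12) -> str:
    """Cut a read at the start of its first poly(A) stretch.

    A stretch is a run of at least `min_run` bases drawn from {A, N}.
    Everything from the stretch onward is removed; the read is returned
    unchanged if no such stretch exists.
    """
    match = re.search(r"[AN]{%d,}" % min_run, read.upper())
    return read[: match.start()] if match else read


tail = "GATTGAGATTT" + "ANAAAAAAAAAAAAA" + "TTTATAA"
print(trim_polya(tail))  # GATTGAGATTT
```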
23 Poly(A)
24 Quality trimming
CGGAAGACTGGAGTCGTCGCTGCGGCACCGGTCCGGAATTCCCGGGTCGACCCACGCGTCCGGCCG
CCGCCACCGCATCCCTTGCAGCCCCAATCCCCCACGGCGACCATGGCCGGCGCGCAGGAGTCCCTG
TCCCTGGTGGGCACGATGCGTGGCCACAACGGCGAGGTGACGGCGATCGCCACCCCGATCGACAAC
TCGCCGTTCATCGTCTCCTCCTCCCGCGACAAGTCCGTGCTGGTGTGGGACCTGCAAAACCCGGTC
CACTCCACCCCGGAATCCGGCGCCACCGCCGACTACGGCGTCCCCTTCCGCCGCCTCACCGGCCAC
TCCCACTTCGTCCAGGACGTCGTCCTCAGCTCCGACGGCCAGTTCGCCCTCTCCGGCTCCTGGGAC
GGCGAGCTCCGCCTCTGGGACCTCTCCACCGGCGTCACCACCCGCCGCTTCGTCGGCCACGAGAAG
GACGTCCTCTCCGTCGCCTTCTCCGTCGACAACCGCCAGATCGTCTCCGCGTCCCGCGACAAGACC
ATCAAGCTCTGGAACACCCTCGGTGAGTGCAAGTACACCATTGGTGGCGACCTCGGCGGCGGGGAG
GGCCACAACGGGTGGGTCTCCTGCGTCAGGTTCTTCCCCAACACCTTTCAGGCCACCATTGTCTCC
GGATTCTGGGACCGCACCGTCAGGTCTGGAACCTTACCAACTGCAAGCTGCGATGCACTCTCGATG
CCCACGCGGCTATGTTAACGCCGTCGCC
25 Quality trimming
753 bases
Base colors by phred quality: < 10; > 10 and < 15; > 15 and < 20; > 20 and < 25; > 25 and < 30; > 30
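The color legend bins each base by its phred quality value Q, which encodes the estimated probability P that the base call is wrong as Q = -10 log10(P). So Q = 20 corresponds to a 1% chance of error and Q = 30 to 0.1%. Below is a small sketch of the conversion and of the legend's bins. The slide does not say how a value lying exactly on a boundary (10, 15, ...) was colored, so treating each bin as a half-open interval [low, high) is an assumption.

```python
import math

# Upper edges of the legend bins; treating each bin as [low, high) is an assumption.
BIN_EDGES = [10, 15, 20, 25, 30]
BIN_LABELS = ["[0,10)", "[10,15)", "[15,20)", "[20,25)", "[25,30)", "[30,inf)"]


def error_probability(q: float) -> float:
    """Invert the phred definition Q = -10 * log10(P)."""
    return 10 ** (-q / 10)


def phred_from_probability(p: float) -> float:
    """Phred value for an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)


def quality_bin(q: int) -> str:
    """Legend bin of a phred value."""
    for edge, label in zip(BIN_EDGES, BIN_LABELS):
        if q < edge:
            return label
    return BIN_LABELS[-1]


for q in (10, 20, 30):  # error rates of 10%, 1% and 0.1%
    print(q, error_probability(q), quality_bin(q))
```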
26 Quality trimming
CGGAAGACTGGAGTCGTCGCTGCGGCACCGGTCCGGAATTCCCGGGTCGACCCACGCGTCCGGCCG
CCGCCACCGCATCCCTTGCAGCCCCAATCCCCCACGGCGACCATGGCCGGCGCGCAGGAGTCCCTG
TCCCTGGTGGGCACGATGCGTGGCCACAACGGCGAGGTGACGGCGATCGCCACCCCGATCGACAAC
TCGCCGTTCATCGTCTCCTCCTCCCGCGACAAGTCCGTGCTGGTGTGGGACCTGCAAAACCCGGTC
CACTCCACCCCGGAATCCGGCGCCACCGCCGACTACGGCGTCCCCTTCCGCCGCCTCACCGGCCAC
TCCCACTTCGTCCAGGACGTCGTCCTCAGCTCCGACGGCCAGTTCGCCCTCTCCGGCTCCTGGGAC
GGCGAGCTCCGCCTCTGGGACCTCTCCACCGGCGTCACCACCCGCCGCTTCGTCGGCCACGAGAAG
GACGTCCTCTCCGTCGCCTTCTCCGTCGACAACCGCCAGATCGTCTCCGCGTCCCGCGACAAGACC
ATCAAGCTCTGGAACACCCTCGGTGAGTGCAAGTACACCATTGGTGGCGACCTCGGCGGCGGGGAG
GGCCACAACGGGTGGGTCTCCTGCGT
618 bases
Base colors by phred quality: < 10; > 10 and < 15; > 15 and < 20; > 20 and < 25; > 25 and < 30; > 30
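Going from the 753-base read to the 618 bases kept here requires a rule for where to cut. One common approach is similar in spirit to the modified Mott trimming found in phred. It scores each base as Q minus a threshold and keeps the contiguous stretch with the highest total, which is Kadane's maximum-subarray algorithm. The sketch below assumes a threshold of 20, which the slides do not confirm as the value used. Its quality values are invented for illustration.

```python
from typing import List, Tuple


def trim_by_quality(quals: List[int], threshold: int = 20) -> Tuple[int, int]:
    """Return (start, end), end exclusive, of the segment to keep.

    Each base scores quals[i] - threshold, so bases above the threshold add
    to a segment and bases below it subtract; the contiguous segment with the
    largest total is found in one pass (Kadane's maximum-subarray algorithm).
    Returns (0, 0) if no base is above the threshold.
    """
    best_sum, best_start, best_end = 0, 0, 0
    run_sum, run_start = 0, 0
    for i, q in enumerate(quals):
        if run_sum <= 0:  # a non-positive prefix never helps; restart here
            run_start, run_sum = i, 0
        run_sum += q - threshold
        if run_sum > best_sum:
            best_sum, best_start, best_end = run_sum, run_start, i + 1
    return best_start, best_end


# Invented quality values: poor start, good middle, poor end.
quals = [5, 8, 30, 35, 40, 38, 12, 36, 9, 4]
start, end = trim_by_quality(quals)
print(start, end)  # 2 8
```

A single poor base inside a good stretch (the 12 above) is kept, because dropping it would split an otherwise high-scoring segment. This is the usual behavior of segment-based trimmers, as opposed to cutting at the first bad base.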
27 blastX result
Trimmed read
>gi|1346109|sp|P49027|GBLP_ORYSA GUANINE NUCLEOTIDE-BINDING PROTEIN BETA SUBUNIT-LIKE PROTEIN (GPB-LR) (RWD) pir||T03764 protein RWD - rice dbj|BAA07404.1| (D38231) RWD [Oryza sativa]
Length = 334
Score = 315 bits (798), Expect = 4e-85
Identities = 150/170 (88%), Positives = 156/170 (91%)
Frame = +1

Query: 109 MAGAQESLSLVGTMRGHNGEVTAIATPIDNSPFIVSSSRDKSVLVWDLQNPVHSTPESGA 288
           MAGAQESL L G M GHN  VTAIATPIDNSPFIVSSSRDKS+LVWDL NPV +  E
Sbjct: 1   MAGAQESLVLAGVMHGHNDVVTAIATPIDNSPFIVSSSRDKSLLVWDLTNPVQNVGEGAG 60

Query: 289 TADYGVPFRRLTGHSHFVQDVVLSSDGQFALSGSWDGELRLWDLSTGVTTRRFVGHEKDV 468
            ++YGVPFRRLTGHSHFVQDVVLSSDGQFALSGSWDGELRLWDLSTGVTTRRFVGH+KDV
Sbjct: 61  ASEYGVPFRRLTGHSHFVQDVVLSSDGQFALSGSWDGELRLWDLSTGVTTRRFVGHDKDV 120

Query: 469 LSVAFSVDNRQIVSASRDKTIKLWNTLGECKYTIGGDLGGGEGHNGWVSC 618
           LSVAFSVDNRQIVSASRD+TIKLWNTLGECKYTIGGDLGGGEGHNGWVSC
Sbjct: 121 LSVAFSVDNRQIVSASRDRTIKLWNTLGECKYTIGGDLGGGEGHNGWVSC 170
28 blastX result
Full (untrimmed) read
>gi|1346109|sp|P49027|GBLP_ORYSA GUANINE NUCLEOTIDE-BINDING PROTEIN BETA SUBUNIT-LIKE PROTEIN (GPB-LR) (RWD) pir||T03764 protein RWD - rice dbj|BAA07404.1| (D38231) RWD [Oryza sativa]
Length = 334
Score = 352 bits (893), Expect(2) = e-100
Identities = 168/192 (87%), Positives = 175/192 (91%)
Frame = +1

Query: 109 MAGAQESLSLVGTMRGHNGEVTAIATPIDNSPFIVSSSRDKSVLVWDLQNPVHSTPESGA 288
           MAGAQESL L G M GHN  VTAIATPIDNSPFIVSSSRDKS+LVWDL NPV +  E
Sbjct: 1   MAGAQESLVLAGVMHGHNDVVTAIATPIDNSPFIVSSSRDKSLLVWDLTNPVQNVGEGAG 60

Query: 289 TADYGVPFRRLTGHSHFVQDVVLSSDGQFALSGSWDGELRLWDLSTGVTTRRFVGHEKDV 468
            ++YGVPFRRLTGHSHFVQDVVLSSDGQFALSGSWDGELRLWDLSTGVTTRRFVGH+KDV
Sbjct: 61  ASEYGVPFRRLTGHSHFVQDVVLSSDGQFALSGSWDGELRLWDLSTGVTTRRFVGHDKDV 120

Query: 469 LSVAFSVDNRQIVSASRDKTIKLWNTLGECKYTIGGDLGGGEGHNGWVSCVRFFPNTFQA 648
           LSVAFSVDNRQIVSASRD+TIKLWNTLGECKYTIGGDLGGGEGHNGWVSCVRF PNTFQ
Sbjct: 121 LSVAFSVDNRQIVSASRDRTIKLWNTLGECKYTIGGDLGGGEGHNGWVSCVRFSPNTFQP 180

Query: 649 TIVSGFWDRTVR 684
           TIVSG WDRTV+
Sbjct: 181 TIVSGSWDRTVK 192
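The percentages next to the identity and positive counts are the count divided by the alignment length (170 for the trimmed read, 192 for the full read). The values on the two slides are consistent with truncating rather than rounding. For example, 156/170 is 91.8% but is shown as 91%, and 168/192 is 87.5% but is shown as 87%. The small check below recomputes them. The helper name is hypothetical and only mirrors the numbers printed above.

```python
def blast_percent(count: int, length: int) -> int:
    """Percentage as shown in the alignments above: integer division, i.e. truncated."""
    return 100 * count // length


# (count, alignment length, percentage printed on the slides)
reported = [
    (150, 170, 88),  # trimmed read, identities
    (156, 170, 91),  # trimmed read, positives
    (168, 192, 87),  # full read, identities
    (175, 192, 91),  # full read, positives
]
for count, length, shown in reported:
    assert blast_percent(count, length) == shown
    print(f"{count}/{length} = {100 * count / length:.1f}% -> shown as {shown}%")
```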
29 Quality threshold determination
30 Quality trimming
31 Quality trimming
32 Quality trimming
33 Quality trimming
719 bases
Before | Difference | Homology | Difference | After
618 | 66 | 684 | 35 | 719
34 Quality threshold determination
35 Example of slippage
36 All reads: 291,689
Mean length: 864.5 (186.3)
Mean number of bases with phred > 20 per read: 399.5 (161.3)
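Both per-read statistics above can be computed from the per-base phred values of each read, as sketched below. Reading the second value of each pair as a sample standard deviation is an assumption, and the two toy reads in the usage lines are invented. Only the statistics themselves (mean length, mean number of bases with phred > 20 per read) come from the slide.

```python
import statistics
from typing import Dict, List


def bases_above(quals: List[int], threshold: int = 20) -> int:
    """Count bases whose phred value is strictly greater than `threshold`."""
    return sum(1 for q in quals if q > threshold)


def read_statistics(reads: List[List[int]], threshold: int = 20) -> Dict[str, float]:
    """Mean and sample standard deviation of read length and of high-quality bases per read.

    `reads` holds one list of phred values per read; at least two reads are needed
    for the standard deviation.
    """
    lengths = [len(quals) for quals in reads]
    good = [bases_above(quals, threshold) for quals in reads]
    return {
        "mean_length": statistics.mean(lengths),
        "sd_length": statistics.stdev(lengths),
        "mean_good_bases": statistics.mean(good),
        "sd_good_bases": statistics.stdev(good),
    }


# Two invented reads given as lists of per-base phred values.
toy_reads = [[30] * 8 + [10] * 2, [25] * 4 + [5] * 2]
print(read_statistics(toy_reads))
```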
37 cluster size (reads) cluster size (reads) HS X phrap X CAP3 X HS total common
1 32202 32202 13731 18535 11634 16838 14296 10744
2 12440 12440 5617 9207 4869 7665 4852 3792
3 6752 6752 2402 5192 2151 4193 1984 1441
4 4225 4225 1239 3329 1145 2709 992 697
5 2856 2856 676 2360 700 1872 521 344
6 2098 2098 442 1806 482 1452 354 231
7 1582 1582 288 1362 317 1115 220 144
8 1245 1245 202 1091 242 862 153 99
9 974 974 156 913 186 720 113 72
10 776 776 105 752 143 634 74 44
11 639 639 76 607 99 511 54 30
12 492 492 71 547 99 429 46 32
13 437 437 47 454 90 400 40 25
14 366 366 42 391 40 341 26 13
15 306 306 31 390 50 295 18 11
16 273 273 25 279 35 275 18 8
17 225 225 15 273 23 235 11 4
18 177 177 11 227 15 191 5 2
19 124 124 6 177 18 176 5 3
gt20 1192 1192 40 1814 87 2228 23 12
total 69381 69381 25222 49706 22425 43141 23805 17748
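Each row of the table above counts clusters of a given size. The slide compares HS, phrap and CAP3 without describing how they group reads, so the following illustrates the general idea and is not any of those programs. Reads joined by any chain of overlaps go into the same cluster (single linkage) through a union-find structure, and the cluster sizes are then tallied. There is no row for size 20, so reading the "gt20" row as clusters of 20 or more reads is an assumption, and the pooling cutoff is a parameter.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


class UnionFind:
    def __init__(self, n: int) -> None:
        self.parent = list(range(n))

    def find(self, x: int) -> int:
        # Path halving keeps the trees shallow.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def cluster_sizes(n_reads: int, overlaps: Iterable[Tuple[int, int]]) -> List[int]:
    """Single-linkage clusters: reads connected by any chain of overlaps share a cluster."""
    uf = UnionFind(n_reads)
    for a, b in overlaps:
        uf.union(a, b)
    return list(Counter(uf.find(i) for i in range(n_reads)).values())


def size_histogram(sizes: List[int], cap: int = 20) -> Dict[str, int]:
    """Number of clusters of each size; sizes >= cap are pooled (hypothetical binning)."""
    return dict(Counter(str(s) if s < cap else f">={cap}" for s in sizes))


# Six reads; read 3 overlaps nothing and stays a singleton.
sizes = cluster_sizes(6, [(0, 1), (1, 2), (4, 5)])
print(sorted(sizes), size_histogram(sizes))
```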
38 Internal discrepancy
39 Internal discrepancy
40 Internal consistency test
41 Internal consistency test
42 Internal consistency test
43 External consistency test
44 External consistency test
45 External consistency test
46 Overall numbers
Total sequences 291,689
cDNA clones sequenced (5' or 3') 260,352
5' end sequences 259,325
3' end sequences 32,364
Total high-quality sequences 237,954
Success index (%) 81.6
Average insert size (bp) 1,250
Average sequence size (bp) 864 / 642
Bases with phred quality > 20 per read 399
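Two of the figures above can be cross-checked. The 5' and 3' end sequences add up to the total number of sequences. The 81.6% is consistent with the success index being the share of high-quality sequences among all sequences; the slide does not spell out that definition.

```python
total_sequences = 291_689
five_prime, three_prime = 259_325, 32_364
high_quality = 237_954

# 5' and 3' reads together account for every sequence.
assert five_prime + three_prime == total_sequences

success_index = 100 * high_quality / total_sequences
print(f"success index: {success_index:.1f}%")  # success index: 81.6%
```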
47 Overall numbers
Total sequences analyzed 237,954
Number of contigs 26,803
Number of singletons 16,338
Number of sugarcane assembled sequences (SAS) 43,141
Number of assembled sequences matching known genes 27,833 (64.5%)
Number of clones with full length inserts 14,409 (
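The assembly figures above are also consistent with each other. Contigs plus singletons give the number of sugarcane assembled sequences (SAS), and the 64.5% is the share of all SAS that match known genes. A quick check:

```python
contigs, singletons = 26_803, 16_338
sas = contigs + singletons
assert sas == 43_141  # contigs + singletons = SAS

matching_known_genes = 27_833
print(f"{100 * matching_known_genes / sas:.1f}%")  # 64.5%
```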
48 Library-specific contribution
Number of ESTs SAS contigs singletons contribution contribution
AD1 8,137 1,474 1,200 1,200 3.4
AM1 5,991 841 664 664 1.9
AM2 6,629 982 705 705 2.3
CL6 3,511 595 467 467 1.4
FL1 8,412 1,753 1,465 1,465 4.1
FL3 5,714 840 667 667 1.9
FL4 7,289 1,082 886 886 2.5
FL5 5,115 861 744 744 2.0
FL8 3,362 378 337 337 0.9
HR1 5,070 717 519 519 1.7
LB1 3,699 459 369 369 1.1
LB2 5,402 790 650 650 1.8
LR1 6,653 984 819 819 2.3
LR2 2,329 299 254 254 0.7
LV1 3,068 384 327 327 0.9
RT1 4,227 569 484 484 1.3
RT2 5,819 942 728 728 2.2
RT3 4,356 614 478 478 1.4
RZ1 2,012 205 175 175 0.5
RZ2 3,177 385 301 301 0.9
RZ3 6,528 929 752 752 2.1
SB1 7,407 1,313 1,132 1,132 3.0
SD1 4,459 792 642 642 1.8
SD2 4,099 857 632 632 2.0
ST1 4,359 645 523 523 1.5
ST3 4,519 507 418 418 1.2
- 47% of the SAS are formed by reads from a single library
- 38% of the tissue-specific SAS are singletons
49 Functional classification
50 Percentage by organ
51 Tissue-specific SAS
Number of ESTs, best hit, library
360 (Y17556) alpha kafirin Sorghum bicolor SD
103 (A23207) zein zA1 Zea mays SD
42 (AF232008) beta-glucosidase aggregating factor precursor Zea mays RT
24 (AC007789) putative low molecular early light-inducible protein Oryza sativa SD
22 (AP002820) putative peroxidase Oryza sativa RT
19 (X56337) alpha-amylase Oryza sativa CL
18 (AP000374) cyclopropane fatty acid synthase Arabidopsis thaliana FL
52 GenBank - dbEST, March 1998
53 GenBank - dbEST, March 2001
- Total entries 7,692,809
- Homo sapiens 3,369,459 (43.8%)
- Plants (total) 1,099,102 (14.3%)
- Glycine max (soybean) 160,500
- Arabidopsis thaliana 113,000
- Medicago truncatula (barrel medic) 112,458
- Lycopersicon esculentum (tomato) 107,226
- Zea mays (maize) 86,999
- Oryza sativa (rice) 72,657
- Hordeum vulgare (barley) 68,480
- Chlamydomonas reinhardtii 64,973
- Sorghum bicolor 62,642
- Triticum aestivum (wheat) 58,141
- Pinus taeda (loblolly pine) 34,896
- Lotus japonicus 27,078
- Solanum tuberosum (potato) 26,177
- Gossypium arboreum 20,978
54 GenBank - dbEST, September 2002
- Total entries 12,845,578
- Homo sapiens 4,691,979 (36.5%)
- Plants (total) 2,279,170 (17.4%)
- Glycine max (soybean) 284,714
- Triticum aestivum (wheat) 256,593
- Hordeum vulgare (barley) 240,882
- Zea mays (maize) 180,587
- Arabidopsis thaliana 174,624
- Medicago truncatula (barrel medic) 170,500
- Lycopersicon esculentum (tomato) 148,346
- Chlamydomonas reinhardtii 130,324
- Oryza sativa (rice) 108,429
- Solanum tuberosum (potato) 94,420
- Sorghum bicolor 84,712
- Lactuca sativa (lettuce) 68,188
- Pinus taeda (loblolly pine) 60,226
- Physcomitrella patens 50,250
55 Genetics and Molecular Biology
- The libraries that made SUCEST
- Bioinformatics of the sugarcane EST project
- Trimming and clustering sugarcane ESTs
- The sugarcane signal transduction (SUCAST) catalogue: prospecting signal transduction in sugarcane
- In silico characterization and expression analyses of sugarcane putative sucrose non-fermenting-1 (SNF1) related kinases
- Identification of 14-3-3-like protein in sugarcane (Saccharum officinarum)
- A search for homologues of plant photoreceptor genes and their signaling partners in the sugarcane expressed sequence tag (Sucest) database
- Phylogenetic relationships between Arabidopsis and sugarcane bZIP transcriptional regulatory factors
- Identification of sugarcane cDNAs encoding components of the cell cycle machinery
- Dissecting the sugarcane expressed sequence tag (SUCEST) database: unraveling flower-specific genes
- Molecular chaperone genes in the sugarcane expressed sequence database (SUCEST)
- Oxidative stress response in sugarcane
- In silico differential display of defense-related expressed sequence tags from sugarcane tissues infected with diazotrophic endophytes
- Mechanisms of sugarcane response to herbivory
- Base excision repair in sugarcane
- Preliminary analysis of microsatellite markers derived from sugarcane expressed sequence tags (ESTs)
- Sequence polymorphism from EST data in sugarcane: a fine analysis of 6-phosphogluconate dehydrogenase genes
- A search for markers of sugarcane evolution
- Sugarcane genes related to mitochondrial function
- Mitochondrial and chloroplast localization of FtsH-like proteins in sugarcane based on their phylogenetic profile
- Patterns of expression of cell wall related genes in sugarcane
- Expression of sugarcane genes induced by inoculation with Gluconacetobacter diazotrophicus and Herbaspirillum rubrisubalbicans
- Identifying sugarcane expressed sequences associated with nutrient transporters and peptide metal chelators
- Prospecting sugarcane genes involved in aluminum tolerance
- N-glycosylation in sugarcane
- Sugarcane expressed sequences tags (ESTs) encoding enzymes involved in lignin biosynthesis pathways
- Biosynthesis of secondary metabolites in sugarcane
- Identification of sugarcane genes involved in the purine synthesis pathway
- A new member of the chalcone synthase (CHS) family in sugarcane
- Classification, expression pattern and comparative analysis of sugarcane expressed sequences tags (ESTs) encoding glycine-rich proteins (GRPs)
- Identification, classification and expression pattern analysis of sugarcane cysteine proteinases
- Identification of metalloprotease gene families in sugarcane
- Sugarcane phytocystatins: identification, classification and expression pattern analysis
- DNA repair-related genes in sugarcane expressed sequence tags (ESTs)
- Distribution of DNA repair-related ESTs in sugarcane
- Survey of transposable elements in sugarcane expressed sequence tags (ESTs)
56 Genetics and Molecular Biology
http://www.sbg.org.br/revista24_index.htm
57 The SUCEST group
58 Part of the LBI
59 Part of the LBI
60 The trimmers
61 Genome Group - CBMEG
62 Genome Group - CBMEG
felipes_at_cenargen.embrapa.br
http://www.lbi.ic.unicamp.br/
63 www.laerte.com.br