Title: Data analysis of gene expression data
Personnel
- Jaakko Hollmén, Heikki Mannila
- Graduate students (3): Jouni Seppänen, Salla Ruosaari, Anne Patrikainen
- Undergraduate students (2): Mikko Katajamaa, Antti Rasinen
Gene expression data
- Reflects the state of protein production in the cell
- Measurement pipeline: tissue sample, extracted RNA, hybridized arrays
- High-dimensional, noisy measurement data matrices
- 500 to 10,000 simultaneous measurements from an organism
Research scope
- Goal: advances in data analysis, with a specific focus on analyzing gene expression data
- High-dimensional, noisy measurement data matrices
- Methods: signal decomposition and projection methods (PCA, ICA, NMF, ...), MCMC, and pattern discovery methods
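PCA, one of the projection methods listed above, can be computed from the singular value decomposition of the centered data matrix. The following is a minimal sketch under stated assumptions, not the group's implementation. The genes-by-samples orientation, the function name `pca` and the random example matrix are all assumptions made for illustration.

```python
import numpy as np


def pca(X: np.ndarray, n_components: int):
    """PCA of a genes x samples matrix via the SVD (hypothetical helper).

    Rows are treated as variables (genes) and columns as observations
    (arrays/samples); this orientation is an assumption.
    """
    # Center each gene (row) to zero mean across samples.
    Xc = X - X.mean(axis=1, keepdims=True)
    # Thin SVD: Xc = U @ diag(s) @ Vt
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Fraction of total variance captured by each component.
    explained = s**2 / np.sum(s**2)
    # Principal directions in gene space, and each sample's coordinates.
    components = U[:, :n_components]
    scores = s[:n_components, None] * Vt[:n_components]
    return components, scores, explained[:n_components]


# Usage with a random stand-in matrix: 500 genes, 16 arrays (hypothetical sizes).
rng = np.random.default_rng(0)
X = rng.random((500, 16))
components, scores, explained = pca(X, n_components=3)
```

The scores equal `components.T @ Xc`, so a new array can be projected onto the same axes after it has been centered with the training means. ICA and NMF would replace the SVD step with a different factorization of the same matrix.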
Understanding measurements
- Simulation model for gene expression data
- Purpose: to understand the measurements and how methods for analyzing them behave
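The slides do not specify the simulation model. One hypothetical generative model treats each expression profile as a non-negative mixture of a few latent processes, with Gaussian measurement noise added on top. Because the ground-truth factors are known, an analysis method can be checked against them. The function name, the distributions and the default sizes below are all assumptions.

```python
import numpy as np


def simulate_expression(n_genes=1000, n_samples=20, n_processes=3,
                        noise_sd=0.1, seed=0):
    """Hypothetical simulation: low-rank non-negative signal plus noise.

    Returns the noisy genes x samples matrix and the true factors W, H.
    """
    rng = np.random.default_rng(seed)
    # Gene loadings: most genes respond weakly, a few strongly (exponential).
    W = rng.exponential(scale=1.0, size=(n_genes, n_processes))
    # Activity level of each latent process in each sample.
    H = rng.uniform(0.0, 1.0, size=(n_processes, n_samples))
    signal = W @ H
    # Additive Gaussian measurement noise.
    noise = rng.normal(0.0, noise_sd, size=signal.shape)
    return signal + noise, W, H


X, W, H = simulate_expression(seed=0)
```

The residual `X - W @ H` has a standard deviation close to `noise_sd`. This gives a baseline for judging how much of the true structure a decomposition recovers.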
A closer look at the real world
Expression data as numbers
0.8214 0.5298 0.4586 0.7505 0.0147 0.2440 0.7258 0.1302 0.8995 0.2233 0.7430 0.9636 0.7067 0.7333 0.4974 0.0264
0.4447 0.6405 0.8699 0.7400 0.6641 0.8220 0.3987 0.2544 0.6928 0.3965 0.6508 0.1205 0.1684 0.6223 0.0750 0.3554
0.6154 0.2091 0.9342 0.4319 0.7241 0.2632 0.3584 0.8030 0.4397 0.1351 0.9398 0.0483 0.8137 0.9898 0.7666 0.7439
0.7919 0.3798 0.2644 0.6343 0.2816 0.7536 0.2853 0.6678 0.7010 0.2411 0.8328 0.3802 0.4662 0.1524 0.0454 0.2987
0.9218 0.7833 0.1603 0.8030 0.2618 0.6596 0.8686 0.0136 0.6097 0.9275 0.4700 0.4128 0.7223 0.2033 0.1651 0.1812
0.7382 0.6808 0.8729 0.0839 0.7085 0.2141 0.6264 0.5616 0.2999 0.3911 0.6299 0.4014 0.9949 0.8193 0.7772 0.4152
0.1763 0.4611 0.2379 0.9455 0.7839 0.6021 0.2412 0.4546 0.8560 0.5113 0.0582 0.4210 0.3625 0.0584 0.2083 0.8673
0.4057 0.5678 0.6458 0.9159 0.9862 0.6049 0.9781 0.9049 0.1121 0.0929 0.5422 0.3770 0.7308 0.5385 0.2518 0.6249
0.9355 0.7942 0.9669 0.6020 0.4733 0.6595 0.6405 0.2822 0.2916 0.0217 0.4557 0.9073 0.6497 0.1902 0.3965 0.0552
0.9169 0.0592 0.6649 0.2536 0.9028 0.1834 0.2298 0.0650 0.0974 0.1595 0.8631 0.6702 0.6813 0.5995 0.4807 0.4041
0.4103 0.6029 0.8704 0.8735 0.4511 0.6365 0.6813 0.4766 0.3974 0.8445 0.8552 0.9618 0.0076 0.2923 0.5093 0.3020
0.8936 0.0503 0.0099 0.5134 0.8045 0.1703 0.6658 0.9837 0.3333 0.8792 0.4723 0.1630 0.6541 0.0913 0.6248 0.1523
0.0579 0.4154 0.1370 0.7327 0.8289 0.5396 0.1347 0.9223 0.9442 0.1870 0.7869 0.7486 0.9452 0.5068 0.6255 0.3092
0.3529 0.3050 0.8188 0.4222 0.1663 0.6234 0.0225 0.5612 0.8386 0.9913 0.6560 0.3741 0.6133 0.8841 0.9912 0.0033
0.8132 0.8744 0.4302 0.9614 0.3939 0.6859 0.2622 0.6523 0.2584 0.7120 0.0000 0.4542 0.7829 0.6156 0.3592 0.4374
0.0099 0.0150 0.8903 0.0721 0.5208 0.6773 0.1165 0.7727 0.0429 0.8714 0.1312 0.0386 0.0032 0.0464 0.2760 0.6764
0.1389 0.7680 0.7349 0.5534 0.7181 0.8768 0.0693 0.1062 0.0059 0.4796 0.4949 0.5624 0.7970 0.9519 0.6781 0.8229
0.2028 0.9708 0.6873 0.2920 0.5692 0.0129 0.8529 0.0011 0.5744 0.4960 0.0383 0.3723 0.6418 0.1690 0.5088 0.7558
0.1987 0.9901 0.3461 0.8580 0.4608 0.3104 0.1803 0.5418 0.7439 0.2875 0.2274 0.7928 0.1785 0.8267 0.2769 0.1626
Quality control at spot level
- Select good-quality spots for subsequent analysis
- Methods: image analysis, detection, and cost-sensitive classification
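In cost-sensitive classification, the keep/discard decision for a spot weighs the two kinds of error differently. Keeping a bad spot can have a different cost from discarding a good one. The sketch below applies the standard minimum-expected-cost (Bayes) decision rule. It assumes that some upstream classifier already supplies a probability that each spot is good. The function name, the cost parameters and the example numbers are hypothetical.

```python
import numpy as np


def keep_spots(p_good: np.ndarray, cost_keep_bad: float,
               cost_drop_good: float) -> np.ndarray:
    """Minimum-expected-cost keep/discard decision (hypothetical helper).

    p_good: estimated probability that each spot is of good quality,
            e.g. from a classifier on image-analysis features (assumed).
    Returns a boolean mask, True where keeping has the lower expected cost.
    """
    # Expected cost of keeping a spot = P(bad) * cost of keeping a bad spot.
    expected_keep = (1.0 - p_good) * cost_keep_bad
    # Expected cost of dropping a spot = P(good) * cost of losing a good spot.
    expected_drop = p_good * cost_drop_good
    return expected_keep < expected_drop


# Keeping a bad spot is assumed to be 4x as costly as losing a good one,
# so only spots with p_good above 4 / (4 + 1) = 0.8 are kept.
mask = keep_spots(np.array([0.5, 0.7, 0.9]), cost_keep_bad=4.0, cost_drop_good=1.0)
```

Equivalently, a spot is kept when `p_good > cost_keep_bad / (cost_keep_bad + cost_drop_good)`. Changing the costs therefore only moves the acceptance threshold.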
Collaboration with biologists
- Department of Medical Genetics, Laboratory of Cytomolecular Genetics, University of Helsinki
- Institute of Occupational Health
- Turku Centre for Biotechnology
- Karolinska Institutet
- Journal articles during 2002:
  - Wikman et al., Identification of differentially expressed genes in pulmonary adenocarcinoma by using a cDNA array. Oncogene 21(37), 2002, Nature Publishing Group.
  - Niini et al., Expression of myeloid-specific genes in childhood acute lymphoblastic leukemia: a cDNA array study. Leukemia 16(11), 2002, Nature Publishing Group.
  - Mannila et al., Long-range control of gene expression in yeast. Bioinformatics 18(3), 2002.
Current topics and further work
- Correlation between gene expression and gene location in the genome
- Combinations with sequence information
- Time-series analysis, decompositions
- Sparse decompositions of data matrices
- MCMC techniques
- Pattern discovery methods
- Etc.
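A simple way to probe the first topic, the link between expression and gene location, is to compare two averages. One is the average expression correlation of genes lying close together on a chromosome. The other is the average over all gene pairs. The sketch below illustrates this idea only and is not the method of Mannila et al. (2002). The function name, the fixed distance window and the synthetic block-structured data are assumptions.

```python
import numpy as np


def neighbour_correlation(X: np.ndarray, positions: np.ndarray, window: float):
    """Mean expression correlation of nearby gene pairs vs. all pairs.

    X: genes x conditions expression matrix.
    positions: position of each gene along one chromosome (same units as window).
    Returns (mean correlation of distinct pairs closer than `window`,
             mean correlation over all distinct pairs).
    """
    C = np.corrcoef(X)  # rows are variables: gene-by-gene Pearson correlations
    dist = np.abs(positions[:, None] - positions[None, :])
    off_diag = ~np.eye(len(positions), dtype=bool)
    near = (dist < window) & off_diag
    return C[near].mean(), C[off_diag].mean()


# Synthetic data (assumed): runs of 5 adjacent genes share a common profile.
rng = np.random.default_rng(0)
n_genes, n_cond = 60, 30
block_profiles = rng.normal(size=(n_genes // 5, n_cond))
X = np.repeat(block_profiles, 5, axis=0) + 0.3 * rng.normal(size=(n_genes, n_cond))
positions = np.arange(n_genes, dtype=float)
near, overall = neighbour_correlation(X, positions, window=2.0)
```

On real data, the gap between the two means would need to be compared against a null distribution, for example by permuting gene positions, before it could be read as evidence of location-dependent control.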