Title: milkER a milk informatics resource
1 milkER: a milk informatics resource
- Stephen Edwards BSc. 
- University of Edinburgh 
- BioNLP meeting 6th June 2005
2 Overview
-  Aims of milkER 
-  milkER database 
-  Text-mining 
-  Potential targets 
3 milkER aims
- To amalgamate disparate milk information into one
 resource, allowing more focused analysis of milk
 proteins in relation to dairy issues, health and
 disease.
4 A milk database
- Knowledge of milk affects many industries
- UniProt and GenBank are excellent resources
- Marsupial genomics database (New Zealand) 
- Glasgow genomics data 
- Chinese database 
- Polish bioactive peptide database 
- Food property database (commercial)
5 Milk components
- Fat, carbohydrates, proteins, minerals 
- Growth factors, enzymes, enzyme inhibitors, 
 immunoglobulins, allergens, disease factors,
 anti-bacterial proteins, opioids
-  1. Deliberate 
-  2. Leakage from blood 
-  3. Result of disease conditions 
-  4. Engineered 
-  5. Bacterial origin
6 milkER database
- Database built on BioSQL, which allows incorporation
 of UniProt, EMBL and GenBank entries
7 GenBank entry for bovine beta-lactoglobulin (NM_173929)
LOCUS       NM_173929                790 bp    mRNA    linear   MAM 27-OCT-2004
DEFINITION  Bos taurus lactoglobulin, beta (LGB), mRNA.
ACCESSION   NM_173929
VERSION     NM_173929.2  GI:31343239
KEYWORDS    .
SOURCE      Bos taurus (cow)
  ORGANISM  Bos taurus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Cetartiodactyla; Ruminantia; Pecora; Bovidae;
            Bovinae; Bos.
REFERENCE   1  (bases 1 to 790)
  AUTHORS   Jayat,D., Gaudin,J.C., Chobert,J.M., Burova,T.V., Holt,C.,
            McNae,I., Sawyer,L. and Haertle,T.
  TITLE     A recombinant C121S mutant of bovine beta-lactoglobulin is more
            susceptible to peptic digestion and to denaturation by reducing
            agents and heating
  JOURNAL   Biochemistry 43 (20), 6312-6321 (2004)
  PUBMED    15147215
  REMARK    GeneRIF: Results suggest that the stability of beta-lactoglobulin
            arising from the hydrophobic effect is reduced by the C121S
            mutation so that unfolded or partially unfolded states are more
            favored.
ORIGIN
        1 actccactcc ctgcagagct cagaagcgtg atcccggctg cagccatgaa gtgcctcctg
       61 cttgccctgg ccctcacctg tggcgcccag gccctcatcg tcacccagac catgaagggc
..
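Before a record like the one above can be loaded, its flat-file fields have to be parsed. The following is a minimal sketch that uses only the standard library. It assumes the simple layout shown: keywords start in column 1, continuation lines are indented, the sequence follows ORIGIN, and the record ends with //. The function name `parse_genbank_fields` is hypothetical. A real loader would more likely use Biopython (`SeqIO.read(handle, "genbank")` and its BioSQL module) than this code.

```python
# Abridged copy of the record above (only the first 120 of 790 bases).
RECORD = """\
LOCUS       NM_173929                790 bp    mRNA    linear   MAM 27-OCT-2004
DEFINITION  Bos taurus lactoglobulin, beta (LGB), mRNA.
ACCESSION   NM_173929
VERSION     NM_173929.2  GI:31343239
SOURCE      Bos taurus (cow)
ORIGIN
        1 actccactcc ctgcagagct cagaagcgtg atcccggctg cagccatgaa gtgcctcctg
       61 cttgccctgg ccctcacctg tggcgcccag gccctcatcg tcacccagac catgaagggc
//
"""


def parse_genbank_fields(text):
    """Return a dict of top-level fields plus the joined sequence.

    Simplified (hypothetical helper): indented sub-keyword lines such as
    ORGANISM or AUTHORS are folded into the preceding top-level field,
    and nothing is validated.
    """
    fields = {}
    sequence = []
    key = None
    in_origin = False
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("//"):           # end-of-record marker
            break
        if in_origin:
            # Each sequence line is a position number, then 10-base blocks.
            sequence.extend(line.split()[1:])
        elif line[0].isalpha():            # new top-level keyword in column 1
            key, _, value = line.partition(" ")
            if key == "ORIGIN":
                in_origin = True
            else:
                fields[key] = value.strip()
        elif key is not None:              # continuation of the current field
            fields[key] += " " + line.strip()
    fields["SEQUENCE"] = "".join(sequence)
    return fields


fields = parse_genbank_fields(RECORD)
print(fields["ACCESSION"])      # NM_173929
print(len(fields["SEQUENCE"]))  # 120
```

The parsed dictionary could then be mapped onto database columns. Splitting on the first space, rather than on fixed columns, keeps the sketch tolerant of small alignment differences.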
8 Information retrieval
(Diagram: milkER population draws on other databases (EMBL, UniProt) via information retrieval and on other sources (e.g. published tables) via information extraction; milkER is then accessed by web query.)
9 milkER database
- Database built on BioSQL, which allows incorporation
 of UniProt, EMBL and GenBank entries
- Library of literature on milk 
- User interface (www.milker.org.uk)
13 Text-mining
- Machine reading of text 
- Many techniques involved 
  - Tokenisation
  - Stemming (Activation → Activat)
  - POS tagging (Protein → noun)
  - Abbreviation expansion (CN → Casein)
  - Entity identification (Casein → protein)
  - Dictionary
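The following is a minimal standard-library sketch of some of the steps listed above: tokenisation, stemming, abbreviation expansion and dictionary-based entity identification. The dictionaries, the suffix list and the function names are hypothetical illustrations, not milkER's actual resources. The stemmer is a crude suffix stripper rather than a real algorithm such as Porter's. POS tagging is omitted because it needs a trained tagger.

```python
import re

# Hypothetical mini-dictionaries; a real system would use curated
# resources such as a gene/protein dictionary or UMLS.
ABBREVIATIONS = {"CN": "casein", "B-LG": "beta-lactoglobulin"}
ENTITY_TYPES = {
    "casein": "protein",
    "beta-lactoglobulin": "protein",
    "iga": "antibody",
    "diabetes": "disease",
}
# Crude suffix list, checked in order (longest forms first).
SUFFIXES = ("ions", "ion", "ing", "ed", "s")


def tokenise(text):
    """Split text into words, keeping hyphenated names such as B-LG whole."""
    return re.findall(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*|[^\sA-Za-z0-9]", text)


def stem(word):
    """Strip the first matching suffix, leaving a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.lower().endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def expand(token):
    """Replace a known abbreviation by its full form."""
    return ABBREVIATIONS.get(token, token)


def identify_entities(tokens):
    """Return (name, type) pairs found by dictionary look-up after expansion."""
    found = []
    for token in tokens:
        name = expand(token)
        entity_type = ENTITY_TYPES.get(name.lower())
        if entity_type:
            found.append((name, entity_type))
    return found


print(stem("Activation"))  # Activat
print(identify_entities(tokenise("IgA antibodies to B-LG in type 1 diabetes.")))
# [('IgA', 'antibody'), ('beta-lactoglobulin', 'protein'), ('diabetes', 'disease')]
```

Running dictionary look-up after abbreviation expansion means that "CN" and "casein" resolve to the same entry. This is one reason the slide lists the steps in this order.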
14 "Increased levels of IgA antibodies to B-LG were found and were shown to be an independent risk marker for type 1 diabetes."
- Tokeniser / POS tagger: Increased (past participle), levels (plural noun), of (preposition)
- Entity identification: IgA (antibody), B-LG (protein), diabetes (disease)
- Parser: [IgA antibodies to B-LG] MARKER [type 1 diabetes]
15 Information extraction
- Rule-based
  - interaction verbs, e.g. interact, bind, activate
  - pattern: protein (0-5 words) verb (0-5 words) protein
  - (Blaschke and Valencia, 2002)
- Machine learning
  - statistical methods, e.g. Hidden Markov Models
  - learn interfillers, i.e. the text lying between tagged
 entities (Bunescu et al, 2004)
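The rule-based pattern can be sketched as a windowed search over tokens. The search looks for a known protein, then an interaction verb with at most five intervening words, then a second protein with at most five words after the verb. This is a simplified illustration in the spirit of such rules, not the system of Blaschke and Valencia. The protein names, the function name and the prefix matching of verbs (so that "binds" counts as "bind") are assumptions.

```python
# Interaction verbs from the slide, matched as prefixes ("binds", "activates").
INTERACTION_VERBS = ("interact", "bind", "activate")
MAX_GAP = 5  # maximum number of intervening words


def find_interactions(tokens, proteins):
    """Return (protein, verb, protein) triples matching
    protein (0-5 words) verb (0-5 words) protein."""
    triples = []
    for i, first in enumerate(tokens):
        if first not in proteins:
            continue
        # Nearest interaction verb with at most MAX_GAP words in between.
        for j in range(i + 1, min(i + MAX_GAP + 2, len(tokens))):
            if tokens[j].lower().startswith(INTERACTION_VERBS):
                # Nearest second protein with at most MAX_GAP words after the verb.
                for k in range(j + 1, min(j + MAX_GAP + 2, len(tokens))):
                    if tokens[k] in proteins:
                        triples.append((first, tokens[j], tokens[k]))
                        break
                break
    return triples


proteins = {"ProtA", "ProtB"}  # hypothetical protein names
print(find_interactions("ProtA strongly binds ProtB in vitro".split(), proteins))
# [('ProtA', 'binds', 'ProtB')]
```

Rules of this kind are easy to read and adjust. They miss, however, any interaction phrased outside the window, which is the gap that machine-learning approaches aim to close.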
16 Difficulties
- Synonyms 
- Proteins and genes with same name 
- Funny names e.g. ERK-1/2, and gene! 
- Variability of natural language 
- Compounded names 
- Co-ordination, negatives, speeling errors
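Two of these problems, synonyms and slash-compounded names such as ERK-1/2, can be partly handled by normalising names before dictionary look-up. The following is a minimal sketch. The synonym table and the function names are hypothetical, and the slash expansion only handles a stem followed by numeric suffixes.

```python
import re

# Hypothetical synonym table: lower-cased variant -> canonical name.
SYNONYMS = {
    "b-lg": "beta-lactoglobulin",
    "blg": "beta-lactoglobulin",
    "lgb": "beta-lactoglobulin",
}


def normalise(name):
    """Map a name variant to its canonical form (unchanged if unknown)."""
    return SYNONYMS.get(name.lower(), name)


def expand_slash_name(name):
    """Expand e.g. 'ERK-1/2' to ['ERK-1', 'ERK-2']; other names pass through."""
    match = re.fullmatch(r"(.*?)(\d+(?:/\d+)+)", name)
    if match is None:
        return [name]
    stem, numbers = match.groups()
    return [stem + number for number in numbers.split("/")]


print([normalise(n) for n in ("B-LG", "LGB", "casein")])
# ['beta-lactoglobulin', 'beta-lactoglobulin', 'casein']
print(expand_slash_name("ERK-1/2"))  # ['ERK-1', 'ERK-2']
```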
17 Evaluation
- Precision (P): what proportion of the output is correct
- Recall (R): what proportion of the correct answers is found
- F-measure: combines P and R (their harmonic mean)
- IE systems can achieve high scores, but not high
 enough to populate databases automatically
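These measures can be computed from counts of true positives (TP), false positives (FP) and false negatives (FN). The formulas are P = TP/(TP+FP), R = TP/(TP+FN), and for the balanced F-measure F = 2PR/(P+R). The following is a small sketch. The function name and the example counts are illustrative only.

```python
def precision_recall_f(tp, fp, fn, beta=1.0):
    """Return (precision, recall, F-measure) from raw counts.

    beta=1.0 gives the balanced F-measure, the harmonic mean of P and R;
    beta > 1 weights recall more heavily.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    b2 = beta ** 2
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


# Illustrative counts: 8 correct extractions, 2 spurious, 8 missed.
p, r, f = precision_recall_f(tp=8, fp=2, fn=8)
print(f"P={p:.2f} R={r:.2f} F={f:.2f}")  # P=0.80 R=0.50 F=0.62
```

Because the harmonic mean is pulled towards the lower of the two values, a system with high precision but low recall still scores modestly. This is one reason high-precision extractors may not be enough to populate a database.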
18 Text-mining uses
- Aim: to extract interactions and disease associations
- Swanson (fish oil and Raynaud's disease)
- Srinivasan (Turmeric) 
19 General model for discovering implicit links between topics
(Diagram: starting topic, turmeric → inhibits → intermediate topic, nuclear factor-kappa B → involved in → terminal topic, Crohn's disease)
Diagram taken from Srinivasan et al, 2004
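The model in the diagram can be illustrated with a toy version of the ABC idea. If topic A is linked to B and B is linked to C, but A and C never appear together directly, then C is reported as a candidate implicit link. The following sketch uses plain co-occurrence pairs. It ignores relation types such as "inhibits" and the weighting used by Srinivasan et al. The function name and the toy pairs are hypothetical, not data drawn from the literature.

```python
from collections import defaultdict


def implicit_links(cooccurrences, start):
    """Return {terminal topic C: set of intermediate topics B} for topics C
    reachable from `start` via some B but never co-occurring with it."""
    neighbours = defaultdict(set)
    for a, b in cooccurrences:
        neighbours[a].add(b)
        neighbours[b].add(a)
    direct = neighbours[start]
    candidates = defaultdict(set)
    for b in direct:
        for c in neighbours[b]:
            if c != start and c not in direct:
                candidates[c].add(b)
    return dict(candidates)


# Toy co-occurrence pairs for illustration only.
pairs = [
    ("turmeric", "NF-kB"),
    ("NF-kB", "Crohn's disease"),
    ("turmeric", "curcumin"),
    ("curcumin", "NF-kB"),
]
print(implicit_links(pairs, "turmeric"))  # {"Crohn's disease": {'NF-kB'}}
```

Each candidate is returned together with its intermediate topics, so that a researcher can see why it was proposed before deciding whether it is worth testing.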
20 Targets for text mining
- Many milk relationships still require further 
 investigation
- Positive reasons
  - nutritional benefits
  - neonatal growth
  - antimicrobial activity
  - bioactive peptides
21 Targets for text mining (cont.)
- Negative reasons
  - recent link with Alzheimer's
  - diabetes link
  - asthma
  - human reactions to cow hormones (e.g. acne; Danby, 2005)
  - drug transfer to milk and its effects
  - allergic reactions/intolerance
  - toxic contaminants
22 milkER process
- 897 protein, 772 DNA and 1232 RNA entries
- Analyse references (1465 MEDLINE references)
- MeSH terms, GO terms etc 
- POS tag 
- UMLS standardisation 
- Gene/protein dictionary 
- Extract relations
23 Milk literature
24 milkER interactions
- Table of interacting proteins 
- Store as queryable XML strings? 
- Discover links between proteins and disease 
- Create hypotheses 
- Confirm experimentally
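The open question of storing interactions as queryable XML could be explored along the lines of the following sketch. It uses Python's `xml.etree.ElementTree`, whose `findall` supports simple `[tag='text']` predicates. The element and attribute names, the protein names and the disease are hypothetical placeholders, not a milkER schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout; element/attribute names are assumptions.
INTERACTIONS_XML = """\
<interactions>
  <interaction type="binds">
    <protein>ProtA</protein>
    <protein>ProtB</protein>
  </interaction>
  <interaction type="associated_with">
    <protein>ProtB</protein>
    <disease>DiseaseX</disease>
  </interaction>
</interactions>
"""


def partners_of(xml_text, name):
    """Return (interaction type, partner) pairs for a named protein.

    Assumes names contain no quote characters, since the name is placed
    directly inside the ElementTree path predicate.
    """
    root = ET.fromstring(xml_text)
    pairs = []
    # Select <interaction> elements having a <protein> child with this text.
    for interaction in root.findall(f"interaction[protein='{name}']"):
        for member in interaction:
            if member.text != name:
                pairs.append((interaction.get("type"), member.text))
    return pairs


print(partners_of(INTERACTIONS_XML, "ProtB"))
# [('binds', 'ProtA'), ('associated_with', 'DiseaseX')]
```

Keeping protein-protein and protein-disease links in one structure lets a single query follow a protein to both its partners and its disease associations, which is the kind of link the hypotheses above would start from.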
25 Diabetes
- Pancreas secretes hormones
  - glucagon: increases conversion of glycogen → glucose
  - insulin: increases conversion of glucose → glycogen;
 allows glucose into cells
- Diabetes: a condition in which the amount of glucose in
 the blood is abnormally high because the body cannot use
 it adequately as fuel
26 Diabetes
- Affects 3-5% of industrialised populations
- Type 1 (10%)
  - genetic and environmental factors (e.g. diet)
  - decreased insulin production
  - mostly develops before age 20
- Type 2 (90%)
  - resistance of the body to insulin
  - normally develops after age 40
  - often associated with high blood pressure, cholesterol
 and arterial disease
27 Milk and diabetes
28 Selected quotes
- More research is needed on all aspects of 
 lactation in women with diabetes.
- Reader D. et al, Curr Diab Rep. 2004  
- The effect of high protein intakes from 
 different sources on glucose-insulin metabolism
 needs further study
- Hoppe et al, European Journal of Clinical 
 Nutrition 2005
- American children also tend to be heavier than 
 those from European countries, skewing the
 growth charts further.
- The Scotsman Sat 5 Feb 2005 
- The government currently recommends that babies 
 should be fed breast milk alone for the first six
 months - the WHO recommends two years.
29 Conclusions
- Knowledge of milk vital in many areas 
- milkER aims to bring disparate milk data together 
- Text-mining can wade through large amounts of 
 data to retrieve and discover vital information
30 Future work
- Relation extraction of milk literature 
- Extend content of milkER to include interaction 
 data
- Create hypotheses for experimental work
31 Acknowledgements
- Prof. Lindsay Sawyer 
- Dr. Carl Holt (Hannah Research Institute, Ayr) 
- Prof. Bonnie Webber (Informatics) 
- Dr. Alistair Kerr and Dr. Douglas Armstrong for 
 technical support
32 References
- Acne/milk 
- Acne and milk, the diet myth, and beyond (Danby, 
 2005)
- Diabetes/milk 
- Milk and diabetes (Schrezenmeir et al, 2000) 
 REVIEW
- The role of β-casein variants in the induction of
 insulin-dependent diabetes (Elliott et al, 1997)
- Text-mining 
- Natural language processing and systems biology 
 (Cohen et al, 2004) REVIEW
- Mining MEDLINE for implicit links between dietary 
 substances and diseases (Srinivasan et al, 2004)
- Learning to extract proteins and their 
 interactions from MEDLINE abstracts (Bunescu et
 al, 2003)