Title: Class Projects
1 Class Projects
2 Future Work and Possible Project Topics in Gene Regulatory Networks
- Learning from multiple data sources
- Learning causality in Motifs
- Learning GRN with feedback loops
3 Learning from multiple data sources
- We have gene expression data and topological ordering information
- Incorporating some other data sources as prior knowledge for the learning
- Transcription factor binding location data
Example: partial regulatory network recovered using expression data and location data.
4 Learning Causality in Motifs
- Network motifs are the simplest units of network architecture.
- They can be used to assemble a transcriptional regulatory network.
5 Learning GRN with feedback loops
6 Learning GRN with feedback loops (Cont'd)
Protein-Protein Interactions
7 Future Work and Possible Project Topics in Protein Interaction
- Learning from multiple data sources
- Disease related protein-protein interactions
- Learning from different species
8 Learning from Multiple Data Sources
- Gene Neighbor: identifies protein pairs encoded in close proximity across multiple genomes.
- Rosetta Stone
- Phylogenetic Profile
- Gene Clustering: identifies closely spaced genes, and assigns a probability P of observing a particular gap distance.
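Of the methods listed above, the phylogenetic profile is the simplest to illustrate. The sketch below assumes each protein is described by a presence/absence vector over a fixed set of genomes, and that proteins with similar vectors are candidate interaction partners. The protein names, the profiles, the Jaccard measure, and the 0.8 threshold are all hypothetical choices made for illustration, not values taken from any database.

```python
def jaccard(profile_a, profile_b):
    """Jaccard similarity between two presence/absence profiles."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same genomes")
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return both / either if either else 0.0


def candidate_pairs(profiles, threshold=0.8):
    """Return (protein, protein, score) for pairs with similar profiles."""
    names = sorted(profiles)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            score = jaccard(profiles[first], profiles[second])
            if score >= threshold:
                pairs.append((first, second, score))
    return pairs


# Hypothetical toy data: 1 = a homolog is present in that genome, 0 = absent.
profiles = {
    "protA": [1, 1, 0, 1, 0],
    "protB": [1, 1, 0, 1, 0],
    "protC": [0, 0, 1, 0, 1],
}

print(candidate_pairs(profiles))  # [('protA', 'protB', 1.0)]
```

A project on learning from multiple data sources could treat such a score as one feature among several (gene neighbor, Rosetta Stone, gene clustering) rather than as a prediction on its own.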
9 Disease related protein-protein interactions
Disease related? Query the NCBI OMIM database.
10 Learning from different species
11 BioQA related projects
12 Projects for BioQA
- Learning
- Given a set of relevant abstracts, what kind of features can we obtain to enhance our queries?
- Given a set of questions from users, how can we identify keywords from the questions to form queries?
- Answer Presentation
- Given a relevant abstract/article,
- how can we retrieve the relevant passage with respect to the user's question?
- how can we extract answers?
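As a starting point for the keyword-identification question above, a baseline could simply drop common function words and join what remains into a Boolean query. The sketch below is only that baseline; the stopword list, the `question_to_query` name, and the `AND` query syntax are assumptions made for illustration, and a real project would likely add gene-name recognition and term weighting.

```python
import re

# A small, hand-picked stopword list (an assumption, not a standard resource).
STOPWORDS = {
    "a", "an", "the", "is", "are", "do", "does", "can", "of", "in", "to",
    "for", "with", "and", "or", "what", "which", "how", "why",
}


def question_to_query(question):
    """Turn a user question into a Boolean keyword query."""
    # Keep hyphenated tokens such as "IFN-beta" in one piece.
    tokens = re.findall(r"[A-Za-z0-9][A-Za-z0-9\-]*", question)
    keywords = []
    for token in tokens:
        if token.lower() in STOPWORDS:
            continue
        if token not in keywords:  # keep first occurrence only
            keywords.append(token)
    return " AND ".join(keywords)


print(question_to_query("What is the role of PRNP in mad cow disease?"))
# role AND PRNP AND mad AND cow AND disease
```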
13 Projects for BioQA
- Automatic Extraction
- Extract relations of gene-disease, gene-biological process (also their corresponding organisms)
- Uniquely identify the genes
- A gene symbol can be associated with multiple gene identifiers. Which gene identifier is the right one?
- Can these extraction processes be generalized?
- Sortal Resolution
- Given an abstract and query, perform sortal resolution (but not on pronouns)
- Example
- Given the following abstract
- "In this report, we show that virus infection of cells results in a dramatic hyperacetylation of histones H3 and H4 that is localized to the IFN-beta promoter. Thus, coactivator-mediated localized hyperacetylation of histones may play a crucial role in inducible gene expression." (PMID 10024886)
- and the query about histones, perform resolution on "histones"
- Result: "histones" refers to H3 and H4.
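The histones example above can be reproduced with a very naive pattern-based baseline: look for the sortal noun followed directly by a list of capitalized names. This is a sketch under that single assumption, not a general resolution method; the `resolve_sortal` name and the regular expression are illustrative, and the approach will miss members mentioned anywhere other than right after the noun.

```python
import re


def resolve_sortal(text, sortal):
    """Collect names listed directly after a sortal noun, e.g. 'histones H3 and H4'."""
    name = r"[A-Z][A-Za-z0-9]*"
    separator = r"(?:,\s*|\s+and\s+)"
    pattern = re.compile(
        re.escape(sortal) + r"\s+((?:" + name + separator + r")*" + name + r")"
    )
    members = []
    for match in pattern.finditer(text):
        for member in re.split(r",\s*|\s+and\s+", match.group(1)):
            if member and member not in members:
                members.append(member)
    return members


abstract = (
    "In this report, we show that virus infection of cells results in a "
    "dramatic hyperacetylation of histones H3 and H4 that is localized to the "
    "IFN-beta promoter. Thus, coactivator-mediated localized hyperacetylation "
    "of histones may play a crucial role in inducible gene expression."
)

print(resolve_sortal(abstract, "histones"))  # ['H3', 'H4']
```

The second mention of "histones" in the abstract is followed by "may", which is not capitalized, so it contributes nothing; the resolution comes entirely from the first mention.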
14 Projects for BioQA
- Semantics of Words
- Dealing with the semantics of words to improve the retrieval of answers
- Example: semantic relation between "role" and "play"
- Gene symbol variants, disambiguate gene symbols, entity recognition
- Generate gene symbol synonyms and variants given a gene symbol in a query
- Example: variants of CDC28 can be written as Cdc28, Cdc28p, cdc-28
- GSS is a synonym of PRNP, but GSS itself is also a gene which is unrelated to PRNP.
- Improve on recognition of diseases, biological processes
- Extension of Ontology
- To capture biological processes and their possible relations to diseases
- Examples
- Learning and/or memory can influence Alzheimer's disease
- Degradation of ubiquitin cycle can cause extra long/short half-life of genes
- Extra long/short half-life of genes can cause cancer
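The CDC28 example above suggests a simple rule-based generator for orthographic variants. The sketch below assumes a symbol made of letters followed by digits and applies a handful of case, hyphen, and "p"-suffix rules; the rule set and the `symbol_variants` name are illustrative assumptions, not a complete nomenclature model. It does not address true synonyms or ambiguity such as GSS versus PRNP, which would need a gene database lookup.

```python
import re


def symbol_variants(symbol):
    """Generate orthographic variants of a letters+digits gene symbol."""
    match = re.fullmatch(r"([A-Za-z]+)-?(\d+)", symbol)
    if not match:
        return [symbol]  # no rule applies; return the symbol unchanged
    letters, digits = match.group(1), match.group(2)
    upper = letters.upper()
    lower = letters.lower()
    capital = letters.capitalize()
    candidates = [
        upper + digits,          # CDC28
        capital + digits,        # Cdc28
        capital + digits + "p",  # Cdc28p (protein-style suffix)
        lower + digits,          # cdc28
        lower + "-" + digits,    # cdc-28
        upper + "-" + digits,    # CDC-28
    ]
    # Remove duplicates while keeping the order above.
    return list(dict.fromkeys(candidates))


print(symbol_variants("CDC28"))
# ['CDC28', 'Cdc28', 'Cdc28p', 'cdc28', 'cdc-28', 'CDC-28']
```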
15 CBioC Class Projects
Extraction of organism info for each entity in a relationship (high priority). Use existing software for extraction, but biological databases and algorithms are needed to deduce info that is not explicit, and users must be able to correct this info. Example: PMID 16107876.
KALPESH Image extension: extracts images and information about images, and allows collaborative curation. Take PDFs and other structured documents, and extract images with their captions and references within the text, then let users polish. Related.
Use ontologies and some automated tools to ensure consistency and cross-link info (2 people). Information entered by users needs to be validated against existing DBs and ontologies. We also need to tag our data for cross-reference. Example
16 Other projects
17 Build an Ontology
- Build an ontology for a domain for which we do not have an ontology yet.
- Verify its consistency.
18 Various kinds of text extraction systems
- TREC suggested ones
- Which method/protocol is used in which experiment/procedure
- Gene disease role
- Gene biological process role
- Gene mutation type biological impact
- Gene interaction gene function organ
- Gene interaction gene disease organ
- Protein Lounge inspired
- Kinase-phosphatase
- transcription factor
- peptide antigen
19 Drug classification in Pharmacogenetics
- Experimental Data available
- Drug response on cell lines, gene expression data, gene copy data, mutation analysis data, RNAi data
- Data from literature
- Mutation data (Sanger lab), NCI-60 drug response data, mutation analysis data, pathway data (e.g. BIND), Gene Ontology
- Proprietary data
- Where does the drug physically interact? (600 kinase IC50)
- Gene expression data of patients after treatments
- Goal
- Given a patient, what kinds of data do we need in order to determine if a drug should be applicable to that patient or not? How do we develop a classifier using these kinds of data?
- Find gene and protein interaction network (or components) using these data.
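To make the classifier question above concrete, here is a minimal sketch of one possible baseline, a nearest-centroid classifier over numeric patient features. The technique is a stand-in chosen for brevity and is not named in the slides; the feature values, the two-gene feature layout, and the "responder" / "non-responder" labels are hypothetical toy data.

```python
import math


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    count = len(rows)
    return [sum(column) / count for column in zip(*rows)]


def train(samples):
    """samples: list of (features, label). Returns one centroid per label."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    return {label: centroid(rows) for label, rows in by_label.items()}


def predict(model, features):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(model, key=lambda label: math.dist(model[label], features))


# Hypothetical toy data: expression levels of two genes per patient.
training = [
    ([2.0, 0.5], "responder"),
    ([1.8, 0.7], "responder"),
    ([0.4, 1.9], "non-responder"),
    ([0.6, 2.1], "non-responder"),
]

model = train(training)
print(predict(model, [1.7, 0.9]))  # responder
print(predict(model, [0.3, 1.5]))  # non-responder
```

A class project would replace the toy vectors with features drawn from the data sources listed above (expression, copy number, mutation, drug response) and compare this baseline against stronger classifiers.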