Class Projects

Slides: 20
Provided by: Xin110

1
Class Projects
2
Future Work and Possible Project Topics in Gene
Regulatory Networks
  • Learning from multiple data sources
  • Learning causality in Motifs
  • Learning GRN with feedback loops

3
Learning from multiple data sources
  • We have gene expression data and topological
    ordering information
  • Incorporate other data sources as prior
    knowledge for learning
  • Example: transcription factor binding location data

Example: partial regulatory network recovered
using expression data and location data.
4
Learning Causality in Motifs
  • Network motifs are the simplest units of network
    architecture.
  • They can be used to assemble a transcriptional
    regulatory network.

5
Learning GRN with feedback loops
6
Learning GRN with feedback loops
(Cont'd)
Protein-Protein Interactions
7
Future Work and Possible Project Topics in
Protein Interaction
  • Learning from multiple data sources
  • Disease related protein-protein interactions
  • Learning from different species

8
Learning from Multiple data sources
  • Gene Neighbor: identifies protein pairs encoded in
    close proximity across multiple genomes.
  • Rosetta Stone
  • Phylogenetic Profile
  • Gene Clustering: looks at closely spaced genes and
    assigns a probability P of observing a particular
    gap distance.

9
Disease related protein-protein interactions
Disease related? Query the NCBI OMIM database.
10
Learning from different species
11
BioQA related projects
12
Projects for BioQA
  • Learning
  • Given a set of relevant abstracts, what kind of
    features can we obtain to enhance our queries?
  • Given a set of questions from users, how can we
    identify keywords from the questions to form
    queries?
  • Answer Presentation
  • Given a relevant abstract/article:
  • How can we retrieve the relevant passage with
    respect to the user's question?
  • How can we extract answers?
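
One possible starting point for the question-to-query item above is simple stop-word filtering. This is only a minimal sketch: the stop-word list, the function names and the sample question are hypothetical, and the AND-joined query format is an assumption, not the syntax of any particular search engine.

```python
import re

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {
    "what", "is", "the", "of", "in", "a", "an", "how", "does", "do",
    "can", "we", "to", "and", "or", "with", "for", "on", "are",
}


def extract_keywords(question):
    """Return the non-stop-word tokens of a question, in order, without repeats."""
    tokens = re.findall(r"[A-Za-z0-9][A-Za-z0-9\-]*", question)
    keywords = []
    for token in tokens:
        if token.lower() not in STOP_WORDS and token not in keywords:
            keywords.append(token)
    return keywords


def build_query(question):
    """Join the keywords into a boolean AND query (assumed query format)."""
    return " AND ".join(extract_keywords(question))


if __name__ == "__main__":
    print(build_query("What is the role of PRNP in cancer?"))
```

A real system would likely also need gene and disease name recognition, since plain tokenization splits multi-word terms.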

13
Projects for BioQA
  • Automatic Extraction
  • Extract relations of gene-disease,
    gene-biological process (also their corresponding
    organisms)
  • Uniquely identify the genes
  • A gene symbol can be associated with multiple
    gene identifiers. Which gene identifier is the
    right one?
  • Can these extraction processes be generalized?
  • Sortal Resolution
  • Given an abstract and query, perform sortal
    resolution (but not on pronouns)
  • Example
  • Given the following abstract:
  • "In this report, we show that virus infection of
    cells results in a dramatic hyperacetylation of
    histones H3 and H4 that is localized to the
    IFN-beta promoter. Thus, coactivator-mediated
    localized hyperacetylation of histones may play a
    crucial role in inducible gene expression." (PMID
    10024886)
  • and a query about histones, perform resolution
    on "histones"
  • Result: "histones" refers to H3 and H4.
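
The sortal resolution example above can be approximated with a pattern match. This is a rough sketch under a strong assumption: the instances directly follow the sortal term and start with a capital letter (as in "histones H3 and H4"). The function name `resolve_sortal` is hypothetical, and a real project would need something more robust than a regular expression.

```python
import re

ABSTRACT = (
    "In this report, we show that virus infection of cells results in a "
    "dramatic hyperacetylation of histones H3 and H4 that is localized to "
    "the IFN-beta promoter. Thus, coactivator-mediated localized "
    "hyperacetylation of histones may play a crucial role in inducible "
    "gene expression."
)


def resolve_sortal(abstract, sortal):
    """Return the named instances that directly follow the sortal term."""
    # An instance is assumed to be a capitalized token such as "H3".
    item = r"[A-Z][A-Za-z0-9]*"
    # Match "<sortal> X", "<sortal> X and Y", "<sortal> X, Y, and Z".
    pattern = re.compile(
        rf"\b{re.escape(sortal)}\s+({item}(?:\s*(?:,\s*and|,|and)\s*{item})*)"
    )
    instances = []
    for group in pattern.findall(abstract):
        for name in re.findall(item, group):
            if name not in instances:
                instances.append(name)
    return instances


if __name__ == "__main__":
    print(resolve_sortal(ABSTRACT, "histones"))
```

The second mention ("histones may play") yields nothing because no capitalized instance follows it, which is the behaviour wanted here.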

14
Projects for BioQA
  • Semantics of Words
  • Dealing with the semantics of words to improve
    the retrieval of answers
  • Example: semantic relation between "role" and
    "play"
  • Gene symbol variants, disambiguate gene symbols,
    entity recognition
  • Generate gene symbol synonyms and variants given
    a gene symbol in a query
  • Example: variants of CDC28 can be written as
    Cdc28, Cdc28p, cdc-28
  • GSS is a synonym of PRNP, but GSS itself is
    also a gene that is unrelated to PRNP.
  • Improve on recognition of diseases, biological
    processes
  • Extension of Ontology
  • To capture biological processes and their
    possible relations to diseases
  • Examples
  • Learning and/or memory can influence Alzheimer's
    disease
  • Degradation of the ubiquitin cycle can cause an
    extra long/short half-life of genes
  • An extra long/short half-life of genes can cause
    cancer
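
Returning to the gene symbol variants on this slide (CDC28 written as Cdc28, Cdc28p, cdc-28), a rule-based generator is one way to start. This is a minimal sketch: the function name `symbol_variants` and the three rules are assumptions chosen to reproduce the slide's example, not an established nomenclature algorithm.

```python
import re


def symbol_variants(symbol):
    """Generate simple spelling variants for a letters+digits gene symbol."""
    match = re.fullmatch(r"([A-Za-z]+)-?(\d+)", symbol)
    if match is None:
        # Symbols that do not fit the letters+digits shape are returned as-is.
        return [symbol]
    letters, digits = match.groups()
    upper = letters.upper() + digits            # CDC28
    capitalized = letters.capitalize() + digits  # Cdc28
    candidates = [
        upper,
        capitalized,
        capitalized + "p",                       # Cdc28p (protein-style suffix)
        f"{letters.lower()}-{digits}",           # cdc-28
    ]
    variants = []
    for candidate in candidates:
        if candidate not in variants:
            variants.append(candidate)
    return variants


if __name__ == "__main__":
    print(symbol_variants("CDC28"))
```

String rules like these do not address the GSS/PRNP case, where the same symbol names two unrelated genes; that needs disambiguation from context.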

15
CBioC Class Projects
  • Extraction of organism info for each entity in a
    relationship (high priority). Use existing software
    for extraction, but biological databases and
    algorithms are needed to deduce info that is not
    explicit, and users must be able to correct this
    info. Example: PMID 16107876.
  • Image extension (KALPESH): extracts information
    about images and allows collaborative curation.
    Take PDFs and other structured documents, extract
    images with their captions and references within
    the text, then let users polish. Related.
  • Use ontologies and some automated tools to ensure
    consistency and cross-link info (2 people).
    Information entered by users needs to be validated
    against existing DBs and ontologies. Also, need to
    tag our data for cross-reference. Example
16
Other projects
17
Build an Ontology
  • Build an ontology for a domain for which we do
    not have an ontology yet.
  • Verify its consistency.

18
Various kinds of text extraction systems
  • TREC suggested ones
  • Which method/protocol is used in which
    experiment/procedure
  • Gene disease role
  • Gene biological process role
  • Gene mutation type biological impact
  • Gene interaction gene function organ
  • Gene interaction gene disease organ
  • Protein Lounge inspired
  • Kinase-phosphatase
  • transcription factor
  • peptide antigen

19
Drug classification in Pharmacogenetics
  • Experimental Data available
  • Drug response on cell lines; gene expression
    data; gene copy data; mutation analysis data;
    RNAi data
  • Data from literature
  • Mutation data (Sanger lab); NCI-60 drug response
    data; mutation analysis data; pathway data (e.g.
    BIND); Gene Ontology
  • Proprietary data
  • Where does the drug physically interact? (600
    kinase IC50)
  • Gene expression data of patients after treatments
  • Goal
  • Given a patient, what kinds of data do we need in
    order to determine if a drug should be applicable
    to that patient or not? How do we develop a
    classifier using these kinds of data?
  • Find gene and protein interaction network (or
    components) using these data.