Title: Interdisciplinary Introductory Course in Bioinformatics


1
Interdisciplinary Introductory Course in Bioinformatics
  • Yana Kortsarts, Computer Science Department
  • Robert Morris, Biology Department
  • Janine Utell, English Department
  • Widener University, Chester, PA

2
What is Bioinformatics?
  • Bioinformatics is a relatively new
    interdisciplinary field that integrates computer
    science, mathematics, biology, and information
    technology to manage, analyze, and understand
    biological, biochemical and biophysical
    information.
  • Bioinformatics is a computational science and a
    subset of the larger field of computational
    biology.

3
Motivation
  • IS professionals must have strong analytical and
    critical thinking skills. (IS 2002 Model
    Curriculum and Guidelines for Undergraduate
    Degree Programs in IS)
  • Introducing bioinformatics to CIS students will
    strengthen these required skills.
  • Equip students with some of the following
    capabilities, as suggested in the IS 2002
    guidelines:
      • Creativity
      • Application of both traditional and new
        concepts and skills
      • Application development
      • Problem-solving abilities
      • Ability to communicate effectively (oral,
        written and listening)

4
Motivation
  • Provides opportunities for students to become
    familiar with Python, one of the most widely used
    scripting languages.
  • Lets students explore various data structures and
    algorithmic techniques traditionally not covered
    in other courses.
  • Helps students make connections between
    theoretical topics learned in core CS and CIS
    courses, such as Data Structures and Algorithms,
    and apply their knowledge to real-world biology
    problems.
  • Helps diversify the department's course offerings
    and provides interdisciplinary opportunities for
    CS and CIS students.
  • A bioinformatics background enhances the
    employment qualifications of CIS and CS students
    in a competitive job market.

5
Challenges
  • Students have different backgrounds
  • Choosing a programming language
  • Defining course prerequisites
  • Defining course content:
      • Programming
      • Algorithms
      • End-user bioinformatics tools
  • Keeping the course balanced:
      • Content
      • Hands-on/lecture
  • Interdisciplinary nature

6
Course Development
  • First iteration: Spring 05; second iteration:
    Spring 08
  • Cross-listed upper-level technical elective
  • Prerequisites:
      • Biology/Chemistry/Biochemistry majors:
        Introduction to Molecular Biology
      • CS/CIS/Math majors: Introduction to Computer
        Science I
      • Chemical Engineering majors: Computer
        Programming and Engineering Problem Solving
  • Team teaching: Biology and Computer Science
    faculty
  • 4 credits; 6 hours: 4 CS, 2 Biology
  • Spring 05 enrollment: 6 Biology students, 6
    CS/CIS students
  • Spring 08 enrollment: 6 CS/CIS students

7
Course Objectives and Goals
  • To integrate bioinformatics algorithms into the
    course and to teach the foundations of the
    algorithms and important results in
    bioinformatics
  • To introduce students to the Python programming
    language. The Biopython Project is an
    international association of developers of freely
    available Python tools for computational
    molecular biology.

8
Course Objectives and Goals
  • To introduce students to the principles that
    drive an algorithm's design and to the
    intellectual content of bioinformatics
  • To provide an opportunity for interdisciplinary
    collaboration in the in-class assignments and the
    course project

9
Course Curriculum
  • Ethics, Computing and Genomics
  • Project-oriented component, new for Spring 08
  • 20% of the final grade; students had three weeks
    to work on this project
  • Goal: to develop oral and written communication
    skills and to engage students in the knowledge
    exchange process
  • Students learned about the ethics, computing and
    genomics topic independently and presented the
    results of their self-learning.
  • Students were assigned one or more scholarly
    articles from the collection Ethics, Computing,
    and Genomics, edited by Herman Tavani.
  • Students were required to:
      • read the assigned essays
      • prepare a 25-minute PowerPoint presentation
        with a summary of the paper and answers to the
        questions posed in the introductory part of
        the corresponding section
      • prepare a mini-quiz to assess their peers'
        understanding of the presented material

10
Ethics, Computing and Genomics
  • Collaborative work with an English faculty member
  • The English faculty member gave a short
    presentation before students started to work on
    this assignment, covering:
      • how to read critically and what questions to
        ask while reading the text
      • how to summarize the paper, using the
        structure of the essay as a guide and
        elucidating key points and key moments of
        evidence while making connections to the rest
        of the class material
      • tips on writing the summary in three steps:
        prewriting, drafting, revising
      • how to design an effective presentation of
        information
  • The English faculty member attended all oral
    presentations and provided detailed notes for
    each student, pointing out the positive and
    negative aspects of the presentation and
    explaining how it could have been stronger.
  • This successful and enjoyable experience showed
    the value of working with colleagues across
    disciplines to further student learning.

11
Course Curriculum: Introduction to Python
  • Students were introduced to Python quickly,
    during the first few weeks of the course.
  • Students worked with different algorithmic
    problem-solving techniques.
  • Introductory topics: arithmetic, decision and
    loop structures, functions, and simple
    manipulations of strings, lists, tuples and
    dictionaries.
  • Advanced Python topics were taught later
    throughout the course, building students'
    knowledge and their ability to tackle real-world
    biology problems.
  • The programming examples were all
    biology-oriented and motivated students to learn
    in order to solve practical problems.

12
Introduction to Python
  • Spring 05: 6 biology students with no programming
    experience and 6 CS and CIS students with no
    experience in Python.
  • The class formed 6 interdisciplinary teams, and
    all concepts were practiced within the team.
  • Spring 08: all students were CS and CIS majors
    with prior programming experience in C and Java,
    and some had introductory knowledge of Python.
  • Special handouts were prepared to walk students
    through the introductory topics toward advanced
    Python concepts.
  • Each topic was supported by a list of examples in
    increasing order of complexity.
  • Students were required to run the proposed
    programs in order to gain understanding of basic
    Python structures.
  • To assess the understanding of each concept,
    students were required to write short programs
    solving biology-oriented problems.

13
Introduction to Python
  • Examples of the problems, in increasing order of
    complexity:
      • computing the alignment score between two DNA
        sequences using different score matrices
      • finding the maximal alignment score when no
        internal gaps are allowed, using different
        score matrices
      • finding all occurrences of one sequence in
        another sequence
      • writing a program that reads a DNA sequence,
        transcribes the DNA into RNA and prints the
        resulting RNA sequence, then translates the
        RNA into a protein sequence in two steps:
        first, the program divides the RNA into codons
        and prints the list of codons; second, it
        translates the codons into the protein using
        the genetic code table
      • finding the maximal alignment score when
        internal gaps are allowed, using different
        score matrices
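
Two of the exercises above (finding all occurrences, and transcription followed by translation) can be sketched in a few lines of Python. This is an illustrative sketch, not the course's handout code: the function names are hypothetical, transcription is assumed to mean replacing T with U on the coding strand, and translation is assumed to stop at the first stop codon. The codon table is the standard genetic code written in the conventional U, C, A, G order.

```python
from itertools import product

# Standard genetic code, listed in the conventional U, C, A, G codon order.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def find_all(pattern, text):
    """Return every start index where pattern occurs in text (overlaps included)."""
    width = len(pattern)
    return [i for i in range(len(text) - width + 1) if text[i:i + width] == pattern]


def transcribe(dna):
    """Assumption: the input is the coding strand, so transcription is T -> U."""
    return dna.upper().replace("T", "U")


def split_codons(rna):
    """Split RNA into consecutive 3-letter codons, dropping an incomplete tail."""
    usable = len(rna) - len(rna) % 3
    return [rna[i:i + 3] for i in range(0, usable, 3)]


def translate(codons):
    """Translate codons to one-letter amino acids, stopping at the first stop codon."""
    protein = []
    for codon in codons:
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":  # "*" marks the three stop codons
            break
        protein.append(amino_acid)
    return "".join(protein)


rna = transcribe("ATGGCCTAA")
codons = split_codons(rna)
print(rna)                         # AUGGCCUAA
print(codons)                      # ['AUG', 'GCC', 'UAA']
print(translate(codons))           # MA
print(find_all("ATA", "ATATATA"))  # [0, 2, 4]
```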

14
Introduction to Python
  • Spring 08: students had different levels of
    programming and computational experience, so the
    best way to cover this topic was through
    independent learning.
  • Students worked at their own pace, using handouts
    and the Python and Biopython tutorials
    (www.python.org, www.biopython.org).
  • For each programming concept we provided grading
    rubrics, the minimal requirements to pass the
    concept, and a list of more advanced examples for
    students with prior Python experience.
  • Students with previous Python knowledge advanced
    their experience further, and students new to
    Python learned the language independently using
    structured guidance.
  • Python makes it possible to solve some problems
    very concisely, and students enjoyed trying to
    find the shortest solution to the proposed
    problems using Python functions and libraries.

15
Introduction to Bioinformatics Algorithms
  • Sequence alignments, scoring, gaps
  • Algorithm design techniques: exhaustive search,
    dynamic programming
  • The Needleman-Wunsch algorithm
  • The Smith-Waterman algorithm
  • Introduction to BLAST
  • Introduction to multiple sequence alignments
  • Visualization of algorithms: ALGGEN EMBER web
    resources

16
Introduction to Bioinformatics Algorithms
  • The dynamic programming technique is usually not
    covered in a core algorithms course.
  • Teaching it provided an opportunity to expand the
    theoretical background and to make connections
    between theory and practice.
  • It helped to maintain an appropriate level of
    theoretical content required for upper-level
    elective courses in our department.
  • This topic blended very well with the biology
    topics, and students had an opportunity to learn
    the concept of sequence alignments from both the
    biology and the computer science points of view.
  • The EMBER website provides a suite of multimedia
    bioinformatics educational tools. We used it to
    create a set of hands-on activities that help
    students understand the dynamic programming
    technique in general and specific algorithms in
    particular.

17
Course Curriculum: Biology Topics
  • Biological Research on the Web
  • Public Biological Databases and Data Formats
  • NCBI - National Center for Biotechnology
    Information
  • Searching Biological Databases
  • Review of Molecular Biology and Biochemistry
    Concepts
  • DNA and protein structure
  • Gene expression (transcription and translation)
  • Molecular Biology Central Dogma
  • Sequence Alignments

18
Hands-On Activities
  • Microbes Count! (BioQUEST Curriculum Consortium)
  • Exploring HIV Evolution: An Opportunity for
    Research
  • The HIV genome is very small and relatively
    simple. It is made up of nine genes and about
    9,500 nucleotides.
  • In this lab students worked with HIV sequence
    data collected from 15 individuals from an
    intravenous-drug-using population in Baltimore.
  • The goal of the study was to determine if the HIV
    isolated from particular subgroups of subjects
    derives from a common source.
  • Tool: CLUSTALW multiple sequence alignment
  • Biology Workbench: http://workbench.sdsc.edu/

19
Biology Topics: Hands-On Activities
  • Microarray lab "DNA Chips: Genes to Disease",
    developed by Campbell and Heyer and sold by
    Carolina Biologicals.
  • Goal: understanding how microarrays are used to
    identify gene changes in disease and the role of
    gene expression in cancer.
  • Students compared the relative expression levels
    of six different genes in healthy lung cells and
    lung cancer cells.
  • After completing the lab, students had an
    opportunity to discuss the significance of the
    relative expression levels with respect to the
    genes' roles in causing cancer

20
Biology Topics: Hands-On Activities
  • Epidemiology: the study of the distribution of
    diseases in populations.
  • Students explored factors that influence disease
    spread throughout populations with the software
    Epidemiology. Ebola was used as a model organism,
    and epidemiology was presented from both a
    microbiological and a social perspective.
  • Exploration of the structure and function of
    insulin
  • Students generated a phylogenetic tree
    demonstrating the evolution of insulin among the
    vertebrates (animals with an internal skeleton
    made of bone).

21
Project: DNA Sequence Annotation
  • Real data: Bacillus anthracis str. Ames project
    at the J. Craig Venter Institute
  • Input: DNA about 50,000 nucleotides long;
    students worked on different sequences from the
    same organism.
  • Project steps:
      • Find all potential genes and pseudo-genes in
        the input DNA sequence using start and end
        codons, and arrange the sequences found in two
        separate lists, each in order of increasing
        length: potential genes (length larger than
        300) and pseudo-genes (length less than 300).
      • For each potential gene found in the first
        step, locate the potential promoters in the
        given DNA sequence and calculate the strength
        of the promoter. A promoter is a region of DNA
        near the beginning of a gene that controls if
        and when the gene is actually expressed.
        Output: the list of potential genes in order
        of decreasing promoter strength.
      • BLAST all potential genes and pseudo-genes
        that were found, and analyze the results.
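
The first project step can be sketched as a simple open-reading-frame scan. This is a hedged illustration rather than the project's actual solution: it assumes ATG as the start codon and TAA, TAG, TGA as the end codons, scans only the given strand (not the complement), measures length in nucleotides including the stop codon, and treats a length of exactly 300 as a pseudo-gene, a case the slides leave unspecified. The names `find_orfs` and `classify` are hypothetical. Promoter scoring is omitted because the slides do not say how promoter strength was calculated.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, start="ATG"):
    """Return (start_index, sequence) for every start codon that reaches an
    in-frame stop codon. Nested start codons yield overlapping candidates."""
    dna = dna.upper()
    orfs = []
    for i in range(len(dna) - 2):
        if dna[i:i + 3] != start:
            continue
        # Walk codon by codon in the same reading frame until a stop codon.
        for j in range(i + 3, len(dna) - 2, 3):
            if dna[j:j + 3] in STOP_CODONS:
                orfs.append((i, dna[i:j + 3]))
                break
    return orfs


def classify(orfs, threshold=300):
    """Split candidates into potential genes (longer than threshold) and
    pseudo-genes (everything else), each sorted by increasing length."""
    by_length = sorted(orfs, key=lambda orf: len(orf[1]))
    genes = [orf for orf in by_length if len(orf[1]) > threshold]
    pseudo_genes = [orf for orf in by_length if len(orf[1]) <= threshold]
    return genes, pseudo_genes


# Toy input with a tiny threshold so that the example stays readable.
candidates = find_orfs("CCATGAAATAGGG")
print(candidates)                         # [(2, 'ATGAAATAG')]
print(classify(candidates, threshold=6))  # ([(2, 'ATGAAATAG')], [])
```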

22
Project
  • Summary
  • For each potential gene and pseudo-gene: start
    position, length, promoter score, BLAST results,
    summary and conclusion. For each sequence, we
    asked students to determine whether a potential
    gene could be a real gene based on the strength
    of the promoter and the BLAST results.
  • 15-minute in-class presentations covering: the
    Python program, a description of all Python
    functions used and the purpose of each function,
    all algorithms and programming techniques, and
    the presentation and explanation of the summary
    results, including information about the
    specific organism whose DNA was used as the
    input.
  • Spring 05: team project, with each team pairing
    CS/CIS and biology students. The programming part
    of the project was mostly done by the computer
    science students, and the biology students were
    required to understand and explain the
    programming techniques and algorithms that were
    used. The project provided an opportunity for
    truly interdisciplinary collaboration between
    computer science and biology students.
  • Spring 08: individual project.

23
Course Results
  • Spring 05: no formal assessment survey was
    conducted.
  • An informal discussion about the course was held
    at the end of the semester, and we asked students
    to provide their feedback. Students also
    completed teaching evaluations and provided
    comments there.
  • All students expressed satisfaction with the
    course, and we were very pleased that almost all
    students asked us to extend the programming
    component of the course. Biology students showed
    interest in programming and asked for an
    environment in which they could participate more
    fully in all stages of the course project.
  • Spring 08: a short post-survey was designed to
    assess the students' experience. It listed the
    topics covered in the course, and we asked
    students to rate their level of learning for each
    topic on a scale of 1 (not well) to 5 (very
    well). Six students were enrolled in Spring 08.

24
Topic | Average
1. Introductory Python and the ability to design simple Python programs | 3.7
2. Advanced Python topics: functions, loops, if-else statements, string manipulations, lists, and list manipulations | 3.5
3. Designing complex Python programs using advanced Python features | 3.3
4. Understanding the concept of sequence alignment: global, local, semi-global, multiple sequence alignment | 3.2
5. Understanding the dynamic programming algorithmic technique | 3.7
6. Understanding the exhaustive search (brute force) algorithmic technique | 4.0
7. Understanding the Needleman-Wunsch algorithm and being able to trace the algorithm to produce the final result | 3.8
8. Understanding the Smith-Waterman algorithm and being able to trace the algorithm to produce the final result | 3.8
9. The ability to work independently on a research-based project, applying computer science and biology knowledge to solve problems | 4.3
10. Understanding how to use the BLAST tool and how to read BLAST results | 4.2
11. Using sequence alignments to understand relatedness among species | 3.8
25
12. Using sequence alignments forensically (HIV experiment) | 4.2
13. Understanding how microarrays are used to identify gene changes in disease | 3.3
14. Understanding the flow of information from DNA to protein | 3.3
15. Using computer simulations to test hypotheses about disease spread | 3.8
16. The ability to read a research paper in Ethics, Computing and Genomics | 3.8
17. The ability to communicate effectively through participation in the Ethics, Computing and Genomics project | 4.0
18. The ability to create an informative PowerPoint presentation to present the results of the Ethics, Computing and Genomics project | 4.3
19. The ability to learn a topic independently and to present the results of that learning clearly | 3.8
26
Course Results
  • All topics were learned at an above-average
    level: every topic averaged above 3 on the
    5-point scale.
  • Some of the topics require special attention and
    should be revised for future iterations.
  • The Ethics, Computing and Genomics component
    received positive feedback from most of the
    students.
  • Comments regarding the course: most of the
    students mentioned that they loved the course and
    would recommend it to their peers. They expressed
    satisfaction with the level of the course, the
    amount of material covered and the depth of the
    coverage. They also mentioned that the final
    project was very interesting, but they proposed
    that the project description be written more
    carefully and provide clear rules for finding
    genes on the main and complementary strands, to
    avoid confusion.

27
Future Plans
  • Blend computer science and biology topics more
    effectively
  • Invite a guest speaker from the field
  • Foster student learning through a research-based
    process
  • Further expand the programming and algorithms
    component
  • Return to team work in the project in order to
    enhance the collaborative component of the
    course
  • Write a more careful project description
  • Bioinformatics across the computer science
    curriculum:
      • Introduction to computer science
      • Design and analysis of algorithms
      • Programming for non-majors

28
References
  • 1. N. C. Jones and P. A. Pevzner, An Introduction
    to Bioinformatics Algorithms, The MIT Press, 2004
  • 2. D. E. Krane and M. L. Raymer, Fundamental
    Concepts of Bioinformatics, Benjamin Cummings,
    2002
  • 3. C. Gibas and P. Jambeck, Developing
    Bioinformatics Computer Skills, O'Reilly, 2001
  • 4. Python/Biopython websites: http://python.org,
    http://biopython.org
  • 5. ALGGEN EMBER web resources:
    http://alggen.lsi.upc.es/docencia/ember/frame-ember.html
  • 6. John R. Jungck, Ethel D. Stanley, and Marion
    Field Fass, Microbes Count!, BioQUEST Curriculum
    Consortium:
    http://bioquest.org/microbescount/modules_by_tools.pdf
  • 7. Laurie J. Heyer and A. Malcolm Campbell,
    Microarray Lab DNA Chips: Genes to Disease,
    Carolina Biologicals
  • 8. Neil Campbell and Jane Reece, Biology, 7th
    edition, Benjamin Cummings, 2004

29
BLAST
  • The Basic Local Alignment Search Tool (BLAST)
    finds regions of local similarity between
    sequences.
  • The program compares nucleotide or protein
    sequences to sequence databases and calculates
    the statistical significance of matches.
  • BLAST can be used to infer functional and
    evolutionary relationships between sequences as
    well as help identify members of gene families.
  • Introduced by S. Altschul, W. Gish, W. Miller, E.
    Myers, and D. Lipman in the early 1990s
  • The original BLAST algorithm searches a sequence
    database for maximal un-gapped local alignments.

30
Sequence Alignment
  • Global alignment: compares two sequences in their
    entirety; the gap penalty is assessed regardless
    of whether gaps are located internally within a
    sequence or at the end of one or both sequences.
      • The Needleman-Wunsch algorithm
  • Local alignment: finds the best-matching
    subsequences within the two search sequences.
      • The Smith-Waterman algorithm

31
Sequence Alignment
  • Semi-global alignment: terminal (end) gaps are
    treated differently from internal gaps. Terminal
    gaps are usually the result of incomplete data
    and do not have biological significance.
    Example: searching for the best alignment between
    a short sequence and an entire genome.
      • A modification of the Needleman-Wunsch
        algorithm
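
One common way to modify Needleman-Wunsch for semi-global alignment is sketched below. It rests on assumptions the slide does not spell out: all four terminal gaps (leading and trailing, in either sequence) are free, scoring is match 1, mismatch 0, gap penalty -1, and only the score is returned. The function name `semi_global_score` is hypothetical.

```python
def semi_global_score(x, y, match=1, mismatch=0, gap=-1):
    """Best alignment score when terminal gaps cost nothing."""
    m, n = len(x), len(y)
    # Row 0 and column 0 stay at 0, so leading terminal gaps are free.
    F = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(
                F[i - 1][j - 1] + s,  # x[i-1] aligned to y[j-1]
                F[i - 1][j] + gap,    # x[i-1] aligned to an internal gap
                F[i][j - 1] + gap,    # y[j-1] aligned to an internal gap
            )
    # Trailing terminal gaps are free: take the best value found anywhere
    # in the last row or the last column instead of F[m][n].
    return max(max(F[m]), max(row[n] for row in F))


print(semi_global_score("TTACGTT", "ACG"))  # 3: ACG matches inside the longer string
print(semi_global_score("AACT", "ACG"))     # 2
```

Other variants make only some of the terminal gaps free, for example only those in the shorter sequence; which one is appropriate depends on the data.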

32
Algorithm Design Techniques
  • Exhaustive search (brute force): the algorithm
    examines every possible alternative to find one
    particular solution.
  • Dynamic programming: the algorithm breaks the
    problem into smaller sub-problems and uses the
    solutions of the sub-problems to construct the
    solution of the larger problem.
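
As a small illustration of exhaustive search, the sketch below finds the best alignment score when no internal gaps are allowed by trying every possible offset of one sequence against the other. The scoring (match 1, mismatch 0) and the choice not to penalize the unaligned overhang at either end are assumptions made for this example, and the function name is hypothetical.

```python
def best_ungapped_score(x, y, match=1, mismatch=0):
    """Slide y along x, score the overlapping columns at each offset, and
    return (best_score, offset). Offset k means y[0] sits under x[k]."""
    best = None
    # Every offset that leaves at least one overlapping column.
    for offset in range(-(len(y) - 1), len(x)):
        score = 0
        for j, base in enumerate(y):
            i = offset + j
            if 0 <= i < len(x):  # positions outside x are unscored overhang
                score += match if x[i] == base else mismatch
        if best is None or score > best[0]:
            best = (score, offset)
    return best


print(best_ungapped_score("AACT", "ACG"))  # (2, 1): ACG placed under the last three letters of AACT
```

The search examines len(x) + len(y) - 1 offsets, so it stays cheap here. Once internal gaps are allowed, the number of candidate alignments grows too fast for this approach, which is the motivation for dynamic programming.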

33
Needleman and Wunsch Algorithm
  • Input: two strings X = x1...xM and Y = y1...yN,
    and scoring rules: a scoring matrix s and a gap
    penalty GP
  • Output: an alignment of X and Y whose score, as
    defined by the scoring rules, is maximal among
    all possible alignments of X and Y

34
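  • Let F(i, j) = the optimal score of aligning
    x1...xi and y1...yj
  • Initialization: F(0, 0) = 0, F(i, 0) = -i,
    F(0, j) = -j (i = 1..M, j = 1..N), that is,
    i · GP and j · GP for a gap penalty GP = -1
  • Main iteration: for each i = 1..M and j = 1..N,
    F(i, j) = max { F(i-1, j-1) + s(xi, yj),
    F(i-1, j) + GP, F(i, j-1) + GP }
  • Termination: F(M, N) is the optimal score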
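
The initialization and main iteration can be sketched in Python using the standard Needleman-Wunsch recurrence F(i, j) = max(F(i-1, j-1) + s(xi, yj), F(i-1, j) + GP, F(i, j-1) + GP). As a simplifying assumption, the scoring matrix s is reduced to a match/mismatch pair, and the name `nw_matrix` is hypothetical.

```python
def nw_matrix(x, y, match=1, mismatch=0, gap=-1):
    """Fill the Needleman-Wunsch score matrix F for strings x and y."""
    m, n = len(x), len(y)
    F = [[0] * (n + 1) for _ in range(m + 1)]
    # Initialization: aligning a prefix against nothing costs one gap per letter.
    for i in range(1, m + 1):
        F[i][0] = i * gap
    for j in range(1, n + 1):
        F[0][j] = j * gap
    # Main iteration: each cell depends on its diagonal, upper and left neighbours.
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(
                F[i - 1][j - 1] + s,  # x_i aligned to y_j
                F[i - 1][j] + gap,    # x_i aligned to a gap
                F[i][j - 1] + gap,    # y_j aligned to a gap
            )
    return F


F = nw_matrix("AACT", "ACG")
for row in F:
    print(row)
print("optimal score:", F[-1][-1])  # optimal score: 1
```

Each cell is computed once from three already-known neighbours, so filling the matrix takes time proportional to M × N.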

35
  • Finding the optimal alignment
  • Every non-decreasing path from (0, 0) to (M, N)
    corresponds to a global alignment of the two
    sequences.
  • Use TraceBackP, starting at (M, N), to trace back
    an optimal alignment. Each step is one of three
    cases:
      • Case 1: xi aligns to yj
      • Case 2: xi aligns to a gap
      • Case 3: yj aligns to a gap
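
The three traceback cases can be sketched as a function that walks a filled score matrix from (M, N) back to (0, 0). This is an assumption-laden illustration, not the slides' TraceBackP routine: it re-derives each move from the scores instead of storing pointers, prefers case 1 when several cases tie, and therefore returns only one of the optimal alignments. The matrix in the usage example is the one for AACT and ACG with match 1, mismatch 0 and gap penalty -1.

```python
def traceback(F, x, y, match=1, mismatch=0, gap=-1):
    """Recover one optimal global alignment from a filled score matrix F."""
    top, bottom = [], []
    i, j = len(x), len(y)
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if x[i - 1] == y[j - 1] else mismatch
            if F[i][j] == F[i - 1][j - 1] + s:
                # Case 1: x_i aligns to y_j.
                top.append(x[i - 1])
                bottom.append(y[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and F[i][j] == F[i - 1][j] + gap:
            # Case 2: x_i aligns to a gap.
            top.append(x[i - 1])
            bottom.append("-")
            i -= 1
        else:
            # Case 3: y_j aligns to a gap.
            top.append("-")
            bottom.append(y[j - 1])
            j -= 1
    # The alignment was built back to front, so reverse it.
    return "".join(reversed(top)), "".join(reversed(bottom))


F = [
    [0, -1, -2, -3],
    [-1, 1, 0, -1],
    [-2, 0, 1, 0],
    [-3, -1, 1, 1],
    [-4, -2, 0, 1],
]
print(traceback(F, "AACT", "ACG"))  # ('AACT', '-ACG')
```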

36
Global Alignment Example
         A   C   G
     0  -1  -2  -3
A   -1   1   0  -1
A   -2   0   1   0
C   -3  -1   1   1
T   -4  -2   0   1
  • Find the optimal global alignment of AACT and
    ACG.
  • Scoring rules: match = 1, mismatch = 0, gap
    penalty GP = -1

Optimal alignments:
Alignment 1 (score 1):
A A C T
- A C G
Alignment 2 (score 1):
A A C T
A - C G
37
Smith-Waterman Algorithm
  • Input: strings X and Y, and scoring rules: a
    scoring matrix s and a gap penalty GP.
  • Output: substrings of X and Y whose global
    alignment score, as defined by the scoring rules,
    is maximal among all global alignments of all
    substrings of X and Y.

38
  • Initialization: F(0, 0) = 0, F(i, 0) = 0,
    F(0, j) = 0 (i = 1..M, j = 1..N)
  • Main iteration: for each i = 1..M and j = 1..N,
    F(i, j) = max { 0, F(i-1, j-1) + s(xi, yj),
    F(i-1, j) + GP, F(i, j-1) + GP }
  • The largest value of F(i, j) is the score of the
    best local alignment of X and Y.
  • Traceback begins at the highest score in the
    matrix and continues until it reaches 0.
39
Local Alignment Example
         A   C   G
     0   0   0   0
A    0   1   0   0
A    0   1   1   0
C    0   0   2   1
T    0   0   1   2
  • Find the optimal local alignment of AACT and ACG.
  • Scoring rules: match = 1, mismatch = 0, gap
    penalty GP = -1
  • Solution: local alignment with score 2
      A C
      A C