Title: Interdisciplinary Introductory Course in Bioinformatics
- Yana Kortsarts, Computer Science Department
- Robert Morris, Biology Department
- Janine Utell, English Department
- Widener University, Chester, PA
What is Bioinformatics?
- Bioinformatics is a relatively new interdisciplinary field that integrates computer science, mathematics, biology, and information technology to manage, analyze, and understand biological, biochemical, and biophysical information.
- Bioinformatics is a computational science and a subset of the larger field of computational biology.
Motivation
- IS professionals must have strong analytical and critical thinking skills (IS 2002 Model Curriculum and Guidelines for Undergraduate Degree Programs in IS).
- Introducing bioinformatics to CIS students will strengthen these required skills.
- Equip students with some of the following capabilities, as suggested in the IS 2002 guidelines:
  - Creativity
  - Application of both traditional and new concepts and skills
  - Application development
  - Problem-solving abilities
  - Ability to communicate effectively (oral, written, and listening)
Motivation
- Provides opportunities for students to become familiar with Python, one of the most widely used scripting languages.
- Explores various data structures and algorithmic techniques traditionally not covered in other courses.
- Helps students make connections between theoretical topics learned in core CS and CIS courses, such as Data Structures and Algorithms, and apply their knowledge to real-world biology problems.
- Helps diversify the department's course offerings and provides interdisciplinary opportunities for CS and CIS students.
- A bioinformatics background enhances the employment qualifications of CIS and CS students in a competitive job market.
Challenges
- Students have different backgrounds
- Choosing a programming language
- Defining course prerequisites
- Defining course content
  - Programming
  - Algorithms
  - End-user bioinformatics tools
- Balanced course
  - Content
  - Hands-on/lecture
- Interdisciplinary nature
Course Development
- First iteration: Spring 05. Second iteration: Spring 08.
- Cross-listed upper-level technical elective.
- Prerequisites:
  - Biology/Chemistry/Biochemistry majors: Introduction to Molecular Biology
  - CS/CIS/MATH majors: Introduction to Computer Science I
  - Chemical Engineering majors: Computer Programming and Engineering Problem Solving
- Team teaching: Biology and Computer Science faculty
- 4 credits, 6 hr (4 CS, 2 Biology)
- Spring 05 enrollment: 6 Biology students, 6 CS/CIS
- Spring 08 enrollment: 6 CS/CIS
Course Objectives and Goals
- To integrate bioinformatics algorithms into the course and to teach the foundations of the algorithms and important results in bioinformatics.
- To introduce students to the Python programming language. The Biopython Project is an international association of developers of freely available Python tools for computational molecular biology.
Course Objectives and Goals
- To introduce students to the principles that drive an algorithm's design and to the intellectual content of bioinformatics.
- To provide an opportunity for interdisciplinary collaboration in the in-class assignments and the course project.
Course Curriculum
- Ethics, Computing and Genomics
- Project-oriented component, new for Spring 08
- 20% of the final grade; three weeks to work on this project
- Goal: to develop oral and written communication skills and to engage students in the knowledge exchange process
- Learning about the ethics, computing, and genomics topic independently and presenting the results of the self-learning.
- Students were assigned one or more scholarly articles from the collection Ethics, Computing, and Genomics, edited by Herman Tavani.
- Students were required to:
  - read the assigned essays
  - prepare 25-minute PowerPoint presentations with a summary of the paper and answers to the questions posed in the introductory part of the corresponding section
  - prepare a mini-quiz to assess their peers' understanding of the presented material.
Ethics, Computing and Genomics
- Collaborative work with an English faculty member
- The English faculty member gave a short presentation before students started to work on this assignment:
  - discussion of how to read critically and what questions to ask while reading the text
  - discussion of how to summarize the paper using the structure of the essay as a guide and elucidating key points and key moments of evidence while making connections to the rest of the class material
  - tips on writing the summary in three steps: prewriting, drafting, revising
  - discussion of how to design an effective presentation of information.
- The English faculty member was present at all oral presentations and provided detailed notes for each student, explaining ways the presentation could have been stronger and pointing out its positive and negative aspects.
- This successful and enjoyable experience showed the value of working with colleagues across disciplines to further student learning.
Course Curriculum: Introduction to Python
- Quickly introduce students to Python during the first few weeks of the course.
- Working on different problem-solving algorithmic techniques.
- Introductory topics: arithmetic, decision and loop structures, functions, and simple manipulations with strings, lists, tuples, and dictionaries.
- Advanced Python topics were taught later throughout the course, building students' knowledge and their ability to tackle real-world biology problems.
- The programming examples were all biology-oriented and motivated students to learn in order to solve practical problems.
Introduction to Python
- Spring 05: 6 biology students with no programming experience; 6 CS and CIS students with no experience in Python.
- 6 interdisciplinary teams; all concepts were practiced within the team.
- Spring 08: all students were CS and CIS majors with prior programming experience in C and Java, and some with introductory knowledge of Python.
- Special handouts were prepared to walk students through the introductory topics toward advanced Python concepts.
- Each topic was supported by a list of examples in increasing order of complexity.
- Students were required to run the proposed programs in order to gain an understanding of basic Python structures.
- To assess the understanding of each concept, students were required to write short programs solving biology-oriented problems.
Introduction to Python
- Examples of the problems, given here in increasing level of complexity:
  - computing the alignment score between two DNA sequences using different score matrices
  - finding the maximal alignment score if no internal gaps are allowed, using different score matrices
  - finding all occurrences of one sequence in another sequence
  - writing a program that reads a DNA sequence, transcribes the DNA into RNA, and prints the resulting RNA sequence, then translates the RNA into a protein sequence in two steps: first, the program divides the RNA into codons and prints the list of codons; second, the codons are translated into the protein using the genetic code table
  - finding the maximal alignment score if internal gaps are allowed, using different score matrices.
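The transcription/translation exercise and the occurrence-finding exercise listed above can be sketched in a few lines of plain Python. This is a minimal illustration, not the handout solution used in the course: the function names are hypothetical, the input is assumed to be the coding strand (so transcription is just T to U), translation is assumed to stop at the first stop codon, and the table is the standard genetic code written in the conventional U, C, A, G order.

```python
from itertools import product

# Standard genetic code in the conventional U, C, A, G codon order;
# "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna):
    """Assumption: dna is the coding strand, so transcription is T -> U."""
    return dna.upper().replace("T", "U")


def split_codons(rna):
    """Split RNA into complete codons, dropping a trailing 1-2 bases."""
    return [rna[i:i + 3] for i in range(0, len(rna) - 2, 3)]


def translate(codons):
    """Translate codons to one-letter amino acids, stopping at a stop codon."""
    protein = []
    for codon in codons:
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def find_all(text, pattern):
    """Start indices of every (possibly overlapping) occurrence of pattern."""
    return [i for i in range(len(text) - len(pattern) + 1)
            if text.startswith(pattern, i)]


rna = transcribe("ATGGCCTTTTAAGG")
codons = split_codons(rna)
print(rna)
print(codons)
print(translate(codons))
print(find_all("ATATAT", "ATA"))
```

Building the codon table from two strings keeps the example short; a hand-written dictionary of all 64 codons would work equally well and may be easier for beginners to read.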
Introduction to Python
- Spring 08: students had different levels of programming and computational experience, and the best way to cover this topic was through independent learning.
- Using handouts and the Python and Biopython tutorials (www.python.org, www.biopython.org), students each worked at their own pace.
- Grading rubrics for each programming concept: minimal requirements to pass the specific concept, and a list of more advanced examples for students with prior Python experience.
- Students with previous Python knowledge further advanced their experience, and students new to Python learned the new programming language independently using structured guidance.
- Python makes it possible to solve some problems very concisely, and students enjoyed trying to find the shortest solution to the proposed problems using Python functions and libraries.
Introduction to Bioinformatics Algorithms
- Sequence alignments, scoring, gaps
- Algorithm design techniques: exhaustive search, dynamic programming
- The Needleman and Wunsch Algorithm
- The Smith-Waterman Algorithm
- Introduction to BLAST
- Introduction to Multiple Sequence Alignments
- Visualization of algorithms
- ALGGEN EMBER Web resources
Introduction to Bioinformatics Algorithms
- The dynamic programming technique usually is not covered in a core algorithms course.
- Provided an opportunity to expand the theoretical background and to make connections between theory and practice.
- Helped to maintain an appropriate level of theoretical content required for upper-level elective courses in our department.
- This topic blended very well with the biology topics, and students had an opportunity to learn the concept of sequence alignment from both biology and computer science points of view.
- The EMBER website provides a suite of multimedia bioinformatics educational tools that can be used to create a set of hands-on activities to help students understand the dynamic programming technique in general and specific algorithms in particular.
Course Curriculum: Biology Topics
- Biological Research on the Web
- Public Biological Databases and Data Formats
- NCBI - National Center for Biotechnology Information
- Searching Biological Databases
- Review of Molecular Biology and Biochemistry Concepts
- DNA and protein structure
- Gene expression (transcription and translation)
- Molecular Biology Central Dogma
- Sequence Alignments
Hands-On Activities
- Microbes Count! BioQUEST Curriculum Consortium
- Exploring HIV Evolution: An Opportunity for Research.
- The HIV genome is very small and relatively simple. It is made up of nine genes and about 9,500 nucleotides.
- In this lab, students worked with HIV sequence data collected from 15 individuals from an intravenous-drug-using population in Baltimore.
- The goal of the study was to determine whether the HIV isolated from particular subgroups of subjects derives from a common source.
- CLUSTALW multiple sequence alignment tool
- Biology Workbench: http://workbench.sdsc.edu/
Biology Topics: Hands-On Activities
- Microarray lab developed by Campbell and Heyer and sold by Carolina Biologicals, called DNA Chips: Genes to Disease.
- Understanding how microarrays are used to identify gene changes in disease and the role of gene expression in cancer.
- Students compared the relative expression levels of six different genes in healthy lung cells and lung cancer cells.
- After completing the lab, students had an opportunity to discuss the significance of the relative expression levels with respect to the genes' roles in causing cancer.
Biology Topics: Hands-On Activities
- Epidemiology: the study of the distribution of diseases in populations.
- Explored factors that influence disease spread throughout populations with the software Epidemiology. Ebola was used as a model organism, and epidemiology was presented from both a microbiological and a social perspective.
- Exploration of the structure and function of insulin: generating a phylogenetic tree demonstrating the evolution of insulin among the vertebrates (animals with an internal skeleton made of bone).
Project: DNA Sequence Annotation
- Real data: Bacillus anthracis str. Ames project at the J. Craig Venter Institute
- Input: DNA about 50,000 nucleotides long; students worked on different sequences from the same organism.
- Project steps:
  - Find all potential genes and pseudo-genes in the input DNA sequence using start and end codons, and arrange the sequences found in two separate lists, potential genes (length greater than 300) and pseudo-genes (length less than 300), each in order of increasing length.
  - Locate the potential promoter in the given DNA sequence for each potential gene found in the first step, and calculate the strength of the promoter. A promoter is a region of DNA near the beginning of a gene that controls if and when the gene is actually expressed. Output: the list of potential genes in order of decreasing promoter strength.
  - BLAST all potential genes and pseudo-genes that were found, and analyze the results.
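The first project step (scanning for start and end codons, then splitting the hits by length) might look roughly like the sketch below. Everything specific here is an assumption rather than the project's actual rules: ATG as the only start codon, TAA/TAG/TGA as stop codons, the forward strand only, lengths counted in nucleotides including the start and stop codons, and a length of exactly 300 (which the description above does not cover) grouped with the pseudo-genes. The function names are hypothetical, and promoter scoring is not attempted because its definition is not given here.

```python
START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}
LENGTH_CUTOFF = 300  # nucleotides, start and stop codons included (assumption)


def find_candidates(dna):
    """Return (start_index, sequence) for each ATG...stop stretch.

    Scans the three reading frames of the forward strand only.
    """
    dna = dna.upper()
    found = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(dna):
            if dna[i:i + 3] == START_CODON:
                # Look for the first in-frame stop codon after the start.
                for j in range(i + 3, len(dna) - 2, 3):
                    if dna[j:j + 3] in STOP_CODONS:
                        found.append((i, dna[i:j + 3]))
                        i = j  # resume scanning after this stop codon
                        break
            i += 3
    return found


def split_by_length(candidates):
    """Split into (potential genes, pseudo-genes), each by increasing length."""
    by_length = sorted(candidates, key=lambda c: len(c[1]))
    genes = [c for c in by_length if len(c[1]) > LENGTH_CUTOFF]
    pseudo_genes = [c for c in by_length if len(c[1]) <= LENGTH_CUTOFF]
    return genes, pseudo_genes


# Toy input: one long candidate followed by one short candidate.
dna = "CC" + "ATG" + "GCC" * 100 + "TAA" + "ATGAAATGA"
genes, pseudo_genes = split_by_length(find_candidates(dna))
print([(start, len(seq)) for start, seq in genes])
print([(start, len(seq)) for start, seq in pseudo_genes])
```

Handling the complement strand would require an explicit rule for how positions and reading direction are reported, which is exactly the kind of detail a project description needs to state.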
Project
- Summary
  - For each potential gene and pseudo-gene: start position, length, promoter score, BLAST results, summary, and conclusion. For each sequence, we asked students to determine whether a potential gene could be a real gene based on the strength of the promoter and the BLAST results.
- 15-minute in-class presentations: the Python program, a description of all Python functions that were used and the purpose of each function, all algorithms and/or programming techniques, and the presentation and explanation of the summary results, including information about the specific organism whose DNA was used as the input.
- Spring 05: team project, with teams of CS/CIS and biology students. The programming part of the project was mostly done by the computer science students, and the biology students were required to understand and explain the programming techniques and algorithms that were used. The project provided an opportunity for truly interdisciplinary collaboration between computer science and biology students.
- Spring 08: individual project.
Course Results
- Spring 05: no formal assessment survey was conducted.
- An informal discussion about the course was held at the end of the semester, and we asked students to provide their feedback. Students completed teaching evaluations and provided their comments there as well.
- All students expressed satisfaction with the course, and we were very pleased to receive a request from almost all students to extend the programming component of the course. Biology students showed interest in programming and asked that an environment be created where they would be able to participate more fully in all stages of the course project.
- Spring 08: a short post-survey, which included a list of the topics covered in the course, was designed to assess the students' experience. We asked students to rate the level of learning for each topic on a scale of 1 (not well) to 5 (very well). Six students were enrolled in Spring 08.
Topic | Average
1. Introductory Python and ability to design simple Python programs | 3.7
2. Advanced Python topics: functions, loops, if-else statements, string manipulations, lists, and list manipulations | 3.5
3. Designing complex Python programs using advanced Python features | 3.3
4. Understanding the concept of sequence alignment: global, local, semi-global, multiple sequence alignment | 3.2
5. Understanding the dynamic programming algorithmic technique | 3.7
6. Understanding the Exhaustive Search (brute force) algorithmic technique | 4
7. Understanding the Needleman-Wunsch algorithm and being able to trace the algorithm to produce the final result | 3.8
8. Understanding the Smith-Waterman algorithm and being able to trace the algorithm to produce the final result | 3.8
9. The ability to work independently on a research-based project, applying computer science and biology knowledge to solve problems | 4.3
10. Understanding how to use the BLAST tool and to read the results of BLAST | 4.2
11. Using sequence alignments to understand relatedness among species | 3.8
12. Using sequence alignments forensically (HIV experiment) | 4.2
13. Understanding how microarrays are used to identify gene changes in disease | 3.3
14. Understanding the flow of information from DNA to protein | 3.3
15. Using computer simulations to test hypotheses about disease spread | 3.8
16. The ability to read a research paper in the Ethics, Computing and Genomics collection | 3.8
17. The ability to communicate effectively through participation in the Ethics, Computing and Genomics project | 4
18. The ability to create an informative PowerPoint presentation to present the results of the Ethics, Computing and Genomics project | 4.3
19. The ability to learn a topic independently and to present the results of that learning clearly | 3.8
Course Results
- All topics were learned at an above-average level.
- Some of the topics will require our special attention and should be revised for future iterations.
- The Ethics, Computing and Genomics component received positive feedback from most of the students.
- Comments regarding the course: most of the students mentioned that they loved the course and would recommend it to their peers. They expressed their satisfaction with the level of the course, the amount of material covered, and the depth of the coverage. They also mentioned that the final project was very interesting, but at the same time they asked for a more careful project description and clear rules for finding genes on the main and complement strings, to avoid confusion.
Future Plans
- Blend computer science and biology topics more effectively
- Guest speaker from the field
- The teaching approach will try to foster student learning through a research-based process.
- Further expand the programming and algorithms component
- Return to team work in the project in order to enhance the collaborative component of the course.
- More careful project description
- Bioinformatics across the computer science curriculum
  - Introduction to computer science
  - Design and analysis of algorithms
  - Programming for non-majors
References
- 1. An Introduction to Bioinformatics Algorithms, N. C. Jones and P. A. Pevzner, The MIT Press, 2004
- 2. Fundamental Concepts of Bioinformatics, D. E. Krane and M. L. Raymer, Benjamin Cummings, 2002
- 3. Developing Bioinformatics Computer Skills, C. Gibas and P. Jambeck, O'Reilly, 2001
- 4. Python/Biopython websites: http://python.org, http://biopython.org
- 5. ALGGEN EMBER Web Resources: http://alggen.lsi.upc.es/docencia/ember/frame-ember.html
- 6. Microbes Count! John R. Jungck, Ethel D. Stanley, Marion Field Fass. BioQUEST Curriculum Consortium. http://bioquest.org/microbescount/modules_by_tools.pdf
- 7. Heyer, Laurie J. and Campbell, A. Malcolm, Microarray Lab: DNA Chips: Genes to Disease. Carolina Biologicals
- 8. Campbell, Neil, and Reece, Jane (2004), Biology, 7th edition, Benjamin Cummings
BLAST
- The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences.
- The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches.
- BLAST can be used to infer functional and evolutionary relationships between sequences as well as to help identify members of gene families.
- Introduced by S. Altschul, W. Gish, W. Miller, E. Myers, and D. Lipman in the early 1990s
- The original BLAST algorithm searches a sequence database for maximal ungapped local alignments.
Sequence Alignment
- Global alignment: compares two sequences in their entirety; the gap penalty is assessed regardless of whether gaps are located internally within a sequence or at the end of one or both sequences.
  - The Needleman and Wunsch Algorithm
- Local alignment: finds the best-matching subsequences within the two search sequences.
  - The Smith-Waterman Algorithm
Sequence Alignment
- Semi-global alignment: different treatment of terminal (end) gaps. Terminal gaps are usually the result of incomplete data and do not have biological significance. Example: searching for the best alignment between a short sequence and an entire genome.
  - Modification of the Needleman and Wunsch Algorithm
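One common way to modify the global dynamic-programming scheme for the "short sequence against a long one" case is to make gaps before and after the matched stretch of the long sequence free. The sketch below is only one such variant, under stated assumptions: the whole short sequence must be aligned, the first row is initialized to zero, the answer is the best value in the last row, and scoring is match 1, mismatch 0, gap -1. The function name is hypothetical.

```python
def semi_global_score(short, long_seq, match=1, mismatch=0, gap=-1):
    """Best score of aligning all of `short` against any stretch of `long_seq`.

    Terminal gaps (unused prefix and suffix of `long_seq`) are not penalized.
    """
    m, n = len(short), len(long_seq)
    # Row 0 is all zeros: skipping a prefix of long_seq costs nothing.
    prev = [0] * (n + 1)
    for i in range(1, m + 1):
        # Column 0: characters of `short` aligned to gaps are still penalized.
        cur = [prev[0] + gap] + [0] * n
        for j in range(1, n + 1):
            s = match if short[i - 1] == long_seq[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + s,   # short[i-1] aligned to long_seq[j-1]
                         prev[j] + gap,     # short[i-1] aligned to a gap
                         cur[j - 1] + gap)  # long_seq[j-1] aligned to a gap
        prev = cur
    # Free trailing gaps: the alignment may end at any column of the last row.
    return max(prev)


print(semi_global_score("ACG", "TTACGTT"))
print(semi_global_score("AG", "TTACGTT"))
```

Only two rows of the matrix are kept because this sketch reports the score alone; recovering the alignment itself would need the full matrix and a traceback.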
Algorithm Design Techniques
- Exhaustive search (brute force) algorithm: examines every possible alternative to find one particular solution.
- Dynamic programming algorithm: breaks the problem into smaller sub-problems and uses the solutions of the sub-problems to construct the solution of the larger problem.
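As a small illustration of exhaustive search, the sketch below finds the best alignment of two sequences when no internal gaps are allowed by simply trying every relative shift of one sequence against the other. The assumptions are match 1, mismatch 0, and no penalty for the unaligned overhang at either end; the function names are hypothetical.

```python
def ungapped_score(x, y, match=1, mismatch=0):
    """Score the overlapping positions of x and y, compared column by column."""
    return sum(match if a == b else mismatch for a, b in zip(x, y))


def best_ungapped_shift(x, y, match=1, mismatch=0):
    """Try every shift of y relative to x; return (best_score, shift).

    shift >= 0 means y starts at x[shift]; shift < 0 means x starts at
    y[-shift]. Overhanging ends are ignored (assumption: no end penalty).
    """
    best = None
    for shift in range(-(len(y) - 1), len(x)):
        if shift >= 0:
            score = ungapped_score(x[shift:], y, match, mismatch)
        else:
            score = ungapped_score(x, y[-shift:], match, mismatch)
        if best is None or score > best[0]:
            best = (score, shift)
    return best


print(best_ungapped_shift("AACT", "ACG"))
```

With internal gaps allowed, the number of alternatives grows too quickly for this approach, which is where dynamic programming becomes the practical choice.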
Needleman and Wunsch Algorithm
- Input: two strings $X = x_1 \dots x_M$ and $Y = y_1 \dots y_N$, and scoring rules: scoring matrix $s$ and gap penalty $GP$
- Output: an alignment of X and Y whose score, as defined by the scoring rules, is maximal among all possible alignments of X and Y
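- Let $F(i, j)$ be the optimal score of aligning $x_1 \dots x_i$ and $y_1 \dots y_j$.
- Initialization: $F(0, 0) = 0$, $F(i, 0) = i \cdot GP$, $F(0, j) = j \cdot GP$ ($i = 1 \dots M$, $j = 1 \dots N$); with $GP = -1$ these are $-i$ and $-j$.
- Main iteration: for each $i = 1 \dots M$ and $j = 1 \dots N$,
  $F(i, j) = \max \{ F(i-1, j-1) + s(x_i, y_j),\ F(i-1, j) + GP,\ F(i, j-1) + GP \}$
- Termination: $F(M, N)$ is the optimal score.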
- Finding the optimal alignment
  - Every non-decreasing path from $(0, 0)$ to $(M, N)$ corresponds to a global alignment of the two sequences.
  - Use TraceBackP starting at $(M, N)$ to trace back an optimal alignment:
    - case 1: $x_i$ aligns to $y_j$
    - case 2: $x_i$ aligns to a gap
    - case 3: $y_j$ aligns to a gap
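The fill and traceback steps can be sketched in Python as follows. This is a minimal illustration under assumptions, not the course's reference code: it uses the standard recurrence in which F(i, j) is the maximum of the diagonal cell plus s(x_i, y_j), the cell above plus GP, and the cell to the left plus GP; the scoring matrix is reduced to a match/mismatch pair; ties in the traceback are broken in the order case 1, case 2, case 3, so only one of possibly several optimal alignments is returned. The function name is hypothetical.

```python
def needleman_wunsch(x, y, match=1, mismatch=0, gap=-1):
    """Return (optimal score, aligned x, aligned y) for a global alignment."""
    m, n = len(x), len(y)

    def s(a, b):
        return match if a == b else mismatch

    # Initialization: aligning a prefix against nothing costs one gap per symbol.
    F = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        F[i][0] = i * gap
    for j in range(1, n + 1):
        F[0][j] = j * gap

    # Main iteration.
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            F[i][j] = max(F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),
                          F[i - 1][j] + gap,
                          F[i][j - 1] + gap)

    # Traceback from (M, N) to (0, 0), re-deriving which case produced each cell.
    row_x, row_y = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s(x[i - 1], y[j - 1]):
            row_x.append(x[i - 1]); row_y.append(y[j - 1])   # case 1
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            row_x.append(x[i - 1]); row_y.append("-")        # case 2
            i -= 1
        else:
            row_x.append("-"); row_y.append(y[j - 1])        # case 3
            j -= 1
    return F[m][n], "".join(reversed(row_x)), "".join(reversed(row_y))


score, top, bottom = needleman_wunsch("AACT", "ACG")
print(score)
print(top)
print(bottom)
```

Recomputing the winning case during the traceback avoids storing a separate pointer matrix; keeping explicit pointers is an equally valid design and makes it easier to enumerate all optimal alignments.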
Global Alignment Example
- Find the optimal global alignment of AACT and ACG.
- Scoring rules: match 1, mismatch 0, gap penalty GP = -1

          A    C    G
     0   -1   -2   -3
A   -1    1    0   -1
A   -2    0    1    0
C   -3   -1    1    1
T   -4   -2    0    1

- Optimal alignments:
  Alignment 1 (score 1):
    A A C T
    - A C G
  Alignment 2 (score 1):
    A A C T
    A - C G
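The score of a given alignment can be checked directly by summing column scores. The short sketch below assumes the scoring rules of this example (match 1, mismatch 0, gap -1) and writes each alignment row as a string with "-" for a gap; the function name is hypothetical.

```python
def alignment_score(row_x, row_y, match=1, mismatch=0, gap=-1):
    """Sum the column scores of an alignment written as two equal-length rows."""
    if len(row_x) != len(row_y):
        raise ValueError("alignment rows must have the same length")
    total = 0
    for a, b in zip(row_x, row_y):
        if a == "-" or b == "-":
            total += gap        # a symbol aligned to a gap
        elif a == b:
            total += match
        else:
            total += mismatch
    return total


# AACT over A-CG: match, gap, match, mismatch.
print(alignment_score("AACT", "A-CG"))
# A non-optimal alignment of the same strings, for comparison.
print(alignment_score("AACT", "ACG-"))
```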
Smith-Waterman Algorithm
- Input: strings X and Y, and scoring rules: scoring matrix $s$ and gap penalty $GP$.
- Output: substrings of X and Y whose global alignment score, as defined by the scoring rules, is maximal among all global alignments of all substrings of X and Y.
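- Initialization: $F(0, 0) = 0$, $F(i, 0) = 0$, $F(0, j) = 0$ ($i = 1 \dots M$, $j = 1 \dots N$)
- Main iteration: for each $i = 1 \dots M$ and $j = 1 \dots N$,
  $F(i, j) = \max \{ 0,\ F(i-1, j-1) + s(x_i, y_j),\ F(i-1, j) + GP,\ F(i, j-1) + GP \}$
- The largest value of $F(i, j)$ is the score of the best local alignment of X and Y.
- Traceback begins at the highest score in the matrix and continues until a cell with score 0 is reached.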
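A compact Python sketch of the local-alignment procedure follows. It is an illustration under assumptions rather than a definitive implementation: each cell is taken as the maximum of 0, the diagonal cell plus s(x_i, y_j), the cell above plus GP, and the cell to the left plus GP; the scoring matrix is reduced to a match/mismatch pair; when several cells share the highest score, the first one in row-by-row order is used as the traceback start. The function name is hypothetical.

```python
def smith_waterman(x, y, match=1, mismatch=0, gap=-1):
    """Return (best local score, aligned piece of x, aligned piece of y)."""
    m, n = len(x), len(y)

    def s(a, b):
        return match if a == b else mismatch

    # Row 0 and column 0 stay at 0 (initialization).
    F = [[0] * (n + 1) for _ in range(m + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            F[i][j] = max(0,
                          F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),
                          F[i - 1][j] + gap,
                          F[i][j - 1] + gap)
            if F[i][j] > best:          # strict: keep the first highest cell
                best, best_pos = F[i][j], (i, j)

    # Traceback from the highest cell until a cell with score 0 is reached.
    row_x, row_y = [], []
    i, j = best_pos
    while i > 0 and j > 0 and F[i][j] > 0:
        if F[i][j] == F[i - 1][j - 1] + s(x[i - 1], y[j - 1]):
            row_x.append(x[i - 1]); row_y.append(y[j - 1])
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j] + gap:
            row_x.append(x[i - 1]); row_y.append("-")
            i -= 1
        else:
            row_x.append("-"); row_y.append(y[j - 1])
            j -= 1
    return best, "".join(reversed(row_x)), "".join(reversed(row_y))


print(smith_waterman("AACT", "ACG"))
```

The only differences from the global version are the zero initialization, the extra 0 in the maximum, and the start and stop conditions of the traceback.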
Local Alignment Example
- Find the optimal local alignment of AACT and ACG.
- Scoring rules: match 1, mismatch 0, gap penalty GP = -1

         A   C   G
     0   0   0   0
A    0   1   0   0
A    0   1   1   0
C    0   0   2   1
T    0   0   1   2

- Solution: local alignment with score 2
    A C
    A C