
Title: SEQUENCE RETRIEVAL SYSTEM


1
  • SEQUENCE RETRIEVAL SYSTEM
  • SRS
  • Ashwin Sivakumar,
    02/12/03
  • Hands on Workshop on Protein
    Analysis (HOW)

2
http://srs.ebi.ac.uk
Database information: which databases are present, when
indexed
3
What is SRS?
  • Central resource for molecular biology data
  • Data retrieval system
  • - more than 250 databanks have been indexed. More
    than 35 SRS servers over the WWW
  • Data analysis applications server
  • - 11 protein applications
  • - 6 nucleic acid applications
  • Uniform query interface on the web

4
Data Jungle
Sequencing information
genetics
Structural biology
molecular biology
medicine
physiology
toxicology
gene expression
5
History of SRS
  • 1990 - Main author Dr. Thure Etzold
  • Development started in EMBL, Heidelberg
  • 1997
  • Moved to EBI in Cambridge. Development work was
    supported by various grants amongst others from
    the EMBnet.
  • 1998
  • Etzold and his group join LionBiosciences

6
Why SRS?
  • Information retrieval
  • Easy way to retrieve information from sequence
    and sequence-related databases
  • Possibility to search for multiple words/other
    criteria
  • Linkage between different databases
  • E.g. Find all primary structures with known
    three-dimensional structure
  • ... and much more

7
Philosophy of SRS
Original database file -plain text, html,
xml
8
Temporary Projects
  • Queries and views are stored by the project
    manager temporarily
  • Temporary sessions last 24 hours
  • Useful when you
  • Do not need to keep your results
  • look something up quickly
  • Run an occasional application
  • Click on Start paw on SRS start page

9
Permanent Projects
  • Queries and views are stored by the project
    manager in a single location
  • They are available for use in the future
  • Useful when
  • You want to return to a session
  • Want to have many projects in the same session
  • Begin by clicking Permanent session paw on SRS
    start page
  • Just need to enter an SRS user name and re-enter
    this to return to same session again later

10
The Library Select Page
11
SRS main toolbar tabs
  • Top Page: displays databases in different
    database groups
  • Query: displays either the standard or extended
    query form
  • Results (the query manager): maintains a
    history of all the results obtained during a
    session
  • Projects (the project manager): maintains a
    history of all queries and views used during a
    session
  • Views: allows a user to define a user-specific
    view for one or more databases
  • Databanks: contains a list of, and some facts about,
    the databases available in the system

12
Search terms in SRS
  • SRS indexed fields can be searched using any of
    the following
  • Single word search
  • Multiple word phrases
  • Numbers and dates
  • Regular expressions
  • Wildcards

13
Search methods
  • Quick search button
  • Works by searching all datafields of type text
  • The quickest way to generate query results
  • For very general/broad searches
  • Example: get all mouse and mouse-related proteins
    in SWISS-PROT
  • All Entries button
  • Returns all entries in the database selected
  • Search forms allow you to specify your area of
    interest in more detail
  • Standard query form
  • Extended query form

14
Standard query form
  • Enter up to 4 separate search terms against up to
    4 datafields simultaneously
  • Combine entries with logical operators (& and, | or,
    ! butnot)
  • Choose the number of entries to display per page
  • Retrieve entries of type (entry or
    subentry(name))
  • Choose a view
  • use an SRS predefined view
  • create one of your own by selecting specific
    fields from a dropdown menu (and choose whether
    to view a list or table in SRS7)

15
The Standard Query Page
16
Extended query form
  • Can enter search terms for as many fields as you
    want
  • Combine searches with logical operators (& and,
    | or, ! butnot)
  • Choose how many results to display per page
  • Choose view and sequence format to use
  • Can choose an SRS predefined view
  • Define your own view by clicking the boxes next
    to the fields that you want to have displayed
    (list or table option in SRS7)
  • Each field name has a hyperlink to the
    description page for that field
  • Form provides less than (<) and greater than (>)
    for numerical fields
  • Choose what type of entries to retrieve (entry,
    subentry (name))
  • on extended form if you query a subentry field,
    it defaults to returning results of type subentry

17
Extended query page
Predefined views
User defined view
Fields
18
Differences in these 2 forms
  • Ranges
  • standard must use :
  • extended provides < and >
  • Type retrieval
  • standard defaults to retrieving entries of type
    entry
  • extended defaults to retrieving entries of type
    entry unless you query a subentry field in which
    case the default is the subentry type
  • Controlled vocabulary fields
  • standard does not provide you with a list for
    these fields
  • extended provides a drop down menu for these
    fields allowing you to select an option

19
Wildcards
  • These are useful when
  • Searching for a group of words (e.g. words
    starting with cell and ending with ase: cell*ase)
  • If unclear about how a word is spelt in a
    database
  • Two types
  • * one or more characters of any value
  • ? single character of any value
  • Any number of wildcards can be placed anywhere in
    a search word
  • Placing a wildcard at the start of a word or
    string may increase response time because all
    words in the index have to be checked against the
    string
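
As a rough illustration of how the two wildcards behave, the sketch below filters a word list with Python's standard `fnmatch` module. The word list and the helper name `match_terms` are invented for this example, and `fnmatch` is only a stand-in for SRS's own matching: in `fnmatch`, `*` also matches zero characters, which is slightly more permissive than the "one or more" stated on the slide.

```python
from fnmatch import fnmatchcase

# Toy index terms, invented for this illustration.
terms = ["cellulase", "cellobiase", "cell", "kinase", "smith", "smyth"]

def match_terms(pattern, words):
    """Return the words matching a shell-style wildcard pattern."""
    return [w for w in words if fnmatchcase(w, pattern)]

# Words starting with 'cell' and ending with 'ase'.
print(match_terms("cell*ase", terms))

# '?' stands for exactly one character.
print(match_terms("sm?th", terms))
```

The second call shows the "unsure of the spelling" use case: one pattern covers both smith and smyth.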

20
Regular expressions
  • NB: Must appear within forward slashes (/)
  • Some operators
  • ^ marks the start of a string: /^glu/
    begins with glu
  • $ marks the end of a string: /ase$/
    ends with ase
  • . dot is any single character
  • [ ] characters in square brackets are regarded as
    a set, any of which can be matched
  • [0-9] specifies a range of 0 to 9
  • * the preceding group may be repeated zero or
    more times
  • + the preceding group may be repeated one or more
    times
  • ? the preceding character/group occurs one or
    zero times

21
Some examples
  • /^glu/ will find terms
    beginning with glu
  • /ase$/ will find terms
    ending with ase
  • /c.t/ will find the
    words cat, cot, cut.
  • /^c.*t$/ will find terms
    beginning with c,
    then any number of
    characters, and ending with t
  • /sm[iy]th/ will find the words
    smith or smyth
  • /rho[1-9]/ will find the word rho
    followed by a number from 1-9
  • /mue?ller/ will find muller or
    mueller
  • NB. The * symbol has two meanings
  • - within forward slashes (/) it means the
    preceding group may be
    repeated zero or more times
  • - outside forward slashes it is a wildcard that
    matches any characters
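
The operators listed above also exist in Python's `re` module, so the examples can be tried out locally. This is only a sketch: it assumes the SRS regular-expression dialect agrees with Python for these basic operators, and the word list and the helper `find` are invented here.

```python
import re

# Toy terms, invented for this illustration.
words = ["glucose", "kinase", "cat", "cot", "cut", "cart",
         "smith", "smyth", "rho1", "rho12", "rhoA", "muller", "mueller"]

def find(pattern, candidates=words):
    """Return the candidates in which the pattern is found anywhere."""
    rx = re.compile(pattern)
    return [w for w in candidates if rx.search(w)]

print(find(r"^glu"))       # terms beginning with glu
print(find(r"ase$"))       # terms ending with ase
print(find(r"^c.t$"))      # c, any single character, t
print(find(r"^c.*t$"))     # c, any number of characters, t
print(find(r"sm[iy]th"))   # set of alternatives for one position
print(find(r"rho[1-9]"))   # rho followed by a digit from 1 to 9
print(find(r"mue?ller"))   # optional e
```

Note that without `^` and `$` a pattern matches anywhere inside a term, which is why `rho[1-9]` also finds `rho12`.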

22
Numerical ranges
  • In a numerical index it is possible to search
    numerical ranges
  • - sequence lengths, mol. weights, dates.
  • The : is used for specifying ranges and !
    for excluding values
  • 400:500 all seq. with lengths between 400 and
    500
  • 400: all seq. with lengths greater
    than 400
  • :500 all seq. with lengths less than
    500
  • 400:!500 all seq. with lengths between 400 and
    500 excluding 500
  • Can combine ranges using logical operators
  • 300:!400 | !500:600 or 300:600 !
    400:500
  • Dates in SRS have 2 formats
  • YYYYMMDD 20021205
  • DD-MMM-YYYY 05-Dec-2002
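
To make the range notation concrete, here is a minimal sketch of a parser for it. It assumes ranges are written `lo:hi`, that either bound may be left out, that bounds are inclusive, and that a `!` before a bound excludes that value. The function name `parse_range` is hypothetical and is not part of SRS.

```python
def parse_range(expr):
    """Turn a range such as '400:!500' into a membership test.

    Assumed syntax: 'lo:hi', either bound optional, bounds inclusive,
    '!' before a bound excludes that value.
    """
    lo_text, hi_text = expr.replace(" ", "").split(":")
    lo_excluded = lo_text.startswith("!")
    hi_excluded = hi_text.startswith("!")
    lo_digits = lo_text.lstrip("!")
    hi_digits = hi_text.lstrip("!")
    lo = int(lo_digits) if lo_digits else None
    hi = int(hi_digits) if hi_digits else None

    def contains(n):
        if lo is not None and (n < lo or (lo_excluded and n == lo)):
            return False
        if hi is not None and (n > hi or (hi_excluded and n == hi)):
            return False
        return True

    return contains

in_range = parse_range("400:!500")
print(in_range(400), in_range(499), in_range(500))

# The two combined forms describe the same set of lengths.
a, b = parse_range("300:!400"), parse_range("!500:600")
c, d = parse_range("300:600"), parse_range("400:500")
print(all((a(n) or b(n)) == (c(n) and not d(n)) for n in range(250, 650)))
```

The last check confirms that "range OR range" and "range BUTNOT range" are two ways of writing the same query.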

23
Some examples
  • Find entries with sequences having length between
    300 and 400 excluding 400, and between 500 and 600
    excluding 500
  • 300:!400 | !500:600 or 300:600 ! 400:500
  • Find entries that were created in the first half
    of 2001
  • 01-jan-2001:30-jun-2001 or
    20010101:20010630
  • Find all entries updated since May this year
  • 01-may-2002: or
    20020501:
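
The two date formats carry the same information, so one can be converted to the other mechanically. The sketch below is an illustration only: the helper names `to_compact` and `date_range` are invented, and the `:` range separator is assumed to work for dates as it does for numbers.

```python
from datetime import date

MONTHS = {name: number for number, name in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"], start=1)}

def to_compact(text):
    """Convert 'DD-MMM-YYYY' (e.g. '05-Dec-2002') to 'YYYYMMDD'."""
    day, month, year = text.split("-")
    # Building a date object rejects impossible dates such as 31-Feb.
    parsed = date(int(year), MONTHS[month.lower()], int(day))
    return parsed.strftime("%Y%m%d")

def date_range(start, end=None):
    """Build 'start:end' in compact form; open-ended when end is None."""
    return to_compact(start) + ":" + (to_compact(end) if end else "")

print(to_compact("05-Dec-2002"))
print(date_range("01-jan-2001", "30-jun-2001"))
print(date_range("01-may-2002"))
```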

24
SRS Indexing
  • SRS indexes database records using a word by
    word approach.
  • - DE Human glutathione transferase
  • The SRS description index will contain terms
    human, glutathione and transferase.
  • (&) AND human & glutathione & transferase
  • (|) OR human | glutathione | transferase
  • (!) BUTNOT human ! glutathione ! transferase
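
Word-by-word indexing and the three logical operators can be pictured as set operations on an inverted index. The sketch below is a toy model, not SRS's implementation: the entry IDs and description lines are made up, and Python's set operators `&`, `|` and `-` stand in for AND, OR and BUTNOT.

```python
from collections import defaultdict

# Made-up description lines keyed by made-up entry IDs.
descriptions = {
    "E1": "Human glutathione transferase",
    "E2": "Mouse glutathione transferase",
    "E3": "Human serum albumin",
    "E4": "Human acetyl transferase",
}

def build_index(records):
    """Map each lower-cased word to the set of entries containing it."""
    index = defaultdict(set)
    for entry_id, text in records.items():
        for word in text.lower().split():
            index[word].add(entry_id)
    return index

index = build_index(descriptions)
human = index["human"]
glutathione = index["glutathione"]
transferase = index["transferase"]

print(sorted(human & glutathione & transferase))    # AND
print(sorted(human | glutathione | transferase))    # OR
print(sorted(human - glutathione - transferase))    # BUTNOT
print(sorted((glutathione & transferase) - human))  # AND, then BUTNOT
```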

25
human glutathione transferase
EMBL
transferase
glutathione & transferase ! human
human & transferase ! glutathione
26
Databanks information page
  • Lists the databases available in the system and a
    summary about them
  • Number of entries in the database
  • Date it was indexed
  • Group it belongs to
  • Its availability status
  • Hyperlinks to information page specific to each
    database

27
Databanks Information Page
28
Database information page
  • Provides a detailed description about the
    database contents, source, ftp site, literature
  • Lists information about the fields that are
    present in the database including
  • Name of field
  • Short name for field
  • Type of field
  • index: it is indexed
  • num: indexed and a numerical field
  • id: unique field
  • show: not indexed, just for display
  • Number of keys for that field
  • Date it was indexed
  • Lists databases that it is linked to and how many
    entries are linked respectively

29
PROSITE information page
30
Browsing indices
  • This gives information on what is being indexed
    for a particular field
  • Single words, multiple words, controlled
    vocabulary..
  • To browse an index go to the information page for
    a particular field from a certain database
  • If you want to look at all indexed terms use *
  • If you want all terms beginning with trans use
    trans*
  • If you want all terms containing the string trans
    use *trans*
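
The difference between a prefix search and a "contains" search can be sketched with a sorted list of terms. This assumes, purely for illustration, that an index behaves like a sorted word list; the terms and function names are invented. A prefix can be located by binary search, while a leading wildcard forces a check of every term, which matches the earlier remark that a wildcard at the start of a word may increase response time.

```python
from bisect import bisect_left

# Toy index terms, invented for this illustration.
terms = sorted(["kinase", "transcription", "transferase",
                "translation", "retrotransposon", "zinc"])

def with_prefix(sorted_terms, prefix):
    """Terms starting with prefix, located by binary search."""
    start = bisect_left(sorted_terms, prefix)
    result = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # sorted order: no later term can match
        result.append(term)
    return result

def containing(sorted_terms, fragment):
    """Terms containing fragment anywhere; every term is checked."""
    return [term for term in sorted_terms if fragment in term]

print(with_prefix(terms, "trans"))
print(containing(terms, "trans"))
```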

31
Browsing the description field index for terms
beginning with trans...
32
Query manager
  • Found under the results tab
  • Saves a history of results obtained in the
    session
  • Page allows you to return to previous results
    and
  • Combine them using logical operators thus
    allowing you to perform a multi-step query
  • Use a different view to display them
  • Perform further actions: link, save, delete

33
The Query Manager
34
Project manager
  • Found under the projects tab
  • Saves a history of queries performed in the
    session
  • Can upload/download SRS session files from a
    desktop
  • In a permanent session, the project manager can
    also
  • Manage numerous SRS projects at the same time
  • Move queries/views between projects
  • Upload/download projects to desktop
  • Delete projects

35
Project manager page
36
User owned databanks
  • Found in the category user owned databanks on
    top page
  • User can upload their own nucleotide or protein
    sequence data into a user owned database
  • sequences must be in fasta format
  • any number of sequences can be uploaded
  • database is specific to the individual and to the
    session
  • Can launch applications on database sequences
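
Since uploaded sequences must be in FASTA format, it can help to check a file locally before uploading. The following is a minimal sketch of a FASTA reader; the function name `parse_fasta` and the example sequences are made up, and the checks SRS itself performs on upload are not described in this presentation.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
        else:
            raise ValueError("sequence data found before the first '>' header")
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

# Invented sequences, for illustration only.
example = """>seq1 hypothetical example protein
MKTAYIAKQR
QISFVKSHFS
>seq2 another made-up sequence
ACDEFGHIK
"""

for name, sequence in parse_fasta(example):
    print(name, len(sequence))
```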

37
User owned data
38
  • Paste or upload a file
  • Fasta formatted files
  • Any number of sequences
  • Maintained throughout user session

39
Operations on results
  • Linking: link results to other databases
  • Saving: save results in different formats to the
    browser or a file
  • Viewing: view results using different formats
  • Sequence analysis: launch applications on the
    results
  • SRS6: 11 protein applications, 6 nucleic acid
    apps.
  • SRS7: more than 100 applications available

40
The Results Page
Operations
41
SRS6 versus SRS7
  • SRS7 provides over 100 applications while SRS6
    provides 17
  • You can retrieve results in either list or table
    format in SRS7
  • In SRS6 only the table format is available
  • Current EBI version: 7.1.1

42
SRS6 -- first view
Start a new session by clicking here.
43
Top page
  1. Select one or more databases by ticking the
    corresponding box
  1. Select type of query form

44
Different types of database in SRS
  • Sequence and structure
  • DNA, protein, three-dimensional structures
  • Sequence-related
  • Gene-related
  • Genome, mapping, mutations, transcription factors
  • SNP
  • Bibliographic
  • Medline, enzyme
  • User-defined

45
Standard query form
  1. Select field to search
  1. Type text to search for
  1. Select AND or OR if multiple search items are used
  1. Select number of results to show at a time
  1. Submit query

46
Query result -- table mode
  1. Accession number, description and sequence length
  1. Link sequences to other databases
  1. Mode of viewing can be changed
  1. Hypertext links
  1. Possibility to analyse sequences with other
    tools, e.g. FastA and ClustalW
  1. Tick boxes to select/deselect sequences for
    further analyses

47
Example query
  • Use SRS to answer the following question: For
    which short-chain dehydrogenases/reductases
    (SDR) is the three-dimensional structure known
    in PDB?

48
Example, Query form
  1. Enter the search term sdr
  1. Enter in which field to search

49
Example, Query result
  • Press the button Link in order to get to the Link
    page

50
Link page
  1. You can link in three different ways
  1. In this case, we select to link to PDB
  1. Then we select chunk size and view mode
  1. Finally, we press the Submit link button
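
Conceptually, linking follows cross-references from the entries in a result set to entries in another databank. The sketch below models that with a dictionary. All identifiers are placeholders invented here, and the two functions show only two plausible readings of "linking"; the three link modes offered on the SRS Link page are not spelled out in this presentation.

```python
# Hypothetical cross-references from protein entries to structure entries.
xrefs = {
    "PROT_A": ["STRUCT_1", "STRUCT_2"],
    "PROT_B": [],
    "PROT_C": ["STRUCT_3"],
}

def link(result_ids, cross_refs):
    """Target-databank entries referenced by any entry in result_ids."""
    linked = set()
    for entry_id in result_ids:
        linked.update(cross_refs.get(entry_id, []))
    return sorted(linked)

def with_links(result_ids, cross_refs):
    """Entries in result_ids that have at least one cross-reference."""
    return [entry_id for entry_id in result_ids if cross_refs.get(entry_id)]

query_result = ["PROT_A", "PROT_B"]
print(link(query_result, xrefs))        # structures reachable from the result
print(with_links(query_result, xrefs))  # result entries with a known structure
```

For the example question, the second function is the closer match: it keeps only those query hits for which a structure entry exists.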

51
Link results
52
Example of a Swissprot entry
53
Example of a Swissprot entry, cont.
  • Click this link to get to the corresponding
    Medline entry (in PubMed)

54
PubMed entry
  • By clicking this link, you have the possibility
    to download the electronic version of the article.

55
The Top page tab
56
The Query tab
57
The Results tab
58
The Sessions tab
59
The Views tab
60
The Databanks tab
61
Acknowledgements
  • Bengt Persson
  • MBB, Karolinska institutet (demos)
  • 2can tutorial on SRS at EBI
  • http://downloads.lionbio.co.uk/publicsrs.html
    (The latest SRS server list)

62
server breakup
  • srs.sanger.ac.uk (5)
  • srs.ebi.ac.uk (5)
  • srs.csc.fi (5)
  • titanic.thep.lu.se/srs71/ (5)
  • If you think the load on a server is slowing your
    query, choose an alternative server to practice on.