Title: Lecture 3.1 MySQL and JDBC
1Lecture 3.1 MySQL and JDBCLets talk databases
- Sohrab Shah
- UBC Bioinformatics Centre
- sohrab_at_bioinformatics.ubc.ca
- http//bioinformatics.ubc.ca/people/sohrab
2Objectives
- Learn the basics of relational databases
- Learn how to use MySQL
- Learn how to use the Structured Query Language
(SQL) - Learn to communicate with MySQL through the Java
Database Connectivity (JDBC) protocol
3Outline
- Why are databases important in bioinformatics?
- Brief background in databases
- Introduction to the Structured Query Language
- A worked example a Sequence database in MySQL
- The JDBC protocol using Connector/J for MySQL
4MySQL
- Relational Databases
- Advantages
- Representing data
- SQL
- Examples
- Sequence
- INSERT
- DELETE
- UPDATE
- SELECT
- JOIN
- MySQL
- Free
- Fast
- Functional
- MySQL in bioinformatics
- Ensembl
- Gene Ontology
- UCSC
- JDBC
- Connector/J
- Making a connection
- Preparing a query
- Executing a query
- Getting results
5What is a database?
- Collection of information
- Spreadsheet
- Filing cabinet
- Oracle database
- Biology is abound with collections of data
- Tsunami, deluge, avalanche, flood
- Databases help us efficiently organise, integrate
and query data in order to make scientific
inferences
http//bioteach.ubc.ca
6Databases and bioinformatics
- Nucleotide records
36,653,899 - Protein sequences
4,436,362 - 3D structures
19,640 - Interactions complexes
52,385 - Human Unigene Cluster
118,517 - Maps and Complete Genomes
6,948 - Different taxonomy Nodes
283,121 - Human dbSNP
13,179,601 - Human RefSeq records
22,079 - bp in Human Contigs gt 5,000 kb (116)
2,487,920,000 - PubMed records
12,570,540 - OMIM records
15,138
7Molecular biology needs databases!
- High volume complex data structures
-
HELP!
8NAR Database Issue - 2004
142 articles
9RELATIONAL DATABASES
10Relational Databases
- A brief history
- Developed by E.F. Codd (IBM) 1969-70
- Died 2003
- Awarded the Turing prize for his work
- Developed 12 rules to define a RD that call for a
language to define, manipulate and query the data
in the database - 1 rule led to the Structured Query Language (SQL)
that is used in every RDMBS system on the market - ANSI standard (92,99)
11SQL
12Relational Model
- All data stored in tables
- Table is a relation made up of columns (fields)
and rows (records) - Intersection of a column and a row is a typed
value - Integer, Real, Varchar, Text, Blob, etc
- Operations on tables produce tables
13Advantages of the relational model
- Data independence
- Shielding the data from the application
- Efficiency
- Storage, retrieval, integration
- Data integrity/security
- Constraints, access controls
14ACID test
- Atomicity
- all or nothing transaction
- If one operation fails, all fail
- Consistency
- data integrity
- constraints
- Isolation
- Every transaction has a consistent view of the
database regardless of what other transactions
are being processes - Durability
- Once a transaction is complete, the newly updated
data will survive failures of any kind - logs
15Research fuelled by corporate databases gives us
great technology for biological science
- 30 years of research into robust systems
- Industry standards for databases
- Vendors committed to high-quality products
- Oracle, DB2, Sybase, MS SQLserver, etc
- Emergence of the internet and database driven
web-content set the stage for bioinformatics - Data mining tools for creating statistical
associations - Diapers and beer?
- Teradata, a division of NCR Corporation
16SQL
17What drives a database?
SQL
18SQL
- Structured Query Language (ANSI 92,99)
- Used in virtually every RDBMS product
- Has operations for
- Creating tables
- Modifying tables
- Relating tables
- Inserting data
- Updating data
- Retrieving sets of data
- Deleting sets of data
- Deleting tables
19SQL
- Not all implementations consistent
- WARNING
- MySQL CREATE TABLE statements ! PostgreSQL
CREATE TABLE statements
20Commercial RDBMS
- Oracle
- According to Forbes, Larry Ellison is the 9th
richest person in the US (18 billion) - DB2
- IBMs solution free for academics
- Microsoft SQL server
- For Windows
21Open Source RDBMS
- PostgreSQL
- http//www.postgresql.org/
- the worlds most advanced Open Source database
software - Began in 1986 at UC Berkeley
- For many years considered the most
sophisticated OS RDBMS - Performance?
- Comes with most Linux distros
- Small but loyal user community
22MySQL
- http//www.mysql.com/
- The world's most popular open source database
- gt 5,000,000 active installations
- Easy to use
- Very fast retrieval due to architecture
- Considered by many to be a toy database
- For years no row-level locking
- Did not handle transactions well
23MySQL
- Free
- As in free beer
- Dual license
- Commercial http//www.mysql.com/products/licensin
g/commercial-license.html - OpenSource http//www.mysql.com/products/licensin
g/opensource-license.html - As in free speech
- Fast
- Extremely fast reads for certain table types
- Outperforms any RDMBS for reads
- Functional
- Ease of use
- APIs in Perl, C, C, Java
- Client/server architecture
- Works well with Apache/PHP for very popular OS
dynamic web solution
24MySQL versions
- 3.23. (http//dev.mysql.com/doc/mysql/en/News-3.2
3.x.html) - Introduces row-level locking
- Introduces full-text indexing
- 4.0. (http//dev.mysql.com/doc/mysql/en/News-4.0.
x.html) - Transactions, foreign keys with InnoDB
- Improved Full-text indexing
- 4.1. (http//dev.mysql.com/doc/mysql/en/News-4.1.
x.html) - Subqueries
- 5.0. (http//dev.mysql.com/doc/mysql/en/News-5.0.
x.html) - Stored procedures
25MySQL examples in bioinformatics
- Free, fast and functional have made MySQL
pervasive in bioinformatics - Ensembl (http//www.ensembl.org)
- Automated eukaryotic annotation database
- Gene Ontology (http//www.geneontology.org)
- Controlled vocabulary for genes and functions
- UCSC Genome Browser (http//genome.ucsc.edu)
- Human and other genome browser
- BASE (http//base.thep.lu.se)
- BioArray Software Environment a web-based
database solution for microarrays
26Worked example a relational model for sequences
and features
- Create a relational model
- Tables to store
- data
- Sequence strings
- Meta-data
- Data about the data features and their
locations - Insert some records
- Query the data to pull out useful subsets
27Creating a Relational Database
- Start with a data set
- Divide data set into records
- The data
- Divide records into useful fields that describe
the particular record - The meta-data
- Create a model based on the useful fields
- Create a database from the model
- Insert the data into the database
- The data is now computable
28Example Genbank sequence record
26-APR-2004
2075 bp
ACCESSION L32174
gene lt1..1204
/gene"LAF1"
29Simple example a relational model for biological
sequences and features
30Remove the last page from your binder
31CREATE Sequence
- CREATE TABLE Sequence (
- sequence_id INT NOT NULL AUTO_INCREMENT,
- sequence LONGTEXT NOT NULL,
- defline TEXT,
- accession VARCHAR(255) NOT NULL,
- version INT DEFAULT 0,
- length INT DEFAULT 0,
- moltype INT NOT NULL,
- PRIMARY KEY(sequence_id)
- )
-
32CREATE Ontology
- CREATE TABLE Ontology (
- ontology_id INT NOT NULL AUTO_INCREMENT,
- term VARCHAR(255) NOT NULL,
- description TEXT NOT NULL,
- PRIMARY KEY (ontology_id)
- )
-
33CREATE Feature
- CREATE TABLE Feature (
- feature_id INT NOT NULL AUTO_INCREMENT,
- sequence_id INT NOT NULL,
- ontology_id INT NOT NULL,
- FOREIGN KEY (sequence_id) REFERENCES Sequence,
- FOREIGN KEY (ontology_id) REFERENCES Ontology,
- PRIMARY KEY(feature_id)
- )
-
-
34CREATE Location
- CREATE TABLE Location (
- location_id INT NOT NULL AUTO_INCREMENT,
- feature_id INT NOT NULL,
- start INT NOT NULL,
- stop INT NOT NULL,
- strand INT NOT NULL,
- FOREIGN KEY (feature_id) REFERENCES Feature,
- PRIMARY KEY(location_id)
- )
35CREATE Qualifier
- CREATE TABLE Qualifier (
- qualifier_id INT NOT NULL AUTO_INCREMENT,
- feature_id INT NOT NULL,
- ontology_id INT NOT NULL,
- value TEXT NOT NULL,
- FOREIGN KEY (feature_id) REFERENCES Feature,
- FOREIGN KEY (ontology_id) REFERENCES Ontology,
- PRIMARY KEY (qualifier_id)
- )
36INSERT an ontology
mysqlgt INSERT INTO Ontology (term, description)
VALUES -gt ('start codon', 'denotes an
Methionine codon of a transcript') Query OK, 1
row affected (0.00 sec) mysqlgt SELECT FROM
Ontology -------------------------------------
---------------------------------- ontology_id
term description
-----------------------------------
------------------------------------
3 start codon denotes an Methionine codon of
a transcript --------------------------------
--------------------------------------- 1 row in
set (0.01 sec)
37INSERT some more ontologies
mysqlgt INSERT INTO Ontology (term, description)
VALUES -gt ('exon', 'an exon in genomic
sequence') Query OK, 1 row affected (0.00
sec) mysqlgt INSERT INTO Ontology (term,
description) VALUES -gt ('exon type', '3\'UTR,
initial, internal, terminal, 5\'UTR') Query OK,
1 row affected (0.00 sec) mysqlgt SELECT FROM
Ontology -------------------------------------
---------------------------------- ontology_id
term description
-----------------------------------
------------------------------------
3 start codon denotes an Methionine codon of
a transcript 4 exon an
exon in genomic sequence
5 exon type 3'UTR, initial, internal,
terminal, 5'UTR ---------------------------
-------------------------------------------- 3
rows in set (0.00 sec)
38INSERT a sequence
mysqlgt DESC Sequence --------------------------
-------------------------------------
Field Type Null Key Default
Extra ----------------------------
-----------------------------------
sequence_id int(11) PRI NULL
auto_increment sequence longtext
defline
text YES NULL
accession varchar(255)
version
int(11) YES 0
length int(11) YES
0 moltype
int(11) 0
----------------------------------------
----------------------- 7 rows in set (0.00
sec) mysqlgt INSERT INTO Sequence (sequence,
defline, accession, version, length, moltype)
-gt VALUES ('ATGACGATCAGCATCAGCTACAGCTG', 'gt
seq1', 'seq1', 1, 26, 1) Query OK, 1 row
affected (0.00 sec)
39INSERT a Feature on a sequence
mysqlgt SELECT FROM Sequence -----------------
-----------------------------------------------
----------------------- sequence_id
sequence defline accession
version length moltype -----------------
-----------------------------------------------
----------------------- 2
ATGACGATCAGCATCAGCTACAGCTG gt seq1 seq1
1 26 1 -----------------
-----------------------------------------------
----------------------- 1 row in set (0.03
sec) mysqlgt SELECT FROM Ontology ------------
------------------------------------------------
----------- ontology_id term
description
---------------------------------------------
-------------------------- 3 start
codon denotes an Methionine codon of a
transcript 4 exon an
exon in genomic sequence
5 exon type 3'UTR, initial, internal,
terminal, 5'UTR ---------------------------
-------------------------------------------- 3
rows in set (0.00 sec) mysqlgt INSERT INTO
Feature (sequence_id, ontology_id) -gt VALUES
(2, 3) Query OK, 1 row affected (0.00 sec)
40INSERT a Location
mysqlgt SELECT From Feature ------------------
-------------------- feature_id sequence_id
ontology_id -------------------------------
------- 1 2 3
-------------------------------------- 1
row in set (0.01 sec) mysqlgt DESC
Location -----------------------------------
----------------------- Field Type
Null Key Default Extra
------------------------------------------
---------------- location_id int(11)
PRI NULL auto_increment feature_id
int(11) 0
start int(11) 0
stop int(11)
0 strand
int(11) 0
------------------------------------------
---------------- 5 rows in set (0.00
sec) mysqlgt INSERT INTO Location (feature_id,
start, stop, strand) -gt VALUES(1,1,3,1) Query
OK, 1 row affected (0.02 sec)
41Queries using SELECT
mysqlgt SELECT FROM Sequence -----------------
-----------------------------------------------
----------------------- sequence_id
sequence defline accession
version length moltype -----------------
-----------------------------------------------
----------------------- 2
ATGACGATCAGCATCAGCTACAGCTG gt seq1 seq1
1 26 1 3
SLKLSKLPSPLYQVCLE gt seq2 L32174
1 17 3 -----------------
-----------------------------------------------
----------------------- 2 rows in set (0.00
sec) mysqlgt SELECT sequence FROM Sequence WHERE
accession 'seq1' ----------------------------
sequence -----------------
----------- ATGACGATCAGCATCAGCTACAGCTG
---------------------------- 1 row in set
(0.12 sec) mysqlgt SELECT length FROM Sequence
WHERE sequence_id 3 -------- length
-------- 17 -------- 1 row in set
(0.03 sec)
42Joining tables
mysqlgt SELECT FROM Feature ------------------
-------------------- feature_id sequence_id
ontology_id -------------------------------
------- 1 2 3
2 2 4
3 2 4
-------------------------------------- 3
rows in set (0.04 sec)
43Setting up a complex query
- Consider sequence seq1 with the following
features - Initial exon from 1..6
- Internal exon from 15..20
- Note that with relational model the term exon
only appears once in the database
44Complex query
mysqlgt SELECT FROM Sequence WHERE sequence_id
2 --------------------------------------------
-------------------------------------------
sequence_id sequence
defline accession version length moltype
---------------------------------------------
------------------------------------------
2 ATGACGATCAGCATCAGCTACAGCTG gt seq1
seq1 1 26 1
---------------------------------------------
------------------------------------------ 1
row in set (0.04 sec) mysqlgt SELECT FROM
Feature WHERE sequence_id 2 -----------------
--------------------- feature_id
sequence_id ontology_id --------------------
------------------ 1 2
3 2 2
4 3 2 4
-------------------------------------- 3
rows in set (0.03 sec) mysqlgt SELECT FROM
Location ------------------------------------
---------- location_id feature_id start
stop strand -------------------------------
--------------- 1 1
1 3 1 2 2
1 6 1 3
3 15 20 1 -------------------
--------------------------- 3 rows in set
(0.20 sec) mysqlgt SELECT FROM
Ontology -------------------------------------
---------------------------------- ontology_id
term description
-----------------------------------
------------------------------------
3 start codon denotes an Methionine codon of
a transcript 4 exon an
exon in genomic sequence
5 exon type 3'UTR, initial, internal,
terminal, 5'UTR ---------------------------
-------------------------------------------- 3
rows in set (0.00 sec) mysqlgt SELECT FROM
Qualifier ------------------------------------
------------- qualifier_id feature_id
ontology_id value -----------------------
-------------------------- 1
2 5 initial 2
3 5 internal
--------------------------------------------
----- 2 rows in set (0.03 sec)
45Complex query
Return me the sub-sequences and coordinates of
the exon features of seq1
mysqlgt SELECT SUBSTRING(sequence, start, stop),
start, stop, term -gt FROM Sequence, Ontology,
Feature, Location -gt WHERE accession 'seq1'
AND term 'exon' AND -gt Feature.ontology_id
Ontology.ontology_id AND -gt
Feature.sequence_id Sequence.sequence_id AND
-gt Feature.feature_id Location.feature_id ---
-----------------------------------------------
--- SUBSTRING(sequence, start, stop) start
stop term ----------------------------------
------------------- ATGACG
1 6 exon CAGCTACAGCTG
15 20 exon
---------------------------------------------
-------- 2 rows in set (0.04 sec)
46Even more complex
Return me the sub-sequences, coordinates feature
name and qualifier value of the internal exon
features of seq1
mysqlgt SELECT SUBSTRING(sequence, start, stop),
start, stop, o1.term, value -gt FROM Sequence,
Feature, Ontology o1, Ontology o2, Location,
Qualifier -gt WHERE accession 'seq1' AND
-gt o1.term 'exon' AND -gt o2.term 'exon
type' AND -gt value 'internal' AND -gt
Feature.ontology_id o1.ontology_id AND -gt
Qualifier.ontology_id o2.ontology_id AND -gt
Qualifier.feature_id Feature.feature_id AND
-gt Feature.sequence_id Sequence.sequence_id
AND -gt Location.feature_id
Feature.feature_id -----------------------------
----------------------------------
SUBSTRING(sequence, start, stop) start stop
term value -------------------------------
--------------------------------
CAGCTACAGCTG 15 20
exon internal -------------------------------
-------------------------------- 1 row in
set (0.05 sec)
47Aggregate queries
mysqlgt SELECT FROM Sequence -----------------
-----------------------------------------------
----------------------- sequence_id
sequence defline accession
version length moltype -----------------
-----------------------------------------------
----------------------- 2
ATGACGATCAGCATCAGCTACAGCTG gt seq1 seq1
1 26 1 3
SLKLSKLPSPLYQVCLE gt seq2 L32174
1 17 3 4
MASQQQCGAR gt seq seq3
1 10 3 -----------------
-----------------------------------------------
----------------------- mysqlgt SELECT
count(), moltype from Sequence GROUP BY
moltype ------------------- count()
moltype ------------------- 1
1 2 3 -------------------
2 rows in set (0.08 sec)
48Using LIMIT
mysqlgt SELECT FROM Sequence LIMIT
2 --------------------------------------------
-------------------------------------------
sequence_id sequence
defline accession version length moltype
---------------------------------------------
------------------------------------------
2 ATGACGATCAGCATCAGCTACAGCTG gt seq1
seq1 1 26 1
3 SLKLSKLPSPLYQVCLE gt seq2
L32174 1 17 3
---------------------------------------------
------------------------------------------ 2
rows in set (0.08 sec)
49UPDATING a table
mysqlgt SELECT FROM Qualifier ----------------
---------------------------------
qualifier_id feature_id ontology_id value
-------------------------------------------
------ 1 2 5
initial 2 3
5 internal -----------------------------
-------------------- 2 rows in set (0.00
sec) mysqlgt UPDATE Qualifier SET value
'terminal' -gt WHERE qualifier_id 2 Query
OK, 1 row affected (0.00 sec) Rows matched 1
Changed 1 Warnings 0
50DELETING from a table
mysqlgt DELETE FROM Qualifier -gt WHERE
qualifier_id 2 Query OK, 1 row affected (0.04
sec) mysqlgt SELECT FROM Qualifier -----------
-------------------------------------
qualifier_id feature_id ontology_id value
--------------------------------------------
---- 1 2 5
initial -------------------------------------
----------- 1 row in set (0.03 sec)
51- mysqlgt INSERT INTO Location (feature_id, start,
stop, strand) - -gt VALUES (2, 1, 6, 1)
- Query OK, 1 row affected (0.00 sec)
- mysqlgt INSERT INTO Feature (sequence_id,
ontology_id) - -gt VALUES (2,4)
- Query OK, 1 row affected (0.00 sec)
- mysqlgt SELECT From Feature
- --------------------------------------
- feature_id sequence_id ontology_id
- --------------------------------------
- 1 2 3
- 2 2 4
- 3 2 4
- --------------------------------------
- 3 rows in set (0.00 sec)
52Optimisation
- Perking up MySQL
- Queries
- Database server
53Indexing
- In general, indexing your data makes retrieval
orders of magnitude faster - Consider a list of 1000000 sequences with
accession numbers - You need to find the one sequence with accession
number AC123456 - Response time requires O(1000000) operations if
the accession field is not indexed - Equivalent to scanning through a list
- Response time requires O(log(1000000)) O(6)
operations if the accession field is indexed - Somewhat like a hashtable lookup
54Types of indexes
- PRIMARY KEY
- To identify the main accessor field of the table
- UNIQUE
- Constraint to ensure that all entries in a field
are different - INDEX
- Creates a way to quickly search on a given field
- FULLTEXT
- For large TEXT fields gt 255 characters
- Compound indexes (column1, column2, )
- NOTE index is synonymous with KEY
55Drawbacks to indexing
- Need more disk space
- Can slow down inserts
- Know your data and the queries you will perform
on the data - Only index fields you think you will query on
- Requires spending time in the design phase to
define requirements of the database
56Creating an index
mysqlgt CREATE INDEX acindex ON Sequence
(accession) Query OK, 1 row affected (0.18
sec) Records 1 Duplicates 0 Warnings 0
57Tuning the database
- gt mysqladmin variables
- gt mysqld --help
58Variables (--variable-namevalue) and boolean
options FALSETRUE Value (after reading
options) ---------------------------------
----------------------------- basedir
/raid/db/mysql/mysql-max-4.0.14-pc-l
inux-i686/ bdb-home (No
default value) bdb-logdir
(No default value) bdb-tmpdir
(No default value) bind-address
(No default value) console
FALSE chroot
(No default value) character-sets-dir
/raid/db/mysql/mysql-max-4.0.14-pc-linux-i686/s
hare/mysql/charsets/ datadir
/raid/db/mysql/mysql-max-4.0.14-pc-linux-i686
/data/ default-character-set
latin1 enable-locking
FALSE enable-pstack FALSE gdb
FALSE innodb_data_ho
me_dir (No default
value) innodb_log_group_home_dir (No
default value) innodb_log_arch_dir
(No default value) innodb_flush_log_at_trx_commit
1 innodb_flush_method (No
default value) innodb_fast_shutdown
TRUE innodb_max_dirty_pages_pct
90 init-file (No default
value) log (No
default value) language
/raid/db/mysql/mysql-max-4.0.14-pc-linux-i686/shar
e/mysql/english/ local-infile
TRUE log-bin (No
default value) log-bin-index
(No default value) log-isam
myisam.log log-update
(No default value) log-slow-queries
(No default value) log-slave-updates
FALSE low-priority-updates
FALSE master-host (No
default value) master-user
test master-port 3306
59master-connect-retry
60 master-retry-count
86400 master-info-file
master.info master-ssl
FALSE master-ssl-key (No
default value) master-ssl-cert
(No default value) master-ssl-capath
(No default value) master-ssl-cipher
(No default value) myisam-recover
OFF memlock
FALSE disconnect-slave-event-count
0 abort-slave-event-count
0 max-binlog-dump-events
0 sporadic-binlog-dump-fail FALSE new
FALSE old-protocol
10 old-rpl-compat
FALSE pid-file
/raid/db/mysql/mysql-max-4.0.14-pc-linux-i686/data
/watson.pid log-error port
3306 report-host (No
default value) report-user
(No default value) report-password
(No default value) report-port
3306 rpl-recovery-rank
0 relay-log (No default
value) relay-log-index (No
default value) safe-user-create
FALSE server-id
1 show-slave-auth-info
FALSE concurrent-insert
TRUE skip-grant-tables
FALSE skip-slave-start
FALSE relay-log-info-file
relay-log.info slave-load-tmpdir
/raid/tmp/ socket
/tmp/mysql.sock sql-bin-update-same
FALSE sql-mode
OFF temp-pool TRUE tmpdir
/raid/tmp
60external-locking
FALSE use-symbolic-links
TRUE symbolic-links
TRUE log-warnings
FALSE warnings
FALSE back_log
50 bdb_cache_size
8388600 bdb_log_buffer_size
0 bdb_max_lock
10000 bdb_lock_max
10000 binlog_cache_size
32768 connect_timeout
5 delayed_insert_timeout
300 delayed_insert_limit
100 delayed_queue_size
1000 flush_time
0 ft_min_word_len
4 ft_max_word_len
254 ft_max_word_len_for_sort
20 ft_stopword_file (No default
value) innodb_mirrored_log_groups
1 innodb_log_files_in_group
2 innodb_log_file_size
5242880 innodb_log_buffer_size
1048576 innodb_buffer_pool_size
8388608 innodb_additional_mem_pool_size
1048576 innodb_file_io_threads
4 innodb_lock_wait_timeout
50 innodb_thread_concurrency
8 innodb_force_recovery
0 interactive_timeout
28800 join_buffer_size
131072 key_buffer_size
402653184 long_query_time
10 lower_case_table_names
FALSE max_allowed_packet
1047552 max_binlog_cache_size
4294967295 max_binlog_size
1073741824 max_connections
100 max_connect_errors
10 max_delayed_threads
20 max_heap_table_size 16777216
61max_join_size
18446744073709551615 max_relay_log_size
0 max_seeks_for_key
4294967295 max_sort_length
1024 max_tmp_tables
32 max_user_connections
0 max_write_lock_count
4294967295 bulk_insert_buffer_size
8388608 myisam_block_size
1024 myisam_max_extra_sort_file_size
268435456 myisam_max_sort_file_size
2147483647 myisam_repair_threads
1 myisam_sort_buffer_size
67108864 net_buffer_length
16384 net_retry_count
10 net_read_timeout
30 net_write_timeout
60 open_files_limit
0 query_cache_limit
1048576 query_cache_size
33554432 query_cache_type
1 read_buffer_size
2093056 read_rnd_buffer_size
262144 record_buffer
2093056 relay_log_space_limit
0 slave_compressed_protocol
FALSE slave_net_timeout
3600 read-only
FALSE slow_launch_time
2 sort_buffer_size
2097144 table_cache
512 thread_concurrency
8 thread_cache_size
8 tmp_table_size
33554432 thread_stack
196608 wait_timeout
28800 default-week-format 0 To see
what values a running MySQL server is using,
type 'mysqladmin variables' instead of 'mysqld
--help'.
62Tuning the system to your needs
- Need to think about uses of the database
- How many concurrent connections?
- Will there be large records?
- Will there be repetitive queries?
- Will I need large indexes?
- Tuning the system can give huge gains in
performance lets you get the most out of the
system
63Important parameters
- max_allowed_packet
- Largest amount of data to be transmitted to the
client in 1 packet - max_connections
- The largest number of concurrent connections to
the database server - datadir
- The location of the data files on the system
- query_cache
- Size of cache for repetitive queries
- Many, many others..
64DBA
65COMMUNICATING WITH MySQL
66Communicating with MySQL
- Through a GUI
- MySQL ControlCentre
- http//www.mysql.com/products/mysqlcc/
- Standalone application supported by MySQL
- Through the web
- PhpMyAdmin
- http//www.phpmyadmin.net/home_page/
- Works with Apache web server
- Through the Unix command line
- MySQL client
- Comes with MySQL
- Through APIs (Application Programming Interface)
- MySQL C API
- Perl DBI
- MySQL (C)
- http//dev.mysql.com/downloads/other/plusplus/
- JDBC (Java Database Connectivity)
- Java protocol and API for RDBMS communication
67Communicating with MySQL
- Choose the method that is right for the job
- Administration
- MySQL CC
- PHP MyAdmin
- Standalone Application
- APIs
- Web Application
- PHP/Java servlets
- Low throughput queries
- Command line client
68Working with JDBC
- JDBC is a standard API that provides
database-independent connectivity to allow a Java
application to interact with a database
http//java.sun.com/products/jdbc/overview.html
69Connector/J
- JDBC implementation for MySQL is Connector/J
- http//www.mysql.com/products/connector/j/
- Installation
- export CLASSPATH/path/to/mysql-connector-java-
version-bin.jarCLASSPATH
70Connector/J Steps
- Establish a connection
- Prepare one or more queries
- Execute one or more queries
- Process the results (if applicable)
- Destroy connection
71Connector/J Establishing a connection
- // we need the following 6 variables to make a
jdbc connection - String DBSERVERNAME mysql
- String JDBCDRIVERNAME com.mysql.jdbc.Driver
- String host my.database.com
- String databaseName sequence
- String user me
- String password mypwd
- //load the driver into memory
- Class.forName(JDBCDRIVERNAME).newInstance()
-
- // create the connection URL
- String connectionURL "jdbc DBSERVERNAME
//" host "/" database "?" "user
user "password" password -
- // get the connection from the driver manager
- Connection connection
DriverManager.getConnection(connectionURL)
72Preparing and executing a query
// object required to execute the query
Statement statement null // object to store
results of the query ResultSet resultSet
null // create the query string String query
"SELECT sequence_id FROM Sequence" //
initialise the statement statement
connection.createStatement() // execute the
query the results are returned in resultSet
resultSet statement.executeQuery(query)
73Process the results close connection
// iterate through the rows returned by the
query while (resultSet.next()) int
sequenceId resultSet.getInt("sample_id")
// do something with the sequenceId //
destroy the connection if its no longer needed
connection.close()
74Topics not covered
- MySQL tools
- mysqldump
- Tool to dump a schema, all the data and/or both
- mysqlimport
- Tool to import delimited files
- Look before you parse!
- mysqladmin
- For DBAs to create database, change passwords,
etc - Read the mysql documentation
75Topics not covered
- Setting connection parameters in JDBC
- Consult Connector/J docs
- Database design
- Extremely important process
- Many courses at univ/college
76Summary
- Relational databases are necessary in
bioinformatics - Relational databases allow us to efficiently
store and query large amounts of data - MySQL is a good choice for RDBMS engine because
it is highly functional at no cost - JDBC provides a way to access MySQL from within a
Java program
77Resources
- MySQL
- http//www.mysql.com
- http//dev.mysql.com.mysql/en/index.html
- http//www.mysql.com/products/mysqlcc/
- http//dev.mysql.com/doc/connector/j/en
- NAR Database Issue 2004
- http//nar.oupjournals.org/content/vol32/suppl_1
- JDBC
- http//java.sun.com/products/jdbc/
- Me
- sohrab_at_bioinformatics.ubc.ca