Title: Database Structure
1Database Structure
- Or, database lite - a very brief and simple
introduction to the system behind the tool
2In the beginning
- There was hardcopy. Writers write, and their
works get published in magazines, journals,
newspapers, conference proceedings - Months or years afterwards, other writers and
researchers would like to know what was written
on a topic. Wouldnt it be great if there was a
way to find everything that had been published on
a topic without having to page through every
likely journal?
3Voila -
- The hardcopy Index is born
- An alphabetical list of words, representing
subjects - Under each word a list of articles deemed to be
about that subject. - This is all done, laboriously, by humans.
- Given that, how many subjects do you think an
article will be listed under?
4Aside
- If they thought their customers would pay the
higher price, some Index vendors added Abstracts. - Thus, Abstracting and Indexing services or
AI. You see this terminology in the literature
sometimes. - The abstracts were all laboriously written by
humans. Skilled humans. - Skilled humans are very expensive. And slow.
- Paper and publishing are expensive too.
- How many times do you think an entry for an
article would be duplicated (appear under
multiple subjects) in this situation?
5Ok, ok, you get the picture
- Just so you know that the electronic situation we
have today is all grounded in a physical reality.
Once it was all people and paper. - Now its very electronic, but there are still a
lot of people involved.
6Enter the computer
- Even before word processors, when articles were
still being produced on typewriters - - What if you could put information about those
articles into a machine readable file - a
database - something you could search? - Think of the advantages!
7Database advantages
- No longer bound by the confines of paper, space,
expensive skilled personnel - Articles could be found under any of the terms
associated with them, not just one or two - Even if material has to be re-keyed - typed into
the database - this doesnt require subject
specialists, just typists (cheap labor) - Turn around time is faster - your labor force
isnt thinking and composing, just typing and if
you choose to just offer your index online, you
dont wait for physical publishing to take place.
8As time goes on
- The process of from article to index gets even
faster - when articles are created electronically (e.g.
word processing), no re-keying is needed, just
software to convert/rearrange the material to fit
your database fields - If you want to enhance the records at all,
though, youre back to dealing with humans
9Note
- The electronic database is not necessarily
Cheaper to create. The costs are shifted. Maybe
it actually costs more! But customers will buy it
because - Its so much more powerful and efficient.
10End of historical background.
- Now we really will get on to database structure.
11Databases - fields and records
- Think about drivers licenses. They all have the
owners name, address, date of birth, bad photo
and you can bet the DMVs database has fields
just like that Name, Addr, DOB, BadPic, etc. - The database designer decides what fields are
needed.
12Fields and records cont.
- For an article database, youd probably have a
field for the article title, the name of the
journal it appeared in what else? - One full set of fields will make up a record.
Every record in the database will have the same
set of fields (even if, in some records, some
fields are blank).
13Define your fields
which make up records
Sleep deprived rats are bigger biters
001
Science News
and form the basis of your database.
14Beyond fields and records
- Fields and records are the basis. What makes the
database fast, powerful, and efficient are the
indexes of the fields. - An index is a list of all the words in a field,
with some kind of identifier pointing back to the
record each word came from. - An index list can be alphabetized, and in various
ways optimized for searching. This list is
sometimes referred to as an inverted file.
15A very, very simplistic example
- Say we had 3 articles
- Milky Ways Last Major Merger. Science News. v.
162 24 p. 376 - Its a Dogs Life.The Economist. December 21,
2002. p. 61 - Manhattan Mayhem.Smithsonian. v. 33 9 p. 44
16Lets enhance these just a little
- Well add a one-line description to each record
(so we have a few more words to search on) - Record 1
- Milky Ways Last Major Merger. Science News. v.
162 24 p. 376 New clues about galaxy formation
indicate early collision affected Milky Ways
shape.
17Record 2
- Its a Dogs Life.The Economist. December 21,
2002. p. 61 From hard labour to a beauty contest,
a history of the work and whims of dog breeding.
18Record 3
- Manhattan Mayhem.Smithsonian. v. 33 9 p.
44Martin Scorseses realistic portrayal of
pre-Civil War strife - Gangs of New York -
re-creates the brutal street warfare waged
between immigrant groups.
19My database has 4 fields
- Record number
- Article title
- Journal name
- Abstract
50 characters
50 characters
200 characters
20Milky Ways Last Major Merger
New clues about galaxy formation indicate early
collision affected Milky Ways shape.
0001
Science News
From hard labour to a beauty contest, a history
of the work and whims of dog breeding.
The Economist
Its a Dogs Life
0002
Martin Scorseses realistic portrayal of
pre-Civil War strife - Gangs of New York -
re-creates the brutal street warfare waged
between immigrant groups.
Manhattan Mayhem
0003
Smithsonian
21Indexing the fields
- The initial list of words from the Article Title
field looks like - Milky
- Ways
- Last
- Major
- Merger
- Its
- a
- Dogs
- Life
- Manhattan
- Mayhem
- Hmm what is funky here? Will our indexing
program handle those possessives and contractions
( s )? Do we want to clog it up with little
words like a ?
22After some decisions
- Now we can sort!
- Dogs
- Its
- Last
- Life
- Major
- Manhattan
- Mayhem
- Merger
- Milky
- Ways
- Uh oh. How will we know which record a word came
from??
23Lets start over
- Lets make sure our index list includes the
record number, and which field the word came
from - 0001 Milky TI
- 0001 Ways TI
- 0001 Last TI
- 0001 Major TI
- 0001 Merger TI
- 0002 Its TI
- 0002 Dogs TI
- 0002 Life TI
- 0003 Manhattan TI
- 0003 Mayhem TI
24And one more thing
- We can include a number representing the order of
the word within the field (why?) - So we get something like this
25Now well sort again
26Our index of the Journal Name field
27After indexing each field, combine the indexes so
you only have one list to search
28And the abstracts
- Decisions and clean-up on this list of words
- Stop words
- Hyphenated words
- Proper names - double post to include the
phrase too? - Alternate spellings - do we do anything about
them or no?
29(No Transcript)
30Luckily
- Software does almost all of this work for us.
- And youll probably never see any indexes in
their raw state - what weve been going over here
is very under the hood, often even proprietary
material for the database vendors. - But we can see a sort of users view
31Examples
- EBSCO MasterFiles Subjects list
- FirstSearch indexes - single word and phrase
- Reminder indexes come from record structure
- example of elaborate record structure ERIC on
FirstSearch
32(No Transcript)
33(No Transcript)
34(No Transcript)
35(No Transcript)
36(No Transcript)
37(No Transcript)
38(No Transcript)
39Assignment
- What would your ideal database record look like?