Title: The Certainty of Citations
1The Certainty of Citations
- A proposal for an objective method of measuring certainty
2Genealogy Background
Notice the light at the top of the picture.
3The FM Bobo Story
Grandmother
Grandfather of grandmother
4(No Transcript)
5(No Transcript)
61860 Census
7(No Transcript)
8(No Transcript)
91870 Census
10(No Transcript)
11(No Transcript)
12Marriage Record
Carroll County Arkansas Marriage Records, Eastern District Grooms Index 1869-1930
Book/Page | Groom           | Age | Bride            | Age | Date
A 63      | BOBO FRANCES M. | 19  | LITTRELL MATILDA | 16  | 6/02/1872
Note: 3-year gap in age.
http://www.rootsweb.com/arcchs/MARB.html
131880 Census
14Remember Jarrett for later
15(No Transcript)
161920 Census
17(No Transcript)
18Jarrett's Funeral Book
19Record Summary
Record Date | Record Type | Birth Reported | Age Reported | Implied Birth | Death Reported | Census Age Date
8/23/1860   | CEN         |                | 8            | 1852          |                | 1-Jun
7/14/1870   | CEN         |                | 18           | 1852          |                | 1-Jun
6/2/1872    | MAR         |                | 19           | 1853          |                |
6/17/1880   | CEN         |                | 25           | 1855          |                | 1-Jun
1/22/1920   | CEN         |                | 71           | 1849          |                | 1-Jan
2/12/1951   | FUN         |                |              |               | 11/17/1932     |
1/1/1955    | GRAV        | 10/1/1845      |              | 1845          | 11/10/1931     |
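The Implied Birth column can be reproduced by subtracting the reported age from the year of the record. The sketch below does that for the five rows that report an age; the tuple layout and the function name are illustrative assumptions, not part of the slides.

```python
from datetime import date

# (record date, record type, reported age) for the rows that report an age.
# The tuple layout is an assumption made for this sketch.
RECORDS = [
    (date(1860, 8, 23), "CEN", 8),
    (date(1870, 7, 14), "CEN", 18),
    (date(1872, 6, 2), "MAR", 19),
    (date(1880, 6, 17), "CEN", 25),
    (date(1920, 1, 22), "CEN", 71),
]


def implied_birth_year(record_date: date, reported_age: int) -> int:
    """Year of the record minus the reported age."""
    return record_date.year - reported_age


implied = [implied_birth_year(d, age) for d, _, age in RECORDS]
# Distance between the earliest and latest implied birth year.
spread = max(implied) - min(implied)
print(implied, spread)
```

The records disagree by several years, which is the problem the rest of the talk tries to measure.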
20Let's talk about that
Note person partially in picture.
21The Information Flow Diagram
- Event: an association of an action, place, time, and person(s)
EVENT
Dick Eastman at GENTECH2, January 1994
22The Information Flow Diagram
- Reporter: a person who creates a record about an event.
- We can measure confidence or bias.
EVENT
John Wylie, president of GENTECH for 5 years
REPORTER
23The Information Flow Diagram
- Record: a report about an event, which may not be complete or accurate
- Measure granularity.
EVENT
RECORD
REPORTER
24What's Granularity?
      | Small                   | Medium        | Large
NAME  | James Powell Sharbrough | J Sharbrough  | Sharbrough
DATE  | June 2, 1872            | June, 1872    | 1872
PLACE | 123 Elm St              | Harris County | Texas
25Granularity Examples
      | Case 1                    | Case 2              | Case 3
Name  | FM Bobo - 2               | Francis M Bobo - 3  | Bobo - 1
Date  | 1953 - 1                  | June 1853 - 2       | 2 Jun 1872 - 3
Place | 153 Elm St, Tulsa, OK - 3 | Carroll Co, Ark - 2 | Ark - 1
Total | 6                         | 7                   | 5
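The three case totals can be checked with a small sketch. It assumes, based on the scores shown in the examples, that a Small (most specific) value scores 3, Medium scores 2, and Large scores 1; the names `SCORES` and `granularity` are hypothetical.

```python
# Assumed mapping from granularity level to points, inferred from the examples.
SCORES = {"small": 3, "medium": 2, "large": 1}


def granularity(name: str, date: str, place: str) -> int:
    """Sum the points for the name, date, and place of one record."""
    return SCORES[name] + SCORES[date] + SCORES[place]


# Case 1: FM Bobo / 1953 / 153 Elm St, Tulsa, OK
case1 = granularity("medium", "large", "small")
# Case 2: Francis M Bobo / June 1853 / Carroll Co, Ark
case2 = granularity("small", "medium", "medium")
# Case 3: Bobo / 2 Jun 1872 / Ark
case3 = granularity("large", "small", "large")
print(case1, case2, case3)
```

With three fields scored 1 to 3 each, a record's total always falls between 3 and 9.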
26The Information Flow Diagram
ER Gap
- Reviewer: a person who reviews records and draws conclusions.
- Evaluate ER Gap, evaluate Reporter.
EVENT
RECORD
REPORTER
REVIEWER
Tony Burroughs, NGS 2001, Portland OR
27The Information Flow Diagram
- Conclusion: a statement by a reviewer about a collection of records related to an event
- Report: a collection of conclusions.
EVENT
RECORD
REPORT
REPORTER
REVIEWER
28ER Gap
Scale from Far to Near:
- All records about my family: 0 (Far)
- Secondary record: 1
- Primary record: 2 (Near)
29Features of EVIDENCE: The Record
- Granularity
- Mind the Gap - ER Gap
- Reporter
30CONCLUSION: Rate It
- 1 - Believe
- 2 - Know
- 3 - Can Prove
- 0 - No claim
- Negative numbers: -1, -2, -3
31TRUST: The Report
32(No Transcript)
33(No Transcript)
34(No Transcript)
35So many formulas
- so few examples.
- Record granularity measurement: 3 to 9
- ER Gap: 0, 1, or 2
- Reviewer evaluation of reporter: -1 to 10
- Reviewer confidence: -3 to 3
- Trust number: positive feedback ratio
- Granularity / 5, ER Gap, Report Eval / 5, Reviewer Confidence, Trust ratio / 0.5
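As a worked example, the sketch below computes the five scaled terms named on this slide for one record and checks each input against its stated range. The slide text does not show how the terms are combined, so the sketch deliberately stops at the individual terms. The names `Terms` and `certainty_terms` are hypothetical, and reading "Report Eval" as the reviewer's evaluation of the reporter is an assumption.

```python
from typing import NamedTuple


class Terms(NamedTuple):
    granularity: float
    er_gap: float
    reporter_eval: float
    reviewer_confidence: float
    trust: float


def certainty_terms(granularity, er_gap, reporter_eval,
                    reviewer_confidence, trust_ratio) -> Terms:
    """Scale each measurement as listed on the slide; no combination is applied."""
    if not 3 <= granularity <= 9:
        raise ValueError("granularity must be 3 to 9")
    if er_gap not in (0, 1, 2):
        raise ValueError("ER Gap must be 0, 1, or 2")
    if not -1 <= reporter_eval <= 10:
        raise ValueError("reporter evaluation must be -1 to 10")
    if not -3 <= reviewer_confidence <= 3:
        raise ValueError("reviewer confidence must be -3 to 3")
    return Terms(
        granularity / 5,
        float(er_gap),
        reporter_eval / 5,
        float(reviewer_confidence),
        trust_ratio / 0.5,
    )


# Hypothetical inputs: a Case 2 style record (granularity 7) from a primary
# source (ER Gap 2), reporter rated 5, reviewer "knows" (2), trust ratio 0.5.
example = certainty_terms(7, 2, 5, 2, 0.5)
print(example)
```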
36The Death Certificate
Demographic Info
Medical Info
37It's What-if Time
What if we could make the future however we like?
38Mechanical Certainty
- Finding Needles in Really Big Haystacks
39Record Linking
- Building Indices
- Finding larger patterns
40- Where
- x indicates the identifier and its value on the record from the file initiating the search (record A)
- y indicates the identifier and its value on the record from the file being searched (record B)
- LINKED pairs may refer either to all linked pairs, or to a defined subset of these, and
- UNLINKABLE pairs may refer either to all unlinkable pairs, or to a defined subset, provided the linked and the unlinkable sets (or subsets) are otherwise strictly comparable with each other.
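These definitions match the frequency ratio commonly used in probabilistic record linkage: how often the outcome (x, y) occurs among LINKED pairs, divided by how often it occurs among UNLINKABLE pairs. The sketch below assumes that form, since the transcript does not show the expression itself. The function name and the toy pairs are hypothetical.

```python
from fractions import Fraction


def odds(x, y, linked_pairs, unlinkable_pairs) -> Fraction:
    """Relative frequency of (x, y) among linked pairs divided by its
    relative frequency among unlinkable pairs (assumed form)."""
    f_linked = Fraction(linked_pairs.count((x, y)), len(linked_pairs))
    f_unlinkable = Fraction(unlinkable_pairs.count((x, y)), len(unlinkable_pairs))
    if f_unlinkable == 0:
        raise ValueError("outcome never seen among unlinkable pairs")
    return f_linked / f_unlinkable


# Toy data: first initials on record A and record B.
linked = [("J", "J")] * 9 + [("J", "S")]
unlinkable = [("J", "J")] + [("J", "S")] * 9

agreement_odds = odds("J", "J", linked, unlinkable)
disagreement_odds = odds("J", "S", linked, unlinkable)
print(agreement_odds, disagreement_odds)
```

A ratio above 1 argues for a link and a ratio below 1 argues against it.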
41Examples
- FIRST INITIALS
- AGREEMENT
- DISAGREEMENT
- LETTER Q
- YEAR OF BIRTH
- SIMILARITY (difference 1 year)
- DISSIMILARITY (difference 11 years)
- GIVEN NAMES
- SIMILARITY (first 3 letters agree, none disagree, e.g. Sam vs Samuel)
- SIMILARITY/DISSIMILARITY (first 3 letters agree, 4th disagrees, e.g. Samuel vs Sampson)
- DIFFERENT BUT LOGICALLY RELATED IDENTIFIERS
- PLACE of WORK vs PLACE of DEATH (Provo vs Salt Lake City)
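The comparison outcomes listed above can be sketched as small functions. This is a minimal illustration, not a full comparison scheme: the function names, the returned label strings, and the "OTHER" fallback for cases the slide does not describe are all assumptions.

```python
def compare_first_initials(a: str, b: str) -> str:
    """AGREEMENT when the first letters match, otherwise DISAGREEMENT."""
    return "AGREEMENT" if a[:1].upper() == b[:1].upper() else "DISAGREEMENT"


def birth_year_difference(a: int, b: int) -> int:
    """Absolute difference in years between two birth years."""
    return abs(a - b)


def compare_given_names(a: str, b: str) -> str:
    a, b = a.strip().lower(), b.strip().lower()
    if a == b:
        return "AGREEMENT"
    shorter = min(len(a), len(b))
    if shorter >= 3 and a[:3] == b[:3]:
        # No overlapping letter disagrees, e.g. Sam vs Samuel.
        if all(ca == cb for ca, cb in zip(a, b)):
            return "SIMILARITY"
        # First three agree but the fourth differs, e.g. Samuel vs Sampson.
        if shorter >= 4 and a[3] != b[3]:
            return "SIMILARITY/DISSIMILARITY"
    return "OTHER"  # not covered by the examples on the slide


print(compare_given_names("Sam", "Samuel"))
print(compare_given_names("Samuel", "Sampson"))
print(birth_year_difference(1852, 1853))
```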
42Some more examples
43Discrimination
- A lookup table containing the frequencies of values for identifiers, as they appear in the file being searched.
- SURNAMES: Brown (0.39), Aube (0.014), and Skuda (0.00004).
- FIRST NAMES: John (5.30), Axel (0.020), and Ulder (0.0045).
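Such a lookup table can be built by counting how often each value appears in the file being searched. In the sketch below, frequencies are expressed as percentages, which is an assumption about the units on the slide. The function name and the toy surname list are hypothetical.

```python
from collections import Counter


def frequency_table(values):
    """Map each distinct value to its share of the file, as a percentage."""
    counts = Counter(v.strip().title() for v in values)
    total = sum(counts.values())
    return {value: 100 * n / total for value, n in counts.items()}


# Toy file of four surnames; a real file would hold many thousands.
surnames = ["Brown", "brown", "BROWN", "Aube"]
table = frequency_table(surnames)
print(table)
```

Rare values discriminate better: two records agreeing on a rare surname is stronger evidence of a link than two records agreeing on a common one.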
44Competing Hypotheses
Record Date | Record Type | Birth Reported | Age Reported | Implied Birth | Death Reported | Census Age Date | Rate
8/23/1860   | CEN         |                | 8            | 1852          |                | 1-Jun           | 60
7/14/1870   | CEN         |                | 18           | 1852          |                | 1-Jun           | 60
6/2/1872    | MAR         |                | 19           | 1853          |                |                 | 40
1/22/1920   | CEN         |                | 71           | 1849          |                | 1-Jan           | 40
6/17/1880   | CEN         |                | 25           | 1855          |                | 1-Jun           | 25
2/12/1951   | FUN         |                |              |               | 11/17/1932     |                 | 10
1/1/1955    | GRAV        | 10/1/1845      |              | 1845          | 11/10/1931     |                 | 5
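One possible way to compare the hypotheses is to treat each distinct implied birth year as a hypothesis and add up the Rate values of the records that support it. The slide does not say how the rates are meant to be combined, so the summing rule below is an assumption, and the function name is hypothetical.

```python
from collections import defaultdict

# (record type, implied birth year, rate) copied from the table above.
# The funeral book implies no birth year, so its year is None.
ROWS = [
    ("CEN", 1852, 60),
    ("CEN", 1852, 60),
    ("MAR", 1853, 40),
    ("CEN", 1849, 40),
    ("CEN", 1855, 25),
    ("FUN", None, 10),
    ("GRAV", 1845, 5),
]


def rank_hypotheses(rows):
    """Sum rates per implied birth year; highest total first, ties by year."""
    totals = defaultdict(int)
    for _, year, rate in rows:
        if year is not None:
            totals[year] += rate
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


ranking = rank_hypotheses(ROWS)
print(ranking)
```

Under this reading, 1852 leads because two census records support it independently.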
45The Digital Research Assistant
- Search for records on the Internet
- Evaluate their relevance to the assignment
- Evaluate their granularity, confidence, etc.
- Evaluate patterns, such as families
- Report matches
- Let me set the knobs for the parameters
46The DRA will have ...
- A hierarchy of useful comparison algorithms
- A method of searching across the Internet, and paying for it
- A method of documenting the source of that search that satisfies the rules of preserving intellectual property and academic research
47Who knows what the formula will be?
- We are asking which dragons must be slain, but we aren't saying how it must happen.
- We are talking about possible ways to accomplish our goal.
- That goal is connecting to new information, with confidence.
48Summary
- Any type of review
- Measurements of Records
- Measurement of conclusions
- Rating of publishers
- Mechanical searches
- Record Linking
- Smart Searches
- Groupwork and Rights
49Never forget to have fun