The Dark Side of Data Mining - PowerPoint PPT Presentation

Slides: 15
Provided by: ACA99

Transcript and Presenter's Notes


1
The Dark Side of Data Mining
  • Hannah McLaren
  • 100044892

2
Agenda
  • What is data mining
  • A quick overview of data mining security
  • Security breaches
  • Ethical issues
  • Protecting the integrity of data mining
  • Questions

3
Data Mining
  • Data mining is applying inductive models to data
    in order to discover patterns
  • Who's doing it? Everyone
  • AirMiles, Aeroplan
  • Banks: CIBC, RBC, TD
  • Credit card companies
  • MSN and Gmail
  • The government
  • Anyone who markets

4
(No Transcript)
5
Data Mining Security: 2 Faces
  • The security of the actual data being mined
  • Vulnerable to being stolen
  • Intentionally incorrect data can also be added
  • Unethical companies try to sell their databases
    without consent
  • The ethical integrity of the knowledge mined from
    the data
  • Can cause bias
  • Deliberately misleading inferences

6
Data Mining and E-Commerce
  • Earlier presentations have already covered why
    data mining is useful for e-commerce
  • Data mining security is a major factor
  • To generate accurate models of our customers, we
    have to be confident that the information is
    accurate and valid
  • Companies known to lack security or ethics tend
    not to succeed

7
Security Breaches
  • How do unwanted miners gain access?
  • Guessing easy login/password combinations
  • Or, sometimes there's no password protection
  • e.g., Aeroplan
  • Brute force attack over the internet
  • Access is given as part of a business deal
    (the most common method)
  • Physically stealing the machines
  • The information stolen is generally used to
    create fake identities
  • Sometimes information is not stolen, but modified
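
The last point, data that is modified rather than stolen, can be guarded against by storing a keyed checksum with each record. The following is a minimal sketch, not something described in the presentation: the record fields, the key, and the helper names sign_record and is_unmodified are all hypothetical, and it uses an HMAC from Python's standard library.

```python
import hashlib
import hmac
import json


def sign_record(record: dict, key: bytes) -> str:
    """Return a keyed SHA-256 tag for a record (hypothetical helper)."""
    # Sort keys so the same record always serializes to the same bytes.
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def is_unmodified(record: dict, tag: str, key: bytes) -> bool:
    """Check a record against the tag that was stored when it was written."""
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(sign_record(record, key), tag)


if __name__ == "__main__":
    key = b"example-key-kept-outside-the-database"  # assumption: key stored separately
    record = {"member_id": 1, "points": 500}        # made-up loyalty record
    tag = sign_record(record, key)
    tampered = {**record, "points": 50000}
    print(is_unmodified(record, tag, key), is_unmodified(tampered, tag, key))
```

This only detects changes made by someone who does not hold the key, so the key has to live somewhere other than the database it protects.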

8
Ethical Considerations
  • Snooping
  • Mining the data with a specific goal in mind
  • Searching through the data for a particular
    person/organization
  • Dredging
  • Imposing patterns on data where none exist
  • Or, finding relationships and creating
    interesting explanations
  • Elizabeth Taylor's marriages and the stock market

9
Ethical Considerations
  • Fishing
  • Searching for that one key attribute that affects
    most relations
  • Not necessarily a valid attribute, could just be
    a fluke
  • Reselling
  • Offering a database of customer information for
    sale, without customer knowledge or consent
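
Why fishing and dredging turn up flukes can be shown with a small simulation. This is an illustrative sketch under stated assumptions, not an example from the presentation: every attribute and the target are pure random noise, and the names pearson and best_fluke and the sizes (20 rows, 500 attributes) are made up for the demonstration.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def best_fluke(n_rows=20, n_attributes=500, seed=0):
    """Fish through random attributes for the one that best 'predicts' a random target."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_rows)]
    best = 0.0
    for _ in range(n_attributes):
        attribute = [rng.gauss(0, 1) for _ in range(n_rows)]
        r = pearson(attribute, target)
        if abs(r) > abs(best):
            best = r  # keep the strongest correlation seen so far
    return best


if __name__ == "__main__":
    print(f"strongest correlation among 500 random attributes: {best_fluke():+.2f}")
```

None of the attributes has any real relationship to the target, yet the strongest one found looks impressive. Search enough attributes over a small sample and one of them will correlate by chance.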

10
Getting a Security Blanket
  • Different strategies are being put in place to
    protect data from being stolen or unethically
    used
  • Governments are creating stricter standards on
    data collection and mining practices
  • Well, some governments.
  • Canada has PIPEDA

11
Tech Solutions
  • A number of companies are in the business of
    keeping data secure
  • RSA offers encryption solutions
  • Also a new two-factor authentication system that
    makes it almost impossible to access an account
    using just a username/PIN
  • Cisco offers its Adaptive Threat Defence program
    for networks
  • Security software relies on defined rules of how
    a legitimate application acts
  • e.g., it knows that a web-based server
    application should not be generating email traffic
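
The rule-based idea in the last two bullets can be sketched as an allow-list of expected behaviour per application. This is a toy illustration only and does not describe how any Cisco product works: the roles, port numbers, and the names ALLOWED_OUTBOUND_PORTS, Connection, and violations are assumptions.

```python
from dataclasses import dataclass

# Hypothetical policy: outbound destination ports each application role may use.
ALLOWED_OUTBOUND_PORTS = {
    "web-server": {53, 443},         # DNS lookups and HTTPS calls
    "mail-server": {25, 53, 587},    # SMTP relay, DNS, mail submission
}


@dataclass(frozen=True)
class Connection:
    app_role: str
    dest_port: int


def violations(connections):
    """Return the connections that break the policy (unknown roles are denied)."""
    return [
        c for c in connections
        if c.dest_port not in ALLOWED_OUTBOUND_PORTS.get(c.app_role, set())
    ]


if __name__ == "__main__":
    observed = [
        Connection("web-server", 443),
        Connection("web-server", 25),   # a web server sending SMTP traffic
        Connection("mail-server", 25),
    ]
    for c in violations(observed):
        print(f"flagged: {c.app_role} -> port {c.dest_port}")
```

Denying anything not explicitly listed keeps the rule set short, at the cost of false alarms whenever a legitimate application changes its behaviour.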

12
Tech Solutions
  • Mining data ethically is a different problem
  • Inductive modelling tools like Cognos Scenario
    are not built to produce amoral models
  • Human error, either unintentional or deliberate,
    is the cause
  • All involved parties need to have knowledge of
    what question is being answered
  • Critical analysis of the results
  • 95% confidence that the wins/losses in the NFC
    will predict gains/losses on the NYSE: does this
    even make sense?
  • Using multiple test sets repeatedly to check the
    validity of results
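
Checking a result against multiple test sets can be sketched as follows. This is a hedged illustration on synthetic data, not the presenter's procedure: the acceptance rule in holds_up (same sign and at least half the training strength on every test set) is an arbitrary assumption, as are the names pearson, sample, and validate.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def sample(rng, n_rows, slope):
    """Draw one synthetic data set; slope=0 means x and y are unrelated."""
    xs = [rng.gauss(0, 1) for _ in range(n_rows)]
    ys = [slope * x + rng.gauss(0, 1) for x in xs]
    return xs, ys


def holds_up(train_r, test_rs, tolerance=0.5):
    """Accept a pattern only if every test set agrees in sign and rough strength."""
    return all(
        r * train_r > 0 and abs(r) >= tolerance * abs(train_r) for r in test_rs
    )


def validate(slope, n_test_sets=5, n_rows=200, seed=0):
    """Measure a correlation once, then re-measure it on fresh test sets."""
    rng = random.Random(seed)
    train_r = pearson(*sample(rng, n_rows, slope))
    test_rs = [pearson(*sample(rng, n_rows, slope)) for _ in range(n_test_sets)]
    return train_r, test_rs, holds_up(train_r, test_rs)


if __name__ == "__main__":
    for label, slope in (("real relationship", 2.0), ("no relationship", 0.0)):
        train_r, test_rs, ok = validate(slope)
        verdict = "holds up" if ok else "does not hold up"
        print(f"{label}: training r = {train_r:+.2f}, {verdict}")
```

A genuine relationship should reappear at similar strength in every fresh sample. A fluke found by searching one data set rarely does.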

13
In the news
  • Microsoft has partnered with the RCMP to create
    CETS (Child Exploitation Tracking System)
  • "What they needed was a tracking system that
    would not repeat information but link and connect
    criminal behavior online that is difficult for
    the human eye to see."
  • John Hancock, senior consultant with Microsoft
    Canada
  • CETS will allow police agencies across the
    country to:
  • Manage and analyze huge volumes of data
  • Cross-reference obscure data relationships
  • Use social-network analysis to discover
    communities of child predators
  • Can they keep this database secure? Ethical?

14
For more information
  • RSA: http://www.rsasecurity.com/
  • Cisco Systems: http://www.cisco.com/
  • "Tool Thwarts Online Child Predators", from the
    Microsoft Press site:
    http://channels.microsoft.com/presspass/features/2005/apr05/04-07CETS.asp
  • Jensen, D. (2000). Data snooping, dredging and
    fishing: The dark side of data mining. SIGKDD
    Explorations, 1(2), 52-54.
  • Kantardzic, M. (2003). Data mining: Concepts,
    models, methods and algorithms. Hoboken, NJ:
    Wiley-Interscience, IEEE Press.
  • Berry, M. J. A. (1997). Data mining techniques
    for marketing, sales, and customer support. New
    York: Wiley.
  • Chen, Z. (2002). Intelligent data warehousing:
    From data preparation to data mining. Boca Raton,
    FL: CRC Press.