Title: Confidentiality, Privacy and Trust 1
1. Confidentiality, Privacy and Trust - 1
- Prof. Bhavani Thuraisingham
- The University of Texas at Dallas
- October 6, 2008
2. CPT: Confidentiality, Privacy and Trust
- Before I, as a user of organization A, send data about me to organization B, I read the privacy policies enforced by organization B.
- If I agree to the privacy policies of organization B, then I will send data about me to organization B.
- If I do not agree with the policies of organization B, then I can negotiate with organization B.
- Even if the web site states that it will not share private information with others, do I trust the web site?
- Note: while confidentiality is enforced by the organization, privacy is determined by the user. Therefore, for confidentiality, the organization will determine whether a user can have the data. If so, then the organization can further determine whether the user can be trusted.
3. What is Privacy?
- Medical community
  - Privacy is about a patient determining what patient/medical information the doctor should release about him/her
- Financial community
  - A bank customer determines what financial information the bank should release about him/her
- Government community
  - The FBI would collect information about US citizens. However, the FBI determines what information about a US citizen it can release to, say, the CIA
4. Some Privacy Concerns
- Medical and Healthcare
  - Employers, marketers, or others knowing of private medical concerns
- Security
  - Allowing access to individuals' travel and spending data
  - Allowing access to web surfing behavior
- Marketing, Sales, and Finance
  - Allowing access to individuals' purchases
5. Data Mining as a Threat to Privacy
- Data mining gives us facts that are not obvious to human analysts of the data
- Can general trends across individuals be determined without revealing information about individuals?
- Possible threats
  - Combine collections of data and infer information that is private
    - Disease information from prescription data
    - Military action from pizza delivery to the Pentagon
- Need to protect the associations and correlations between the data that are sensitive or private
6. Some Privacy Problems and Potential Solutions
- Problem: Privacy violations that result from data mining
  - Potential solution: Privacy-preserving data mining
- Problem: Privacy violations that result from the inference problem
  - Inference is the process of deducing sensitive information from the legitimate responses received to user queries
  - Potential solution: Privacy constraint processing
- Problem: Privacy violations due to unencrypted data
  - Potential solution: Encryption at different levels
- Problem: Privacy violations due to poor system design
  - Potential solution: Develop a methodology for designing privacy-enhanced systems
7. Privacy Constraint/Policy Processing
- Privacy constraint processing
  - Based on prior research in security constraint processing
  - Simple constraint: an attribute of a document is private
  - Content-based constraint: if a document contains information about X, then it is private
  - Association-based constraint: two or more documents taken together are private; individually each document is public
  - Release constraint: after X is released, Y becomes private
- Augment a database system with a privacy controller for constraint processing
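The four constraint types above can be sketched as a small privacy controller that sits in front of a document store. This is only an illustrative sketch: the class names, field names and sample documents are assumptions made for the example, not part of the original slides or of any existing system.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: str
    attributes: dict   # attribute name -> value
    topics: set        # subjects the document contains information about


@dataclass
class PrivacyController:
    """Hypothetical controller; all names here are illustrative assumptions."""
    private_attributes: set = field(default_factory=set)  # simple constraints
    private_topics: set = field(default_factory=set)      # content-based constraints
    private_groups: list = field(default_factory=list)    # association-based: sets of doc ids
    release_rules: dict = field(default_factory=dict)     # release: id of X -> id of Y
    released: set = field(default_factory=set)            # ids released so far

    def may_release(self, doc):
        # Content-based: the document mentions a private subject.
        if doc.topics & self.private_topics:
            return False
        # Release constraint: an already released X made this document private.
        if any(self.release_rules.get(x) == doc.doc_id for x in self.released):
            return False
        # Association-based: releasing this document would complete a private group.
        after = self.released | {doc.doc_id}
        if any(group <= after for group in self.private_groups):
            return False
        return True

    def release(self, doc):
        """Return the releasable attributes, or None if the document is withheld."""
        if not self.may_release(doc):
            return None
        self.released.add(doc.doc_id)
        # Simple constraint: strip attributes that are marked private.
        return {k: v for k, v in doc.attributes.items()
                if k not in self.private_attributes}


controller = PrivacyController(
    private_attributes={"ssn"},
    private_topics={"cancer"},
    private_groups=[{"d1", "d2"}],
    release_rules={"d4": "d5"},
)
d1 = Document("d1", {"name": "John", "ssn": "000-00-0000"}, {"travel"})
d2 = Document("d2", {"city": "Dallas"}, {"address"})
d3 = Document("d3", {"diagnosis": "withheld"}, {"cancer"})
d4 = Document("d4", {"title": "X"}, set())
d5 = Document("d5", {"title": "Y"}, set())

results = [controller.release(d) for d in (d1, d2, d3, d4, d5)]
```

In this run d1 is released without its private attribute, d2 is withheld because d1 and d2 together are private, d3 is withheld for its content, and d5 is withheld because d4 was released first.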
8. Inference/Privacy Control
[Figure: architecture of the inference/privacy controller. Labeled components: Interface to the Semantic Web; Inference Engine/Rules Processor (Reasoning in OWL?); Privacy Policies, Ontologies, Rules; OWL/RDF Documents, Web Pages, Databases; OWL/RDF Data Management. Annotation: Technology by UTD.]
9. Semantic Model for Privacy Control
[Figure: semantic graph for Patient John. Labels shown: Cancer, Influenza, "has disease", John's address, England, "address", "travels frequently". Dark lines/boxes contain private information.]
10. Privacy Preserving Data Mining
- Prevent useful results from mining
  - Introduce cover stories to give false results
  - Only make a sample of data available so that an adversary is unable to come up with useful rules and predictive functions
- Randomization
  - Introduce random values into the data and/or results
  - Challenge is to introduce random values without significantly affecting the data mining results
  - Give a range of values for results instead of exact values
- Secure Multi-party Computation
  - Each party knows its own inputs; encryption techniques are used to compute the final results
- Rules, predictive functions
  - Approach: only make a sample of data available
  - Limits the ability to learn a good classifier
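The randomization idea can be illustrated with a minimal sketch: each individual value is perturbed with zero-mean noise, yet an aggregate such as the mean stays close to the true value. The function names, the uniform noise model and the synthetic age data are assumptions chosen for illustration; they are not taken from the slides.

```python
import random
import statistics


def randomize(values, spread, rng):
    """Add independent zero-mean uniform noise in [-spread, spread] to each value."""
    return [v + rng.uniform(-spread, spread) for v in values]


def to_range(value, width):
    """Report a value as the [low, high) bucket that contains it."""
    low = (value // width) * width
    return (low, low + width)


rng = random.Random(0)  # fixed seed only so that the sketch is repeatable
ages = [rng.randint(20, 70) for _ in range(10_000)]  # synthetic data
noisy_ages = randomize(ages, spread=15, rng=rng)

true_mean = statistics.mean(ages)
noisy_mean = statistics.mean(noisy_ages)
print(f"true mean {true_mean:.2f}, mean of randomized data {noisy_mean:.2f}")
print("age 37 reported as range", to_range(37, 10))
```

With many records the noise largely averages out of the aggregate, which is the sense in which mining results need not be significantly affected, while any single released value may be off by up to `spread`.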
11. Platform for Privacy Preferences (P3P): What is it?
- P3P is an emerging industry standard that enables web sites to express their privacy practices in a standard format
- Policies in this format can be automatically retrieved and understood by user agents
- It is a product of the W3C (World Wide Web Consortium)
  - www.w3c.org
- When a user enters a web site, the privacy policies of the web site are conveyed to the user. If the privacy policies are different from the user's preferences, the user is notified. The user can then decide how to proceed.
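The comparison a user agent performs between a site's policy and the user's preferences can be sketched as follows. This is a deliberately simplified model and not P3P syntax: the dictionaries, the item names and the function `find_conflicts` are assumptions made for the example.

```python
def find_conflicts(site_policy, user_preferences):
    """Return the data items whose declared sharing exceeds what the user allows.

    Both arguments map a data item (for example "name") to True if it may be
    shared with third parties and False otherwise. Items for which the user
    has stated no preference are treated as acceptable.
    """
    return sorted(
        item
        for item, shared in site_policy.items()
        if shared and user_preferences.get(item, True) is False
    )


# Hypothetical site: the name is not shared, purchases are shared.
site_policy = {"name": False, "purchases": True}
# Hypothetical user: nothing may be shared with third parties.
user_preferences = {"name": False, "purchases": False}

conflicts = find_conflicts(site_policy, user_preferences)
if conflicts:
    print("Policy differs from your preferences for:", ", ".join(conflicts))
```

The agent only notifies; as the slide says, the decision on how to proceed stays with the user.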
12. Platform for Privacy Preferences (P3P): Organizations
- Several major corporations are working on P3P standards, including:
  - Microsoft
  - IBM
  - HP
  - NEC
  - Nokia
  - NCR
- Web sites have also implemented P3P
- Semantic web group has adopted P3P
13. Platform for Privacy Preferences (P3P): Specifications
- The initial version of P3P used RDF to specify policies. The recent version has migrated to XML.
- P3P policies use XML with namespaces for encoding policies
- P3P has its own statements and data types expressed in XML. P3P schemas utilize XML schemas.
- The P3P specification released in January 2005 uses a catalog shopping example to explain concepts. P3P is an international standard and is an ongoing project.
- Example: catalog shopping
  - Your name will not be given to a third party, but your purchases will be given to a third party
14. P3P and Legal Issues
- P3P does not replace laws
- P3P works together with the law
- What happens if web sites do not honor their P3P policies?
  - Then appropriate legal actions will have to be taken
- XML is the technology to specify P3P policies
- Policy experts will have to specify the policies
- Technologists will have to develop the specifications
- Legal experts will have to take action if the policies are violated
15. Privacy for Assured Information Sharing
[Figure: federated information sharing. Three components hold the data/policy for Agency A, Agency B and Agency C respectively; each is labeled "Export Data/Policy", and the figure also shows a shared "Data/Policy for Federation".]
16. Key Points
- 1. There is no universal definition for privacy; each organization must define what it means by privacy and develop appropriate privacy policies
- 2. Technology alone is not sufficient for privacy. We need technologists, policy experts, legal experts and social scientists to work on privacy.
- 3. Some well-known people have said "Forget about privacy". Therefore, should we pursue research on privacy?
  - There are interesting research problems, so there is a need to continue with research
  - Something is better than nothing
  - Try to prevent privacy violations, and if violations occur then prosecute
- 4. We need to tackle privacy from all directions
17. Application Specific Privacy?
- Examining privacy may make sense for healthcare and financial applications
- Does privacy work for defense and intelligence applications?
- Is it meaningful to have privacy for surveillance and geospatial applications?
  - Once the image of my house is on Google Earth, then how much privacy can I have?
  - I may want my location to be private, but does it make sense if a camera can capture a picture of me?
  - If there are sensors all over the place, is it meaningful to have privacy-preserving surveillance?
- This suggests that we need application-specific privacy
- It is not meaningful to examine PPDM for every data mining algorithm and for every application
18. Data Mining and Privacy: Friends or Foes?
- They are neither friends nor foes
- Need advances in both data mining and privacy
- Need to design flexible systems
  - For some applications one may have to focus entirely on pure data mining, while for others there may be a need for privacy-preserving data mining
  - Need flexible data mining techniques that can adapt to changing environments
- Technologists, legal specialists, social scientists, policy makers and privacy advocates MUST work together
19. Popular Social Networks
- Facebook: A social networking website. Initially the membership was restricted to students of Harvard University. It was originally based on what first-year students were given, called the "face book", which was a way to get to know other students on campus. As of July 2007, there were over 34 million active members worldwide. From September 2006 to September 2007 it increased its ranking from the 60th to the 6th most visited web site, and was the number one site for photos in the United States.
- Twitter: A free social networking and micro-blogging service that allows users to send updates (text-based posts, up to 140 characters long) via SMS, instant messaging, email, the Twitter website, or an application/widget within a space of your choice, like MySpace, Facebook, a blog, or an RSS aggregator/reader.
- MySpace: A popular social networking website offering an interactive, user-submitted network of friends, personal profiles, blogs, groups, photos, music and videos internationally. According to Alexa Internet, MySpace is currently the world's sixth most popular English-language website and the sixth most popular website in any language, and the third most popular website in the United States, though it has topped the chart on various weeks. As of September 7, 2007, there are over 200 million accounts.
20. Social Networks: More Formal Definition
- A structural approach to understanding social interaction.
- Networks consist of Actors and the Ties between them.
- We represent social networks as graphs whose vertices are the actors and whose edges are the ties.
- Edges are usually weighted to show the strength of the tie.
- In the simplest networks, an Actor is an individual person.
- A tie might be "is acquainted with". Or it might represent the amount of email exchanged between persons A and B.
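As a minimal sketch of this definition, the snippet below builds a weighted, undirected graph in which actors are vertices and the weight of a tie is the number of emails exchanged. The function name and the sample email log are assumptions for illustration only.

```python
from collections import defaultdict


def build_network(email_log):
    """Build an undirected weighted graph from (sender, recipient) pairs.

    The weight of the tie between two actors is the number of emails
    exchanged between them in either direction.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for sender, recipient in email_log:
        graph[sender][recipient] += 1
        graph[recipient][sender] += 1
    return graph


# Hypothetical log: A and B exchange three emails, A writes to C once.
email_log = [("A", "B"), ("B", "A"), ("A", "B"), ("A", "C")]
network = build_network(email_log)

for actor in sorted(network):
    ties = {other: weight for other, weight in sorted(network[actor].items())}
    print(actor, ties)
```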
21. Social Network Examples
- Effects of urbanization on individual well-being
- World political and economic system
- Community elite decision-making
- Social support, Group problem solving
- Diffusion and adoption of innovations
- Belief systems, Social influence
- Markets, Sociology of science
- Exchange and power
- Email, Instant messaging, Newsgroups
- Co-authorship, Citation, Co-citation
- SocNet software, Friendster
- Blogs and diaries, Blog quotes and links
22. History
- Sociograms were invented in 1933 by Moreno.
- In a sociogram, the actors are represented as points in a two-dimensional space. The location of each actor is significant. E.g. a central actor is plotted in the center, and others are placed in concentric rings according to distance from this actor.
- Actors are joined with lines representing ties, as in a social network. In other words a social network is a graph, and a sociogram is a particular 2D embedding of it.
- These days, sociograms are rarely used (most examples on the web are not sociograms at all, but networks). But methods like MDS (Multi-Dimensional Scaling) can be used to lay out Actors, given a vector of attributes about them.
- Social Networks were studied early by researchers in graph theory (Harary et al. 1950s). Some social network properties can be computed directly from the graph.
- Others depend on an adjacency matrix representation (Actors index rows and columns of a matrix, matrix elements represent the tie strength between them).
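The adjacency matrix representation mentioned in the last bullet can be sketched as below, together with two simple properties read off a row of the matrix. The helper names and the four-actor example are assumptions for illustration.

```python
def adjacency_matrix(actors, ties):
    """Build a symmetric matrix whose entry [i][j] is the tie strength
    between actors[i] and actors[j] (0 means no tie)."""
    index = {actor: i for i, actor in enumerate(actors)}
    matrix = [[0] * len(actors) for _ in actors]
    for a, b, strength in ties:
        matrix[index[a]][index[b]] = strength
        matrix[index[b]][index[a]] = strength
    return matrix


def degree(matrix, i):
    """Number of actors that actor i is tied to."""
    return sum(1 for weight in matrix[i] if weight != 0)


def total_strength(matrix, i):
    """Sum of the tie strengths of actor i."""
    return sum(matrix[i])


actors = ["A", "B", "C", "D"]
ties = [("A", "B", 3), ("A", "C", 1), ("C", "D", 2)]  # hypothetical ties
matrix = adjacency_matrix(actors, ties)

for actor, row in zip(actors, matrix):
    print(actor, row)
```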
23. Social Network Analysis of 9/11 Terrorists (www.orgnet.com)
Early in 2000, the CIA was informed of two
terrorist suspects linked to al-Qaeda. Nawaf
Alhazmi and Khalid Almihdhar were photographed
attending a meeting of known terrorists in
Malaysia. After the meeting they returned to Los
Angeles, where they had already set up
residence in late 1999.
24. Social Network Analysis of 9/11 Terrorists
- What do you do with these suspects? Arrest or deport them immediately? No, we need to use them to discover more of the al-Qaeda network.
- Once suspects have been discovered, we can use their daily activities to uncloak their network. Just like they used our technology against us, we can use their planning process against them. Watch them, and listen to their conversations to see...
  - who they call / email
  - who visits with them locally and in other cities
  - where their money comes from
- The structure of their extended network begins to emerge as data is discovered via surveillance.
25. Social Network Analysis of 9/11 Terrorists
A suspect being monitored may have many contacts -- both accidental and intentional. We must always be wary of 'guilt by association'. Accidental contacts, like the mail delivery person, the grocery store clerk, and neighbor may not be viewed with investigative interest. Intentional contacts are like the late afternoon visitor, whose car license plate is traced back to a rental company at the airport, where we discover he arrived from Toronto (got to notify the Canadians) and his name matches a cell phone number (with a Buffalo, NY area code) that our suspect calls regularly. This intentional contact is added to our map and we start tracking his interactions -- where do they lead?
As data comes in, a picture of the terrorist organization slowly comes into focus. How do investigators know whether they are on to something big? Often they don't. Yet in this case there was another strong clue that Alhazmi and Almihdhar were up to no good -- the attack on the USS Cole in October of 2000. One of the chief suspects in the Cole bombing, Khallad, was also present along with Alhazmi and Almihdhar at the terrorist meeting in Malaysia in January 2000.
Once we have their direct links, the next step is to find their indirect ties -- the 'connections of their connections'. Discovering the nodes and links within two steps of the suspects usually starts to reveal much about their network. Key individuals in the local network begin to stand out. In viewing the network map in Figure 2, most of us will focus on Mohammed Atta because we now know his history. The investigator uncloaking this network would not be aware of Atta's eventual importance. At this point he is just another node to be investigated.
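Finding the nodes within two steps of the known suspects is a bounded breadth-first search over the contact graph. The sketch below shows one way this could be done; the function name and the placeholder node labels are assumptions for illustration and do not come from the orgnet.com analysis.

```python
from collections import deque


def within_steps(graph, seeds, max_steps):
    """Return {node: steps} for every node reachable from any seed
    in at most max_steps links (breadth-first search from all seeds)."""
    distance = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if distance[node] == max_steps:
            continue  # do not expand beyond the step limit
        for neighbor in graph.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance


# Hypothetical contact graph built from surveillance data.
contacts = {
    "suspect_1": ["suspect_2", "visitor"],
    "suspect_2": ["suspect_1", "roommate"],
    "visitor": ["suspect_1", "caller"],
    "roommate": ["suspect_2"],
    "caller": ["visitor", "financier"],
    "financier": ["caller"],
}

nearby = within_steps(contacts, ["suspect_1", "suspect_2"], max_steps=2)
for node, steps in sorted(nearby.items(), key=lambda item: item[1]):
    print(steps, node)
```

Here the direct links are at one step, the "connections of their connections" at two, and anything further away is left out of the map.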
26. Social Network Analysis of 9/11 Terrorists
Figure 2 shows the two suspects and
27. Social Network Analysis of 9/11 Terrorists
28. Social Network Analysis of 9/11 Terrorists
- We now have enough data for two key conclusions:
  - All 19 hijackers were within 2 steps of the two original suspects uncovered in 2000!
  - Social network metrics reveal Mohammed Atta emerging as the local leader
- With hindsight, we have now mapped enough of the 9-11 conspiracy to stop it. Again, the investigators are never sure they have uncovered enough information while they are in the process of uncloaking the covert organization. They also have to contend with superfluous data. This data was gathered after the event, so the investigators knew exactly what to look for. Before an event it is not so easy.
- As the network structure emerges, a key dynamic that needs to be closely monitored is the activity within the network. Network activity spikes when a planned event approaches. Is there an increase of flow across known links? Are new links rapidly emerging between known nodes? Are money flows suddenly going in the opposite direction? When activity reaches a certain pattern and threshold, it is time to stop monitoring the network, and time to start removing nodes.
- The author argues that this bottom-up approach of uncloaking a network is more effective than a top-down search for the terrorist needle in the public haystack -- and it is less invasive of the general population, resulting in far fewer "false positives".
29. Social Network Analysis of Steroid Usage in Baseball (www.orgnet.com)
When the Mitchell Report on steroid use in Major League Baseball (MLB) was published, people were surprised at who and how many players were mentioned. The diagram below shows a human network created from data found in the Mitchell Report. Baseball players are shown as green nodes. Those who were found to be providers of steroids and other illegal performance enhancing substances appear as red nodes. The links reveal the flow of chemicals -- from provider to player.
30. Knowledge Sharing in Organizations: Finding Experts
31. Knowledge Sharing Network: Finding Experts (www.orgnet.com)
Organizational leaders are preparing for the potential loss of expertise and knowledge flow due to turnover, downsizing, outsourcing, and the coming retirements of the baby boom generation. The model network (previous chart) is used to illustrate the knowledge continuity analysis process.
Each node in this sample network (previous chart) represents a person that works in a knowledge domain. Some people have more / different knowledge than others. Employees who will retire in 2 years or less have their nodes colored red. Those who will retire in 3-4 years are colored yellow. Those retiring in 5 years or later are colored green.
A gray, directed line is drawn from the seeker of knowledge to the source of expertise. A-->B indicates that A seeks expertise / advice from B. Those with many arrows pointing to them are sought often for assistance.
The top subject matter experts -- SMEs -- in this group are nodes 29, 46, 100, 41, 36 and 55. The SMEs were discovered using a network metric in InFlow that is similar to how the Google search engine ranks web pages -- using both direct and indirect links. Of the top six SMEs in this group, half are colored red (100) or yellow (46, 55). The loss of person 46 has the greatest potential for knowledge loss. 90% of the network is within 3 steps of accessing this key knowledge source.
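The slide only says that the InFlow metric is similar to how Google ranks web pages. As a stand-in, the sketch below runs a basic PageRank-style power iteration over a "seeks advice from" graph; it is not InFlow's actual metric, and the function name, parameters and the five-person example are assumptions for illustration.

```python
def rank_experts(seeks, damping=0.85, iterations=50):
    """PageRank-style scores over a directed advice network.

    `seeks` maps each person to the list of people they ask for advice
    (an arrow A-->B). A person scores highly when many people, or people
    who are themselves sought out, point to them.
    """
    people = set(seeks) | {p for targets in seeks.values() for p in targets}
    n = len(people)
    score = {p: 1.0 / n for p in people}
    for _ in range(iterations):
        new_score = {p: (1.0 - damping) / n for p in people}
        for person in people:
            targets = seeks.get(person, [])
            if targets:
                share = damping * score[person] / len(targets)
                for target in targets:
                    new_score[target] += share
            else:
                # Someone who asks nobody: spread their score evenly.
                share = damping * score[person] / n
                for p in people:
                    new_score[p] += share
        score = new_score
    return score


# Hypothetical advice network: everyone asks eve, cat also asks dan.
seeks = {
    "ann": ["eve"],
    "bob": ["eve"],
    "cat": ["eve", "dan"],
    "dan": ["eve"],
    "eve": [],
}
scores = rank_experts(seeks)
top_expert = max(scores, key=scores.get)
print("most sought-after:", top_expert)
```

Combining such a score with the retirement colouring described above would flag highly ranked people who are about to leave.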
32. Social Networks: Security and Privacy Issues (European Network and Information Security Agency)
- The European Network and Information Security Agency (ENISA) has released its first issue paper, "Security Issues and Recommendations for Online Social Networks".
  - http://www.enisa.europa.eu/doc/pdf/deliverables/enisa_pp_social_networks.pdf
- Four groups of threats: privacy related threats, variants of traditional network and information security threats, identity related threats, social threats.
- Recommendations are given for governments (oversight and adaption of existing data protection legislation), companies that run such networks, technology developers, and research and standardisation bodies.
- Some concerns: the recommendation to use automated filters against "offensive, litigious or illegal content". This brings potential freedom of speech issues. European Digital Rights has started a campaign against a similar recommendation by the Council of Europe. The issue of portability of profiles and social graphs is also addressed. However, what is missing is that information about social links is not about only one user, but also about the others to whom he is linked. They have to agree if this information is moved to different platforms.
33. Social Networks: Security and Privacy Issues (Microsoft Recommendations)
http://www.microsoft.com/protect/yourself/personal/communities.mspx
- Online communities require you to provide personal information. Profiles are public. Comments you post are permanently recorded on the community site. You might even mention when you plan to be out of town.
- E-mail and phishing scammers count on the appealing sense of trust that is often fostered in online communities to steal your personal information. The more you reveal in profiles and posts, the more vulnerable you are to scams, spam, and identity theft.
- Here are some features to look for when you're considering joining an online community:
  - Privacy policies that explain exactly what information the service will collect and how it might be used.
  - User guidelines that outline a basic code of conduct for users on their sites. Sites have the option to penalize reported violators with account suspension or termination.
  - Special provisions for children and their parents, such as family-friendly options geared towards protecting children under a certain age.
  - Password protection to help keep your account secure.
  - E-mail address hiding, which lets you display only part of your e-mail address on the site's membership lists.
  - Filtering options: offered on blogging sites, these tools let you choose which subscribers can see what you've written.
34. Role of Semantic Web
- FOAF (Friend of a Friend)
- Social graph represented in RDF
- Use the reasoning tools and analyze the social network for suspicious events
- Protect the privacy of individuals
35. FOAF
http://www.foaf-project.org/about
http://en.wikipedia.org/wiki/FOAF_(software)
- FOAF (an acronym of Friend of a Friend) is a machine-readable ontology describing persons, their activities and their relations to other people and objects. Anyone can use FOAF to describe him or herself. FOAF allows groups of people to describe social networks without the need for a centralised database.
- FOAF's descriptive vocabulary is expressed using RDF (Resource Description Framework) and OWL (Web Ontology Language).
- Computers may use these FOAF profiles to find, for example, all people living in Europe, or to list all people both you and a friend of yours know. This is accomplished by defining relationships between people. Each profile has a unique identifier (such as the person's e-mail addresses, a URI of the homepage or weblog of the person), which is used when defining these relationships.
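The "people both you and a friend know" query reduces to a set intersection once the `knows` relationships have been collected by profile identifier. The sketch below uses plain Python sets rather than an RDF library; the identifiers (example.org mailboxes) and the function name are assumptions for illustration.

```python
def mutual_acquaintances(knows, person, friend):
    """Return the identifiers of people that both `person` and `friend` know.

    `knows` maps a profile identifier to the set of identifiers that the
    profile lists as known people.
    """
    return knows.get(person, set()) & knows.get(friend, set())


# Hypothetical "knows" relationships gathered from several FOAF profiles.
knows = {
    "mailto:alice@example.org": {
        "mailto:bob@example.org",
        "mailto:carol@example.org",
        "mailto:dave@example.org",
    },
    "mailto:bob@example.org": {
        "mailto:alice@example.org",
        "mailto:carol@example.org",
        "mailto:dave@example.org",
        "mailto:erin@example.org",
    },
}

shared = mutual_acquaintances(
    knows, "mailto:alice@example.org", "mailto:bob@example.org"
)
print(sorted(shared))
```

Because each profile is keyed by a unique identifier, the profiles can live in different files on different sites and still be joined this way, which is what removes the need for a centralised database.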
36. FOAF
http://www.foaf-project.org/about
http://en.wikipedia.org/wiki/FOAF_(software)
- The FOAF project, which defines and extends the vocabulary of a FOAF profile, was started in 2000. It can be considered the first Social Semantic Web application, in that it combines RDF technology with 'Social Web' concerns.
- Tim Berners-Lee in a recent essay redefined the Semantic Web concept into something he calls the Giant Global Graph (GGG), where relationships transcend networks/documents. He considers the GGG to be on equal ground with the Internet and the World Wide Web, stating that "I express my network in a FOAF file, and that is a start of the revolution."
37. FOAF
http://www.foaf-project.org/about
http://en.wikipedia.org/wiki/FOAF_(software)
- The following FOAF profile (written in XML format) states that Jimmy Wales is the name of the person described here. His e-mail address, homepage and depiction are resources, which means that each of them can be described using RDF as well. He has Wikipedia as an interest, and knows Angela Beesley (which is the name of a 'Person' resource).
<rdf:RDF
  xmlns:foaf="http://xmlns.com/foaf/0.1/"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <foaf:Person>
    <foaf:name>Jimmy Wales</foaf:name>
    <foaf:mbox rdf:resource="mailto:jwales_at_bomis.com" />
    <foaf:homepage rdf:resource="[...].com/" />
    <foaf:nick>Jimbo</foaf:nick>
    <foaf:depiction rdf:resource="[...]s.com/aus_img_small.jpg" />
    <foaf:interest rdf:resource="http://www.wikimedia.org" rdfs:label="Wikipedia" />
    <foaf:knows>
      <foaf:Person>
        <foaf:name>Angela Beesley</foaf:name>
      </foaf:Person>
    </foaf:knows>
  </foaf:Person>
</rdf:RDF>
38. Confidentiality, Privacy and Trust (CPT)
- How can CPT be incorporated into FOAF?
- Assignment 3