Transcript and Presenter's Notes

Title: Fifth Workshop on Link Analysis, Counterterrorism, and Security


1
  • Fifth Workshop on Link Analysis,
    Counterterrorism, and Security
  • Antonio Badia
  • David Skillicorn

2
  • Open Problems
  • An individualized list (with some feedback
    from workshop participants)

3
  • Process improvements
  • Better overall processes
    Defence in depth is the key to lower error rates: what is
    good/normal should look good/normal from every direction.
  • Handling multiple kinds of data at once (attribute data
    together with relational data)
    We don't know very many algorithms that exploit more than
    one type of data within the same algorithm.
  • Using graph analysis techniques more widely
    Although there are good reasons to expect that a graph
    approach will be more robust than a direct approach, this is
    hardly ever done, for the understandable reason that it's
    harder and messier.
  • Better ways to exploit the fact that normality implies
    internal consistency
    This only makes sense in an adversarial setting, so it has
    received little attention, but it is a good, basic technique
    (a sketch follows this slide).
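A minimal Python sketch of the consistency idea (an illustration, not a method from the slides; the synthetic data and the linear model are assumptions chosen for the demonstration): each attribute is predicted from the others, and a record whose attributes disagree with one another accumulates large residuals.

    # Score records by internal consistency: for each attribute, fit a
    # model predicting it from the remaining attributes; records whose
    # attributes are mutually inconsistent get large residuals.
    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 4))
    # give the normal data internal structure: column 3 depends on 0 and 1
    X[:, 3] = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=500)
    X[0, 3] = 5.0  # one internally inconsistent record

    scores = np.zeros(len(X))
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        model = LinearRegression().fit(others, X[:, j])
        residual = np.abs(X[:, j] - model.predict(others))
        scores += residual / residual.std()  # normalize each attribute's share

    print("most inconsistent record:", scores.argmax())  # index 0, very likely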

4
  • Legal and social frameworks for preemptive data analysis
    The arguments for widespread data collection, and ways to
    mitigate the downsides, need to be developed further, and
    explained by the knowledge discovery community to those who
    have legitimate concerns about the cost/benefit tradeoff.
  • Challenges of open virtual worlds
    New virtual worlds, such as the Multiverse, make it much
    harder to gather data using any kind of surveillance; the
    consequences need to be understood.
  • Focus on emergent properties rather than collected ones
    Attributes that are derived from the collective properties
    of many individual records are much more resistant to
    manipulation than those collected directly in individual
    records (a sketch follows this slide).
  • Collaboration with linguists, sociologists,
    anthropologists, etc.
    Applying technology well depends on deeper understanding of
    context, and computing people do not necessarily do this
    well.
  • Better use of visualization, especially multiple views
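A minimal sketch of the contrast, with PageRank standing in for the emergent attribute (the choice of PageRank and the random graph are assumptions made for illustration): a self-reported value can be set freely by one actor, but an emergent score is determined by the link choices of every other node.

    # An emergent attribute (PageRank) vs a directly collected one.
    import networkx as nx

    g = nx.barabasi_albert_graph(200, 2, seed=1).to_directed()
    claimed = {n: 1.0 for n in g}  # self-reported importance: trivially forgeable
    claimed[0] = 1000.0            # node 0 inflates its own record
    emergent = nx.pagerank(g)      # collective score: node 0 cannot set this alone

    print("claimed top:", max(claimed, key=claimed.get))
    print("emergent top:", max(emergent, key=emergent.get))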

5
  • Easy technical advances
  • Hardening standard techniques against manipulation (by
    insiders and outsiders)
    Most existing algorithms are seriously vulnerable to
    manipulation by, e.g., adding a few particular data records
    (a sketch of such an attack follows this slide).
  • Distinguishing the bad from the unusual
    It's straightforward to identify the normal records in a
    dataset, but once these have been removed, it still remains
    to separate the bad from the unusual; little has been done
    to attack this problem.
  • Getting graph techniques to work as well as they should
    Although graph algorithms have known theoretical advantages,
    it has been surprisingly difficult to turn these into
    practical advantages.
  • Strong but transparent predictors
    We know predictors that are strong, and predictors that are
    transparent (they explain their predictions), but we don't
    know any that are both at once.
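A toy illustration of the vulnerability (a sketch on invented data, not an attack described in the slides): a naive mean/standard-deviation anomaly detector stops flagging a genuinely anomalous value once a handful of extreme records are injected to inflate the standard deviation.

    # A few injected records make a bad record look normal to a
    # naive mean/std threshold detector.
    import numpy as np

    rng = np.random.default_rng(0)
    normal = rng.normal(0.0, 1.0, size=1000)
    bad = 4.5  # a value the detector should flag

    def flags(data, point, k=3.0):
        return abs(point - data.mean()) > k * data.std()

    print("before poisoning:", flags(normal, bad))    # True
    poisoned = np.append(normal, [20.0] * 5)          # five crafted records
    print("after poisoning: ", flags(poisoned, bad))  # False: std is inflated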

6
  • Detecting when models need to be updated because the
    setting has changed
    In adversarial settings, there is a constant arms race, and
    so a greater need to update models regularly; automatic ways
    to know when to do this are not really known.
  • Clustering to find fringe records
    In adversarial settings, the records of interest are likely
    to be close to the normal data, rather than outliers;
    techniques for detecting such fringe clusters are needed
    (a sketch follows this slide).
  • Better 1-class prediction techniques
    In many settings, only normal data is available; existing
    1-class prediction is unusably fragile.
  • Temporal change detection (trend/concept drift in every
    analysis)
    One way to detect manipulation is to see change for which
    there seems to be no explanation; detecting this would be
    useful.
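One possible shape of a fringe detector, as a minimal sketch (the single-cluster data and the 90th-99th percentile band are arbitrary illustrative choices): cluster the data, then pick out records whose distance from their centroid is high but short of the outlier range.

    # Fringe records: near the edge of the normal data, but not outliers.
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 2))  # one dense cluster of normal data

    km = KMeans(n_clusters=1, n_init=10, random_state=0).fit(X)
    dist = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)

    lo, hi = np.quantile(dist, [0.90, 0.99])
    fringe = np.where((dist >= lo) & (dist <= hi))[0]  # close to normal
    outliers = np.where(dist > hi)[0]                  # classical anomalies
    print(len(fringe), "fringe records,", len(outliers), "outliers")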

7
  • Keyless fusion algorithms, and an understanding of the
    limits of fusion
    Most fusion uses key attributes that are thought of as
    describing identity; but, anecdotally, almost any set of
    attributes can play this role, and we need to understand the
    theory and limits (a sketch follows this slide).
  • Better symbiotic knowledge discovery: humans and algorithms
    coupled together
    Many analysis systems have a loop between analyst and
    knowledge-discovery tools, but there seem to be interesting
    ways to make this loop more productive.
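A toy sketch of keyless fusion (the tables and column names are hypothetical): two datasets with no key attribute in common can still be joined on a combination of ordinary attributes that happens to be jointly identifying, in the spirit of quasi-identifiers.

    # Fuse two tables on a quasi-identifier instead of a key.
    import pandas as pd

    a = pd.DataFrame({"zip": ["40292", "40292", "12345"],
                      "birth_year": [1970, 1981, 1970],
                      "sex": ["F", "M", "F"],
                      "purchase": ["book", "phone", "tent"]})
    b = pd.DataFrame({"zip": ["40292", "12345"],
                      "birth_year": [1970, 1970],
                      "sex": ["F", "F"],
                      "forum_handle": ["alpha", "beta"]})

    # No identity key is shared, but the combination
    # (zip, birth_year, sex) links the records anyway.
    fused = a.merge(b, on=["zip", "birth_year", "sex"])
    print(fused[["zip", "purchase", "forum_handle"]])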

8
  • Difficult technical advances
  • Finding larger structures in text
    Very little structure above the level of named entities is
    extracted at present, but there are opportunities to extract
    larger structures, both to check for normality and to use
    them to understand content better.
  • Authorship detection from small samples
    The web has become a place where authors are plentiful, and
    it would be useful to detect that the same person has
    written in this blog and that forum (a sketch follows this
    slide).
  • Unusual region detection in graphs
    Most graph algorithms focus either on clustering or on
    exploring the region of a single node; it is also
    interesting to find regions that are somehow anomalous.
  • Performance improvements to allow scaling to very large
    datasets
    Changes of three orders of magnitude in quantity require
    changes in the qualitative properties of algorithms;
    scalability issues need more attention.
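A crude sketch of one standard stylometric signal, function-word frequencies (the word list and the two sample texts are illustrative assumptions): two small samples are compared by the cosine similarity of their function-word profiles.

    # Compare small text samples by their function-word profiles.
    import numpy as np

    FUNCTION_WORDS = ["the", "of", "and", "to", "in",
                      "that", "it", "is", "was", "i"]

    def profile(text):
        words = text.lower().split()
        counts = np.array([words.count(w) for w in FUNCTION_WORDS], dtype=float)
        return counts / max(len(words), 1)

    def similarity(t1, t2):
        p, q = profile(t1), profile(t2)
        return float(p @ q / (np.linalg.norm(p) * np.linalg.norm(q) + 1e-12))

    blog = "it is the case that the results were in the report and it was clear"
    forum = "it is true that the data in the file was the same and it is done"
    print("function-word cosine similarity:", round(similarity(blog, forum), 3))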

9
  • Better use of second-order algorithms
    Approaches in which an algorithm is run repeatedly under
    different conditions, and it is a change from one run to the
    next that is significant, have potential but are hardly ever
    used (a sketch follows this slide).
  • Systemic functional linguistics for content/mental state
    extraction from text
    SFL takes into account the personal and social dimensions of
    language, and brings together texts that look very different
    on the surface; this will have payoffs in several dimensions
    of text exploitation.
  • Adversarial parsing (cf. error correction in compilers)
    When text has been altered for concealment, compiler
    techniques may help to spot where these changes have
    occurred and what they might have been.
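A minimal sketch of one such second-order analysis (the bootstrap-and-compare design is an illustrative assumption): k-means is run repeatedly on resampled data, and each record is scored by how often its cluster assignment changes relative to a reference run.

    # Second-order analysis: what matters is change between runs.
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(-2, 1, (300, 2)), rng.normal(2, 1, (300, 2))])

    ref = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    instability = np.zeros(len(X))
    for seed in range(20):
        idx = rng.integers(0, len(X), len(X))  # bootstrap resample
        km = KMeans(n_clusters=2, n_init=10, random_state=seed).fit(X[idx])
        labels = km.predict(X)
        # align cluster ids with the reference labelling before comparing
        if np.mean(labels == ref.labels_) < 0.5:
            labels = 1 - labels
        instability += labels != ref.labels_

    print("most unstable records:", np.argsort(instability)[-5:])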