Chapter 1 Introduction: Data-Analytic Thinking - PowerPoint PPT Presentation

1 / 54
About This Presentation
Title:

Chapter 1 Introduction: Data-Analytic Thinking

Description:

Chapter 1 Introduction: Data-Analytic Thinking * Data and Data Science Capability as a Strategic Asset Signet Bank s management was convinced that modeling ... – PowerPoint PPT presentation

Number of Views:669
Avg rating:3.0/5.0
Slides: 55
Provided by: 6649734
Category:

less

Transcript and Presenter's Notes

Title: Chapter 1 Introduction: Data-Analytic Thinking


1
Chapter 1Introduction Data-Analytic Thinking
2
  • The past fifteen years have seen extensive
    investments in business infrastructure, which
    have improved the ability to collect data
    throughout the enterprise.
  • Virtually every aspect of business is now open to
    data collection and often even instrumented for
    data collection operations, manufacturing,
    supply-chain management, customer behavior,
    marketing campaign performance, workflow
    procedures, and so on.
  • At the same time, information is now widely
    available on external events such as market
    trends, industry news, and competitors
    movements.
  • This broad availability of data has led to
    increasing interest in methods for extracting
    useful information and knowledge from data-the
    realm of data science.

3
The Ubiquity of Data Opportunities
  • With vast amounts of data now available,
    companies in almost every industry are focused on
    exploiting data for competitive advantage.
  • In the past, firms could employ teams of
    statisticians, modelers, and analysts to explore
    datasets manually, but the volume and variety of
    data have far outstripped the capacity of manual
    analysis.
  • At the same time, computers have become far more
    powerful, networking has become ubiquitous, and
    algorithms have been developed that can connect
    datasets to enable broader and deeper analyses
    than previously possible.
  • The convergence of these phenomena has given rise
    to the increasing widespread business application
    of data science principles and data mining
    techniques.

4
The Ubiquity of Data Opportunities
  • Data mining is used for general customer
    relationship management to analyze customer
    behavior in order to manage attrition and
    maximize expected customer value.
  • The finance industry uses data mining for credit
    scoring and trading, and in operations via fraud
    detection and workforce management.
  • Major retailers from Walmart to Amazon apply data
    mining throughout their businesses, from
    marketing to supply-chain management.
  • Many firms have differentiated themselves
    strategically with data science, sometimes to the
    point of evolving into data mining companies.
  • The primary goals of this book are to help you
    view business problems from a data perspective
    and understand principles of extracting useful
    knowledge from data.

5
The Ubiquity of Data Opportunities
  • The primary goals of this book are to help you
    view business problems from a data perspective
    and understand principles of extracting useful
    knowledge from data.
  • There is a fundamental structure to data-analytic
    thinking, and basic principles that should be
    understood.
  • There are also particular areas where intuition,
    creativity, common sense, and domain knowledge
    must be brought to bear.

6
The Ubiquity of Data Opportunities
  • Throughout the first two chapters of this books,
    we will discuss in detail various topics and
    techniques related to data science and data
    mining.
  • The terms data science and data mining often
    are used interchangeably, and the former has
    taken a life of its own as various individuals
    and organizations try to capitalize on the
    current hype surrounding it.
  • At a high level, data science is a set of
    fundamental principles that guide the extraction
    of knowledge from data. Data mining is the
    extraction of knowledge from data, via
    technologies that incorporate these principles.
  • As a term, data science often is applied more
    broadly than the traditional use of data
    mining, but data mining techniques provide some
    of the clearest illustrations of the principles
    of data science.

7
Example Hurricane Frances
  • Consider an example from a New York Time story
    from 2004
  • Hurricane Frances was on its way, barreling
    across the Caribbean, threatening a direct hit on
    Floridas Atlantic coast. Residents made for
    higher ground, but far away, in Bentonville,
    Ark., executives at Wal-Mart Stores decided that
    the situation offered a great opportunity for one
    of their newest data-driven weapons predictive
    technology.
  • A week ahead of the storms landfall, Linda M.
    Dillman, Wal-Marts chief information officer,
    pressed her staff to come up with forecasts based
    on what had happened when Hurricane Charley
    struck several weeks earlier. Backed by the
    trillions of bytes worth of shopper history that
    is stored in Wal-Marts data warehouse, she felt
    that the company could start predicting whats
    going to happen, instead of waiting for it to
    happen, as she put it. (Hays, 2004)

8
Example Hurricane Frances
  • Consider why data-driven prediction might be
    useful in this scenario.
  • It might be useful to predict that people in the
    path of the hurricane would buy more bottled
    water. Maybe, but this point seems a bit obvious,
    and why would we need data science to discover
    it?
  • It might be useful to project the amount of
    increase in sale due to the hurricane, to ensure
    that local Wal-Mart are properly stocked.
  • Perhaps mining the data could reveal that a
    particular DVD sold out in the hurricanes path
    but maybe it sold out that week at Wal-Marts
    across the country, not just where the hurricane
    landing was imminent.

9
Example Hurricane Frances
  • The prediction could be somewhat useful, but is
    probably more general than Ms. Dillman was
    intending.
  • It would be more valuable to discover patterns
    due to the hurricane that were not obvious.
  • To do this, analysts might examine the huge
    volume of Wal-Mart data from prior, similar
    situations (such as Hurricane Charley) to
    identify unusual local demand for products.

10
Example Hurricane Frances
  • From such patterns, the company might be able to
    anticipate unusual demand for products and rush
    stock to the stores ahead of the hurricanes
    landfall. Indeed, that is what happened.
  • The New York Times (Hays, 2004) reported
    thatthe experts mined the data and found that
    the stores would indeed need certain products-and
    not just the usual flashlights. We didnt know
    in the past that strawberry PopTarts increase in
    sales, like seven times their normal sales rate,
    ahead of a hurricane, Ms. Dillman said in a
    recent interview. And the pre-hurricane
    top-selling item was beer.

11
Example Predicting Customer Churn
  • How are such data analyses performed? Consider a
    second, more typical business scenario and how it
    might be treated from a data perspective.
  • Assume you just landed a great analytical job
    with MegaTelCo, one of the largest
    telecommunication firms in the United States.
  • They are having major problem with customer
    retention in their wireless business. In the
    mid-Atlantic region, 20 of cell phone customers
    leave when their contracts expire, and it is
    getting increasingly difficult to acquire new
    customers.
  • Since the cell phone market is now saturated, the
    huge growth in the wireless market has tapered
    off.

12
Example Predicting Customer Churn
  • Communications companies are now engaged in
    battles to attract each others customers while
    retaining their own.
  • Customers switching from one company to another
    is called churn, and it is expensive all around
    one company must spend on incentives to attract a
    customer while another company loses revenue when
    the customer departs.
  • You have been called in to help understand the
    problem and to devise a solution.
  • Attracting new customers is much more expensive
    than retaining existing ones, so a good deal of
    marketing budget is allocated to prevent churn.

13
Example Predicting Customer Churn
  • Marketing has already designed a special
    retention offer. Your task is to devise a
    precise, step-by-step plan for how the data
    science team should use MegaTelCos vast data
    resources to decide which customers should be
    offered the special retention deal prior to the
    expiration of their contract.
  • Think carefully about what data you might use and
    how they would be used. Specifically, how should
    MegaTelCo choose a set of customers to receive
    their offer in order to best reduce churn for a
    particular incentive budget? Answering this
    question is much more complicated than it may
    seem initially.

14
Data Science, Engineering, and Data-Driven
Decision Making
  • Data science involves principles, processes, and
    techniques for understanding phenomena via the
    (automated) analysis of data.
  • In this book, we will view the ultimate goal of
    data science as improving decision making, as
    this generally is of direct interest to business.

15
Data Science, Engineering, and Data-Driven
Decision Making
  • Figure 1-1 places data science in the context of
    various other closely related and data related
    processes in the organization.
  • It distinguishes data science from other aspects
    of data processing that are gaining increasing
    attention in business. Lets start at the top.

16
Data Science, Engineering, and Data-Driven
Decision Making
  • Data-driven decision-making (DDD) refers to the
    practice of basing decisions on the analysis of
    data, rather than purely on intuition.
  • For example, a marketer could select
    advertisements based purely on her long
    experience in the field and her eye for what will
    work. Or, she could base her selection on the
    analysis of data regarding how consumers react to
    different ads.
  • She could also use a combination of these
    approaches. DDD is not an all-or-nothing
    practice, and different firms engage in DDD to
    greater or lesser degrees.

17
Data Science, Engineering, and Data-Driven
Decision Making
  • Economist Erik Brynjolfsson and his colleagues
    from MIT and Penns Wharton School conducted a
    study of how DDD affects firm performance
    (Brynjolfsson, Hitt, Kim,2011).
  • They developed a measure of DDD that rates firms
    as to how strongly they use data to make
    decisions across the company.
  • They show that statistically, the more data
    driven a firm is, the more productive it is-even
    controlling for a wide range of possible
    confounding factors.
  • And the differences are not small. One standard
    deviation higher on the DDD scale is associated
    with a 4-6 increase in productivity. DDD also
    is correlated with higher return on assets,
    return on equity, asset utilization, and market
    value, and the relationship seems to be causal.

18
Data Science, Engineering, and Data-Driven
Decision Making
  • The sort of decisions we will be interested in
    this book mainly fall into two type
  • (1) decisions for which discoveries need to be
    made within data, and
  • (2) decisions that repeat, especially at massive
    scale, and so decision-making can benefit from
    even small increases in decision-making accuracy
    based on data analysis.
  • The Walmart example above illustrates a type 1
    problem Linda Dillman would like to discover
    knowledge that will help Walmart prepare for
    Hurricane Francess imminent arrival.
  • In 2012, Walmarts competitor Target was in the
    news for a data-driven decision-making case of
    its own, also a type 1 problem (Duhigg, 2012).
    Like most retailers, Target cares about
    consumers shopping habits, what drives them, and
    what can influence them.

19
Data Science, Engineering, and Data-Driven
Decision Making
  • Consumers tend to have inertia in their habits
    and getting them to change is very difficult.
    Decision makers at Target knew, however, that the
    arrival of a new baby in a family is one point
    where people do change their shopping habits
    significantly.
  • In the Target analysts word, As soon as we get
    them buying diapers from us, theyre going to
    start buying everything else too. Most retailers
    know this and so they compete with each other
    trying to sell baby-related products to new
    parents. Since most birth records are public,
    retailers obtain information on births and send
    out special offers to the new parents.

20
Data Science, Engineering, and Data-Driven
Decision Making
  • However, Target wanted to get a jump on their
    competition. They were interested in whether they
    could predict that people are expecting a baby.
    If they could, they would gain an advantage by
    making offers before their competitors. Using
    techniques of data science, Target analyzed
    historical data on customers who later were
    revealed to have been pregnant.
  • For example, pregnant mothers often change their
    diets, their wardrobes, their vitamin regimens,
    and so on. These indicators could be extracted
    from historical data, assembled into predictive
    models, and then deployed in marketing campaigns.

21
Data Science, Engineering, and Data-Driven
Decision Making
  • We will discuss predictive models in much detail
    as we go through the book.
  • For the time being, it is sufficient to
    understand that a predictive model abstracts away
    most of the complexity of the world, focusing in
    on particular set of indicators that correlate in
    some way with a quantity of interest.
  • Importantly, in both the Walmart and the Target
    example, the data analysis was not testing a
    simple hypothesis. Instead, the data were
    explored with the hope that something useful
    would be discovered.

22
Data Science, Engineering, and Data-Driven
Decision Making
  • Our churn example illustrates type 2 DDD problem.
    MegaTelCo has hundreds of millions of customers,
    each a candidate for defection. Ten of millions
    of customers have contracts expiring each month,
    so each one of them has an increased likelihood
    of defection in the near future. If we improve
    our ability to estimate, for a given customer,
    how profitable it would be for us to focus on
    her, we can potentially reap large benefits by
    applying this ability to the millions of
    customers in the population.
  • This same logic applies to many of the areas
    where we have seen the most application of data
    science and data mining direct marketing, online
    advertising, credit scoring, financial trading,
    help-desk management, fraud detection, search
    ranking, product recommendation, and so on.

23
Data Science, Engineering, and Data-Driven
Decision Making
  • The diagram in figure 1-1 shows data science
    supporting data-driven decision-making, but also
    overlapping with data-driven decision making.
    This highlights the often overlooked fact that,
    increasingly, business decisions are being made
    automatically by computer systems. Different
    industries have adopted automatic decision-making
    at different rates. The finance and
    telecommunications industries were early adopts,
    largely because of their precocious development
    of data networks and implementation of
    massive-scale computing, which allowed the
    aggregation and modeling of data at a large
    scale, as well as the application of the
    resultant models to decision-making.

24
Data Science, Engineering, and Data-Driven
Decision Making
  • In the 1990s, automated decision-making changed
    the banking and customer credit industries
    dramatically. In the 1990s, banks and
    telecommunications companies also implemented
    massive-scale systems for managing data-driven
    fraud control decisions.
  • As retail system were increasingly computerized,
    merchandising decisions were automated. Famous
    example include Harrahs casinos reward programs
    and the automated recommendations of Amazon and
    Netflix. Currently we are seeing a revolution in
    advertising, due in large part to a huge increase
    in the amount of time consumers are spending
    online, and the ability online to make
    (literally) split-second advertising decision.

25
Netflix??? (?)
  • 2013?2?Netflix???????(House of Cards)??
  • ?????? (David Fincher???????????????????)
  • ??????? (Kevin Spacey?????????????????)
  • ??? 26 ?,??????

26
Netflix???
  • ???????75?????????
  • ?????????????????????

27
Netflix??? (?)
  • ?????(House of Cards)??
  • ??1. ??????2. ?????? 13 ??
  • ?????? 2,900 ? Netflix ?????????,???????????????,
    ??????,???????
  • ??IMDb ? 15,762 ???,???? 9.0 (2013.2.28),??????
    IMDb MOVIEmeter ????????????

28
Netflix ???????????
  • ?????????????????
  • ??????????????????????????
  • ????????,??????????,???????????

29
Data Processing and Big Data
  • It is important to digress here to address
    another point. There is a lot to data processing
    that is not data sciencedespite the impression
    one might get from the media. Data engineering
    and processing are critical to support data
    science, but they are more general.
  • For example, these days many data processing
    skills, systems, and technologies often are
    mistakenly cast as data science. To understand
    data science and data-driven businesses it is
    important to understand the differences.
  • Data science needs access to data and it often
    benefits from sophisticated data engineering that
    data processing technologies may facilitate, but
    these technologies are not data science
    technologies per se.

30
Data Processing and Big Data
  • Data processing technologies are very important
    for many data-oriented business tasks that do not
    involve extracting knowledge or data-driven
    decision-making, such as efficient transaction
    processing, modern web system processing, and
    online advertising campaign management.
  • Big data technologies (such as Hadoop, HBase,
    and MongoDB) have received considerable media
    attention recently. Big data essentially means
    datasets that are too large for traditional data
    processing systems, and therefore require new
    processing technologies.
  • As with the traditional technologies, big data
    technologies are used for many tasks, including
    data engineering. Occasionally, big data
    technologies are actually used for implementing
    data mining techniques. However, much more often
    the well-known big data technologies are used for
    data processing in support of the data mining
    techniques and other data science activities.

31
Data Processing and Big Data
  • Previously, we discussed Brynjolfssons study
    demonstrating the benefits of data-driven
    decision-making. A separate study, conducted by
    economist Prasanna Tambe of NYUs Stern School,
    examined the extent to which big data
    technologies seem to help firms (Tambe, 2012). He
    finds that, after controlling for various
    possible confounding factors, using big data
    technologies is associated with significant
    additional productivity growth.
  • Specifically, one standard deviation higher
    utilization of big data technologies is
    associated with 13 higher productivity than
    the average firm one standard deviation lower in
    terms of big data utilization is associated with
    13 lower productivity. This leads to
    potentially very large productivity differences
    between the firms at the extremes.

32
From Big Data 1.0 to Big Data 2.0
  • One way to think about the state of big data
    technologies is to draw an analogy with the
    business adoption of Internet technologies.
  • In Web 1.0, businesses busied themselves with
    getting the basic internet technologies in place,
    so that they could establish a web presence,
    build electronic commerce capability, and improve
    the efficiency of their operations.
  • Once firms had incorporated Web 1.0 technologies
    thoroughly (and in the process had driven down
    prices of the underlying technology) they started
    to look further. They began to ask what the Web
    could do for them, and how it could improve
    things theyd always doneand we entered the era
    of Web 2.0, where new systems and companies began
    taking advantage of the interactive nature of the
    Web.

33
From Big Data 1.0 to Big Data 2.0
  • We should expect a Big Data 2.0 phase to follow
    Big Data 1.0. Once firms have become capable of
    processing massive data in a flexible fashion,
    they should begin asking What can I now do that
    I couldnt do before, or do better than I could
    do before? This is likely to be the golden era
    of data science.

34
Data and Data Science Capability as a Strategic
Asset
  • The prior sections suggest one of the fundamental
    principles of data science data, and the
    capability to extract useful knowledge from data,
    should be regarded as key strategic assets.
  • Too many businesses regard data analytics as
    pertaining mainly to realizing value from some
    existing data, and often without careful regard
    to whether the business has the appropriate
    analytical talent.
  • The best data science team can yield little value
    without the appropriate data the right data
    often cannot substantially improve decisions
    without suitable data science talent. As with all
    assets, it is often necessary to make investments.

35
Data and Data Science Capability as a Strategic
Asset
  • Thinking explicitly about how to invest in data
    assets very often pays off handsomely. The
    classic story of little Signet Bank from the
    1990s provides a case in point. Previously, in
    the 1980s, data science had transformed the
    business of consumer credit.
  • Modeling the probability of default (??) had
    changed the industry from personal assessment of
    the likelihood of default to strategies of
    massive scale and market share, which brought
    along concomitant economies of scale.
  • It may seem strange now, but at the time, credit
    cards essentially had uniform pricing, for two
    reasons(1) the companies did not have adequate
    information systems to deal with differential
    pricing at massive scale, and (2) bank management
    believed customers would not stand for price
    discrimination.

36
Data and Data Science Capability as a Strategic
Asset
  • Around 1990, two strategic visionaries (Richard
    Fairbanks and Nigel Morris) realized that
    information technology was powerful enough that
    they could do more sophisticated predictive
    modelingusing the sort of techniques that we
    discuss throughout this bookand offer different
    terms (nowadays pricing, credit limits,
    low-initial-rate balance transfers, cash back,
    loyalty points, and so on).
  • These two men had no success persuading the big
    banks to take them on as consultants and let them
    try. Finally, after running out of big banks,
    they succeeded in garnering the interest of a
    small regional Virginia bank Signet Bank.

37
Data and Data Science Capability as a Strategic
Asset
  • Signet Banks management was convinced that
    modeling profitability, not just default
    probability, was the right strategy.
  • They know that a small proportion of customers
    actually account for more than 100 of a banks
    profit from credit card operations (because the
    rest are break-even or money-losing).
  • If they could model profitability, they could
    make better offers to the best customers and
    skim the cream of the big banks clientele.

38
Data and Data Science Capability as a Strategic
Asset
  • But Signet Bank had one really big problem in
    implementing this strategy.
  • They did not have the appropriate data to model
    profitability with the goal of offering different
    terms to different customers. No one did.
  • Since banks were offering credit with a specific
    set of terms and a specific default model, they
    had the data to model profitability (1) for the
    terms they actually have offered in the past, and
    (2) for the sort of customer who was actually
    offered credit (that is, those who were deemed
    worthy of credit by the existing model).

39
Data and Data Science Capability as a Strategic
Asset
  • What could Signet Bank do? They brought into play
    a fundamental strategy of data science acquire
    the necessary data at a cost. In Signets case,
    data could be generated on the profitability of
    customers given different credit terms by
    conducting experiments. Different terms were
    offered at random to different customers.
  • This may seem foolish outside the context of
    data-analytic thinking youre likely to lose
    money! This is true. In this case, losses are the
    cost of data acquisition. The data analytic
    thinker needs to consider whether she expects the
    data to have sufficient value to justify the
    investment.

40
Data and Data Science Capability as a Strategic
Asset
  • So what happened with Signet Bank? As you might
    expect, when Signet began randomly offering terms
    to customers for data acquisition, the number of
    bad accounts soared.
  • Signet went from an industry-leading charge-off
    (??) rate (2.9 of balances went unpaid) to
    almost 6 charge-offs.
  • Losses continued for a few years while the data
    scientists worked to build predictive models from
    the data, evaluate them, and deploy them to
    improve profit.

41
Data and Data Science Capability as a Strategic
Asset
  • Because the firm viewed these losses as
    investments in data, they persisted despite
    complaints from stakeholders.
  • Eventually, Signets credit card operation turned
    around and became so profitable that it was spun
    off to separate it from the banks other
    operations, which now were overshadowing the
    consumer credit success.

42
Data and Data Science Capability as a Strategic
Asset
  • Fairbanks and Morris became Chairman and CEO and
    President and COO, and proceeded to apply data
    science principles throughout the businessnot
    just customer acquisition but retention as well.
  • When a customer calls looking for a better offer,
    data driven models calculate the potential
    profitability of various possible actions
    (different offers, including sticking with the
    status quo), and the customer service
    representatives computer presents the best
    offers to make.
  • Fairbanks and Morriss new company grew to be one
    of the largest credit card issuers in the
    industry with one of the lowest charge off rates.
    In 2000, the bank was reported to be carrying out
    45,000 of these scientific tests as they called
    them.

43
Data and Data Science Capability as a Strategic
Asset
  • The idea of data as a strategic asset is
    certainly not limited to Capital One, nor even to
    the banking industry.
  • Amazon was able to gather data early on online
    customers, which has created significant
    switching costs consumers find value in the
    rankings and recommendations that Amazon
    provides. Amazon therefore can retain customers
    more easily, and can even charge a premium
    (Brynjolfsson Smith, 2000).
  • Harrahs casinos famously invested in gathering
    and mining data on gamblers, and moved itself
    from a small player in the casino business in the
    mid-1990s to the acquisition of Caesars
    Entertainment in 2005 to become the worlds
    largest gambling company.

44
Data and Data Science Capability as a Strategic
Asset
  • The huge valuation of Facebook has been credited
    to its vast and unique data assets (Sengupta,
    2012), including both information about
    individuals and their likes, as well as
    information about the structure of the social
    network.
  • Information about network structure has been
    shown to be important to predicting and has been
    shown to be remarkably helpful in building models
    of who will buy certain products (Hill, Provost,
    Volinsky, 2006).
  • It is clear that Facebook has a remarkable data
    asset whether they have the right data science
    strategies to take full advantage of it is an
    open question.
  • In the book we will discuss in more detail many
    of the fundamental concepts behind these success
    stories, in exploring the principles of data
    mining and data-analytic thinking.

?
45
Data-Analytic Thinking
  • Analyzing case studies such as the churn problem
    improves our ability to approach problems
    data-analytically.
  • When faced with a business problem, you should be
    able to assess whether and how data can improve
    performance. We will discuss a set of fundamental
    concepts and principles that facilitate careful
    thinking. We will develop frameworks to structure
    the analysis so that it can be done
    systematically.
  • Understanding the fundamental concepts, and
    having frameworks for organizing data-analytic
    thinking not only will allow one to interact
    competently, but will help to envision
    opportunities for improving data-driven
    decision-making, or to see data-oriented
    competitive threats.
  • Firms in many traditional industries are
    exploiting new and existing data resources for
    competitive advantage. They employ data science
    teams to bring advanced technologies to bear to
    increase revenue and to decrease costs.

46
Data-Analytic Thinking
  • Increasingly, managers need to oversee analytics
    teams and analysis projects, marketers have to
    organize and understand data-driven campaigns,
    venture capitalists must be able to invest wisely
    in businesses with substantial data assets, and
    business strategists must be able to devise plans
    that exploit data.
  • On a scale less grand, but probably more common,
    data analytics projects reach into all business
    units. Employees throughout these units must
    interact with the data science team.
  • If these employees do not have a fundamental
    grounding in the principles of dataanalytic
    thinking, they will not really understand what is
    happening in the business.

47
Data-Analytic Thinking
  • This lack of understanding is much more damaging
    in data science projects than in other technical
    projects, because the data science is supporting
    improved decisionmaking.
  • Firms where the business people do not
    understand what the data scientists are doing are
    at a substantial disadvantage, because they waste
    time and effort or, worse, because they
    ultimately make wrong decisions.

48
This Book
  • This book concentrates on the fundamentals of
    data science and data mining. These are a set of
    principles, concepts, and techniques that
    structure thinking and analysis. They allow us to
    understand data science processes and methods
    surprisingly deeply, without needing to focus in
    depth on the large number of specific data mining
    algorithms.

49
Data Mining and Data Science, Revisited
  • Fundamental concept Formulating data mining
    solutions and evaluating the results involves
    thinking carefully about the context in which
    they will be used. If our goal is the extraction
    of potentially useful knowledge, how can we
    formulate what is useful?
  • It depends critically on the application in
    question. For our churn-management example, how
    exactly are we going to use the patterns
    extracted from historical data? Should the value
    of the customer be taken into account in addition
    to the likelihood of leaving?
  • More generally, does the pattern lead to better
    decisions than some reasonable alternative?

50
Data Mining and Data Science, Revisited
  • How well would one have done by chance? How well
    would one do with a smart default alternative?
  • These are just four of the fundamental concepts
    of data science that we will explore.
  • By the end of the book, we will have discussed a
    dozen such fundamental concepts in detail, and
    will have illustrated how they help us to
    structure data-analytic thinking and to
    understand data mining techniques and algorithms,
    as well as data science applications, quite
    generally.

51
Chemistry Is Not About Test Tubes Data Science
Versusthe Work of the Data Scientist
  • This book focuses on the science and not on the
    technology. You will
  • not find instructions here on how best to run
    massive data mining jobs on Hadoop clusters, or
    even what Hadoop is or why you might want to
    learn about it.
  • We focus here on the general principles of data
    science that have emerged. In 10 years time the
    predominant technologies will likely have changed
    or advanced enough that a discussion here would
    be obsolete, while the general principles are the
    same as they were 20 years ago, and likely will
    change little over the coming decades.

52
Summary
  • This book is about the extraction of useful
    information and knowledge from large volumes of
    data, in order to improve business
    decision-making. As the massive collection of
    data has spread through just about every industry
    sector and business unit, so have the
    opportunities for mining the data. Underlying the
    extensive body of techniques for mining data is a
    much smaller set of fundamental concepts
    comprising data science.
  • These concepts are general and encapsulate much
    of the essence of data mining and business
    analytics.

53
Summary
  • Success in todays data-oriented business
    environment requires being able to think about
    how these fundamental concepts apply to
    particular business problemsto think data
    analytically.
  • For example, in this chapter we discussed the
    principle that data should be
  • thought of as a business asset, and once we are
    thinking in this direction we start to ask
    whether (and how much) we should invest in data.
    Thus, an understanding of these fundamental
    concepts is important not only for data
    scientists themselves, but for anyone working
    with data scientists, employing data scientists,
    investing in data-heavy ventures, or directing
    the application of analytics in an organization.

54
Summary
  • There is convincing evidence that data-driven
    decision-making and big data technologies
    substantially improve business performance. Data
    science supports data-driven decision-makingand
    sometimes conducts such decision-making
    automaticallyand depends upon technologies for
    big data storage and engineering, but its
    principles are separate.
  • The data science principles we discuss in this
    book also differ from, and are complementary to,
    other important technologies, such as statistical
    hypothesis testing and database querying (which
    have their own books and classes).
  • The next chapter describes some of these
    differences in more detail.
Write a Comment
User Comments (0)
About PowerShow.com