

Title: Beaconstac Analytics


1
Big Data and the Internet of Things (IoT)
2
Project Morpheus (Beaconstac Analytics)
May 2015
Garima Batra, Core Platform Engineer, MobStac
3
A quick intro to Beaconstac
  • Beaconstac is a proximity marketing and analytics
    platform for beacons
  • Several beacon-specific events are defined to aid
    proximity marketing
  • The events include camp-on, beacon exit,
    region enter, region exit, etc.
  • The Beaconstac analytics platform makes it easy for
    managers, marketers, and developers to analyze event
    data
  • Components include the Beaconstac iOS/Android SDKs
    and the Beaconstac portal


4
Why Hadoop?
  • Collect event logs generated from Beaconstac SDK
    usage
  • Needed a system to answer queries like:
      • Heat map of beacons by the number of visits
        received in a specified time interval
      • Heat map of beacons by the amount of time spent
        in a specified time interval
      • Average time spent by users near different
        beacons
      • Last seen per user
      • Last seen per beacon
      • Analyzing data with custom attribute filters
      • Path traversed in an area by individual users
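
To make two of these queries concrete, here is a minimal in-memory sketch in plain Python. The record layout (`user`, `beacon`, `event`, `ts`) and the sample data are assumptions made for illustration; they are not the Beaconstac log schema, and the real system answers these queries with Hadoop jobs rather than a single process.

```python
# Hypothetical event records; field names and values are illustrative only.
EVENTS = [
    {"user": "u1", "beacon": "b1", "event": "camp_on", "ts": 100},
    {"user": "u2", "beacon": "b1", "event": "camp_on", "ts": 160},
    {"user": "u1", "beacon": "b2", "event": "camp_on", "ts": 220},
    {"user": "u1", "beacon": "b1", "event": "beacon_exit", "ts": 210},
]


def visits_per_beacon(events, start, end):
    """Count camp-on events per beacon with start <= ts < end."""
    counts = {}
    for e in events:
        if e["event"] == "camp_on" and start <= e["ts"] < end:
            counts[e["beacon"]] = counts.get(e["beacon"], 0) + 1
    return counts


def last_seen(events, key):
    """Return the latest timestamp per value of `key` ("user" or "beacon")."""
    latest = {}
    for e in events:
        latest[e[key]] = max(e["ts"], latest.get(e[key], e["ts"]))
    return latest
```

The per-beacon counts are the numbers a visit heat map would be drawn from; `last_seen(EVENTS, "user")` and `last_seen(EVENTS, "beacon")` cover the two "last seen" queries.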


5
Leveraging Amazon's EMR for Beaconstac Analytics
  • Hadoop Streaming on Amazon EMR, for writing mapper
    and reducer functions in Python
  • Input: copy the programs to Amazon S3
  • Output: copy the processed output data to S3
  • Initial tests were run using the Amazon EMR
    console, where you can define the following:
      • Cluster configuration: name, termination
        protection, logging, log location on S3, etc.
      • Software configuration: Hadoop AMI version,
        applications to be installed on startup, etc.
      • Hardware configuration: node types (master,
        core, and task)
      • Security: keys, allowed users
      • Bootstrap actions: configure Hadoop, custom
        actions, etc.
      • Steps: streaming program, Hive program, Pig
        program
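
A streaming mapper and reducer in Python could look roughly like the sketch below. It assumes a tab-separated log line of the form `timestamp, user, beacon, event`; that layout and the `camp_on` event name are assumptions for illustration, not the actual Morpheus job code. The functions take any iterable of lines so they can be tried locally; in a real streaming step each would be wrapped in a small script that reads `sys.stdin` and prints its output.

```python
from itertools import groupby


def mapper(lines):
    """Emit "beacon<TAB>1" for every camp-on event line."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 4:
            continue  # skip malformed lines
        _ts, _user, beacon, event = fields
        if event == "camp_on":
            yield f"{beacon}\t1"


def reducer(lines):
    """Sum the counts per beacon; input must arrive sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for beacon, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{beacon}\t{sum(int(value) for _, value in group)}"


def run_locally(lines):
    """Simulate map -> sort (shuffle) -> reduce in a single process."""
    return list(reducer(sorted(mapper(lines))))
```

`run_locally` mimics the sort that Hadoop performs between the map and reduce phases, which is why the reducer can rely on all lines for one beacon arriving together.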


6
Integrating EMR in production

7
Batch processing for Morpheus
AWS Data Pipeline

8
Deep dive into EMR startup and job submission

9
How Does AWS Data Pipeline Work?
  • Pipeline definition - specifies the business
    logic of your data management
  • AWS Data Pipeline web service - interprets the
    pipeline definition and assigns tasks to workers
    to move and transform data.
  • Task runner - polls the AWS Data Pipeline web
    service for tasks and then performs those tasks.
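
The division of labour between these three parts can be illustrated with a toy simulation. This is not the AWS Data Pipeline API: the class, the method names, and the activity ids below are all hypothetical, and the "service" is just an in-memory queue.

```python
from collections import deque

# Toy pipeline definition: ordered activities with one optional dependency each.
# The ids are hypothetical and only serve the illustration.
PIPELINE_DEFINITION = [
    {"id": "CopyLogs", "depends_on": None},
    {"id": "RunJobs", "depends_on": "CopyLogs"},
]


class ToyPipelineService:
    """Stands in for the web service: turns the definition into tasks."""

    def __init__(self, definition):
        self._pending = deque(definition)
        self._done = set()

    def poll_for_task(self):
        """Return the next task whose dependency has finished, else None."""
        if self._pending:
            dependency = self._pending[0]["depends_on"]
            if dependency is None or dependency in self._done:
                return self._pending.popleft()
        return None

    def report_done(self, task_id):
        self._done.add(task_id)


def task_runner(service, handlers):
    """Poll the service and perform each task until none is runnable."""
    completed = []
    while (task := service.poll_for_task()) is not None:
        handlers[task["id"]]()  # do the actual work for this task
        service.report_done(task["id"])
        completed.append(task["id"])
    return completed
```

The point of the sketch is the direction of control: the runner asks the service for work and reports back, while the service only decides which task is ready next.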


10
Morpheus version of the Data Pipeline
Copy logs from Kafka to S3
  • Runs every hour
  • Requires a Kafka consumer script
Run EMR jobs
  • Runs once every day
  • Processes each job and produces output
  • Each job comprises mapper and reducer scripts
Copy the output to Elasticsearch
  • Runs once every day
  • Inserts the output into Elasticsearch
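
As a sketch of the last stage, the function below turns reducer output into the newline-delimited JSON body that the Elasticsearch bulk API accepts (an action line followed by a document line per record). The `beacon<TAB>count` input format, the index name, and the document fields are assumptions for illustration; the actual Morpheus output format is not shown in the slides.

```python
import json


def to_bulk_body(reducer_lines, index_name, day):
    """Build an Elasticsearch bulk (NDJSON) body from "beacon<TAB>count" lines."""
    out = []
    for line in reducer_lines:
        beacon, count = line.rstrip("\n").split("\t")
        # Action line: index the document under a deterministic id.
        out.append(json.dumps({"index": {"_index": index_name, "_id": f"{day}:{beacon}"}}))
        # Source line: the document itself (field names are hypothetical).
        out.append(json.dumps({"beacon": beacon, "visits": int(count), "day": day}))
    # The bulk API requires the body to end with a newline.
    return "\n".join(out) + "\n" if out else ""
```

Using the day and beacon as the document id means that re-running the daily copy overwrites the same documents instead of creating duplicates.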


11
Settings file in each job
Questions?