Big Data Technologies - Machine Learning Online Course - SimpliDistance - PowerPoint PPT Presentation

About This Presentation
Title:

Big Data Technologies - Machine Learning Online Course - SimpliDistance

Description:

Big data technologies are designed to handle the challenges of large-scale data. Tools from the Hadoop ecosystem such as MapReduce, Pig, Hive, HBase, Sqoop, Spark, and Storm are popular for big data processing. NoSQL databases such as MongoDB are also used. The Distance MBA in AI and Machine Learning online course covers some of these technologies along with analytics. SimpliDistance helps you find the best online courses.

Transcript and Presenter's Notes


1
Distance MBA
  • SimpliDistance

2
Why Care About Big Data?
Distance MBA in AI and ML - SimpliDistance
  • We live in a connected world where more than
    3.7 billion people use the internet. We use many
    digital gadgets every day and communicate with
    our friends and the world through posts, likes,
    and forwards on Facebook and Twitter, through
    email, and through mobile apps such as WhatsApp.
    Millennials record every event of their lives in
    photos and videos: what food they are eating,
    which movie they are watching, at which airport
    they are waiting for a flight, and so on.

3
  • With these interactions we generate a humongous
    amount of data. Internet search engines are an
    inseparable part of our daily lives: Google
    processes 3.5 billion searches a day!

4
  • Why should we care about this? The challenge of
    handling such a large amount of data seems
    daunting. Does it even make sense to embark on
    such an arduous task? Suppose we want to know how
    a particular disease is spreading across a
    country, because it could cause an epidemic and
    we want to act early to prevent an outbreak. One
    approach is to contact every hospital and private
    medical practitioner to get a head start on the
    information. With this data, aid can be sent to
    crisis-hit areas. However, this would be a huge
    and time-consuming effort.

5
  • Another way to approach this problem is with the
    help of search engines. Observe which search
    queries related to the disease or its medicines
    are being made, and how many of them are as
    recent as one week or 15 days. Analysis of such
    data would be very useful in tracking how the
    disease is spreading, and it would be very quick,
    almost real time. Hence it makes a lot of sense
    to process such data.
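
The idea of counting recent disease-related queries can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the search log, the keyword "dengue", the dates, and the function name count_recent_queries are all hypothetical, and a real search engine would process vastly more data with very different tooling.

```python
from datetime import date, timedelta

# Hypothetical search log: (query text, date the query was made).
SEARCH_LOG = [
    ("dengue fever symptoms", date(2024, 6, 28)),
    ("cheap flights to goa", date(2024, 6, 29)),
    ("dengue treatment near me", date(2024, 6, 30)),
    ("medicine for dengue", date(2024, 6, 20)),
    ("dengue mosquito repellent", date(2024, 5, 1)),
]


def count_recent_queries(log, keyword, today, window_days):
    """Count queries that mention `keyword` and were made in the last `window_days` days."""
    cutoff = today - timedelta(days=window_days)
    return sum(
        1
        for text, made_on in log
        if keyword in text.lower() and cutoff <= made_on <= today
    )


today = date(2024, 7, 1)  # assumed "current" date for the example
last_week = count_recent_queries(SEARCH_LOG, "dengue", today, 7)
last_15_days = count_recent_queries(SEARCH_LOG, "dengue", today, 15)
print(last_week, last_15_days)
```

Comparing the one-week count with the 15-day count gives a rough sense of whether interest in the disease is rising or fading.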

6
3 Vs of Big Data
  • Let us see what characteristics data must have
    to be called big data. Initially, three
    qualities, volume, velocity, and variety,
    popularly called the 3 Vs, were used to qualify
    anything as big data.

7
Volume
  • Very large volume is the first characteristic of
    big data. Users produce large volumes of data
    every day through comments, posts, photos,
    videos, likes, etc. on different social media
    platforms. Here are some sample user counts:
  • Facebook: 2,000 million users.
  • Google: 111 million users.
  • Instagram: 1,000 million users.

8
Velocity
  • Velocity refers to the speed at which data is
    generated from human interactions with social
    media, mobile apps, websites, etc.
  • It is a distinctive characteristic of big data.
    One must be able to handle this velocity to get
    insights and use them for competitive advantage.

9
Variety
  • Variety refers to the different types of data we
    generate: text files, PDFs, Excel sheets, emails,
    photos, databases, videos, and data generated by
    sensors. This includes structured data such as
    database records and unstructured data such as
    comments and likes. Unstructured data cannot
    easily be retrieved using Structured Query
    Language (SQL). Hence, a different type of
    database, known as a NoSQL database, is used to
    handle such data. MongoDB and CouchDB are
    examples of such databases.
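
To make the contrast concrete, here is a minimal sketch in plain Python. It does not use MongoDB, CouchDB, or any database API; the sample records and the helper find_documents are hypothetical and only illustrate how document-style data lets each item carry different fields, unlike rows in a fixed-column table.

```python
import json

# Structured data: every row has the same fixed columns, as in a relational table.
rows = [
    {"user_id": 1, "name": "Asha", "city": "Pune"},
    {"user_id": 2, "name": "Ravi", "city": "Mumbai"},
]

# Document-style data: each document may carry a different set of fields.
documents = [
    {"type": "comment", "user_id": 1, "text": "Great movie!"},
    {"type": "like", "user_id": 2, "post_id": 77},
    {"type": "photo", "user_id": 1, "tags": ["food", "travel"], "location": "airport"},
]


def find_documents(docs, **criteria):
    """Return the documents whose fields equal every given criterion."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]


by_user_1 = find_documents(documents, user_id=1)
comments = find_documents(documents, type="comment")
print(json.dumps(comments))
```

The query helper never assumes a shared schema, so adding a new kind of document with new fields requires no change to the existing data.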

10
Veracity, Validity, Volatility
  • The dictionary meaning of the word veracity is
    conformity to facts, or accuracy. In data
    processing, the bigger challenge is the
    cleanliness of the data. As data is gathered from
    multiple sources, the chances of getting a lot of
    noise or bad data are very high. If you use this
    so-called dirty data without cleaning it,
    predictions and analysis will not be useful and
    may sometimes be grossly wrong. Veracity refers
    to the accuracy or cleanliness of the data. Along
    with veracity, the validity of the data in its
    context is equally important.
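
As a small illustration of cleaning dirty data before analysis, the sketch below drops records that are incomplete, implausible, or duplicated. The sample records, the field names city and cases, and the cleaning rules are assumptions made for this example; real cleaning rules depend on the data and the business context.

```python
# Hypothetical raw records merged from several sources.
RAW_RECORDS = [
    {"city": "Pune", "cases": "12"},
    {"city": " pune ", "cases": "12"},   # duplicate with inconsistent formatting
    {"city": "Mumbai", "cases": "-5"},   # a negative count is not plausible
    {"city": "", "cases": "7"},          # missing city
    {"city": "Nagpur", "cases": "n/a"},  # count is not a number
    {"city": "Nashik", "cases": "3"},
]


def clean(records):
    """Normalize city names, then drop invalid and duplicate records."""
    seen = set()
    cleaned = []
    for rec in records:
        city = rec.get("city", "").strip().title()
        try:
            cases = int(rec.get("cases", ""))
        except ValueError:
            continue  # skip records whose count cannot be parsed
        if not city or cases < 0:
            continue  # skip records with a missing city or an implausible count
        key = (city, cases)
        if key in seen:
            continue  # skip exact duplicates after normalization
        seen.add(key)
        cleaned.append({"city": city, "cases": cases})
    return cleaned


cleaned_records = clean(RAW_RECORDS)
print(cleaned_records)
```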

11
  • Volatility refers to how long the data remains
    valid in the business context. For example,
    Facebook comments about a recently released movie
    may not have a lasting impact, because user
    sentiment may change every week. The data must be
    processed quickly so that corrective action can
    be taken sooner.

12
Value
  • It makes sense to process big data only if it
    has value. Extracting value becomes difficult
    because of velocity and volatility.
  • Big data technologies are designed to handle
    these challenges. Tools from the Hadoop ecosystem
    such as MapReduce, Pig, Hive, HBase, Sqoop,
    Spark, and Storm are popular for big data
    processing.
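
To give a feel for the MapReduce style of processing, here is a word-count sketch in plain Python. It is not the Hadoop API and runs on a single machine; the function names map_phase, shuffle_phase, and reduce_phase are hypothetical labels for the three conceptual steps, which a real cluster would distribute across many nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine the values for one key into a single total."""
    return key, sum(values)


def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data needs big tools", "data has value"])
print(counts)
```

Because each line is mapped independently and each key is reduced independently, both steps can in principle run in parallel, which is what makes the model suitable for very large volumes.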

13
  • NoSQL databases such as MongoDB are also used.
    The Distance MBA in AI and Machine Learning
    online course covers some of these technologies
    along with analytics. After completing the
    distance MBA program, in a data science job role,
    you would use these technologies to overcome
    these challenges.

14
Conclusion
  • SimpliDistance helps visitors with career
    mapping and personalized guidance, using
    technology to suggest suitable courses for
    upgrading their knowledge and skills and growing
    in their careers.
  • SimpliDistance is the best Distance Learning
    Portal in India.

15
Thank You