Data Science tutorial for beginner level to advanced level | Data Science projects - PowerPoint PPT Presentation


Description:

This is a complete tutorial for learning data science from beginner level to advanced level. It covers the projects suited to each level, with example data sets and the reasons to work on each one.

Slides: 23

Transcript and Presenter's Notes



1
Data Science tutorial for beginner level to
advanced level | Data Science projects
  • IQ ONLINE TRAINING

2
What is Data Science?
  • Data science, also known as data-driven
    science, is an interdisciplinary field.
  • It is about the scientific methods, processes,
    and systems used to extract knowledge or
    insights from data in various forms, structured
    or unstructured, similar to data mining.

3
Data Science Levels
  • To help you choose where to start, the data
    sets have been divided into 3 levels.
  • 1. Beginner Level: The beginner level comprises
    data sets that are easy to work with and do not
    require any complex data science techniques.
  • 2. Intermediate Level: The intermediate level
    has more challenging data analytics projects,
    involving mid-size and large data sets that
    require strong pattern recognition skills.
  • 3. Advanced Level: The advanced level is suited
    to those who understand advanced topics such as
    deep learning, recommender systems, and more.

4
Beginner Level Data Science Projects
  • 1. Iris Data Set
  • 2. Titanic Data Set
  • 3. Boston Housing Data Set
  • 4. Bigmart Sales Data Set
  • 5. Loan Prediction Data Set

5
1. Iris Data Set
  • This is probably the most versatile, easy, and
    resourceful data set in the pattern recognition
    literature. It has only 150 rows and 4 columns.
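
Iris is commonly used as a classification task: predicting the flower species from the 4 measurement columns. Below is a minimal k-nearest-neighbours sketch in plain Python. `knn_predict` is a hypothetical helper, and the toy rows in the example are made-up numbers, not real Iris measurements.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest rows."""
    # Pair every training row's Euclidean distance to the query with its label.
    distances = [(math.dist(row, query), label) for row, label in zip(train_x, train_y)]
    # Keep the k closest rows and return the most common label among them.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy rows with 4 features each (not real Iris values).
train_x = [[1.0, 1.0, 1.0, 1.0], [1.2, 1.0, 1.1, 1.0], [6.0, 6.0, 6.0, 6.0], [6.1, 5.9, 6.0, 6.2]]
train_y = ["class_a", "class_a", "class_b", "class_b"]
print(knn_predict(train_x, train_y, [1.1, 1.0, 1.0, 1.0]))  # prints class_a
```

With the real data you would hold out some rows to measure accuracy; k=3 is an arbitrary choice here.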

6
2. Titanic Data Set
  • This is a very popular data set, with a large
    number of guides and tutorials available in the
    global data science community.

7
3. Boston Housing Data Set
  • This data set is popular in the pattern
    recognition literature and comes from the real
    estate industry in Boston, USA. It is a
    regression problem with 506 rows and 14 columns.
    Because it is small, you can try any technique
    without worrying about running out of memory on
    your computer.
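
Because Boston Housing is a regression problem, a natural first model is least-squares linear regression. The sketch below fits a single-feature line in closed form using only the standard library. `fit_line` is a hypothetical helper and the toy points are made up; with the real data you would normally regress the target on all the other columns at once (multiple regression) rather than on one.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of centred cross-products and squares; the slope is their ratio.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up points lying exactly on y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)  # prints 2.0 1.0
```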

8
4. Bigmart Sales Data Set
  • Retail is one industry known to use analytics
    extensively to optimize business processes.
    Tasks such as inventory management, product
    placement, product bundling, customized offers,
    etc. are carried out using data science
    techniques.
  • The data comprises 8523 rows and 12 variables.
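
A typical first step with a retail table like this is aggregating sales per group, for example per product type, before any modelling. The sketch below uses the standard `csv` module. The column names `Item_Type` and `Item_Outlet_Sales` are assumptions about the file's layout, so check them against the actual header; `total_sales_by` is a hypothetical helper.

```python
import csv
import io
from collections import defaultdict


def total_sales_by(handle, group_column="Item_Type", value_column="Item_Outlet_Sales"):
    """Sum a numeric sales column per value of a grouping column in a CSV file."""
    totals = defaultdict(float)
    for row in csv.DictReader(handle):
        totals[row[group_column]] += float(row[value_column])
    return dict(totals)


# Tiny in-memory example with made-up rows; pass an open file for real data.
sample = io.StringIO(
    "Item_Type,Item_Outlet_Sales\n"
    "Dairy,100.5\n"
    "Snack Foods,20.0\n"
    "Dairy,50.0\n"
)
print(total_sales_by(sample))  # prints {'Dairy': 150.5, 'Snack Foods': 20.0}
```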

9
5. Loan Prediction Data Set
  • Among all industries, insurance is known to make
    the largest use of data science methods and
    analytics. This data set gives you enough
    information to get a feel for working with data
    from insurance companies: the challenges you will
    face, the strategies to use, the variables that
    influence the outcome, and more. It is a
    classification problem with 615 rows and 13
    columns.
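
For a classification data set of this size, a common workflow is to hold out part of the rows for testing and report accuracy on them. A minimal standard-library sketch follows. `train_test_split` here is a hypothetical helper written from scratch (not scikit-learn's function of the same name), and the 20% test fraction and the seed are arbitrary choices.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of `rows` and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> reproducible split
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# With 615 rows and a 20% test fraction, 123 rows are held out.
train, test = train_test_split(range(615))
print(len(train), len(test))  # prints 492 123
```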

10
Intermediate Level Data Science Projects
  • 1. Million Song Data Set
  • 2. MovieLens Data Set
  • 3. Trip History Data Set
  • 4. Census Income Data Set
  • 5. Text Mining Data Set

11
1. Million Song Data Set
  • You might not be aware that analytics is used in
    the entertainment industry as well. This is a
    regression problem with 515345 observations and
    90 variables. Even so, it is just a small subset
    of the original Million Song Dataset.
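
This 515345-row subset is commonly used to predict a song's release year from its audio features. A sensible first step for any regression problem is a baseline that always predicts the training mean, scored with root-mean-square error (RMSE); a real model should beat it. A minimal sketch, where `rmse` and `mean_baseline` are hypothetical helpers and the years are made up:

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mean_baseline(train_targets, n_predictions):
    """Predict the mean training target for every test row."""
    mean = sum(train_targets) / len(train_targets)
    return [mean] * n_predictions


# Made-up release years, purely for illustration.
train_years = [1990, 2000, 2010]
test_years = [1995, 2005]
baseline = mean_baseline(train_years, len(test_years))
print(baseline, rmse(test_years, baseline))  # prints [2000.0, 2000.0] 5.0
```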

12
2. MovieLens Data Set
  • The MovieLens data set gives you the opportunity
    to build a recommendation engine. It is one of
    the most popular and most cited data sets in the
    data science industry. It comes in several sizes
    and has over a million ratings from 6000 users
    on more than 4000 movies.
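
One simple way to build a recommendation engine on rating data like this is user-based collaborative filtering: estimate a user's rating for a movie from the ratings of similar users. The sketch below uses cosine similarity over the movies both users rated. The dictionary layout, the function names, and the toy ratings are all hypothetical, and real systems usually add refinements such as mean-centering the ratings.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two {movie: rating} dicts, over movies both users rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[m] * b[m] for m in common)
    norm_a = math.sqrt(sum(a[m] ** 2 for m in common))
    norm_b = math.sqrt(sum(b[m] ** 2 for m in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_rating(ratings, user, movie):
    """Similarity-weighted average of other users' ratings for `movie` (None if no data)."""
    weighted_sum = 0.0
    total_weight = 0.0
    for other, other_ratings in ratings.items():
        if other == user or movie not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        weighted_sum += sim * other_ratings[movie]
        total_weight += abs(sim)
    return weighted_sum / total_weight if total_weight else None


# Hypothetical toy ratings on a 1-5 scale.
ratings = {
    "alice": {"m1": 5, "m2": 3},
    "bob": {"m1": 5, "m2": 3, "m3": 4},
    "carol": {"m1": 1, "m3": 2},
}
print(predict_rating(ratings, "alice", "m3"))  # a value very close to 3.0
```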

13
3. Trip History Data Set
  • Coming from a bike-sharing service in the US,
    this data set requires strong data munging
    skills. It is a classification problem; each file
    has 7 columns, and the data is provided quarter
    by quarter starting from 2010.
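
Because the trips arrive as one file per quarter, a typical munging step is stacking the files into one table while checking that their columns agree and recording where each row came from. A minimal sketch with the standard `csv` module; `combine_quarters`, the quarter labels, and the column names in the example are hypothetical.

```python
import csv
import io


def combine_quarters(sources):
    """Stack rows from several quarterly CSV files into one list of dicts.

    `sources` maps a quarter label to an open text file. Every row gets an
    extra "quarter" field, and a file whose header differs from the first
    file's header raises ValueError.
    """
    combined = []
    expected_header = None
    for quarter, handle in sources.items():
        reader = csv.DictReader(handle)
        if expected_header is None:
            expected_header = reader.fieldnames
        elif reader.fieldnames != expected_header:
            raise ValueError(f"{quarter}: columns {reader.fieldnames} != {expected_header}")
        for row in reader:
            row["quarter"] = quarter
            combined.append(row)
    return combined


# In-memory stand-ins for two quarterly files (made-up columns and values).
q1 = io.StringIO("duration,start_station\n600,A\n")
q2 = io.StringIO("duration,start_station\n300,B\n900,C\n")
rows = combine_quarters({"2011-Q1": q1, "2011-Q2": q2})
print(len(rows), rows[0]["quarter"])  # prints 3 2011-Q1
```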

14
4. Census Income Data Set
  • The Census Income data set is a classic machine
    learning problem and an imbalanced
    classification task. Machine learning is widely
    used to solve imbalanced problems such as fraud
    detection and cancer detection. This data set has
    48842 rows and 14 columns.
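
With an imbalanced target, one common remedy is to weight each class inversely to its frequency so the rare class counts more during training. The sketch below computes `n_samples / (n_classes * count)` per class, the same formula scikit-learn documents for `class_weight="balanced"`. `balanced_class_weights` is a hypothetical helper, and the `<=50K`/`>50K` label strings are an assumption about how the income target is encoded.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Made-up labels: three majority-class rows for every minority-class row.
labels = ["<=50K", "<=50K", "<=50K", ">50K"]
weights = balanced_class_weights(labels)
print(weights)  # the rare class gets 2.0, the common class about 0.67
```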

15
5. Text Mining Data Set
  • This data set originally comes from the SIAM
    Text Mining Competition 2007. It consists of
    aviation safety reports describing problems that
    occurred on certain flights. It is a
    high-dimensional classification problem with
    multiple classes, 21519 rows and 30438 columns.
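
A column count in the tens of thousands is typical when each column represents a distinct word (term) from the reports, which is why text problems are so high-dimensional. Assuming such a bag-of-words layout (the slide does not say how the columns were built), the sketch below builds a vocabulary and stores each document sparsely as `{column_index: count}`, since most entries are zero. `bag_of_words` and the sample sentences are hypothetical, and the whitespace tokenization is deliberately simple.

```python
from collections import Counter


def bag_of_words(documents):
    """Map each lowercase whitespace-separated term to a column and count terms per document.

    Returns (vocabulary, vectors) where vectors[i] is a sparse {column_index: count} dict.
    """
    vocabulary = {}
    vectors = []
    for doc in documents:
        counts = Counter(doc.lower().split())
        for term in counts:
            vocabulary.setdefault(term, len(vocabulary))  # new terms get the next free column
        vectors.append({vocabulary[term]: n for term, n in counts.items()})
    return vocabulary, vectors


vocab, vectors = bag_of_words(["Engine fire on climb", "fire fire smoke in cabin"])
print(len(vocab), vectors[1])  # prints 7 {1: 2, 4: 1, 5: 1, 6: 1}
```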

16
Advanced Level Data Science Projects
  • 1. KDD 1999 Data Set
  • 2. Chicago Crime Data Set
  • 3. Yelp Data Set
  • 4. Identify your Digits Data Set
  • 5. ImageNet Data Set

17
1. KDD 1999 Data Set
  • KDD originally introduced the idea of a data
    mining competition to the whole world. This data
    set has been in use for a long time and still
    provides a very enriching experience. It is a
    classification problem with 4M rows and 48
    columns in a 1.2GB file.
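
A 1.2GB file can be explored without loading it all into memory by reading it one line at a time. The sketch below counts how often each class label occurs, a useful first check on a classification problem. It assumes the file is comma-separated with the label in the last field; the file name in the comment and the shortened sample rows are hypothetical.

```python
import csv
from collections import Counter


def count_labels(lines):
    """Count the last comma-separated field of each record, streaming one line at a time."""
    counts = Counter()
    for record in csv.reader(lines):
        if record:  # skip blank lines
            counts[record[-1]] += 1
    return counts


# For the real file (hypothetical name), pass the open handle so lines are streamed:
#     with open("kddcup.data", newline="") as handle:
#         print(count_labels(handle).most_common())
sample_lines = ["0,tcp,http,normal.", "0,icmp,ecr_i,smurf.", "0,tcp,http,normal."]
print(count_labels(sample_lines))  # prints Counter({'normal.': 2, 'smurf.': 1})
```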

18
2. Chicago Crime Data Set
  • Data scientists today are expected to handle
    very large data sets, because companies no longer
    want to work on samples; they want to use the
    full data. This data set will give you the
    experience needed to handle such large data sets
    on your local machine.
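
One way to work with a full data set that is too big to load at once on a local machine is to process it in fixed-size chunks and merge the per-chunk results. A standard-library sketch follows; the chunk size, the helper names, and the `Primary Type` column in the example are assumptions for illustration.

```python
import csv
import io
from collections import Counter
from itertools import islice


def read_in_chunks(handle, chunk_size=100_000):
    """Yield lists of at most `chunk_size` rows (as dicts) from an open CSV file."""
    reader = csv.DictReader(handle)
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk


def count_by_column(handle, column, chunk_size=100_000):
    """Count the values of one column, one chunk at a time, merging chunk totals."""
    totals = Counter()
    for chunk in read_in_chunks(handle, chunk_size):
        totals.update(row[column] for row in chunk)
    return totals


# Made-up rows in memory; with real data pass open(path, newline="").
sample = io.StringIO("Primary Type,Year\nTHEFT,2015\nBATTERY,2015\nTHEFT,2016\n")
print(count_by_column(sample, "Primary Type", chunk_size=2))
```

Only one chunk is held in memory at a time. If you use pandas, `read_csv` with its `chunksize` parameter follows the same pattern.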

19
3. Yelp Data Set
  • This data set is part of round 8 of the Yelp
    Dataset Challenge. It comprises almost 200,000
    images, provided in 3 JSON files of 2GB.
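
Files of this size are usually read record by record rather than with a single `json.load` call. Assuming the files are newline-delimited JSON (one object per line), the sketch below streams them with the standard `json` module. `iter_json_lines` and the `business_id` field in the example are assumptions, so check them against the files you download.

```python
import io
import json
from collections import Counter


def iter_json_lines(handle):
    """Yield one parsed object per non-empty line of a newline-delimited JSON file."""
    for line in handle:
        line = line.strip()
        if line:
            yield json.loads(line)


# In-memory stand-in for one file; with real data pass open(path, encoding="utf-8").
sample = io.StringIO('{"business_id": "b1"}\n{"business_id": "b2"}\n{"business_id": "b1"}\n')
per_business = Counter(record["business_id"] for record in iter_json_lines(sample))
print(per_business.most_common(1))  # prints [('b1', 2)]
```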

20
4. Identify your Digits Data Set
  • Study, analyze, and recognize elements in the
    images from this data set. This is similar to how
    a camera detects faces using image recognition.
    Build and test this technique on a digit
    recognition problem.
  • It has 7000 images of 28 x 28 pixels, 31MB in
    total.
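
Each 28 x 28 image holds 28 * 28 = 784 pixel values, so a common way to feed the images to a standard classifier is to flatten each one into a 784-element vector and scale the values to [0, 1]. The sketch assumes 8-bit grayscale pixels (0 to 255) already loaded as nested lists; decoding the image files is left out, and `image_to_features` is a hypothetical helper.

```python
def image_to_features(pixels, size=28):
    """Flatten a size x size grid of 0-255 grayscale values into floats in [0, 1]."""
    if len(pixels) != size or any(len(row) != size for row in pixels):
        raise ValueError(f"expected a {size} x {size} image")
    return [value / 255.0 for row in pixels for value in row]


# A blank made-up image with a single white pixel in the top-left corner.
image = [[0] * 28 for _ in range(28)]
image[0][0] = 255
features = image_to_features(image)
print(len(features), features[0], features[1])  # prints 784 1.0 0.0
```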

21
5. ImageNet Data Set
  • This data set offers various problems, including
    localization, object detection, scene parsing,
    and classification. All its images are freely
    available, so you can look for any kind of image
    and build your project around it. At present it
    has 14,197,122 images of various shapes, 140GB in
    total.
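
ImageNet classification results are commonly reported as top-5 accuracy (or error): a prediction counts as correct if the true class is among the model's five highest-scoring classes. Below is a plain-Python sketch of top-k accuracy. `top_k_accuracy` and the score lists are hypothetical, and k is kept small in the example only because there are three classes.

```python
import heapq


def top_k_accuracy(scores, true_labels, k=5):
    """Fraction of samples whose true class index is among the k highest scores."""
    if len(scores) != len(true_labels) or not scores:
        raise ValueError("need one score list per label, and at least one sample")
    hits = 0
    for class_scores, label in zip(scores, true_labels):
        # Indices of the k largest scores for this sample.
        top_k = heapq.nlargest(k, range(len(class_scores)), key=class_scores.__getitem__)
        hits += label in top_k
    return hits / len(true_labels)


# Made-up scores for 2 samples over 3 classes; the true classes are 2 and 1.
scores = [[0.1, 0.5, 0.4], [0.7, 0.2, 0.1]]
labels = [2, 1]
print(top_k_accuracy(scores, labels, k=1), top_k_accuracy(scores, labels, k=2))  # prints 0.0 1.0
```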

22
Contact Us: +1 732-593-8450, +91 9989527180
Mail Us: info@iqtrainings.com
www.iqonlinetraining.com