Title: Data Science tutorial for beginner level to advanced level | Data Science projects
Data Science tutorial for beginner level and
advanced level: Data Science projects
What is Data Science?
- Data science is also known as data-driven
science.
- It is an interdisciplinary field concerned with
scientific methods, processes, and systems for
extracting knowledge or insights from data in
various forms, structured or unstructured,
similar to data mining.
Data Science Levels
- To help you choose where to start, the datasets
have been divided into 3 levels:
- 1. Beginner Level: The beginner level comprises
datasets that are easy to work with and do not
require any complex data science technique.
- 2. Intermediate Level: The intermediate level has
tougher data analytics projects, built on medium
and large datasets, that require solid pattern
recognition skills.
- 3. Advanced Level: The advanced level is
suitable for those who want to work on advanced
topics such as deep learning, recommender
systems and more.
Beginner Level Data Science Projects
- 1. Iris Data Set
- 2. Titanic Data Set
- 3. Boston Housing Data Set
- 4. Bigmart Sales Data Set
- 5. Loan Prediction Data Set
1. Iris Data Set
- This is widely regarded as one of the most
versatile, resourceful and easy datasets in the
pattern recognition literature. It has only 150
rows and 4 columns.
2. Titanic Data Set
- This is a very popular dataset in the global
data science community, supported by a large
number of guides and tutorials.
3. Boston Housing Data Set
- This dataset is popular in the pattern
recognition literature and originates from the
real estate industry in Boston, USA. It is a
regression problem with 506 rows and 14 columns.
Because the dataset is small, you can attempt
any technique without worrying about running out
of memory on your computer.
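Since this is a regression problem, a natural first experiment is a one-variable least-squares fit. The following is only a minimal sketch in plain Python: the `fit_line` helper and the toy numbers are hypothetical and are not taken from the Boston housing data.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variance terms (unnormalised; the 1/n factors cancel).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up illustrative values: one feature and one target.
rooms = [4.0, 5.0, 6.0, 7.0, 8.0]
prices = [14.0, 18.0, 22.0, 26.0, 30.0]

slope, intercept = fit_line(rooms, prices)
print(f"price = {slope:.2f} * rooms + {intercept:.2f}")
```

On the real dataset you would repeat this for each candidate column, or move to a multi-variable model once the single-feature fits are understood.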
4. Bigmart Sales Data Set
- Retail is one industry known to use analytics
extensively to optimize business processes.
Tasks such as inventory management, product
placement, product bundling, customized offers,
etc. are handled using data science techniques.
- The data comprises 8523 rows and 12 variables.
5. Loan Prediction Data Set
- Insurance is among the industries known to make
the heaviest use of data science methods and
analytics. This dataset gives you enough
information to get a feel for working on
insurance company data: the challenges faced,
the strategies used, the variables that
influence the outcome, and so on. It is a
classification problem with 615 rows and 13
columns.
Intermediate Level Data Science Projects
- 1. Million Song Data Set
- 2. Movie Lens Data Set
- 3. Trip History Data Set
- 4. Census Income Data Set
- 5. Text Mining Data Set
1. Million Song Data Set
- You might not be aware that analytics is used
in the entertainment industry as well. This is a
regression problem consisting of 515345
observations and 90 variables. Even so, it is
just a tiny subset of the original Million Song
database.
2. Movie Lens Data Set
- The Movie Lens Data Set gives you the
opportunity to build a recommendation engine. In
case you aren't aware, it is one of the most
popular and most cited datasets in the data
science industry. It is available in several
sizes; the version referred to here has over a
million ratings from 6000 users on about 4000
movies.
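One simple way to start on a recommendation engine is user-to-user similarity. The sketch below is one possible approach and not a prescribed method for this dataset: the `cosine_similarity` and `recommend` helpers, the user names, and the movie ids are all hypothetical, and the ratings are made up.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity between two {movie_id: rating} dicts."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[m] * b[m] for m in shared)
    norm_a = sqrt(sum(r * r for r in a.values()))
    norm_b = sqrt(sum(r * r for r in b.values()))
    return dot / (norm_a * norm_b)


def recommend(target, others, top_n=2):
    """Rank movies the target has not rated by similarity-weighted ratings."""
    scores = {}
    for ratings in others:
        sim = cosine_similarity(target, ratings)
        for movie, rating in ratings.items():
            if movie not in target:
                scores[movie] = scores.get(movie, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Made-up ratings on a 1-5 scale.
alice = {"m1": 5, "m2": 4}
bob = {"m1": 5, "m2": 4, "m3": 5}
carol = {"m1": 1, "m4": 5}

picks = recommend(alice, [bob, carol])
print(picks)
```

Bob's tastes match Alice's closely, so his unseen movie ranks ahead of Carol's. At the scale of a million ratings you would want sparse matrices or a dedicated library rather than nested dict loops.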
3. Trip History Data Set
- Coming from a bike sharing service in the US,
this dataset requires you to exercise advanced
data munging skills. It is a classification
problem in which each file has 7 columns, and
the data is provided quarter-wise from 2010
onwards.
4. Census Income Data Set
- The Census Income Data Set is a classic machine
learning problem: an imbalanced classification.
Machine learning is used extensively to solve
imbalanced problems such as fraud detection,
cancer detection, etc. This dataset has 48842
rows and 14 columns.
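To see why class imbalance matters, compare the accuracy of always predicting the majority class with a set of per-class weights that rebalance the data. This is a hedged sketch: the helper names and the 80/20 label split are made up for illustration, and the weighting rule (samples divided by classes times class count) is one common heuristic and not the only option.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy of a model that always predicts the most common class."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * c) for cls, c in counts.items()}


# Made-up labels with an 80/20 split (not the real class ratio).
labels = ["majority"] * 80 + ["minority"] * 20

baseline = majority_baseline_accuracy(labels)
weights = balanced_class_weights(labels)
print(baseline, weights)
```

A model that never predicts the minority class still scores the baseline accuracy here, which is why metrics such as recall or a weighted loss are usually preferred on imbalanced data.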
5. Text Mining Data Set
- This dataset originally comes from the SIAM
text mining competition held in 2007. It
comprises aviation safety reports describing
problems that occurred on certain flights. It is
a multi-class, high-dimensional classification
problem with 21519 rows and 30438 columns.
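Text datasets become high-dimensional because every distinct word typically becomes its own column. The sketch below shows a bare-bones bag-of-words encoding in plain Python. The `bag_of_words` helper and the two sample reports are hypothetical and are not taken from the competition data.

```python
def bag_of_words(docs):
    """Return (vocabulary, count matrix) with one column per distinct word."""
    vocab = sorted({word for doc in docs for word in doc.lower().split()})
    index = {word: i for i, word in enumerate(vocab)}
    matrix = []
    for doc in docs:
        row = [0] * len(vocab)
        for word in doc.lower().split():
            row[index[word]] += 1
        matrix.append(row)
    return vocab, matrix


# Made-up report snippets for illustration only.
reports = [
    "engine failure during climb",
    "landing gear failure during landing",
]

vocab, matrix = bag_of_words(reports)
print(len(vocab), "columns for", len(reports), "rows")
```

Two short sentences already need six columns, so thousands of full reports can easily produce tens of thousands of columns, most of them zero in any given row.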
Advanced Level Data Science Projects
- 1. KDD 1999 Data Set
- 2. Chicago Crime Data Set
- 3. Yelp Data Set
- 4. Identify your Digits Data Set
- 5. Image-Net Data Set
1. KDD 1999 Data Set
- The KDD Cup originally brought the idea of the
data mining competition to the whole world. Its
data has been in wide use for a long time, and
working with it is an enriching experience. It
poses a classification problem with 4M rows and
48 columns in a 1.2GB file.
2. Chicago Crime Data Set
- Data scientists nowadays are expected to handle
very large volumes of data, because companies no
longer want to work on samples and prefer to use
the full data. This dataset gives you the
hands-on experience needed to handle such large
datasets on your local machine.
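A common way to handle a file that is too large to load at once is to stream it row by row and keep only running aggregates in memory. The following is a minimal sketch using Python's standard `csv` module. The `count_by_column` helper, the `primary_type` column name, and the sample rows are assumptions made for illustration and may not match the real file.

```python
import csv
import io
from collections import Counter


def count_by_column(lines, column):
    """Count values of one column while reading a single row at a time."""
    counts = Counter()
    for row in csv.DictReader(lines):
        counts[row[column]] += 1
    return counts


# Small in-memory stand-in for a large CSV file. With a real file you
# would pass open(path, newline="") instead of the StringIO object.
sample = io.StringIO(
    "id,primary_type\n"
    "1,THEFT\n"
    "2,BATTERY\n"
    "3,THEFT\n"
)

counts = count_by_column(sample, "primary_type")
print(counts.most_common())
```

Because only the counter is held in memory, the same function works whether the file has three rows or several million.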
3. Yelp Data Set
- This dataset is part of round 8 of the Yelp
Dataset Challenge. It comprises almost 200,000
images, along with 3 JSON files of 2GB.
4. Identify your Digits Data Set
- Study, analyze and recognize elements in images
from this dataset. It is very similar to how a
camera detects faces by making use of image
recognition. Build and test your technique on
this digit recognition problem.
- It has 7000 images of 28 X 28 size, making it
31MB in total.
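Many basic classifiers expect a flat feature vector, not a 2-D grid, so a typical first step is to flatten each 28 X 28 image into 784 values and scale them. The sketch below assumes 8-bit grayscale pixels in the range 0 to 255, which the text above does not state. The `flatten_and_scale` helper and the blank test image are hypothetical.

```python
def flatten_and_scale(image, max_value=255):
    """Turn a 2-D grid of pixel values into a flat list scaled to [0, 1]."""
    return [pixel / max_value for row in image for pixel in row]


# Made-up image: all black except one white pixel in the top-left corner.
image = [[0] * 28 for _ in range(28)]
image[0][0] = 255

features = flatten_and_scale(image)
print(len(features), features[0], features[1])
```

Each image then becomes one row of 784 features that can be fed to a classifier of your choice.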
5. Image-Net Data Set
- This dataset offers a variety of problems,
including localization, object detection, scene
parsing and classification. With all its images
freely available, you can look for any kind of
image and build your project around it. For now,
it has 14,197,122 images of various shapes, with
a total size of 140GB.
Contact Us
- 1 732-593-8450
- 91 9989527180
- Mail Us: info_at_iqtrainings.com
- www.iqonlinetraining.com