Title: Basics of R programming (1)
1R PROGRAMMING
- ARAVALI COLLEGE OF ENGINEERING MANAGEMENT
- (ACEM, FARIDABAD)
2Program Name B.TECH CSESemester VIICourse
Name R PROGRAMMINGCourse Code
OEC-CS-701(III)Faculty Name RASHIKA
SINGHDesignation ASSISTANT PROFESSORDepartment
__CSE_____
3UNIT No.1
- INTRODUCTION
- Getting R
- R Version
- 32-bit versus 64-bit
- The R Environment
- Command Line Interface
- RStudio
- Revolution Analytics
- Learning Outcome
- Familiarize themselves with R and the RStudio IDE
4R PROGRAMMING
- R is a software environment which is used to
analyze statistical information and graphical
representation. - R allows us to do modular programming using
functions. - This programming language was named R, based on
the first name letter of the two authors (Robert
Gentleman and Ross Ihaka).
5- "R is an interpreted computer programming
language which was created by Ross Ihaka and
Robert Gentleman at the University of Auckland,
New Zealand." - The R Development Core Team currently develops R.
It is also a software environment used to
analyze statistical information, graphical
representation, reporting, and data modeling. - R is the implementation of the S
programming language, which is combined
with lexical scoping semantics. - R not only allows us to do branching and looping
but also allows to do modular programming using
functions. R allows integration with the
procedures written in the C, C, .Net, Python,
and FORTRAN languages to improve efficiency. - In the present era, R is one of the most
important tool which is used by researchers, data
analyst, statisticians, and marketers for
retrieving, cleaning, analyzing, visualizing, and
presenting data.
6History of R Programming
- The history of R goes back about 20-30 years ago.
R was developed by Ross lhaka and Robert
Gentleman in the University of Auckland, New
Zealand, and the R Development Core Team
currently develops it. - This programming language name is taken from the
name of both the developers. - The first project was considered in 1992. The
initial version was released in 1995, and in
2000, a stable beta version was released.
7(No Transcript)
8The following table shows the release date,
version, and description of R language
Version-Release Date Description
0.49 1997-04-23 First time R's source was released, and CRAN (Comprehensive R Archive Network) was started.
0.60 1997-12-05 R officially gets the GNU license.
0.65.1 1999-10-07 update.packages and install.packages both are included.
1.0 2000-02-29 The first production-ready version was released.
91.4 2001-12-19 First version for Mac OS is made available.
2.0 2004-10-04 The first version for Mac OS is made available.
2.1 2005-04-18 Add support for UTF-8encoding, internationalization, localization etc.
2.11 2010-04-22 Add support for Windows 64-bit systems.
2.13 2011-04-14 Added a function that rapidly converts code to byte code.
102.14 2011-10-31 Added some new packages.
2.15 2012-03-30 Improved serialization speed for long vectors.
3.0 2013-04-03 Support for larger numeric values on 64-bit systems.
3.4 2017-04-21 The just-in-time compilation (JIT) is enabled by default.
3.5 2018-04-23 Added new features such as compact internal representation of integer sequences, serialization format etc.
11Features of R programming
- It is a simple and effective programming language
which has been well developed. - It is data analysis software.
- It is a well-designed, easy, and effective
language which has the concepts of user-defined,
looping, conditional, and various I/O facilities. - It has a consistent and incorporated set of tools
which are used for data analysis. - For different types of calculation on arrays,
lists and vectors, R contains a suite of
operators. - It provides effective data handling and storage
facility. - It is an open-source, powerful, and highly
extensible software. - It provides highly extensible graphical
techniques. - It allows us to perform multiple calculations
using vectors. - R is an interpreted language.
12Why use R Programming?
13- R is important for Data Science?
- R plays a very important role in Data Science,
you will be benefited with following operations
in R. - You can run your code without any compiler R is
an interpreted language. Hence we can run code
without any compiler. R interprets the code and
makes the development of code easier. - Many calculations done with vectors R is a
vector language, so anyone can add functions to a
single Vector without putting in a loop. Hence, R
is powerful and faster than other languages. - Statistical Language R used in biology,
genetics as well as in statistics. R is a turning
complete language where any type of task can
perform. - 2. R is Good for Business?
- R will just not help you in the technical fields,
it will also be a great help in your business. - Here, the major reason is that R is open-source,
therefore it can be modified and redistributed as
per the users need. It is great for
visualization and has far more capabilities as
compared to other tools. - For data-driven businesses, lack of Data
Scientists is a huge concern. Companies are using
R programming as their core platform and are
recruiting trained R programmers.
14- 3. R is a gateway to Lucrative Career
- R language is used extensively in Data Science.
This field offers some of the - highest-paying jobs in the world today.
- 4. Open-source
- R is an open-source language. It is maintained
by a community of active users and - you can avail R for free. You can modify various
functions in R and make your own - packages. Since R is issued under the General
Public Licence (GNU), there are no - restrictions on its usage.
- 5. Popularity
- R has become one of the most popular programming
languages in the industries. - Conventionally, R was mostly used in academia but
with the emergence of Data - Science, the need for R in the industries became
evident. R is used at Facebook for social - network analysis. It is being used at Twitter for
semantic analysis as well as visualizations. - 6. Robust Visualization Library
- R comprises of libraries like ggplot2,
plotly that offer aesthetic graphical plots to
its - users. R is most widely recognized for its
stunning visualizations which gives it an - edge over other Data Science programming
languages.
157. With R, you can develop amazing Web-Apps R
provides you with the ability to build aesthetic
web-applications. Using the R Shiny package, you
can develop interactive dashboards straight from
the console of your R IDE. Using this, you can
embed your visualizations and enhance the
storytelling of your data analysis through
aesthetic visualizations. 8. A go-to language
for Statistics and Data Science R is the
standard language for Statistics and Data
Science. R was developed for statistics, by
statisticians. It has been in use even before the
word Data Science was coined. Statisticians
and Data Scientists are most familiar with R
than any other programming language. R
facilitates various statistical operations
through its thousands of packages. 9. R is being
used in almost every industry R is one of the
most widely used programming languages in the
world today. It is used in almost every industry,
ranging from finance, banking to medicine and
manufacturing. R is used for portfolio
management, risk analytics in finance and banking
industries. It is used for carrying out an
analysis of drug discovery and genomic analysis
in bioinformatics. R is also used to implement
various statistical measures to optimize
industrial processes.
16Pros and Cons of R Programming Language
17PROS/ADVATAGES OF R PROGRAMMING
- 1. Open Source
- R is an open-source programming language. This
means that anyone can work with R without any
need for a license or a fee. Furthermore, you can
contribute towards the development of R
by customizing its packages, developing new ones
and resolving issues. - 2. Exemplary Support for Data Wrangling
- R provides exemplary support for data wrangling.
The packages like dplyr, readr are capable of
transforming messy data into a structured form. - 3. The Array of Packages
- R has a vast array of packages. With over 10,000
packages in the CRAN repository, the number is
constantly growing. These packages appeal to all
the areas of industry. - 4. Quality Plotting and Graphing
- R facilitates quality plotting and graphing. The
popular libraries like ggplot2 and plotly advocate
for aesthetic and visually appealing graphs that
set R apart from other programming languages. - 5. Highly Compatible
- R is highly compatible and can be paired with
many other programming languages like C, C,
Java, and Python. It can also be integrated with
technologies like Hadoop and various other
database management systems as well.
18- 6. Platform Independent
- R is a platform-independent language. It is a
cross-platform programming language, meaning that
it can be run quite easily on Windows, Linux, and
Mac. - 7. Eye-Catching Reports
- With packages like Shiny and Markdown, reporting
the results of an analysis is extremely easy with
R. You can make reports with the data, plots and
R scripts embedded in them. You can even make
interactive web apps that allow the user to play
with the results and the data. - 8. Machine Learning Operations
- R provides various facilities for carrying out
machine learning operations like classification,
regression and also provides features for
developing artificial neural networks. - 9. Statistics
- R is prominently known as the lingua franca of
statistics. This is the main reason as to why R
is dominant among other programming languages for
developing statistical tools. - 10. Continuously Growing
- R is a constantly evolving programming language.
It is a state of the art technology that provides
updates whenever any new feature is added.
19Disadvantages of R Programming
- 1. Weak Origin
- R shares its origin with a much older programming
language S. This means that its base package
does not have support for dynamic or 3D graphics.
With common packages of R like Ggplot2 and
Plotly, it is possible to create dynamic, 3D as
well as animated graphics. - 2. Data Handling
- In R, the physical memory stores the objects.
This is in contrast to other languages like
Python. Furthermore, R utilizes more memory as
compared with Python. Also, R requires the entire
data in one single place, that is, in the memory.
Therefore, it is not an ideal option when dealing
with Big Data. However, with data management
packages and integration with Hadoop possible,
this is easily covered. - 3. Basic Security
- R lacks basic security. This feature is an
essential part of most programming languages like
Python. Because of this, there are several
restrictions with R as it cannot be embedded into
a web-application.
20- 4. Complicated Language
- R is not an easy language to learn. It has a
steep learning curve. Due to this, people who do
not have prior programming experience may find it
difficult to learn R. - 5. Lesser Speed
- R packages and the R programming language is much
slower than other languages like MATLAB and Python
. - 6. Spread Across various Packages
- The algorithms in R are spread across different
packages. Programmers without prior knowledge of
packages may find it difficult to implement
algorithms.
21R VERSIONS
- RStudio Server enables users and administrators
to have very fine-grained control over which
versions of R are used in various contexts.
Capabilities include - Administrators can install several versions of R
and specify a global default version as well as
per-user or per-group default versions. - Users can switch between any of the available
versions of R as they like. - Users can specify that individual R projects
remember their last version of R and always use
that version until explicitly migrated to a new
version.
22- On Windows, RStudio uses the system's current
version of R by default. When R is installed on
Windows it writes the version being installed to
the Registry as the "current" version of R (the
specific registry keys written are
described here). This is the version of R which
RStudio runs against by default. - You can override which version of R is used via
General panel of the RStudio Options dialog. This
dialog allows you to specify that RStudio should
always bind to the default 32 or 64-bit version
of R, or to specify a different version
altogether
23Note that by holding down the Control key during
the launch of RStudio you can cause the R version
selection dialog to display at startup.
2432-BIT VERSUS 64-BIT
- we have two R binaries
- /APPS/32/bin/R /APPS/64/bin/R If you are on 32
bit, then you automatically get the first when
you just say R. If you are on 64 bit, then you
automatically get the second when you just say R. - On 64 bit you can run the first, if you so
desire, but you must invoke it using the full
path name /APPS/32/bin/R. It will work in
emulation mode. - If you never load your own C code into R
with dyn.load(sharedlibraryname) and never use
libraries other than those that those installed
by the system administrators and you load
withlibrary(packagename), then you should have no
problems. Just say R with no additional fuss to
invoke R, and it will work. - Otherwise, you will have to be aware of What
architecture you are running on.What architecture
the R binary you are r
25APPLICATIONS OF R
- 1. Finance
- Data Science is most widely used in the financial
industry. - R is the most popular tool for this role. This is
because R provides an advanced statistical suite
that is able to carry out all the
necessary financial tasks. - With the help of R, financial institutions are
able to perform downside risk measurement, adjust
risk performance and utilize visualizations
like candlestick charts, density plots, drawdown
plots, etc. - 2. Banking
- Just like financial institutions, banking
industries make use of R for credit risk modeling
and other forms of risk analytics. - Bank of America makes use of R for financial
reporting. With the help of R, the data
scientists at BOA are able to analyze financial
losses and make use of Rs visualization tools.
26- 3. Healthcare
- Genetics, Bioinformatics, Drug Discovery,
Epidemiology are some of the fields in healthcare
that make heavy usage of R. With the help of R,
these companies are able to crunch data and
process information, providing an essential
backdrop for further analysis and data
processing. - 4. Social Media
- For many beginners in Data Science and R, social
media is a data playground. Sentiment Analysis
and other forms of social media data mining are
some of the important statistical tools that are
used with R. - Social Media is also a challenging field for Data
Science because the data prevalent on social
media websites is mostly unstructured in nature.
R is used for social media analytics, for
segmenting potential customers and targeting them
for selling your products.
27- 5. E-Commerce
- The e-commerce industry is one of the most
important sectors that utilize Data Science. R is
one of the standard tools that is being used in
e-commerce. - Since these internet-based companies have to deal
with various forms of data, structured and
unstructured, as well as from varying data
sources like spreadsheets and databases (SQL
NoSQL), R proves to be an effective choice for
these industries. - 6. Manufacturing
- Manufacturing companies like Ford, Modelez, and
John Deere use R to analyze customer sentiment.
This helps them optimize their product according
to trending consumer interests and also to match
their production volume to varying market demand.
They also use R to minimize their production
costs and maximize profits.
28Real-Life Use Cases of R Language
- Facebook Facebook uses R to update status and
its social network graph. It is also used for
predicting colleague interactions with R. - Ford Motor Company Ford relies on Hadoop. It
also relies on R for statistical analysis as well
as carrying out data-driven support for decision
making. - Google Google uses R to calculate ROI on
advertising campaigns and to predict economic
activity and also to improve the efficiency of
online advertising. - Foursquare R is an important stack behind
Foursquares famed recommendation engine. - John Deere Statisticians at John Deere use R
for time series modeling and also geospatial
analysis in a reliable and reproducible way. The
results are then integrated with Excel and SAP. - Microsoft Microsoft uses R for the Xbox
matchmaking service and also as a statistical
engine within the Azure ML framework. - Mozilla It is the foundation behind the Firefox
web browser and uses R to visualize web activity.
29FUTURE SCOPE OF R
30- 1. Data Scientist
- The profession of Data Scientist is the most
demanding job role. A Data Scientist is supposed
to extract data, transform it into a structured
format, perform analysis and forecast future
insights. For this purpose, R is the most ideal
tool as it provides efficient data handling
capability as well as a robust set of analysis
and machine learning tools. - 2. Business Analyst
- A Business Analyst has to develop solutions that
are technical in nature for the various business
problems. They are required to seek solutions,
advance the efforts of the company as well as
fulfill the requirements of the business. For
this purpose, R provides various business
intelligence tools through its extensive
packages.
31- 3. Data Analyst
- A Data Analyst is responsible for extracting and
analyzing data. This task requires extensive
usage of Rs statistical libraries to deliver
accurate results so that the companies can make
careful data-driven decisions. - 4. Data Visualization Expert
- R is most popular for its visualization
libraries. Due to this reason, Data Visualization
experts in R programming are in-demand in the
industries. The various packages of R
likeggplot2, plotly, etc provide visually
appealing graphs and plots to their users. - 5. Quantitative Analyst
- Quantitative Analysts are engaged in the
financial and banking industries. These
industries have to deal with all types of data
and R provides an ideal solution to their various
data problems.
32Aravali College of Engineering And
Management Jasana, Tigaon Road, Neharpar,
Faridabad, Delhi NCR Toll Free Number 91-
8527538785 Website www.acem.edu.in