**WHAT IS R PROGRAMMING?**

R is a language and environment for statistical computing and graphics. It provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, …) and graphical techniques, and is highly extensible. It is an integrated suite of software facilities for data manipulation, calculation and graphical display. It includes:

- an effective data handling and storage facility,
- a suite of operators for calculations on arrays, in particular matrices,
- a large, coherent, integrated collection of intermediate tools for data analysis,
- graphical facilities for data analysis and display either on-screen or on hardcopy, and
- a well-developed, simple and effective programming language which includes conditionals, loops, user-defined recursive functions and input and output facilities.

**DIFFERENCE BETWEEN VECTOR, LIST, MATRIX AND DATAFRAME.**

A **vector** is a sequence of data elements of the same basic type. The members of a vector are known as components.

A **list** is an R object that can contain elements of different types, such as numbers, strings, vectors, or even another list.

A **matrix** is a two-dimensional data structure that binds together vectors of the same length. All elements of a matrix have the same type.

A **data frame** is a more general form of a matrix. It is stored as a list of equal-length columns, so different columns can contain different data types.
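As a minimal sketch of the four structures described above (all object names and values here are illustrative assumptions):

```r
# A vector: every element has the same type
v <- c(1.5, 2.5, 3.5)

# A list: elements may have different types, including other lists
l <- list(id = 1L, name = "a", scores = v, nested = list(TRUE))

# A matrix: two-dimensional, one type for all elements
m <- cbind(c(1, 2, 3), c(4, 5, 6))   # binds two vectors of length 3 as columns

# A data frame: columns of equal length that may differ in type
df <- data.frame(name = c("x", "y", "z"), value = v)

str(df)
```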

**GIVE ANY 5 FEATURES OF R.**

5 features of R are:

- Simple and effective programming language.
- It is a data analysis software.
- It provides effective data handling and storage facilities.
- It provides highly extensible graphical techniques.
- It is an interpreted language.

**WHAT ARE THE ADVANTAGES AND DISADVANTAGES OF R?**

Advantages of R are:

- Open Source
- Data Wrangling
- Array of Packages
- Platform Independent
- Machine Learning Operations

Disadvantages of R are:

- Weak origin
- Data Handling
- Basic Security
- Complicated Language
- Lesser Speed

**WHAT ARE THE DIFFERENT DATA STRUCTURES IN R? BRIEFLY EXPLAIN THEM.**

**What are the steps to build and evaluate a linear regression model in R?**

Building a linear regression model involves the following steps, in order:

- First divide the data into a training set and a test set, so that the model is built on the training set and its performance is assessed on the test set.
- The split can be done with the sample.split() function from the “caTools” package. Its SplitRatio argument sets the proportion of data kept for training, and you can adjust it to your requirements.
- Once the data is divided into training and test sets, build the model on the training set.
- The model is fitted using the “lm()” function.
- Then predict the values on the test set using the “predict()” function.
- Finally, compute the RMSE (root mean squared error) on the test set. The lower the RMSE, the better the prediction.
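The steps above can be sketched as follows. To stay self-contained, this sketch makes the split with base R's sample() instead of the caTools package, and it assumes the built-in mtcars dataset, a 70/30 split and the formula mpg ~ wt + hp purely for illustration.

```r
set.seed(42)                                  # make the random split reproducible

# Split mtcars (a built-in dataset) into roughly 70% train and 30% test rows
n         <- nrow(mtcars)
train_idx <- sample(seq_len(n), size = floor(0.7 * n))
train     <- mtcars[train_idx, ]
test      <- mtcars[-train_idx, ]

# Build the model on the training set
fit <- lm(mpg ~ wt + hp, data = train)

# Predict on the test set
pred <- predict(fit, newdata = test)

# RMSE: square root of the mean squared prediction error
rmse <- sqrt(mean((test$mpg - pred)^2))
rmse
```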

**What is the confusion matrix in R?**

A confusion matrix is used to assess the accuracy of a classification model. It is a cross-tabulation of the observed and predicted classes. The “confusionMatrix()” function from the “caret” package can be used to compute it.
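As a rough illustration of the cross-tabulation idea, the sketch below builds a confusion matrix with base R's table() rather than a package function. The class labels are made-up data.

```r
# Hypothetical observed and predicted class labels for 8 cases
observed  <- factor(c("yes", "yes", "no", "no", "yes", "no", "yes", "no"),
                    levels = c("no", "yes"))
predicted <- factor(c("yes", "no",  "no", "no", "yes", "yes", "yes", "no"),
                    levels = c("no", "yes"))

# Cross-tabulate predictions against observations
cm <- table(Predicted = predicted, Observed = observed)
cm

# Accuracy: share of cases on the diagonal (6 of 8 here)
accuracy <- sum(diag(cm)) / sum(cm)
accuracy   # 0.75
```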

**How would you write a custom function in R? Give an example.**

This is the syntax to write a custom function in R:

<object-name>=function(x){

—

—

—

}

Let’s look at an example to create a custom function in R ->

fun1<-function(x){ ifelse(x>5,100,0) }

z<-c(1,2,3,4,5,6,7,8,9,10)

fun1(z)->z

**What packages are used for data mining in R?**

Some packages used for data mining in R:

- data.table- provides fast reading of large files.
- rpart and caret- for machine learning models.
- ggplot2- provides various data visualisation plots.
- tm- to perform text mining.
- forecast- provides functions for time series analysis.

**HOW WOULD YOU MAKE MULTIPLE PLOTS ONTO A SINGLE PAGE IN R?**

Plotting multiple plots onto a single page using base graphics is quite easy.

For example, if you want to plot 4 graphs on the same page, you can use the command below:

par(mfrow=c(2,2))

The plots drawn after this command are placed in a 2 × 2 grid, filled row by row.
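A minimal sketch of a 2 × 2 layout follows. The four plots and the use of the built-in mtcars dataset are illustrative choices.

```r
# Split the plotting area into 2 rows and 2 columns; keep the old settings
old_par <- par(mfrow = c(2, 2))

x <- seq(0, 2 * pi, length.out = 100)
plot(x, sin(x), type = "l", main = "sin(x)")
plot(x, cos(x), type = "l", main = "cos(x)")
hist(mtcars$mpg, main = "Histogram of mpg")
boxplot(mtcars$mpg, main = "Boxplot of mpg")

# Restore the previous layout so later plots use the full page again
par(old_par)
```

Saving the value returned by par() and restoring it afterwards keeps the layout change from affecting later plots.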

**Given a vector of values, how would you convert it into a time series object?**

Let’s say this is our vector->

a<-c(1,3,5,7,9)

To convert this into a time series object->

as.ts(a)->a

Let’s plot this:

ts.plot(a)

**What is a White Noise model and how can you simulate it using R?**

The white noise (WN) model is a fundamental time series model. It is the simplest example of a stationary process.

A white noise model includes:

- a fixed constant mean
- a fixed constant variance
- no pattern across time

Simulating a white noise model in R:

arima.sim(model=list(order=c(0,0,0)),n=50)->wn

ts.plot(wn)
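As a sketch, the same idea can be shown next to independent normal draws, which also form white noise. The seed and the mean and sd values are assumptions chosen for illustration.

```r
set.seed(123)                       # reproducible draws

# White noise via arima.sim(), as in the text
wn <- arima.sim(model = list(order = c(0, 0, 0)), n = 50)

# Illustrative alternative: independent normal draws with a chosen mean and sd
wn2 <- ts(rnorm(50, mean = 10, sd = 2))

mean(wn)    # sample mean of the simulated series
mean(wn2)   # sample mean, expected to be near 10
ts.plot(wn)
```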

**What is a Random Walk model and how can you simulate it using R?**

A random walk is a simple example of a non-stationary process.

A random walk has:

- No specified mean or variance
- Strong dependence over time
- Its changes or increments are white noise

Simulating random walk in R:

arima.sim(model=list(order=c(0,1,0)),n=50)->rw

ts.plot(rw)
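To make the link between a random walk and white noise concrete, the sketch below builds a walk by hand as the running total of white noise increments, then differences it to recover them. The variable names and seed are illustrative.

```r
set.seed(123)

# Increments: white noise
steps <- rnorm(50)

# Random walk: running total of the increments
rw <- ts(cumsum(steps))

# Differencing the walk recovers the increments (all but the first)
recovered <- diff(rw)

ts.plot(rw)
```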

**GIVE THE COMMAND TO CREATE A HISTOGRAM AND TO REMOVE A VECTOR FROM THE R WORKSPACE.**

hist() is the command to create a histogram, where you can specify the details by typing hist(v,main,xlab,xlim,ylim,breaks,col,border).

- v is a vector containing the numeric values used in the histogram.
- main indicates the title of the chart.
- col is used to set the color of the bars.
- border is used to set the border color of each bar.
- xlab is used to give a description of the x-axis.
- xlim is used to specify the range of values on the x-axis.
- ylim is used to specify the range of values on the y-axis.
- breaks controls how the values are divided into bins, for example as a suggested number of bins or a vector of breakpoints.

rm() is used to remove an object, such as a vector, from the R workspace.
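A short sketch of both commands, using made-up data:

```r
# Hypothetical numeric data
v <- c(12, 15, 9, 21, 18, 14, 25, 17, 11, 19)

# Draw the histogram with a title, axis label, axis range and colours
hist(v,
     main   = "Distribution of v",
     xlab   = "Value",
     xlim   = c(5, 30),
     breaks = 5,          # suggested number of bins; R may adjust it
     col    = "lightblue",
     border = "black")

# Remove v from the workspace and confirm it is gone
rm(v)
exists("v")   # FALSE
```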

**WHY DO WE USE APPLY() FUNCTION IN R?**

apply() applies the same function over the rows or columns of a matrix or array. For example, it can find the mean of every row of a matrix.
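For instance, on a small illustrative matrix, the second argument (MARGIN) selects rows (1) or columns (2):

```r
m <- matrix(1:6, nrow = 2)            # 2 rows, 3 columns, filled column by column

row_means <- apply(m, 1, mean)        # MARGIN = 1: apply over rows
col_sums  <- apply(m, 2, sum)         # MARGIN = 2: apply over columns

row_means   # 3 4
col_sums    # 3 7 11
```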

**HOW DO YOU CREATE A VECTOR IN R?**

To create a vector in R, combine the values with the c() function and use the <- symbol to assign the result to a name. For example, if you want to store the values 4 5 8 14 as a vector in x, type the command: x<-c(4,5,8,14)

**EXPLAIN THE DIFFERENT FUNCTIONS THAT CAN BE APPLIED FOR NORMAL DISTRIBUTION IN R.**

The different functions that can be applied for normal distribution in R are as follows:

dnorm(x, mean, sd)

pnorm(x, mean, sd)

qnorm(p, mean, sd)

rnorm(n, mean, sd)

Following is the description of the parameters used in above functions −

- x is a vector of numbers.
- p is a vector of probabilities.
- n is the number of observations (sample size).
- mean is the mean of the distribution. Its default value is zero.
- sd is the standard deviation. Its default value is 1.
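In brief, dnorm() gives the density, pnorm() the cumulative probability, qnorm() the quantile for a given probability, and rnorm() random draws. A small sketch with illustrative argument values:

```r
dnorm(0, mean = 0, sd = 1)      # density at x = 0, about 0.3989
pnorm(1.96, mean = 0, sd = 1)   # P(X <= 1.96), about 0.975
qnorm(0.975, mean = 0, sd = 1)  # quantile for probability 0.975, about 1.96

set.seed(1)
draws <- rnorm(5, mean = 50, sd = 10)   # 5 random values from N(50, 10^2)
draws
```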

**EXPLAIN THE DIFFERENT FUNCTIONS THAT CAN BE APPLIED FOR BINOMIAL DISTRIBUTION IN R.**

The different functions that can be applied for Binomial distribution in R are as follows:

dbinom(x, size, prob)

pbinom(x, size, prob)

qbinom(p, size, prob)

rbinom(n, size, prob)

Following is the description of the parameters used −

- x is a vector of numbers.
- p is a vector of probabilities.
- n is the number of observations.
- size is the number of trials.
- prob is the probability of success of each trial.
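As with the normal family, the d, p, q and r prefixes give the probability mass, cumulative probability, quantile and random draws. The sketch below assumes 4 trials of a fair coin so the values are easy to check by hand.

```r
dbinom(2, size = 4, prob = 0.5)    # P(X = 2)  = 6/16  = 0.375
pbinom(2, size = 4, prob = 0.5)    # P(X <= 2) = 11/16 = 0.6875
qbinom(0.5, size = 4, prob = 0.5)  # smallest x with P(X <= x) >= 0.5, which is 2

set.seed(1)
successes <- rbinom(10, size = 4, prob = 0.5)   # 10 draws, each between 0 and 4
successes
```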

**WHAT IS THE MAIN DIFFERENCE BETWEEN AN ARRAY AND A MATRIX?**

A matrix is always two-dimensional, as it has only rows and columns. An array can have any number of dimensions. For example, a 3 × 3 × 2 array holds 2 matrices, each of dimension 3 × 3.
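A quick sketch of a three-dimensional array and one of its matrix layers (the values 1 to 18 are arbitrary):

```r
arr <- array(1:18, dim = c(3, 3, 2))   # 3 rows, 3 columns, 2 layers

dim(arr)              # 3 3 2
first <- arr[, , 1]   # the first 3 x 3 layer
is.matrix(first)      # TRUE
is.matrix(arr)        # FALSE: a 3-dimensional array is not a matrix
```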

**HOW CAN YOU LOAD AND USE A CSV FILE IN R?**

A CSV file can be loaded using the read.csv function. R creates a data frame on reading the CSV files using this function.
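As a self-contained sketch, the example below first writes a small made-up CSV file to a temporary path and then loads it. The file contents and the names csv_path and scores are assumptions for illustration.

```r
# Write a small hypothetical CSV file to a temporary location
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, score = c(90.5, 82, 77.5)),
          csv_path, row.names = FALSE)

# Load it: read.csv() returns a data frame
scores <- read.csv(csv_path)

class(scores)        # "data.frame"
mean(scores$score)   # use a column like any other vector
```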

**HOW DO YOU GET THE NAME OF THE CURRENT WORKING DIRECTORY IN R?**

The command getwd() gives the name of the current working directory in R.

**HOW DO YOU INSTALL A PACKAGE IN R?**

To install a package in R, you need to give the following command:

install.packages("package name")

**WHAT IS THE OUTPUT OF RUNIF(6)?**

runif(6) generates 6 random numbers from a uniform distribution between 0 and 1.

**GIVE THE R COMMAND TO GET THE PROBABILITY OF GETTING 26 OR LESS HEADS FROM 51 TOSSES OF A COIN USING PBINOM.**

The R command to get the probability of getting 26 or fewer heads from 51 tosses of a coin using pbinom is:

x<-pbinom(26,51,0.5)

print(x)

The first command obtains the required probability and stores the value in x. The second command, i.e. print(x), shows the value of x.
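One way to sanity-check the result: with a fair coin and an odd number of tosses, P(X ≤ 25) = P(X ≥ 26) = 0.5 by symmetry, so P(X ≤ 26) should equal 0.5 + P(X = 26). A small sketch of that check (the variable names are illustrative):

```r
p_direct <- pbinom(26, size = 51, prob = 0.5)          # P(X <= 26)
p_check  <- 0.5 + dbinom(26, size = 51, prob = 0.5)    # 0.5 + P(X = 26)

all.equal(p_direct, p_check)   # TRUE
```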

**GIVE THE COMMANDS TO OBTAIN THE MEAN, MEDIAN AND MODE OF A DATASET.**

The command for obtaining the mean of a dataset is: mean(…)

The command for obtaining the median of a dataset is: median(…)

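Base R has no built-in command for the statistical mode of a dataset: mode(…) returns the storage mode of an object (for example "numeric"), not its most frequent value. The most frequent value can be obtained from a frequency table, for example with names(which.max(table(…))), which returns it as a character string.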
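A short sketch on made-up data follows. Since base R's mode() reports an object's storage mode and not its most frequent value, the sketch defines a hypothetical helper, stat_mode(), for the statistical mode.

```r
x <- c(2, 3, 3, 5, 7, 3, 5)

mean(x)     # 4
median(x)   # 3

# Hypothetical helper: most frequent value (first one wins if there is a tie)
stat_mode <- function(values) {
  distinct <- unique(values)
  distinct[which.max(tabulate(match(values, distinct)))]
}

stat_mode(x)   # 3
mode(x)        # "numeric": the storage mode, not the most frequent value
```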

**How are R comments written?**

Comments are written by placing # at the start of the line, for example: #division

**What is a t-test in R?**

A t-test is used to determine whether the means of two groups are equal. In R it is performed with the t.test() function.
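A minimal sketch, assuming the built-in sleep dataset (extra hours of sleep recorded under two drugs) as example data:

```r
# Compare the mean of `extra` between the two levels of `group`
result <- t.test(extra ~ group, data = sleep)

result$p.value    # a small p-value suggests the two group means differ
result$estimate   # the mean of each group
```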

**What is the use of subset() and sample() functions in R?**

subset() is used to select variables and observations, and the sample() function is used to draw a random sample of size n from a dataset.
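Both functions can be sketched on the built-in mtcars dataset. The threshold, the columns and the sample sizes are arbitrary choices.

```r
# Keep cars with mpg above 25, and only the mpg and cyl columns
efficient <- subset(mtcars, mpg > 25, select = c(mpg, cyl))
efficient

set.seed(7)
picked <- sample(1:10, size = 3)              # 3 distinct values drawn from 1 to 10
rows   <- mtcars[sample(nrow(mtcars), 5), ]   # 5 randomly chosen rows
```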

**How can you produce correlations and covariances?**

Correlations are produced by the cor() function and covariances by the cov() function.
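A tiny worked example, with y chosen as an exact linear function of x so that both results can be checked by hand:

```r
x <- c(1, 2, 3, 4, 5)
y <- 2 * x + 1          # exact linear function of x

cor(x, y)   # 1: perfect positive linear relationship
cov(x, y)   # 5: equals 2 * var(x), and var(x) is 2.5
```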

**What is the workspace in R?**

Workspace is the current R working environment which includes any user defined objects like vectors, lists etc.

**What is the fitdistr() function?**

It is used to provide the maximum likelihood fitting of univariate distributions. It is defined under the MASS package.

**Why is the library() function used?**

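This function is used to load an installed package into the current session, for example library(MASS). Called with no arguments, it lists the packages which are installed.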

**On which types of data do binary operators work?**

Binary operators work on matrices, vectors and scalars.

**Which function is used to create a frequency table?**

A frequency table is created by the table() function.
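For example, on a small made-up vector of grades:

```r
grades <- c("a", "b", "a", "c", "a")

freq <- table(grades)
freq            # counts: a = 3, b = 1, c = 1

freq[["a"]]     # 3
```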

**How Can You Identify the Data Type of an Object?**

Using the functions class() or typeof(), you can identify the data type of an object in R. The class() function returns the high-level class of the object (for example "data.frame"), whereas typeof() returns its internal storage type (for example "list").
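A short sketch of how the two functions can differ on the same object (the objects are illustrative):

```r
x  <- 3.14
df <- data.frame(a = 1:2)

class(x)     # "numeric"
typeof(x)    # "double"

class(df)    # "data.frame"
typeof(df)   # "list"
```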