Summary stats

We will now cover some basic measures of central tendency and dispersion, along with simple plots. The first question to address is how R handles missing values in calculations. To see what happens, create a vector with a missing value (NA in the R language), then sum the values of the vector with sum():

> a = c(1,2,3,NA)

> sum(a)
[1] NA

Unlike SAS, which would sum the non-missing values, R returns NA whenever at least one value is missing. We could create a new vector with the missing value removed, but it is simpler to pass na.rm=TRUE, which tells sum() to exclude any missing values:

> sum(a, na.rm=TRUE)
[1] 6
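Both approaches mentioned above can be sketched side by side. This is a minimal illustration, not part of the original text: the variable names are hypothetical, and is.na() is used to find and count the missing entries before removing them.

```r
# Hypothetical example vector with one missing value
a <- c(1, 2, 3, NA)

# Locate and count the missing values
missing_flags <- is.na(a)        # logical vector, TRUE where a value is NA
n_missing <- sum(missing_flags)  # TRUE counts as 1 when summed

# Option 1: build a new vector without the missing values, then sum it
a_complete <- a[!is.na(a)]
total_from_subset <- sum(a_complete)

# Option 2: let the function drop the missing values itself
total_from_na_rm <- sum(a, na.rm = TRUE)

# Other summary functions such as mean() accept the same argument
mean_from_na_rm <- mean(a, na.rm = TRUE)
```

Passing na.rm=TRUE keeps the original vector intact, which is usually preferable when the positions of the missing values still matter elsewhere in the analysis.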

R provides functions for the common measures of central tendency and dispersion of a vector:

> data = c(4,3,2,5.5,7.8,9,14,20)

> mean(data)
[1] 8.1625

> median(data)
[1] 6.65

> sd(data)
[1] 6.142112

> max(data)
[1] 20

> min(data)
[1] 2

> range(data)
[1]  2 20

> quantile(data)
   0%   25%   50%   75%  100% 
 2.00  3.75  6.65 10.25 20.00
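To make the dispersion measures less of a black box, the sketch below recomputes the sample standard deviation from its definition (dividing by n - 1, as sd() does) and compares it with sd(). It also shows quantile() with a probs argument and IQR(), neither of which appears in the text above; the variable names are illustrative.

```r
# Same values as the example vector above
data <- c(4, 3, 2, 5.5, 7.8, 9, 14, 20)
n <- length(data)

# Sample standard deviation from its definition (denominator n - 1)
sd_by_hand <- sqrt(sum((data - mean(data))^2) / (n - 1))

# all.equal() compares with a numerical tolerance
matches_sd <- isTRUE(all.equal(sd_by_hand, sd(data)))

# quantile() accepts arbitrary probabilities through probs
tails <- quantile(data, probs = c(0.1, 0.9))

# Interquartile range: distance between the 75% and 25% quantiles
iqr_value <- IQR(data)
```

With the quartiles shown above (3.75 and 10.25), the interquartile range works out to 6.5.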

The summary() function reports the minimum, quartiles, median, mean, and maximum in a single call:

> summary(data)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  2.000   3.750   6.650   8.162  10.250  20.000
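Tying this back to the earlier discussion of missing values: unlike sum(), summary() on a numeric vector drops the NA values and reports how many there were. The following is a small sketch with a hypothetical vector, assuming the default summary method for numeric vectors.

```r
# Hypothetical vector with one missing value
with_missing <- c(4, 3, 2, NA, 7.8)

# summary() computes the statistics on the non-missing values
# and appends an "NA's" entry with the count of missing values
s <- summary(with_missing)
print(s)

# Individual entries can be extracted by name
n_missing <- s[["NA's"]]
median_value <- s[["Median"]]
```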

We can use plots to visualize the data. The base plot here is a bar plot created with barplot(); we then use abline() to add horizontal lines at the mean and median. As the default line is solid, we will draw a dashed line for the median with lty=2 to distinguish it from the mean:

> barplot(data)

> abline(h=mean(data))

> abline(h=median(data), lty=2)

The preceding commands produce a bar plot with a solid horizontal line at the mean and a dashed horizontal line at the median.
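A reader looking at the plot alone cannot tell which line is which, so a legend helps. The sketch below is one possible variation rather than the original author's code: the axis labels and legend text are assumptions added for illustration.

```r
# Same values as the example vector above
data <- c(4, 3, 2, 5.5, 7.8, 9, 14, 20)

# barplot() returns the x positions of the bar midpoints
bar_mids <- barplot(data,
                    names.arg = seq_along(data),  # label bars 1..8
                    xlab = "Observation",
                    ylab = "Value")

# Solid line (lty = 1) at the mean, dashed line (lty = 2) at the median
abline(h = mean(data), lty = 1)
abline(h = median(data), lty = 2)

# Legend entries use the same line types as the abline() calls
legend("topleft", legend = c("Mean", "Median"), lty = c(1, 2))
```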

A number of functions are available to generate data from different distributions. Here, we use rnorm() to draw 100 data points from a normal distribution with a mean of zero and a standard deviation of one, which are the defaults. We will then plot the values and also plot a histogram. To reproduce the results, set the same random seed with set.seed() before generating the data:

> set.seed(1)

> norm = rnorm(100)
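The effect of set.seed() can be checked directly: resetting the seed to the same value before each call yields identical draws. The sketch below also passes mean and sd explicitly to rnorm(); the values 10 and 2 are arbitrary choices for illustration.

```r
# Draw 100 standard normal values twice, resetting the seed each time
set.seed(1)
first_draw <- rnorm(100)

set.seed(1)
second_draw <- rnorm(100)

# Same seed, same generator state, therefore the same values
same_values <- identical(first_draw, second_draw)

# rnorm() takes mean and sd arguments for other normal distributions
set.seed(1)
shifted <- rnorm(100, mean = 10, sd = 2)

# The sample mean and sd are close to, but not exactly, the parameters
sample_mean <- mean(first_draw)
sample_sd <- sd(first_draw)
```

Without the second set.seed(1) call, second_draw would continue from the generator's current state and differ from first_draw.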

This is the plot of the 100 data points:

> plot(norm)

Because only one vector is supplied, the result is a scatter plot of each value against its index (1 to 100).

Finally, produce a histogram with hist(norm):

> hist(norm)

The result is a histogram showing how the 100 values are distributed.
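Besides drawing the plot, hist() returns an object describing the bins, which is handy for checking what the picture shows. The following sketch assumes the same seed as above; the breaks value, title, and axis label are illustrative choices, and breaks is treated by R as a suggestion rather than an exact bin count.

```r
# Recreate the data with the same seed as above
set.seed(1)
norm_values <- rnorm(100)

# Request roughly 20 bins and keep the returned histogram object
h <- hist(norm_values,
          breaks = 20,
          main = "Histogram of 100 standard normal draws",
          xlab = "Value")

# Bin boundaries and the number of observations in each bin
bin_edges <- h$breaks
bin_counts <- h$counts

# Every observation falls in exactly one bin
total_count <- sum(bin_counts)
```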
