What You’ll Learn in This Hour:
How to write and use a simple R function
How to return objects from a function
How to control flow through a function
So far in this book you have seen many functions being used. For example, in the earlier hour on single-mode data structures you saw that you could create vectors using functions such as c, seq, and rep. One of the strengths of R is that you can extend it by writing your own functions. This allows you to create utilities that can perform a variety of tasks. In this hour, we look at ways to create our own functions, specify inputs, and return results to the user. We also introduce the “if/else” structure in R, and we use this to control the flow of code within a function.
You have seen that functions in R allow you to perform a number of tasks in a single command. This approach has parallels in most programming languages, such as “macros” in Visual Basic and SAS.
Creating your own functions is a powerful aspect of R that allows you to “wrap up” a series of steps into a simple container. This way, you can capture common workflows and utilities and call them when needed instead of producing long, verbose scripts of repeated code snippets that can be difficult to manage.
Before we write our own functions, let’s take a closer look at the structure of an existing R function. Consider, for example, the upper.tri function, which allows us to identify values in the upper triangle of a matrix:
> myMat # A sample matrix
[,1] [,2] [,3]
[1,] 1 6 3
[2,] 1 3 8
[3,] 5 4 1
> upper.tri(myMat) # Upper triangle
[,1] [,2] [,3]
[1,] FALSE TRUE TRUE
[2,] FALSE FALSE TRUE
[3,] FALSE FALSE FALSE
> myMat [ upper.tri(myMat) ] # Values from upper triangle
[1] 6 3 8
As seen here, we can call the upper.tri function using round brackets, specifying the matrix as the first input. However, if we simply print the upper.tri function, we can see its contents:
> upper.tri # Print the upper.tri function
function (x, diag = FALSE)
{
x <- as.matrix(x)
if (diag) row(x) <= col(x)
else row(x) < col(x)
}
The function is split into two parts:
The top part defines the inputs to the function (in this case, the inputs are x and diag).
The next part, captured within curly brackets, contains the main “body” of the function.
In a similar way, we can create our own functions by specifying a function name, defining the function inputs, and specifying the actions we wish to take in the function body.
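As a minimal sketch of the two parts described above, base R's formals and body functions can be used to pull a function apart. Nothing here is needed to write functions; it simply makes the structure visible. The object name inputNames is illustrative, and the exact body printed for upper.tri may differ between R versions.

```r
# Pull apart the upper.tri function seen above
inputNames <- names(formals(upper.tri))   # names of the inputs
inputNames                                # "x" "diag"
formals(upper.tri)$diag                   # default value of "diag": FALSE
body(upper.tri)                           # the code held within the curly brackets
```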
We can create a simple function in R using the function keyword. The curly brackets are used to contain the body of the function. In this simple example, we create a function that accepts a single input:
> addOne <- function(x) {
+ x + 1
+ }
Our new addOne function adds 1 to any input object. Once we’ve created a function, we can call that function in the usual way:
> addOne(x = 1:5) # Call the addOne function
[1] 2 3 4 5 6
Tip: Saving Outputs
Here, we see the values 2 to 6 returned from a function. If we want to save the output from a function for later use, we need to assign the output from the function to an object, as shown here:
> result <- addOne(1:5)
> result
[1] 2 3 4 5 6
The function created is itself an R object. As such, it exists in the R Workspace, and can be managed and reused in future sessions if you save your Workspace objects, as discussed in Hour 2, “The R Environment.”
The body of our simple addOne function contains only one line of code. If the function body contains only a single line of code, we can omit the curly brackets, as follows:
> addOne <- function(x) x + 1
> addOne(x = 1:5) # Call the addOne function
[1] 2 3 4 5 6
Note: Named Arguments
As you saw in Hour 6, “Common R Utility Functions,” there are many ways to call functions and define arguments. In the preceding example, addOne(x = 1:5) is equivalent to addOne(1:5). In this hour, we will name all arguments when calling the functions to aid clarity, but common convention in R is that the first argument (or arguments) is not directly named.
Caution: Continual Prompts
In many of our examples, we see the familiar command prompt for the first line of the function, with plus (+) symbols prefixing the following lines. These signify the “continuation” prompt in R, and are not part of the code itself (in other words, you should not type these symbols when creating your functions).
Tip: Using the Script Window
As mentioned earlier, functions typically contain more than one line of code. As such, the script window (in RStudio or other interface) is preferred to the console window when developing functions.
A function is an R object, so it can be named like any other R object. Hence, its name
Can be of any length
Can contain any combination of letters, numbers, underscores, and period characters
Cannot start with a number or an underscore (or with a period followed by a number)
One thing to note, however, is that creating a function can cause existing functions to be “masked.” Consider the following example:
> X <- 1:5 # Create a vector
> median(X) # The median of the vector is 3
[1] 3
> find("median") # Where is the "median" function?
[1] "package:stats"
> median <- function(input) "Hello" # Create a new "median" function
> median(X) # The median of the vector is "Hello"
[1] "Hello"
> find("median") # Where is the "median" function?
[1] ".GlobalEnv" "package:stats"
> rm(median) # Remove the new "median" function from the workspace
> median(X) # The median of the vector is 3
[1] 3
Here we have created a new median function in the R Workspace, thus “masking” the original median function, which still exists in the stats package. As such, care should be taken when naming functions to ensure you don’t “mask” existing key functions.
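If a function has been masked, the original can still be reached by naming its package with the :: operator. The following is a small sketch of that idea; the object names maskedResult, originalResult, and restoredResult are illustrative only.

```r
X <- 1:5
median <- function(input) "Hello"     # mask the original function
maskedResult <- median(X)             # uses our new function: "Hello"
originalResult <- stats::median(X)    # explicitly uses the stats version: 3
rm(median)                            # remove the mask
restoredResult <- median(X)           # back to the stats version: 3
```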
In the previous section, we created a very simple function called addOne, defined as follows:
> addOne <- function(x) {
+ x + 1
+ }
Note that this function takes a single argument, x. If we wanted to extend this example, we could add a second argument:
> addNumber <- function(x, number) {
+ x + number
+ }
> addNumber(x = 1:5, number = 2)
[1] 3 4 5 6 7
Our new function (addNumber) now accepts two arguments (x and number) and adds these values together. Note, however, that these are both required arguments because they do not have default values. As such, calling the function without both arguments defined will result in an error:
> addNumber() # Calling with no arguments
Error in addNumber() : argument "x" is missing, with no default
> addNumber(x = 1:5) # Calling with only the "x" argument
Error in addNumber(x = 1:5) : argument "number" is missing, with no default
> addNumber(number = 2) # Calling with only the "number" argument
Error in addNumber(number = 2) : argument "x" is missing, with no default
> addNumber(x = 1:5, number = 2) # Calling with both arguments
[1] 3 4 5 6 7
If we want to assign default values for arguments to a function, we can specify them directly in the argument definition, as follows:
> addNumber <- function(x, number = 0) {
+ x + number
+ }
> addNumber(x = 1:5) # Call function with default (number = 0)
[1] 1 2 3 4 5
> addNumber(x = 1:5, number = 1) # Call function with number = 1
[1] 2 3 4 5 6
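The following sketch pulls together the ways of supplying arguments seen so far: by position, by name, and by relying on a default. The object names byPosition, byName, and usingDefault are illustrative only.

```r
addNumber <- function(x, number = 0) {
  x + number
}

byPosition   <- addNumber(1:5, 2)                # arguments matched by position
byName       <- addNumber(number = 2, x = 1:5)   # named arguments can come in any order
usingDefault <- addNumber(1:5)                   # "number" falls back to its default of 0

byPosition     # 3 4 5 6 7
usingDefault   # 1 2 3 4 5
```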
When we define a function, we can create objects within the function body. This may help to simplify functions or make them generally more readable. For example, we may create an object to be returned:
> addNumber <- function(x, number = 0) {
+ theAnswer <- x + number # Create "theAnswer" by adding "x" and "number"
+ theAnswer # Return the value
+ }
If we call the function, note that the theAnswer object is not accessible once the function has been executed:
> output <- addNumber(x = 1:5, number = 1) # Call the function creating "output" object
> output # Look at value of "output"
[1] 2 3 4 5 6
> theAnswer # "theAnswer" object does not exist
Error: object 'theAnswer' not found
When we run a function, R loads the argument inputs and any objects created into a separate, temporary area of memory (a memory “frame”). Once the execution of the function is complete, the output is returned and the temporary area of memory is discarded. As such, objects created within a function call should be considered “local” to that function, so any required outputs must be explicitly returned from the function.
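To illustrate the “local” nature of these objects, the sketch below first creates a workspace object that happens to share the name theAnswer (an assumption made purely for this illustration) and shows that calling the function leaves it unchanged.

```r
theAnswer <- "untouched"        # an object in the workspace

addNumber <- function(x, number = 0) {
  theAnswer <- x + number       # local object: exists only during the call
  theAnswer                     # return the local value
}

output <- addNumber(x = 1:5, number = 1)
output       # 2 3 4 5 6
theAnswer    # still "untouched": the workspace object was not modified
```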
In the preceding example, you saw an object created within the function body. Let’s extend that example to include the creation of more “local” objects. In this example, we create a function called plusAndMinus, which creates two “local” objects (called PLUS and MINUS) and attempts to return both of them:
> plusAndMinus <- function(x, y) {
+ PLUS <- x + y # Define "PLUS"
+ MINUS <- x - y # Define "MINUS"
+ PLUS # Return "PLUS"
+ MINUS # Return "MINUS"
+ }
> plusAndMinus(x = 1:5, y = 1:5) # Call function
[1] 0 0 0 0 0
As you can see, only the last object (the MINUS object) is returned from the function. The PLUS object value is not returned and, as discussed earlier, is only a local object, so the value cannot be retrieved.
An R function can only return a single object, which by default is the result of the last expression evaluated in the function. This can be confirmed by swapping the order of the PLUS and MINUS return objects:
> plusAndMinus <- function(x, y) {
+ PLUS <- x + y # Define "PLUS"
+ MINUS <- x - y # Define "MINUS"
+ MINUS # Return "MINUS"
+ PLUS # Return "PLUS"
+ }
> plusAndMinus(x = 1:5, y = 1:5) # Call function
[1] 2 4 6 8 10
If we want to return more than one value from a function (for example, the PLUS and MINUS objects), we need to combine them into a single object. First, let’s return the two values in a list:
> plusAndMinus <- function(x, y) {
+ PLUS <- x + y # Define "PLUS"
+ MINUS <- x - y # Define "MINUS"
+ list(PLUS, MINUS) # Return "PLUS" and "MINUS" in a list
+ }
> plusAndMinus(x = 1:5, y = 1:5) # Call function
[[1]]
[1] 2 4 6 8 10
[[2]]
[1] 0 0 0 0 0
This returns a single object, a list, containing the two values. When we return a list in this way, we should name the elements so we can more easily reference the values later:
> plusAndMinus <- function(x, y) {
+ PLUS <- x + y # Define "PLUS"
+ MINUS <- x - y # Define "MINUS"
+ list(plus = PLUS, minus = MINUS) # Return "PLUS" and "MINUS" in a list
+ }
> output <- plusAndMinus(x = 1:5, y = 1:5) # Call function, saving the output
> output # Print the output
$plus
[1] 2 4 6 8 10
$minus
[1] 0 0 0 0 0
> output$plus # Print the "plus" element
[1] 2 4 6 8 10
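As a brief sketch of why naming the elements helps, the returned list can be queried by name in more than one way. The compact version of plusAndMinus below is a restatement for illustration, not a change to the approach.

```r
plusAndMinus <- function(x, y) {
  list(plus = x + y, minus = x - y)   # return both results in a named list
}

output <- plusAndMinus(x = 1:5, y = 1:5)
names(output)        # "plus" "minus"
output$minus         # extract by name using $
output[["minus"]]    # equivalent extraction using double square brackets
```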
The list object is an appropriate structure in this example, because we are returning multiple vectors. However, we may be returning a number of single values from a function, in which case a vector may be more suitable. Consider the following example, where we return some summary statistics as a vector:
> summaryFun <- function(vec, digits = 3) {
+
+ # Create some summary statistics
+ theMean <- mean(vec)
+ theMedian <- median(vec)
+ theMin <- min(vec)
+ theMax <- max(vec)
+
+ # Combine them into a single vector and round the values
+ output <- c(Mean = theMean, Median = theMedian, Min = theMin, Max = theMax)
+ round(output, digits = digits)
+ }
>
> X <- rnorm(50) # Generate 50 samples from a normal distribution
> summaryFun(X) # Produce summaries of the vector
Mean Median Min Max
-0.214 -0.051 -2.633 1.764
Note: Checking Function Inputs
For the preceding functions, we frequently make assumptions about the structure of the inputs. For example, in the summaryFun function we assume the vec input is a numeric object (otherwise functions such as mean make no sense). Later, in Hour 8, “Writing Functions: Part II,” we will cover ways of checking function inputs. This includes functions for checking the structure of inputs and for producing error or warning messages when those inputs are not appropriate for the function.
In the function examples you’ve seen so far in this hour, the “flow” through the body of the function has been completely linear and sequential. However, we may alternatively wish to control the flow based on decisions using an “if/else” statement.
Note: What Do We Mean by “If/Else”?
If you are not familiar with programming, the if/else statement is a common structure, where code is executed, or not, based on certain decisions. Consider this pseudo-code example:
IF I have enough money, I will buy a can of soda and a candy bar
ELSE I will just buy the can of soda
Often, we will only need an “IF” statement. Note that because either option in this example involves buying a can of soda, we can rewrite without the “ELSE” statement:
Buy the can of soda
IF I have enough money, I will also buy a candy bar
We can also have nested statements, such as this:
IF I have enough money, I will buy a can of soda and a candy bar
ELSE {
IF they have my favorite type of candy bar I will just buy that
ELSE I will just buy the can of soda
}
We can use a similar structure within our code to control the flow of the function based on specific choices.
The basic structure of an if/else statement in R is as follows:
if (something is TRUE) {
do this
}
else {
do this instead
}
As with functions, we use curly brackets to contain a body of code. However, if these are simple one-line statements, we may omit the curly brackets, as follows:
if (something is TRUE) do this
else do this instead
The “test” that is performed within the if statement (marked as “something is TRUE” here) is called the “condition,” and should take the form of a single TRUE or FALSE value.
Let’s look at a simple example of this in action. Here, we create a function that uses the cat function to print text to the screen, based on whether the number passed to it is positive or negative:
> posOrNeg <- function(X) {
+ if (X > 0) {
+ cat("X is Positive")
+ }
+ else {
+ cat("X is Negative")
+ }
+ }
> posOrNeg(1) # is 1 positive or negative?
X is Positive
> posOrNeg(-1) # is -1 positive or negative?
X is Negative
> posOrNeg(0) # is 0 positive or negative?
X is Negative
Note: If/Else in a Script
Note that the above example of if/else is contained within a function. If, instead, the if/else code was run interactively or line by line from a script, R would interpret the if part of the statement as a complete command and would fail when the else statement is encountered:
> X <- 1
> if (X > 0) {
+ cat("X is Positive")
+ }
X is Positive
> else {
Error: unexpected 'else' in "else"
> cat("X is Negative")
X is Negative
> }
Error: unexpected '}' in "}"
To guard against this issue, we can rewrite the command, positioning the else statement immediately following the closing curly bracket of the if component as follows:
> X <- 1
> if (X > 0) {
+ cat("X is Positive")
+ } else { # NOTE: "else" on same line as closing } of "if"
+ cat("X is Negative")
+ }
X is Positive
In the posOrNeg function above, positive and negative numbers are handled and the function prints the correct message. However, when we pass the function a 0, it is reported as negative, which isn’t true (by the usual definition, 0 is neither positive nor negative).
We can improve our example by using a nested if/else statement:
> posOrNeg <- function(X) {
+ if (X > 0) {
+ cat("X is Positive")
+ }
+ else {
+ if (X == 0) cat("X is Zero")
+ else cat("X is Negative")
+ }
+ }
> posOrNeg(1) # is 1 positive or negative?
X is Positive
> posOrNeg(0) # is 0 positive or negative?
X is Zero
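A nested if/else of this kind is often written more compactly by chaining with else if. The following is a sketch of the same logic in that style, with each else placed on the same line as the preceding closing bracket so that it would also run outside a function.

```r
posOrNeg <- function(X) {
  if (X > 0) {
    cat("X is Positive")
  } else if (X == 0) {      # only tested when X > 0 is FALSE
    cat("X is Zero")
  } else {                  # reached when X is neither > 0 nor == 0
    cat("X is Negative")
  }
}

posOrNeg(-1)   # prints: X is Negative
```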
Consider the following example:
> posOrNeg <- function(X) {
+ if (X > 0) {
+ cat("X is Positive")
+ }
+ else {
+ cat("")
+ }
+ }
> posOrNeg(1) # is 1 positive or negative?
X is Positive
> posOrNeg(0) # is 0 positive or negative?
In this example, the “else” part of the statement does nothing, so we can drop it and simplify as follows:
> posOrNeg <- function(X) {
+ if (X > 0) {
+ cat("X is Positive")
+ }
+ }
> posOrNeg(1) # is 1 positive or negative?
X is Positive
> posOrNeg(0) # is 0 positive or negative?
In the preceding example, the posOrNeg function accepts an input called X and the condition is X > 0. Running this condition outside the if/else statement shows that it returns a single logical value:
> X <- 1 # Set X to 1
> X > 0 # Is X greater than 0?
[1] TRUE
> X <- 0 # Set X to 0
> X > 0 # Is X greater than 0?
[1] FALSE
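If we instead provide a vector of values to this function, we get the following warning message (in R 4.2.0 and later, a condition of length greater than one produces an error rather than a warning):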
> posOrNeg <- function(X) {
+ if (X > 0) cat("X is Positive")
+ else cat("X is Negative")
+ }
> posOrNeg(-2:2) # are the values -2 to 2 positive or negative?
X is Negative
Warning message:
In if (X > 0) cat("X is Positive") else cat("X is Negative") :
the condition has length > 1 and only the first element will be used
In this case, when running the condition outside the if/else statement, we can see that the result is a vector of logicals:
> X <- -2:2 # Set X to -2:2
> X > 0 # Is X greater than 0?
[1] FALSE FALSE FALSE TRUE TRUE
The if/else structure is looking for a single “choice” (that is, should it run the first “if” section of code or the second “else” section of code?). In this example, the condition has returned five “answers” (FALSE FALSE FALSE TRUE TRUE).
R handles this mismatch by only using the first “answer” (as per the warning message). This is FALSE, hence the result (“X is Negative”).
In the last example, you saw that the condition should be a single TRUE or FALSE value. You also saw that warnings and unexpected behaviors can occur if multiple logical values are generated from the condition.
One way of handling this is to use the all and any functions to collapse a vector of logicals into a single TRUE or FALSE value:
> X <- -2:2 # Set X to -2:2
> X > 0 # Is X greater than 0?
[1] FALSE FALSE FALSE TRUE TRUE
> all(X > 0) # Are all values of X greater than 0?
[1] FALSE
> any(X > 0) # Are any values of X greater than 0?
[1] TRUE
We can use these functions directly in the condition as follows:
> posOrNeg <- function(X) {
+ if (all(X > 0)) cat("All values of X are > 0")
+ else {
+ if (any(X > 0)) cat("At least 1 value of X is > 0")
+ else cat("No values are > 0")
+ }
+ }
> posOrNeg(-2:2)
At least 1 value of X is > 0
> posOrNeg(1:5)
All values of X are > 0
> posOrNeg(-(1:5))
No values are > 0
Sometimes we may want the person calling the function to choose the flow of the function. In this case, we can provide a logical argument that the function passes directly to the condition in the if/else statement:
> logVector <- function(vec, logIt = FALSE) {
+ if (logIt == TRUE) vec <- log(vec)
+ else vec <- vec
+ vec
+ }
> logVector(1:5)
[1] 1 2 3 4 5
> logVector(1:5, logIt = TRUE) # Call the function with logIt = TRUE
[1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379
Again, the “else” portion of this statement changes nothing, so we can drop it:
> logVector <- function(vec, logIt = FALSE) {
+ if (logIt == TRUE) vec <- log(vec)
+ vec
+ }
> logVector(1:5)
[1] 1 2 3 4 5
> logVector(1:5, logIt = TRUE) # Call the function with logIt = TRUE
[1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379
There is one more simplification we can make. Consider the possible outcomes from the condition:
If logIt is TRUE, then logIt == TRUE will be TRUE.
If logIt is FALSE, then logIt == TRUE will be FALSE.
So, regardless of the result, logIt == TRUE will always return the same value as logIt. Therefore, we can simplify the condition as follows:
> logVector <- function(vec, logIt = FALSE) {
+ if (logIt) vec <- log(vec)
+ vec
+ }
> logVector(1:5)
[1] 1 2 3 4 5
> logVector(1:5, logIt = TRUE) # Call the function with logIt = TRUE
[1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379
Using all and any, we can summarize logical vectors as follows:
> X <- -2:2 # Set X to -2:2
> X > 0 # Is X greater than 0?
[1] FALSE FALSE FALSE TRUE TRUE
> all(X > 0) # Are all values of X greater than 0?
[1] FALSE
> any(X > 0) # Are any values of X greater than 0?
[1] TRUE
We can introduce the ! notation before any logical statement to convert TRUE values to FALSE values and FALSE values to TRUE values. This can be seen here:
> X <- -2:2 # Set X to -2:2
> X > 0 # Is X greater than 0?
[1] FALSE FALSE FALSE TRUE TRUE
> !(X > 0) # Reverse logical values
[1] TRUE TRUE TRUE FALSE FALSE
We can also use the ! notation before the all and any functions to reverse the meanings of the conditions as follows:
> posOrNeg <- function(X) {
+ if (all(X > 0)) cat("\nAll values of X are greater than 0")
+ if (!all(X > 0)) cat("\nNot all values of X are greater than 0")
+ if (any(X > 0)) cat("\nAt least 1 value of X is greater than 0")
+ if (!any(X > 0)) cat("\nNo values of X are greater than 0")
+ }
> posOrNeg(1:5) # All > 0
All values of X are greater than 0
At least 1 value of X is greater than 0
> posOrNeg(-2:2) # Some > 0, Some <= 0
Not all values of X are greater than 0
At least 1 value of X is greater than 0
> posOrNeg(-(1:5)) # All <= 0
Not all values of X are greater than 0
No values of X are greater than 0
Note: New Line Characters
Note the use of the \n character in the call to cat in the preceding example. The \n character specifies that a new line is written, which is why each statement printed is on a separate line. This can be further seen in this example:
> cat("Hello\nthere")
Hello
there
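As a small check of this behavior, the sketch below uses the base capture.output function to collect what cat prints, with one element per printed line. The object name printedLines is illustrative only.

```r
printedLines <- capture.output(cat("Hello\nthere"))
printedLines           # "Hello" "there"
length(printedLines)   # 2, because the new line character split the text

# The sep argument of cat gives the same effect for separate strings
cat("Hello", "there", sep = "\n")
```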
In all our examples so far, there has been a single condition. If we have more than one condition, we can use the & or | notation to combine conditions. Here is a rather contrived example to show the use of these operators:
> betweenValues <- function(X, Min = 1, Max = 10) {
+ if (X >= Min & X <= Max) cat(paste("X is between", Min, "and", Max))
+ if (X < Min | X > Max) cat(paste("X is NOT between", Min, "and", Max))
+ }
> betweenValues(5)
X is between 1 and 10
> betweenValues(5, Min = -2, Max = 2)
X is NOT between -2 and 2
We may also mix conditions that come from different sources. Consider the following example that mixes a condition passed from the user with one derived within the function:
> logVector <- function(vec, logIt = FALSE) {
+ if (all(vec > 0) & logIt) vec <- log(vec)
+ vec
+ }
> logVector(1:5, logIt = TRUE) # Logs the data
[1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379
> logVector(-5:5, logIt = TRUE) # Doesn't log the data because first condition not met
[1] -5 -4 -3 -2 -1 0 1 2 3 4 5
When multiple conditions are combined with the & and/or | operators, each condition is evaluated separately, and then the results are combined. To illustrate this, consider the following example:
> logVector <- function(vec) {
+ if (all(vec > 0) & all(log(vec) <= 2)) cat("Numbers in range")
+ else cat("Numbers not in range")
+ }
> logVector(1:10) # Some logged values are greater than 2
Numbers not in range
> logVector(1:5) # All values are in range
Numbers in range
Let’s consider the way in which the condition from the last call is evaluated:
The all(vec > 0) statement is evaluated, resulting in a TRUE value.
The all(log(vec) <= 2) statement is evaluated, also resulting in a TRUE value.
The results of the two statements are compared: TRUE & TRUE = TRUE.
Now consider the following example:
> logVector(-2:2)
Numbers not in range
Warning message:
In log(vec) : NaNs produced
In this example, we see a printed result (“Numbers not in range”) and also a warning message. This message occurs because both conditions are evaluated and compared. The first condition returns a FALSE value, but the second condition generates a warning message because the function is attempting to calculate logs of negative numbers (which are not defined for real numbers).
To remedy these issues, we can use the “control” versions of the & and | operators. This changes the flow so that the second condition is only evaluated if the result of the first is inconclusive. To use the “control” and/or statement, we use double notation (&& or ||). Let’s update our logVector function with “control” notation:
> logVector <- function(vec) {
+ if (all(vec > 0) && all(log(vec) <= 2)) cat("Numbers in range")
+ else cat("Numbers not in range")
+ }
> logVector(-2:2)
Numbers not in range
You can see that the earlier warning message has been avoided because we specified a “control and” using the && notation. Now, the flow of the condition is as follows:
The all(vec > 0) statement is evaluated, resulting in a FALSE value.
Because the first condition is FALSE, the whole statement must be FALSE, so a FALSE value is returned without evaluating the second condition.
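The difference in flow can be made visible with a helper that announces when it is evaluated. This is only a sketch; the function name noisyTrue is hypothetical and exists purely to show which conditions are actually run.

```r
# A helper that prints a message whenever it is evaluated (illustrative only)
noisyTrue <- function() {
  cat("second condition evaluated\n")
  TRUE
}

FALSE & noisyTrue()    # message printed: both sides are always evaluated
FALSE && noisyTrue()   # no message: the first FALSE already settles the result
TRUE || noisyTrue()    # no message: the first TRUE already settles the result
```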
Earlier in this hour, in the “Return Objects” section, you saw that the last evaluated line of code within a function generates the return value. Consider this example:
> verboseFunction <- function(X) {
+ if (all(X > 0)) output <- X # if all values of X > 0, set output to X
+ else {
+ X [ X <= 0 ] <- 0.1 # Set all values <=0 to 0.1
+ output <- log(X) # Take logs of the X input data, set as output
+ }
+ output # Return the value of output
+ }
> verboseFunction(-2:2) # Call our function
[1] -2.3025851 -2.3025851 -2.3025851 0.0000000 0.6931472
If all the values of X are greater than 0, we set the output to X. At this point in the function (that is, the first line of the body of the function) we already know the value we want to return from the function. If we wish to return the result of the function early, we can force this to happen using the return function. This way, we can rewrite our function as follows:
> verboseFunction <- function(X) {
+ if (all(X > 0)) return(X) # Return early if all values of X are > 0
+
+ # Carry on if not returned already
+ X [ X <= 0 ] <- 0.1 # Set all values <=0 to 0.1
+ log(X) # Return the logged X values
+ }
> verboseFunction(-2:2)
[1] -2.3025851 -2.3025851 -2.3025851 0.0000000 0.6931472
This provides a clear, readable behavior where results are returned earlier in the function when certain conditions are met.
So far in this hour, all our examples have been very simple (and, often, rather useless). This has been done to ensure we focus on the basic syntax of R functions, but at this point it is worth exploring a more complete and useful worked example to see the various components discussed in this hour in action.
The following function summarizes a numeric object, calculating a variety of statistics:
> summaryFun <- function(vec, digits = 3) {
+ N <- length(vec) # Calculate the number of values in "vec"
+ if (N == 0) return(NULL) # Return NULL if "vec" is empty
+
+ testMissing <- is.na(vec) # Look for missing values
+ if (all(testMissing)) {
+ output <- c( N = N, nMissing = N, pMissing = 100)
+ return(output) # Return simple summary if all missing values
+ }
+
+ nMiss <- sum(testMissing) # Calculate the number of missing values
+ pMiss <- 100 * nMiss / N # Calculate the percentage of missing values
+ vec <- vec [ !testMissing ] # Remove missing values from the vector
+ someStats <- c(Mean = mean(vec), Median = median(vec), SD = sd(vec),
+ Min = min(vec), Max = max(vec)) # Calculate a number of statistics
+
+ output <- c(someStats, N = N, nMissing = nMiss, pMissing = pMiss)
+ round(output, digits = digits)
+ }
> summaryFun(c()) # Empty Vector
NULL
> summaryFun(rep(NA, 10)) # Vector of missing values
N nMissing pMissing
10 10 100
> summaryFun(1:10) # Basic numeric vector
Mean Median SD Min Max N nMissing pMissing
5.500 5.500 3.028 1.000 10.000 10.000 0.000 0.000
> summaryFun(airquality$Ozone) # Vector containing missings
Mean Median SD Min Max N nMissing pMissing
42.129 31.500 32.988 1.000 168.000 153.000 37.000 24.183
In this hour, we have covered the basic structure of an R function, and you have seen how to create simple functions of your own. In particular, you saw how to specify the function inputs, define what your functions “do” with those inputs, and how results are returned from your functions. Beyond this, we covered the if/else structure, which allows you to control the overall flow through a function.
In the next hour, we will use the skills you learned here to create more complex functions, including the use of error messaging and the checking of function inputs.
Q. Is there a convention for naming functions?
A. During the history of R, a number of naming conventions have come and gone. The current convention (which I’ve followed in this hour) is to use camel-case starting with a lowercase letter (e.g. myFunction). However, there are no specific rules as to how functions should be named.
Q. How do I load and share my functions?
A. Functions are R objects so, when created, they exist in the workspace of the current session. If you save that workspace and restart in the same working directory, your function (and other) objects should still exist. If you want to share your functions with other users, or reuse them in other projects, you can do either of the following:
Save the function definitions as scripts, then open and re-execute them in other sessions.
Save your functions together in your own “package,” which can be shared and loaded into R (you’ll see how to do this in Hour 19, “Package Building”).
Q. Can I “globally assign” local objects so they can be seen later?
A. Yes, this can be achieved with the assign function. However, this practice is discouraged, and we recommend that any required results are passed back to the user in the manner discussed in this hour.
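For illustration only, the sketch below contrasts the two approaches. The function names storeGlobally and returnValue and the object name sideEffect are hypothetical; the first function writes into the workspace as a hidden side effect, whereas the second simply returns its result for the caller to store.

```r
# Discouraged: the function silently creates "sideEffect" in the workspace
storeGlobally <- function(x) {
  assign("sideEffect", x + 1, envir = globalenv())
  invisible(NULL)
}

# Recommended: return the result and let the caller decide where to keep it
returnValue <- function(x) {
  x + 1
}

storeGlobally(1)
get("sideEffect", envir = globalenv())   # 2, created behind the scenes
result <- returnValue(1)                 # 2, stored explicitly by the caller
```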
Q. What is the difference between the cat and print functions?
A. In this hour, we make heavy use of the cat function to demonstrate the flow of a function when using if/else statements. The cat function simply prints the value of an object without printing the structure of that object. The print function also shows the structure of the object (such as element indices and quotation marks around character strings). This can be seen with a simple example:
> cat("Hello")
Hello
> print("Hello")
[1] "Hello"
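A further difference, sketched below, lies in what each function hands back: cat returns NULL, whereas print returns the object it was given (invisibly in both cases). The object names catValue and printValue are illustrative.

```r
catValue   <- cat("Hello\n")    # prints Hello; the returned value is NULL
printValue <- print("Hello")    # prints [1] "Hello"; the returned value is the input

is.null(catValue)               # TRUE
printValue                      # [1] "Hello"
```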
Q. How do missing values impact “conditions”?
A. If the condition results in a single missing value, then an error is returned:
> testMissing <- function(X) {
+ if (X > 0) cat("Success")
+ }
> testMissing(NA)
Error in if (X > 0) cat("Success") :
missing value where TRUE/FALSE needed
If you use the all function with a condition that contains missing values, the result is missing unless at least one non-missing value is FALSE (because you do not know if “all” the conditions are met). Used in an if statement, a missing result will also cause an error:
> allMissings <- rep(NA, 5) # All missing values
> someMissings <- c(NA, 1:4) # Some missing values
> all(allMissings > 0)
[1] NA
> all(someMissings > 0)
[1] NA
If you use the any function with a condition that contains all missing values, the result is a missing value. If, however, you use the any function with a vector where not all values are missing, some conditions may be met:
> any(allMissings > 0)
[1] NA
> any(someMissings > 0)
[1] TRUE
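Two common ways to keep missing values from breaking a condition are sketched below: the na.rm argument of all and any, which drops missing values before testing, and the base isTRUE function, which treats a missing result as “not TRUE.” Which is appropriate depends on how missing values should be interpreted; the messages printed by this version of testMissing are illustrative.

```r
someMissings <- c(NA, 1:4)

all(someMissings > 0)                 # NA: we cannot be sure all values are > 0
all(someMissings > 0, na.rm = TRUE)   # TRUE: missing values are ignored
isTRUE(all(someMissings > 0))         # FALSE: NA is not TRUE

# A condition that no longer errors when given missing values
testMissing <- function(X) {
  if (isTRUE(all(X > 0))) cat("Success")
  else cat("Not all values known to be > 0")
}
testMissing(NA)                       # prints: Not all values known to be > 0
```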
The workshop contains quiz questions and exercises to help you solidify your understanding of the material covered. Try to answer all questions before looking at the “Answers” section that follows.
1. How do you specify default inputs to a function?
2. What value will be held in the result1 object when the following code is executed?
> qaFun <- function(X) {
+ addOne <- X + 1
+ minusOne <- X - 1
+ addOne
+ minusOne
+ }
> result1 <- qaFun(1)
3. What value will be held in the result2 object when the following code is executed?
> qaFun <- function(X) {
+ addOne <- X + 1
+ minusOne <- X - 1
+ c(ADD = addOne, MINUS = minusOne)
+ }
> result2 <- qaFun(1)
4. When you specify an if/else statement, what object should the “condition” (that is, the statement within the if call) return?
5. What is the difference between all(X > 0) and !all(X > 0)?
6. What is the difference between & and && when used in a condition?
7. What function can you use to return an object early (that is, before the last line of the function)?
1. You specify default values directly in the input statement with “equals” (for example, function(x = 1)).
2. The result1 object will contain a 0, because only the last line is returned (the value of minusOne, created as X - 1 = 0).
3. The result2 object will contain a vector of length 2, containing the values 2 and 0. The elements of the vector will be named ADD and MINUS.
4. The condition should return a single logical value. If multiple logical values are returned, unexpected behaviors can occur.
5. The all function returns a TRUE value if all the values of X are greater than 0 (and non-missing). The ! prefix in !all reverses the logical value, so this would return a TRUE if “not all” values of X are greater than 0 (that is, at least one is less than or equal to 0).
6. When you use a single &, the conditions on each side of the & are evaluated and the outputs compared to see whether both conditions are met. Therefore, if you have test1 & test2, both test1 and test2 are evaluated, then they are compared. If instead you use the “control” && (for example, in test1 && test2), then the first condition (test1) is evaluated, and the second condition (test2) is only evaluated if the first condition is TRUE.
7. You can use the return function to return a result earlier in the function call.
1. Create a function that accepts two inputs (X and Y) and returns the value of X + Y. Test your function by calling it with X and Y inputs.
2. Update your function so that Y has a default value. Test your function by calling it with only an X input, then try specifying a value for Y.
3. Create a function called firstLast that accepts a vector and returns the first and last values. Test your function.
4. Update your firstLast function so that, if the vector input only has a single value (that is, it is of length 1), only that single value is returned.
5. Update your firstLast function so that, if all values of the vector are less than zero, a message is printed informing the user of this fact.
6. Update your firstLast function so that, if any values of the vector are missing, the first value, last value, and the number of missing values are returned to the user.