Introduction to Programming in R

Office of Advanced Research Computing (OARC)
Statistical Methods and Data Analytics

1 Basic concepts

1.1 Introduction

Interpreted Execution: Unlike languages like C++ or FORTRAN, R doesn’t require compilation. It operates as an interpreted language, executing code line by line.

Versatility Beyond Statistics: While R serves as statistical software like SAS, Stata, or SPSS, it stands out by offering more. R isn’t limited to statistics; it’s a dynamic programming language that enables users to craft customized routines and functions tailored to their unique requirements.

Key Points:

R’s popularity rests on its adaptability to data analysis and statistical tasks.

R’s interpreter allows interactive coding and rapid testing.

R bridges the gap between statistical analysis and programming.

Why Learn R Programming?

Closing Thought:

Mastering R empowers you to streamline processes, customize analyses, and contribute to a community of data-driven professionals.

1.2 Objects in R

Key Takeaway:

Objects are the building blocks of R programming, enabling efficient data storage, manipulation, and analysis.

1.3 Considerations for Naming Objects in R

Names must not begin with a number.

Names should not contain special symbols such as ^, !, $, @, +, -, /, :, or *, because R treats these as operators.

R object names are case sensitive, distinguishing between lowercase and uppercase characters.

You cannot use reserved words such as TRUE, NULL, if, and function as names (see the complete list in ?Reserved).

When we pass the value of an object to a new object, we are not generating a new value in the computer memory; rather, we are establishing a new reference to the same object.
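
A minimal sketch of what this sharing means in practice: the two names refer to the same value until one of them is modified, at which point the modified name gets its own copy (often called copy-on-modify). The object names `first` and `second` are illustrative and not part of the workshop code.

```r
# Two names initially refer to the same vector
first <- c(1, 2, 3)
second <- first

# Modifying 'second' gives it its own copy; 'first' is left untouched
second[1] <- 10

first
second
```

Because of this behavior, assigning an object to a new name is cheap, and changing one name never silently changes the other.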

Typing the name of an object prints its value in the RStudio console.

Examples of objects in R

# Create an object named 'a' with value 1
a <- 1
# Print the value of 'a'
a
## [1] 1
# Create an object named 'my.number' with value 7
my.number <- 7
# Assign the value of 'my.number' to 'x'
x <- my.number
# Print the values of 'x' and 'my.number'
# c() combines the values into one vector, which is then printed
c(x, my.number)
## [1] 7 7
# 'D' and 'd' are two different objects
D <- c(5, 6)
d <- 5
# Print the values of 'D' and 'd'
D
## [1] 5 6
d
## [1] 5

Exercise 1

Create an object named die to store all possible values of a die. Pass the value of die to a new object called new.die. Print die and new.die in the RStudio console.

1.4 Solution to Exercise 1

####################  Exercise 1  ######################
#creating a die vector from 1 to 6
die <- c(1,2,3,4,5,6)
#assign it to new object name new.die
new.die <- die
#print
die
## [1] 1 2 3 4 5 6
new.die
## [1] 1 2 3 4 5 6
####################  End of Exercise 1  ###############

1.5 Atomic vector

There are six basic types of atomic vectors: double, integer, character, logical, complex, and raw.

1.6 Examples of atomic vector

Here are some examples of the vector types.

# object die is already a vector
is.vector(die)
## [1] TRUE
# type of die is double
typeof(die)
## [1] "double"
#creating an integer vector
a <- c(1L, -2L)
a
## [1]  1 -2
typeof(a)
## [1] "integer"
# If a calculation mixes integers with a double, the result is a double
typeof(a * die)
## [1] "double"
# Create a character vector; the number 2 is coerced to the string "2"
text <- c("john", "mary", 2)
text
## [1] "john" "mary" "2"
typeof(text)
## [1] "character"
#logical
#create a vector of type logical
my.logic <- c(TRUE, FALSE, FALSE)
typeof(my.logic)
## [1] "logical"
my.logic
## [1]  TRUE FALSE FALSE
# the result of a condition statement is logical
2 < 1
## [1] FALSE
# Create a complex vector
comp <- c(1 + 1i, 2 - 2i)
typeof(comp)
## [1] "complex"
# Raw vector of length 3
raw(3)
## [1] 00 00 00

1.7 Attribute

Attributes are used to store additional properties beyond the main data of an object.

Key Takeaway:

Attributes are used to store additional information beyond the main data of an object. You can access attributes using functions like attr(), names(), dim(), class(), and so on. Understanding and utilizing attributes is crucial for effective data manipulation and analysis in R.
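
As a small sketch of the accessor functions mentioned above, the example below sets the built-in names attribute with names() and a custom attribute with attr(). The vector `scores` and the attribute name `units` are made up for illustration.

```r
# A plain numeric vector has no attributes
scores <- c(90, 85, 77)

# names() sets the built-in 'names' attribute
names(scores) <- c("ann", "bob", "cy")

# attr() reads or sets any attribute by name, including custom ones
attr(scores, "units") <- "points"   # 'units' is a made-up attribute name

attr(scores, "names")
attr(scores, "units")

# attributes() lists everything that is attached to the object
names(attributes(scores))
```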

1.8 Example for attributes

#Before assigning attributes to a vector, the attributes are NULL
attributes(die)
## NULL
# Assign names to the 'die' vector
names(die) <- c("one", "two", "three", "four", "five", "six")
#Print the values and attributes of 'die'
die
##   one   two three  four  five   six 
##     1     2     3     4     5     6
attributes(die)
## $names
## [1] "one"   "two"   "three" "four"  "five"  "six"
#set the attribute of die to NULL 
attributes(die) <- NULL
die
## [1] 1 2 3 4 5 6
dim(die)
## NULL
# Changing dim reshapes 'die' into a 2 by 3 matrix
dim(die) <- c(2,3)
# Print the reshaped 'die' matrix
die
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
# Print the attributes of 'die' matrix
attributes(die)
## $dim
## [1] 2 3
# Reshape 'die' into a 1 by 2 by 3 array
dim(die) <- c(1,2,3)
# Print the reshaped 'die' array
die
## , , 1
## 
##      [,1] [,2]
## [1,]    1    2
## 
## , , 2
## 
##      [,1] [,2]
## [1,]    3    4
## 
## , , 3
## 
##      [,1] [,2]
## [1,]    5    6
# Resetting the attributes of 'die' to NULL turns it back into a vector
attributes(die) <- NULL
#print die
die
## [1] 1 2 3 4 5 6

1.9 Class

Key Takeaway:

Classes are crucial because they define how objects respond to various operations and functions. Different classes may have different behavior even if they store similar underlying data. Understanding the class of an object is important for effective programming and data analysis in R.
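
One way to see the difference between the stored data and the class is a Date object, sketched below. The date itself is an arbitrary illustrative value; the point is that the same underlying number prints differently once a class is attached.

```r
# A Date is stored as a double (days since 1970-01-01) plus a class attribute
day <- as.Date("2024-01-15")   # illustrative value
typeof(day)
class(day)

# print() dispatches on the class, so the number is shown as a calendar date
day

# Removing the class reveals the underlying number
unclass(day)
```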

1.10 factor

1.11 Example for factor

# Create a factor variable 'gender' with two levels: "male" and "female"
gender <- factor(c("male", "female", "female", "male"))
# Print the values and attributes of 'gender'
gender
## [1] male   female female male  
## Levels: female male
# or we label a vector of integers 1 and 2
factor(c(1,2,2,1), labels = c("male", "female"))
## [1] male   female female male  
## Levels: male female
typeof(gender)
## [1] "integer"
attributes(gender)
## $levels
## [1] "female" "male"  
## 
## $class
## [1] "factor"

1.12 Matrices

1.13 Example for matrix

# Create a 2 by 3 matrix 'm' using 'die'
m <- matrix(die, nrow = 2, ncol = 3)
# Print the matrix 'm'
m
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
# Create a 2 by 3 matrix 'm' using 'die', filling by rows
m <- matrix(die, nrow = 2, ncol = 3, byrow = TRUE)
# Print the matrix 'm'
m
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6

1.14 Arrays in R

Example for arrays in R

# Create a 2 by 2 by 3 array 'ar' with values 1 to 12
ar <- array(1:12, dim = c(2, 2, 3))
# Print the array 'ar'
ar
## , , 1
## 
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4
## 
## , , 2
## 
##      [,1] [,2]
## [1,]    5    7
## [2,]    6    8
## 
## , , 3
## 
##      [,1] [,2]
## [1,]    9   11
## [2,]   10   12

1.15 data frame

For example, to represent three students in a class along with their gender, GPA, and enrollment status, we use a data.frame:

Example for data frame in R

# Create a data frame 'gpa.data' with various variables
gpa.data <- data.frame(student_number = c(1, 2, 3), Gender = c("F", "F", "M"),
                       GPA = c(3.5, 3.7, 3.6), Enroll = c(TRUE, TRUE, FALSE))
# Print the data frame 'gpa.data'
gpa.data
##   student_number Gender GPA Enroll
## 1              1      F 3.5   TRUE
## 2              2      F 3.7   TRUE
## 3              3      M 3.6  FALSE

Creating a data frame to represent a deck of playing cards

Now we are ready to create a data frame that represents a deck of playing cards. This data frame has 52 rows, one per card, and 3 columns: the suit of the card, the face of the card, and the value of the card from 1 to 13.

#create a vector of 4 suits
suit <- c("spades", "heart", "clubs", "dimonds")
#create a vector of 13 faces
face <- c("king", "queen", "jack", "ten", "nine", "eight", "seven", "six",
"five", "four", "three", "two", "ace")
#create a vector of values 13 to 1 
value <- 13:1
#putting together all variables
deck <- data.frame( suit = rep(suit, each = 13), face = rep(face, times = 4), value = rep(value, times = 4))
deck
##       suit  face value
## 1   spades  king    13
## 2   spades queen    12
## 3   spades  jack    11
## 4   spades   ten    10
## 5   spades  nine     9
## 6   spades eight     8
## 7   spades seven     7
## 8   spades   six     6
## 9   spades  five     5
## 10  spades  four     4
## 11  spades three     3
## 12  spades   two     2
## 13  spades   ace     1
## 14   heart  king    13
## 15   heart queen    12
## 16   heart  jack    11
## 17   heart   ten    10
## 18   heart  nine     9
## 19   heart eight     8
## 20   heart seven     7
## 21   heart   six     6
## 22   heart  five     5
## 23   heart  four     4
## 24   heart three     3
## 25   heart   two     2
## 26   heart   ace     1
## 27   clubs  king    13
## 28   clubs queen    12
## 29   clubs  jack    11
## 30   clubs   ten    10
## 31   clubs  nine     9
## 32   clubs eight     8
## 33   clubs seven     7
## 34   clubs   six     6
## 35   clubs  five     5
## 36   clubs  four     4
## 37   clubs three     3
## 38   clubs   two     2
## 39   clubs   ace     1
## 40 dimonds  king    13
## 41 dimonds queen    12
## 42 dimonds  jack    11
## 43 dimonds   ten    10
## 44 dimonds  nine     9
## 45 dimonds eight     8
## 46 dimonds seven     7
## 47 dimonds   six     6
## 48 dimonds  five     5
## 49 dimonds  four     4
## 50 dimonds three     3
## 51 dimonds   two     2
## 52 dimonds   ace     1

1.16 list

# Create a list 'l' containing various objects
l <- list(die, ar, m, gpa.data, gender, list("a", "b"))
# Print the list 'l'
l
## [[1]]
## [1] 1 2 3 4 5 6
## 
## [[2]]
## , , 1
## 
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4
## 
## , , 2
## 
##      [,1] [,2]
## [1,]    5    7
## [2,]    6    8
## 
## , , 3
## 
##      [,1] [,2]
## [1,]    9   11
## [2,]   10   12
## 
## 
## [[3]]
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
## 
## [[4]]
##   student_number Gender GPA Enroll
## 1              1      F 3.5   TRUE
## 2              2      F 3.7   TRUE
## 3              3      M 3.6  FALSE
## 
## [[5]]
## [1] male   female female male  
## Levels: female male
## 
## [[6]]
## [[6]][[1]]
## [1] "a"
## 
## [[6]][[2]]
## [1] "b"

Comparing vector, matrix, array, list, and data.frame

1.17 Extract values from R objects

To extract values from a data object, we use square brackets. The indices inside the brackets, separated by commas (one per dimension), specify which values to extract.

Indices are usually integer vectors, but they can also be:

Negative Integers

Blank Spaces

Logical Values

Names
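
The index types listed above can be sketched on a small named vector. The vector `temps` is illustrative and not part of the deck example.

```r
# A small named vector (illustrative data)
temps <- c(mon = 20, tue = 22, wed = 19, thu = 24)

temps[c(1, 3)]                        # positive integers: keep elements 1 and 3
temps[-1]                             # negative integers: drop element 1
temps[]                               # blank: keep everything
temps[c(TRUE, FALSE, TRUE, FALSE)]    # logical values: keep where TRUE
temps[c("tue", "thu")]                # names: keep the named elements
```

The same index types work inside each position of a two-dimensional index such as deck[rows, columns].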

1.18 Extracting Values from Different R Object Types

Example

# Extract the value in row 3 and column 1 of 'deck'
deck[3, 1]
## [1] "spades"
# Extract values in row 2 and 1, and columns 1, 2, and 3 of 'deck'
deck[c(2, 1), c(1, 2, 3)]
##     suit  face value
## 2 spades queen    12
## 1 spades  king    13
# Extract all columns of row 3 from 'deck'
deck[3, ]
##     suit face value
## 3 spades jack    11
# Extract values in row 3 and columns 1 to 3 of 'deck'
deck[3, 1:3]
##     suit face value
## 3 spades jack    11
# Exclude rows 1 to 2 and 4 to 52, and extract columns 1 to 3 of 'deck'
deck[-c(1:2, 4:52), 1:3]
##     suit face value
## 3 spades jack    11
# Extract values in row 3 and columns 1 and 2 of 'deck' using logical indices
deck[3, c(TRUE, TRUE, FALSE)]
##     suit face
## 3 spades jack

1.19 Subsetting data using a logical index

1.20 Example for Subsetting

# Test which values in 'deck' are greater than or equal to 7
deck$value >= 7
##  [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE
## [13] FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE
## [25] FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE
## [37] FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE
## [49] FALSE FALSE FALSE FALSE
# Create a new 'deck7' with values greater than or equal to 7
deck7 <- deck[deck$value >= 7, ]
deck7
##       suit  face value
## 1   spades  king    13
## 2   spades queen    12
## 3   spades  jack    11
## 4   spades   ten    10
## 5   spades  nine     9
## 6   spades eight     8
## 7   spades seven     7
## 14   heart  king    13
## 15   heart queen    12
## 16   heart  jack    11
## 17   heart   ten    10
## 18   heart  nine     9
## 19   heart eight     8
## 20   heart seven     7
## 27   clubs  king    13
## 28   clubs queen    12
## 29   clubs  jack    11
## 30   clubs   ten    10
## 31   clubs  nine     9
## 32   clubs eight     8
## 33   clubs seven     7
## 40 dimonds  king    13
## 41 dimonds queen    12
## 42 dimonds  jack    11
## 43 dimonds   ten    10
## 44 dimonds  nine     9
## 45 dimonds eight     8
## 46 dimonds seven     7

1.21 Modifying values

In R, you can easily modify values within objects, such as vectors, matrices, and data frames.

Use the assignment operator (<- or =) to replace values.

Modify multiple elements simultaneously.

Apply conditions for selective modifications.

Test if elements in one vector are present in another.

Syntax: element %in% vector_to_check
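
A short sketch of %in% combined with a selective modification. The vectors `hand` and `points` are hypothetical and only serve to illustrate the pattern.

```r
# Faces drawn in a hypothetical hand
hand <- c("king", "seven", "ace", "two")

# TRUE where the element of 'hand' appears in the lookup vector
is.face.card <- hand %in% c("king", "queen", "jack")
is.face.card

# The logical result can drive a selective modification
points <- c(13, 7, 1, 2)
points[is.face.card] <- 10
points
```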

1.22 Example for Modifying values

In gpa.data, we add 0.1 to the GPA of every student whose Enroll value is TRUE.

# Print the original 'gpa.data'
gpa.data
##   student_number Gender GPA Enroll
## 1              1      F 3.5   TRUE
## 2              2      F 3.7   TRUE
## 3              3      M 3.6  FALSE
# Identify rows where 'Enroll' is TRUE
ind.enroll <- gpa.data[, 4] == TRUE
# Add 0.1 to 'GPA' for rows where 'Enroll' is TRUE
gpa.data[ind.enroll, "GPA"] <- gpa.data[ind.enroll, "GPA"] + 0.1
# Print the modified 'gpa.data'
gpa.data
##   student_number Gender GPA Enroll
## 1              1      F 3.6   TRUE
## 2              2      F 3.8   TRUE
## 3              3      M 3.6  FALSE

1.23 Exercise 2

  1. Shuffle a deck of playing cards randomly and deal the first five cards. Keep the remaining cards and call them the dealer, and refer to the five selected cards as the player.

Hint: This task may seem a bit tricky, but it’s not overly difficult. Try using the sample function without replacement, choosing from numbers 1 to 52, and using these as new indices for the rows.

  2. Modify the deck of playing cards to suit the game of Blackjack. To achieve this, replace the values for king, queen, jack, and ace with a value of 10.

  3. Shuffle the modified deck of playing cards randomly and deal the first card.

Solution to Exercise 2

# Sample without replacement from 1 to 52 and extract rows from 'deck'
shuffled.deck <- deck[sample(1:52), ]
# Print the shuffled 'deck'
shuffled.deck
##       suit  face value
## 42 dimonds  jack    11
## 33   clubs seven     7
## 8   spades   six     6
## 5   spades  nine     9
## 4   spades   ten    10
## 51 dimonds   two     2
## 15   heart queen    12
## 18   heart  nine     9
## 19   heart eight     8
## 28   clubs queen    12
## 34   clubs   six     6
## 20   heart seven     7
## 24   heart three     3
## 9   spades  five     5
## 48 dimonds  five     5
## 3   spades  jack    11
## 17   heart   ten    10
## 25   heart   two     2
## 27   clubs  king    13
## 50 dimonds three     3
## 35   clubs  five     5
## 7   spades seven     7
## 46 dimonds seven     7
## 41 dimonds queen    12
## 47 dimonds   six     6
## 37   clubs three     3
## 14   heart  king    13
## 49 dimonds  four     4
## 52 dimonds   ace     1
## 1   spades  king    13
## 22   heart  five     5
## 45 dimonds eight     8
## 13  spades   ace     1
## 32   clubs eight     8
## 30   clubs   ten    10
## 12  spades   two     2
## 16   heart  jack    11
## 39   clubs   ace     1
## 43 dimonds   ten    10
## 21   heart   six     6
## 31   clubs  nine     9
## 29   clubs  jack    11
## 6   spades eight     8
## 2   spades queen    12
## 38   clubs   two     2
## 10  spades  four     4
## 11  spades three     3
## 44 dimonds  nine     9
## 40 dimonds  king    13
## 36   clubs  four     4
## 26   heart   ace     1
## 23   heart  four     4
# Select the first five cards as 'player'
player <- shuffled.deck[1:5,]
# Print the first five cards for 'player'
head(player)
##       suit  face value
## 42 dimonds  jack    11
## 33   clubs seven     7
## 8   spades   six     6
## 5   spades  nine     9
## 4   spades   ten    10
# Keep the remaining cards as 'dealer'
dealer <- shuffled.deck[-(1:5),]
# Print the remaining cards for 'dealer'
head(dealer)
##       suit  face value
## 51 dimonds   two     2
## 15   heart queen    12
## 18   heart  nine     9
## 19   heart eight     8
## 28   clubs queen    12
## 34   clubs   six     6
# Create a copy of 'deck' named 'deck2'
deck2 <- deck

# Identify cards that are king, queen, jack, or ace and set their values to 10
ind <- deck2$face == "king" | deck2$face == "queen" | deck2$face == "jack" | deck2$face == "ace"
deck2$value[ind] <- 10

# Shorter approach using '%in%'
deck2$value[deck2$face %in% c("king", "queen", "jack", "ace")] <- 10

# Shuffle 'deck2'
shuffled.deck2 <- deck2[sample(1:52), ]
# Extract the first card from shuffled 'deck2'
shuffled.deck2[1, ]
##     suit  face value
## 33 clubs seven     7

1.24 Handling Missing Information in R

You can replace NA with a specified value using assignment.

Example:

data[data_column_name][is.na(data[data_column_name])] <- new_value

Removing NA values can be done with the na.omit() function.

Example:

clean_data <- na.omit(data)
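
The two approaches above can be sketched on a small data frame. The data frame `scores` and the replacement value 0 are illustrative assumptions, not part of the workshop data.

```r
# A small data frame with one missing GPA (illustrative data)
scores <- data.frame(id = 1:4, gpa = c(3.5, NA, 3.9, 3.1))

# is.na() flags the missing entries
is.na(scores$gpa)

# Many summary functions return NA unless told to skip missing values
mean(scores$gpa)
mean(scores$gpa, na.rm = TRUE)

# Replace NA with a chosen value (0 here is arbitrary), working on a copy
filled <- scores
filled$gpa[is.na(filled$gpa)] <- 0

# Or drop every row that contains an NA
complete <- na.omit(scores)
nrow(complete)
```

Which approach is appropriate depends on the analysis; replacing NA with a fixed value changes summaries such as the mean, while na.omit() reduces the number of rows.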

2 Programming elements in R

2.1 Functions

# Calculate the rounded value of pi
round(3.141593)
## [1] 3
# Calculate the value of pi rounded to two decimal places
round(3.141593, digits = 2)
## [1] 3.14
# Calculate the factorial of 3
factorial(3)
## [1] 6
# Calculate the mean of 'die'
mean(die)
## [1] 3.5
# Calculate the mean of 'die' and round the result
round(mean(die))
## [1] 4

Random sample

To roll the die randomly, you can use the sample function.

To access the help for the sample function, use ?sample.

# Sample values from 'die' six times without replacement (a random reordering)
sample(die, size = 6)
## [1] 6 2 3 5 4 1
# Sample values from 'die' six times with replacement (simulating dice rolling)
sample(die, size = 6, replace = TRUE)
## [1] 1 2 3 6 4 5

2.2 Generic Functions in R

Writing a Function in R

1. Name: name for the function.

2. Arguments: Input information given to the function.

3. Function body code: The core code that processes the inputs and produces the output.

Key Takeaway:

Creating your own functions in R helps you build step-by-step instructions to solve problems and keeps your code neat and tidy.

Function Example

# Define a function 'average_two_number' to calculate the average of two numbers
average_two_number <- function(a, b){
  x <- (a + b) / 2
  print(x)
}
# Calculate and print the average of 2 and 4
average_two_number(2, 4)
## [1] 3
# Define a function 'C_to_F' to convert Celsius to Fahrenheit
C_to_F <- function(c = 0){
  f <- c * 9/5 + 32
  return(f)
}
# Convert 30 degrees Celsius to Fahrenheit
f <- C_to_F(30)
# Print the converted temperature
f
## [1] 86
# Convert 0 degrees Celsius to Fahrenheit using the default value
C_to_F()
## [1] 32
# Print the class and structure of the function 'C_to_F'
class(C_to_F)
## [1] "function"
C_to_F
## function(c = 0){
##   f <- c * 9/5 + 32
##   return(f)
## }
## <bytecode: 0x00000281005fbee8>

Exercise 3

a- Create a function called roll_die that rolls a die and returns its value.

b- Create a function that rolls two six-sided dice and returns the sum of the two dice; call it roll_die_2.

c- Modify your function so that it rolls a die with any given number of sides a given number of times and returns the sum of the rolls; call it roll_die_k.

Solution to Exercise 3

# Define a function 'roll_die' to simulate rolling a single die
roll_die <- function(){
  #Create a vector of 1 to 6 for each side of die
  die <- 1:6
  #Sample one number from 1 to 6
  die <- sample(die, size = 1, replace = TRUE)
  #return the result
  return(die)
}
# Roll the die using the defined function
roll_die()
## [1] 3
# Define a function 'roll_die_2' to simulate rolling two dice and getting their sum
roll_die_2 <- function(){
  #create die
  die <- 1:6
  #sample from 1 to 6 with replacement  
  dice <- sample(die, size = 2, replace = TRUE)
  # No explicit return(): the value of the last expression is returned
  sum(dice)
}
# Roll two dice and calculate their sum using the defined function
roll_die_2()
## [1] 8
# Define a function 'roll_die_k' to simulate rolling a die 'k' times
roll_die_k <- function(side = 6, k = 1){
  #create a die with side number of sides
  die <- 1:side
  #sample die k times
  dice <- sample(die, size = k, replace = TRUE)
  #sum
  sum(dice)
}
# Roll a 4-sided die 3 times using the defined function
roll_die_k(side = 4, k = 3)
## [1] 7

Control-Flow

2.3 Boolean operators in R

Before introducing the control-flow constructs of the R language, we review how R performs logical comparisons.

Common Boolean Operators are:

Logical Operators:
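
As a brief sketch, the comparison and logical operators available in base R behave as follows. This is not necessarily the exact list from the original workshop slides, and the values of `x` and `y` are arbitrary.

```r
x <- 5
y <- 8

# Comparison operators return TRUE or FALSE
x == y    # equal to
x != y    # not equal to
x < y     # less than
x >= 5    # greater than or equal to

# Logical operators combine or negate conditions
(x > 1) & (y > 10)   # AND: TRUE only if both are TRUE
(x > 1) | (y > 10)   # OR: TRUE if at least one is TRUE
!(x > 1)             # NOT: flips TRUE and FALSE

# & and | work element by element on vectors
c(TRUE, FALSE, TRUE) & c(TRUE, TRUE, FALSE)
```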

2.4 The if Statement in R

The if statement directs the R program to perform a task only if a given condition is TRUE.

The body can contain a single line or multiple lines of code. If the body is a single line, the curly braces are optional.

Example for if statement

The following code checks whether a number is even and prints “The number is even” if it is.

# Set the initial value of 'number' to 6
number <- 6
# Use an if statement to check if 'number' is even
if (number %% 2 == 0) {
  print("The number is even.")
}
## [1] "The number is even."
# Update 'number' to 7 and check again
number <- 7
if (number %% 2 == 0) {
  print("The number is even.")
}

Now we create a function to take a number and print “The number is even” if it is even and “The number is odd” if it is odd.

# Define a function 'even_odd' to determine if a number is even or odd
even_odd <- function(a) {
  # Is the remainder zero?
  if (a %% 2 == 0) {
    print("The number is even")
  }
  # Is the remainder not zero?
  if (a %% 2 != 0) {
    print("The number is odd")
  }
}
# Check if 5 is even or odd using the function
even_odd(5)
## [1] "The number is odd"

2.5 The else statement in R

The else statement complements the if statement, allowing the user to specify an alternative course of action when the condition is not met.

While the if statement handles the case when the condition is TRUE, the else statement handles what to do when the condition is FALSE.

Example for else statement

We create a function that takes a number and prints “The number is even” if it is even, or else prints “The number is odd”.

# Define a function 'even_odd' to determine if a number is even or odd
even_odd <- function(a) {
  # Is the remainder zero?
  if (a %% 2 == 0) {
    print("The number is even")
  } else {
    print("The number is odd")
  }
}
# Check if 6 is even or odd using the function
even_odd(6)
## [1] "The number is even"
#check 11
even_odd(11)
## [1] "The number is odd"

2.6 The ifelse Function in R

ifelse(test, yes, no)

1- test: A logical test or condition.

2- yes: Value to return when the condition is TRUE.

3- no: Value to return when the condition is FALSE.

ifelse evaluates the test condition for each element, returning elements from yes or no based on the corresponding TRUE or FALSE result.

Example for ifelse

# Given grades for 4 students
grades <- c(85, 72, 94, 60)
# Determine whether each student passed or failed
pass_fail <- ifelse(grades >= 70, "Pass", "Fail")
# Print the results
pass_fail
## [1] "Pass" "Pass" "Pass" "Fail"
# Simulate rolling a die 10 times and determine win/loss for each roll
dice <- sample(die, size = 10, replace = TRUE)
win_loss <- ifelse(dice == 6, "win", "loss")
# Print the results
win_loss
##  [1] "loss" "win"  "loss" "loss" "loss" "win"  "loss" "win"  "win"  "loss"
# What proportion of wins do you expect if you roll the die many times?
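
The question in the last comment can be explored by simulation. The sketch below assumes a fair six-sided die; the seed and the number of rolls are arbitrary choices.

```r
set.seed(1)                      # arbitrary seed so the run is reproducible
n.rolls <- 100000                # arbitrary "large" number of rolls
rolls <- sample(1:6, size = n.rolls, replace = TRUE)

# The mean of a logical vector is the proportion of TRUE values
prop.win <- mean(rolls == 6)
prop.win
1 / 6                            # theoretical probability for a fair die
```

With many rolls, the simulated proportion should land close to 1/6.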

2.7 loops in R

2.8 for Loop in R

A for loop is a valuable tool for automating repetitive tasks by executing a specific block of code multiple times.

It’s particularly useful when you know the exact number of iterations needed.

The structure of for loop in R is shown below:
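
In general form, a for loop names a loop variable, a vector to iterate over, and a body in curly braces. The sketch below uses placeholder names (`fruit`, `fruits`) that are not from the workshop.

```r
# General form:
#   for (variable in vector) {
#     body: code that runs once for each element of vector
#   }

fruits <- c("apple", "banana", "cherry")
for (fruit in fruits) {
  # 'fruit' takes each value of 'fruits' in turn
  print(toupper(fruit))
}

# seq_along() gives the positions 1, 2, 3 when the index itself is needed
for (i in seq_along(fruits)) {
  cat(i, ":", fruits[i], "\n")
}
```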

Example for for loop

# Use a for loop to calculate the square of numbers from 1 to 10
for(i in 1:10) {    
  x1 <- i^2
  print(x1)
}
## [1] 1
## [1] 4
## [1] 9
## [1] 16
## [1] 25
## [1] 36
## [1] 49
## [1] 64
## [1] 81
## [1] 100
# Calculate factorial using a for loop
calculate_factorial <- function(n) {
  # Factorial is only defined for non-negative integers
  if (n < 0) {
    warning("Factorial can only be calculated for non-negative integers")
    return(NA)
  }
  # Start at 1, so the factorial of 0 is 1
  result <- 1
  # seq_len(n) is empty when n is 0, so the loop body is skipped
  for (i in seq_len(n)) {
    result <- result * i
  }
  return(result)
}
# Calculate factorial of 3
calculate_factorial(3)
## [1] 6
# Calculate factorial of 0
calculate_factorial(0)
## [1] 1

2.9 The while Loop in R

A while loop is a control structure that allows you to repeatedly execute a block of code as long as a specified condition is true.

The loop continues to run as long as the condition remains true, and it stops once the condition becomes false.

The condition is a logical expression that is checked before each iteration of the loop.

Somewhere in the body of the loop, there should be a mechanism to change the condition.

Be careful with while loops to ensure that the loop termination condition will eventually be met, or else you could end up in an infinite loop, causing the program to run indefinitely.

Example for while loop

We want to find the smallest power of 2 greater than 1000.

# Find the smallest power of 2 greater than 1000 using a while loop
# Current number (initialized to 2^0)
number <- 1
# Current power (initialized to 0)
power <- 0
# Keep going while the number is not yet greater than 1000
while (number <= 1000) {
  # Increment the power first, so 'power' always matches 'number'
  power <- power + 1
  # Calculate the next power of 2
  number <- 2 ^ power
}
# paste() combines the text with the value of 'number'
print(paste("The smallest power of 2 greater than 1000 is:", number))
## [1] "The smallest power of 2 greater than 1000 is: 1024"

2.10 repeat Loop in R

2.11 Example repeat loop

We want to find the smallest power of 2 greater than 1000 this time using repeat.

# Find the smallest power of 2 greater than 1000
# Initialize variables
number <- 1   # Current number (initialized to 2^0)
power <- 0    # Current power (initialized to 0)

# Start a repeat loop
repeat {
  # Calculate the next power of 2
  number <- 2^power
  
  # Check if the number is greater than 1000
  if (number > 1000) {
    break  # Exit the loop if condition is met
  }
  
  # Increment the power
  power <- power + 1
}

# Print the result
print(paste("The smallest power of 2 greater than 1000 is:", number))
## [1] "The smallest power of 2 greater than 1000 is: 1024"

2.12 Nested Loops in R

Nested loops in R involve placing one loop inside another, creating a powerful way to solve complex problems by breaking them down into smaller steps.

In a nested loop, the inner loop runs completely for each iteration of the outer loop.

Be cautious of potential performance impacts with deeply nested loops.

2.13 Example for nested loop

Example: Let’s say you want to print a multiplication table. You can achieve this using nested loops.

# Multiply every number from 1 to 5 by every number from 1 to 5
for (i in 1:5) {
  for (j in 1:5) {
    result <- i * j
    # Print the current multiplication
    cat(i, "x", j, "=", result, "\t")
  }
  # Move to the next line
  cat("\n")
}
## 1 x 1 = 1    1 x 2 = 2   1 x 3 = 3   1 x 4 = 4   1 x 5 = 5   
## 2 x 1 = 2    2 x 2 = 4   2 x 3 = 6   2 x 4 = 8   2 x 5 = 10  
## 3 x 1 = 3    3 x 2 = 6   3 x 3 = 9   3 x 4 = 12  3 x 5 = 15  
## 4 x 1 = 4    4 x 2 = 8   4 x 3 = 12  4 x 4 = 16  4 x 5 = 20  
## 5 x 1 = 5    5 x 2 = 10  5 x 3 = 15  5 x 4 = 20  5 x 5 = 25  

Exercise 4 loop

a- Run the die_roll_2 function (it returns the sum of two dice, like roll_die_2 from Exercise 3) 1000 times and calculate the long-run average of the sums. Also compute the standard deviation of the sums using the sd() function.

b- Run the die_roll_2 function repeatedly until you roll a total of 12. Count how many rolls it takes to get a sum of 12.

c- Repeat part b 1000 times and calculate the average number of rolls needed to obtain a sum of 12.

Solution to Exercise 4 loop

# Define the function to roll two dice and get the sum
die_roll_2 <- function() {
  die1 <- sample(1:6, 1)
  die2 <- sample(1:6, 1)
  return(die1 + die2)
}

# a) Calculate average and standard deviation of sum of two dice
#set the total runs to 1000
total_runs <- 1000
#Initialize sums to save result of each iterations
sums <- rep(NA, total_runs)

for (i in 1:total_runs) {
  sums[i] <- die_roll_2()
}

average_sum <- mean(sums)
std_dev <- sd(sums)

cat("a) Long-run average of sum of two dice:", average_sum, "\n")
## a) Long-run average of sum of two dice: 7.086
cat("   Standard deviation of sum of two dice:", std_dev, "\n")
##    Standard deviation of sum of two dice: 2.362665
# b) Find number of rolls to get sum of 12
#Initialize the while loop
rolls_to_get_12 <- 0
sum_result <- 0

while (sum_result != 12) {
  sum_result <- die_roll_2()
  rolls_to_get_12 <- rolls_to_get_12 + 1
}

cat("b) Number of rolls to get sum of 12:", rolls_to_get_12, "\n")
## b) Number of rolls to get sum of 12: 69
# c) Calculate average number of rolls to get sum of 12 over 1000 runs
# set the total iterations to 1000
total_runs_c <- 1000
# Create a vector of size total_runs_c to keep result of each iteration
rolls_to_get_12_c <- rep(NA, total_runs_c)
#run for total_runs_c
for (i in 1:total_runs_c) {
#Initialize the while loop
  rolls <- 0
  sum_result <- 0
#Start while loop
  while (sum_result != 12) {
    sum_result <- die_roll_2()
    rolls <- rolls + 1
  }
# save the result of each iterations
  rolls_to_get_12_c[i] <- rolls
}

average_rolls_c <- mean(rolls_to_get_12_c)

cat("c) Average number of rolls to get sum of 12 over 1000 runs:", average_rolls_c, "\n")
## c) Average number of rolls to get sum of 12 over 1000 runs: 35.489

2.14 Interruption and Exit Loops

break

next
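
A minimal sketch of the two keywords: next skips the rest of the current iteration, while break leaves the loop entirely. The vector name `kept` and the cutoff of 7 are arbitrary choices for illustration.

```r
kept <- c()
for (i in 1:10) {
  if (i %% 2 == 0) {
    next    # skip the rest of this iteration for even numbers
  }
  if (i > 7) {
    break   # leave the loop entirely once i exceeds 7
  }
  kept <- c(kept, i)
}
kept
```

Here the even numbers are skipped by next, and the loop stops at i = 9 because of break, so only 1, 3, 5, and 7 are kept.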

2.15 Efficiency in loops

Loops in R, while powerful, can be time-consuming operations. It’s crucial to consider their performance implications, especially when dealing with large datasets.

Factors Affecting Efficiency:

Whenever possible, replace explicit loops with vectorized operations. R’s built-in vectorized functions are optimized for speed.

Avoid repeating the same calculations within a loop. Instead, compute the value once and reuse it.

In some cases, it’s possible to parallelize loops to utilize multiple CPU cores, improving performance significantly.
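
The first two points can be sketched with a small standardization example. The data in `values` are arbitrary, and the three versions are only meant to show the pattern, not to serve as a benchmark.

```r
values <- c(4, 8, 15, 16, 23, 42)   # illustrative data

# Slower pattern: mean(values) and sd(values) are recomputed on every iteration
z.slow <- numeric(length(values))
for (i in seq_along(values)) {
  z.slow[i] <- (values[i] - mean(values)) / sd(values)
}

# Better: compute the loop-invariant quantities once and reuse them
m <- mean(values)
s <- sd(values)
z.loop <- numeric(length(values))
for (i in seq_along(values)) {
  z.loop[i] <- (values[i] - m) / s
}

# Simplest here: a single vectorized expression, no explicit loop
z.vec <- (values - m) / s

all.equal(z.slow, z.vec)
```

All three versions produce the same result; they differ only in how much repeated work R has to do.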

Example for Efficiency of for Loop

# Inefficient loop: 'output' grows by one element on every iteration
system.time({
  output <- NA
  for(i in 1:1000000) {
    output[i] <- i + 1
    }
}
)
##    user  system elapsed 
##    0.17    0.03    0.22
system.time({
  #loop with Preallocation
  output <- rep(NA, 1000000)
  for (i in 1:1000000) {
    output[i] <- i + 1
  }
}
)
##    user  system elapsed 
##    0.03    0.00    0.05

2.16 Vectorization of Operation in R

Vectorization is a fundamental concept in R that simplifies performing operations on multiple data points (vectors or arrays) at once.

Vectorized operations eliminate the need for explicit loops, making code concise, efficient, and easier to read.

Improved Performance: Vectorized operations use R’s built-in capabilities for faster execution.

Simplicity: Code becomes more compact, like using simple math.

Readability: Vectorized code is easier to understand, as it processes entire data structures.

Example for Vectorization of Operation in R

# Without Vectorization
x <- c(1, 2, 3)
y <- c(4, 5, 6)
#set z to be a vector of zeros
z <- numeric(length(x))
#running a for loop to add x and y element by element
for (i in 1:length(x)) {
  z[i] <- x[i] + y[i]
}

# With Vectorization
x <- c(1, 2, 3)
y <- c(4, 5, 6)
z <- x + y

2.17 Mathematical operation in R

When working with vectors, R employs element-wise execution for standard mathematical operations.

If you provide R with two vectors of unequal lengths, it recycles the shorter vector, repeating it until it matches the length of the longer vector, before performing the math. If the longer length is not a multiple of the shorter length, R still returns a result but issues a warning.

# To subtract 1 from each element of die
die - 1
## [1] 0 1 2 3 4 5
# To multiply each element of die by 2
die * 2
## [1]  2  4  6  8 10 12
#element-by-element operation on two vectors of different lengths
#returns 1 + 1 2 + 2 3 + 1 4 + 2 5 + 1 6 + 2
die + die[1:2]
## [1] 2 4 4 6 6 8
#returns 1 * 1 2 * 2 3 * 3 4 * 4 5 * 1 6 * 2 with Warning
die * die[1:4] 
## Warning in die * die[1:4]: longer object length is not a multiple of shorter
## object length
## [1]  1  4  9 16  5 12
#inner product of two vectors
die %*% die
##      [,1]
## [1,]   91
#outer product of two vectors
die %o% die
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    1    2    3    4    5    6
## [2,]    2    4    6    8   10   12
## [3,]    3    6    9   12   15   18
## [4,]    4    8   12   16   20   24
## [5,]    5   10   15   20   25   30
## [6,]    6   12   18   24   30   36
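
The recycling rule used above can be made explicit with rep_len(). This is a sketch that assumes die is the vector 1:6, which is consistent with the output shown above; the names short and recycled are hypothetical.

```r
die <- 1:6   # assumed definition, consistent with the output shown above

# Recycling: the shorter vector is repeated to the length of the longer one
short <- die[1:2]
recycled <- rep_len(short, length(die))
recycled
## [1] 1 2 1 2 1 2

# die + short gives the same result as adding the explicitly recycled vector
identical(die + short, die + recycled)
## [1] TRUE

# The inner product is the sum of the element-wise products
sum(die * die)
## [1] 91

# %o% is shorthand for outer(), which defaults to multiplication
identical(die %o% die, outer(die, die))
## [1] TRUE
```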

2.18 The apply Family of Functions in R

The apply family of functions in R provides a compact way to apply a function to the rows or columns of matrices and arrays, or to each element of a vector or list.

Instead of using loops, apply functions simplify complex operations, enhancing code readability and performance.

apply(): Apply a function to rows or columns of a matrix or array.

sapply(): Apply a function to each element of a vector or list, like lapply(), and simplify the result into a vector or matrix when possible.

lapply(): Apply a function to each element of a list and return a list.

tapply(): Apply a function to subsets of a vector based on factors.

Compact and readable code.

Avoids explicit loops.

Compatible with various data structures.

replicate is a wrapper for the common use of sapply for repeated evaluation of an expression (which will usually involve random number generation).

vapply is similar to sapply, but has a pre-specified type of return value, so it can be safer (and sometimes faster) to use.

The apply family offers an elegant solution for applying functions across rows, columns, and elements, streamlining code and enhancing performance in R.

2.19 Example for apply

# Calculate the mean of each column of a matrix using 'apply'
# (the second argument, 2, selects columns; 1 would select rows)
matrix_data <- matrix(1:12, nrow = 3)
col_means <- apply(matrix_data, 2, mean)

# Calculate the square root of each element in a vector using 'sapply'
sqrt_12 <- sapply(1:12, sqrt)
sqrt_12
##  [1] 1.000000 1.414214 1.732051 2.000000 2.236068 2.449490 2.645751 2.828427
##  [9] 3.000000 3.162278 3.316625 3.464102
# Alternatively, calculate the square root using the exponentiation operator
(1:12) ^ .5
##  [1] 1.000000 1.414214 1.732051 2.000000 2.236068 2.449490 2.645751 2.828427
##  [9] 3.000000 3.162278 3.316625 3.464102
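
The other members of the family listed in section 2.18 can be sketched in the same way. The data below (scores, heights, group) are hypothetical and exist only to illustrate the calls.

```r
# Hypothetical data: a list of numeric vectors of different lengths
scores <- list(a = c(1, 2, 3), b = c(10, 20))

# lapply: always returns a list, one element per input element
mean_list <- lapply(scores, mean)      # list with a = 2 and b = 15

# sapply: same computation, simplified to a named numeric vector
mean_vec <- sapply(scores, mean)       # a = 2, b = 15

# vapply: like sapply, but the type and length of each result is declared;
# it raises an error if a result does not match FUN.VALUE
mean_checked <- vapply(scores, mean, FUN.VALUE = numeric(1))

# tapply: apply a function to subsets of a vector defined by a factor
heights <- c(150, 160, 170, 180)
group <- factor(c("x", "y", "x", "y"))
group_means <- tapply(heights, group, mean)   # x = 160, y = 170
```

vapply() is a reasonable default inside functions, because an unexpected result type fails immediately instead of silently changing the shape of the output.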

2.20 Exercise 5 loop using replicate

Use the function replicate or sapply instead of a for loop to solve Exercise 4.

2.21 Solution Exercise 5 replicate

# Define the function to roll two dice and get the sum
die_roll_2 <- function() {
  die1 <- sample(1:6, 1)
  die2 <- sample(1:6, 1)
  return(die1 + die2)
}

# a) Calculate average and standard deviation of sum of two dice
total_runs <- 1000
sums <- replicate(total_runs, die_roll_2())
# sapply() also works, but it passes each element of its first argument to the
# function, so die_roll_2(), which takes no arguments, must be wrapped:
#sums <- sapply(seq_len(total_runs), function(i) die_roll_2())
average_sum <- mean(sums)
std_dev <- sd(sums)

cat("a) Long-run average of sum of two dice:", average_sum, "\n")
## a) Long-run average of sum of two dice: 7.048
cat("   Standard deviation of sum of two dice:", std_dev, "\n")
##    Standard deviation of sum of two dice: 2.412782
# b) Find number of rolls to get sum of 12
get_rolls_to_get_12 <- function() {
  #Initialize the while loop
  rolls_to_get_12 <- 0
  sum_result <- 0
  #start while
  while (sum_result != 12) {
    sum_result <- die_roll_2()
    rolls_to_get_12 <- rolls_to_get_12 + 1
  }
  
  return(rolls_to_get_12)
}

rolls_to_get_12 <- get_rolls_to_get_12()
cat("b) Number of rolls to get sum of 12:", rolls_to_get_12, "\n")
## b) Number of rolls to get sum of 12: 34
# c) Calculate average number of rolls to get sum of 12 over 1000 runs
total_runs_c <- 1000
rolls_to_get_12_c <- replicate(total_runs_c, get_rolls_to_get_12())

average_rolls_c <- mean(rolls_to_get_12_c)

cat("c) Average number of rolls to get sum of 12 over 1000 runs:", average_rolls_c, "\n")
## c) Average number of rolls to get sum of 12 over 1000 runs: 35.267
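
As a sanity check on the simulated values, the exact quantities can be computed by enumerating all 36 equally likely outcomes of two fair dice. This is a sketch added for comparison, not part of the original solution; the object names (faces, all_sums, p_12) are hypothetical.

```r
faces <- 1:6
all_sums <- outer(faces, faces, "+")   # 6 x 6 table of every possible sum

# Exact mean of the sum of two dice
exact_mean <- mean(all_sums)
exact_mean
## [1] 7

# Exact (population) standard deviation over the 36 outcomes
exact_sd <- sqrt(mean((all_sums - exact_mean)^2))
round(exact_sd, 3)
## [1] 2.415

# Only one outcome (6, 6) gives a sum of 12, so p = 1/36.
# The number of rolls until the first 12 is geometric with mean 1 / p.
p_12 <- mean(all_sums == 12)
expected_rolls <- 1 / p_12
expected_rolls
## [1] 36
```

The simulated results above (an average sum near 7, a standard deviation near 2.41, and about 35 rolls on average) are close to these exact values, as expected for 1000 runs.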