In this section of the course, you will learn more about programming in R with RStudio. You will cover the core functions and variables that define R programming and how to use them well. You will also get hands-on experience with R packages and see how they are used when programming in R.
This knowledge area is important for earning Coursera’s Google Data Analytics Professional Certificate and is useful for other industry-recognized certifications that involve R programming. It prepares you to use R efficiently and accurately for data analysis.
Learning Objectives:
Understand what the tidyverse package in R includes.
Recognize the basic concept and value of packages in R programming.
Use R operators to perform calculations.
Master fundamental ideas of R programming, including functions, variables, data types, pipes, and vectors.
Install and load the tidyverse package for data analysis.
Use the browseVignettes("packagename") function to explore documentation and examples for a loaded package.
Find resources to seek help and guidance while using R.
Test your knowledge on programming concepts
1. Why do analysts use comments in R programming? Select all that apply.
To explain their code (Correct)
To provide names for variables
To act as functions
To make an R Script more readable (Correct)
Correct: Comments in R programming explain what a piece of code does and make an R script more readable. They clarify the purpose of the code, which makes it easier to understand and maintain, especially when you collaborate with other people or revisit the script later. A comment in R is created by putting a # symbol before the text, and R ignores everything after the # on that line when the code runs.
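As a minimal sketch of how comments look in practice (the variable name and value are hypothetical, chosen only for illustration):

```r
# Store the number of units sold this week (hypothetical value)
units_sold <- 42

doubled <- units_sold * 2  # a comment can also follow code on the same line
doubled
```

R runs the two assignments and skips the text after each # symbol.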
2. What should you use to assign a value to a variable in R?
A vector
An argument
An operator (Correct)
A comment
Correct: In R, you use an assignment operator, such as <-, to assign a value to a variable. The variable name goes to the left of the operator and the value goes to the right.
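A short sketch of assignment, reusing the name data_1 and the value 20 that appear in the answer options of the next question:

```r
# Assign the value 20 to the variable data_1
data_1 <- 20

# Typing the variable name on its own line displays the stored value
data_1
```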
3. Which of the following examples is the proper syntax for calling a function in R?
#first
data_1
<- 20
print() (Correct)
Correct: In R, a function call is written as the function name followed by parentheses, as in print(). Any argument placed inside the parentheses of print() is displayed in the console pane of RStudio.
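For illustration, a call to print() with one argument might look like this (the variable greeting and its text are hypothetical):

```r
# A hypothetical value to display
greeting <- "Hello, R"

# Call print() with greeting as its argument; the value appears in the console
shown <- print(greeting)
```

print() also returns its argument invisibly, which is why the result can be stored in shown.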
4. Which of the following examples can you use in R for date/time data? Select all that apply.
seven-24-2018
2019-04-16 (Correct)
06:11:13 UTC (Correct)
2018-12-21 16:35:28 UTC (Correct)
Correct: Values such as 2019-04-16 (a date), 06:11:13 UTC (a time), and 2018-12-21 16:35:28 UTC (a date-time) are examples of date/time data in R. R recognizes these formats and accepts them as valid. The value seven-24-2018 spells out the month number as a word, which is not a date format R can parse.
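One way to see these formats in action is with base R's as.Date() and as.POSIXct(). This is only a sketch; the variable names are hypothetical, and the course itself may use other functions for the same purpose.

```r
# Parse a date and a date-time from ISO-style strings using base R
order_date <- as.Date("2019-04-16")
order_time <- as.POSIXct("2018-12-21 16:35:28", tz = "UTC")

class(order_date)   # "Date"
class(order_time)   # "POSIXct" "POSIXt"

# Extract only the time portion of the date-time as text
time_part <- format(order_time, "%H:%M:%S")
time_part           # "16:35:28"
```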
Test your knowledge on coding in R
1. An analyst includes the following calculation in their R programming:
Which variable will the total from this calculation be assigned to?
midyear_sales (Correct)
quarter_1_sales
quarter_2_sales
overhead_costs
Correct: The result is assigned to midyear_sales. The assignment operator stores the calculated total in the variable named on its left-hand side, which here is midyear_sales.
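The calculation from the question is not reproduced above. As a hedged sketch only, a calculation of this kind could look like the following. The formula and all figures are assumptions for illustration; only the variable names come from the answer options.

```r
# Hypothetical figures, not taken from the original question
quarter_1_sales <- 35000
quarter_2_sales <- 44000
overhead_costs  <- 10000

# The total on the right is stored in the variable on the left of <-
midyear_sales <- (quarter_1_sales + quarter_2_sales) - overhead_costs
midyear_sales   # 69000
```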
2. An analyst is checking the value of the variable x using a logical operator, so they run the following code:
x > 35 & x < 65
Which values of x would return TRUE when the analyst runs the code? Select all that apply.
35
50 (Correct)
60 (Correct)
70
Correct: The values 50 and 60 return TRUE for the code x > 35 & x < 65. The logical operator & combines the two conditions, so R returns TRUE only when x is both greater than 35 and less than 65. The value 35 returns FALSE because the comparison is strictly greater than, and 70 returns FALSE because it is not less than 65.
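Because R comparisons are vectorized, all four answer options can be checked in one step. This sketch stores them in a vector (the name in_range is hypothetical):

```r
# The four candidate values from the question
x <- c(35, 50, 60, 70)

# TRUE only where both conditions hold
in_range <- x > 35 & x < 65
in_range      # FALSE  TRUE  TRUE FALSE

# Keep only the values that satisfy the condition
x[in_range]   # 50 60
```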
3. A data analyst inputs the following code in RStudio:
sales_1 <- 100 * sales_2
Which of the following types of operators does the analyst use in the code? Select all that apply.
relational
assignment (Correct)
arithmetic (Correct)
logical
Correct: The analyst uses both an arithmetic operator and an assignment operator. The multiplication operator * multiplies the value of sales_2 by 100, and the assignment operator <- stores the result in sales_1.
Test your knowledge on R packages
1. When using RStudio, what does the installed.packages() function do?
Creates code for analysts to use to edit their packages
Installs all available packages for use in an RStudio session
Selects the best packages to use based on an analyst’s current needs
Presents a list of packages currently installed in an RStudio session (Correct)
Correct: The installed.packages() function returns a list of the packages currently installed and available to your RStudio session. Checking this list tells you which packages you already have, and therefore which functions become available once you load the corresponding package.
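A small sketch of inspecting that list follows. installed.packages() returns a matrix with one row per package, so the row names are the package names; the variable names here are hypothetical.

```r
# One row per installed package; row names are the package names
pkgs <- installed.packages()

# Show the first few package names
head(rownames(pkgs))

# Check whether a specific package is installed ("stats" ships with R)
has_stats <- "stats" %in% rownames(pkgs)
has_stats
```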
2. In data analytics, what is CRAN?
A commonly used online archive with R packages and other R resources (Correct)
An R interface that has many of the same functions as RStudio
A function for finding packages to use for analysis in RStudio
A collection of packages that function together to make analysis in R more efficient
Correct: The Comprehensive R Archive Network (CRAN) is an online repository that hosts R packages and other R resources. Packages submitted to CRAN must pass automated checks and follow CRAN's submission policies, which helps keep them reliable.
3. What are ggplot2, tidyr, dplyr, and forcats all a part of?
A list of functions that clean data efficiently
A collection of core tidyverse packages (Correct)
A list of variables for use in programming in RStudio
A collection of commonly used, CRAN-based data sets
Correct: ggplot2, tidyr, dplyr, and forcats are four of the eight core tidyverse packages. The other four are tibble, readr, purrr, and stringr. Together they are widely used in R for data manipulation, visualization, and analysis.
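The learning objectives mention installing and loading the tidyverse and browsing vignettes. A cautious sketch of those steps is shown below. The install and vignette lines are commented out because they need a network connection or an interactive session, and the variable core_packages is a hypothetical name.

```r
# One-time install from CRAN (commented out so the script does not reinstall on every run)
# install.packages("tidyverse")

# Load the package only if it is already installed on this machine
if (requireNamespace("tidyverse", quietly = TRUE)) {
  library(tidyverse)
}

# The eight core packages named in the explanation above
core_packages <- c("ggplot2", "tibble", "tidyr", "readr",
                   "purrr", "dplyr", "stringr", "forcats")

# Which of them are installed here?
core_packages[core_packages %in% rownames(installed.packages())]

# Open the vignettes for one package in a browser (interactive sessions only)
# browseVignettes("dplyr")
```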
Test your knowledge on the tidyverse
1. When working in R, for which part of the data analysis process do analysts use the tidyr package?
Data cleaning (Correct)
Data security
Data calculations
Data visualization
Correct: Analysts use the tidyr package for data cleaning. It reshapes data between wide and long formats so that each variable has its own column and each observation has its own row.
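To illustrate the wide-to-long reshaping, here is a minimal sketch using tidyr's pivot_longer(). The data frame and its column names are hypothetical, and the tidyr package is assumed to be installed.

```r
library(tidyr)

# Hypothetical wide table: one column per quarter
sales_wide <- data.frame(
  region = c("north", "south"),
  q1 = c(100, 80),
  q2 = c(120, 95)
)

# Reshape to long format: one row per region and quarter
sales_long <- pivot_longer(
  sales_wide,
  cols = c(q1, q2),
  names_to = "quarter",
  values_to = "sales"
)
sales_long
```

The two quarter columns become a pair of columns, quarter and sales, so the two-row table becomes a four-row table.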
2. Which tidyverse package contains a set of functions, such as select(), that help with data manipulation?
ggplot2
dplyr (Correct)
readr
forcats
Correct: dplyr is a tidyverse package that provides functions for data manipulation. For instance, the dplyr select() function keeps only the variables you name in a data frame, which narrows the data down to what the analysis needs.
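As a brief sketch of select(), using the mtcars data set that ships with R (the result name mtcars_small is hypothetical, and dplyr is assumed to be installed):

```r
library(dplyr)

# Keep only the mpg and cyl columns of the built-in mtcars data set
mtcars_small <- select(mtcars, mpg, cyl)
head(mtcars_small)
```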
3. An analyst is organizing a dataset in RStudio using the following code:
Which of the following examples is a nested function in the code?
inventory
filter (Correct)
count
arrange
Correct: filter() is the nested function in the analyst’s code because it sits inside the argument list of a larger function, arrange(). R evaluates the inner filter() call first and then passes its output to arrange(), which sorts the filtered rows.
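The analyst's code is not reproduced above. As a hedged sketch of the same pattern, the example below nests filter() inside arrange(). The inventory data, the threshold of 10, and the sort column are all assumptions for illustration, and dplyr is assumed to be installed.

```r
library(dplyr)

# Hypothetical inventory table
inventory <- data.frame(
  item  = c("brush", "roller", "tape", "tray"),
  count = c(4, 12, 7, 30),
  price = c(6.50, 9.00, 3.25, 5.00)
)

# filter() is nested inside arrange(): R evaluates filter() first,
# then arrange() sorts the rows that filter() returned by price
low_stock <- arrange(filter(inventory, count < 10), price)
low_stock
```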
Data Analysis with R Programming Weekly Challenge 2
1. Which of the following are examples of variable names that can be used in R? Select all that apply.
3_sales
_red_1
autos_5 (Correct)
Correct: Valid variable names in R include autos_5 and utility2. A variable name should start with a letter and can contain letters, numbers, and underscores. Names such as 3_sales and _red_1 are invalid because a name cannot start with a number or an underscore.
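One rough way to test candidate names is base R's make.names(), which rewrites any string into a syntactically valid name. The helper below is a hypothetical convenience, not part of the course: it treats a name as valid when make.names() leaves it unchanged.

```r
# Hypothetical helper: a name counts as valid if make.names() does not alter it
is_valid_name <- function(name) make.names(name) == name

candidates <- c("3_sales", "_red_1", "autos_5", "utility2")
validity <- is_valid_name(candidates)

# Label each result with the name it belongs to
setNames(validity, candidates)
```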
2. You want to create a vector with the values 12, 23, 51, in that exact order. After specifying the variable, what R code chunk allows you to create the vector?
v(12, 23, 51)
c(12, 23, 51) (Correct)
v(51, 23, 12)
c(51, 23, 12)
Correct: The code c(12, 23, 51) creates a vector containing the values inside its parentheses, in this case 12, 23, and 51, in that order. A vector in R is a sequence of data elements of the same type. Vectors are created with the c() function, where c stands for combine and the values you want to group are listed inside the parentheses.
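A short sketch of creating and inspecting the vector (the variable names values and mixed are hypothetical):

```r
# Create the vector and inspect it
values <- c(12, 23, 51)
values           # 12 23 51
length(values)   # 3
class(values)    # "numeric"

# Mixing types coerces every element to one common type (here, character)
mixed <- c(12, "twenty-three", 51)
class(mixed)     # "character"
```

The second vector shows why all elements share a type: R converts the numbers to text so the vector stays uniform.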
3. An analyst comes across dates listed as strings in a dataset, for example December 10th, 2020. To convert the strings to a date/time data type, which function should the analyst use?
now()
lubridate()
datetime()
mdy() (Correct)
Correct: The analyst can use the mdy() function to convert the strings to a date/time data type. mdy() and its variations in the lubridate package, such as ymd() and dmy(), convert string-based dates into R date/time data types. Each function name gives the order of the components in the input, so mdy() is for dates written as month, day, year.
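As a sketch, assuming the lubridate package is installed, the conversions look like this (the variable names are hypothetical):

```r
library(lubridate)

# Month-day-year text, including the ordinal suffix "th"
parsed_mdy <- mdy("December 10th, 2020")
parsed_mdy          # "2020-12-10"

# Year-month-day given as a bare number
parsed_ymd <- ymd(20200710)
parsed_ymd          # "2020-07-10"

class(parsed_mdy)   # "Date"
```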
4. A data analyst inputs the following code in RStudio:
sales_1 <- (3500.00 * 12)
Which of the following types of operators does the analyst use in the code? Select all that apply.
Logical
Relational
Arithmetic (Correct)
Assignment (Correct)
Correct: The command sales_1 <- (3500.00 * 12) uses both an assignment operator (<-) and an arithmetic operator (*). The arithmetic operator multiplies 3500.00 by 12, and the assignment operator stores the result of the expression inside the parentheses in the variable sales_1.
5. Which of the following files in R have names that follow widely accepted naming convention rules? Select all that apply.
patient_details_1.R (Correct)
patient_data.R (Correct)
p1+infoonpatients.R
title*123.R
Correct: The file names patient_data.R and patient_details_1.R follow widely accepted naming conventions. They use the .R extension and contain only lowercase letters, numbers, and underscores. They are also unambiguous, concise, and meaningful, so each name indicates what the file contains.
6. In R, what includes reusable functions and documentation about how to use the functions?
Vectors
Packages (Correct)
Pipes
Comments
Correct: R packages bundle reusable R functions together with documentation on how to use them. Packages can also contain sample data sets and tests for checking the code, which makes them useful for both analysis and development.
7. Packages installed in RStudio are called from CRAN. CRAN is an online archive with R packages and other R-related resources.
True (Correct)
False
Correct: In RStudio, packages are usually installed from CRAN (the Comprehensive R Archive Network), an online archive of R packages and related resources. Packages must pass CRAN's checks before they are published, which helps keep the packages available in RStudio at a consistent level of quality.
8. A data analyst is reviewing some code and finds the following code chunk:
What is this code chunk an example of?
Pipe (Correct)
Vector
Nested function
Data frame
Correct: This is an example of a pipe. A pipe expresses a sequence of operations in R, such as filtering and then grouping, by passing the output of each operation directly to the next. The pipe operator is %>%. It comes from the magrittr package and is available whenever dplyr or the tidyverse is loaded. Chaining operations this way makes code more readable and concise than nesting them.
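To make the contrast concrete, the sketch below writes the same operations twice on the built-in mtcars data set: once nested and once piped. The final summarize() step and the column name mean_mpg are assumptions added so the pipeline forms a complete statement; dplyr is assumed to be installed.

```r
library(dplyr)

# Nested form: read from the inside out
nested_result <- summarize(
  group_by(filter(mtcars, carb > 1), cyl),
  mean_mpg = mean(mpg)
)

# Piped form: read from top to bottom
piped_result <- mtcars %>%
  filter(carb > 1) %>%
  group_by(cyl) %>%
  summarize(mean_mpg = mean(mpg))

piped_result
```

Both forms produce the same table with one row per cylinder count; the piped version lists the steps in the order they run.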
9. Fill in the blank: When creating a variable for use in R, your variable name should begin with _____.
a letter (Correct)
a number
an operator
an underscore
10. Which of the following statements about vectors in R are correct? Select all that apply.
Data elements are defined using curly braces.
All data must be stored in vectors.
All data elements must have the same data type. (Correct)
Data elements are stored in a sequence. (Correct)
11. An analyst runs code to convert string data into a date/time data type that results in the following: "2020-07-10". Which of the following are examples of code that would lead to this return? Select all that apply.
myd(2020, July 10)
dmy("7-10-2020")
mdy("July 10th, 2020") (Correct)
ymd(20200710) (Correct)
12. A data analyst needs a system of packages that use a common design philosophy for data manipulation, exploration, and visualization. What set of packages fulfills their need?
Base
CRAN
tidyverse (Correct)
Recommended
13. A data analyst wants to take a data frame named people and filter the data where age is 10, arranged by height, and grouped by gender. Which code snippet would perform those operations in the specified order?
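The answer options for this question are not reproduced here. As a hedged sketch, a pipeline that performs the three operations in the stated order could look like the following. The contents of the people data frame are hypothetical, and dplyr is assumed to be installed.

```r
library(dplyr)

# Hypothetical data; only the column names age, height, and gender come from the question
people <- data.frame(
  name   = c("Ana", "Ben", "Cy", "Dee"),
  age    = c(10, 10, 12, 10),
  height = c(140, 132, 150, 137),
  gender = c("F", "M", "M", "F")
)

# Filter first, then arrange, then group, matching the order in the question
result <- people %>%
  filter(age == 10) %>%
  arrange(height) %>%
  group_by(gender)

result
```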
14. A data analyst wants to create functions, documentation, sample data sets, and code tests that they can share and reuse in other projects. What should they create to help them accomplish this?
A tidyverse
A data frame
A data type
A package (Correct)
15. A data analyst is reviewing some code and finds the following code chunk:
mtcars %>%
filter(carb > 1) %>%
group_by(cyl) %>%
What is this code chunk an example of?
Data frame
Vector
Nested function
Pipe (Correct)
16. Which of the following are examples of variable names that can be used in R? Select all that apply.
_21alpha
21_alpha
alpha_21 (Correct)
alpha21 (Correct)
17. Which of the following are examples of variable names that can be used in R? Select all that apply.
person_1 (Correct)
1person
person1 (Correct)
person(1)
18. If you use the mdy() function in R to convert the string "April 10, 2019", what will return when you run your code?
"2019-04-10" (Correct)
“4.10.19”
“2019-10-4”
“4/10/2019”
19. A data analyst wants to combine values using mathematical operations. What type of operator would they use to do this?
Assignment
Conditional
Arithmetic (Correct)
Logical
20. A data analyst needs to find a package that offers a consistent set of functions that help them complete common data manipulation tasks like selecting and filtering. What tidyverse package provides this functionality?
tidyr
dplyr (Correct)
readr
ggplot2
21. A data analyst previously created a series of nested functions that carry out multiple operations on some data in R. The analyst wants to complete the same operations but make the code easier to understand for their stakeholders. Which of the following can the analyst use to accomplish this?
Argument
Vector
Comment
Pipe (Correct)
22. A data analyst is assigning a variable to a value in their company’s sales data set for 2020. Which variable name uses the correct syntax?
-sales-2020
2020_sales
sales_2020 (Correct)
_2020sales
23. A data analyst has a dataset that contains date strings like “January 10th, 2022.” What lubridate function can they use to convert these strings to dates?
myd()
dmy()
mdy() (Correct)
ymd()
Programming Using RStudio INTRODUCTION
R lets you perform your analysis much more efficiently and effectively. At this point in the course, you explore the most fundamental aspects of R, including the basics of using functions and variables to perform calculations and other programming tasks.
You will also learn about R packages, which are collections of functions that work together to make your analysis sharper. With this tool and the many other courses available on Coursera, consider continuing your learning journey.