R Notes
R is popular among statisticians and biologists. It is one of the most commonly used languages in data analysis.
Basics
R’s basic syntax is C-like. There are a few special usages compared to other languages:
- The <- operator is R's way to assign a value to a variable. It can also point the other way, i.e. ->. Sometimes it is equivalent to the = operator, but in some places only <- is allowed. The recommended way is to use <- and forget that equals is ever allowed.
- The <<- operator is R's superassignment operator. It assigns to a variable in an enclosing scope, so it can set global variables from inside a local scope.
- c() is for combining scalars into vectors.
- list() is for creating a collection with multiple object types (similar to tuple in other languages).
- R's vectorization is similar to MATLAB's, but not strictly consistent. If you attempt to use a non-vectorized function on a vector, you may get warnings or errors.
- apply() is like map() in Julia. There are more functions in the apply family.
- R uses one-based indexes (GREAT!).
- Integers in R are shown as e.g. 2L. A bare 3 is assumed to be the same as 3.0, i.e. numeric.
- Terminating R expressions: R doesn't need semicolons to end a line of code (it is possible to put multiple commands on a single line separated by semicolons, but you don't see that very often). Instead, R uses line breaks (i.e., newline characters) to determine when an expression has ended.
- Last but not least, R is a case-sensitive language, consistent with the trend in most modern languages.
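The points above can be sketched in a few lines of base R. This is only an illustration: the variable and function names (counter, increment, person, and so on) are made up for the example and do not come from any package.

```r
# Assignment: `<-` is the conventional operator; `->` assigns rightwards
x <- 5
10 -> y

# `<<-` assigns in an enclosing scope instead of creating a local variable
counter <- 0
increment <- function() {
  counter <<- counter + 1  # modifies `counter` defined outside the function
  invisible(counter)
}
increment()
increment()

# c() combines values into a vector (all elements share one type)
v <- c(1, 2, 3)

# list() can hold elements of different types
person <- list(name = "Ada", age = 36L, scores = c(90.5, 88))

# Arithmetic is vectorized (element-wise), and indexing starts at 1
doubled <- v * 2
first <- v[1]

# sapply(), a member of the apply family, maps a function over each element
squares <- sapply(v, function(n) n^2)

# Integer literals carry an L suffix; a bare 3 is a double ("numeric")
int_class <- class(2L)
num_class <- class(3)
```

After running this, counter is 2 because both calls to increment() updated the outer variable, while int_class and num_class are "integer" and "numeric" respectively.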
Popular Packages
ggplot2: grammar of graphics. One of the best-known packages in the R community.
library(ggplot2)
# Base plot using the midwest dataset that ships with ggplot2
g <- ggplot(midwest, aes(x=area, y=poptotal)) +
  geom_point(col="steelblue", size=3) + # Add scatter points
  geom_smooth(method="lm") # Add smoothing layer with a linear model "lm"
# Delete the points outside the limits
g + xlim(c(0, 0.1)) + ylim(c(0, 1000000))
# Zoom in without deleting points, then add titles and axis labels
g + coord_cartesian(xlim=c(0,0.1), ylim=c(0, 1000000)) +
  labs(title="Area Vs Population", subtitle="From midwest dataset", y="Population", x="Area", caption="Midwest Demographics")
tidyverse: a collection of packages for data science that share a common design. One of them, purrr, provides a complete and consistent set of tools for working with functions and vectors.
library(purrr)
# The native pipe |> and the \(df) lambda shorthand require R 4.1 or later
mtcars |>
  split(mtcars$cyl) |> # from base R
  map(\(df) lm(mpg ~ wt, data = df)) |>
  map(summary) |>
  map_dbl("r.squared")
#>         4         6         8
#> 0.5086326 0.4645102 0.4229655
This example illustrates some of the advantages of purrr functions over the equivalents in base R:
- The first argument is always the data, so purrr works naturally with the pipe.
- All purrr functions are type-stable. They always return the advertised output type (map() returns lists; map_dbl() returns double vectors), or they throw an error.
- All map() functions accept functions (named, anonymous, and lambda), character vectors (used to extract components by name), or numeric vectors (used to extract by position).
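The three kinds of argument that map() functions accept, along with type stability, can be sketched as follows. The records list and the result names are hypothetical data invented for this illustration, and the \(rec) shorthand assumes R 4.1 or later.

```r
library(purrr)

# Hypothetical data: a list of small records
records <- list(
  list(name = "a", score = 1.5),
  list(name = "b", score = 2.5)
)

# 1. A function (here an anonymous function)
by_function <- map_dbl(records, \(rec) rec$score)

# 2. A character vector: extract the component named "score"
by_name <- map_dbl(records, "score")

# 3. A numeric vector: extract the component at position 2
by_position <- map_dbl(records, 2)

# Type stability: map_dbl() throws an error rather than return non-doubles
type_check <- tryCatch(
  map_dbl(records, "name"),
  error = function(e) "error"
)
```

All three extraction styles give the same double vector, and asking map_dbl() for the character component "name" ends in the error handler instead of silently returning a different type.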
Advanced R
The tricks are broadly similar to those in MATLAB and Python: vectorization, lazy evaluation, and integrating faster languages like C++. See more in Advanced R.
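Two of these ideas, vectorization and lazy evaluation, can be illustrated with base R alone. This is a minimal sketch with made-up names (values, first_only); the C++ integration is not shown here.

```r
# Vectorization: one call over the whole vector instead of an explicit loop
values <- c(1, 2, 3, 4)

loop_total <- 0
for (v in values) {
  loop_total <- loop_total + v^2
}
vector_total <- sum(values^2)  # same result, no loop

# Lazy evaluation: an argument is evaluated only when it is first used
first_only <- function(a, b) {
  a * 2  # b is never touched, so its expression is never evaluated
}
lazy_result <- first_only(21, stop("this argument is never evaluated"))
```

Both totals come out to 30, and the call to first_only() returns 42 without triggering stop(), because the second argument is never forced.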