R Programming - Complete Reference Guide Cheat Sheet

Last Updated: November 21, 2025

Basic Data Types

Type	Example
Numeric	`x <- 42.5`
Integer	`y <- 42L`
Character	`name <- "John"`
Logical	`flag <- TRUE`
Complex	`z <- 3 + 2i`
Check type	`class(x) # typeof(x) for detailed type`
Convert to numeric	`as.numeric("42")`
Convert to character	`as.character(42)`
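
The difference between `class()` and `typeof()`, and what happens when a conversion fails, is easier to see in a short run. The following is a minimal sketch; the variable names `n`, `bad`, and `s` are only illustrative.

```r
x <- 42.5          # double-precision numeric
y <- 42L           # integer (note the L suffix)
z <- 3 + 2i        # complex

class(x)           # "numeric"
typeof(x)          # "double"
class(y)           # "integer"
typeof(y)          # "integer"
class(z)           # "complex"

# A string that does not parse as a number converts to NA (with a warning)
n   <- as.numeric("42")
bad <- suppressWarnings(as.numeric("forty-two"))
s   <- as.character(42)
is.na(bad)         # TRUE
```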

Data Structures

Structure	Example
Vector	`v <- c(1, 2, 3, 4, 5)`
List	`lst <- list(name="John", age=30, scores=c(85,90,92))`
Matrix	`m <- matrix(1:9, nrow=3, ncol=3)`
Data frame	`df <- data.frame(name=c("A","B"), value=c(1,2))`
Factor	`f <- factor(c("low","med","high"), levels=c("low","med","high"))`
Access vector	`v[1] # First element (1-indexed)`
Access list	`lst$name # or lst[[1]]`
Access data frame	`df$name # Column by name; df[1, ] is row 1, df[, 1] is column 1`
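
Indexing behaves a little differently for each structure, so a small sketch may help. It reuses the objects from the table above; the result names (`first`, `sub_list`, and so on) are illustrative only.

```r
v   <- c(1, 2, 3, 4, 5)
lst <- list(name = "John", age = 30, scores = c(85, 90, 92))
m   <- matrix(1:9, nrow = 3, ncol = 3)   # filled column by column
df  <- data.frame(name = c("A", "B"), value = c(1, 2))

first         <- v[1]          # 1 (indexing starts at 1)
without_first <- v[-1]         # a negative index drops an element: 2 3 4 5

sub_list <- lst["name"]        # single brackets keep the list wrapper
name_val <- lst[["name"]]      # double brackets return the element itself

m[2, 3]                        # row 2, column 3: 8
row1 <- df[1, ]                # first row, still a data frame
col1 <- df[, 1]                # first column as a plain vector
```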

dplyr Verbs (Data Manipulation)

Function	Example
select()	`df %>% select(name, age) # Choose columns`
filter()	`df %>% filter(age > 25) # Filter rows`
arrange()	`df %>% arrange(age) # Sort ascending`
arrange(desc())	`df %>% arrange(desc(age)) # Sort descending`
mutate()	`df %>% mutate(age_10 = age + 10) # Add column`
summarise()	`df %>% summarise(mean_age = mean(age))`
group_by()	`df %>% group_by(category) %>% summarise(avg = mean(value))`
count()	`df %>% count(category) # Count occurrences`
distinct()	`df %>% distinct(name) # Unique values`
rename()	`df %>% rename(new_name = old_name)`
slice()	`df %>% slice(1:5) # First 5 rows`
pull()	`df %>% pull(age) # Extract column as vector`
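
The verbs above are usually chained into one pipeline. Here is a minimal sketch, assuming dplyr is installed; the `people` data frame and its values are made up for illustration.

```r
library(dplyr)

# Hypothetical data, for illustration only
people <- data.frame(
  name     = c("Ana", "Ben", "Cara", "Dev", "Eli"),
  age      = c(23, 31, 45, 28, 39),
  category = c("a", "b", "a", "b", "a")
)

result <- people %>%
  filter(age > 25) %>%                          # keep rows where age exceeds 25
  mutate(age_10 = age + 10) %>%                 # add a derived column
  group_by(category) %>%                        # one group per category
  summarise(n = n(), mean_age = mean(age)) %>%  # collapse each group to one row
  arrange(desc(mean_age))                       # largest mean first
```

Each step receives the data frame produced by the previous one, so the pipeline reads top to bottom in the order the operations happen.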

ggplot2 Basics

Layer	Example
Initialize plot	`ggplot(data = df, aes(x = var1, y = var2))`
Scatter plot	`+ geom_point()`
Line plot	`+ geom_line()`
Bar plot	`+ geom_bar(stat="identity") # Same as geom_col()`
Histogram	`+ geom_histogram(binwidth=5)`
Box plot	`+ geom_boxplot()`
Add title	`+ labs(title="My Plot", x="X Label", y="Y Label")`
Change theme	`+ theme_minimal() # theme_bw(), theme_classic()`
Color by variable	`aes(color = category)`
Facet wrap	`+ facet_wrap(~category)`
Save plot	`ggsave("plot.png", width=8, height=6)`
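
The layers in the table are fragments; they only work once added to a `ggplot()` call with `+`. The sketch below assembles several of them, assuming ggplot2 is installed. It uses the built-in `mtcars` data set, and the labels and output path are illustrative choices.

```r
library(ggplot2)

# mtcars ships with R; wt, mpg and cyl are real columns in it
p <- ggplot(data = mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point() +
  labs(title = "Fuel economy by weight", x = "Weight (1000 lbs)",
       y = "Miles per gallon", color = "Cylinders") +
  theme_minimal() +
  facet_wrap(~cyl)

# ggsave() writes the last plot by default; passing plot = p makes it explicit
out_file <- file.path(tempdir(), "plot.png")
ggsave(out_file, plot = p, width = 8, height = 6)
```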

Statistical Functions

Function	Example
mean()	`mean(c(1,2,3,4,5)) # Average`
median()	`median(c(1,2,3,4,5))`
sd()	`sd(values) # Standard deviation`
var()	`var(values) # Variance`
min() / max()	`min(values); max(values) # Smallest and largest value`
sum()	`sum(c(1,2,3,4,5)) # Sum all values`
range()	`range(values) # Min and max`
quantile()	`quantile(values, c(0.25, 0.75)) # Quartiles`
IQR()	`IQR(values) # Interquartile range`
cor()	`cor(x, y) # Correlation coefficient`
cov()	`cov(x, y) # Covariance`
scale()	`scale(values) # Standardize (z-scores)`
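
Several rows above use a `values` vector without defining it. The following sketch supplies one (the numbers are arbitrary) and shows how the summary functions relate to each other.

```r
values <- c(2, 4, 4, 4, 5, 5, 7, 9)   # arbitrary sample data

mean(values)                      # 5
median(values)                    # 4.5
var(values)                       # sample variance (divides by n - 1)
sd(values)                        # square root of var(values)
range(values)                     # 2 9
quantile(values, c(0.25, 0.75))   # lower and upper quartiles
IQR(values)                       # upper quartile minus lower quartile

z     <- scale(values)            # one-column matrix of z-scores
z_vec <- as.numeric(z)            # drop the matrix wrapper
```

Note that `scale()` returns a matrix even for a plain vector, which is why the last line converts it back.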

Statistical Tests

Test	Code
T-test	`t.test(x, y) # Two-sample t-test (Welch by default)`
One-sample t-test	`t.test(x, mu=0) # Test against mean`
Paired t-test	`t.test(x, y, paired=TRUE)`
Chi-square test	`chisq.test(table(x, y))`
ANOVA	`aov(value ~ group, data=df)`
Linear regression	`lm(y ~ x, data=df)`
Shapiro-Wilk test	`shapiro.test(values) # Test normality`
Wilcoxon test	`wilcox.test(x, y) # Non-parametric alternative`
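
The test functions return objects rather than single numbers, so it helps to see how results are pulled out. This is a sketch on simulated data; the seed, sample sizes, and group labels are arbitrary assumptions.

```r
set.seed(1)                        # arbitrary seed so the sketch is repeatable
x <- rnorm(30, mean = 0)           # simulated data, for illustration only
y <- rnorm(30, mean = 1)

tt <- t.test(x, y)                 # Welch test by default (var.equal = FALSE)
tt$p.value                         # p-value
tt$conf.int                        # confidence interval for the difference in means

df <- data.frame(value = c(x, y), group = rep(c("a", "b"), each = 30))
fit_aov <- aov(value ~ group, data = df)
summary(fit_aov)                   # ANOVA table with F statistic and p-value

fit_lm <- lm(y ~ x)
coef(fit_lm)                       # intercept and slope
summary(fit_lm)$r.squared          # proportion of variance explained
```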

Data Import/Export

Operation	Code
Read CSV	`df <- read.csv("data.csv")`
Read CSV (readr)	`df <- read_csv("data.csv") # Needs library(readr); returns a tibble, typically faster`
Write CSV	`write.csv(df, "output.csv", row.names=FALSE)`
Read Excel	`library(readxl); df <- read_excel("data.xlsx")`
Write Excel	`library(writexl); write_xlsx(df, "output.xlsx")`
Read RDS	`df <- readRDS("data.rds") # R native format`
Write RDS	`saveRDS(df, "data.rds")`
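
A quick round trip shows the practical difference between CSV and RDS. The sketch below uses only base R and writes to a temporary directory; the file names and the sample data frame are illustrative.

```r
df <- data.frame(name = c("A", "B"), value = c(1.5, 2), stringsAsFactors = FALSE)

csv_path <- file.path(tempdir(), "output.csv")
rds_path <- file.path(tempdir(), "data.rds")

write.csv(df, csv_path, row.names = FALSE)  # otherwise row names become an extra column
from_csv <- read.csv(csv_path)              # column types are re-guessed from the text

saveRDS(df, rds_path)                       # stores the R object itself
from_rds <- readRDS(rds_path)

identical(df, from_rds)                     # TRUE
```

RDS preserves column types and attributes exactly, while CSV is plain text that other tools can read.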

String Manipulation (stringr)

Function	Example
str_length()	`str_length("hello") # Returns 5`
str_to_upper()	`str_to_upper("hello") # "HELLO"`
str_to_lower()	`str_to_lower("HELLO") # "hello"`
str_trim()	`str_trim(" hello ") # "hello"`
str_replace()	`str_replace("hello", "l", "r") # "herlo"`
str_replace_all()	`str_replace_all("hello", "l", "r") # "herro"`
str_detect()	`str_detect("hello", "ell") # TRUE`
str_subset()	`str_subset(c("apple","banana"), "a") # "apple" "banana" (both contain "a")`
str_split()	`str_split("a,b,c", ",") # List of vectors`
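
stringr functions are vectorized, so they are typically applied to a whole character vector at once. Here is a minimal sketch, assuming stringr is installed; the `raw` input is made up.

```r
library(stringr)

raw <- c("  Apple-Red ", "banana-yellow", " Cherry-red")   # made-up input

clean  <- str_to_lower(str_trim(raw))   # "apple-red" "banana-yellow" "cherry-red"
is_red <- str_detect(clean, "red$")     # TRUE FALSE TRUE
parts  <- str_split(clean, "-")         # list with one character vector per input
fruit  <- sapply(parts, `[`, 1)         # "apple" "banana" "cherry"
spaced <- str_replace_all(clean, "-", " ")   # "apple red" "banana yellow" "cherry red"
```

Patterns are regular expressions by default, which is why `"red$"` matches only at the end of a string.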

Control Flow

Statement	Example
If statement	`if(x > 5) { print("Big") }`
If-else	`if(x > 5) { print("Big") } else { print("Small") }`
If-else if	`if(x > 10) { } else if(x > 5) { } else { }`
For loop	`for(i in 1:10) { print(i) }`
While loop	`while(x < 10) { x <- x + 1 }`
Vectorized if	`ifelse(x > 5, "Big", "Small")`
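
`if` works on a single condition, while `ifelse()` works element by element. The sketch below labels the same vector both ways; the thresholds and labels are arbitrary.

```r
x <- c(3, 8, 12)

# ifelse() is vectorized: it returns one result per element
size <- ifelse(x > 10, "Big", ifelse(x > 5, "Medium", "Small"))

# if() expects a single TRUE/FALSE, so a loop is needed to cover a vector
labels <- character(length(x))
for (i in seq_along(x)) {
  if (x[i] > 10) {
    labels[i] <- "Big"
  } else if (x[i] > 5) {
    labels[i] <- "Medium"
  } else {
    labels[i] <- "Small"
  }
}

# while repeats until its condition becomes FALSE
n <- 1
while (n < 10) {
  n <- n * 2
}
```

Both approaches give the same labels; the vectorized form is shorter and usually preferred for simple cases.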

Apply Functions (Avoiding Loops)

Function	Example
apply()	`apply(m, 1, sum) # m is a matrix; 1 = rows, 2 = columns`
lapply()	`lapply(lst, function(x) x*2) # Always returns a list`
sapply()	`sapply(lst, function(x) x*2) # Simplifies to a vector or matrix when possible`
mapply()	`mapply(sum, list1, list2) # Multivariate apply`
tapply()	`tapply(values, groups, mean) # Apply by group`
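
The apply family differs mainly in what goes in and what shape comes out. This sketch runs each function on small inputs; all object names and values are illustrative.

```r
m <- matrix(1:6, nrow = 2)                 # rows are 1 3 5 and 2 4 6
row_sums <- apply(m, 1, sum)               # 9 12
col_sums <- apply(m, 2, sum)               # 3 7 11

lst <- list(a = 1:3, b = 4:6)
doubled <- lapply(lst, function(x) x * 2)  # always a list
means   <- sapply(lst, mean)               # simplified to a named vector: a = 2, b = 5

# mapply() walks over several inputs in parallel: 5 7 9
sums <- mapply(function(p, q) p + q, 1:3, 4:6)

values <- c(10, 20, 30, 40)
groups <- c("x", "y", "x", "y")
by_group <- tapply(values, groups, mean)   # x = 20, y = 30
```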

Tidyr Functions (Reshaping Data)

Function	Example
pivot_longer()	`df %>% pivot_longer(cols=c(col1,col2), names_to="var", values_to="val")`
pivot_wider()	`df %>% pivot_wider(names_from=var, values_from=val)`
separate()	`df %>% separate(col, into=c("part1","part2"), sep="-")`
unite()	`df %>% unite("new_col", col1, col2, sep="-")`
drop_na()	`df %>% drop_na() # Remove rows with NA`
replace_na()	`df %>% replace_na(list(col1=0))`
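
Reshaping is easiest to follow on a tiny table. The following sketch, assuming tidyr is installed, pivots a made-up wide table to long form, handles its missing value two ways, and pivots back.

```r
library(tidyr)

# Hypothetical wide table: one row per id, one column per measurement
wide <- data.frame(id = c(1, 2), col1 = c(10, NA), col2 = c(30, 40))

long <- wide %>%
  pivot_longer(cols = c(col1, col2), names_to = "var", values_to = "val")

filled        <- long %>% replace_na(list(val = 0))   # NA in val becomes 0
complete_rows <- long %>% drop_na()                   # or drop the incomplete row

back <- long %>%
  pivot_wider(names_from = var, values_from = val)    # same shape as wide
```

`pivot_longer()` and `pivot_wider()` are inverses here: `long` has one row per id and variable, and `back` has the original columns again.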

💡 Pro Tips:

Use <- for assignment (not =) to follow R conventions
Install tidyverse with install.packages("tidyverse") for dplyr, ggplot2, and more
Use the %>% pipe operator to chain operations into readable code
View(df) opens the data frame in a spreadsheet-like viewer
Use head(df) and tail(df) to preview the first and last rows
str(df) shows the structure of the data frame (column names and types)
summary(df) provides a statistical summary of each column
RStudio is a widely used IDE for R