An Introduction to Statistical Learning: with Applications in R
2020-12-22
Preface
I (not one of the authors) am compiling this book for myself as a learning exercise, both for the contents of ISLR and for the bookdown package.
From the book’s official website:
This book provides an introduction to statistical learning methods. It is aimed for upper level undergraduate students, masters students and Ph.D. students in the non-mathematical sciences. The book also contains a number of R labs with detailed explanations on how to implement the various methods in real life settings, and should be a valuable resource for a practicing data scientist.
Resources used in making this book
- The first place you should go to learn about bookdown.
- Getting code folding to work using a variety of StackOverflow posts
Custom functions used throughout the book
To make some of the tables and plots, I wrote a few custom functions. They are contained in the file visualization_functions.R; below is a description of each function and why it is used.
theme_islr()
This is a ggplot2 theme function that mimics the figure style found in the book.
library(ggplot2)

theme_islr = function(){
  # "Roboto" must be installed and available to the graphics device
  theme_bw(base_size = 14, base_family = "Roboto") %+replace% # replace elements we want to change
    theme(
      # grid elements
      panel.grid.major = element_blank(),  # strip major gridlines
      panel.grid.minor = element_blank(),  # strip minor gridlines
      # text elements
      plot.title = element_text(           # title
        face = 'bold',                     # bold typeface
        hjust = 0,                         # left align
        vjust = 2),                        # raise slightly
      plot.caption = element_text(         # caption
        hjust = 1),                        # right align
      axis.text.x = element_text(          # margin for axis text
        margin = margin(5, b = 10))        # 5pt above, 10pt below
      # since the legend often requires manual tweaking
      # based on plot content, don't define it here
    )
}
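The %+replace% operator in theme_islr() is what makes the explicit hjust = 0 on plot.title necessary. A minimal sketch, assuming ggplot2 3.x behaviour (the object names added and replaced are only illustrative): + merges a new element into the existing one and keeps any property that is not restated, while %+replace% swaps in the new element wholesale, so unstated properties are NULL and fall back to the parent element when calc_element() resolves the theme.

```r
library(ggplot2)

# `+` merges: properties not restated in the new element keep theme_bw()'s values
added <- theme_bw() + theme(plot.title = element_text(face = "bold"))

# `%+replace%` replaces: plot.title becomes exactly element_text(face = "bold"),
# so every other property is NULL and is inherited from the parent elements
replaced <- theme_bw() %+replace% theme(plot.title = element_text(face = "bold"))

# calc_element() resolves inheritance to the values actually used when drawing
calc_element("plot.title", added)$hjust     # 0   (theme_bw's left-aligned title)
calc_element("plot.title", replaced)$hjust  # 0.5 (inherited from the root text element)
```

Under this reading, dropping hjust = 0 from theme_islr() would likely centre the plot title, because the root text element of theme_bw() has hjust = 0.5.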
prep_reg_table()
This function takes a fitted lm() object and formats its coefficient table to match the regression tables in ISLR.
library(dplyr)
library(stringr)
library(tibble)

prep_reg_table = function(reg_model){
  # Select the coefficients matrix from the summary of the regression model
  summary(reg_model)$coefficients %>%
    # Convert it to a tibble, and make the rownames a column
    as_tibble(rownames = "term") %>%
    # Rename the columns to match ISLR's tables
    rename(Coefficient = Estimate,
           `t-statistic` = `t value`,
           pval = `Pr(>|t|)`) %>%
    mutate(
      # Replace the shorthand formulas given to lm() with regression publishing style
      term = str_replace_all(term, c("\\:" = " X ",                # Interaction terms
                                     "I(?=\\()|[\\(\\)]" = "")),   # Identity function and parentheses
      # Create a p-value column which shows a less than sign in cases of very small p-values
      `p-value` = case_when(pval < .0001 ~ "<0.0001",
                            TRUE ~ str_c(round(pval, 4)))) %>%
    select(-pval)
}
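To see what prep_reg_table() is working with, the sketch below fits a small model on the built-in mtcars data (the model is only an illustration, not one from ISLR) and applies the same renaming and p-value steps by hand. The interaction wt:hp and the identity-wrapped term I(wt^2) are exactly the cases the regular expressions target; the names coefs, clean_terms, pvals and p_labels are hypothetical.

```r
library(dplyr)
library(stringr)

# Model with an interaction (wt:hp) and an I() term, both of which get renamed
fit <- lm(mpg ~ wt * hp + I(wt^2), data = mtcars)
coefs <- summary(fit)$coefficients

# The column names that prep_reg_table() renames
colnames(coefs)  # "Estimate" "Std. Error" "t value" "Pr(>|t|)"

# The same term clean-up used in prep_reg_table()
clean_terms <- str_replace_all(rownames(coefs),
                               c("\\:" = " X ", "I(?=\\()|[\\(\\)]" = ""))
clean_terms      # "Intercept" "wt" "hp" "wt^2" "wt X hp"

# The same p-value formatting, on a few made-up p-values
pvals <- c(0.00001, 0.0123, 0.5)
p_labels <- case_when(pvals < .0001 ~ "<0.0001",
                      TRUE ~ str_c(round(pvals, 4)))
p_labels         # "<0.0001" "0.0123" "0.5"
```

Note that the parenthesis-stripping pattern also turns "(Intercept)" into "Intercept", which matches how ISLR labels the intercept row.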