---
title: "Regression Tables with huxreg"
author: "David Hugh-Jones"
date: '`r Sys.Date()`'
output:
html_document: default
vignette: >
%\VignetteIndexEntry{Regression Tables with huxreg}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(huxtable)
is_latex <- guess_knitr_output_format() == "latex"
# knitr::knit_hooks$set(
# barrier = function(before, options, envir) {
# if (! before && is_latex) knitr::asis_output("\\FloatBarrier")
# }
# )
#
# if (is_latex) knitr::opts_chunk$set(barrier = TRUE)
options(huxtable.latex_siunitx_align = FALSE)
```
## Regression tables with `huxreg`
Huxtable includes the function `huxreg` to build a table of regressions.
You call `huxreg` with a list of models. These models can be of any class
which has a `tidy` method defined in the
[broom](https://cran.r-project.org/?package=broom) package. The method should
return a list of regression coefficients with names `term`, `estimate`,
`std.error` and `p.value`. That covers most standard regression packages.
Let's start by running some regressions to predict a diamond's price.
```{r}
data(diamonds, package = "ggplot2")
diamonds <- diamonds[1:100,]
lm1 <- lm(price ~ carat + depth, diamonds)
lm2 <- lm(price ~ depth + factor(color, ordered = FALSE), diamonds)
lm3 <- lm(log(price) ~ carat + depth, diamonds)
```
Now, we use `huxreg` to display the regression output side by side.
```{r}
huxreg(lm1, lm2, lm3)
```
The basic output includes estimates, standard errors and summary statistics.
Some of those variable names are hard to read. We can change them by providing
a named vector of variables in the `coefs` argument.
```{r}
color_names <- grep("factor", names(coef(lm2)), value = TRUE)
names(color_names) <- gsub(".*)(.)", "Color: \\1", color_names)
huxreg(lm1, lm2, lm3, coefs = c("Carat" = "carat", "Depth" = "depth", color_names))
```
Or, since the output from `huxreg` is just a huxtable, we could just
edit its contents directly.
```{r}
diamond_regs <- huxreg(lm1, lm2, lm3)
diamond_regs[seq(8, 18, 2), 1] <- paste("Color:", LETTERS[5:10])
# prints the same as above
```
Of course, we aren't limited to just changing names. We can also make our table
prettier. Let's put our footnote in italic, add a caption, and highlight
the cell background of significant coefficients. All of these are just standard
huxtable commands.
```{r}
suppressPackageStartupMessages(library(dplyr))
diamond_regs %>%
map_background_color(-1, -1, by_regex(
"\\*" = "yellow"
)) %>%
set_italic(final(1), 1) %>%
set_caption("Linear regressions of diamond prices")
```
By default, standard errors are shown below coefficient estimates. To display
them in a column to the right, use `error_pos = "right"`:
```{r}
huxreg(lm1, lm3, error_pos = "right")
```
This will give column headings a column span of 2.
To display standard errors in the same cell as estimates, use `error_pos = "same"`:
```{r}
huxreg(lm1, lm3, error_pos = "same")
```
You can change the default column headings by naming the model arguments:
```{r}
huxreg("Price" = lm1, "Log price" = lm3)
```
To display a particular row of summary statistics, use the `statistics`
parameter. This should be a character vector. Valid values are anything returned
from your models by `broom::glance`:
```{r}
gl <- as_hux(broom::glance(lm1))
gl %>%
restack_down(cols = 3, on_remainder = "fill") %>%
set_bold(odds, everywhere)
```
Another value you can use is `"nobs"`, which
returns the number of observations from the regression. If the `statistics`
vector has names, these will be used for row headings:
```{r}
huxreg(lm1, lm3, statistics = c("N. obs." = "nobs",
"R squared" = "r.squared", "F statistic" = "statistic",
"P value" = "p.value"))
```
By default, `huxreg` displays significance stars. You can alter the symbols used
and significance levels with the `stars` parameter, or set `stars = NULL` to
turn off significance stars completely.
```{r}
huxreg(lm1, lm3, stars = c(`*` = 0.1, `**` = 0.05, `***` = 0.01)) # a little boastful?
```
You aren't limited to displaying standard errors of the estimates. If you
prefer, you can display t statistics or p values, using the `error_format`
option. Any column from `tidy` can be used by putting it in curly brackets:
```{r}
# Another useful column: p.value
huxreg(
lm1, lm3,
error_format = "[{statistic}]",
note = "{stars}. T statistics in brackets."
)
```
Here we also changed the footnote, using `note`. If `note` contains the string
`"{stars}"` it will be replaced by a description of the significance stars used.
If you don't want a footnote, just set `note = NULL`.
Alternatively, you can display confidence intervals. Use `ci_level` to set the
confidence level for the interval, then use `{conf.low}` and `{conf.high}` in
`error_format`:
```{r}
huxreg(lm1, lm3, ci_level = .99, error_format = "({conf.low} -- {conf.high})")
```
To change number formatting, set the `number_format` parameter. This works the
same as the `number_format` property for a huxtable - if it is numeric, numbers
will be rounded to that many decimal places; if it is character, it will be
taken as a format to the base R `sprintf` function. `huxreg` tries to be smart
and to format summary statistics like `nobs` as integers.
```{r}
huxreg(lm1, lm3, number_format = 2)
```
Lastly, if you want to bold all significant coefficients, set the parameter
`bold_signif` to a maximum significance level:
```{r}
huxreg(lm1, lm3, bold_signif = 0.05)
```
## Altering data
Sometimes, you want to report different statistics for a model. For example, you
might want to use robust standard errors.
One way to do this is to pass a `tidy`-able test object into `huxreg`. The
function `coeftest` in the "lmtest" package has `tidy` methods defined:
```{r}
library(lmtest)
library(sandwich)
lm_robust <- coeftest(lm1, vcov = vcovHC, save = TRUE)
huxreg("Normal SEs" = lm1, "Robust SEs" = lm_robust)
```
If that is not possible, you can compute statistics yourself and add them to
your model using the `tidy_override` function:
```{r}
lm_fixed <- tidy_override(lm1, p.value = c(0.5, 0.2, 0.06))
huxreg("Normal p values" = lm1, "Supplied p values" = lm_fixed)
```
You can override any statistics returned by `tidy` or `glance`.
If you want to completely replace the output of tidy, use the `tidy_replace()`
function. For example, here's how to print different coefficients for a
multinomial model.
```{r}
mnl <- nnet::multinom(gear ~ mpg, mtcars)
tidied <- broom::tidy(mnl)
models <- list()
models[["4 gears"]] <- tidy_replace(mnl, tidied[tidied$y.level == 4, ])
models[["5 gears"]] <- tidy_replace(mnl, tidied[tidied$y.level == 5, ])
huxreg(models, statistics = "AIC")
```