```{r}
#| label: setup-block
# Modularly load functions
box::use(
modelsummary = modelsummary[datasummary_skim],
ggplot2 = ggplot2[ggplot, geom_histogram, aes, theme_minimal, labs]
)
# Create empty list object for datasets
penguins_list <- list()
# Load my dataset
penguins_list[['original']] <- palmerpenguins::penguins
```
Quarto is an excellent expansion upon RMarkdown. For those that have not had experience with either, it might be a little overwhelming. However, I hope to make the case in this post for why Quarto is worth learning and is particularly useful for increasing the replicability of your academic projects (or for various projects that you have if you are an analyst!).
Imagine you are able to write your R (or python) script, execute the code to make the tables and figures presenting the results of your analyses but you can also do your write up of the analysis all in the same document. If you’ve ever used a Jupyter notebook, it is kind of the same idea. What RMarkdown and Quarto add are that you are able to compile this document into a PDF document, presentation, or even in a blog post! With Quarto in particular, each of these outputs are extremely customizable and aren’t all too painful1 to put together and to look really nice.
As there are tons of options about what you can do with Quarto, this also means that it can appear rather daunting to get started with it. There is a wonderful community that offers great resources for how to get started, however, I thought I might put down my thoughts about it in the hope that this may be useful to someone who may be a bit overwhelmed getting started.
Getting Quarto set up
The first thing you should do is make sure that RStudio and R are both up-to-date. You can download the latest version of RStudio here and can download the latest version of R here.
If you use R with VSCode or some other IDE, you can use Quarto still! I use VSCode and there is a great extension for Quarto on VSCode that you can use!
The next thing you’ll need to do is to install Quarto itself. You can do so by going to Quarto’s Get Started page. Once you’ve downloaded Quarto, make sure to open up your IDE - either RStudio or whichever other IDE you use - and enter into your terminal:
Terminal
quarto install tools tinytex
You’ll need to install tinytex if you are wanting to compile any of your Quarto documents into a PDF. The reason you are doing this is that Quarto takes all of your code and the stuff you wrote around it, executes the code, temporarily stores the tables and figures and places them in a \(\LaTeX\) document, and then compiles it. \(\verb|tinytex|\) is a low-overhead TeX distribution that installs the packages, on-demand, it needs to compile the document into a PDF document.
If you are on a Windows machine, you may need to add \(\verb|.cmd|\) to your quarto command. I.e.,
Terminal
quarto.cmd install tools tinytex
Once you’ve completed this step, restart your IDE.
Now you are ready to start writing some documents. Let’s go through some examples for a presentation and for a manuscript (or report) you might be writing.
Writing a manuscript with Quarto
If you are an academic, you are probably familiar with the \(\LaTeX\) versus Word debate. Well, it appears that there is a challenger! You can create a PDF document that has all the wonderful signal-boosting of a \(\LaTeX\)-looking PDF document, without having to deal with Overleaf or some distribution to compile it locally.
The other benefit is that as you are writing both your R code and your manuscript in the same document, you are helping with making sure that your paper is extremely replicable! They not only are able to replicate your tables and figures, but also the manuscript itself! Awesome!
So, let’s start with the basics. The first thing that you need to do is specify your YAML section. With a PDF document, it’ll look something like this:
---
title: "Title of research project"
author:
- Author 1
- Author 2
- Author 3
abstract: |
The text of the abstract goes here. I put the horizontal line above so that I can then just write the body of the abstract on the next line like so.format:
pdf:
documentclass: article
self-contained: true
bibliography: project_bibliography_file.bib
execute:
echo: false
warnings: false
messages: false
---
Let’s break this down:
We are first going to provide some information about the title of the paper, the authors (we can list multiple), and the abstract.
Next I am going to provide some information about the format of the output. Since Quarto can create a bunch of different types of files, we should specify that the output we want is going to be a pdf document.
We are then going to specify some options for the pdf document by telling it that the \(\verb|documentclass|\) will be an article so that it kind of looks like a basic manuscript draft. I am also going to specify that I want it all to be self-contained so that, so long as the document compiles correctly, all the intermediary files it creates will be deleted and just leave me with my original .qmd file and my output, .pdf, file.
I am also going to specify the bibtex file that I am using for my references.
If you use a citation manager like Zotero, you can export parts of your library to a bibtex file.
- The last set of options I am going to specify in my YAML section of the document is that for the code that I am going to include that I want to be executed, I want to not have the underlying code show up in the final document (echo: false), if there are any warnings that show up to not show them in the final document (warnings: false), and if the code has any messages for those to also not show up in the final document (messages: false). I am then going to close the YAML section with the three dashed lines. The indentation of the lines matters! An indentation let’s me know that I am specifying an option for the above setting.
Now, let’s start writing our document. What I like to be my very first thing after the YAML section is to set up everything for R by creating an R coding block. In this section I usually load my packages, datasets, etc. To create a coding block, you should let Quarto know what it’s processing. So you do this with three of these `, and then close the code block with three of these `. When you start the coding block, you should also specify that the code is R code. The other thing you should do is label the code block. This will be helpful later for when you are generating tables and figures, but it is also helpful so that if, when compiling, there is an issue, then you know which chunk of code has the problem. Here is an example of a common setup block for me.
Now, what I can do is write my introduction, literature review, and theory sections.
# Introduction
Penguins are cute, adorable creatures, and are misunderstood.
# Literature review
@doe_2022 said, "Penguins are cute."
Whenever you want to reference something in Quarto, you can simply use the @ symbol. In the instance above, I am referencing a citation that is stored as doe_2022 in my .bib file. To start a new section, I can use the pound (hashtag, if you are a youngster) symbol. If I want to create a sub-section, I can do a double pound symbol. And surprise, if I want to create a sub-sub-section, I can do a triple pound symbol.
I will then include the text beneath the section. I do not need to indent the text as Quarto will do it for me when it compiles the document.
Now, I have gotten to the data section of my paper. I will need to start doing some analyses.
'original']] |>
penguins_list[[datasummary_skim(notes = 'Data source: Palmer Penguins')
Unique (#) | Missing (%) | Mean | SD | Min | Median | Max | ||
---|---|---|---|---|---|---|---|---|
bill_length_mm | 165 | 1 | 43.9 | 5.5 | 32.1 | 44.5 | 59.6 | |
bill_depth_mm | 81 | 1 | 17.2 | 2.0 | 13.1 | 17.3 | 21.5 | |
flipper_length_mm | 56 | 1 | 200.9 | 14.1 | 172.0 | 197.0 | 231.0 | |
body_mass_g | 95 | 1 | 4201.8 | 802.0 | 2700.0 | 4050.0 | 6300.0 | |
year | 3 | 0 | 2008.0 | 0.8 | 2007.0 | 2008.0 | 2009.0 | |
Data source: Palmer Penguins |
In the previous code block, I label the codeblock, but since I am also generating a table, I am also labeling the output of the codeblock as well. Since I am going to be generating a table, I am going to specify this in the label by using the tbl prefix. I also want to provide a caption for the table I am creating and will do this by specifying the option tbl-cap.
Now what I can do, is I can reference this table, Table 1, easily by simply using the @ symbol; the universal symbol for references.
Another awesome thing that I can do is run in-line code which allows me to discuss the results I have from a computation. For example, Table 1 demonstrates that the average bill length, in millimeters, is 43.92
. Here is this same sentence:
['original']]$bill_length_mm, na.rm = TRUE))```. For example, @tbl-descriptive-statistics demonstrates that the average bill length, in millimeters, is ```{r} sprintf('%.2f', mean(penguins_list[
In the above sentence I referenced my table and I was also able to include the result of a computation into my writing by using an in-line code block. This opens a ton of doors in terms of allowing my writing to adjust to the results of a computation. You can get really creative here - I’ve even done it for discussing my using if-else statements.
I can also create and display a figure!
'original']] |>
penguins_list[[ggplot() +
geom_histogram(aes(x = bill_length_mm), fill = '#eb6864', alpha = 0.3) +
theme_minimal() +
labs(x = 'Bill length (in mm)', y = 'Density', caption = 'Data source: Palmer Penguins')
I can reference the figure, Figure 1, again by using the @ symbol.
@fig-histogram shows...
Presentations with Quarto
Okay, so say that I wrote a draft of my manuscript (or report) and am going to need to present it to folks. What changes? Honestly, not much. The bulk of what changes happens in the YAML section.
There are a few options in terms of outputs for your presentation. My favorite is to use RevealJS. How it works is it essentially converts your Quarto document to Javascript and then converts that into a pretty HTML document. You can then open your presentation with a web browser (such as Chrome). The nice thing about RevealJS rather than specifying a Powerpoint or Beamer (PDF) option, is that the presentation includes more interactivity and it is a much more dynamic document.
Here’s what a standard YAML section for a presentation of mine looks like.
---
title: "Title of research project"
author:
- Author 1
- Author 2
format:
revealjs:
self-contained: true
theme: serif
incremental: true
scrollable: true
slide-number: true
toc: true
toc-depth: 1
slide-level: 3
code-overflow: wrap
bibliography: project_bibliography_file.bib
---
As you might be able to tell, the main thing that changes is in the format option.
I specify that my output is going to be a revealjs document (which automatically is converted to HTML).
I also specify the theme of the presentation - I find serif to be one that looks nice and has a number of nice properties such as a readable font.
I also specify that I want incremental to be true. This allows me to not show all of my bullet points on a given slide all at once.
I also specify that I want scrollable to be true. If one of my slides run long, the standard behavior is to just cut off the text that runs past the borders of the slide. However, I can change this by setting scrollable to true so that I can scroll through my slide and can display text that may be cut off otherwise.
I additionally include a table of contents (toc) as I often like to walk through the different milestones of a presentation I am doing. I specify the toc-depth so that not everything in my slides will be considered one of those milestones. The 1 refers to the section level. I’ll elaborate on what this means in a little bit.
I may want my slides to be broken up as sub-sub-sections rather than as sections or as sub-sections and so I specify the slide level as three. This means that whenever I want a new slide, I can just specify a new sub-sub-section. Again, I’ll explain this a bit more in a second.
I sometimes include my code in a slide (say its for teaching undergraduates in my methods classes) and I want the code to wrap if a given line is too long. I also specify where my bibtex file is as well so that I can include any references in my presentation as well.
After I have specified the YAML section, I will start to construct my slides. As I mentioned before, my slides are defined by specifying a new sub-sub-section rather than a new section. However, I use a new section as a way to identify a milestone in my presentation. So here is what my slides often look like under the hood:
# Motivation for the project
### Research Question
- I wonder why some penguins have longer bills than others.
### Motivation for the project
- I once watched *Happy Feet* and felt that some penguins would probably have an advantage with a longer bill than the others, so long as it wasn't too long.
### What we (collectively) know already
- Bills are useful tools for a variety of species.
### What is the puzzle
- While we know that bills are useful, I am not sure we understand what explains the variation in the bill length.
- NOTE: All of this is just a silly example that I am writing while I'm waiting for some of my Monte Carlo simulations to run.
# My expectations
- Blah blah blah
### What I argue
- Blah blah blah
### Justify my argument
- Blah blah blah
### My hypotheses
- Blah blah blah
# Evidence
### Data
- I use the Palmer Penguins dataset to examine heterogeneity in bill length.
### Methods
- I am going to first establish that there is some degree of heterogeneity in bill length.
### Results
```{r}
#| label: fig-presentation-histogram
#| fig-cap: Distribution of Bill length
penguins_list[['original']] |>
ggplot() +
geom_histogram(aes(x = bill_length_mm), fill = '#eb6864', alpha = 0.3) +
theme_minimal() +
labs(x = 'Bill length (in mm)', y = 'Density', caption = 'Data source: Palmer Penguins')
```
:::{.notes}
- Here is a histogram demonstrating that there is some heterogeneity!
:::
# Concluding thoughts
### Conclusions
### Contact
There is a lot in here. But let me try to explain it all:
- The # represents a milestone that will show up on my table of contents. So on my table of contents slide, it will list:
- Motivation for the project
- My expectations
- Evidence
- Concluding thoughts
- My slides are identified by using three # symbols.
- Within my slides, I can list bullet points by using the dash symbol. If I want to nest a bullet point, I hit enter and then the tab button and then add a dash symbol.
- The same way that I could execute R code to compile my PDF I can do that with a presentation as well!
- The nice thing about RevealJS is that I can use a speaker view for my slides by hitting the S key once I’ve opened my presentation html file. With the speaker view, I can see the current slide that I am on, the next slide, the duration of my presentation, and I can also see any speaker notes I include.
- You’ll notice that I added notes for myself on the Results slide by using three colons and curly brackets that enclose “.notes”. Within the notes block, I can add bullet points of things I want to make sure to mention on a particular slide. This is pretty awesome!
Closing up this long blog post
As Quarto is designed to be able to compile into a number of different formats and is extremely customizable, this allows for you to use Quarto for a whole smattering of documentation that you might need for a given project or for a job. This makes it a wonderful tool to learn and to sink some time into to learn.
Also, as you might tell from the ability to run in-line code blocks and your ability to execute and report on your analyses all in one document, this is a huge step-forward in the replication crisis. No more are the days that authors write a manuscript, submit it to a journal, then once accepted change all of the R scripts. It makes sure that authors are thinking about the reproducibility of their code while they are writing it. Sweet.
There is tons more to cover about Quarto and its different uses that you can find on their main website. Also, if you’d like to see examples of these things all in action, you’re welcome to look at these github repositories of mine which include the code along with the resulting output. I will note that for the example of a manuscript, I am using Andrew Heiss’ fancy template so this’ll look a bit different than the default stuff I covered in this post.
- Manuscript (PDF) example: Elite norms and mass polarization pre-analysis plan draft
- Slides example: Elite norms and mass polarization presentation
Footnotes
Term used loosely here.↩︎