Research Methods: Open Science and Reproducible Research in Linguistics

class: center, middle, inverse, title-slide

# Research Methods: Open Science and Reproducible Research in Linguistics
## Registered reports<br>Communicating/sharing I: RMarkdown
### Joseph V. Casillas, PhD
### Rutgers University<br>Spring 2019<br>Last update: 2019-02-07

---

background-image: url(./assets/img/psych_fail.png)

---
background-image: url(https://www.sbs.com.au/guide/sites/sbs.com.au.guide/files/styles/body_image/public/rocky.jpg?itok=D1SCRjAe&mtime=1528594528)
background-size: contain
background-color: black

---
background-image: url(https://static01.nyt.com/images/2016/08/05/us/05onfire1_xp/05onfire1_xp-superJumbo-v2.jpg?quality=90&auto=webp)
background-size: contain

---
class: center, middle

# The current model

Generate and specify hypotheses

⬇︎

Design study

⬇︎

Conduct study and collect data

⬇︎

Analyze data and test hypotheses

⬇︎

Interpret results

⬇︎

Publishing pipeline

---
background-image: url(https://cdn.cos.io/media/images/Hypothetico-deductive_scientific_method-1.original.png)
background-size: contain

---
background-image: url(./assets/img/publish_pipeline.png)
background-size: contain

# Publishing pipeline

---
class: title-slide-section-grey, middle, center

# Attempts at reform

### .lightgrey[(the comeback)]

---

# Attempts at reform

### Meta-analysis

.pull-left[

- Statistical approach

- Aggregation of results from many studies

- Inferences based on larger and potentially more diverse samples

- Attempt to increase power (over individual studies)

- Improve estimates of the size of the effect

- Resolve uncertainty when reports disagree

- Promotes collaboration among scientists, and incentivizes more 
systematic research programs

]

.pull-right[

]
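---

# Attempts at reform

### Meta-analysis: a worked sketch

A minimal sketch of how aggregation improves an effect size estimate, assuming a fixed-effect model with inverse-variance weights. The five effect sizes and standard errors are made up for illustration; they do not come from any real study.

```r
# Hypothetical effect sizes (e.g., standardized mean differences) and
# standard errors from five studies; the numbers are invented.
d  <- c(0.30, 0.10, 0.45, 0.25, 0.05)
se <- c(0.20, 0.15, 0.25, 0.10, 0.30)

# Inverse-variance weights: more precise studies count more.
w <- 1 / se^2

# Pooled (fixed-effect) estimate and its standard error.
d_pooled  <- sum(w * d) / sum(w)
se_pooled <- sqrt(1 / sum(w))

# 95% confidence interval for the pooled estimate.
ci <- d_pooled + c(-1, 1) * qnorm(0.975) * se_pooled

round(c(estimate = d_pooled, se = se_pooled,
        lower = ci[1], upper = ci[2]), 3)
```

The pooled standard error is always smaller than that of the most precise single study, which is where the gain in power comes from.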

---

# Attempts at reform

### Meta-analysis

- Cannot fix problems of p-hacking, reporting errors, and fraud

- Some researchers believe it dramatically exacerbates them

- May increase type I error

- Researcher is "further" from data, difficult to judge quality (especially when attempting to include unpublished studies)

- The popularity of meta-analysis has emphasized the size of 
effects and, by raising the awareness of behavioral scientists, has promoted the cause of power analysis

- If we didn't have the problems we currently have, this approach would be 
more productive
--
... we will see more meta-analyses in the future

---
background-image: url(./assets/img/ma1.png), url(./assets/img/ma2.png)
background-position: 5% 50%, 95% 50%
background-size: 500px, 600px

---
background-image: url(./assets/img/ma3.png), url(./assets/img/ma4.png)
background-position: 5% 50%, 95% 50%
background-size: 550px, 550px

---
background-image: url(./assets/img/ma5.png), url(./assets/img/ma6.png)
background-position: 5% 50%, 95% 50%
background-size: 500px, 600px

---

# Attempts at reform

### p-bashing

#### Old statistics critiques

.pull-left[

- "a meaningless ordeal of pedantic computations"<sup>1</sup>

- "the test of significance has been carrying too much of the burden of scientific inference"<sup>2</sup>

- "A potent but sterile intellectual rake who leaves in his merry 
path a long train of ravished maidens but no viable scientific 
offspring"<sup>3</sup>

]

.pull-right[

- "perhaps the least important attribute of a good experiment"<sup>4</sup>

- "that .grey[a great deal of mischief has been associated] with the 
test of significance[...] is .grey[what everybody knows]"<sup>2</sup>

- "one of the worst things that ever happened in the history of psychology"<sup>5</sup>

]

.footnote[
<sup>1</sup>Stevens (1960), 
<sup>2</sup>Bakan (1966), 
<sup>3</sup>Meehl (1967), 
<sup>4</sup>Lykken (1968), 
<sup>5</sup>Meehl (1978)
]

---

# Attempts at reform

### p-bashing

#### New(ish) statistics critiques<sup>1</sup>

.footnote[
<sup>[1](https://www.jstor.org/stable/pdf/20182143.pdf?refreqid=excelsior%3Aaa46f47b6cb642339d0f407c02d3070c)</sup>Cohen (1994), <sup>[2](https://psyarxiv.com/mky9j/)</sup>Benjamin et al. (2018), <sup>[3](https://psyarxiv.com/9s3y6)</sup>Lakens et al. (2018), <sup>4</sup>Cumming (2014), <sup>5</sup>Kruschke (2013), <sup>6</sup>Wagenmakers et al. (2011)
]

.pull-left[

- "Because NHST p-values have become the coin of the realm in much of psychology, they have served to inhibit its development as a science."
- Lower alpha to 0.005<sup>2</sup>
- Justify your alpha<sup>3</sup>
- Use point estimation and confidence intervals instead of p-values<sup>4</sup>
- Bayesian estimation instead of p-values<sup>5</sup>
- Bayes factors over p-values<sup>6</sup>

]

.pull-right[

]
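---

# Attempts at reform

### p-bashing: a worked sketch

A small illustration of two of the proposals above: judging the same p-value against alpha = 0.05 and alpha = 0.005, and reporting a point estimate with a confidence interval. The data are simulated, and the group names, sample sizes, and means are assumptions made only for this example.

```r
set.seed(2019)  # arbitrary seed so the simulated data are repeatable

# Simulated (made-up) reaction times in ms for two hypothetical groups.
group_a <- rnorm(40, mean = 500, sd = 50)
group_b <- rnorm(40, mean = 525, sd = 50)

# Welch two-sample t-test (the default for t.test).
fit <- t.test(group_b, group_a)

# The same p-value judged against two thresholds.
alphas <- c(0.05, 0.005)
fit$p.value
fit$p.value < alphas

# Point estimate and 95% confidence interval for the mean difference
# (group_b minus group_a).
diff_est <- unname(fit$estimate[1] - fit$estimate[2])
c(estimate = diff_est, lower = fit$conf.int[1], upper = fit$conf.int[2])
```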

---
background-image: url(./assets/img/prr.png)
background-size: contain

https://osf.io/2dxu5/

---

# Attempts at reform

### Registered reports

.pull-left[

- Started in medicine

- Current trend in psych

- Coming to ling (Language Learning, Language and Speech)

- Some journals have started offering badges

]

.pull-right[

]

---
background-image: url(./assets/img/roettger_2019_00.png)
background-size: contain

.footnote[Roettger (2019)]

---
class: center, middle

# What is a registered report? <br>What is it designed to do?

---

# Attempts at reform

### Registered reports

- Publish your experimental design first

- Receive open peer review based on the theoretical grounds 
and methods

- Reviewers suggest amendments that can still be incorporated before 
  the study is run

- The peer review process grants In Principle Acceptance (IPA)

- Only then carry out the experiment, analyze data, finish manuscript

- Resubmit for second peer review

- Publish results, regardless of the outcome

- Goal: reduce the number of papers reporting statistically significant results that are actually false positives

---
background-image: url(https://cdn.cos.io/media/images/registered_reports.width-800.png)
background-size: contain

# New publishing pipeline

---

# Attempts at reform

### Unreviewed pre-registration

- A second type of pre-registration

- Does not involve reviewers before data collection

- Authors write plan and it is time-stamped before conducting the study

- In theory the process is similar to the standard model, but one can be 
(more) confident that there is no HARKing/p-hacking

- Two models (registered report vs. unreviewed pre-registration) are not 
mutually exclusive

- They can have different priorities in the research cycle (i.e., 
exploratory vs. confirmatory research)
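---

# Attempts at reform

### Why a time-stamped plan matters: a simulation sketch

A rough simulation of why committing to one analysis in advance protects against p-hacking. It assumes a study with no true effect, five independent outcome measures, and a researcher who either tests only the planned outcome or reports whichever outcome reaches p < .05. All sizes and settings are arbitrary choices for illustration.

```r
set.seed(1)   # arbitrary seed for repeatability

n_sims     <- 2000   # simulated studies
n_per_grp  <- 20     # participants per group
n_outcomes <- 5      # outcome measures collected in each study

# One simulated study with NO true effect on any outcome.
# Returns the p-value of a t-test for each outcome.
one_study <- function() {
  replicate(n_outcomes,
            t.test(rnorm(n_per_grp), rnorm(n_per_grp))$p.value)
}

p_mat <- replicate(n_sims, one_study())  # n_outcomes x n_sims matrix

# Planned analysis: only the first outcome is tested.
fp_planned <- mean(p_mat[1, ] < 0.05)

# Flexible analysis: report whichever outcome "worked".
fp_flexible <- mean(apply(p_mat, 2, min) < 0.05)

c(planned = fp_planned, flexible = fp_flexible)
```

In theory the planned rate sits near 0.05, while the flexible rate approaches 1 - 0.95^5, roughly 0.23, for five independent outcomes.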

---
background-image: url(./assets/img/reg_report_template.png)
background-size: contain

---

# Attempts at reform

### Pre-registration

#### Negatives

- More work?

- Too restrictive?

- Idea theft?

#### Limitations

- Flexibility (sometimes we think of things post facto)?

- Fraud (multiple pre-registrations?)?

- Irrelevant for certain types of research?

---

# Attempts at reform

### Registered reports

#### How? Online platforms for pre-registration

- the Open Science Framework (OSF)

- aspredicted.org

---
background-image: url(./assets/img/osf.png)
background-size: 900px

.footnote[https://cos.io/rr/]

---
class: title-slide-section-grey, center, middle

# **Where are we now?**

---
background-image: url(./assets/img/roettger_2019_01.png)
background-size: contain

.footnote[Roettger (2019)]

---
background-image: url(./assets/img/roettger_2019_02.png)
background-size: contain

.footnote[Roettger (2019)]

---
background-image: url(./assets/img/roettger_2019_03.png)
background-size: contain

.footnote[Roettger (2019)]

---
class: title-slide-section-grey, center, middle
background-image: url(https://justseriesandstuff.files.wordpress.com/2015/07/618_movies_rocky_10.jpg)
background-size: contain

# **Where are we now?**

---

# Where are we now?

- Registered reports are becoming the norm in Psychology

- Linguistics slow to follow

---
class: title-slide-section-grey, center, middle
background-image: url(https://www.telegraph.co.uk/content/dam/films/2018/11/23/rocky_trans_NvBQzQNjv4Bq04zWM7lESoHlZcET6IbVrgsXXAS7_VrfHdozeI5gQBU.PNG?imwidth=1400)
background-size: contain

# **Where are we heading?**

---

# Where are we heading?

### What does it mean for the field?

- Researchers have to adapt, learn new methods of open science
- Journals have to adapt, adjust model of publishing

### What does it mean for you?

- Registered dissertation experiments?
- Published replications before graduation?
- Increased knowledge of coding?
- Increased sharing of materials (code, data, stimuli)?

---
class: title-slide-section-grey, middle

.big[

<ru-blockquote>
That said, I have two main reservations about the manuscript. First, the potential value of this manuscript to serve as a fully-worked-out example of GAMMs or Bayesian analysis is limited by the unavailability of the raw data and R code. I actually think the primary value of this manuscript is its demonstration of these statistical techniques, and so if the authors are unable or unwilling to make their raw data and R code available, I cannot recommend this manuscript for publication.
</ru-blockquote>
]

---
class: title-slide-section-red
background-image: url(https://cdn-images-1.medium.com/max/1600/1*gYQhlM7v6GyRuxaL8JtPIQ.png), url(https://upload.wikimedia.org/wikipedia/commons/thumb/4/48/Markdown-mark.svg/2000px-Markdown-mark.svg.png), url(https://www.rstudio.com/wp-content/uploads/2017/05/rmarkdown.png), url(https://upload.wikimedia.org/wikipedia/commons/thumb/7/7d/Tab_plus.svg/2000px-Tab_plus.svg.png), url(https://upload.wikimedia.org/wikipedia/commons/thumb/c/cf/Kennzeichnung_für_Äquivalenzglied.svg/2000px-Kennzeichnung_für_Äquivalenzglied.svg.png)
background-position: 5% 60%, 43% 60%, 95% 60%, 27% 60%, 62% 63%
background-size: 250px, 250px, 375px, 100px, 175px

# Communicating/sharing I

---

# What is markdown?

- Markdown is a language used to format text

- Rather than clicking a button to format (like in Word), you use markdown 
syntax

- Lightweight markup language (like HTML but simple)

- Easy to read and write because it uses simple tags (e.g. #)

.pull-left[

```
## This is a subsection header

This is **bold** text.  
This is *italic* text.

- This is 
- a list

1. This is a 
2. numbered list
```

]

.pull-right[
## This is a subsection header

This is **bold** text.  
This is *italic* text.

- This is 
- a list

1. This is a 
2. numbered list
]

---

# Exercise I

- Open RStudio
- File > New file > RMarkdown (then click "ok")
- Select all (cmd + a) and delete everything
- Type "hello world"
- Click "knit" (You will be asked to save. Save the file to your desktop)
--

- Try to add the following: 
  - a section header
  - bold text
  - an ordered list
  - an unordered list
  - a link to your favorite website

---
background-image: url(https://learn.r-journalism.com/publishing/rmarkdown/images/rmdfiles.png)
background-position: 95% 50%

# What is R Markdown?

- An authoring format that combines markdown syntax and R code 
(R + markdown)

- An RMarkdown file consists of 3 components...

- front matter

- plain text

- R code

- How does it work?

---
background-image: url(https://cdn-images-1.medium.com/max/1600/1*gYQhlM7v6GyRuxaL8JtPIQ.png), url(https://upload.wikimedia.org/wikipedia/commons/thumb/4/48/Markdown-mark.svg/2000px-Markdown-mark.svg.png), url(https://www.rstudio.com/wp-content/uploads/2017/05/rmarkdown.png), url(https://upload.wikimedia.org/wikipedia/commons/thumb/7/7d/Tab_plus.svg/2000px-Tab_plus.svg.png), url(https://upload.wikimedia.org/wikipedia/commons/thumb/c/cf/Kennzeichnung_für_Äquivalenzglied.svg/2000px-Kennzeichnung_für_Äquivalenzglied.svg.png), url(https://www.rstudio.com/wp-content/uploads/2014/04/knitr-200x232.png)
background-position: 5% 30%, 43% 30%, 95% 10%, 27% 30%, 62% 23%, 30% 70%
background-size: 250px, 250px, 375px, 100px, 175px, 150px

.footnote[.big[`knitr` is used to 'knit' R code into the markdown text file]]
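---

# What is R Markdown?

### A minimal file, built from R

A sketch of the three components (front matter, plain text, R code) assembled into one `.Rmd` file. The file name and its contents are invented for illustration. The chunk delimiter is built with `strrep()` only so that this example can sit inside a code block itself.

```r
# Three backticks: the delimiter that opens and closes a code chunk.
fence <- strrep("`", 3)

rmd <- c(
  # 1. Front matter (YAML between the two --- lines)
  "---",
  "title: \"Minimal example\"",
  "output: html_document",
  "---",
  "",
  # 2. Plain markdown text
  "## A section header",
  "",
  "Some **bold** text, followed by a code chunk.",
  "",
  # 3. R code, wrapped in a knitr chunk
  paste0(fence, "{r}"),
  "x <- 5",
  "2 * x",
  fence
)

# Write to a temporary file (hypothetical file name).
path <- file.path(tempdir(), "minimal_example.Rmd")
writeLines(rmd, path)

# Knit it only if rmarkdown and pandoc are available on this machine.
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  rmarkdown::render(path, quiet = TRUE)
}
```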

---

# What is R Markdown?

- An authoring format that combines markdown syntax and R code 
(R + markdown)

- An RMarkdown file consists of 3 components...

- front matter

- plain text

- R code

- How does it work?

- **What can it do**?

---

# Exercise II

- Open RStudio
- File > New file > RMarkdown (then click "ok")
- Take a look at the text. What markup do you see?
- Click "knit" (You will be asked to save. Save the file to your desktop)
- What do you see? What section represents the `front matter`? How can you distinguish plain markdown text from R code?
--

- Create a new `knitr` code chunk, add the following and click `knit`:  
`x <- 5; 2 * x`
--

- Create a new `knitr` code chunk and add the following (note: you may have to install the package):

.pull-left[
```
library(tidyverse)
mtcars %>% 
  ggplot(aes(x = drat, y = mpg)) + 
    geom_point() + 
    geom_smooth()
```
]

.pull-right[
<img src="index_files/figure-html/unnamed-chunk-2-1.png" width="504" />
]

---
class: center, middle

# Why use it?

### An RMarkdown file is a **dynamic document** that is fully reproducible

### It can be regenerated automatically whenever the R code or data changes

### It allows you to easily share your results

---

class: middle
background-image: url(./assets/img/rmd_01.png)
background-size: contain
background-position: 100% 50%

.pull-left[
.big[
RMarkdown allows you to write simple text documents that can be 
converted to many different output formats

- HTML
- PDF
- Word
- HTML5 slides
- websites/blogs
- .grey[Beamer]
- .grey[Tufte handouts]
- .grey[Books]
- .grey[dashboards]

]
]
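---

# One source file, several outputs

Besides the knit button, the same conversion can be scripted with `rmarkdown::render()`. This is a sketch under assumptions: `ex3.Rmd` stands in for whichever file you are working on, and the loop only runs if that file, the rmarkdown package, and pandoc are all present.

```r
# Hypothetical input file; replace with the path to your own .Rmd.
input <- "ex3.Rmd"

# Output formats provided by the rmarkdown package.
# pdf_document additionally needs a LaTeX installation.
formats <- c("html_document", "word_document", "pdf_document")

if (file.exists(input) &&
    requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  for (fmt in formats) {
    # output_format picks the format to build, instead of the
    # first one listed under `output:` in the front matter.
    rmarkdown::render(input, output_format = fmt, quiet = TRUE)
  }
}
```

Scripting the conversion this way keeps every output in sync with the same source file.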

---

# Exercise III

- Open the GitHub Desktop app
- You should still have the `github_practice` repo
- Pull in the newest changes (click 'pull')
- **If you don't have the `github_practice` repo, go to github.com, 
search for `jvcasillas`, search for the `github_practice` repo, 
fork it again, and clone it to your desktop**. 
--

- Open the `rmarkdown_ex` folder and double click "rmarkdown_ex.Rproj"
--

- Find the "Files" tab in one of the 4 window panes, click on `ex3.Rmd` 
- Inspect the file, notice the front matter and the code chunks. 
- Click 'knit'
--

- Change the front matter from what you see on the left to what you see on 
the right and click `knit`:

.pull-left[

```r
---
title: "More complex RMarkdown example"
author: "Joseph Casillas"
date: "`r Sys.Date()`"
output: 
  html_document: 
    highlight: kate
    number_sections: yes
    theme: spacelab
    toc: true
    toc_float:
      toc_collapsed: true
---
```
]

.pull-right[

```r
---
title: "More complex RMarkdown example"
author: "Joseph Casillas"
date: "`r Sys.Date()`"
output: word_document
---
```
]

---
background-image: url(https://www.r-project.org/Rlogo.png), url(../assets/img/prohibited.png), url(https://www.mcdwayne.com/wp-content/uploads/2018/05/I-love-markdown-syntax-language.png)
background-size: 200px, 350px, contain
background-position: 0% 80%, 66% 26%, 60% 50%
background-color: #e6e6e6

---
class: title-slide-final, middle
background-image: url(https://github.com/jvcasillas/ru_xaringan/raw/master/img/logo/ru_shield.png), url(https://www.r-project.org/Rlogo.png)
background-size: 55px, 100px
background-position: 9% 15%, 89% 15%

# Getting help

## If you have problems using RMarkdown (or GitHub) 
## ask for help in the slack channel

### You can find some very basic tutorials related to 
### R, RStudio, RMarkdown, GitHub, and Slack [here][here]

[here]: http://www.jvcasillas.com/ru_teaching/ru_spanish_589/589_01_s2018/sources/tuts/index.html