Research Methods: Open Science and Reproducible Research in Linguistics

class: center, middle, inverse, title-slide

# Research Methods: Open Science and Reproducible Research in Linguistics
## Reproducibility crisis<br>Version control I: GitHub
### Joseph V. Casillas, PhD
### Rutgers University<br>Spring 2019<br>Last update: 2019-01-31

---

class: title-slide-section-grey, middle, center

# .white[Do we know what we think we know?]

---
background-image: url(../assets/img/pensar2.png)
background-size: 400px
background-position: 90% 50%

# A psychological paradox

### We know 2 things

#### The majority of published studies...

1. ...include findings that are "statistically significant"

2. ...are underpowered due to low sample size<sup>1</sup>

.footnote[<sup>1</sup> Chase & Chase 1976, Cohen 1962, Sedlmeier & Gigerenzer 1989]

.pull-left[
.center[

#### **What does this imply**?

#### **What is "significance"**?

#### **What is power**?

]
]

---
background-image: url(https://i.imgflip.com/2sdkd6.jpg)
background-size: contain
background-position: 50% 50%
background-color: black

---
background-image: url(https://static-19.sinclairstoryline.com/resources/media/a079367c-a58f-4130-b01b-5d5d320655d2-large16x9_genericcourtroomgavel.JPG?1531677910817)
background-size: contain

# .white[NHST]

.pull-left[
.Large[.white[In a trial, the 
.big[.yellow[.bold[null hypothesis]] (.yellow[H0])] 
is innocence.]]]

.footnote[
.Large[
.white[
The objective is to see if the evidence .big[(**the data**)] contradicts this 
hypothesis, supporting an .big[.bold[.green[alternative hypothesis]] (.green[H1])], guilt. 
]
]
]

---
background-image: url(https://thehill.com/sites/default/files/styles/thumb_small_article/public/ginsburgruth_041017gn_lead.jpg?itok=TFxW052n)
background-size: contain

# .white[NHST]

.pull-left[
.footnote[
.big[
.white[If there is no evidence, the accused cannot be found 
guilty.]

.white[We .big[**fail to reject** .yellow[.bold[H0]]].]
]
]
]

.pull-right[
.footnote[
.big[
.white[If there is evidence of guilt beyond a reasonable doubt, 
the accused is found guilty.

We .big[.bold[.lightgrey[reject] .yellow[H0]]].]
]
]
]

---
class: middle
background-color: black
background-image: url(https://www.snopes.com/tachyon/2017/03/Ruth_Bader_Ginsburg_fb.jpg?resize=1200,630&quality=65)
background-size: contain

# .white[So what's a] .yellow[p-value].white[?]

### **p-value**: .white[the probability of obtaining data at least as extreme as yours, if H0 is true.]

--

# .white[...and "].yellow[significant].white["?]

### **Significance**: .white[obtaining a p-value below an arbitrary threshold]
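
---

# A p-value, worked by hand

As a minimal sketch of the definition on the previous slide, the Python below computes an exact two-sided p-value for a made-up experiment: 20 coin flips, 16 heads, with H0 being that the coin is fair. The function name `binomial_p_value`, the data, and the 0.05 threshold are illustrative assumptions, not part of the course materials.

```python
from math import comb


def binomial_p_value(heads: int, flips: int) -> float:
    """Two-sided p-value for H0: the coin is fair (P(heads) = 0.5)."""
    # Distance of the observed count from what H0 expects (flips / 2).
    observed_distance = abs(heads - flips / 2)
    # Count, under H0, every outcome at least that far from the expectation.
    extreme = sum(
        comb(flips, k)
        for k in range(flips + 1)
        if abs(k - flips / 2) >= observed_distance
    )
    # Each of the 2 ** flips sequences is equally likely under H0.
    return extreme / 2 ** flips


ALPHA = 0.05  # conventional, and arbitrary, threshold (hypothetical choice)

p = binomial_p_value(heads=16, flips=20)
print(f"p = {p:.4f}, significant at {ALPHA}: {p < ALPHA}")
```

Under H0, outcomes at least this lopsided occur about 1.2% of the time, so the result falls below the threshold. That alone does not say how likely H1 is.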

---
class: center, middle
background-color: black

# .lightgrey[High p-value: your data are likely with a true null.]

# .lightgrey[Low p-value: your data are unlikely with a true null.]

.pull-left[
# Low p-values do not tell you how likely your hypothesis is!
]

.pull-right[
# Low p-values are not "more significant" than high p-values!
]

---
background-color: black
background-image: url(https://larspsyll.files.wordpress.com/2014/07/i_do_not_think_it_significant_means_what_you_think_it_means.jpg)
background-size: contain

---
class: middle
background-color: black
background-image: url(https://www.freepngimg.com/thumb/arnold_schwarzenegger/29411-8-arnold-schwarzenegger-file.png)
background-size: contain
background-position: 100% 50%

# .white[What is] **power**.white[?]

### .lightgrey[The probability of (correctly) rejecting] 
### **H0** .lightgrey[when] .yellow[H1] .lightgrey[is true.]
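
---

# Power depends on sample size

To make the definition concrete, here is a rough sketch that computes power analytically for a two-sided, one-sample z-test. It assumes a known standard deviation and a true effect expressed in standard-deviation units; the function name `z_test_power`, the effect size of 0.3, and the sample sizes are hypothetical values chosen only for illustration.

```python
from statistics import NormalDist


def z_test_power(effect_size: float, n: int, alpha: float = 0.05) -> float:
    """Power of a two-sided one-sample z-test.

    effect_size is the true mean difference in standard-deviation units;
    n is the sample size.
    """
    standard_normal = NormalDist()
    z_crit = standard_normal.inv_cdf(1 - alpha / 2)
    shift = effect_size * n ** 0.5  # how far H1 moves the test statistic
    # Probability that the statistic lands in either rejection region under H1.
    upper = 1 - standard_normal.cdf(z_crit - shift)
    lower = standard_normal.cdf(-z_crit - shift)
    return upper + lower


for n in (10, 20, 50, 100):
    print(f"n = {n:3d}  power = {z_test_power(0.3, n):.2f}")
```

With an effect of zero the function returns alpha, which is the Type I error rate; with a fixed non-zero effect, power rises as the sample grows.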

---

# Back to the paradox...

### The majority of published studies...

1. ...include findings that are "statistically significant"

2. ...are underpowered due to low sample size

.center[
### How can this be? 
### If power is low, we are unlikely to correctly reject H0 when H1 is true. 
]

---
background-image: url(https://mchankins.files.wordpress.com/2013/04/filedrawer1.jpg)
background-size: contain

---
class: title-slide-section-grey, middle, center

# The .Large["**file drawer problem**"] has contributed to the creation of .big[.yellow[even bigger]] problems
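
---

# The file drawer, simulated

One way to see how the paradox can arise is a small simulation, sketched below under stated assumptions: many underpowered studies of the same small true effect are run, and only the significant ones are "published". The function name `simulate_file_drawer`, the true effect of 0.2, the sample size of 20, and the seed are all hypothetical.

```python
import random
from statistics import NormalDist, mean


def simulate_file_drawer(true_effect=0.2, n=20, studies=5000, alpha=0.05, seed=1):
    """Simulate many small studies and 'publish' only the significant ones."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    all_estimates, published = [], []
    for _ in range(studies):
        # Each study samples n scores (known SD = 1) and estimates the mean.
        estimate = mean(rng.gauss(true_effect, 1) for _ in range(n))
        all_estimates.append(estimate)
        z = estimate * n ** 0.5  # z statistic for H0: mean = 0
        if abs(z) > z_crit:
            published.append(estimate)  # the rest stay in the file drawer
    return all_estimates, published


all_estimates, published = simulate_file_drawer()
print(f"share published:        {len(published) / len(all_estimates):.2f}")
print(f"mean effect, all:       {mean(all_estimates):.2f}")
print(f"mean effect, published: {mean(published):.2f}")
```

In this sketch every published study is significant even though power is low, and the published effects overestimate the true one because only the luckiest samples clear the threshold.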

---

# Reproducibility crisis in Psychological science

### What is the Open Science Collaboration?

### What is the reproducibility crisis?

- The Open Science Collaboration (2015) tried to replicate 100 published psychology studies; only 39% were judged to have replicated

> "Scientists and journal editors are biased--consciously or not--in  
> what they publish"

### What exactly is the concern?

- Type I error

---
background-color: lightgrey
background-image: url(https://raw.githubusercontent.com/jvcasillas/media/master/rstats/memes/rstats_type12_error1.png)
background-size: contain

---

# Reproducibility crisis in Psychological science

### Is it all type I error? (probably not)

### p-hacking

> refers to the many different decisions that researchers make during data 
> collection, preparation, and analyses that maximize the chance of obtaining 
> significant effects

- p-hacking is a logical explanation for why we consistently see 
underpowered studies that are statistically significant

- p-hacking can turn any false hypothesis into one that has statistically significant support

- "researcher degrees of freedom"

- not necessarily done to be deceptive or manipulative

---

# Reproducibility crisis in Psychological science

### Examples of p-hacking (from Le, 2018)

- Running exploratory analyses during data collection: if the results are significant, stopping; if not, collecting more data until they become significant.

- Collecting data on several DVs, but only reporting results from the DVs that show significant effects. Similarly, creating DVs in different ways (e.g., from a subset of items) to maximize significant effects.

- Running multiple tests and only reporting those that yielded significant results.

- Re-running analyses with control variables, interaction terms, or subgroups when the initial results are not significant.

- Eliminating data (e.g., “outliers” or particular demographic groups) in an attempt to find stronger results.

- Designing an experiment with multiple conditions, but ignoring or collapsing conditions if the analysis based on all conditions is not significant.

- Running multiple experiments and only reporting those that “worked.”
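
---

# Why selective reporting works

The second and third examples on the previous slide can be sketched numerically. Assuming every H0 is true and the tests are independent (a simplification; real DVs are usually correlated), the chance that at least one of k tests comes out "significant" is 1 - (1 - alpha)^k. The function names, the choice of 5 DVs, and the seed below are hypothetical.

```python
import random


def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """P(at least one p < alpha) over n_tests independent tests, all H0 true."""
    return 1 - (1 - alpha) ** n_tests


def simulate_selective_reporting(n_dvs=5, alpha=0.05, studies=20000, seed=1):
    """Share of null studies that can report at least one 'significant' DV."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(studies):
        # Under a true H0 (continuous test statistic), p is uniform on [0, 1).
        p_values = [rng.random() for _ in range(n_dvs)]
        if min(p_values) < alpha:
            hits += 1
    return hits / studies


for k in (1, 5, 10, 20):
    print(f"{k:2d} tests -> false positive chance {familywise_error_rate(k):.2f}")
print(f"simulated, 5 DVs: {simulate_selective_reporting():.2f}")
```

With a single test the rate stays at alpha; reporting only the best of several tests raises it well above the nominal threshold.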

---

# Reproducibility crisis in Psychological science

### HARKing

- "Hypothesizing after the results are known"

- Collect data

- Do a wide range of exploratory analyses

- "Cherry pick" from those results and (mis)represent them as being derived from a priori hypotheses

- This is also known as “data dredging” and “fishing”

- It may seem surprising that this practice occurs, but it is rather common and has even been recommended as the preferred way to frame a research project (e.g., Bem, 2002)

---
background-image: url(./assets/img/osc_fig0.png)
background-size: contain

---
background-image: url(./assets/img/osc_fig1.png)
background-size: contain

---
background-image: url(https://raw.githubusercontent.com/jvcasillas/media/master/teaching/gifs/cant_believe.gif)
background-size: contain

---

# So what?

### Is psychology doomed? What does this mean for linguistics?

### What if your paper does not replicate? What does this mean for the authors?

> Boldness in Conjecture, Austerity in Refutation  
> 
> \- Karl Popper (*The Logic of Scientific Discovery*, 1935)

- Make risky predictions (boldness in conjecture) and then try hard to disprove your own theory (austerity in refutation)
- Science cannot progress as long as you play it safe

### What is direct replication? Are there other types of replications?

---
background-image: url(https://raw.githubusercontent.com/jvcasillas/media/master/rstats/memes/os_replication_types.png)
background-size: contain

---
class: title-slide-section-grey, middle, center

# Do we know what we think we know? (probably not)

.big[
"**Innovation points out paths that are possible;  
replication points out paths that are likely;  
progress relies on both**" (OSC, 2015)
]

---
background-image: url(./assets/img/gonzales_lotto_fig2.png)
background-size: contain

---
class: title-slide-section-red
background-image: url(https://octodex.github.com/images/daftpunktocat-thomas.gif), url(https://github.githubassets.com/images/modules/logos_page/GitHub-Logo.png)
background-size: contain, 400px
background-position: 100% 50%, 20% 85%

# Version control I

---
background-image: url(https://octodex.github.com/images/puddle_jumper_octodex.jpg)
background-size: 300px
background-position: 90% 10%

# GitHub

### Motivation

.large[

- What is version control?

- Why is it important?

- How do we do it (normally)?

]

Box, OneDrive, Dropbox, Google Drive

---
background-image: url(./assets/img/vc1_github_01.png)
background-size: contain

---
background-image: url(./assets/img/vc1_github_02.png)
background-size: contain

---
background-image: url(./assets/img/vc1_github_03.png)
background-size: contain

---
background-image: url(./assets/img/vc1_github_04.png)
background-size: contain

---
background-image: url(https://octodex.github.com/images/hula_loop_octodex03.gif)
background-size: 300px
background-position: 90% 10%

# GitHub

### How do we do it for reproducible research?

.big[
- Git

- GitHub

- GitLab

- Bitbucket
]

---
background-image: url(https://octodex.github.com/images/labtocat.png)
background-size: 300px
background-position: 90% 10%

# GitHub

### Metaphors and terminology

- repo
- commit
- push
- pull
- pull request
- fetch
- fork
- clone
- issue
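
---

# A toy model of commits

To give some of the terms above something concrete to hang on, here is a toy model in Python. It is only a sketch of the idea, not how Git stores data: a "commit" is a snapshot plus a message and a pointer to its parent, a "repo" is the collection of commits, and a "clone" is a full copy of that history. The names `make_commit` and `log` and the file contents are hypothetical.

```python
import hashlib
import json
from typing import Optional


def make_commit(files: dict, message: str, parent: Optional[str]) -> dict:
    """Build a toy commit: a snapshot of files, a message, and a parent ID."""
    body = {"files": files, "message": message, "parent": parent}
    # The ID is a hash of the content, so any change produces a new ID.
    encoded = json.dumps(body, sort_keys=True).encode("utf-8")
    return {"id": hashlib.sha1(encoded).hexdigest(), **body}


def log(history: dict, head: str) -> list:
    """Walk parent pointers from head back to the first commit."""
    messages = []
    current = head
    while current is not None:
        commit = history[current]
        messages.append(commit["message"])
        current = commit["parent"]
    return messages


repo = {}  # the "repo": every commit, keyed by ID
first = make_commit({"README.md": "# Practice"}, "Add README", parent=None)
repo[first["id"]] = first
second = make_commit(
    {"README.md": "# Practice\nA glossary."}, "Describe project", parent=first["id"]
)
repo[second["id"]] = second

clone = dict(repo)  # a "clone" copies the whole history, not just the latest files
print(log(clone, second["id"]))  # ['Describe project', 'Add README']
```

Because each commit records its parent, the full history of a project can be recovered from the most recent commit alone.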

---
background-image: url(https://octodex.github.com/images/steroidtocat.png)
background-size: 300px
background-position: 90% 10%

# Exercise I

### Working online/forking

- Fork the github_practice repo
- Edit README.md
- Commit changes (make a comment)
- Submit pull request

--
- Add your definition of one of the terms in the glossary
- Commit changes (make a comment)
- Submit pull request

---
background-image: url(https://octodex.github.com/images/snowoctocat.png)
background-size: 300px
background-position: 90% 10%

# Exercise II

### Cloning

- Clone the github_practice repo onto your computer

--
- Add a folder whose name is your last name (e.g. 'casillas') 
- Create a README.md file using a text editor and put your   
favorite link in the body
- Save everything, commit the changes with an informative message
- Push changes to your cloned repo
- Check GitHub to see if it worked

--
- Submit pull request
- When everybody has finished, fetch changes

---
background-image: url(./assets/img/vc1_github_05.png)
background-size: contain

---
background-image: url(./assets/img/vc1_github_06.png)
background-size: contain

---
background-image: url(./assets/img/vc1_github_07.png)
background-size: contain

---
background-image: url(./assets/img/vc1_github_08.png)
background-size: contain

---
background-image: url(./assets/img/vc1_github_09.png)
background-size: contain

---
background-image: url(./assets/img/vc1_github_10.png)
background-size: contain

---
background-image: url(./assets/img/vc1_github_11.png)
background-size: contain

---
background-image: url(./assets/img/vc1_github_12.png)
background-size: contain

---
background-image: url(./assets/img/vc1_github_13.png)
background-size: contain

---
background-image: url(./assets/img/vc1_github_14.png)
background-size: contain

---
background-image: url(./assets/img/vc1_github_15.png)
background-size: contain

---
background-image: url(./assets/img/vc1_github_16.png)
background-size: contain

---
background-image: url(./assets/img/vc1_github_17.png)
background-size: contain

---
background-image: url(https://octodex.github.com/images/socialite.jpg)
background-size: contain

---
background-image: url(https://octodex.github.com/images/daftpunktocat-guy.gif)
background-size: 300px
background-position: 90% 10%

# Exercise III

### Research project

- Join my RAP group 
--
(seriously... @rap-group)

- Fork and clone `dpbe_l2_replication`

- With a partner, find an L2 journal that allows replication studies    
and include the link in the main README.md

---
layout:false
class: title-slide-final, middle
background-image: url(https://github.com/jvcasillas/ru_xaringan/raw/master/img/logo/ru_shield.png), url(https://cdn.freebiesupply.com/logos/large/2x/github-octocat-logo-png-transparent.png)
background-size: 55px, 100px
background-position: 9% 15%, 89% 15%

# Getting help

## If you have problems using GitHub 
## ask for help in the Slack channel

### You can find some very basic tutorials related to 
### R, RStudio, RMarkdown, GitHub, and Slack [here][here]

[here]: http://www.jvcasillas.com/ru_teaching/ru_spanish_589/589_01_s2018/sources/tuts/index.html