From 9bd4cf7b5937047fb22c494f34ed21fa11306968 Mon Sep 17 00:00:00 2001
From: "Luke W. Johnston"
Date: Tue, 7 May 2024 23:25:14 +0200
Subject: [PATCH] docs(slides): update slides with some more science and more "next steps"

---
 slides/what-next.qmd | 262 +++++++++++++++++++++++++------------------
 1 file changed, 155 insertions(+), 107 deletions(-)

diff --git a/slides/what-next.qmd b/slides/what-next.qmd
index 7cade97..5c8c446 100644
--- a/slides/what-next.qmd
+++ b/slides/what-next.qmd
@@ -3,6 +3,11 @@ search: false
 # Relative to main project
 bibliography: includes/references.bib
 csl: includes/vancouver.csl
+knitr:
+  opts_chunk:
+    dev: svg
+    dev.args:
+      bg: "transparent"
 ---
 
 # What next? Reproducibility in research
@@ -27,13 +32,78 @@ understand the logic of what you are doing, even if they can't directly
 reproduce the results.
 :::
 
-# Code sharing is abysmal across health sciences [@Considine2017a, @Rauh2019, @Evans2019, @Rauh2019a, @Hughes2019, @Peng2006a, @Seibold2021]
+## Few share code within health sciences [@Considine2017a; @Rauh2019; @Evans2019; @Rauh2019a; @Hughes2019; @Peng2006a; @Seibold2021]
+
+::: aside
+Few studies have examined code and data availability, or whether studies
+could be reproduced. The figure shows results of three "meta" studies: 1)
+[10.1177/2515245920918872](https://doi.org/10.1177/2515245920918872), 2)
+[10.1007/s11306-017-1299-3](https://link.springer.com/article/10.1007/s11306-017-1299-3),
+3)
+[10.1371/journal.pone.0251194](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0251194).
+:::
+
+```{r sharing}
+#| fig-width: 10
+#| fig-height: 4
+library(tidyverse)
+tribble(
+  ~study, ~text, ~value, ~total,
+  1,"Data available", 41, 62,
+  1,"Code available", 37, 62,
+  1,"Could reproduce", 21, 62,
+  2,"Data available", 2, 27,
+  2,"Code available", 1, 27,
+  2,"Could reproduce", 0, 27,
+  3,"Data available", 14, 57,
+  3,"Code available", 1, 57,
+  3,"Could reproduce", 7, 57,
+) %>%
+  mutate(
+    study = fct_recode(
+      as_factor(study),
+      "Registered Reports\nin Psychology (n=62) [1]" = "1",
+      "Systematic review of\nmetabolomics studies\n(n=27) [2]" = "2",
+      "Reproducing longitudinal\nanalyses in PLOS ONE\n(n=57) [3]" = "3"
+    ),
+    text = fct_rev(fct_inorder(text)),
+    value = round((value / total) * 100, 0)
+  ) %>%
+  ggplot(aes(x = text, y = value, label = paste0(value, "%"))) +
+  geom_col(fill = "gray20", width = 0.7) +
+  geom_text(nudge_y = 10) +
+  theme_minimal() +
+  coord_flip(ylim = c(0, 100)) +
+  facet_grid(cols = vars(study), scales = "free") +
+  labs(y = "Percent of articles",
+       title = "") +
+  theme(
+
+    text = element_text(size = 16),
+    axis.title.y = element_blank(),
+    panel.grid = element_blank(),
+    axis.text.x = element_blank(),
+    axis.title.x = element_text(colour = "grey20", size = 10),
+    axis.line = element_blank(),
+    panel.border = element_blank(),
+    panel.background = element_rect(fill = 'transparent', colour = NA),
+    plot.background = element_rect(fill = 'transparent', colour = NA)
+  )
+```
+
+::: notes
+- Estimating the reproducibility of scientific studies is currently
+  very difficult because of:
+  - Nearly non-existent publishing of code/data
+  - General lack of awareness of and training in it
+:::
 
 ## How can we check reproducibility if no code is given? {.center}
 
 Possible role models as research groups: [Jeff
 Leek](http://jtleek.com/codedata.html) and [Ben
-Marwick](https://faculty.washington.edu/bmarwick/#publications).
+Marwick](https://faculty.washington.edu/bmarwick/#publications). Or
+[Steno Aarhus' GitHub account](https://github.com/steno-aarhus/)!
 
 ::: notes
 But that doesn't even matter, because we can't have reproducibility if
@@ -53,161 +123,139 @@ very very few people who do. And this isn't a niche, this is a gaping
 hole in our modern scientific process. A huge hole.
 :::
 
-# Not going to lie, there are very strong...
+# Multiple benefits, from personal to philosophical
 
-## Institutional barriers
+## It's a core principle of the scientific method: Verification {.center}
 
-. . .
+## Learning more from others: From PhD students to senior researchers {.center}
 
-- Lack of adequate awareness, support, infrastructure, training
+## More exposure and visibility: More output to show and be seen {.center}
 
-. . .
+## So few are doing open science, this is a great niche! {.center}
 
-- Research culture values publications over all else
+## Easier and quicker collaboration (aside from the learning part) {.center}
 
-. . .
+## Finding better opportunities outside of academia {.center}
 
-- More traditional academics don't understand or resist change
+::: notes
+And the last one: one reason you don't see a lot of researchers
+sharing their code or being more reproducible is that they end up
+getting picked up by industry and paid really well, or they decide to
+leave academia because of the barriers.
 
-. . .
+Just as an example, I found a Norwegian group who had a really
+inefficient workflow and decided to re-build their workflow to make use
+of programming, to be reproducible, to have a pipeline. I looked up the
+lead author as well as several other of the co-authors and guess what...
+many of them now work in really great companies as data scientists or
+software engineers, probably making a lot of money and having
+potentially a less stressful life.
+:::
 
-- 'Business as usual' is easier
+# Strong institutional barriers, such as ... {.center}
 
 ::: notes
 You will encounter a lot of resistance, a lot of barriers and hardship.
+:::
+
+## Lack of adequate awareness, support, infrastructure, training {.center}
 
+::: notes
 At the institutional level, there is no real awareness of this, no
 support or infrastructure. You're basically doing this on your own.
 Which probably isn't that uncommon anyway.
 
-Research culture and incentives pretty much only care about publishing
-journal articles. Creating software tools, meh. Making teaching
-materials to help other researchers, meh. Communicating your science to
-the public, meh. Doing actual science that might take years and not lead
-to any "hard papers", meh.
-
-We have a large portion of traditional academics who have benefitted
-from and succeeded in this system and are invested in continuing it.
-Probably because they don't understand the scope of the problem or just
-resist change.
-
-We have a system that favours each individual person repeating the same
-mistakes that others make because the system doesn't allow for us to
-take the time to create tools and infrastructure that helps ourselves
-and others out.
-
-Because business as usual is the easiest way in the short term. Our
-current scientific culture is just not prepared for this, for the rising
-modern analytic and computational era.
+Your organization is moving in the right direction to resolve this
+issue, but actions speak louder than words.
 :::
 
-## ...and personal barriers
+## Research culture values publications over all else {.center}
 
-- Fear of:
-  - Fear of being scooped or ideas being stolen
-  - Not being credited for ideas
-  - Errors and public humiliation
-  - Risk to reputation
-
-. . .
+What would you spend your time on if we didn't have this
+publication-obsession?
 
-- Need to constantly stay updated
+::: notes
+Research culture and incentives pretty much only care about publishing
+journal articles. Creating software tools or datasets to be shared, meh.
+Making teaching materials to help other researchers, meh. Communicating
+your science to the public and doing outreach, meh. Doing actual science
+that might take years and not lead to any "hard papers", meh.
+
+Imagine if the number of publications and where you published didn't
+matter for getting funding or getting a research job. What would you
+spend your time on? What would you do differently compared to now?
+:::
 
-- Finding better opportunities outside of academia
+## Legal and privacy concerns about sharing data, intellectual property protection, patents {.center}
 
-::: aside
-More detail on barriers here: [Tennant
-(2017)](https://doi.org/10.6084/m9.figshare.5383711.v1)
+::: notes
+Legal and privacy concerns are big topics that institutions in
+particular focus on a lot, about ownership and so on, since research can
+lead to commercialization and the potential for profit. For individual
+researchers, we often worry about these concerns too much, and that
+sometimes stops us from doing work because we're afraid we're doing
+something wrong.
 :::
 
+# Strong personal barriers like ... {.center}
+
+## Fear of ... {.center}
+
+- Fear of being scooped or ideas being stolen
+- Errors and public humiliation
+
 ::: notes
 And there aren't just institutional barriers. We as researchers have
 fears of being scooped, of embarrassment and humiliation for your
 methods being *gasp* wrong. Which is actually just part of science.
+:::
 
-You also have to constantly stay updated, and that can be tiring.
-
-And the last barrier, which may actually be a benefit, is that one
-reason you don't see a lot of researchers sharing their code or being
-more reproducible is.. they end up getting picked up by industry and
-paid really well or decide to leave academia for the reasons I
-mentioned.
+## Overwhelmed with everything that we *should* do better {.center}
 
-Just as an example, I found a Norwegian group who had a really
-inefficient workflow and decided to re-build their workflow to make use
-of programming, to be reproducible, to have a pipeline. I looked up the
-lead author as well as several other of the co-authors and guess what...
-many of them now work in really great companies as data scientists or
-software engineers, probably making a lot of money and having
-potentially a less stressful life.
+::: notes
+It is also really overwhelming, having so many things to think about to
+make sure you're doing solid science. No researcher in the past had to
+consider and think and know as much as we have to know and to do.
+This is another reason we need more team science, to distribute the
+tasks and skills.
 :::
 
-# So... what you can do right now?
-
-## Easiest thing: Start sharing your code {.center}
+## Need to constantly stay updated {.center}
 
 ::: notes
-If you do nothing else: share your code.
-
-If its ugly, that's fine! The point is you start and that you get more
-comfortable doing it until it becomes second nature to share and in the
-process, your code gets better because you know someone might look at
-your code.
+You also have to constantly stay updated, and that can be tiring.
 :::
 
-## How do you share?
+# So... what can you do right now to be more open and reproducible?
 
-- [GitHub](https://github.com/)
+## Follow some core principles {.center}
 
-- [Zenodo](https://zenodo.org/)
+- Use open source tools wherever possible
 
-- [figshare](https://figshare.com/)
+- Use plain text as often as possible
 
-- [Open Science Framework](https://osf.io/)
+- Upload and share publicly early and often (e.g. to GitHub or
+  [Zenodo](https://zenodo.org))
 
-::: notes
-How do you share? Put your code up on any of these sites. I prefer a
-combination of GitHub and Zenodo, but the others are also quite good as
-well.
-
-When do you share? I say right away. As soon as I have an analysis
-project, my code is up on either GitHub or GitLab (another service like
-GitHub). Alternatively, you can upload it when you also finish your
-manuscript.
-:::
+- Upload and share publicly as many things as possible
 
-## What else can you do?
-
-- Find or start building a community of people using R
+- Archive to get a DOI/version for major milestones
 
-- Start doing code reviews within your group
+## Use social actions to be more open
 
-- Start new projects or collaborations by:
+- Do code/paper reviews *through GitHub*
 
-  - Using R, Quarto / R Markdown, Git, and GitHub
-  - Aiming to share and be reproducible
+. . .
 
-::: aside
-Code reviews: Reviewing each others code like you would review a
-manuscript.
-:::
+- Require writing *everything* in Markdown
 
-::: notes
-The other things you can start doing is find or start building a
-community of people who also use R or are doing reproducibility or any
-other computational work. Use them as support and help and also give
-back too.
-
-Start doing code reviews in your research group. Code review would be
-where you look over each others code, check that it works, check that it
-makes sense, that it's readable and understandable. The nice thing with
-doing code reviews is that it dispels the mystery around code and about
-criticising it and trying to improve it. We review manuscripts, why not
-code? I personally though have had a really hard time getting groups
-I've been part of now and in the past to do this, but baby steps.
-:::
+. . .
+
+- Agree on a standard folder and file structure for projects
+
+## Teach others! {.center}
 
-## And teaching others is a great way to learn :wink: {.center}
+It's also a great way to learn :wink: :wink:
 
 ::: notes
 Lastly.. you can teach. Teach others. Use these teaching materials. Or