Data analysis workflows in many scientific domains have become increasingly complex and flexible. Here we assess the effect of this flexibility on the results of functional magnetic resonance imaging by asking 70 independent teams to analyse the same dataset, testing the same nine ex-ante hypotheses (1). The flexibility of analytical approaches is exemplified by the fact that no two teams chose identical workflows to analyse the data. This flexibility resulted in sizeable variation in the results of hypothesis tests, even for teams whose statistical maps were highly correlated at intermediate stages of the analysis pipeline. Variation in reported results was related to several aspects of analysis methodology. Notably, a meta-analytical approach that aggregated information across teams yielded a significant consensus in activated regions. Furthermore, prediction markets of researchers in the field revealed an overestimation of the likelihood of significant findings, even by researchers with direct knowledge of the dataset (2–5). Our findings show that analytical flexibility can have substantial effects on scientific conclusions, and identify factors that may be related to variability in the analysis of functional magnetic resonance imaging. The results emphasize the importance of validating and sharing complex analysis workflows, and demonstrate the need for performing and reporting multiple analyses of the same data. Potential approaches that could be used to mitigate issues related to analytical variability are discussed.
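Two of the quantitative steps the abstract alludes to, comparing teams' unthresholded statistical maps and aggregating them into a consensus, are easy to sketch. The Python snippet below is a minimal illustration under stated assumptions, not the authors' pipeline: team_maps is a hypothetical array of per-team z-maps filled with random placeholder data, between-team similarity is taken as Spearman correlation of map pairs, and the consensus uses a plain Stouffer combination, whereas the paper's consensus analysis additionally accounted for dependence between teams.

```python
# Illustrative sketch only -- placeholder data, not the NARPS pipeline.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_teams, n_voxels = 70, 10_000
# Hypothetical (n_teams, n_voxels) array of per-team unthresholded
# z-maps for one hypothesis (e.g. as shared via NeuroVault).
team_maps = rng.standard_normal((n_teams, n_voxels))

# 1) Between-team similarity: Spearman correlation of each pair of maps
# (Pearson correlation of rank-transformed maps).
ranks = stats.rankdata(team_maps, axis=1)
corr = np.corrcoef(ranks)                      # (n_teams, n_teams)
upper = corr[np.triu_indices(n_teams, k=1)]    # unique team pairs
print(f"median between-team map correlation: {np.median(upper):.2f}")

# 2) Consensus map via Stouffer's method: z_meta = sum(z_i) / sqrt(n),
# valid for independent maps; the published consensus analysis went
# further and modelled the correlation between teams' maps.
z_meta = team_maps.sum(axis=0) / np.sqrt(n_teams)
p_meta = stats.norm.sf(z_meta)                 # one-sided p per voxel
print(f"voxels with uncorrected p < 0.05: {(p_meta < 0.05).sum()}")
```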
Disciplines:
Neurosciences & behavior
Author, co-author:
Botvinik-Nezer, Rotem
Holzmeister, Felix
Camerer, Colin F.
Dreber, Anna
Huber, Juergen
Johannesson, Magnus
Kirchler, Michael
Iwanir, Roni
Mumford, Jeanette A.
Adcock, R. Alison
Avesani, Paolo
Baczkowski, Blazej M.
Bajracharya, Aahana
Bakst, Leah
Ball, Sheryl
Barilari, Marco
Bault, Nadège
Beaton, Derek
Beitner, Julia
Benoit, Roland G.
Berkers, Ruud M. W. J.
Bhanji, Jamil P.
Biswal, Bharat B.
Bobadilla-Suarez, Sebastian
Bortolini, Tiago
Bottenhorn, Katherine L.
Bowring, Alexander
Braem, Senne
Brooks, Hayley R.
Brudner, Emily G.
Calderon, Cristian B.
Camilleri, Julia A.
Castrellon, Jaime J.
Cecchetti, Luca
Cieslik, Edna C.
Cole, Zachary J.
Collignon, Olivier ; Université de Liège - ULiège > Département des sciences cliniques
Cox, Robert W.
Cunningham, William A.
Czoschke, Stefan
Dadi, Kamalaker
Davis, Charles P.
De Luca, Alberto
Delgado, Mauricio R.
Demetriou, Lysia
Dennison, Jeffrey B.
Di, Xin
Dickie, Erin W.
Dobryakova, Ekaterina
Donnat, Claire L.
Dukart, Juergen
Duncan, Niall W.
Durnez, Joke
Eed, Amr
Eickhoff, Simon B.
Erhart, Andrew
Fontanesi, Laura
Fricke, G. Matthew
Fu, Shiguang
Galván, Adriana
Gau, Remi
Genon, Sarah ; Université de Liège - ULiège > CRC In vivo Imaging-Aging & Memory
References:
Botvinik-Nezer, R. et al. fMRI data of mixed gambles from the Neuroimaging Analysis Replication and Prediction Study. Sci. Data 6, 106 (2019). DOI: 10.1038/s41597-019-0113-7
Dreber, A. et al. Using prediction markets to estimate the reproducibility of scientific research. Proc. Natl Acad. Sci. USA 112, 15343–15347 (2015). DOI: 10.1073/pnas.1516179112
Camerer, C. F. et al. Evaluating replicability of laboratory experiments in economics. Science 351, 1433–1436 (2016). DOI: 10.1126/science.aaf0918
Camerer, C. F. et al. Evaluating the replicability of social science experiments in Nature and Science between 2010 and 2015. Nat. Hum. Behav. 2, 637–644 (2018). DOI: 10.1038/s41562-018-0399-z
Forsell, E. et al. Predicting replication outcomes in the Many Labs 2 study. J. Econ. Psychol. 75, 102117 (2019). DOI: 10.1016/j.joep.2018.10.009
Wicherts, J. M. et al. Degrees of freedom in planning, running, analyzing, and reporting psychological studies: a checklist to avoid P-hacking. Front. Psychol. 7, 1832 (2016). DOI: 10.3389/fpsyg.2016.01832
Simmons, J. P., Nelson, L. D. & Simonsohn, U. False-positive psychology: undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychol. Sci. 22, 1359–1366 (2011). DOI: 10.1177/0956797611417632
Carp, J. On the plurality of (methodological) worlds: estimating the analytic flexibility of FMRI experiments. Front. Neurosci. 6, 149 (2012). DOI: 10.3389/fnins.2012.00149
Silberzahn, R. et al. Many analysts, one data set: making transparent how variations in analytic choices affect results. Adv. Methods Pract. Psychol. Sci. 1, 337–356 (2018). DOI: 10.1177/2515245917747646
Tom, S. M., Fox, C. R., Trepel, C. & Poldrack, R. A. The neural basis of loss aversion in decision-making under risk. Science 315, 515–518 (2007). DOI: 10.1126/science.1134239
De Martino, B., Camerer, C. F. & Adolphs, R. Amygdala damage eliminates monetary loss aversion. Proc. Natl Acad. Sci. USA 107, 3788–3792 (2010). DOI: 10.1073/pnas.0910230107
Canessa, N. et al. The functional and structural neural basis of individual differences in loss aversion. J. Neurosci. 33, 14307–14317 (2013). DOI: 10.1523/JNEUROSCI.0497-13.2013
Esteban, O. et al. fMRIPrep: a robust preprocessing pipeline for functional MRI. Nat. Methods 16, 111–116 (2019). DOI: 10.1038/s41592-018-0235-4
Acikalin, M. Y., Gorgolewski, K. J. & Poldrack, R. A. A coordinate-based meta-analysis of overlaps in regional specialization and functional connectivity across subjective value and default mode networks. Front. Neurosci. 11, 1 (2017). DOI: 10.3389/fnins.2017.00001
Gorgolewski, K. J. et al. NeuroVault.org: a web-based repository for collecting and sharing unthresholded statistical maps of the human brain. Front. Neuroinform. 9, 8 (2015). DOI: 10.3389/fninf.2015.00008
Nosek, B. A., Ebersole, C. R., DeHaven, A. C. & Mellor, D. T. The preregistration revolution. Proc. Natl Acad. Sci. USA 115, 2600–2606 (2018). DOI: 10.1073/pnas.1708274114
Nosek, B. A. & Lakens, D. Registered reports: a method to increase the credibility of published results. Soc. Psychol. 45, 137–141 (2014). DOI: 10.1027/1864-9335/a000192
Markiewicz, C., De La Vega, A., Yarkoni, T., Poldrack, R. & Gorgolewski, K. FitLins: reproducible model estimation for fMRI. Poster W621 in 25th Annual Meeting of the Organization for Human Brain Mapping (OHBM, 2019).
Simonsohn, U., Simmons, J. P. & Nelson, L. D. Specification curve: descriptive and inferential statistics on all reasonable specifications. Preprint at SSRN (2015). DOI: 10.2139/ssrn.2694998
Patel, C. J., Burford, B. & Ioannidis, J. P. A. Assessment of vibration of effects due to model specification can demonstrate the instability of observational associations. J. Clin. Epidemiol. 68, 1046–1058 (2015). DOI: 10.1016/j.jclinepi.2015.05.029
Steegen, S., Tuerlinckx, F., Gelman, A. & Vanpaemel, W. Increasing transparency through a multiverse analysis. Perspect. Psychol. Sci. 11, 702–712 (2016). DOI: 10.1177/1745691616658637
LaConte, S. et al. The evaluation of preprocessing choices in single-subject BOLD fMRI using NPAIRS performance metrics. Neuroimage 18, 10–27 (2003). DOI: 10.1006/nimg.2002.1300
Gorgolewski, K. J. et al. The brain imaging data structure, a format for organizing and describing outputs of neuroimaging experiments. Sci. Data 3, 160044 (2016). DOI: 10.1038/sdata.2016.44
Tversky, A. & Kahneman, D. Advances in prospect theory: cumulative representation of uncertainty. J. Risk Uncertain. 5, 297–323 (1992). DOI: 10.1007/BF00122574
Nichols, T. E. et al. Best practices in data analysis and sharing in neuroimaging using MRI. Nat. Neurosci. 20, 299–303 (2017). DOI: 10.1038/nn.4500
Bates, D., Mächler, M., Bolker, B. & Walker, S. Fitting linear mixed-effects models using lme4. J. Stat. Softw. 67, 1–48 (2015). DOI: 10.18637/jss.v067.i01
Lubke, G. H. et al. Assessing model selection uncertainty using a bootstrap approach: an update. Struct. Equ. Modeling 24, 230–245 (2017). DOI: 10.1080/10705511.2016.1252265
Abraham, A. et al. Machine learning for neuroimaging with scikit-learn. Front. Neuroinform. 8, 14 (2014). DOI: 10.3389/fninf.2014.00014
Hughett, P. Accurate computation of the F-to-z and t-to-z transforms for large arguments. J. Stat. Softw. 23, 1–5 (2007). DOI: 10.18637/jss.v023.c01
Turkeltaub, P. E., Eden, G. F., Jones, K. M. & Zeffiro, T. A. Meta-analysis of the functional neuroanatomy of single-word reading: method and validation. Neuroimage 16, 765–780 (2002). DOI: 10.1006/nimg.2002.1131
Eickhoff, S. B. et al. Behavior, sensitivity, and power of activation likelihood estimation characterized by massive empirical simulation. Neuroimage 137, 70–85 (2016). DOI: 10.1016/j.neuroimage.2016.04.072
Eklund, A., Nichols, T. E. & Knutsson, H. Cluster failure: why fMRI inferences for spatial extent have inflated false-positive rates. Proc. Natl Acad. Sci. USA 113, 7900–7905 (2016). DOI: 10.1073/pnas.1602413113
Yarkoni, T., Poldrack, R. A., Nichols, T. E., Van Essen, D. C. & Wager, T. D. Large-scale automated synthesis of human functional neuroimaging data. Nat. Methods 8, 665–670 (2011). DOI: 10.1038/nmeth.1635
Open Science Collaboration. Estimating the reproducibility of psychological science. Science 349, aac4716 (2015). DOI: 10.1126/science.aac4716
Arrow, K. J. et al. The promise of prediction markets. Science 320, 877–878 (2008). DOI: 10.1126/science.1157679
Wolfers, J. & Zitzewitz, E. Interpreting prediction market prices as probabilities. NBER Working Paper 12200 (2006). DOI: 10.3386/w12200
Manski, C. F. Interpreting the predictions of prediction markets. Econ. Lett. 91, 425–429 (2006). DOI: 10.1016/j.econlet.2006.01.004
Fountain, J. & Harrison, G. W. What do prediction markets predict? Appl. Econ. Lett. 18, 267–272 (2011). DOI: 10.1080/13504850903559575
Hanson, R. Logarithmic market scoring rules for modular combinatorial information aggregation. J. Prediction Markets 1, 3–15 (2007).
Chen, Y. Markets as an Information Aggregation Mechanism for Decision Support. PhD thesis, Penn State Univ. (2005).