Statistical Analysis
The misuse of statistical analyses can cause irreproducible and misleading results (1). These resources have been selected to help researchers better understand the importance of choosing appropriate statistical analyses and are not intended to replace formal statistical training or consultation services.
1.) Weak statistical standards implicated in scientific irreproducibility by Erika Check Hayden and Scientific Methods: Statistical Errors by Regina Nuzzo
Guidelines, Literature and Blogs
- Ten Simple Rules for Effective Statistical Practice by Robert E. Kass et al.
- Beyond Rigor: Appropriate Analysis by Patricia Campbell and Eric Jolly
- A statistical definition for reproducibility and replicability by Prasad Patil, Roger D. Peng, and Jeffrey Leek
- Beyond subjective and objective in statistics by Andrew Gelman and Christian Hennig
- The Statistics Decision Tree: The Decision Tree helps you select statistics or statistical techniques appropriate for the purpose and conditions of a particular analysis, and then select the MicrOsiris commands that produce them or find the corresponding SPSS and SAS commands.
- Ten common statistical mistakes to watch out for when writing or reviewing a manuscript by Tamar R Makin and Jean-Jacques Orban de Xivry
- Understanding Statistics and Experimental Design (How to not lie with statistics) by Michael H. Herzog, Gregory Francis, and Aaron Clarke
- Reproducing Statistical Results by Victoria Stodden
- The fickle P value generates irreproducible results by Halsey, LG et al. (Nature Methods)
- The ASA's Statement on p-Values: Context, Process, and Purpose by Wasserstein, RL and Lazar, NA
- Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations by Greenland, S. et al.
- Statisticians issue warning over misuse of P values by Monya Baker (Nature)
- Some natural solutions to the p-value communication problem—and why they won’t work by Andrew Gelman and John Carlin
- The Interactive Statistical Pages project represents an ongoing effort to develop and disseminate statistical analysis software in the form of web pages. Utilizing HTML forms, CGI and Perl scripts, Java, JavaScript and other browser-based technologies, each web page contains within it (or invokes) all the programming needed to perform a particular computation or analysis.
- Statistical Modeling, Causal Inference, and Social Science from Andrew Gelman, professor of statistics and political science and director of the Applied Statistics Center at Columbia University.
- Unbiased Research: Statistical Design and Analysis of Experiments. A blog written mostly by biomedical PhD students at Emory University who are taking a course on "Statistical Design and Analysis of Experiments".
- StatsBlogs: syndicates posts from statistics-related blogs and brings traffic and user interaction to contributing blogs. It is a service by Talk Stats Forum, the #1 statistics forum with >10k visitors daily. We are dedicated to facilitating information sharing and exchange in the statistics community.
- Simply Statistics: We are three biostatistics professors (Jeff Leek, Roger Peng, and Rafa Irizarry) who are fired up about the new era where data are abundant and statisticians are scientists.

Consulting Services
The services below are available to Columbia researchers at rates ranging from no-cost to fee-for-service.
- The Biostatistics, Epidemiology and Research Design Resource (BERD) provides a wide range of design, statistical, and analytical support services to assist CUIMC faculty members in garnering grant support and publishing study results. In conjunction with the Department of Biostatistics of the Mailman School of Public Health, BERD provides support through consultations and educational initiatives.
- Biostatistics Resource in Design, Grants and Evaluation (BRIDGE)
- Consulting service is available at no cost to CUIMC faculty for projects that require 1-2 sessions
- Fee-for-service consulting available for work that can be completed during the consulting service session
- Collaborations with faculty members in the Department of Biostatistics are available if a project requires a longer-term statistical consulting relationship
- Department of Statistics Consulting Services
The Department of Statistics offers free statistical consulting to the Columbia community. Consulting is available by appointment only.
- Statistical Consulting Center
The Statistical Analysis Center (SAC) at Columbia University’s Mailman School of Public Health is an experienced team of experts dedicated to providing state-of-the-art statistical, data, logistical and regulatory support for clinical research. These services are available to anybody conducting clinical experiments and randomized clinical trials.
Courses and Lectures
- Biostatistics in Action: Tips for Clinical Researchers Lecture Series - a series of monthly lectures geared towards clinical researchers and research staff who are interested in gaining insight into fundamental design and statistical concepts with a focus on practical knowledge. All interested are welcome to attend (no prior statistical experience required).
- Statistical Software Mini-Courses - A two-part mini-course on getting started with statistical software. The mini-course covers the basics of statistical programming in R and SAS. Topics include data manipulation, descriptive statistics and basic analyses. Statistical Software Mini-Courses are offered once per year. Open to the Columbia community at no cost.
Center for Open Science (COS)
COS, which maintains the Open Science Framework (OSF), has developed a series of online workshops as part of its statistical and methodological consulting services. These materials are free and available online; the webinars are also available on OSF's YouTube channel.
edX Course: Principles, Statistical and Computational Tools for Reproducible Science
Learn skills and tools that support data science and reproducible research to ensure you can trust your research results, reproduce them yourself, and communicate them to others.
This free course covers fundamentals of reproducible science, case studies, data provenance, statistical methods for reproducible science, computational tools for reproducible science, and reproducible reporting science. These concepts are intended to translate to fields throughout the data sciences: physical and life sciences, applied mathematics and statistics, and computing.
Consider this course a survey of best practices that will help you create an environment in which you can easily carry out reproducible research and integrate with similar situations for your collaborators and colleagues.
Johns Hopkins University Data Science Lab
The major educational initiative of the JHUDSL is to create open-source online courses delivered through a range of platforms including YouTube, GitHub, Leanpub, and Coursera. There are four active MOOC programs that you can enroll in at any time. Join over 8 million other students in taking a course produced by the Johns Hopkins Data Science Lab!
Statistics How To
A statistics guide for beginners. Statistics How To has more than 1,000 articles and hundreds of videos for elementary statistics, probability, AP statistics and advanced statistics topics.
Courses From Simply Stats
- Data analysis for life sciences: A series of 7 classes that teach R and statistics for health sciences applications, with a particular focus on genomic technologies. The classes were built by Rafael Irizarry and Mike Love. You can find and sign up for all the classes on their web site.
- Genomic Data Science Specialization on Coursera: A 7-course sequence focused on teaching tools for analyzing genomic data. The classes were built by Jeff Leek, Steven Salzberg, James Taylor, Ela Pertea, Liliana Florea, Ben Langmead, and Kasper Hansen. You can find and sign up for all the classes on Coursera.
Courses from Lynda (access with UNI)
Tips for working with a statistician

Statistical Resources by Discipline
- Degrees of Freedom in Planning, Running, Analyzing, and Reporting Psychological Studies: A Checklist to Avoid p-Hacking by Wicherts et al. (Frontiers in Psychology)
- A practical solution to the pervasive problems of p values by Wagenmakers (Psychonomic Bulletin & Review)
- False-positive psychology: undisclosed flexibility in data collection and analysis allows presenting anything as significant by Simmons, JP; Nelson, LD; Simonsohn, U (Psychological Science)
- The Misuse and Abuse of Statistics in Biomedical Research by Matthew S. Thiese, Zachary C. Arnold and Skyler D. Walker
- Know Your Chances: Understanding Health Statistics by Steven Woloshin, MD, MS, Lisa M. Schwartz, MD, MS, and H. Gilbert Welch, MD, MPH
- Statistics for Biologists is a collection of articles addressing important statistical issues that biologists should be aware of and provides practical advice to help them improve the rigor of their work (text adapted from Nature)
- Statistics for Experimental Biologists
This website was started to solve two related problems: 1. How to connect researchers with the information they need to do their jobs properly. 2. How to improve the quality of preclinical biomedical science. It has been developed specifically for laboratory-based experimental biologists, and therefore the examples will be familiar and relevant to anyone with such a background. The articles consist of "how-to" topics (including what not to do), key concepts and ideas, key papers and books (all suitable for biologists), and the occasional opinion piece. It is assumed that readers will have taken a first course in statistics and are familiar with t-tests, ANOVA, and regression.
- Computing Workflows for Biologists: A Roadmap by Ashley Shade and Tracy K. Teal
