Computational Research Training Opportunities

This page provides an in-progress list of internal and external training opportunities in computational methods, statistical programming, and data science. If you know of an internal or external resource that is not yet listed below, please email the URL and a brief description to [email protected].

Internal Resources

The Data Science and Analytics XSeries is a three-course series offered through edX. Taught by a distinguished team of professors at Columbia University’s Data Science Institute, this XSeries is designed for anyone who wants to understand basic concepts in data science without getting into the weeds of programming. Aimed at organization leaders, business managers, health care professionals, and anyone considering a career in data science, the series grounds learners in the fundamentals of statistics, machine learning, and algorithms. It also introduces emerging technologies such as the Internet of Things (wirelessly connected products) and techniques that allow computers to summarize large volumes of text, audio, and video. Concrete examples throughout the series help learners grasp and master key concepts.

Contact: Brianne Cortese, Assistant Director of Admissions & Academic Affairs, Data Science Institute, [email protected]

Jointly founded by Columbia University’s Data Science Institute and Columbia Entrepreneurship, the Collaboratory@Columbia is a university-wide program dedicated to supporting collaborative curricular innovations designed to ensure that all Columbia University students receive the education and training they need to succeed in today’s data-rich world. Previously offered programming includes the Data Science Clinic and Data Science Bootcamps.

Contact: Michael Aresco, Associate Director of Finance and Operations, Columbia Entrepreneurship, [email protected]

The Computational Image Analysis (CIA) Lab in the Department of Radiology strives to advance radiology research and practice by developing image analysis methods for multiple clinical applications across therapeutic areas. Our primary research has focused on developing and validating new quantitative imaging biomarkers (e.g., tumor volume and texture features) derived from CT, MRI, and/or PET, using radiomics and deep learning approaches, to improve the prediction and assessment of tumor responses to novel therapies in oncology. Our mission also includes 1) familiarizing our residents and students with quantitative radiology, and 2) providing image de-identification/transfer and quantitative image analysis services.

Contact: Dr. Binsheng Zhao, Professor and Director of the CIA Lab, [email protected]

The Collaboratory@Columbia runs a free Data Science Bootcamp for Columbia University PhD students and postdoctoral scholars who are interested in extending their existing mathematical and programming skills to include training in data science. Designed by faculty and postdoctoral scholars from Columbia University’s Data Science Institute, the curriculum includes online learning material, introductory lectures, hands-on laboratory experiences, and a capstone project. Upcoming dates TBA.

Contact: Michael Aresco, Associate Director of Finance and Operations, Columbia Entrepreneurship, [email protected]

The Collaboratory Clinic is a free, appointment-based consulting service that offers in-person assistance to any member of the Columbia University community with planning and executing data-driven research projects, including help with data visualization, analysis, and prediction, both in conceptual terms and with concrete software implementations. We can provide help with:

  • Designing data-driven applications.
  • Data collection and storage issues.
  • Data analysis in Python, R, MATLAB, SQL, C++, Java, and other languages.
  • Choosing analysis and machine learning approaches.
  • Image recognition and image analysis.
  • Text processing, language understanding and topic modelling tasks.
  • Validating data-driven analysis.

Contact: Michael Aresco, Associate Director of Finance and Operations, Columbia Entrepreneurship, [email protected]

This series of three workshops covers the basics of Linux, shell scripts, and accessing Columbia’s HPC cluster.

A series of workshops and discussion groups for faculty, staff, and graduate students to support development of critical digital literacy as a standard part of teaching digital skills.

Foundations for Research Computing provides informal training for Columbia University graduate students and postdoctoral scholars to develop fundamental skills for harnessing computation: core languages and libraries, software development tools, best practices, and computational problem-solving. Topics span the full spectrum from beginner to advanced. Beyond training, the Foundations program aims to create a computational community at Columbia, bringing researchers from disparate fields together through the common thread of computation.

For more information, please email [email protected].

The Python User Group meets every two weeks for those who use Python in their research or are curious about the Python programming language. Each meeting features a workshop or a discussion about a specific use case for Python. At the end of the meeting, there is typically time for collaborative work, questions, or discussion with fellow researchers and practitioners.

For a list of upcoming Python User Group events, please see:

Research Data Services recognizes that a research project is a process that requires different skills at different stages. We provide support and consultation for the research data needs of members of the Columbia University community.

Open Labs

Map Club, provided by Research Data Services, is for those who want to learn more about mapping, spatial data and GIS. Open to beginners and experts, the club is a space to experiment with web-based libraries or frameworks and GIS tools.

Data Club, provided by Research Data Services, is for those who work with data and are interested in learning more about data analysis, management, and other aspects of the data life cycle. Open to beginners and experts, the club is a space to experiment with new libraries or techniques, usually in Python or R.

Online Resources

The Big Data to Knowledge (BD2K) program supports the research and development of innovative and transformative approaches and tools to maximize and accelerate the utility of big data and data science in biomedical research. As biomedical tools and technologies rapidly improve, researchers are producing and analyzing an ever-expanding amount of complex data called “big data.” Extracting useful knowledge from big data is a major limiting factor in understanding health and disease. BD2K facilitates data-driven discovery by improving our ability to harvest the wealth of information contained in biomedical big data.

DataCamp describes itself as a leader in data science education, offering skill-based training, technical innovation, and courses from the world's best educators.

This series of tutorials on data science engineering compares how different concepts in the discipline can be implemented in the two dominant ecosystems today: R and Python.

We take a neutral point of view. In our opinion, each environment has strengths and weaknesses, and any data scientist should know how to use both in order to be as prepared as possible for the job market or for starting a personal project.

To get a feel for the debate around this topic, we refer the reader to DataCamp's Data Science Wars infographic. The infographic explores the strengths of R over Python and vice versa, and aims to provide a basic comparison between the two programming languages from a data science and statistics perspective.

Rather than repeating that comparison, our series of tutorials goes hands-on into how to actually perform different data science tasks, such as working with data frames, doing aggregations, or building statistical models in the areas of supervised and unsupervised learning.

We will use real-world data sets, and we will build some real data products. This will help us to quickly transfer what we learn here to actual data analysis situations.

If you are interested in big data products, you may also find our series of tutorials on using Apache Spark with Python, or using R on Apache Spark (SparkR), useful.

Billed as a “leading online learning platform that helps anyone learn business, software, technology, and creative skills to achieve personal and professional goals,” this service offers a variety of courses, including introductory training in R, Python, and Linux basics and bash scripting.

An open-source curriculum for learning data science. Foundational in both theory and technologies, the OSDSM breaks down the core competencies necessary for making use of data.

A selection of online research data management training resources, materials, and programs, both locally hosted and external.