Data Management
Data management describes the actions taken during each stage of the data lifecycle that define how data are collected, stored, secured, and disseminated. Best practices for data management vary by discipline, principal investigator (PI), and project.
A fundamental understanding of data management helps ensure data accessibility and integrity, and it makes writing a Data Management Plan (DMP) easier. To learn more about writing a DMP, including templates and what federal agencies require, visit the Writing a Data Management Plan webpage.
Data Management Questions
Data management begins with asking the right questions about how data will be collected, stored, organized, and shared. The following questions can help you get started:
- What types of data are collected?
- How much data (file size) will be collected?
- How quickly will data accumulate?
- What are the likely file formats?
- How unique are the data and how often will backups be performed?
- Will the data be collected from a third-party source?
- What data tools are available?
- Are the data part of a collaboration that needs to be shared regularly and frequently?
- Who needs access to the data?
- How long will the data need to be kept?
- What are the data retention policies of the funder, the journal, and Columbia University?
- Who owns the data?
- What is the classification of the data, and what security measures need to be put in place?
- Will the data be shared to a public database?
- What problems have you encountered previously when managing data?
- What kind of DMP does the funder require?
Data Management Resources
Tutorials and Guidelines
- The ReaDI Program has created several tutorials (below) and identified guidelines to help researchers manage data during the collection phase of research. The ReaDI Program also offers data management consulting and presentations (Columbia researchers only).
- Tutorial
- Best Practices for Data Management when Using Instrumentation
- Description
- Tips and best practices for collecting, saving, and processing data from instruments
- Tutorial
- Good Laboratory Notebook Practices
- Description
- Tips and best practices for maintaining a laboratory notebook
- Tutorial
- Guidelines on the Organization of Samples in a Laboratory
- Description
- Tips on managing, identifying, and preserving non-clinical samples
Columbia University Data Management Consulting Services
- The Statistical Analysis Center's Data Management Services are available to anyone at Columbia and can help with all aspects of data management, including administrative systems. Services include:
- Case report form design
- Database design
- Database hosting
- Custom user interface design (web, desktop, telephone, etc.)
- Data system design (data for analysis, logistical data, personnel data, financial data, etc.)
- Report design
- Database querying and data set generation
- REDCap hosting and development
- Research Data Services, part of the Columbia University Libraries, can help with many aspects of the research data lifecycle, including research data management, finding data, cleaning and understanding data, and mapping and visualizing data.
- The Irving Institute for Clinical and Translational Research offers a free one-hour consultation to discuss data management requirements, help design a data management plan with associated budget requirements, or provide guidelines for moving data into a properly formatted, secure environment.
- Resource
- Data Management for Researchers by Kristin Briney
- Description
- A comprehensive guide to everything scientists need to know about data management, this book is essential for researchers who need to learn how to organize, document and take care of their own data. Text adapted from Amazon.
- Resource
- Responsible Conduct of Research - Data Management Module from Office of Research Integrity
- Description
- The Office of Research Integrity (ORI) oversees and directs Public Health Service (PHS) research integrity activities on behalf of the Secretary of Health and Human Services with the exception of the regulatory research integrity activities of the Food and Drug Administration.
These modules have been compiled from multiple institutions.
- Resource
- Folder Hierarchy Best Practices for Digital Asset Management
- Description
- When a folder hierarchy is shared between multiple people or departments (such as a shared file server), things often get messy because everyone thinks about organizing and finding files in different ways. This article takes an in-depth look at why folder hierarchies are important and provides best practices for folder organization. (Text from article)
- Resource
- Research Data Management: A Primer by NISO
- Description
- This primer covers the basics of research data management, with the goal of helping researchers and those who support them become better data stewards. (adapted from text)
- Resource
- The FAIR Guiding Principles for scientific data management and stewardship
- Description
- A Comment published in Nature's Scientific Data describing the FAIR Principles, which put specific emphasis on enhancing the ability of machines to automatically find and use data, in addition to supporting its reuse by individuals.
- Resource
- DataONE Best Practices database
- Description
- The DataONE Best Practices database provides individuals with recommendations on how to effectively work with their data through all stages of the data lifecycle.
- File Naming Convention (University of Illinois)
- Guide to writing README.txt for metadata (Cornell University)
- Tool
- REDCap
- Description
- A secure web application for building and managing online surveys and databases.
Columbia data management consulting for REDCap (Columbia researchers only)
- Tool
- LabArchives
- Description
- A cloud-based Electronic Research Notebook that replaces traditional paper notebooks in professional research labs and higher education laboratory courses. LabArchives is available free of charge to any Columbia researcher with a valid UNI.
- Tool
- Globus
- Description
- Subscribers can move, share, publish and discover data via a single interface. Globus is available at no cost to Columbia researchers.
- Tool
- openrefine.org
- Description
- A tool for working with messy data: cleaning it; transforming it from one format into another; and extending it with web services and external data.
- Tool
- Software Tools Catalog (from DataONE.org)
- Description
- The Software Tools database is the product of two NSF-funded Informatics Education Planning Workshops hosted by DataONE. The database provides a brief description of a wide range of tools that are recommended for use by scientists and students, as well as additional information and links to further resources. Users can access tools within the database by selecting keywords (under advanced search) or using free search.