Data Management

Data management describes the actions taken during the different stages of the data lifecycle which define how data are collected, stored, secured, and disseminated. Data management best practices are defined by discipline, PI, or project.

A fundamental understanding of data management helps you write a Data Management Plan (DMP) and helps ensure data accessibility and integrity. To learn more about writing a DMP, including templates and what federal agencies require, visit the Writing a Data Management Plan webpage.

Good Research Data Management (GRDM) is essential to rigorous and reproducible scientific research. Lapses in research data management can lead to questionable data and cause lasting damage to a researcher’s reputation and their ability to receive federal funds, as well as an erosion of the public’s confidence in the scientific community.

Benefits of GRDM

Data management is a process that includes collecting, validating, storing, protecting, sharing, and processing data to enable accessibility and reliability of the data for its users. GRDM is important for several reasons:

  • Maintains your data integrity – Properly documenting and managing your data increases the reproducibility of your work and, as a result, the validity of your results.
  • Improves your research impact – Maintaining accessible and reliable data allows you to readily share your raw datasets and can increase the “relevance” of your research. Re-using and re-purposing data can lead to “unanticipated” new discoveries and can provide raw material for researchers with little funding.
  • Saves you time – Planning ahead and confronting obstacles early prevents headaches down the road and saves you both time and money.
  • Supports long-term data preservation – Properly preserving your data in a data repository makes it accessible and discoverable for years to come and safeguards your “research contribution” for the research community.
  • Allows you to meet funding/grant requirements – Most funding agencies, including the NIH (starting Jan 2023), require that you properly manage, document, and share your data.
  • Allows you to satisfy requirements for journal publications – Many journals now require that published articles be accompanied by the underlying raw research data.

 

References

  1. Briney KA, Coates H, Goben A (2020) Foundational Practices of Research Data Management. Research Ideas and Outcomes 6: e56508.
  2. Wilkinson M, Dumontier M, Aalbersberg I, et al. (2016) The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data 3: 160018. https://doi.org/10.1038/sdata.2016.18
  3. Borer ET, Seabloom EW, Jones MB, Schildhauer M (2009) Some Simple Guidelines for Effective Data Management. Bulletin of the Ecological Society of America 90(2): 205-214.

Data management begins with asking the right questions about how data will be collected, stored, shared, and organized. Below is a list of questions that can help you get started:

  • What types of data are collected?
     
  • How much data (file size) will be collected?
     
  • How quickly will data accumulate?
     
  • What are the likely file formats?
     
  • How unique are the data and how often will backups be performed?
     
  • Will the data be collected from a third-party source?
     
  • What data tools are available?
     
  • Are the data part of a collaboration that needs to be shared regularly and frequently?
     
  • Who needs access to the data?
     
  • How long will the data need to be kept?
     
  • What are the data retention policies of the funder, the journal, and Columbia University?
     
  • Who owns the data?
     
  • What is the classification of the data and what security measures need to be put in place?
     
  • Will the data be shared to a public database?
     
  • What sort of problems have been encountered previously with managing data?
     
  • What kind of DMP does the funder require?
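
The questions about file size, accumulation rate, retention period, and backups can be turned into a rough storage estimate. The following is a minimal sketch in Python, not an official tool: the function name `estimate_storage_gb`, its parameters, and the example numbers are all hypothetical and should be replaced with figures from your own project.

```python
def estimate_storage_gb(file_size_mb, files_per_week, weeks, backup_copies=2):
    """Rough storage estimate for a project (all inputs are assumptions).

    file_size_mb   -- typical size of one data file, in megabytes
    files_per_week -- how many files are produced each week
    weeks          -- how long data will accumulate and be retained
    backup_copies  -- number of additional backup copies kept

    Returns (primary_gb, total_gb): the space needed for one copy of
    the data, and the space needed including all backup copies.
    """
    if min(file_size_mb, files_per_week, weeks, backup_copies) < 0:
        raise ValueError("all inputs must be non-negative")
    # One copy of the data, converted from megabytes to gigabytes.
    primary_gb = file_size_mb * files_per_week * weeks / 1024
    # The primary copy plus each backup copy.
    total_gb = primary_gb * (1 + backup_copies)
    return primary_gb, total_gb


# Illustrative numbers only: 512 MB files, 40 files per week, 3 years.
primary, total = estimate_storage_gb(file_size_mb=512, files_per_week=40, weeks=156)
print(f"Primary data: {primary:.1f} GB; with backups: {total:.1f} GB")
```

An estimate like this is only a starting point for a conversation about storage and budget; actual needs also depend on file formats, compression, and the retention policies that apply to the project.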

Data Management Resources

Tutorials and Guidelines
  • The ReaDI Program has created several tutorials (below) and identified guidelines to aid in the management of data during the collection phase of research. The ReaDI Program is available for data management consulting and presentations (Columbia researchers only).
Columbia University Data Management Consulting Services
  • Statistical Analysis Center Data Management Services are available to anybody at Columbia. They can help with all aspects of data management, including administrative systems. Their services include:
    • Case report form design 
    • Database design 
    • Database hosting 
    • Custom user interface design (web, desktop, telephone, etc.) 
    • Data system design (data for analysis, logistical data, personnel data, financial data, etc.) 
    • Report design 
    • Database querying and data set generation 
    • REDCap hosting and development
       
  • Research Data Services, part of the Columbia University Libraries, is available to help with many aspects of the research data lifecycle, including managing research data, finding data, cleaning and understanding data, and mapping and visualizing your data.
     
  • The Irving Institute for Clinical and Translational Research offers a free one-hour consultation to discuss data management requirements, help design a data management plan with associated budget requirements, or provide guidelines for moving data into a properly formatted, secure environment.