Sharing, Publishing & Preserving Data
Repositories, Licensing & Publishing
 
"Open as possible, closed as necessary."

Preservation ensures your data survives beyond the lifespan of your hard drive. Sharing allows it to contribute to the global knowledge base.

Funding agencies and governments have recognized the need for national policies that support access to research data. The goal is a culture change in which data are treated as a legitimate research output.

Why not just leave it in OneDrive?

Active storage is not an archive.

Platforms like OneDrive, Dropbox, or Google Drive are vulnerable to accidental deletion, to account expiration when you leave the university, and to "link rot" (shared links that stop working).

The recommended solution is to deposit your final dataset in a recognized data repository. A repository manages long-term, bit-level preservation and assigns a persistent identifier (DOI), so the link to your data does not break.

Policy Requirements

Understanding your obligations under Canadian funding mandates.

The overarching policy driving data practices in Canada expects grant recipients to provide access to their data where ethical, cultural, legal, and commercial requirements allow.

  • Grant applicants must include a Data Management Plan for specific funding calls.
  • Recipients should deposit data, metadata, and code supporting research conclusions into a digital repository.
  • Whenever possible, data should be linked to the publication via a persistent identifier (PID).
  • Access should align with the FAIR principles (Findable, Accessible, Interoperable, and Reusable).
Related Frameworks
Data should be stored using formats that ensure secure preservation beyond the duration of the project.
CIHR-funded researchers must deposit specific data types (e.g., bioinformatics data) into appropriate public databases.
Research data must be preserved and made available for use within two years of project completion.
Preparing Data for Deposit

Data sharing requires planning at the project outset. Before you upload, you must ensure the data is safe to share and useful to others.

Disciplinary Considerations

Fields such as the humanities may produce outputs that do not fit traditional definitions of "data." Consider the norms of your field regarding:

Formats & Metadata
Are there standard tools or software in your discipline? Ensure you use metadata standards that make discovery easy for your peers.
Data Curation Support
Does the data require cleanup, anonymization, or transformation? General tools may lack the context required for your specific field.
Sanitization Checklist
Select
Filter out test files and scratchpads. Keep data that supports your publication.
Convert
Proprietary formats can become unreadable once the software that opens them is discontinued. Convert files to open, preservation-friendly formats (e.g., .csv, .txt, .xml).
Anonymization
Remove direct identifiers such as names, as well as "indirect identifiers" (e.g., a combination of age, profession, and postal code) that could be used to re-identify participants.
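One common way to screen for indirect identifiers before deposit is a k-anonymity check: count how many records share each combination of quasi-identifiers and flag any combination held by fewer than k records, since those people are easiest to single out. The sketch below is a minimal illustration, assuming the data are in a CSV file with a header row; the column names `age`, `profession`, and `postal_code`, the helper names, and the threshold k = 5 are hypothetical choices rather than a policy requirement. Passing this check alone does not make a dataset safe to share.

```python
import csv
from collections import Counter
from typing import Iterable, Mapping, Sequence

# Hypothetical quasi-identifier columns; replace with those in your own dataset.
QUASI_IDENTIFIERS = ("age", "profession", "postal_code")


def risky_combinations(
    rows: Iterable[Mapping[str, str]],
    quasi_identifiers: Sequence[str] = QUASI_IDENTIFIERS,
    k: int = 5,
) -> dict[tuple[str, ...], int]:
    """Return quasi-identifier combinations shared by fewer than k records.

    Each key is a tuple of values in the order of `quasi_identifiers`;
    each value is how many records have exactly that combination.
    """
    counts = Counter(tuple(row[col] for col in quasi_identifiers) for row in rows)
    return {combo: n for combo, n in counts.items() if n < k}


def check_csv(path: str, k: int = 5) -> dict[tuple[str, ...], int]:
    """Read a CSV file (with a header row) and report combinations below k."""
    with open(path, newline="", encoding="utf-8") as f:
        return risky_combinations(csv.DictReader(f), k=k)


if __name__ == "__main__":
    sample = [
        {"age": "34", "profession": "nurse", "postal_code": "K1A"},
        {"age": "34", "profession": "nurse", "postal_code": "K1A"},
        {"age": "71", "profession": "judge", "postal_code": "V6B"},
    ]
    # With k=2, the single judge record is flagged as potentially re-identifiable.
    print(risky_combinations(sample, k=2))  # {('71', 'judge', 'V6B'): 1}
```

Flagged combinations are typically handled by generalizing values (for example, replacing exact ages with age ranges) or by suppressing the rare records before deposit.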
Indigenous Data Sovereignty

Distinct practices apply.

Indigenous data sovereignty recognizes the inherent rights of Indigenous communities to govern the collection, ownership, and use of their data.

Ensure you have explicit community permission to share. This may result in distinct practices regarding access and licensing.

Choosing a Repository
Why Deposit?

Using a formal repository provides distinct advantages over simply hosting a file on a personal website or cloud drive:

Global Discovery (Harvesting)
Repositories expose your metadata to external harvesters, making your data findable and accessible from global systems like Google Dataset Search.
Impact Metrics
Access metrics on how often your work is viewed and downloaded to demonstrate impact in tenure or grant applications.
Data Citation & DOIs
Obtain a persistent identifier (DOI) and a generated citation, allowing others to cite your data in the same manner as a journal article.
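Repositories generate the citation for you, but its parts are simple: creators, publication year, title, version, publisher, resource type, and the DOI written as a resolvable `https://doi.org/` link. The sketch below assembles such a string, roughly following the pattern DataCite recommends (Creator (PublicationYear): Title. Version. Publisher. (resourceTypeGeneral). Identifier). The `DatasetRecord` class, the function name, and all example values, including the DOI, are hypothetical placeholders; your repository's generated citation may differ in style.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetRecord:
    """Hypothetical minimal metadata for a deposited dataset."""
    creators: List[str]
    year: int
    title: str
    publisher: str
    doi: str  # bare DOI such as "10.1234/example.5678" (placeholder only)
    version: Optional[str] = None


def format_citation(rec: DatasetRecord) -> str:
    """Build a data citation roughly in the DataCite-recommended order."""
    parts = [f"{'; '.join(rec.creators)} ({rec.year}): {rec.title}."]
    if rec.version:
        parts.append(f"Version {rec.version}.")
    parts.append(f"{rec.publisher}.")
    parts.append("(Dataset).")
    # Prefixing the DOI with the doi.org resolver turns it into a working link.
    parts.append(f"https://doi.org/{rec.doi}")
    return " ".join(parts)


if __name__ == "__main__":
    rec = DatasetRecord(
        creators=["Doe, Jane", "Roe, Richard"],
        year=2024,
        title="Survey of Example Data",
        publisher="Example Repository",
        doi="10.1234/example.5678",
        version="1.0",
    )
    print(format_citation(rec))
    # Doe, Jane; Roe, Richard (2024): Survey of Example Data. Version 1.0. Example Repository. (Dataset). https://doi.org/10.1234/example.5678
```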
Recommended Repositories
Discipline-Specific
Examples include GenBank and ICPSR. These are often preferred because they put your data directly in front of your research community.
Best for "big data" (over 1 TB). Runs on the Digital Research Alliance of Canada's infrastructure.
A general-purpose European repository operated by CERN. Well suited to long-tail data and "orphan" data that lack a disciplinary home.
Licensing & Embargoes
License It

Without a license, other researchers have no clear legal permission to reuse your data.

Apply a clear license such as Creative Commons Attribution 4.0 (CC BY 4.0), which allows reuse as long as you are credited.

Embargo It

Worried about being "scooped"? You can deposit data now but set an embargo (e.g., 12 or 24 months).

The metadata will be visible (proving you have the data), but the files remain locked until the date you choose.
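Repositories calculate the lift date for you, and their exact conventions may differ. As a rough illustration of how an embargo length maps to a calendar date, the sketch below adds a whole number of months to a deposit date, clamping to the end of the month when needed (for example, 29 February plus 12 months). The function name `embargo_end` is hypothetical.

```python
import calendar
from datetime import date


def embargo_end(deposit: date, months: int) -> date:
    """Return the date `months` calendar months after `deposit`.

    If the target month is shorter than the deposit day, the result is
    clamped to the last day of that month.
    """
    if months < 0:
        raise ValueError("embargo length must be non-negative")
    total = deposit.month - 1 + months  # months counted from January of the deposit year
    year = deposit.year + total // 12
    month = total % 12 + 1
    last_day = calendar.monthrange(year, month)[1]  # monthrange returns (weekday, days_in_month)
    return date(year, month, min(deposit.day, last_day))


if __name__ == "__main__":
    print(embargo_end(date(2024, 3, 15), 12))  # 2025-03-15
    print(embargo_end(date(2024, 2, 29), 12))  # 2025-02-28
```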

Questions?

For assistance with repository selection, licensing, or embargoes:

Contact John Bayhi
