Organizing Data
Naming, Versioning & Documentation
 
Why Organization Matters

Data work can be messy. Over time you will accumulate many files, versions, and methods. Spending a little time upfront to establish a system can save hours of searching later.

 
Good file organization can help in a variety of ways:
  • Less time is spent searching for the right file.
  • Backups of data reduce the risk of data loss.
  • Work becomes well-documented: you know exactly what you did, how you did it, and when.
  • Files are created in formats that can be used now and in the future.
  • Progress reporting to teams, funders, and stakeholders becomes faster and easier.
  • Compliance with university and funder requirements becomes easier to meet and to demonstrate.
  • Data become structured in ways that facilitate analysis and integration.
File Naming Conventions
Best Practices

Create meaningful names relevant to content, independent of where the file is stored.

Date Format
Use ISO 8601 dates: YYYY-MM-DD (extended format) or YYYYMMDD (basic format). Because the year comes first, file names that share a prefix sort chronologically by default.
Separators & Formatting
Use underscores (this_is_the_file) or "CamelCase" (ThisIsTheFile) to separate terms. Never use spaces.
Sorting (Zero-Padding)
If you have many numbered files, pad the numbers with leading zeros so they sort in numeric order. Use file_001.txt instead of file_1.txt; without padding, file_10.txt sorts before file_2.txt.
No Special Characters
Avoid: ~ ! # & @ ( ) { } [ ] ' " | % $ ; ^. Many of these characters have special meanings in shells, scripts, and some operating systems, so they can break commands and file paths.
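
The naming rules above can be applied automatically. The following is a minimal Python sketch, not a prescribed tool: the function name build_filename, the order of the name parts (date, description, sequence number), and the three-digit padding are all illustrative assumptions.

```python
import re
from datetime import date


def build_filename(description: str, when: date, sequence: int, extension: str) -> str:
    """Build a name such as 2023-01-05_survey_responses_007.csv (hypothetical helper)."""
    # Replace every run of characters other than letters, digits, and hyphens
    # (spaces, brackets, punctuation) with a single underscore.
    words = re.sub(r"[^A-Za-z0-9-]+", "_", description).strip("_")
    # isoformat() yields YYYY-MM-DD; :03d zero-pads the sequence to three digits.
    return f"{when.isoformat()}_{words}_{sequence:03d}.{extension}"


print(build_filename("survey responses (final!)", date(2023, 1, 5), 7, "csv"))
# prints 2023-01-05_survey_responses_final_007.csv
```

Putting the date first is one possible choice. If you prefer to group files by topic, put the description first and keep the date in the same position in every name.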
Version Control
Versioning Strategies

Don't overwrite the version you need. Establish a system to distinguish successive versions:

  • Dates: data_20230101, data_20230201
  • Ordinals: Use numbers for major changes and letters for minor changes (e.g., data_v1, data_v1.b).
Golden Rule: Never overwrite your raw master data. Save cleaned or analyzed versions as new files.
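
One way to follow this rule in a script is to always write to the next unused version number instead of an existing file. The sketch below assumes names of the form stem_vN.ext with whole-number versions only. The helper next_version_path and the file names are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def next_version_path(folder: Path, stem: str, suffix: str) -> Path:
    """Return folder/stem_vN+suffix, where N is one higher than any existing version."""
    existing = []
    for path in folder.glob(f"{stem}_v*{suffix}"):
        # Extract the text between "stem_v" and the suffix, e.g. "1" in data_v1.csv.
        number = path.name[len(stem) + 2 : len(path.name) - len(suffix)]
        if number.isdigit():
            existing.append(int(number))
    return folder / f"{stem}_v{max(existing, default=0) + 1}{suffix}"


# Demonstration in a temporary folder so nothing real is touched.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    raw = folder / "data_raw.csv"
    raw.write_text("id,score\n1,10\n", encoding="utf-8")

    cleaned = next_version_path(folder, "data", ".csv")
    shutil.copyfile(raw, cleaned)  # the raw file itself is never modified
    print(cleaned.name)  # prints data_v1.csv
```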
The Changelog

It is helpful to log what changed, who made the change, and why. Keep a basic text file in your folder:

# fileName_Changelog

## v2 YYYY-MM-DD J Doe <jdoe@ex.com>
* Adjusted variable labels for clarity
* Removed incomplete survey responses

## v1 YYYY-MM-DD J Doe <jdoe@ex.com>
* Initial data import
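
If you would rather not edit the changelog by hand, a short script can add entries in the layout shown above. This is a sketch under assumptions: the function add_changelog_entry, the file name data_Changelog.txt, and the dates are illustrative, and the newest entry is placed directly below the title line.

```python
import tempfile
from datetime import date
from pathlib import Path


def add_changelog_entry(log_path: Path, version: str, author: str,
                        changes: list[str], when: date) -> None:
    """Insert a new entry below the title line so the newest version comes first."""
    title = f"# {log_path.stem}"
    older = ""
    if log_path.exists():
        lines = log_path.read_text(encoding="utf-8").splitlines()
        # Keep the existing title line if the file already has one.
        if lines and lines[0].startswith("# "):
            title, lines = lines[0], lines[1:]
        older = "\n".join(lines).strip()
    entry = "\n".join([f"## {version} {when.isoformat()} {author}",
                       *[f"* {change}" for change in changes]])
    parts = [title, entry] + ([older] if older else [])
    log_path.write_text("\n\n".join(parts) + "\n", encoding="utf-8")


# Demonstration in a temporary folder with made-up dates.
with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "data_Changelog.txt"
    add_changelog_entry(log, "v1", "J Doe <jdoe@ex.com>",
                        ["Initial data import"], date(2023, 1, 1))
    add_changelog_entry(log, "v2", "J Doe <jdoe@ex.com>",
                        ["Removed incomplete survey responses"], date(2023, 2, 1))
    print(log.read_text(encoding="utf-8"))
```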
Documentation Levels

To make your data FAIR (Findable, Accessible, Interoperable, Reusable), ensure you have documentation at both the study and data level.

Study-Level (README)

Create a README.txt file at the root of the project folder. Explain:

Context
Who collected the data, when, and why?
Software Requirements
What software (including version number) is needed to open these files?
Data-Level (Codebook)

Explain specific file contents:

Variable Labels
"P1_Q3" → "Participant 1, Question 3"
Codes
"999 = Missing", "0 = Control"
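
A codebook can also be kept in a machine-readable form so that scripts decode values consistently. The sketch below uses the labels and codes from the examples above. Attaching "999" to P1_Q3, the "group" variable, and the helper decode_row are assumptions made only for illustration.

```python
import csv
import io

# Hypothetical codebook: one entry per variable, with a label and value codes.
CODEBOOK = {
    "P1_Q3": {"label": "Participant 1, Question 3", "codes": {"999": "Missing"}},
    "group": {"label": "Study group (hypothetical variable)", "codes": {"0": "Control"}},
}


def decode_row(row: dict[str, str]) -> dict[str, str]:
    """Replace variable names with labels and coded values with their meanings."""
    decoded = {}
    for variable, value in row.items():
        # Variables missing from the codebook pass through unchanged.
        entry = CODEBOOK.get(variable, {"label": variable, "codes": {}})
        decoded[entry["label"]] = entry["codes"].get(value, value)
    return decoded


# A small in-memory CSV stands in for a real data file.
raw = io.StringIO("P1_Q3,group\n4,0\n999,0\n")
for row in csv.DictReader(raw):
    print(decode_row(row))
```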
File Formats
Proprietary Formats

Fine for analysis, but risky for long-term archiving, because opening the files may depend on software that changes or becomes unavailable.

  • Microsoft Excel (.xlsx)
  • SPSS (.sav)
  • Photoshop (.psd)
 
If you must use proprietary:

1. Include a README detailing the exact software/hardware version needed.
2. Share a secondary copy in an open format (e.g., Image_v1.psd AND Image_v1.tiff).

Open Formats

Preferred for preservation because they can be read without a specific vendor's software, which supports interoperability.

  • Tables: .csv (Comma-Separated Values)
  • Text: .txt (Plain Text)
  • Images: .tiff (saved uncompressed or with lossless compression)

Note on Compression: If compression is necessary, always use a lossless format to prevent data degradation.
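
To confirm that a compression step is lossless, compress the data, restore it, and compare checksums. This is a small sketch using gzip from the Python standard library as one example of a lossless method. The sample data is made up.

```python
import gzip
import hashlib


def sha256(data: bytes) -> str:
    """Return the SHA-256 checksum of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Made-up table content standing in for a real file's bytes.
original = b"id,score\n1,10\n2,12\n" * 1000

compressed = gzip.compress(original)
restored = gzip.decompress(compressed)

# Lossless compression restores every byte, so the checksums match.
assert sha256(restored) == sha256(original)
print(len(original), "bytes before,", len(compressed), "bytes after compression")
```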
