2 Considerations

2.2 Programming

For a general resource on programming practices geared toward open science, visit Russell Poldrack’s webbook Better Code, Better Science.

2.2.1 Project Folder Structure

A clear, consistent folder structure makes a project easier to understand, reproduce, and reuse, and it simplifies collaboration. Here’s a basic template for a data science project:

├── data/          # Raw & processed datasets  
├── scripts/       # Code and analysis scripts  
├── results/       # Figures, tables, and outputs  
├── docs/          # Documentation and notes  
├── env/           # Dependency files (requirements.txt, environment.yml)  
├── README.md      # Project overview  
└── LICENSE        # License for open-source sharing
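If you start new projects often, you can create this layout with a short script. The following is a minimal sketch, not a standard tool: the helper name `create_project` is hypothetical, and the folder and file names are taken from the template above.

```python
from pathlib import Path

# Names follow the template above; adjust them to your own conventions.
FOLDERS = ["data", "scripts", "results", "docs", "env"]
FILES = ["README.md", "LICENSE"]


def create_project(root):
    """Create the template folders and empty placeholder files under `root`.

    Existing folders and files are left untouched, so the function is
    safe to run more than once.
    """
    root = Path(root)
    for folder in FOLDERS:
        (root / folder).mkdir(parents=True, exist_ok=True)
    for name in FILES:
        (root / name).touch(exist_ok=True)
    return root
```

Calling `create_project("my_project")` creates the folders and two empty files; you still need to write the README and choose a license yourself.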

For best practices in structuring projects, consider these templates:

2.2.2 Version Control

Version control (e.g., Git) records every change to your code, which supports traceability, collaboration, and reproducibility. A public repository makes the code easy to access and contribute to. Here are places where you can store your version-controlled code publicly:

2.2.3 Environment Setup

Reproducibility depends on a well-defined software environment. Common ways to record one:

Python: requirements.txt or environment.yml (for Conda)
R: renv.lock
Docker: Dockerfile for containerized workflows
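To illustrate what a pinned Python environment file contains, here is a minimal sketch that lists the packages installed in the current environment as `name==version` lines. The function names and the default output path are assumptions for illustration; in practice, tools such as `pip freeze` or `conda env export` do this job.

```python
from importlib.metadata import distributions
from pathlib import Path


def pinned_requirements():
    """Return sorted 'name==version' lines for the installed packages."""
    pins = set()
    for dist in distributions():
        meta = dist.metadata
        name, version = meta.get("Name"), meta.get("Version")
        if name and version:  # skip packages with incomplete metadata
            pins.add(f"{name}=={version}")
    return sorted(pins, key=str.lower)


def write_requirements(path="env/requirements.txt"):
    """Write the pinned package list to `path`, creating the folder if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join(pinned_requirements()) + "\n")
    return path
```

Pinning exact versions trades flexibility for repeatability: others get the same package versions you used, but the file needs updating whenever you upgrade.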

2.2.4 File paths

Use relative paths in your code for better portability (../data/file.csv).
Avoid absolute paths (/home/user/project/data/file.csv) as they may break across systems.
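One caveat: a relative path such as ../data/file.csv is resolved against the current working directory, not against the script's location, so it can still break when the script is launched from another folder. A minimal sketch of one way around this, assuming the folder template from this chapter (scripts in scripts/, data in data/); the helper name `data_file` is hypothetical:

```python
from pathlib import Path


def data_file(script_path, *parts):
    """Return a path inside the data/ folder next to the script's folder.

    `script_path` is the location of the calling script (usually `__file__`).
    The result does not depend on the current working directory.
    """
    script_dir = Path(script_path).resolve().parent
    # scripts/ and data/ are siblings, so step up one level first.
    return script_dir.parent / "data" / Path(*parts)
```

Inside scripts/analysis.py, `data_file(__file__, "file.csv")` then points to data/file.csv in the same project on any machine, wherever the project is stored.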

2.3 Documentation: The Key to Reusability

Comprehensive documentation ensures that others can understand, reproduce, and extend your work.

2.3.1 Essential documentation

README: Overview of the project, setup instructions, and usage.
Data Dictionary: Describes datasets, variables, and formats.
Code Documentation: Use clear comments and docstrings ("""docstring""").
Version Control Logs: Track changes in a CHANGELOG.md or commit messages.
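As an illustration of code documentation, here is a small documented function. The function itself is a made-up example, and the docstring layout (Parameters, Returns, Raises) is one common convention among several; pick whichever style your project or community already uses.

```python
import statistics


def zscore(values):
    """Standardize numbers to zero mean and unit sample standard deviation.

    Parameters
    ----------
    values : list of float
        At least two numbers that are not all identical.

    Returns
    -------
    list of float
        The standardized values, in the same order as the input.

    Raises
    ------
    ValueError
        If fewer than two values are given or all values are identical.
    """
    mean = statistics.mean(values)
    # statistics.stdev raises StatisticsError (a ValueError subclass)
    # when fewer than two values are given.
    sd = statistics.stdev(values)
    if sd == 0:
        raise ValueError("all values are identical; z-scores are undefined")
    return [(v - mean) / sd for v in values]
```

A reader can now use `zscore([1, 2, 3])` (which returns `[-1.0, 0.0, 1.0]`) without reading the function body, and `help(zscore)` displays the docstring.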

2.3.2 Three levels of documentation

User-level: Instructions for external users (README files, tutorials).
Developer-level: Internal notes for contributors (code comments, design docs).
Machine-readable: Metadata in structured formats (e.g., JSON, YAML) for automation.
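To make the machine-readable level concrete, here is a minimal sketch that writes project metadata as JSON and reads it back. All field names and values are hypothetical placeholders, not an established metadata schema; where a community standard exists for your field, prefer it.

```python
import json

# Hypothetical metadata record; every key and value is a placeholder.
metadata = {
    "title": "Example project",
    "description": "Short summary of what the project does.",
    "license": "MIT",
    "keywords": ["open science", "reproducibility"],
    "data_files": [
        {"path": "data/file.csv", "format": "csv"},
    ],
}


def dump_metadata(record):
    """Serialize the metadata record to an indented, key-sorted JSON string."""
    return json.dumps(record, indent=2, sort_keys=True)


def load_metadata(text):
    """Parse a JSON string back into a Python dictionary."""
    return json.loads(text)
```

Because the format is structured, a script can check required fields or build a catalogue from many projects without a person reading each README.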

2.4 Pre-registration & Study Design Transparency

Pre-registration strengthens research integrity by documenting hypotheses and methods before data collection. It does not limit flexibility; it simply provides a record of your initial research intentions.

2.4.1 What to pre-register

Research questions & hypotheses
Planned methods & analysis approach
Expected outcomes

2.4.2 Where to pre-register

AsPredicted – Simple pre-registration for hypothesis-driven studies.
Open Science Framework (OSF) – More detailed project documentation.
ClinicalTrials.gov – Registry for clinical studies; registration is required for many clinical trials.

2.5 Making Projects Citeable

We recommend creating a Digital Object Identifier (DOI) so that researchers and the public can easily cite and access your work. A DOI is a permanent, unique identifier assigned to digital objects such as research papers, datasets, software, and code repositories. It provides a stable, citable link to the content, even if the location (URL) changes.

For example, a DOI link will look like this: https://doi.org/10.5281/zenodo.14984668 with 10.5281/zenodo.14984668 representing the DOI. It will always resolve to the same location.
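The relationship between a DOI and its link is mechanical: the link is the DOI appended to https://doi.org/. A minimal sketch of that conversion; the helper name `doi_to_url` and the set of accepted input forms are assumptions for illustration:

```python
def doi_to_url(doi):
    """Return the https://doi.org/ link for a DOI.

    Accepts a bare DOI, a DOI with a 'doi:' prefix, or an existing
    doi.org link, and always returns the https form.
    """
    doi = doi.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # Every DOI begins with the directory indicator "10.".
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    return "https://doi.org/" + doi
```

For example, `doi_to_url("10.5281/zenodo.14984668")` returns `"https://doi.org/10.5281/zenodo.14984668"`.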

Note

10.5281/zenodo.14984668 is in fact the DOI for this online book! Fun fact: with each new release of the GitHub repository that hosts this book, Zenodo automatically archives the update, while this DOI always resolves to the latest version.

Here are some recommended places to create a DOI depending on where your Open Science project lives:

For a GitHub repository → Zenodo (automatic DOI for software releases).
For datasets → Figshare, Dryad, or Zenodo.
For a general research project → OSF.