2  Considerations

2.1 Data Sharing & Privacy

One component of Open Science is data sharing. The FAIR Data Principles provide a framework to enhance the reusability of data (Wilkinson et al. 2016).

FAIR Data Principles: Making data Findable, Accessible, Interoperable, and Reusable.

For example, the International Neuroinformatics Coordinating Facility (INCF) applies the FAIR Data Principles and has published a FAIR roadmap for neuroscience.

Careful consideration should be given to participants’ privacy when developing procedures (e.g., consent) for sharing data (Dennis et al. 2019).

2.1.1 Where to share your data

If data is publicly available, provide a link to the source. If data cannot be shared, consider providing a sample dataset in the repository.

2.1.2 Data privacy

While open science promotes transparency, some data must remain private:

  • Personally Identifiable Information (PII): Follow legal guidelines (e.g., GDPR, HIPAA).
  • Sensitive datasets: Use controlled-access repositories when needed.
  • Anonymization: If sharing is restricted, remove identifiable details or aggregate data.
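As an illustration of the anonymization step, here is a minimal sketch using only the Python standard library. The column names (`name`, `email`, `date_of_birth`, `participant_id`, `age`), the salt, and the helper functions are all hypothetical. Note that hashing an ID is pseudonymization rather than full anonymization, and code like this does not by itself satisfy GDPR or HIPAA requirements.

```python
import hashlib

# Hypothetical column names; adapt these to your own dataset.
DIRECT_IDENTIFIERS = {"name", "email", "date_of_birth"}


def pseudonymize_id(participant_id: str, salt: str) -> str:
    """Replace an ID with a salted SHA-256 digest, truncated for readability."""
    digest = hashlib.sha256((salt + participant_id).encode("utf-8")).hexdigest()
    return digest[:12]


def age_band(age: int, width: int = 10) -> str:
    """Coarsen an exact age into a band such as '60-69' (a simple aggregation)."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def deidentify(record: dict, salt: str) -> dict:
    """Return a copy of the record with direct identifiers removed."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["participant_id"] = pseudonymize_id(str(record["participant_id"]), salt)
    cleaned["age"] = age_band(int(record["age"]))
    return cleaned
```

Keep the salt private and out of the shared repository; anyone who holds it can recompute the mapping from original IDs to pseudonyms.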

2.2 Programming

For a general resource on programming practices geared toward open science, visit Russell Poldrack’s webbook Better Code, Better Science.

2.2.1 Project Folder Structure

A clear and consistent folder structure makes a project easier to understand, reuse, and collaborate on, and it supports reproducibility. Here is a basic template for a data science project:

├── data/          # Raw & processed datasets  
├── scripts/       # Code and analysis scripts  
├── results/       # Figures, tables, and outputs  
├── docs/          # Documentation and notes  
├── env/           # Dependency files (requirements.txt, environment.yml)  
├── README.md      # Project overview  
└── LICENSE        # License for open-source sharing  
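The template above can be scaffolded with a short script. This is a minimal sketch; the function name `create_project` and the example project folder are hypothetical, and the files it creates are empty placeholders to fill in later.

```python
from pathlib import Path

# Folder and file names taken from the template above.
FOLDERS = ["data", "scripts", "results", "docs", "env"]
FILES = ["README.md", "LICENSE"]


def create_project(root: Path) -> None:
    """Create the template folders and empty placeholder files under root."""
    for folder in FOLDERS:
        # parents=True also creates the project root if it does not exist yet.
        (root / folder).mkdir(parents=True, exist_ok=True)
    for name in FILES:
        # touch() leaves existing files untouched apart from their timestamp.
        (root / name).touch(exist_ok=True)
```

For example, `create_project(Path("my_project"))` would create the layout in a new `my_project` folder in the current working directory.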

For best practices in structuring projects, consider these templates:

2.2.2 Version Control

Using version control (e.g., Git) makes changes traceable and supports collaboration and reproducibility. A public repository allows easy access and contributions. Here are places where you can store your version-controlled code publicly:

2.2.3 Environment Setup

Reproducibility depends on properly defined environments:

  • Python: requirements.txt or environment.yml (for Conda)
  • R: renv.lock
  • Docker: Dockerfile for containerized workflows
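For Python projects, one way to record exact versions is to look up what is installed in the current environment. The sketch below uses the standard-library `importlib.metadata` module; the helper name `pin` is hypothetical, and in practice tools such as `pip freeze` or `conda env export` are the more usual way to produce these files.

```python
from importlib.metadata import PackageNotFoundError, version


def pin(packages: list[str]) -> list[str]:
    """Return requirements.txt-style lines for the given package names."""
    lines = []
    for name in packages:
        try:
            # version() reports the installed version of a distribution.
            lines.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            # Leave a comment line so missing packages are visible in the file.
            lines.append(f"# {name} is not installed in this environment")
    return lines
```

The returned lines could then be written to `env/requirements.txt`, for example with `Path("env/requirements.txt").write_text("\n".join(lines) + "\n")`.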

2.2.4 File paths

  • Use relative paths in your code for better portability (../data/file.csv).
  • Avoid absolute paths (/home/user/project/data/file.csv) as they may break across systems.
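A relative path such as `../data/file.csv` is resolved against the current working directory, which can differ depending on where a script is launched. One common approach is to build paths from the location of the script itself. The sketch below assumes the script lives in a `scripts/` folder next to `data/`, as in the template in Section 2.2.1; the function names are hypothetical.

```python
from pathlib import Path


def project_root(script_path: str) -> Path:
    """Return the project root, assuming the script sits in <project>/scripts/."""
    return Path(script_path).resolve().parent.parent


def data_file(script_path: str, filename: str) -> Path:
    """Build the path to a file inside the project's data/ folder."""
    return project_root(script_path) / "data" / filename
```

Inside a script you would call `data_file(__file__, "file.csv")`, which gives the same file regardless of the directory the script is run from.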

2.3 Documentation: The Key to Reusability

Comprehensive documentation ensures that others can understand, reproduce, and extend your work.

2.3.1 Essential documentation

  • README: Overview of the project, setup instructions, and usage.
  • Data Dictionary: Describes datasets, variables, and formats.
  • Code Documentation: Use clear comments and docstrings ("""docstring""").
  • Version Control Logs: Track changes in a CHANGELOG.md file or in commit messages.
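To illustrate the code documentation item above, here is a small, hypothetical function with a docstring and a comment. The docstring layout (summary line, Parameters, Returns) is one common convention, not a requirement.

```python
from statistics import fmean


def mean_difference(group_a, group_b):
    """Compute the difference between two group means.

    Parameters
    ----------
    group_a, group_b : sequence of float
        Measurements for each group; neither may be empty.

    Returns
    -------
    float
        Mean of group_a minus mean of group_b.
    """
    # fmean always returns a float, so the result type is predictable.
    return fmean(group_a) - fmean(group_b)
```

Tools such as `help(mean_difference)` display the docstring, so users can read it without opening the source file.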

2.3.2 Three levels of documentation

  1. User-level: Instructions for external users (README files, tutorials).
  2. Developer-level: Internal notes for contributors (code comments, design docs).
  3. Machine-readable: Metadata in structured formats (e.g., JSON, YAML) for automation.
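As a sketch of the machine-readable level, a data dictionary can be stored as JSON so that scripts can read it. The dataset name, variable names, and keys below are hypothetical and do not follow any particular metadata standard.

```python
import json

# Hypothetical data dictionary for an example dataset.
data_dictionary = {
    "dataset": "example_dataset.csv",
    "variables": [
        {"name": "participant_id", "type": "string",
         "description": "Pseudonymized participant identifier"},
        {"name": "age", "type": "string",
         "description": "Age band in years, e.g. 60-69"},
        {"name": "score", "type": "integer",
         "description": "Total score on the assessment"},
    ],
}


def to_json(metadata: dict) -> str:
    """Serialize metadata as indented JSON with a stable key order."""
    return json.dumps(metadata, indent=2, sort_keys=True)
```

The resulting text could be saved next to the data, for example as `data/data_dictionary.json`, and loaded again with `json.loads`.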

2.4 Pre-registration & Study Design Transparency

Pre-registration strengthens research integrity by documenting hypotheses and methods before data collection. It does not limit flexibility; it simply provides a record of initial research intentions.

2.4.1 What to pre-register

  • Research questions & hypotheses
  • Planned methods & analysis approach
  • Expected outcomes

2.4.2 Where to pre-register

2.5 Making Projects Citeable

We recommend creating a Digital Object Identifier (DOI) so that researchers and the public can easily cite and access your work. A DOI is a permanent, unique identifier assigned to digital objects such as research papers, datasets, software, and code repositories. It provides a stable and citable link to the content, even if the location (URL) changes.

For example, a DOI link looks like this: https://doi.org/10.5281/zenodo.17108875, where 10.5281/zenodo.17108875 is the DOI itself. The link will always resolve to the same record.
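The relationship between a DOI and its link is simple enough to express in code. The helpers below are a hypothetical sketch that only prepends or strips the `https://doi.org/` resolver prefix; they do not check whether the DOI actually exists.

```python
DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi: str) -> str:
    """Turn a bare DOI into its resolver link."""
    return DOI_RESOLVER + doi.strip()


def url_to_doi(url: str) -> str:
    """Extract the bare DOI from a https://doi.org/ link."""
    if not url.startswith(DOI_RESOLVER):
        raise ValueError(f"Not a {DOI_RESOLVER} link: {url}")
    return url[len(DOI_RESOLVER):]


print(doi_to_url("10.5281/zenodo.17108875"))
# prints https://doi.org/10.5281/zenodo.17108875
```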

Note

10.5281/zenodo.17108875 is in fact the DOI for this online book! Fun fact: with new pushes to the GitHub repository that hosts this book, Zenodo automatically keeps track of updates, and the DOI always resolves to the latest version.

Here are some recommended places to create a DOI depending on where your Open Science project lives:

  • For a GitHub repository → Zenodo (automatic DOI for software releases).
  • For datasets → Figshare, Dryad, or Zenodo.
  • For a general research project → OSF.

We recommend the following guides for creating DOIs:

  • Creating DOIs for Open Science Framework (OSF) projects (guide)
  • Creating DOIs in Zenodo (guide)
  • Creating DOIs for GitHub repositories using Zenodo (guide)

2.5.1 A few notes on GitHub → Zenodo integration

We recommend including a CITATION.cff file in your GitHub repository to improve the integration between GitHub and Zenodo and to specify citation information clearly. This file helps Zenodo fill in metadata from your repository so that the record is accurate. Without it, the record may contain inaccurate or unintended content (e.g., your GitHub username “ebenes” might be used as the author name instead of the intended “Elaine Benes”).
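To give a sense of what such a file contains, the sketch below generates a minimal CITATION.cff as text. The keys `cff-version`, `message`, `title`, and `authors` (with `given-names` and `family-names`) come from the Citation File Format 1.2.0 schema; the function name and the project title are hypothetical, and the simple quoting used here assumes the values contain no double quotes.

```python
def citation_cff(title, given_names, family_names, doi=None):
    """Build the text of a minimal CITATION.cff file for a single author."""
    lines = [
        "cff-version: 1.2.0",
        'message: "If you use this work, please cite it as below."',
        f'title: "{title}"',
        "authors:",
        f'  - given-names: "{given_names}"',
        f'    family-names: "{family_names}"',
    ]
    if doi:
        # Only include a DOI once one has actually been assigned.
        lines.append(f'doi: "{doi}"')
    return "\n".join(lines) + "\n"


# Hypothetical project title; the author name follows the example above.
print(citation_cff("My Open Science Project", "Elaine", "Benes"))
```

Saving this output as `CITATION.cff` in the repository root gives a starting point to extend by hand with further fields.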

The CITATION.cff file for this repository can be viewed here, and we recommend reading the following guide for more information.

Can I create a DOI on Zenodo manually and then connect my GitHub repository to it?

  • No. After a DOI is created on Zenodo, you can include a link to your GitHub repository; however, new releases and code updates to your repository will not appear on the Zenodo record as separate versions.

My research project already has a DOI using Zenodo, but I’d like to replace it with my GitHub repository using the GitHub → Zenodo integration. Do I need to create a new DOI?

  • Yes. Because having multiple DOIs for the same project may cause confusion, one way to “link” them is to use the “Related Works” section within the Zenodo record, which clarifies the relationship between different DOIs from the same project. For example, the Open Science Guide for Parkinson’s Research initially had a DOI created manually using Zenodo (here). This record now “Is obsoleted by” the GitHub repo DOI in the “Related Works” section. There are multiple relationship types to choose from. Pick the one that best represents your particular situation!