2 Considerations
2.1 Data Sharing & Privacy
One component of Open Science is data sharing. The FAIR Data Principles provide a framework to enhance the reusability of data (Wilkinson et al. 2016).
FAIR Data Principles: Making data Findable, Accessible, Interoperable, and Reusable.
For example, the INCF applies the FAIR Data Principles and has published a FAIR roadmap for neuroscience.
Careful consideration should be given to participants’ privacy when developing procedures (e.g., consent) for sharing data (Dennis et al. 2019).
2.1.2 Data privacy
While open science promotes transparency, some data must remain private:
- Personally Identifiable Information (PII): Follow legal guidelines (e.g., GDPR, HIPAA).
- Sensitive datasets: Use controlled-access repositories when needed.
- Anonymization: If sharing is restricted, remove identifiable details or aggregate data.
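The two anonymization options above (removing identifiable details and aggregating) can be sketched in a few lines of Python. This is a minimal illustration, not a complete de-identification procedure: the column names, records, and helper functions are hypothetical, and dropping direct identifiers alone may not be enough if the remaining fields can be combined to re-identify a participant.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical column names, chosen only for this example.
IDENTIFYING_COLUMNS = {"name", "email"}


def drop_identifiers(rows, identifying=IDENTIFYING_COLUMNS):
    """Return copies of the rows without the identifying columns."""
    return [
        {key: value for key, value in row.items() if key not in identifying}
        for row in rows
    ]


def aggregate_mean(rows, group_key, value_key):
    """Average value_key within each group so single records are not shared."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    return {group: mean(values) for group, values in groups.items()}


# Made-up records for illustration only.
records = [
    {"name": "A. Example", "email": "a@example.org", "site": "north", "score": 10},
    {"name": "B. Example", "email": "b@example.org", "site": "north", "score": 14},
    {"name": "C. Example", "email": "c@example.org", "site": "south", "score": 9},
]

deidentified = drop_identifiers(records)
site_means = aggregate_mean(deidentified, "site", "score")
```

Here `deidentified` keeps only the `site` and `score` fields, and `site_means` holds one averaged value per site, which is often the safer thing to share when individual rows are sensitive.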
2.2 Programming
For a general resource on programming practices geared toward open science, visit Russell Poldrack’s webbook Better Code, Better Science.
2.2.1 Project Folder Structure
A well-structured project is transparent, reproducible, and reusable. A clear and consistent folder structure makes collaboration easier and helps others find the data, code, and outputs they need. Here’s a basic template for a data science project:
├── data/ # Raw & processed datasets
├── scripts/ # Code and analysis scripts
├── results/ # Figures, tables, and outputs
├── docs/ # Documentation and notes
├── env/ # Dependency files (requirements.txt, environment.yml)
├── README.md # Project overview
└── LICENSE # License for open-source sharing
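If you prefer to create this layout programmatically, the following is one possible sketch using Python's standard `pathlib` module. The function name `create_project` is hypothetical, and the files it creates are empty placeholders that you would fill in yourself.

```python
from pathlib import Path

# Folder and file names taken from the template above.
PROJECT_DIRS = ["data", "scripts", "results", "docs", "env"]
PROJECT_FILES = ["README.md", "LICENSE"]


def create_project(root):
    """Create the template folders and empty placeholder files under root."""
    root = Path(root)
    for name in PROJECT_DIRS:
        # exist_ok=True makes the call safe to repeat on an existing project.
        (root / name).mkdir(parents=True, exist_ok=True)
    for name in PROJECT_FILES:
        # touch() creates the file if missing and leaves existing content alone.
        (root / name).touch(exist_ok=True)
    return root
```

Calling `create_project("my_project")` would create the folders relative to the current working directory; running it again does not overwrite anything.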
For best practices in structuring projects, consider these templates:
2.2.2 Version Control
Using version control (e.g., Git) ensures traceability, collaboration, and reproducibility. A public repository allows easy access and contributions. Here are places where you can store your version-controlled code publicly:
2.2.3 Environment Setup
Reproducibility depends on properly defined environments:
- Python: requirements.txt or environment.yml (for Conda)
- R: renv.lock
- Docker: Dockerfile for containerized workflows
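To illustrate what a pinned Python environment file contains, here is a small sketch that lists the packages installed in the current environment as `name==version` lines, using the standard-library `importlib.metadata` module. The function name `pinned_requirements` is hypothetical, and in practice a dedicated tool such as `pip freeze` is the more common way to produce this file.

```python
from importlib.metadata import distributions


def pinned_requirements():
    """Return sorted 'name==version' lines for the installed packages."""
    pins = set()
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with missing or broken metadata
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


if __name__ == "__main__":
    # Print the pins; redirect the output to env/requirements.txt to save them.
    print("\n".join(pinned_requirements()))
```

Pinning exact versions trades flexibility for repeatability: someone installing from the file gets the same package versions you used, at the cost of updating the file by hand when you upgrade.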
2.2.4 File paths
- Use relative paths in your code for better portability (../data/file.csv).
- Avoid absolute paths (/home/user/project/data/file.csv) as they may break across systems.
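One way to follow both recommendations in Python is to build every path from a single project root and to reject absolute paths early. The sketch below assumes the folder template from this chapter; the helper name `resolve_in_project` is hypothetical.

```python
from pathlib import Path


def resolve_in_project(project_root, relative_path):
    """Join a relative path onto the project root; refuse absolute paths."""
    relative = Path(relative_path)
    if relative.is_absolute():
        raise ValueError(f"Use a path relative to the project root: {relative}")
    return Path(project_root) / relative


# In a script stored in scripts/, the project root is one level above the
# script's own folder, so it could be derived like this (an assumption about
# where the script lives):
#
#     PROJECT_ROOT = Path(__file__).resolve().parent.parent
#     data_file = resolve_in_project(PROJECT_ROOT, "data/file.csv")
```

Deriving the root from the script's own location, rather than from the current working directory, means the code behaves the same no matter where it is launched from.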
2.3 Documentation: The Key to Reusability
Comprehensive documentation ensures that others can understand, reproduce, and extend your work.
2.3.1 Essential documentation
- README: Overview of the project, setup instructions, and usage.
- Data Dictionary: Describes datasets, variables, and formats.
- Code Documentation: Use clear comments and docstrings ("""docstring""").
- Version Control Logs: Track changes in a CHANGELOG.md or commit messages.
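As an illustration of code documentation, here is a small Python function with a docstring and a short comment. The function itself is a made-up example, and the docstring layout (Args, Returns, Raises) is one common convention rather than a requirement.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale values to zero mean and unit standard deviation.

    Args:
        values: A non-empty sequence of numbers.

    Returns:
        A list of floats, one per input value.

    Raises:
        ValueError: If all values are identical, so the standard
            deviation is zero and the rescaling is undefined.
    """
    center = mean(values)
    spread = pstdev(values)  # population standard deviation
    if spread == 0:
        raise ValueError("values must not all be identical")
    return [(value - center) / spread for value in values]
```

The docstring states what the function expects and returns, while the inline comment records a choice (population rather than sample standard deviation) that a reader could not infer from the function name alone.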
2.3.2 Three levels of documentation
- User-level: Instructions for external users (README files, tutorials).
- Developer-level: Internal notes for contributors (code comments, design docs).
- Machine-readable: Metadata in structured formats (e.g., JSON, YAML) for automation.
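For the machine-readable level, a minimal sketch of project metadata serialized as JSON with Python's standard `json` module might look like the following. The field names and values are hypothetical and do not follow any particular metadata standard; a real project would normally adopt the schema required by its repository or community.

```python
import json

# Hypothetical metadata for illustration; not tied to a specific schema.
metadata = {
    "title": "Example project",
    "description": "Scores collected at two sites.",
    "variables": [
        {"name": "site", "type": "string"},
        {"name": "score", "type": "integer", "unit": "points"},
    ],
}


def dump_metadata(record):
    """Serialize the metadata as stable, human-readable JSON text."""
    return json.dumps(record, indent=2, sort_keys=True)


def load_metadata(text):
    """Parse JSON text back into a Python dictionary."""
    return json.loads(text)


restored = load_metadata(dump_metadata(metadata))
```

Sorting the keys keeps the output stable between runs, which makes changes to the metadata easier to review under version control.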
2.4 Pre-registration & Study Design Transparency
Pre-registration strengthens research integrity by documenting hypotheses and methods before data collection. Pre-registration does not limit flexibility; it simply provides a record of initial research intentions.
2.4.1 What to pre-register
- Research questions & hypotheses
- Planned methods & analysis approach
- Expected outcomes
2.4.2 Where to pre-register
- AsPredicted – Simple pre-registration for hypothesis-driven studies.
- Open Science Framework (OSF) – More detailed project documentation.
- ClinicalTrials.gov – Required for clinical research.
2.5 Making Projects Citeable
We recommend obtaining a Digital Object Identifier (DOI) so that researchers and the public can easily cite and access your work. A DOI is a permanent, unique identifier assigned to digital objects such as research papers, datasets, software, and code repositories. It provides a stable and citable link to the content, even if the location (URL) changes.
For example, a DOI link will look like this: https://doi.org/10.5281/zenodo.14984668 with 10.5281/zenodo.14984668 representing the DOI. It will always resolve to the same location.
Note that 10.5281/zenodo.14984668 is in fact the DOI for this online book! Fun fact: with new pushes to the GitHub repository that hosts this book, Zenodo automatically keeps track of updates, while the DOI always resolves to the latest version.
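The relationship between a DOI and its link can be sketched in Python. The helper names below are hypothetical; the only facts used are those shown above, namely that the link is the resolver address https://doi.org/ followed by the DOI, and that the DOI has a part before and a part after the first slash.

```python
DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi):
    """Build the resolver link for a DOI."""
    return DOI_RESOLVER + doi.strip()


def url_to_doi(url):
    """Extract the DOI from a https://doi.org/ link."""
    if not url.startswith(DOI_RESOLVER):
        raise ValueError(f"Not a {DOI_RESOLVER} link: {url}")
    return url[len(DOI_RESOLVER):]


def split_doi(doi):
    """Split a DOI at the first slash into its prefix and suffix."""
    prefix, _, suffix = doi.partition("/")
    return prefix, suffix


book_doi = url_to_doi("https://doi.org/10.5281/zenodo.14984668")
book_prefix, book_suffix = split_doi(book_doi)
```

Storing the bare DOI rather than the full link keeps citations independent of any one resolver address, since the link can always be rebuilt with `doi_to_url`.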
Here are some recommended places to create a DOI depending on where your Open Science project lives: