
Step 1.3: How to Organize Data

The EL DMCC will work with you to set up a repository for your data, known as a project, in Synapse. We will create a folder structure based on ELITE Portal community standards for you to use. Organizing data files in accordance with this structure will make annotation and other data management tasks easier both for you and EL DMCC staff.

Please consult with EL DMCC staff before you begin organizing your data files. While this page explains the general practices for organizing data and other materials within your Synapse Project, the EL DMCC will provide specific instructions on how to organize your files.


Project Folder Structure

Your Synapse project will usually be set up with this hierarchy of folders, in accordance with the conventions established by the EL DMCC:

  • Released - where files will be moved once they are ready for public release to the ELITE Portal.

Do not upload any files directly to the ‘Released’ folder.

  • Staging - where files are uploaded and held while undergoing data curation. It contains folders organized by study and/or data type, each with subfolders for the level of data processing (“raw data” or “processed data”), analysis, and metadata.

    • Raw or Processed - raw data or processed data.

      • Consent Level(s) - controlled data must be split apart based on participant cohort and/or consent levels. This is required for us to apply appropriate access restrictions to the data.

    • Results - figures or other outputs not considered “raw data”.

    • Metadata - metadata templates, data dictionaries, and optional readme files.

Note: older or independent projects (not sponsored by one of our funders) may not have this exact top-level scheme.

This structure determines how easily others can find and understand your contributions, how easily you can annotate data, and how governance controls can be applied.
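The conventions above can be checked before upload with a small script. The following is a minimal sketch, not an EL DMCC tool: the function name `check_staging_path` is hypothetical, the folder names `released`, `raw`, and `processed` are assumed spellings, and the consent-level check assumes the data are controlled. Confirm the actual names and requirements for your project with EL DMCC staff.

```python
from pathlib import PurePosixPath

# Assumed spellings of the processing-level folders; adjust to your project.
DATA_LEVELS = {"raw", "processed"}


def check_staging_path(relative_path):
    """Return a list of problems for a file path relative to the project root.

    Hypothetical helper: it only checks two conventions described above.
    """
    parts = PurePosixPath(relative_path).parts
    problems = []

    # Files should never be uploaded directly to the Released folder.
    if parts and parts[0].lower() == "released":
        problems.append("files must not be uploaded directly to Released")

    # Folder components only (the last part is the file name).
    folders = [p.lower() for p in parts[:-1]]

    # For controlled data, a file should sit inside a consent-level subfolder,
    # not directly under raw/ or processed/.
    if folders and folders[-1] in DATA_LEVELS:
        problems.append(
            "file sits directly under raw/processed; "
            "expected a consent-level subfolder"
        )
    return problems


if __name__ == "__main__":
    for path in [
        "Staging/metabolomics/raw/abc.raw",
        "Staging/metabolomics/raw/1_US_general_research_use/abc_consent1.raw",
    ]:
        print(path, check_staging_path(path))
```

An empty list means the path passed both checks. Open-access data that do not need consent-level folders would be flagged by the second check, so treat the output as a prompt for review rather than a hard rule.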

Example structure

staging_metabolomics
├── processed
│   ├── 1_US_general_research_use
│   │   └── lipids_normalized_consent1.csv
│   └── 3_DK_disease-specific + non-profit
│       └── lipids_normalized_consent3DK.csv
└── raw
    ├── 1_US_general_research_use
    │   └── abc_consent1.raw
    └── 3_DK_disease-specific + non-profit
        └── abc_consent3DK.raw
staging_metadata
├── individualhuman.csv
├── Biospecimenhuman.csv
├── metabolomicsassay.csv
└── manifest.csv
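To arrange files locally in the same shape before uploading, a layout like the one above can be created with a few lines of Python. This is a minimal sketch using only the standard library. The function name `build_staging_layout` and its parameters are hypothetical, and the folder names are copied from the example rather than taken from an official specification. The EL DMCC will normally create the Synapse folders for you, so this only mirrors the structure on your own machine.

```python
from pathlib import Path


def build_staging_layout(root, data_type, consent_levels,
                         levels=("raw", "processed")):
    """Create staging_<data_type>/<level>/<consent> folders under root,
    plus a staging_metadata folder. Returns the folders it ensured exist.

    Hypothetical helper; the naming pattern follows the example tree above.
    """
    root = Path(root)
    created = []
    for level in levels:
        for consent in consent_levels:
            folder = root / f"staging_{data_type}" / level / consent
            # exist_ok=True makes the call safe to repeat.
            folder.mkdir(parents=True, exist_ok=True)
            created.append(folder)

    metadata = root / "staging_metadata"
    metadata.mkdir(parents=True, exist_ok=True)
    created.append(metadata)
    return created


if __name__ == "__main__":
    folders = build_staging_layout(
        "my_project",  # assumed local working directory
        "metabolomics",
        ["1_US_general_research_use", "3_DK_disease-specific + non-profit"],
    )
    for folder in folders:
        print(folder)
```

Keeping the consent-level folder as the innermost directory means each file's access restrictions can be read from its path.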

Finer organization

  • Within each data type, data can be further grouped by batch or other factors (e.g., tissue type, species). For example, the RNA-seq data folder may have subfolders “batch 1” and “batch 2” for data produced at different times during the project.

  • Please consult with EL DMCC staff regarding any additional levels of data organization that you need.

Analysis

This folder can house the protocols, code, and derived results that comprise an analysis performed on raw data.
