About Data Sharing
Why share data?
Data sharing is central to open science. When you share your data, you show your support for the future of science: its openness, its reproducibility, and its longevity. Responsible, open data sharing lets you demonstrate the rigor and reliability of your work, and in doing so invites others to review, reproduce, and reuse your materials, potentially leading to new discoveries. In biomedical research, shared data can lead to new treatments, therapies, and even cures that improve patient outcomes. Data sharing also contributes to the public good, because it can build trust in science and increase access to scientific knowledge.
Beyond these motivations, data sharing is also required by many funders, including the U.S. National Institutes of Health (NIH), as well as by other governing bodies, foundations, and journals.
What is data sharing?
In general, data sharing means making data available to others in a responsible way. This can include the following additional steps:
providing supporting materials such as abstracts, code, and protocols
ensuring access controls are in place for sensitive data
embargoing data for a specified period
de-identifying data
adding information about the data (called metadata or annotations) to enable data discovery and querying
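Metadata can be as simple as key-value annotations attached to each file. The Python sketch below shows how such annotations let someone find matching files without opening them. It assumes hypothetical file names and annotation keys (`assay`, `species`, `tissue`) rather than any real Sage or Synapse schema, and the `DataFile` class and `query` function are illustrative only. A real repository would run queries like this on its servers, over far richer metadata.

```python
from dataclasses import dataclass, field


@dataclass
class DataFile:
    """A file name plus its annotations (key-value metadata)."""
    name: str
    annotations: dict[str, str] = field(default_factory=dict)


def query(files: list[DataFile], **criteria: str) -> list[str]:
    """Return the names of files whose annotations match every criterion."""
    return [
        f.name
        for f in files
        if all(f.annotations.get(key) == value for key, value in criteria.items())
    ]


# Hypothetical collection; the keys and values are illustrative, not a real schema.
catalog = [
    DataFile("sample01.fastq", {"assay": "RNA-seq", "species": "Human", "tissue": "brain"}),
    DataFile("sample02.fastq", {"assay": "RNA-seq", "species": "Mouse", "tissue": "brain"}),
    DataFile("sample03.bam", {"assay": "WGS", "species": "Human", "tissue": "blood"}),
]

print(query(catalog, assay="RNA-seq", species="Human"))  # ['sample01.fastq']
```

Without annotations, finding "human RNA-seq files" would mean inspecting every file by hand; with them, discovery becomes a simple filter.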
Today, many researchers share data through online repositories that offer features such as file storage, data annotation tools, security protocols, and search functionality. Well-known scientific data repositories include GEO, cBioPortal, and figshare. At Sage, data are stored and curated in a platform called Synapse and made explorable through data portals.
How do I share my data?
While the exact process varies somewhat across our data portals and repositories, data sharing at Sage generally consists of the following steps:
Get involved in a community. First, contact the appropriate community for your data. Depending on the community, this might mean reaching out to the portal maintainers, or it might mean joining a consortium, securing funding, and setting up a data sharing plan with a specific organization.
Prepare data. Before generating data, review your chosen community’s onboarding materials, which could include documentation, submission forms, webinars, one-on-one meetings, and other resources. Gather all supplementary information and make sure you meet the data sharing requirements.
Deposit data and add information about the data. This step includes uploading and annotating data, as well as providing any supplementary information needed to understand and curate the data. This step may also include data quality checks and metadata validation. Consortia and/or funders may have additional requirements, such as milestone reports.
Determine data access controls. At Sage, we use the term data governance to refer to the practice of determining how data should be shared. This stage encompasses data licensing, as well as deciding how the data should be accessed, and by whom. Many datasets on our portals are unrestricted and open to the public, but some data require access controls, such as data use agreements and institutional review board approval.
Share data. After the above steps are complete, you’re finally ready to share your data with others, whether fully open or with restrictions. Often, this step occurs some time after the above processes, once a publication embargo period lifts.
Access data. Once your data are shared, they are accessible to others, including you and your colleagues. Because you followed the steps above, your data should be FAIR, a standard describing data that are findable, accessible, interoperable, and reusable. FAIR data are discoverable to users through precise metadata, understandable in terms of how the data can be used, machine-readable to enable computational analysis, and, ultimately, fit for reuse.
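The metadata validation mentioned in the deposit step can be pictured as checking each file's annotations against a template of required fields and allowed values, which is part of what makes data findable through precise metadata. The Python sketch below illustrates that idea under stated assumptions: the `TEMPLATE` contents and the `validate` function are hypothetical, and each community defines its own data model and validation tools.

```python
from typing import Optional

# Hypothetical metadata template: each required key maps to its set of
# allowed values, or to None if any non-empty value is accepted.
# Real templates come from each community's own data model.
TEMPLATE: dict[str, Optional[set[str]]] = {
    "assay": {"RNA-seq", "WGS", "ATAC-seq"},
    "species": {"Human", "Mouse"},
    "specimenID": None,  # required, but free text
}


def validate(annotations: dict[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means the record passes."""
    problems = []
    for key, allowed in TEMPLATE.items():
        value = annotations.get(key)
        if not value:  # absent or empty string
            problems.append(f"missing required field '{key}'")
        elif allowed is not None and value not in allowed:
            problems.append(f"'{key}' has invalid value '{value}'")
    return problems


record = {"assay": "RNA-seq", "species": "Zebrafish"}
for problem in validate(record):
    print(problem)
# 'species' has invalid value 'Zebrafish'
# missing required field 'specimenID'
```

Returning every problem at once, rather than stopping at the first, lets a submitter fix all issues in a single pass before depositing.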