About Data Sharing
Why share data?
Data sharing is central to open science. When you share your data, you show your support for the future of science: its openness, its reproducibility, and its longevity. Responsible and open data sharing allows you to demonstrate the rigor and reliability of your work, and it invites others to review, reproduce, and reuse your materials, potentially leading to new discoveries. In biomedicine, shared data can lead to new treatments, therapies, and even cures, improving patient outcomes. Data sharing also contributes to the public good by building trust in science and increasing access to scientific knowledge.
Beyond these principled reasons, data sharing is also required by many funding organizations, including the U.S. National Institutes of Health (NIH), as well as by other governing bodies, foundations, and journals.
What is data sharing?
In general, data sharing means making data available to others in a responsible way. This can include the following additional steps:
providing information about the data such as abstracts, code, and protocols
ensuring access controls are in place for sensitive data
embargoing data for a specified period
de-identifying data
adding information about the data (called metadata or annotations) to enable data discovery and querying
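As a rough illustration of two of the steps above, de-identifying data and adding metadata, the following Python sketch removes direct identifiers from a single record and attaches annotations to it. The field names, the annotation keys, and the helper functions deidentify and annotate are all hypothetical and are not part of any Sage tool. Real de-identification must follow the regulations and governance requirements that apply to your data, so treat this only as a minimal sketch.

```python
import hashlib

# Hypothetical set of direct identifiers; a real project would define this
# list according to its own governance requirements.
DIRECT_IDENTIFIERS = {"name", "email", "date_of_birth"}


def deidentify(record, salt):
    """Return a copy of record without direct identifiers.

    The participant ID is replaced by a truncated salted hash so that
    records from the same participant can still be linked to each other.
    """
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    raw_id = str(cleaned.pop("participant_id"))
    digest = hashlib.sha256((salt + raw_id).encode("utf-8")).hexdigest()
    cleaned["pseudonym"] = digest[:12]
    return cleaned


def annotate(record, **annotations):
    """Bundle a data record with descriptive metadata (annotations)."""
    return {"data": record, "annotations": dict(annotations)}


# Example record with made-up values.
record = {
    "participant_id": 17,
    "name": "Example Person",
    "email": "person@example.org",
    "age": 54,
}
shared = annotate(deidentify(record, salt="project-secret"),
                  assay="rnaSeq", species="Human")
```

Here shared["data"] keeps only the age and a pseudonym, while shared["annotations"] carries the descriptive terms that make the record discoverable and queryable.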
These days, many researchers share data via online repositories equipped with features such as file storage, data annotation tools, security protocols, and search functionality. Well-known scientific data repositories include GEO (Gene Expression Omnibus), cBioPortal, and figshare. At Sage, data are stored and curated in a platform called Synapse and made explorable through data portals.
How do I share my data?
While the exact process may vary across our data portals and repositories, data sharing at Sage generally involves the following steps:
Get involved in a community. Start by connecting with the relevant community for your data. This may involve reaching out to portal maintainers, joining a consortium, securing funding, and/or developing a data sharing plan with a specific organization.
Prepare data. Before generating data, review the onboarding materials provided by your chosen community. These materials may include documentation, submission forms, webinars, one-on-one meetings, and other resources. Collect all necessary supplemental information and ensure you meet the data sharing requirements.
Deposit data and provide information about it. This step includes uploading and annotating your data, as well as supplying any additional information needed for understanding and curating the data. It may also involve conducting data quality checks and validating metadata. Consortia and/or funders may have extra requirements, such as milestone reports.
Determine data access controls. At Sage, we refer to the practice of deciding how data should be shared as data governance. This stage includes establishing data licensing and determining how and by whom the data can be accessed. While many datasets on our portals are unrestricted and open to the public, some require access controls, such as data use agreements and institutional review board approval.
Share data. Once the previous steps are completed, you are ready to share your data with others, whether openly or with restrictions. This step often occurs after a publication embargo period has ended.
Access data. After your data are shared, they become accessible to others, including you and your colleagues. Thanks to the preceding steps, the data will be FAIR: findable, accessible, interoperable, and reusable. FAIR data can be discovered through precise metadata, are understood in terms of how they may be used, are machine-readable for computational analysis, and are ultimately suitable for reuse.
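The metadata validation mentioned in the deposit step above can be pictured as a check of a file's annotations against a list of required keys and allowed values. The Python sketch below assumes a small, hypothetical schema. The keys, the allowed values, and the validate_annotations helper are illustrative only and do not reflect the actual requirements of any Sage portal.

```python
# Hypothetical schema: required annotation keys mapped to their allowed
# values (None means any non-empty value is accepted).
REQUIRED_ANNOTATIONS = {
    "assay": {"rnaSeq", "wholeGenomeSeq"},
    "species": {"Human", "Mouse"},
    "fileFormat": None,
}


def validate_annotations(annotations):
    """Return a list of problems found; an empty list means valid."""
    problems = []
    for key, allowed in REQUIRED_ANNOTATIONS.items():
        value = annotations.get(key)
        if value is None or value == "":
            problems.append(f"missing required annotation: {key}")
        elif allowed is not None and value not in allowed:
            problems.append(f"invalid value for {key}: {value!r}")
    return problems


complete = {"assay": "rnaSeq", "species": "Human", "fileFormat": "fastq"}
incomplete = {"assay": "microarray", "species": "Human"}

complete_problems = validate_annotations(complete)
incomplete_problems = validate_annotations(incomplete)
```

A check like this could be run before upload so that missing or misspelled terms are caught early, while the data contributor can still correct them.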