By Kristen L. Overstreet
Senior Partner, Origin Editorial
Take Home Points:
Keep up to date on impending requirements from U.S. funders, related to the Nelson Memo, that will affect data and publication accessibility requirements for your journal’s future submissions.
Be prepared to update your journal processes related to open-data workflows.
Know where you can find resources to help you with ethical issues related to data publication.
Following the release of the Nelson Memo late last summer from the U.S. Office of Science and Technology Policy (OSTP), the industry has been buzzing about the potential effects these recommendations regarding U.S. federal funders’ policies will have on access to federally funded research. Will the recommendations force hybrid and subscription-based journals to flip to Gold Open Access or can these journals’ business models survive and still be in compliance with the U.S. funders’ expected policies of immediate public access? How will these changes be implemented? Where will the funding come from for authors who want to pay APCs to make their articles immediately publicly available? I will leave those questions to the Scholarly Kitchen, as the chefs have done a lot of thoughtful pondering on these subjects in recent months.
What interests me most in the Nelson Memo is related to the deposit of the underlying data:
“Scientific data underlying peer-reviewed scholarly publications resulting from federally funded research should be made freely available and publicly accessible by default at the time of publication . . .” (Nelson, 2022, p.4).
This recommendation will have a significant effect on editorial offices as we may be the people tasked with checking that authors have complied with this policy, at the journal level, and we will certainly have additional technical checks related to the datasets added to our submission processes. Our editors and reviewers may also have questions about accessing these data and how to review and comment on them. We will need to update submission instructions and policy and procedures manuals, and there will be ethical issues to manage.
Where do we start in preparing our documentation and updating our workflows? Who can tell us what to do? I’m sure the International Society of Managing and Technical Editors (ISMTE) and the Council of Science Editors (CSE) will soon have resources for us, and I predict this topic will be popular at the industry meetings in 2023. So, we should keep our eyes open for learning opportunities. In the meantime, I’m seeking advice from an expert and Origin colleague, Tim Vines, who contributed the “Processes” section below.
Journals need to make a series of decisions about how they will implement and operate their open-data workflows. For funder-mandated open data, the journal must decide between two paths.
The Editorial Office (EO) could check open data only for manuscripts funded by an agency with an open-data mandate. In this workflow, the EO must check the funding statement for each manuscript to determine which policies apply. Since most manuscripts list funding from multiple sources, the EO must furthermore determine which policy takes precedence—presumably the strictest in each case. For journals in areas where most of the funders have open-data policies, this process will be difficult and time consuming, and hence, it’s recommended only when a small proportion of a journal’s manuscripts are affected by funder open-data mandates.
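The "strictest policy takes precedence" rule in this first path can be sketched in Python. This is only a minimal illustration under stated assumptions: the funder names, the strictness ranks in `FUNDER_POLICIES`, and the function `strictest_policy` are all hypothetical, and the sketch assumes the EO has already pulled the funder names out of the funding statement.

```python
from typing import Optional

# Hypothetical lookup table: funder name -> (strictness rank, policy summary).
# A higher rank means a stricter open-data requirement. These entries are
# placeholders for illustration, not statements about real funder policies.
FUNDER_POLICIES = {
    "Funder A": (2, "all underlying data public at publication"),
    "Funder B": (1, "data available on request"),
}


def strictest_policy(funders: list[str]) -> Optional[tuple[str, str]]:
    """Return (funder, policy summary) for the strictest known mandate.

    Returns None when no listed funder appears in the lookup table,
    i.e. no funder open-data mandate applies to the manuscript.
    """
    known = [(FUNDER_POLICIES[f][0], f) for f in funders if f in FUNDER_POLICIES]
    if not known:
        return None
    _, funder = max(known)  # highest strictness rank wins
    return funder, FUNDER_POLICIES[funder][1]
```

For a manuscript listing "Funder B", "Funder A", and an unknown "Funder C", the function would return the "Funder A" entry. Keeping such a table current is itself part of the workload this path creates.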
The second route avoids the difficulties of deciding which manuscripts to assess, and instead the EO formulates and enforces its own open-data policy. To ensure the EO is helping authors simultaneously comply with whatever funder mandate might apply, the journal must match its policy to the most relevant funder policy for its authors. In this route, the EO selects the article types most likely to contain data (e.g., Original Articles and Brief Communications) and checks for open data in all of these manuscripts.
Where to Check for Open Data
Next, the EO must determine where in the peer review process they will check for open data. There is a wide range of approaches here. Journals with limited resources might consider only checking articles that have been accepted for publication, which has the disadvantage that neither the reviewers nor editors are able to inspect the datasets before deciding whether to publish the paper. Moreover, once the manuscript is accepted for publication, the authors can be resistant to extra tasks like sharing their datasets.
The other extreme is checking all submitted manuscripts at initial submission (i.e., the EO checklist stage). In this case, the datasets are available to the reviewers and editors, but the (considerable) effort expended on open-data checks is effectively wasted if the manuscript receives an editorial rejection. However, this approach does have two advantages: 1) manuscripts that have not put all their datasets on a public server can be unsubmitted, which effectively resets the ‘time in review’ clock and avoids delays in the editorial process; 2) for journals that are part of a cascade, checking open data for all submissions saves effort for the other titles in the cascade. One caveat to the first advantage is that some authors object to making their data public before the article is accepted for publication. For these authors, some data repositories (e.g., Dryad) have a private peer review workflow, which allows reviewers and editors to log in and check the datasets while the data remain hidden from public view.
One sensible time point for checking open data is in parallel with the first round of peer review—once the article has been sent out to reviewers. Editorial rejections have already taken place, and hence this subset of articles has a considerably higher chance of acceptance. Moreover, the authors are engaged with the review process and are keen to take any steps that might improve the likelihood that their article is accepted for publication. Lastly, this stage of the review process typically takes at least two weeks, so checking open data in parallel does not slow down the review process (open-data checks typically take 2-4 days).
How to Check for Open Data
The last thing to consider is how the journal will check for open data, which can be difficult and time consuming. The steps are 1) read through the Methods section and generate a list of all the datasets that the authors generated for the article, 2) check which of those datasets have been shared on a public server, and 3) compose a list of actions the authors need to take to ensure that they comply with the open-data mandate. This can be difficult for EO staff who don’t have domain expertise, chiefly because it’s hard to decide what constitutes a dataset and what doesn’t, and to negotiate that confidently with authors.
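Once steps 1 and 2 have been done by a person, step 3 is essentially a comparison of two lists. The Python sketch below shows that comparison; the function name `required_actions`, the dataset labels, and the wording of the action message are hypothetical, and the hard part (deciding what counts as a dataset) still happens before this function is called.

```python
def required_actions(datasets_generated: list[str], datasets_shared: set[str]) -> list[str]:
    """Build the list of actions to send to the authors.

    datasets_generated: datasets identified from the Methods section (step 1).
    datasets_shared: datasets found on a public server (step 2).
    """
    return [
        f"Please deposit '{name}' in a public repository."
        for name in datasets_generated
        if name not in datasets_shared
    ]
```

An empty result means every dataset on the list was found on a public server.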
Given these difficulties, the EO can opt to just check whether the authors 1) generated any sort of data, and 2) shared any data on a public repository. The former can be established by finding any Methods sentence where the authors described collecting data (e.g., “We measured X with a Y machine.”), and the latter by looking in the Data Accessibility section for one or more links to an entry on a repository. While this approach is simple, and may help authors meet some lax funder standards, most authors will fail to share all of their datasets, leaving readers unable to reproduce the results and the journal open to criticism that the data policy is not being properly enforced. Finally, some funder mandates may include language to the effect that all of the data underlying the manuscript should be shared, so this basic approach may not help all authors to comply with their funders’ mandates.
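This basic two-part check could be roughed out with pattern matching, as in the Python sketch below. It is an illustration under assumptions rather than a tested screening tool: the verb list in `COLLECTION_PATTERN`, the link pattern, and the function `basic_check` are all hypothetical, and the manuscript sections are assumed to be available as plain text.

```python
import re

# Assumed verbs that signal data collection; a real list would need tuning per field.
COLLECTION_PATTERN = re.compile(
    r"\bwe\s+(measured|collected|recorded|sequenced|surveyed)\b", re.IGNORECASE
)
# Any http(s) URL or DOI-style identifier counts as a candidate repository link.
LINK_PATTERN = re.compile(r"https?://\S+|\b10\.\d{4,9}/\S+")


def basic_check(methods_text: str, data_accessibility_text: str) -> dict[str, bool]:
    """Report whether any data were generated and whether any link was provided."""
    return {
        "generated_data": bool(COLLECTION_PATTERN.search(methods_text)),
        "shared_data": bool(LINK_PATTERN.search(data_accessibility_text)),
    }
```

As with the manual version of this check, a positive result says only that some link is present. It does not confirm that the link resolves, that it points to a repository, or that all of the datasets were shared.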
A second approach is to bring in one or more Data Editors to help with assessing data sharing. These editors can make accurate judgements about which datasets should be shared, but motivating them to complete their assessments on time and to the agreed standard can be challenging. Since most Data Editors will be volunteers, the journal will typically try to minimize their workload by assessing only those articles that are accepted for publication, leading to the issues described above.
Third, the journal can outsource the compliance assessment process. This is a new approach, and only one organization, DataSeer, offers this service presently. DataSeer makes use of Natural Language Processing (NLP) Artificial Intelligence to find sentences where the authors describe generating and sharing their data. It then applies a set of rules to establish whether the authors have complied with the applicable policies. Using NLP helps keep costs down, although DataSeer’s Data Curators do check the tool’s assessment of every manuscript to ensure consistently high standards.
The National Institutes of Health (NIH) recently released their “2023 NIH Data Management and Sharing Policy,” which will become effective tomorrow, January 25, 2023, requiring investigators/authors to share the data underlying a publication at the time of publication, and “other data must be shared by the end of the research project or protocol” (When Data Need To Be Shared). “Scientific data are defined as the recorded factual material commonly accepted in the scientific community as of sufficient quality to validate and replicate research findings, regardless of whether the data are used to support scholarly publications” (What Scientific Data Need to Be Shared). Data Management and Sharing plans must be submitted to NIH for all research conducted on or after January 25, so manuscripts written about NIH-funded research conducted on or after January 25 must include a statement about how the data are being shared and where they can be accessed.
Publishing the underlying data for an article is important because future researchers can then reproduce and build upon that science. Data sharing may lead to improvements in human health, but I have been distracted by the practical questions:
- How do we manage the data statement in the article? Does the journal have to check that the statement adheres to the funders’ policy?
- Does the journal have to identify a preferred data repository?
- What does the editorial office have to do to ensure the authors properly deposited the data?
- What if the provided link to the dataset doesn’t work?
- How do the editors work with these data? They have precious little time now to get their editorial work done; how will they validate data sets?
- The same for reviewers—reviewer burden is an ever-increasing issue. It’s hard to secure reviewers and to get comments back in an author-approved amount of time. Are we now going to ask reviewers to comment on the dataset that accompanies a paper? What will that do to our conversion numbers and turn-around times?
And then there’s the ethics.
- Might we discover that datasets have been used without permission from the copyright or other licensed owner?
- What if the sharing of the data / attempt to publish the data violates privacy or other laws in the country where some of the data were collected?
- Could there be authorship issues related to datasets?
- And what about data manipulation and fabrication?
These questions just scratch the surface of the potential ethical issues related to publishing data.
I don’t have the answers to all of these questions, but fortunately, the Force11 and Committee on Publication Ethics (COPE) Research Data Publishing Ethics Working Group has already provided guidance on how to handle ethical issues related to data and has made that guidance available for editorial offices so we don’t have to figure this out on our own. In a blog post announcing the availability of this guidance, Puebla and Lowenberg (2021) said:
“The growth in data sharing over the last few years is an undeniably positive trend, providing the research community with ready access to valuable outputs and affording researchers further opportunities to extend the reach of their work. As more datasets are deposited and published, it is important—and necessary—to develop standards for the handling of possible ethical challenges that may arise in relation to published data: both to protect the researchers who contribute datasets and to secure trust by the scientific community in the value and reliability of public datasets.”
The Working Group has also created policy templates for repositories and publishers, and is working on flowcharts. I am a member of the working group and contributed to the guidance on authorship.
It is too early to say just how the Nelson Memo recommendations will affect our work as editorial office professionals, but I feel confident in saying that they will have an impact. Preparing ourselves by seeking the current information in our industry, considering potential effects on and changes to our processes, and being aware of the available resources will help us manage the new policies and procedures we need to improve the peer review process and resulting publications for our journals.
Acknowledgements: Ms. Overstreet gratefully acknowledges the contributions of Tim Vines for writing the “Processes” section of this post.
Conflicts of Interest
Kristen Overstreet: None to Declare
Tim Vines: Mr. Vines is the founder and director of DataSeer.