# WorkflowHub: a registry for computational workflows

Ove Johan Ragnar Gustafsson 1, Sean R. Wilkinson 2, Finn Bacall 3, Stian Soiland-Reyes 3,4, Simone Leo 5, Luca Pireddu 5, Stuart Owen 3, Nick Juty 3, José Mª Fernández 6,7, Tom Brown 8, Hervé Ménager 9,10, Björn Grüning 11, Salvador Capella-Gutierrez 6,7, Frederik Coppens 12, Carole Goble 3*

> 1 Australian BioCommons, University of Melbourne, Melbourne, Victoria, Australia.
> 2 Oak Ridge Leadership Computing Facility, Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA.
> 3 Department of Computer Science, University of Manchester, Manchester, UK.
> 4 Informatics Institute, University of Amsterdam, Amsterdam, The Netherlands.
> 5 Center for Advanced Studies, Research, and Development in Sardinia (CRS4), Pula, Cagliari, Italy.
> 6 Barcelona Supercomputing Center (BSC), Spain.
> 7 Spanish National Bioinformatics Institute (INB), Spain.
> 8 Leibniz Institute for Zoo- and Wildlife Research, Berlin, Germany.
> 9 Institut Pasteur, Université Paris Cité, Paris, France.
> 10 CNRS, UMS 3601, Institut Français de Bioinformatique, Evry, France.
> 11 Albert-Ludwigs-Universität Freiburg, Freiburg, Germany.
> 12 VIB Data Core, VIB Technologies, Ghent, Belgium.

*Corresponding author(s).
E-mail(s): carole.goble@manchester.ac.uk;

Contributing authors: johan.gustafsson@unimelb.edu.au; wilkinsonsr@ornl.gov; finn.bacall@manchester.ac.uk; soiland-reyes@manchester.ac.uk; simone.leo@crs4.it; luca.pireddu@crs4.it; stuart.owen@manchester.ac.uk; nick.juty@manchester.ac.uk; jose.m.fernandez@bsc.es; brown@izw-berlin.de; herve.menager@pasteur.fr; bjoerngruening@gmail.com; salvador.capella@bsc.es; frederik.coppens@vib.be;

arXiv:2410.06941v1 [cs.DL] 9 Oct 2024

Abstract

The rising popularity of computational workflows is driven by the need for repetitive and scalable data processing, sharing of processing know-how, and transparent methods. As both combined records of analysis and descriptions of processing steps, workflows should be reproducible, reusable, adaptable, and available. Workflow sharing presents opportunities to reduce unnecessary reinvention, promote reuse, increase access to best practice analyses for non-experts, and increase productivity. In reality, workflows are scattered and difficult to find, in part due to the diversity of available workflow engines and ecosystems, and because workflow sharing is not yet part of research practice. WorkflowHub provides a unified registry for all computational workflows that links to community repositories, and supports both the workflow lifecycle and making workflows findable, accessible, interoperable, and reusable (FAIR). By interoperating with diverse platforms, services, and external registries, WorkflowHub adds value by supporting workflow sharing, explicitly assigning credit, enhancing FAIRness, and promoting workflows as scholarly artefacts. The registry has a global reach, with hundreds of research organisations involved, and more than 700 workflows registered.
> Keywords: workflows, registry, FAIR

# 1 Introduction

In an era of Big Data and data-driven science, the need for repetitive, scalable, reproducible and quality-assured data processing and analysis methods has contributed to a surge in popularity for computational workflows [1]. The past two decades have seen a handful of workflow management systems (WfMS) expand to hundreds [2], and workflows applied across a growing number of domains, including biosciences [3], astronomy [4] and the physical sciences [5].

In brief, computational workflows are a special kind of software for handling multi-step, multi-code data pipelines, analysis, and simulations, and are intended to automate data-handling processes. They come in many forms, but typically share certain features: a high-level language executed by a dedicated WfMS, which manages data flow and code execution; a composition of modular code or workflow building blocks that can be remixed; and a tendency to be closely associated, even intertwined, with the data on which they will operate [6]. Important scientific goals like repeatability, replicability, and reproducibility become more realistic when scientists specify their experiment’s analysis processes as a computational workflow [7]. For example, computational workflows have become central to major international science missions that require systematic, reproducible, and shared data analysis. Recent examples include the global response to the COVID-19 pandemic and the analyses of SARS-CoV-2 [3], and the large-scale sequencing efforts currently in-flight for the Vertebrate Genomes Project (VGP) [8]. While these large consortia with defined collaborative research programs are key drivers for computational workflow creation and deployment, workflows are also being adopted across scientific disciplines as their computational requirements grow and the emphasis on reproducibility and portability increases [9–13].
Increasingly, scientists are also being asked to share their data and associated research objects (i.e. software), in ways others can reuse (e.g. the Nelson Memo 1, NASA SPD-41a 2). The idea is to accelerate scientific progress and spur innovation by enabling scientists to avoid reinventing each other’s work, and to explicitly support confidence in published results by removing ambiguity surrounding the approach taken to create research outcomes. In addition, scientific activity often includes the exploration of analysis variance; modifying workflows to understand effects and changes on data products is simpler when those workflows are clearly described and available. To this end, Wilkinson et al. published guiding principles for scientific data management and stewardship, providing guidelines for making data and other research objects Findable, Accessible, Interoperable, and Reusable (FAIR) by others [14]. The FAIR principles have sparked an entire movement in the international community towards adopting FAIR practices, and further work has been undertaken to extend the principles to research software [15], AI models [16], and computational workflows [17]. A fundamental step towards supporting FAIR workflows [18] is to enable the sharing of workflows and their descriptions [19] and make them findable.

Researchers find software by searching: the web (i.e. search engines), public software project repositories (e.g. GitHub), the literature, mailing lists, discussion groups (e.g. StackOverflow), dependencies in the software itself, relevant registries (e.g. CRAN 3, communities of practice like nf-core [20, 21]), and even social media [22]. High-quality machine-processable metadata markup is needed to make workflows more findable and understandable in such a “search context”: in other words, the descriptors for workflows must themselves be standardised, accessible, and discoverable.
Current mechanisms for sharing workflows do not achieve this outcome for the entire workflow ecosystem. Sharing source code includes options such as version control platforms (e.g. GitHub 4 or GitLab 5), and WfMS-specific curated git repositories (e.g. Intergalactic Workflow Commission (IWC) 6 [23], nf-core, Snakemake catalogue 7 [24]). Creators can also publish their workflows, either in public generalist repositories (e.g. Zenodo 8, DataVerse 9), conventional journals (e.g. GigaScience 10) or software journals (e.g. the Journal of Open Source Software, JOSS 11). Finally, a creator can register the workflow using either a platform-specific (e.g. Knime Community Hub 12, BinderHub 13, nf-core 14) or platform-agnostic solution. The latter includes Dockstore, a registry that supports the sharing and running of containerised tools, workflows and notebooks across diverse cloud computing environments 15 [25].

> 1 https://www.whitehouse.gov/wp-content/uploads/2022/08/08-2022-OSTP-Public-Access-Memo.pdf
> 2 https://smd-cms.nasa.gov/wp-content/uploads/2023/08/smd-information-policy-spd-41a.pdf
> 3 https://cran.r-project.org/
> 4 https://github.com/
> 5 https://about.gitlab.com/
> 6 https://github.com/galaxyproject/iwc
> 7 https://snakemake.github.io/snakemake-workflow-catalog/
> 8 https://zenodo.org/
> 9 https://dataverse.org/
> 10 https://academic.oup.com/gigascience

While critical for software, discovery of these many resources and platforms can be impeded by non-standardised descriptors that are not necessarily visible to search. Even once a workflow has been found, a divergent ecosystem does not lend itself to better integration of services, adoption of standards, or achieving FAIR outcomes for workflows. A registry that serves as a hub for these various mechanisms, and their specific benefits, would begin to address these challenges.
It could support workflow developers to share and gain credit for their work, integrate with the platforms, services and infrastructures that developers and users rely on to both create and use workflows, and support making workflows FAIR. Structurally, a registry should be flexible, extensible, and use internationally recognised standards that accommodate rich metadata. To capture and present the breadth of the global computational workflow ecosystem back to the research community, a registry should be agnostic to domains and WfMS, and embrace community standards. Finally, it should provide mechanisms that can link workflows to other digital objects that provide context for a research project, including documents, standard operating procedures (SOPs) and publications.

Here, we present a public and inclusive registry dedicated specifically to the sharing of computational workflows: WorkflowHub 16 [13, 26, 27]. WorkflowHub is designed to allow any scientist, regardless of expertise level, to contribute and share computational workflows. It indexes workflows from any scientific domain, in any format, in any workflow language, regardless of whether it uses a WfMS. Here, we describe in detail how WorkflowHub’s structure, design, standards, community engagement, and continued evolution support: 1) collaboration, sharing and credit for workflow developers, projects, and consortia; 2) integration with added-value services, platforms, and capabilities that support the workflow life cycle (i.e. creation, version control, execution, maintenance, reuse and citation); and 3) wizards and inbuilt features that ease the process of sharing workflows alongside the constellation of associated digital artefacts that give a workflow its scientific context.
# 2 Results

## 2.1 A registry for computational workflows

The WorkflowHub is a registry for describing, sharing and publishing scientific computational workflows, irrespective of their type, development and maintenance location, or discipline. On the landing page for WorkflowHub, a new user is presented with a description of the platform and its purpose, the latest workflow additions, what content is discoverable, and how to join the WorkflowHub community. Underpinning WorkflowHub is the implementation of open tools and standards, which are further described herein. Collaborating Teams are supported by registry features that support workflow reuse, and include integration with native workflow repositories, assignment of credit, import and export, and the creation of curated Collections of workflows that are enriched by other digital objects (e.g. publications, SOPs). Figure 1 provides a conceptual view of the WorkflowHub’s capabilities and its relationship to the workflow development and publishing ecosystem outlined above and discussed later in more depth. The registry is agnostic to domain, discipline and workflow type, supporting its adoption by a wide spectrum of researchers and other stakeholders. As a result, at the time of writing (October 2024), the registry already indexes workflows ranging from biodiversity to astronomy and particle physics (see workflows list on WorkflowHub 17), with 764 workflows registered for an array of workflow types 18 and 840 registered users from 236 Organisations across 35 countries.

> 11 https://joss.theoj.org/
> 12 https://hub.knime.com/
> 13 https://binderhub.readthedocs.io/en/latest/
> 14 https://nf-co.re/
> 15 https://dockstore.org/
> 16 https://workflowhub.org

WorkflowHub was launched in 2020, as part of the EOSC-Life Workflow Collaboratory [13], to support the registration of workflows required for the response to the COVID pandemic.
WorkflowHub now houses 66 COVID-related workflows 19, including those that support the ongoing global analysis of intra-host variation as new samples become available [3, 29]. This outcome demonstrates a central ambition of WorkflowHub: to be of practical use in advancing the application of computational workflows in research science by supporting the needs of the communities that it serves.

WorkflowHub meets community requirements and supports the workflow life cycle in three key ways. Firstly, the registry provides structures that directly support collaboration, sharing knowledge and distributing credit. Secondly, woven into this structure are multiple integrations with other elements of the global research ecosystem that support the workflow life cycle: creation, development, discovery, reuse, and citation of workflows. Finally, WorkflowHub provides a registration wizard that guides users in leveraging these structures and integrations. This approach is deliberate and will continue to evolve in lockstep with the requirements of the community. In the following sections, we will first describe how the data model and metadata framework of WorkflowHub support the registry’s core functions, namely registering, finding, and launching workflows and associated digital data objects. In turn we highlight how the registry design, including the use of wizards to guide best practice, allows it to act as an integrating hub across the workflows ecosystem and to support each stage of the workflow life cycle. Finally, we will describe how the WorkflowHub engages with the workflow community and highlight some key use cases for the registry.

## 2.2 A data model that reflects the real-world collaborations that create workflows

Science is a collaborative enterprise, and infrastructure platforms should reflect this quality to be of practical use in accelerating science missions.
Research programs and projects are also intertwined with diverse computational approaches and ways of sharing research outcomes, and these are subject to the same requirements for findability, credit and impact assessment [6]. As a result, WorkflowHub is structured to reflect real-world collaboration and assign complex credit well.

> 17 https://workflowhub.eu/workflows
> 18 https://workflowhub.eu/workflow_classes
> 19 https://workflowhub.eu/search?utf8=%E2%9C%93&q=covid#workflows

Fig. 1 WorkflowHub connects to platforms, services, and resources that support a workflow’s life cycle [28]. A researcher initially needs to Plan & Find, where they either plan for a particular analysis and find existing workflows (i.e. using a registry), or Develop a new workflow. WorkflowHub integrates with Git repositories (e.g. GitHub, GitLab), and Git-supported communities (e.g. nf-core), to support development. A workflow requires Test & Review to Run & Deploy, and here WorkflowHub connects to support services (e.g. LifeMonitor, bio.tools, Sapporo WES, WfExS) and welcomes diverse workflow platforms that aid deployment (e.g. CWL, Snakemake, Galaxy, Jupyter, Python, BASH, WDL, Nextflow). A creator needs to Share a workflow and can benefit from WorkflowHub’s use of citation infrastructures and standards (i.e. CITATION.cff, Zenodo, DataCite, DOI and ORCID). In the Maintain & Learn stage, maintenance, and also understanding of a workflow by other researchers, becomes critical as it impacts workflow Reuse & Rework, where a workflow is either reused, or adapted, by other researchers to suit their requirements. WorkflowHub supports these stages through registration of digital objects that enrich a workflow (e.g. documents, publications, SOPs), the ability to create Collections and workflow citations based on DOIs, and ultimately through the connections created to knowledge graphs. WorkflowHub also enables communities of practice to benefit from all its integrations and connections, ensuring that they can reuse or rework workflows from across the globe. The entire support framework is enabled by the implementation of standards that allow WorkflowHub to interact with the ecosystem and truly act as a “Hub”: EDAM, Research Object Crates (RO-Crates), GA4GH APIs, abstract Common Workflow Language (CWL), FAIR Signposting, and Bioschemas.

Fig. 2 Workflow types registered with WorkflowHub.

The data model of WorkflowHub provides access to three elements for every registered user: Organisations, Teams, and Spaces. A user can specify one or more Organisations (i.e. affiliations) as part of their user profile. They also must belong to at least one Team, which must also belong to an administrative Space. If existing Teams are not suitable or appropriate, a user can create a new Team. A member of a Team can specify Organisations for each Team they join, which allows users to be related to different Organisations for different Teams. Multiple creators and Teams can be specified for a single workflow, additional credit can be assigned to contributors, and a distinction can be made between creators and submitters. In effect, users belong to Teams, which belong to Spaces, and in this way credit is able to cascade as required from a workflow to creators, contributors, submitters, the Teams and Spaces to which they belong, the consortia and Organisations that these represent, and even new workflows which are derived from the original (see Figure 3). This nested structure is capable of addressing sharing and credit for a diverse set of workflow contributors, including, but not limited to, individuals (e.g. workflow developers), research groups, institutions (e.g. universities), communities of practice (e.g.
nf-core), and major research consortia (e.g. Biodiversity Genomics Europe (BGE) 20). For example, single users or research groups may only require a single Team to represent their workflow(s), and in this case they would add their Team to the default “Independent Teams” Space. However, a consortium representing multiple research groups or institutes may create a distinct Team for each one of its collaborating entities, and add the Teams to either the Independent Teams Space, or create a new Space to administer all these Teams. An individual, group, or consortium can thus establish a presence on WorkflowHub that reflects their real world structure, and which WorkflowHub then uses to assign credit.

> 20 https://biodiversitygenomics.eu/, https://doi.org/10.3030/101059492

> Fig. 3 A guide to the structures in WorkflowHub. You, the user, belong to one or more Organisations (i.e. affiliations). You can also belong to one or more Teams, each of which also needs to belong to a single Space (top). You can nominate which Organisations you wish to use for the different Teams that you have created or joined, and you can belong to multiple Teams in the same Space, as well as multiple Teams in other Spaces (bottom). Image reused with permission from WorkflowHub documentation.

The structure of WorkflowHub can also support an entire community of practice, whereby Spaces and Teams are used to organise contributors, share workflows, and support the sharing of knowledge within that community. This is achieved by registering workflows and linking them to other research outputs, including events, presentations, documents, publications, data files and SOPs; these assets can either be hosted by WorkflowHub or added by reference.
A workflow developer can even nominate relevant community channels where they can connect with users, and curated “Collections” of workflows can be constructed to help a community manage specific sets of relevant workflows, and other outputs, that may span many scientific applications, workflow languages and research programs (e.g. Threatened Species Initiative 21 annotation workflows 22). To streamline the onboarding process for consortia and larger projects, a generic set up guide [30], and multiple guides for specific consortia [31–33], have been created. These guides describe the steps required to get set up on WorkflowHub and use the platform effectively.

> 21 https://threatenedspeciesinitiative.com/about-tsi/
> 22 https://workflowhub.eu/collections/23

## 2.3 A Hub for workflows

As the name of the registry suggests, the underlying aim of WorkflowHub is to act as a “Hub”, and specifically a Hub that helps to connect the workflows ecosystem and ease the interoperation of its constituent platforms and services. To realise this, WorkflowHub relies on a web-friendly metadata framework that simultaneously supports data representation within the registry itself, while also acting as a foundation for exchange of data and metadata between the platforms and services that are described in Figure 1. Here, we provide more details for three core areas that underpin this “Hub” functionality [13].

WorkflowHub participates in the EOSC federated Authentication and Authorization Infrastructure through LS-Login and the OAuth2 protocol, in addition to supporting authentication via GitHub.
This feature allows systems interacting with WorkflowHub to identify users as the same individual across different systems (even with their identity provided by their participating institutional accounts), enabling single sign-on and smart authorization decisions on accessing and operating on workflows, their metadata and other resources across the ecosystem forming around WorkflowHub.

To help describe workflows and their components, the registry uses three profiles from Bioschemas [34]: Computational Tool 23, Computational Workflow 24, and Formal Parameter 25. Even though these are a part of the Bioschemas effort, the profiles support a discipline-independent and standardised way of describing workflows and their components. In addition, and owing to close collaboration with the Common Workflow Language (CWL) [35, 36] community, the CWL workflow specification is encouraged as a workflow-language-independent way of describing a WorkflowHub entry. This is the so-called “Abstract CWL”. This format can even hold semantic annotations, which WorkflowHub leverages to extract the typing of workflow inputs and outputs, as well as EDAM ontology concepts for Topics and Operations [37].

RO-Crate [38] is a standard for FAIR Research Objects. It was developed by the community to package a workflow with the components required to understand and execute that workflow. The additional packaged components may include test data, Abstract CWL, diagrams, publications, and SOPs, as well as the flat metadata file that provides the context for all these assets 26 [38]. The implementation of RO-Crate is central to the ability of WorkflowHub to interoperate and exchange Workflow-RO-Crate 27 data with the ecosystem in Figure 1 (e.g. Zenodo archiving) and for exposing workflows as FAIR Digital Objects [39, 40].
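As a rough illustration of the “flat metadata file” described above, the sketch below builds a minimal `ro-crate-metadata.json` document in Python. It follows the general shape of RO-Crate 1.1 and uses the Workflow RO-Crate profile identifier from footnote 27, but it is not WorkflowHub’s own implementation: the function name `build_workflow_crate`, the file name `workflow.ga`, the local identifier `#galaxy`, the placement of the profile identifier under `conformsTo`, and the licence URL are all assumptions chosen for illustration, and crates produced by the registry carry considerably more metadata.

```python
import json

# Identifiers taken from the RO-Crate 1.1 specification and footnote 27.
RO_CRATE_CONTEXT = "https://w3id.org/ro/crate/1.1/context"
RO_CRATE_SPEC = "https://w3id.org/ro/crate/1.1"
WORKFLOW_PROFILE = "https://w3id.org/workflowhub/workflow-ro-crate/1.0"


def build_workflow_crate(workflow_path, name, language_id, language_name, license_url):
    """Return a minimal RO-Crate metadata document as a plain dict.

    All argument values are supplied by the caller; nothing here is
    looked up from WorkflowHub (hypothetical helper for illustration).
    """
    return {
        "@context": RO_CRATE_CONTEXT,
        "@graph": [
            {
                # Metadata descriptor: says which entity is the crate root.
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                # Assumed placement of the profile identifier.
                "conformsTo": [{"@id": RO_CRATE_SPEC}, {"@id": WORKFLOW_PROFILE}],
                "about": {"@id": "./"},
            },
            {
                # Root dataset: the crate itself, pointing at its main workflow.
                "@id": "./",
                "@type": "Dataset",
                "name": name,
                "license": {"@id": license_url},
                "mainEntity": {"@id": workflow_path},
                "hasPart": [{"@id": workflow_path}],
            },
            {
                # The workflow file, typed so that it reads as a workflow.
                "@id": workflow_path,
                "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
                "name": name,
                "programmingLanguage": {"@id": language_id},
            },
            {
                # The workflow language, described as a contextual entity.
                "@id": language_id,
                "@type": "ComputerLanguage",
                "name": language_name,
            },
        ],
    }


if __name__ == "__main__":
    crate = build_workflow_crate(
        "workflow.ga",                      # hypothetical file name
        "Example workflow",                 # hypothetical title
        "#galaxy",                          # hypothetical local identifier
        "Galaxy",
        "https://spdx.org/licenses/MIT",    # hypothetical licence choice
    )
    print(json.dumps(crate, indent=2))
```

The metadata stays “flat” because every entity sits at the top level of `@graph` and refers to the others by `@id` rather than by nesting, which is what lets additional assets such as test data or SOPs be added as further entries without restructuring the file.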
WorkflowHub can also interoperate with workflow execution platforms that are part of the ecosystem (see Figure 1) through its implementation of the GA4GH Tool Registry Service (TRS) API 28, 29. This means that a user of a TRS-enabled analysis platform, like Galaxy, is able to search for and retrieve workflows, without leaving the platform. It also means that WorkflowHub can send workflows to these platforms, ready for execution.

> 23 https://bioschemas.org/profiles/ComputationalTool/1.0-RELEASE
> 24 https://bioschemas.org/profiles/ComputationalWorkflow/1.0-RELEASE
> 25 https://bioschemas.org/profiles/FormalParameter/1.0-RELEASE
> 26 https://www.researchobject.org/ro-crate/
> 27 https://w3id.org/workflowhub/workflow-ro-crate/1.0

Ultimately, the impact of these features is threefold. Firstly, WorkflowHub is able to support machine actionability, as described by the FAIR principles [14]. Secondly, this underpins the registry’s ability to connect to services and platforms that are in use day-to-day by workflow users and developers. Finally, these same users are able to access this ecosystem “Hub” using multiple authentication mechanisms and leverage multiple standards when contributing workflows.

## 2.4 Using WorkflowHub to register, find and launch workflows

The primary purpose of WorkflowHub is to allow researchers to register and share workflows. Existing public workflows can therefore be viewed, downloaded and launched without needing to register with WorkflowHub. By extension, the contribution of open access and publicly accessible workflows is also encouraged. However, workflows may be registered privately, or be embargoed. This functionality supports creators of workflows in cases where they would like to (e.g. a workflow is still being developed), or need to, limit access to a specific group of users. User authentication (i.e. login) is required to register content with WorkflowHub, and enables the registry to assign credit and enable citation.
To contribute content an individual user needs to:

1. Register and indicate the Organisations to which they are affiliated. A user can add the following to their profile: a description, their ORCID 30, contact details (visible to those in shared Teams and Spaces), as well as knowledge and expertise. More advanced configurations are also available via a user profile, including the management of OAuth sessions, authorised applications, API applications, and API tokens.
2. Decide which Space on WorkflowHub to use: a Space is a user-administered section of WorkflowHub that can be used to manage the Teams required for consortia, institutes, or other large research activities. WorkflowHub administers a single default Space called “Independent Teams”. This Space can be used when users do not need to create and manage many Teams, but simply need to create a single Team. All other Spaces are created upon request and administered by those who requested the Space.
3. Create or join at least one Team: Teams are one or more people working on a particular research activity involving workflows. Every workflow in WorkflowHub is owned by at least one Team. WorkflowHub users must therefore belong to at least one Team, and this Team must belong to a Space (e.g. Independent Teams). In addition to supporting the correct assignment of credit to workflow developers, contributors, and submitters, the Team also enables its members to further describe the context for their workflow development (i.e. background, project description), and serves to promote their contributions to other WorkflowHub users.

> 28 https://ga4gh.github.io/tool-registry-service-schemas/
> 29 https://www.ga4gh.org/product/tool-registry-service-trs/
> 30 https://orcid.org/

Once these steps are complete, a user has the option to register:

• Core resources such as workflows and Collections.
As workflows do not exist in a vacuum, Collections allow a WorkflowHub user to bring together workflows with any of their other resources and activities (see below) to create a visible and holistic resource that can support workflow reuse.
• Other resources, including publications, documents, data files, and SOPs.
• Activities, including presentations and events.

Registration of workflows can be carried out by manually uploading a workflow file, importing either a RO-Crate or Git repository, and by submission through a REST API 31, 32. As a registry, WorkflowHub is designed to link to workflows held in their native repositories. However, because manual uploading and storage of files is also supported, it can also act as a repository. RO-Crate is used by WorkflowHub as a fundamental unit that underpins upload, download, import and export. Importantly, a user does not need to know how to create or work with RO-Crates, as the registry automatically builds a crate when a workflow is registered using one of the other available mechanisms.

To streamline the above processes, each stage is guided by inbuilt wizards and users are prompted to carry out the next step. For example, after user registration with WorkflowHub, a user is prompted to join or create a new Team. After registration of a workflow file or workflow repository, the user is prompted to complete the set of metadata suggested to create a well described and FAIR workflow entry.

Figure 4 illustrates two example WorkflowHub entries: the dna-seq-varlociraptor Snakemake workflow 33 [41], and the Find transcripts - TSI Galaxy workflow 34 [42]. The top of the entry emphasises the workflow name (Figure 4.B) and workflow management system (Figure 4.A), along with quick links to the development repository, requesting contact with the authors, subscribing to notifications about changes to the workflow, downloading a workflow RO-Crate, and adding the workflow to a collection.
In addition, the workflow creator also has access to administrative options for the workflow entry in this section, which include adding new documents or presentations connected to the workflow, and workflow actions such as adding new versions, requesting a DOI for a specific workflow version, editing the workflow metadata, managing workflow contributors and visibility, and deleting the workflow entry (Figure 4.C). The main panel for the entry has three tabs (Figure 4.D) that provide a workflow overview (including descriptions, version information, metadata, critical annotations, and activity analytics), access and view capability for the files that were registered, and links to related items (e.g. people, Spaces, Teams, Collections, other workflows).

The examples use many of the key features of WorkflowHub, including those enabled by the WorkflowHub registration wizard, which prompts inclusion of critical metadata: these include title (Figure 4.B), workflow management system (Figure 4.A), creators (Figure 4.G), component tools (Figure 4.H), license information (Figure 4.I) and ontology annotations (Figure 4.K). In addition, the dna-seq-varlociraptor workflow made use of the WorkflowHub Git integration to ingest the repository README file (complete with badges), as well as providing the link to the development repository (Figure 4.C), the ability to access and view the repository file list natively in WorkflowHub (Figure 4.D), and an annotated version history (including commit IDs, Figure 4.F).

> 31 https://about.workflowhub.eu/developer/ro-crate-api/
> 32 https://about.workflowhub.eu/docs/adding-files/
> 33 https://workflowhub.eu/workflows/686
> 34 https://workflowhub.eu/workflows/877

Fig. 4 Two example entries in WorkflowHub (left: [41], right: [42]) with sections of the user interface annotated and each entry using the flexible features of WorkflowHub in distinct ways. Entry features include A) workflow type, B) title, C) access panel with links to the source repository (e.g. GitHub), requests to contact the creators, subscribe / unsubscribe, download research object crate (RO-Crate), add to a Collection, and in the right hand example access to administrative menus such as Add new (e.g. document, SOP) and Actions (e.g. edit or manage the workflow, including versions and minting DOIs), D) tabs for navigation between the entry overview, the list of files in the entry, and lists of items related to the workflow, including people, Teams, Spaces, Organisations, and other digital objects (e.g. publications, documents, SOPs, other workflows), E) description, which can be imported from Git, if available, F) version history, including Git commits, if available, G) creator and submitter information, H) links to more information about tools that comprise the workflow (i.e. bio.tools registry entries), I) license information, J) activity metrics (i.e. downloads and views), K) ontology concept annotations (e.g. EDAM in the left example entry), L) workflow diagram, M) parsed workflow inputs, outputs and steps for specific WfMS (e.g. Galaxy in the right example entry), N) buttons for launching workflows on execution platforms (e.g. Galaxy for right example entry), O) citation for the workflow (i.e. either using information from a minted DOI or a custom citation (e.g. workflow publication)), P) custom tags, and Q) Collections that include the current workflow entry.

To find these workflows within the registry, a user can start by applying a text-based search, or by visiting the complete workflow listing where they can filter by type (e.g. Galaxy, Nextflow), tools used (i.e. using bio.tools identifiers [43]), creators, Organisations, Teams, Spaces, and more. Researchers can also refine their searches by making use of faceted browsing and filtering on tags and other annotations, and are able to sort by titles, dates, views and downloads.
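Alongside the web search, the GA4GH TRS API implemented by WorkflowHub (Section 2.3) gives client platforms a programmatic route to the same entries. The sketch below only constructs request URLs following the path layout of the TRS v2 schema (`/tools` and `/tools/{id}/versions/{version_id}/{type}/descriptor`); it makes no network calls. The base URL `https://workflowhub.eu/ga4gh/trs/v2`, the use of a numeric WorkflowHub identifier as the TRS tool id, and the helper names `tools_url` and `descriptor_url` are assumptions made for illustration and should be checked against the registry’s own API documentation.

```python
from urllib.parse import quote, urlencode

# Assumed base URL for WorkflowHub's TRS endpoint (not stated in the text).
TRS_BASE = "https://workflowhub.eu/ga4gh/trs/v2"


def tools_url(base=TRS_BASE, **filters):
    """Build a URL listing registry entries, with optional query filters.

    Filter names such as `descriptorType` and `limit` come from the TRS v2
    schema; this helper itself is hypothetical.
    """
    # Sort the filters so the resulting URL is deterministic.
    query = urlencode(sorted(filters.items()))
    return f"{base}/tools" + (f"?{query}" if query else "")


def descriptor_url(tool_id, version_id, descriptor_type, base=TRS_BASE):
    """Build the URL of the primary descriptor for one workflow version."""
    # Percent-encode every path segment, including any '/' inside an id.
    tool, version, kind = (
        quote(str(part), safe="") for part in (tool_id, version_id, descriptor_type)
    )
    return f"{base}/tools/{tool}/versions/{version}/{kind}/descriptor"


if __name__ == "__main__":
    # List a handful of Galaxy workflows (descriptor type value assumed).
    print(tools_url(descriptorType="GALAXY", limit=5))
    # Point at version 1 of a hypothetical entry with id 877.
    print(descriptor_url(877, 1, "GALAXY"))
```

A TRS-enabled platform would issue HTTP GET requests against URLs of this form and parse the JSON responses; keeping URL construction separate from the request itself makes the path layout easy to inspect and test offline.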
Galaxy’s integration with WorkflowHub leverages the GA4GH TRS API, enabling seamless workflow exchange. Researchers can discover, import, and run workflows from WorkflowHub directly within Galaxy. From the WorkflowHub interface users can utilise the “Run in Galaxy” button (see Figure 4.N), which redirects them to a Galaxy instance and a workflow run form. This interoperability facilitates immediate application and further development of workflows. The use of the RO-Crate specification ensures that workflow metadata and components remain accessible and interoperable, aligning with FAIR principles.

## 2.5 Design that supports the workflow life cycle

To support the workflow life cycle [6, 28, 44], WorkflowHub integrates with services that workflow creators use for development, execution, maintenance, testing, citation, and ultimately archiving. These integrations initially form part of the workflow registration wizard, which guides a workflow creator through the process of registering their workflow for the first time. However, they are also accessible during a workflow’s maintenance phase, when the workflow may be updated to modify, improve, or repair its function.

### 2.5.1 Ease of access

A user of WorkflowHub begins by accessing the service via LS-Login. This enables researchers to use credentials from their specific institutions or even other identity-providing platforms (e.g. Google, Apple, ORCID). Authentication via GitHub or a local WorkflowHub account is also supported. This feature also enables WorkflowHub administrators to manage user access rights, and to create a custom combination of access levels that are suitable for specific user groups (e.g. research groups, consortia, international projects). For example, a contributor may simply want to register a single workflow to make it findable. That contributor may be the only user who requires edit access to the workflow, or to the Team to which the workflow belongs, and access rights can be set accordingly.
Conversely, a community of practice (e.g. nf-core), or consortium (e.g. BGE) may have multiple contributors, from multiple institutes, that also belong to multiple WorkflowHub Teams. In this case, granular permissions for edit rights can be set at the workflow, Team and Space levels.

### 2.5.2 Development and versioning

Integration with the Git version control system is a key aspect of WorkflowHub that supports workflow creation, development and maintenance. If the workflow registration wizard is provided with a Git repository URL, WorkflowHub will automatically import and parse its metadata. In this case, a workflow creator only needs to review, and potentially update, metadata fields prior to completing the registration process. WorkflowHub’s Git integration allows workflows to remain in their native creation and development environment (i.e. a version control system), avoiding any impact of registration on the workflow’s development and management process. Moreover, it allows for automation of tasks like updating workflow entries in the registry when workflows are versioned (e.g., by using the LifeMonitor GitHub app 35 [13]). For use cases like Galaxy, where workflows can be created via a graphical user interface rather than scripting, WorkflowHub provides the option to manually upload a workflow file and step through the wizard to enter metadata.

### 2.5.3 WorkflowHub welcomes all workflows

Workflows come in all shapes and sizes, and may even be composed of multiple sub-workflows. They may start off as a set of scripts and evolve into a heavily standardised and portable workflow [11]. They may use one of the many known WfMS [2]. And, of course, workflows can span virtually every field in the sciences and beyond. In short, the workflow and WfMS ecosystems are diverse. One role of WorkflowHub is to make these ecosystems transparent, and it does this by being agnostic to workflow language, maturity, source, structure, and even scientific quality.
Contributions of every workflow type are encouraged 36. Workflows at any development stage (i.e. work-in-progress) are encouraged, not just those workflows considered to be completed and stable. As such, an indication of the maturity of a workflow can be assigned by the creators, and this metadata is presented to potential workflow consumers in the registry entry so that they can readily identify workflows that may not be ready for reuse.

### 2.5.4 Annotating workflow purpose (i.e. adding metadata)

The WorkflowHub registration wizard guides users in the provision of metadata. Although Bioschemas metadata profiles are used, the only mandatory metadata fields are the workflow Title and the contributing Team(s). In addition to describing the workflow itself, the metadata wizard can be used to associate a workflow with other relevant workflows, presentations, publications, documents, SOPs and data files.

Two key metadata integrations are in place that allow users to search for and add standard identifiers when editing workflow metadata. This functionality is available for bio.tools software identifiers 37 and the EDAM ontology concept identifiers for both Topics (e.g. genomics) and data transformation Operations (e.g. genome assembly) [37]. A user can therefore annotate their workflow with persistent links to registry metadata about software that the workflow contains, and standardised shorthand terms that describe its application area and function. In the case of Galaxy, the standard structure of the workflow file is used to extract software tool identifiers and map these automatically to bio.tools [45]. A user can also build on these integrations by manually including custom tags and keywords.
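The distinction between mandatory and optional metadata can be made concrete with a small completeness check. The following is a minimal sketch only: the record layout, the field names (`title`, `teams`, `edam_topics`, and so on) and the function `check_metadata` are hypothetical and do not reflect WorkflowHub's internal data model. The only rule taken from the text is that Title and Team(s) are mandatory.

```python
from typing import Any

# Mandatory fields per the text above: workflow Title and contributing Team(s).
MANDATORY_FIELDS = ("title", "teams")

# Optional annotations mentioned in this section (field names are hypothetical).
RECOMMENDED_FIELDS = ("creators", "license", "tools", "edam_topics",
                      "edam_operations", "tags")


def check_metadata(record: dict[str, Any]) -> dict[str, list[str]]:
    """Report which mandatory and recommended fields are absent or empty."""
    def is_missing(field: str) -> bool:
        value = record.get(field)
        return value is None or value == "" or value == []

    return {
        "missing_mandatory": [f for f in MANDATORY_FIELDS if is_missing(f)],
        "missing_recommended": [f for f in RECOMMENDED_FIELDS if is_missing(f)],
    }


# Illustrative record; every value here is a placeholder.
example = {
    "title": "Example variant-calling workflow",
    "teams": ["Example Team"],
    "tools": ["https://bio.tools/example-tool"],
    "edam_topics": [],
}
report = check_metadata(example)
```

A record like `example` would pass registration under the stated rule, while the report still lists the annotations a creator could add to improve findability.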
> 35 https://lifemonitor.eu/lm_wft_best_practices_github_app
> 36 https://workflowhub.eu/workflow_classes
> 37 https://bio.tools/

### 2.5.5 Discovery and understanding

When a workflow is ready to be shared and reused, a creator can update their workflow maturity from “work-in-progress” to “stable” in the WorkflowHub entry metadata. It is at this point that a workflow needs to be discoverable in multiple ways, and remain accessible in its original location. The reason for this is that not all researchers will necessarily seek to discover workflows in the same way. As a result, WorkflowHub is flexible in its approach. As mentioned earlier, within the registry itself a user can start by applying direct search and filtering to find workflows. External to the registry, the machine-readable, standardised metadata format used by WorkflowHub (i.e. Bioschemas) increases the search engine visibility of the rich metadata in a workflow entry. Users can interactively explore the impact of workflow registration using evaluator tools such as FAIR-checker [46] and FAIRsoft [47].

Finding a workflow is step one for a potential user. Once found, the WorkflowHub entry metadata helps a user understand the workflow, including its design, content, and purpose [7]. For example, “this is a Galaxy type workflow, containing these tools (i.e. links to bio.tools), which are capable of these types of data transformations (i.e. EDAM annotations)”. From a workflow entry that has been annotated with tools, a user can access links to navigate directly to the referenced bio.tools entries to explore and further understand the components that comprise the workflow. Users can also view the files in the source Git repository (if the workflow was imported from Git), and with a single click visit the repository. It is even possible to subscribe and be notified of changes to entries, removing the need for constant monitoring.
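To illustrate how machine-readable markup supports discovery outside the registry, the sketch below extracts embedded JSON-LD from an HTML page using only the Python standard library. It assumes the markup is embedded in `<script type="application/ld+json">` elements, which is the usual way schema.org-style metadata is published but is not stated in the text. The sample page and its values are hand-written for illustration and are not taken from a real WorkflowHub entry.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._buffer: list[str] = []
        self.documents: list = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.documents.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


# Hand-written sample page (illustrative values only).
SAMPLE_PAGE = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org",
 "@type": ["SoftwareSourceCode", "ComputationalWorkflow"],
 "name": "Example workflow",
 "license": "https://spdx.org/licenses/MIT"}
</script>
</head><body></body></html>
"""

extractor = JsonLdExtractor()
extractor.feed(SAMPLE_PAGE)
workflow_markup = extractor.documents[0]
```

A search engine or FAIR evaluator would perform a comparable step before inspecting properties such as the name, license, or creators.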
### 2.5.6 Execution / reuse

WorkflowHub actively supports and develops integrations with workflow execution platforms and services. A key example is the GA4GH TRS API 38. If an execution platform or system adopts the TRS standard, it can search WorkflowHub for suitable workflows, retrieve those workflows, and execute them, without the need to develop custom integrations with the workflow’s native repository. Examples include Galaxy, Sapporo [48] (DNA Data Bank of Japan, DDBJ 39), and the Workflow Execution Service (WfExS) 40 [49], all of which implement the TRS API, either as providers or consumers. As execution platforms (e.g. Galaxy) can make use of the TRS, they are also able to provide inbuilt search interfaces that connect to WorkflowHub and support platform users to find and import a specific workflow.

The connection of WorkflowHub to the LifeMonitor service 41, through the LifeMonitor GitHub app, allows workflow function and status to be reported to maintainers and users through regular automated tests driven by continuous integration (CI) based monitoring (e.g. Planemo automated workflow testing using Galaxy [50]). In these cases, WorkflowHub will also include a badge that shows if the tests are passing or failing. An embedded link in the badge takes a user to the LifeMonitor page for the workflow, providing information on the reliability of the workflow over time and the timeliness of the workflow maintainers in solving issues as they arise. In addition, the app can automatically suggest changes to the workflow Git repository that will bring its metadata and structure in line with best practices.

> 38 https://github.com/ga4gh/tool-registry-service-schemas
> 39 https://ddbj.nig.ac.jp/
> 40 https://github.com/inab/WfExS-backend
> 41 https://www.lifemonitor.eu/
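The TRS-based retrieval described in this subsection can be sketched as a set of URL builders that follow the path layout of the GA4GH TRS v2 schema (`/tools/{id}/versions/{version_id}/{type}/descriptor`). The base URL below is an assumption about where the registry exposes its TRS endpoint, and the tool and version identifiers are placeholders. No request is sent, so the sketch only shows how a consumer would address a workflow.

```python
from urllib.parse import quote

# Assumed location of the registry's TRS v2 endpoint (not stated in the text).
TRS_BASE = "https://workflowhub.eu/ga4gh/trs/v2"


def trs_tool_url(tool_id: str, base: str = TRS_BASE) -> str:
    """URL describing one registered tool (workflow) in a TRS v2 service."""
    return f"{base}/tools/{quote(tool_id, safe='')}"


def trs_versions_url(tool_id: str, base: str = TRS_BASE) -> str:
    """URL listing all versions of a tool."""
    return f"{trs_tool_url(tool_id, base)}/versions"


def trs_descriptor_url(tool_id: str, version_id: str, descriptor_type: str,
                       base: str = TRS_BASE) -> str:
    """URL of the primary descriptor for one version.

    descriptor_type is a TRS descriptor type such as "CWL", "NFL" or "GALAXY".
    """
    version = quote(version_id, safe="")
    return (f"{trs_tool_url(tool_id, base)}/versions/{version}"
            f"/{descriptor_type}/descriptor")


# Placeholder identifiers, for illustration only.
descriptor_url = trs_descriptor_url("123", "1", "GALAXY")
```

Because every TRS-compliant registry shares this layout, an execution platform can point the same client code at a different `base` without a registry-specific integration.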
Finally, to streamline the inclusion of reuse conditions, licensing information is included in the metadata for workflows registered with WorkflowHub. Default sharing and license conditions can be specified for Teams, and these rules can be updated by Team administrators.

### 2.5.7 Attribution and citation

It is important to properly attribute contributions to workflows, and this includes provenance of the entire workflow development process. The RO-Crate format used by WorkflowHub allows for provenance tracking of metadata, ensuring that workflow creators are given the credit they deserve, while also adding accountability. Workflows can be linked to each other using WorkflowHub metadata (i.e. the “attribution” metadata can be used to indicate that a workflow is based on another workflow). The Git integrations described above also support the citation standard CITATION.cff 42 [51], so that WorkflowHub can import this file and use its contents to populate workflow entry metadata for creators, including their ORCID. This approach simplifies citing a workflow according to the wishes of the workflow developer. Once a workflow is registered and credit is established within the metadata framework of WorkflowHub, the registry can also, at the push of a button, use DataCite 43 to mint persistent digital object identifiers (DOIs) for workflows and to contribute to the DataCite PID Graph 44. This is the first step to ensuring that workflows can be cited effectively, increasing their visibility and potential impact, and supporting inclusion in the scholarly knowledge graph.

## 2.6 User engagement and training

WorkflowHub engages and supports a broad set of use cases, including numerous projects and consortia of global significance. Major projects are supported so that their members are able to use WorkflowHub in a way that aligns with the expectations of their project and its funders. Workflow communities (e.g.
nf-core, CWL, Snakemake) are directly engaged by the WorkflowHub team, and supported to make their best practice workflows available in the registry.

The main mechanism through which engagement happens is the fortnightly open format WorkflowHub Club meeting 45, where anyone can join the conversation, learn more about the registry, ask questions, and even contribute to the on-going development and evolution of the registry. There are also indirect ways through which workflow creators and users interact with WorkflowHub and its resources. The WorkflowHub Club team creates and presents content for the registry at conferences, in webinars and workshops, and as part of Ask-Me-Anything forum events 46. Registered users of WorkflowHub are also able to ask questions and provide direct feedback via the registry interface. These communications are sent directly to the administrators of WorkflowHub for review and response. Finally, WorkflowHub operates a documentation site where users can access information on how to use the registry 47.

> 42 https://citation-file-format.github.io/
> 43 https://datacite.org/
> 44 https://support.datacite.org/docs/datacite-graphql-api-guide
> 45 https://about.workflowhub.eu/#community
> 46 https://about.workflowhub.eu/project/outreach/

Clear and practical guidelines are required to support users of WorkflowHub. For example, some annotations are consistently missing from workflow entries, and useful features are sometimes overlooked (e.g. Git integration, parsing of CITATION.cff files, linking data to workflows, and linking workflows to each other). This highlights that the rich feature set of WorkflowHub is not necessarily immediately clear, and that guidance in leveraging these features is absolutely critical to support users in achieving best practice. It is important that the provided guidelines also extend to recommendations for how to organise a workflow development repository (i.e. Git repositories).
This will enable WorkflowHub to extend the process currently in place for Galaxy IWC to more community repositories: integrating with these repositories in a more standard way to automatically manage the update of workflow versions in WorkflowHub, such that they are in sync with Git releases. As a result, and as highlighted earlier, a general onboarding and set up guide for projects and consortia has been developed [30], as have multiple consortia specific guides [31–33]. Workflow resources have also been developed for the Galaxy Training Network (GTN [52]) Smörgåsbord events 48 and specific GTN tutorial sets 49.

Finally, the WorkflowHub team actively identifies opportunities to engage with peer infrastructures to grow the user base of the registry, investigate and create integrations that are of enduring value, and further improve the function of the registry. For example, WorkflowHub is actively fostering a conversation with publishers and journals focused on how to make workflows citable objects in the literature [53]. WorkflowHub is also being further developed to fully align with the best practice guidelines of the SciCodes Consortium [54], implement the FAIR Principles for Research Software (FAIR4RS) [15] and contribute to the CodeMeta specification for software [55]. As described earlier, a registry like WorkflowHub enables the creation of workflows that follow the FAIR principles, from the perspective of data [14], software [15], and the unique features of workflows (e.g. abstraction, composition). WorkflowHub is central to discussions in the FAIR Computational Workflows working group for the Workflows Community Initiative (WCI 50). This working group engages broadly across the global workflows ecosystem (i.e. workflow developers, communities, platforms and services) to develop FAIR principles for workflows [17]. The ultimate aim is to use the outcomes of these engagements to guide the evolution of WorkflowHub as a FAIR registry for workflows.
## 2.7 Use Cases

To effectively support the sharing of workflows, WorkflowHub supports collaborations and communities of practice within the sciences. WorkflowHub contributions span domains such as cancer, COVID, genomics, rare diseases, geosciences, climate, physics, and more. WorkflowHub users span the globe and 35 countries are represented in the registered user list 51.

> 47 https://about.workflowhub.eu/docs/
> 48 https://gallantries.github.io/video-library/modules/ro-crate
> 49 https://training.galaxyproject.org/training-material/topics/fair/
> 50 https://workflows.community/groups/fair/

### 2.7.1 Research consortia & infrastructures

WorkflowHub is an integral platform for consortia and projects. Here we provide details for three specific use cases, EOSC-Life, BGE, and Australian BioCommons.

EOSC-Life was a key use case driver, as it supported the implementation of FAIR computational workflows in the EU by seeking to develop a cloud-based Workflow Collaboratory [26] that ultimately resulted in the creation of WorkflowHub. The aim was to create a platform that would support community collaboration on the development, use, and reuse of FAIR computational workflows [18], and to do so in a way that bridges research domains and infrastructures [13, 26]. WorkflowHub accommodates the diversity of EOSC-Life and ensures the visibility of workflows applied across its many established research infrastructures as they are created and registered [26].

The BGE project is a coming together of two communities of researchers with a common goal of cataloguing biodiversity through genomic resources: the European Reference Genome Atlas [56] and the European node of the International Barcode of Life consortium (iBOL Europe 52).
Providing reference-quality genomes to the community (ERGA) and monitoring biodiversity through DNA barcoding (iBOL Europe) requires the management and processing of vast amounts of data, in an accessible and distributed fashion, relying on input from multiple individuals and institutes. The combination of BGE WorkflowHub Spaces 53, Teams 54 and Collections 55 allows individuals to contribute as needed to the projects across the consortium. As workflows have been collected and curated by the community, they in effect also come with a “seal of approval” for external users that wish to replicate the work of ERGA or iBOL Europe. All together, such a structure of publishing and maintaining workflows facilitates BGE in achieving their ambitious goals of cataloguing biodiversity in Europe and bringing together researchers from the biodiversity genomics community.

Australian BioCommons 56 is a national infrastructure project that actively supports life science research communities with community scale digital infrastructure [57]. Rather than building anew, BioCommons aims to adopt fit-for-purpose international platforms and services that, in the case of workflows, can assist with the provision of sophisticated software, analysis capabilities, and digital asset stewardship. WorkflowHub is the primary workflow registry for BioCommons 57, and it is the focal point for sharing the collaborative workflow efforts of BioCommons and its infrastructure partners together with Australian life science researchers 58.

> 51 https://workflowhub.eu/people
> 52 https://iboleurope.org/
> 53 https://workflowhub.eu/programmes/25
> 54 https://workflowhub.eu/projects/163
> 55 https://workflowhub.eu/collections/10
> 56 https://www.biocommons.org.au/
> 57 https://workflowhub.eu/programmes/8
> 58 https://workflowhub.eu/collections/6

### 2.7.2 Workflow management systems

WorkflowHub accepts all workflow types, and includes Galaxy [23], Snakemake [24], Nextflow [21], job schedulers (e.g.
PyCOMPS [58]), application-specific types like SCIPION [59], notebooks (e.g. Jupyter 59), and even scripting languages (e.g. R [60] and Python 60 [61]) (see also Figure 2). WorkflowHub provides customised support for WfMS that are critical for specific domain communities (e.g. bioinformatics). This is currently the case for Galaxy, CWL and Nextflow, and additional significant and popular WfMS may be supported with relevant features, when appropriate. As an example, the registry development team, together with Galaxy, have created functionalities for WorkflowHub that include 1) semi-automated registration of new and updated Galaxy IWC workflows, 2) integration with LifeMonitor to further support semi-automated registration but also Planemo workflow test monitoring, and 3) automatic mapping of tool identifiers to the bio.tools registry [13]. As the CWL community has been closely involved in registry development, WorkflowHub also has in-built functions for parsing CWL, and makes use of Abstract CWL. Most recently, engagement between nf-core and the WorkflowHub team resulted in the creation of automatic registration and metadata parsing functions for this community’s Nextflow workflows 61. These workflows can now be found in WorkflowHub 62.

### 2.7.3 Individual workflow developers

WorkflowHub supports large international science missions and consortia. However, the registry welcomes and encourages contributions from any workflow developer, regardless of their research domain, application areas, or the size of their research group. In fact, the largest Space on WorkflowHub is currently Independent Teams, which covers 287 people, 185 Teams, 180 Organisations and 279 workflows.

# 3 Discussion

We have presented WorkflowHub, a registry that enriches the scientific workflows ecosystem by being a hub for discovery and sharing of workflows from across multiple languages, communities, consortia, and scientific domains.
The registry connects this community of users and contributors to workflow development, support, and scholarly services that support the requirement for workflows to be shared and credited, as well as the various requirements developers encounter during the workflow life cycle (see Figure 1). Workflows continually evolve to keep pace with changing research questions, data types and practices. Similarly, it is intended for WorkflowHub to evolve in lock step with the changing requirements of both the developers and users of computational workflows. The roadmap for WorkflowHub over the next few years can be broadly broken down into four aspects: improving the support and resources available to users of the registry, onboarding new communities and domains, contributing thought leadership for workflow best practices that directly impact aspects such as workflow visibility and quality, and aligning to FAIR principles for computational workflows.

> 59 https://jupyter.org/
> 60 https://www.python.org/
> 61 https://elixiruknode.org/news/2024/workflowhub-nf-core-workflow-accessibility/
> 62 https://workflowhub.eu/workflows?filter[project]=15

## 3.1 Improving support

As WorkflowHub and its integrated services intend to support the workflow life cycle, and not simply the registration and publication of workflows, the registry will aim to improve processes that align to supporting this life cycle. This includes improving search and the inbuilt wizards that assist with onboarding and resource registration, but also improving overall ease-of-use and guidance. Additions and improvements for the UI will continue, adding those features that are required by end users. Particular attention will be paid to tracking workflow cloning, as well as supporting workflow collections, sub-workflows, and nested workflows.

Continued work on the onboarding guidelines created for WorkflowHub will improve the quality of workflow organisation for contributing groups (i.e.
big projects and consortia), in part by including guides that cover best practice instructions for the structure and content of workflow repositories as well as how to make the most impact with LifeMonitor, but also by contributing to training. To date, WorkflowHub has been included in existing training, such as the GTN Smörgåsbords 63 and workflow registration workshops [62]. As a next step, this effort should be extended to include WorkflowHub lessons in Software Carpentry developed by the Data Science training programme for Health and Biosciences (Ed-DASH 64) for Nextflow 65 and Snakemake 66.

As the number of contributors and workflow assets grows, it will be critical to also explore the use of automated mechanisms that allow scaling of support. This could include streamlined metadata annotation approaches that use Large Language Models (LLMs) to populate metadata fields for review by a workflow creator, automatic reporting of issues to workflow creators and their Teams, or integration with additional added value services like APICURON [63].

Reflecting their importance to the operation of WorkflowHub, integrations with platforms and services that support the workflow life cycle will be added, improved, and updated. The RO-Crate format used by WorkflowHub will be central to this effort. Planned updates include the ability to natively configure automated synchronisation between Git repositories and WorkflowHub. Setting this up to match the expectations of the community of WorkflowHub users will be essential, and will depend on registry and community co-development, in particular for those cases where work involves the repositories used by WfMS communities. In addition, the ability to “launch” workflows on additional instances that implement TRS will be explored (e.g. Seqera Platform 67), and benchmarking service integrations for workflow entries will be added through collaboration with OpenEBench 68 [64].
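Because RO-Crate underpins these integrations, a consuming service usually starts by locating the main workflow inside a crate's metadata. The sketch below assumes the conventions of RO-Crate 1.1 and the Workflow RO-Crate profile: a metadata descriptor with `@id` `ro-crate-metadata.json` whose `about` property points to the root dataset, which in turn names the workflow file via `mainEntity`. The function name `find_main_workflow` and the sample crate are illustrative only.

```python
def find_main_workflow(crate: dict) -> dict:
    """Return the entity that the root dataset declares as its main workflow."""
    # Index every entity in the flattened JSON-LD graph by its identifier.
    entities = {entity["@id"]: entity for entity in crate["@graph"]}
    # The metadata descriptor points at the root dataset through "about".
    descriptor = entities["ro-crate-metadata.json"]
    root = entities[descriptor["about"]["@id"]]
    # In a Workflow RO-Crate the root dataset references the workflow file.
    return entities[root["mainEntity"]["@id"]]


# Hand-written sample metadata (illustrative values only).
SAMPLE_CRATE = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {"@id": "ro-crate-metadata.json", "@type": "CreativeWork",
         "about": {"@id": "./"}},
        {"@id": "./", "@type": "Dataset",
         "mainEntity": {"@id": "workflow.ga"}},
        {"@id": "workflow.ga",
         "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
         "name": "Example workflow"},
    ],
}

main_workflow = find_main_workflow(SAMPLE_CRATE)
```

Resolving the root through the descriptor, rather than assuming it is always `./`, keeps the lookup valid for crates whose root dataset uses a different identifier.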
> 63 https://training.galaxyproject.org/
> 64 https://edcarp.github.io/Ed-DaSH/
> 65 https://carpentries-incubator.github.io/workflows-nextflow/
> 66 https://carpentries-incubator.github.io/snakemake-novice-bioinformatics/
> 67 https://seqera.io/platform/
> 68 https://openebench.bsc.es/

## 3.2 Onboarding

WorkflowHub has already supported the onboarding of WfMS communities, including Galaxy’s IWC and Nextflow’s nf-core. A similar integration is currently being finalised for the GTN. These community-centric engagements have resulted in the addition of hundreds of workflows to the registry and, given that WorkflowHub is now integrated with the source repositories where new workflows are developed, this number will only grow. Identifying and engaging workflow communities to onboard, in particular those that converge on specific WfMS, will be critical to increasing the number of workflows in the registry and the visibility of the workflows ecosystem overall, as well as to ensure that further requirements for WorkflowHub are being collected from a diverse set of stakeholders. Support provided to new communities may entail 1) tailored integrations with community repositories, facilitating semi-automated ingestion of workflows to WorkflowHub and synchronised registration of new workflow releases, and 2) supporting WfMS communities to make the best use of WorkflowHub, including how to implement high quality workflow repository structures and effectively adopt both standards (e.g. RO-Crate, Abstract CWL, TRS) and services (e.g. LifeMonitor). This level of support is now being explored for Snakemake workflows, and will involve close collaboration with the Snakemake community.

## 3.3 Workflow visibility and recognition

WorkflowHub already has the capacity to import citation, author and contributor credit metadata from CITATION.cff files. In future, this could be extended to other popular standards, including codemeta.json [55].
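The kind of mapping involved in a CITATION.cff import can be pictured with a short sketch. It assumes the file has already been parsed from YAML into a dictionary (a YAML parser would be needed for that step) and uses the `authors`, `given-names`, `family-names`, `name` and `orcid` keys defined by the Citation File Format. The output layout and the function `creators_from_cff` are hypothetical and are not WorkflowHub's actual importer.

```python
def creators_from_cff(cff: dict) -> list[dict]:
    """Map CITATION.cff authors to simple creator records (hypothetical layout)."""
    creators = []
    for author in cff.get("authors", []):
        if "name" in author:
            # Entity authors (e.g. organisations) carry a single "name" key.
            display_name = author["name"]
        else:
            parts = [author.get("given-names", ""), author.get("family-names", "")]
            display_name = " ".join(part for part in parts if part)
        creators.append({"name": display_name, "orcid": author.get("orcid")})
    return creators


# Already-parsed CITATION.cff content; the person is the fictitious ORCID test identity.
SAMPLE_CFF = {
    "cff-version": "1.2.0",
    "title": "Example workflow",
    "authors": [
        {"given-names": "Josiah", "family-names": "Carberry",
         "orcid": "https://orcid.org/0000-0002-1825-0097"},
        {"name": "Example Consortium"},
    ],
}

creators = creators_from_cff(SAMPLE_CFF)
```

Keeping the ORCID alongside the display name is what allows credit to be attached to a persistent identity rather than to a free-text name.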
Through DataCite, WorkflowHub has the means to mint DOIs, and incorporate workflows with DOIs into OpenAIRE 69.

WorkflowHub has thus actively worked to increase the visibility of workflows in a standardised and streamlined fashion. This approach is already bearing fruit, with examples of WorkflowHub formatted citations appearing in the published literature [65, 66]. In addition, there are now examples of journals making computational workflows the focus of published works. A critical outcome here is to ensure that WorkflowHub becomes a recommended registry for journals and publishers. This includes providing workflow creators and users with a set of best practice recommendations for how to properly document and ultimately cite workflows in published research. A forum has already been established with multiple publishers and journals, and these continuing conversations will aim to address the challenges (i.e. citation formats, complexity of workflow citations, impact on publishing system, recommended practices for peer review of workflows) and opportunities (i.e. developer recognition, tracking, reproducibility) presented by formal citation of computational workflows.

> 69 https://www.openaire.eu/

## 3.4 Hand-holding for FAIR principles

As frequently alluded to in other sections, WorkflowHub is strongly connected to the FAIR principles at numerous levels. One of these levels that is of particular importance for users is the way that it “holds users’ hands” during the process of sharing workflows and related research artefacts. The FAIR principles are only guidelines, and even the most well-intentioned attempts to follow them can go awry in unexpected ways [67]. WorkflowHub aims to help the user to follow best practices, including following the FAIR principles, by making them convenient, but not imposing them as requirements.
For example, WorkflowHub helps users make their workflows Findable – easy to find for both humans and machines – by assigning globally unique and persistent identifiers for the workflows and their different versions (i.e. WorkflowHub identifiers and DOIs). WorkflowHub also guides users to describe their workflows with rich metadata, including the identifier for the workflow, and these metadata are automatically exposed for indexing through WorkflowHub’s use of Bioschemas.

WorkflowHub similarly helps users make their workflows Accessible – available to humans and machines over open protocols that provide optional access control – by providing multiple APIs over HTTPS. These APIs include the JSON-based FAIRDOM-SEEK API 70, an RO-Crate Submission API 71, and the TRS API. Workflow DOIs can also be minted, which means the workflow and its metadata will be accessible even if the workflow itself is no longer available or if the workflow itself cannot be shared openly.

Workflows, after being found and accessed, should ideally be Interoperable – able to be used by humans and machines as part of a wider computational ecosystem. Interoperability often requires a lot of “plumbing” that WorkflowHub provides for users automatically through the use of open-source standards (i.e. RO-Crate as the primary data exchange format) and domain- and tool-specific integrations (i.e. Galaxy IWC and Nextflow nf-core). By guiding users through the process of inputting metadata, WorkflowHub reduces complexity and tedium, making it significantly easier to create interoperable workflows.

Finally, WorkflowHub helps users ensure their workflows are Reusable – allowed to be used in part or in entirety by other humans and machines. In particular, it does this by providing opportunities to specify a clear and accessible license, qualified references to other software, and detailed provenance.
WorkflowHub also collaborates closely with prominent communities in the computational workflows space (see Use Cases section) so that the registry can accommodate and incorporate domain-relevant community standards.

# 4 Methods

## 4.1 Governance

WorkflowHub is an ELIXIR service supported by the UK and Belgium ELIXIR Nodes 72 [68] as well as Australian BioCommons 73 [57]. The registry is part of both the ELIXIR Tools Platform 74 and Research Software Ecosystem 75, and forms part of both EuroScienceGateway 76 and DARE-UK TRE-FX 77. Governance is coordinated and managed by the WorkflowHub Club 78: an inclusive community that meets biweekly online and consists of workflow developers/creators and workflow users, as well as WorkflowHub developers and product owners. The minutes for these meetings are open 79 and a GitHub organisation is used to manage documentation 80. Club members include representatives from ELIXIR, WCI, Australian BioCommons, and more. Over 60 people are listed as contributors 81, and any new contributors are invited to join. WorkflowHub has secured funding for sustainability through Horizon Europe projects, ELIXIR Europe and national UK funds.

> 70 https://workflowhub.eu/api
> 71 https://about.workflowhub.eu/developer/ro-crate-api/
> 72 https://elixir-europe.org/
> 73 https://www.biocommons.org.au/
> 74 https://elixir-europe.org/platforms/tools
> 75 https://research-software-ecosystem.github.io/

## 4.2 Technical infrastructure

WorkflowHub is developed openly, and largely virtually, using open software development practices, hackathons, and virtual communication channels. It has both a roadmap 82 and a regular release cycle (i.e. the SEEK release cycle 83). Requests for WorkflowHub Team registration are supported for the American and Asia Pacific time zones by Oak Ridge National Laboratory 84 and Australian BioCommons, respectively. WorkflowHub is built on the FAIRDOM-SEEK software [69].
FAIRDOM-SEEK was originally developed as a data-management platform for the systems biology community, but has been generalised over time to support a wide variety of use cases, and now has numerous deployments across the world supporting many different communities. Development of FAIRDOM-SEEK is a collaborative activity with contributors from institutions in the UK, Germany, Belgium, Sweden and elsewhere. WorkflowHub is currently hosted on the University of Manchester’s Research IT cloud.

WorkflowHub makes use of Git to store workflows as repositories. This enables the addition, modification and deletion of files for workflows that are uploaded directly to the registry by a user, as well as the ability to freeze “snapshots” of specific workflow versions. If a workflow has been added either via Git import or through submission of an RO-Crate, it then needs to be explicitly versioned and uploaded again. Curation of registered workflows is the responsibility of the workflow creators and / or submitters.

# 5 Data Availability

WorkflowHub will ensure data and metadata availability and access up to 2027, with plans to extend this availability further. Further availability of data and metadata from WorkflowHub falls under two categories. For workflows with minted DOIs, the archiving of data and metadata is managed via DataCite. For all workflows, and associated digital assets, WorkflowHub has an End-of-Life policy. The policy states that “If and when the WorkflowHub reaches its end of service after that (i.e.
2027), the published contributions and metadata will be archived as RO-Crates and made available through a public repository, such as Zenodo, Figshare or another appropriate resource at that time. DOI registrations will in this case be updated to link to the archived deposits.” 85. A knowledge graph of registered Workflow RO-Crates as of 2024-08 is published on Zenodo [70].

> 76 https://esciencelab.org.uk/projects/eurosciencegateway/
> 77 https://esciencelab.org.uk/projects/tre-fx/
> 78 https://about.workflowhub.eu/project/community/
> 79 https://docs.google.com/document/d/1U2KAlbKviCu-fCX-znncKIBUIUUOeEnuRGdAg-fNd4Q/edit
> 80 https://github.com/workflowhub-eu/about
> 81 https://about.workflowhub.eu/project/acknowledgements/#workflowhub-club
> 82 https://about.workflowhub.eu/project/roadmap/
> 83 https://github.com/seek4science/seek/
> 84 https://www.ornl.gov/

# 6 Code Availability

WorkflowHub code is available as part of the FAIRDOM-SEEK project 86 [71].

Acknowledgements. workflowhub.eu was founded as part of EOSC-Life (WP2 Tools Collaboratory), funded by the European Union’s Horizon 2020 programme under grant agreement H2020-INFRAEOSC-2018-2 824087. This work was further supported by funding from European Commission’s Horizon Europe programme and UK Research and Innovation (UKRI) under the UK government’s Horizon Europe funding guarantee: EuroScienceGateway (HORIZON-INFRA-2021-EOSC-01-04 101057388, UKRI 10038963), BY-COVID (HORIZON-INFRA-2021-EMERGENCY-01 101046203), FAIR-IMPACT (HORIZON-INFRA-2021-EOSC-01-05 101057344, UKRI 10038992), BioDT (HORIZON-INFRA-2021-TECH-01-01 101057437, UKRI 10038930), PREP-IBISBA (H2020-INFRADEV-2019-2 871118), AgroServ (HORIZON-INFRA-2021-SERV-01-02 101058020, UKRI 10038927), BIOINDUSTRY 4.0 (HORIZON-INFRA-2022-TECH-01 101094287, UKRI 10048146). This work is supported by Australian BioCommons which is enabled by NCRIS via Bioplatforms Australia funding.
This work is also funded by the Sardinian Regional Government through the XData Project, and by the LIFEMAP project (Italian Ministry of Health, POS T3). The work supported by the ELIXIR-DE node was supported by the German Federal Ministry of Education and Research BMBF grant 031 A538A de.NBI-RBC and the Ministry of Science, Research and the Arts Baden-Württemberg (MWK) within the framework of LIBIS/de.NBI Freiburg. This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. The authors are grateful to Sehrish Kanwal of the University of Melbourne for feedback and suggestions about the draft.

> 85 https://about.workflowhub.eu/project/#retention-and-end-of-life-policy
> 86 https://github.com/seek4science/seek
> 87 https://credit.niso.org

Author Contributions. Here, we follow the Contributor Role Taxonomy (CRediT) 87. OJRG contributed through: Writing – original draft, Writing – review & editing, Project administration, Data curation, and Supervision. SRW contributed through: Writing – original draft, Writing – review & editing, and Project administration. FB contributed through: Conceptualization, Data curation, Investigation, Project administration, Software, Methodology, and Resources. SSR contributed through: Conceptualization, Funding acquisition, Investigation, Supervision, and Writing – review & editing. SO contributed through: Conceptualization, Investigation, Project administration, Software, and Methodology. NJ contributed through: Data curation and Writing – review & editing. FC contributed through: Conceptualization, Funding acquisition, Methodology, Project administration, Resources, and Supervision.
CG contributed through: Conceptualization, Funding acquisition, Methodology, Project administration, Resources, Supervision, Visualization, Writing – original draft, and Writing – review & editing. All other authors contributed to the manuscript through: Writing – review & editing.

Competing Interests. The authors declare no competing interests.

# References

[1] Ferreira da Silva, R. et al. Workflows Community Summit 2022: A Roadmap Revolution (2023). URL https://doi.org/10.5281/zenodo.7750670.

[2] Amstutz, P., Mikheev, M., Crusoe, M. R., Tijanić, N. & Lampa, S. Existing Workflow systems. Common Workflow Language wiki. URL https://s.apache.org/existing-workflow-systems.

[3] Maier, W. et al. Ready-to-use public infrastructure for global SARS-CoV-2 monitoring. Nature Biotechnology 39, 1178–1179 (2021). URL https://doi.org/10.1038/s41587-021-01069-1.

[4] Freudling, W. et al. Adaptive Data Reduction Workflows for Astronomy – The ESO Data Processing System (EDPS) (2023). URL https://doi.org/10.48550/ARXIV.2311.03822.

[5] McClure, J. E. et al. in Toward Real-Time Analysis of Synchrotron Micro-Tomography Data: Accelerating Experimental Workflows with AI and HPC (eds Nichols, J. et al.) Driving Scientific and Engineering Discoveries Through the Convergence of HPC, Big Data and AI, Vol. 1315, 226–239 (Springer International Publishing, Cham, 2020). URL https://doi.org/10.1007/978-3-030-63393-6_15.

[6] Gil, Y. et al. Examining the Challenges of Scientific Workflows. Computer 40, 24–32 (2007). URL https://doi.org/10.1109/MC.2007.421.

[7] Cohen-Boulakia, S. et al. Scientific workflows for computational reproducibility in the life sciences: Status, challenges and opportunities. Future Generation Computer Systems 75, 284–298 (2017). URL https://doi.org/10.1016/j.future.2017.01.012.

[8] Larivière, D. et al. Scalable, accessible and reproducible reference genome assembly and evaluation in Galaxy. Nature Biotechnology 42, 367–370 (2024).
URL https://doi.org/10.1038/s41587-023-02100-3.

[9] The ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium et al. Pan-cancer analysis of whole genomes. Nature 578, 82–93 (2020). URL https://doi.org/10.1038/s41586-020-1969-6.

[10] Reiter, T. et al. Streamlining data-intensive biology with workflow systems. GigaScience 10, giaa140 (2021). URL https://doi.org/10.1093/gigascience/giaa140.

[11] Wratten, L., Wilm, A. & Göke, J. Reproducible, scalable, and shareable analysis pipelines with bioinformatics workflow managers. Nature Methods 18, 1161–1168 (2021). URL https://doi.org/10.1038/s41592-021-01254-9.

[12] Patel, R. et al. Reproducibility of the First Image of a Black Hole in the Galaxy M87 from the Event Horizon Telescope (EHT) Collaboration (2022). URL https://doi.org/10.48550/ARXIV.2205.10267.

[13] Goble, C. et al. EOSC-Life Implementation of a mechanism for publishing and sharing workflows across instances of the environment (2023). URL https://doi.org/10.5281/zenodo.7886545.

[14] Wilkinson, M. D. et al. The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data 3, 160018 (2016). URL https://doi.org/10.1038/sdata.2016.18.

[15] Barker, M. et al. Introducing the FAIR Principles for research software. Scientific Data 9, 622 (2022). URL https://doi.org/10.1038/s41597-022-01710-x.

[16] Huerta, E. A. et al. FAIR for AI: An interdisciplinary and international community building perspective. Scientific Data 10, 487 (2023). URL https://doi.org/10.1038/s41597-023-02298-6.

[17] Wilkinson, S. R. et al. Applying the FAIR Principles to Computational Workflows (2024). URL https://doi.org/10.48550/arXiv.2410.03490.

[18] Goble, C. et al. FAIR Computational Workflows. Data Intelligence 2, 108–121 (2020). URL https://doi.org/10.1162/dint_a_00033.

[19] Gil, Y. From data to knowledge to discoveries: Artificial intelligence and scientific workflows. Scientific Programming 17, 167604 (2009). URL https://doi.org/10.3233/SPR-2009-0261.
[20] Ewels, P. A. et al. The nf-core framework for community-curated bioinformatics pipelines. Nature Biotechnology 38, 276–278 (2020). URL https://doi.org/10.1038/s41587-020-0439-x.

[21] Di Tommaso, P. et al. Nextflow enables reproducible computational workflows. Nature Biotechnology 35, 316–319 (2017). URL https://doi.org/10.1038/nbt.3820.

[22] Stevens, F. Understanding how researchers find research software for research practice. Tech. Rep. (2022). URL https://doi.org/10.5281/ZENODO.7340034.

[23] The Galaxy Community et al. The Galaxy platform for accessible, reproducible, and collaborative data analyses: 2024 update. Nucleic Acids Research gkae410 (2024). URL https://doi.org/10.1093/nar/gkae410.

[24] Mölder, F. et al. Sustainable data analysis with Snakemake. F1000Research 10, 33 (2021). URL https://doi.org/10.12688/f1000research.29032.2.

[25] Yuen, D. et al. The Dockstore: enhancing a community platform for sharing reproducible and accessible computational protocols. Nucleic Acids Research 49, W624–W632 (2021). URL https://doi.org/10.1093/nar/gkab346.

[26] Goble, C. et al. Implementing FAIR Digital Objects in the EOSC-Life Workflow Collaboratory (2021). URL https://doi.org/10.5281/ZENODO.4605654.

[27] Goble, C. et al. WorkflowHub – a FAIR registry for workflows (2022). URL https://doi.org/10.7490/F1000RESEARCH.1118984.1.

[28] Courbebaisse, G. et al. Research Software Lifecycle (2023). URL https://doi.org/10.5281/ZENODO.8324828.

[29] Baker, D. et al. No more business as usual: Agile and effective responses to emerging pathogen threats require open data and open analytics. PLOS Pathogens 16, e1008643 (2020). URL https://doi.org/10.1371/journal.ppat.1008643.

[30] Soiland-Reyes, S., Goble, C., Bacall, F., Gustafsson, J. & Andrade Buono, R. A guide to using WorkflowHub (2024). URL https://doi.org/10.48546/WORKFLOWHUB.SOP.13.4.

[31] Soiland-Reyes, S. The BGE guide to using WorkflowHub (2024).
URL https://doi.org/10.48546/WORKFLOWHUB.SOP.15.1.

[32] Soiland-Reyes, S. BioDT Guide to using WorkflowHub (2024). URL https://doi.org/10.48546/WORKFLOWHUB.SOP.14.1.

[33] Goble, C., Bacall, F. & Soiland-Reyes, S. The BY-COVID Guide to using WorkflowHub (2024). URL https://doi.org/10.48546/WORKFLOWHUB.SOP.10.1.

[34] Gray, A., Goble, C. & Jimenez, R. Bioschemas: From potato salad to protein annotation. (eds Nikitina, N., Song, D., Fokoue, A. & Haase, P.) ISWC 2017 Posters & Demonstrations and Industry Tracks, CEUR workshop proceedings (RWTH Aachen University, Germany, 2017). URL https://iswc2017.semanticweb.org/paper-579/. The 16th International Semantic Web Conference 2017, ISWC 2017; Conference date: 21-10-2018 Through 25-10-2018.

[35] Crusoe, M. R. et al. Methods included: standardizing computational reuse and portability with the Common Workflow Language. Communications of the ACM 65, 54–63 (2022). URL https://doi.org/10.1145/3486897.

[36] Amstutz, P. et al. Common Workflow Language, v1.0 (2016). URL https://doi.org/10.6084/M9.FIGSHARE.3115156.V2.

[37] Ison, J. et al. EDAM: an ontology of bioinformatics operations, types of data and identifiers, topics and formats. Bioinformatics 29, 1325–1332 (2013). URL https://doi.org/10.1093/bioinformatics/btt113.

[38] Soiland-Reyes, S. et al. Packaging research artefacts with RO-Crate. Data Science 5, 97–138 (2022). URL https://doi.org/10.3233/DS-210053.

[39] Soiland-Reyes, S. et al. EuroScienceGateway D2.1: Reproducible FAIR Digital Objects for Workflows (2024). URL https://doi.org/10.5281/zenodo.13225792.

[40] De Smedt, K., Koureas, D. & Wittenburg, P. FAIR Digital Objects for Science: From Data Pieces to Actionable Knowledge Units. Publications 8, 21 (2020). URL https://doi.org/10.3390/publications8020021.

[41] Köster, J. et al. snakemake-workflows/dna-seq-varlociraptor: v5.0.2 (2023). URL https://doi.org/10.5281/zenodo.8421328.

[42] Silver, L. & Syme, A.
Find transcripts - TSI (2024). URL https://doi.org/10.48546/WORKFLOWHUB.WORKFLOW.877.1.

[43] Ison, J. et al. The bio.tools registry of software tools and data resources for the life sciences. Genome Biology 20, 164 (2019). URL https://doi.org/10.1186/s13059-019-1772-6.

[44] Deelman, E. & Gil, Y. Managing Large-Scale Scientific Workflows in Distributed Environments: Experiences and Challenges, 144–144 (2006). URL https://doi.org/10.1109/E-SCIENCE.2006.261077.

[45] Lamothe, L. et al. An evaluation of EDAM coverage in the Tools Ecosystem and prototype integration of Galaxy and WorkflowHub systems. preprint, BioHackrXiv (2023). URL https://doi.org/10.37044/osf.io/79kje.

[46] Rosnet, T., Gaignard, A., Devignes, M.-D. & Frikha, S. FAIR-checker (2024). URL https://github.com/IFB-ElixirFr/fair-checker.

[47] del Pico, E. M., Gelpi, J. L. & Capella-Gutiérrez, S. Fairsoft - a practical implementation of fair principles for research software. bioRxiv (2022). URL https://doi.org/10.1101/2022.05.04.490563.

[48] Suetake, H. et al. Sapporo: A workflow execution service that encourages the reuse of workflows in various languages in bioinformatics. F1000Research 11, 889 (2022). URL https://doi.org/10.12688/f1000research.122924.1.

[49] Fernández, J. M., Rodríguez-Navas, L. & Capella-Gutiérrez, S. Secured and annotated execution of workflows with WfExS-backend (2022). URL https://doi.org/10.7490/F1000RESEARCH.1119198.1.

[50] Bray, S. et al. The Planemo toolkit for developing, deploying, and executing scientific data analyses in Galaxy and beyond. Genome Research 33, 261–268 (2023). URL https://doi.org/10.1101/gr.276963.122.

[51] Druskat, S. et al. Citation File Format (2021). URL https://doi.org/10.5281/zenodo.5171937.

[52] Hiltemann, S. et al. Galaxy Training: A powerful framework for teaching! PLOS Computational Biology 19, e1010752 (2023). URL https://doi.org/10.1371/journal.pcbi.1010752.

[53] Goble, C.
WorkflowHub Publishers and Journal Forum (2024). URL https://galaxyproject.org/news/2024-08-03-workflow-publisher-forum/.

[54] Garijo, D. et al. Nine best practices for research software registries and repositories. PeerJ Computer Science 8, e1023 (2022). URL https://doi.org/10.7717/peerj-cs.1023.

[55] Jones, M. B. et al. CodeMeta: an exchange schema for software metadata (2024). URL https://w3id.org/codemeta/v3.0.

[56] Mazzoni, C. J., Ciofi, C. & Waterhouse, R. M. Biodiversity: an atlas of European reference genomes. Nature 619, 252–252 (2023). URL https://doi.org/10.1038/d41586-023-02229-w.

[57] Francis, R. & Christiansen, J. H. Australian BioCommons Strategic Plan 2023-2028 (2024). URL https://doi.org/10.5281/zenodo.13626350.

[58] Tejedor, E. et al. PyCOMPSs: Parallel computational workflows in Python. The International Journal of High Performance Computing Applications 31, 66–82 (2017). URL https://doi.org/10.1177/1094342015594678.

[59] De La Rosa-Trevín, J. et al. Scipion: A software framework toward integration, reproducibility and validation in 3D electron microscopy. Journal of Structural Biology 195, 93–99 (2016). URL https://doi.org/10.1016/j.jsb.2016.04.010.

[60] Team, R. C. R: A Language and Environment for Statistical Computing. URL https://www.r-project.org/.

[61] Van Rossum, G. & De Boer, J. Interactively testing remote servers using the python programming language. CWI quarterly 4, 283–303 (1991).

[62] Gustafsson, J. & Samaha, G. WORKSHOP: Make your bioinformatics workflows findable and citable (2023). URL https://doi.org/10.5281/zenodo.7787488.

[63] Hatos, A., Quaglia, F., Piovesan, D. & Tosatto, S. C. E. APICURON: a database to credit and acknowledge the work of biocurators. Database 2021, baab019 (2021). URL https://doi.org/10.1093/database/baab019.

[64] Capella-Gutierrez, S. et al. Lessons Learned: Recommendations for Establishing Critical Periodic Scientific Benchmarking (2017). URL https://doi.org/10.1101/181677.

[65] Hall, M.
B. & Coin, L. J. M. Pangenome databases improve host removal and mycobacteria classification from clinical metagenomic data. GigaScience 13, giae010 (2024). URL https://doi.org/10.1093/gigascience/giae010.

[66] Roach, M. J. et al. Hecatomb: an integrated software platform for viral metagenomics. GigaScience 13, giae020 (2024). URL https://doi.org/10.1093/gigascience/giae020.

[67] Wilkinson, S. R. et al. F*** workflows: when parts of FAIR are missing, 507–512 (IEEE, Salt Lake City, UT, USA, 2022). URL https://doi.org/10.1109/eScience55777.2022.00090.

[68] ELIXIR. ELIXIR Annual Report 2023 (2024). URL https://doi.org/10.7490/F1000RESEARCH.1119751.1.

[69] Wolstencroft, K. et al. FAIRDOMHub: a repository and collaboration environment for sharing systems biology research. Nucleic Acids Research 45, D404–D407 (2017). URL https://doi.org/10.1093/nar/gkw1032.

[70] Hambley, A., Chadwick, E., Woolland, O., Soiland-Reyes, S. & Savchenko, V. WorkflowHub Knowledge Graph (2024). URL https://doi.org/10.5281/zenodo.13362051.

[71] Owen, S. et al. seek4science/seek: FAIRDOM-SEEK v1.15.0 (2024). URL https://doi.org/10.5281/zenodo.11209855.