Publication Date: 2019-02-11
Approval Date: 2018-12-13
Submission Date: 2018-10-31
Reference number of this document: OGC 18-085
Reference URL for this document: http://www.opengis.net/doc/PER/t14-D026
Category: Public Engineering Report
Editor: Sam Meek
Title: OGC Testbed-14: BPMN Workflow Engineering Report
Copyright (c) 2019 Open Geospatial Consortium. To obtain additional rights of use, visit http://www.opengeospatial.org/
This document is not an OGC Standard. This document is an OGC Public Engineering Report created as a deliverable in an OGC Interoperability Initiative and is not an official position of the OGC membership. It is distributed for review and comment. It is subject to change without notice and may not be referred to as an OGC Standard. Further, any OGC Engineering Report should not be referenced as required or mandatory technology in procurements. However, the discussions in this document could very well lead to the definition of an OGC Standard.
Permission is hereby granted by the Open Geospatial Consortium, ("Licensor"), free of charge and subject to the terms set forth below, to any person obtaining a copy of this Intellectual Property and any associated documentation, to deal in the Intellectual Property without restriction (except as set forth below), including without limitation the rights to implement, use, copy, modify, merge, publish, distribute, and/or sublicense copies of the Intellectual Property, and to permit persons to whom the Intellectual Property is furnished to do so, provided that all copyright notices on the intellectual property are retained intact and that each person to whom the Intellectual Property is furnished agrees to the terms of this Agreement.
If you modify the Intellectual Property, all copies of the modified Intellectual Property must include, in addition to the above copyright notice, a notice that the Intellectual Property includes modifications that have not been approved or adopted by LICENSOR.
THIS LICENSE IS A COPYRIGHT LICENSE ONLY, AND DOES NOT CONVEY ANY RIGHTS UNDER ANY PATENTS THAT MAY BE IN FORCE ANYWHERE IN THE WORLD. THE INTELLECTUAL PROPERTY IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND NONINFRINGEMENT OF THIRD PARTY RIGHTS. THE COPYRIGHT HOLDER OR HOLDERS INCLUDED IN THIS NOTICE DO NOT WARRANT THAT THE FUNCTIONS CONTAINED IN THE INTELLECTUAL PROPERTY WILL MEET YOUR REQUIREMENTS OR THAT THE OPERATION OF THE INTELLECTUAL PROPERTY WILL BE UNINTERRUPTED OR ERROR FREE. ANY USE OF THE INTELLECTUAL PROPERTY SHALL BE MADE ENTIRELY AT THE USER’S OWN RISK. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR ANY CONTRIBUTOR OF INTELLECTUAL PROPERTY RIGHTS TO THE INTELLECTUAL PROPERTY BE LIABLE FOR ANY CLAIM, OR ANY DIRECT, SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES, OR ANY DAMAGES WHATSOEVER RESULTING FROM ANY ALLEGED INFRINGEMENT OR ANY LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR UNDER ANY OTHER LEGAL THEORY, ARISING OUT OF OR IN CONNECTION WITH THE IMPLEMENTATION, USE, COMMERCIALIZATION OR PERFORMANCE OF THIS INTELLECTUAL PROPERTY.
This license is effective until terminated. You may terminate it at any time by destroying the Intellectual Property together with all copies in any form. The license will also terminate if you fail to comply with any term or condition of this Agreement. Except as provided in the following sentence, no such termination of this license shall require the termination of any third party end-user sublicense to the Intellectual Property which is in force as of the date of notice of such termination. In addition, should the Intellectual Property, or the operation of the Intellectual Property, infringe, or in LICENSOR’s sole opinion be likely to infringe, any patent, copyright, trademark or other right of a third party, you agree that LICENSOR, in its sole discretion, may terminate this license without any compensation or liability to you, your licensees or any other party. You agree upon termination of any kind to destroy or cause to be destroyed the Intellectual Property together with all copies in any form, whether held by you or by any third party.
Except as contained in this notice, the name of LICENSOR or of any other holder of a copyright in all or part of the Intellectual Property shall not be used in advertising or otherwise to promote the sale, use or other dealings in this Intellectual Property without prior written authorization of LICENSOR or such copyright holder. LICENSOR is and shall at all times be the sole entity that may authorize you or any third party to use certification marks, trademarks or other special designations to indicate compliance with any LICENSOR standards or specifications.
This Agreement is governed by the laws of the Commonwealth of Massachusetts. The application to this Agreement of the United Nations Convention on Contracts for the International Sale of Goods is hereby expressly excluded. In the event any provision of this Agreement shall be deemed unenforceable, void or invalid, such provision shall be modified so as to make it valid and enforceable, and as so modified the entire Agreement shall remain in full force and effect. No decision, action or inaction by LICENSOR shall be construed to be a waiver of any rights or remedies available to it.
None of the Intellectual Property or underlying information or technology may be downloaded or otherwise exported or reexported in violation of U.S. export laws and regulations. In addition, you are responsible for complying with any local laws in your jurisdiction which may impact your right to import, export or use the Intellectual Property, and you represent that you have complied with any regulations or registration procedures required by applicable law to make this license enforceable.
- 1. Summary
- 2. References
- 3. Terms and definitions
- 4. Overview
- 5. Introduction
- 6. Review of motivating work
- 7. BPMN 2.0 discussion
- 8. OGC Service Orchestration with BPMN 2.0 Best Practices
- 9. Description of demonstrator implementation using jBPM
- 9.1. Workflow engine Helper classes
- 9.2. Security implications
- 9.3. Component design
- 9.4. Helper class design
- 9.5. Remote execution of BPMN documents
- 9.6. Architecture
- 9.7. Scenario
- 9.8. TIE Results
- 9.9. Shortcomings of the approach
- 10. Docker, Kubernetes and Cloud Foundry
- 11. Conclusion
- Appendix A: XML Schema Documents
- Appendix B: Revision History
- Appendix C: Bibliography
This Engineering Report (ER) presents the results of the D146 Business Process Modeling Notation (BPMN) Engine work item and provides a study covering technologies including Docker, Kubernetes and Cloud Foundry for Developer Operations (DevOps) processes and deployment orchestration. The document also provides the beginning of a best practices effort to assist implementers wishing to orchestrate OGC services using BPMN workflow engines. As with previous investigations into workflow engines, the implementation described within utilizes a helper class, which is a bespoke implementation of some of the best practices. Work in future testbeds on workflows should include a compelling use case to demonstrate the power of service orchestration.
Workflows have long been a topic of interest for the OGC, which has a corresponding Workflows DWG. Previous testbeds focused on the Business Process Execution Language (BPEL), which gained some traction but faces a set of fundamental obstacles to widespread adoption. BPMN is a language that addresses many of the issues associated with BPEL. The International Organization for Standardization (ISO) has approved BPMN as the ISO/IEC 19510:2013 standard. The Testbed-13 Workflows ER described a rudimentary implementation of a BPMN engine that sought to enable execution of remotely authored BPMN documents via a transactional Web Processing Service. The test was successful; however, it was noted that there are no best practices for orchestrating OGC services using BPMN as the orchestration language. This ER provides a description of the BPMN best practices and a corresponding implementation, including issues with the approach and identified gaps for future research.
Prior to this testbed, there were several open questions concerning workflow-based activities within the OGC. This testbed has answered these questions and produced a set of best practices for future workflow applications.
Throughout this piece of work, several recommendations for future work have been identified. Testbed-14 sought to, among other things, generate a set of best practices for orchestrating OGC services using BPMN. However, there are several outstanding work items that need to be addressed, potentially in future testbeds.
Checking for likelihood of workflow completion success prior to execution. Currently, workflows are executed blind, i.e. the user has no indication of whether the workflow will successfully execute or fail at some point. This is particularly important for long-running processes, where the workflow could fail after several hours. The supporting documentation outlined in the Testbed-13 Workflows report is a suitable starting point for this endeavor, which could potentially make use of a semantic registry.
Managing the BPMN to Web Processing Service (WPS) process transformation. The BPMN document for this Testbed was supplied to the client implementer as a template for population. However, there is not currently a way within OGC to map WPS to BPMN. It may be the case that utilization of BPMN in WPS should be tightly coupled to the WPS 3.0 standard. As a start, the related OGC Testbed-14: WPS-T Engineering Report (OGC 18-036) describes a transactional extension for WPS 2.0 and recommendations for a process deployment profile for BPMN.
Explore more sophisticated security encoding options. In this testbed, security was handled by inserting the OAuth2 token into the BPMN document, which is translated into the HTTP header when WPS are executed. Future testbeds should explore access to different federations using different security models as well as finding methods to obfuscate security tokens. Some of the considerations are discussed in the related OGC Testbed-14: Federated Clouds Engineering Report (OGC 18-090r1). Additionally, security tokens have an expiry time. If a process is long running, then tokens may expire and cause the workflow to fail. A pre-testing or authentication process could allow a session to be initiated and refreshed if necessary whilst the workflow is running. A suggested solution to this problem is to use HTTP headers with multiple scopes.
Understand how to enforce removal of secured resource access to unauthorized services. It is currently possible to extract resources from a secured service and pass them on to an unsecured service; methods for controlling this should be explored.
Error catching in BPMN. The mechanisms of executing workflows are very complex by nature. If an error is thrown by a service, then it can be difficult to trace the source and report the error to a user in a recognizable fashion. Error catching and reporting specific to OGC services should be passed through the workflow engine and reported to the user in a human readable fashion.
Create a suitably complex, relevant and motivating use case for future Testbeds. In both Testbed-13 and Testbed-14, the use cases for workflows were simplistic and focused on the mechanics of getting software up and running. Future testbeds should have a sufficient number and range of services to orchestrate in order to exercise the BPMN language, as well as an end goal that takes advantage of aspects such as parallel processing, swimlanes, decision gates and compensation events.
All questions regarding this document should be directed to the editor or the contributors:
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. The Open Geospatial Consortium shall not be held responsible for identifying any or all such patent rights.
Recipients of this document are requested to submit, with their comments, notification of any relevant patent claims or other intellectual property rights of which they may be aware that might be infringed by any implementation of the standard set forth in this document, and to provide supporting documentation.
The following normative documents are referenced in this document.
For the purposes of this report, the definitions specified in Clause 4 of the OWS Common Implementation Standard OGC 06-121r9 shall apply. In addition, the following terms and definitions apply.
automation of a process, in whole or part, during which electronic documents, information or tasks are passed from one participant to another for action, according to a set of procedural rules (source: ISO 12651-2:2014(en)).
software service or “engine” that provides the run time execution environment for a process instance (source: ISO 12651-2:2014(en)).
Section 5 introduces the ER and the corresponding component task within the NextGen Thread in the OGC Testbed-14.
Section 6 provides a review of motivating work for producing this iteration of the BPMN workflow engine and the rationale for the Docker, Kubernetes and Cloud Foundry study at the end of this document.
Section 7 provides an overview and discussion of the BPMN 2.0 standard, the language used to express workflows in the BPMN workflow engine component.
Section 8 outlines best practices for orchestrating OGC services using BPMN.
Section 9 describes the component implementation for the D148 BPMN Engine.
Section 10 consists of the Docker, Kubernetes and Cloud Foundry study required as part of this ER.
Section 11 concludes the document.
This OGC Engineering Report (ER) describes the work carried out in the Testbed-14 initiative for utilizing BPMN workflows and provides the results of a study into technologies including Cloud Foundry, Docker and Kubernetes. The foundational work for this document is the Testbed-13 Workflows ER, which describes many of the grounding concepts and the state of the art in OGC-compliant workflows. This document is in turn based upon the practices outlined in an earlier motivating document for using Business Process Model and Notation 2 (BPMN2, styled as BPMN) within OGC, and upon a further publication that provides a preliminary implementation. Additionally, workflows have been explored in OGC using technologies such as BPEL, notably in Testbed 8.
Testbed-13 provided many successes for the authors of the ER and for the design and implementation of the system described within. The Testbed-13 Workflows concept demonstrator enabled execution of workflows using BPMN and a Transactional Web Processing Service (WPS-T); however, the demonstrator required a helper class that is bespoke to the Camunda implementation of BPMN. The workflows work in Testbed-14 proposes an OGC Best Practices methodology for executing workflows and presents a demonstrator implementation of these best practices in the form of a similar helper class. Due to the services available for this testbed, the best practices are presented as implementation recommendations rather than as guidance on utilizing the BPMN language for orchestrating OGC services, which should be explored with a suitable use case in future testbeds. Additionally, it is the objective of this ER to identify and describe solutions to some of the shortcomings of the Testbed-13 approach to workflows.
The main issue cited with the BPMN helper class in Camunda was the utilization of data inputs and outputs, as each implementation handles this aspect differently. Additionally, BPMN provides standardized methods for managing data both in flux and at rest that were not utilized within the Testbed-13 work. BPMN concepts such as Service Tasks also provide a mechanism to input data via direct input or reference; however, there is also the concept of data objects, which seek to model data inputs and outputs in a formal way. This document aims to address ambiguity and contradictions between the standard and available software whilst providing guidance to implementers on OGC best practice. This document reports on a demonstrator implementation of the stated draft OGC Best Practice and the results of a study on Cloud Foundry, Kubernetes and Docker with their applicability to the OGC and the wider geospatial domain.
This section contains an overview of the existing OGC work in workflows. It is noted that work has also been done outside the OGC that has either been reviewed in the publications addressed here, or is deemed out of scope for this ER.
Two OGC publications from Testbed-13 are considered in scope as motivating work for this ER: the Testbed-13 Security ER and the Testbed-13 Workflows ER.
Security continues to play an important role in the development of OGC standards, practice and doctrine. Implementation of security is also a requirement for all work done in the Next Generation Services thread in Testbed-14. Therefore, encoding security into a workflow request is a requirement for this work.
The three motivating use cases outlined in the Testbed-13 Security ER are:
Use Case 1: Dominating Privileges - the situation where the user has more privileges than the computer system being used. Access attempted in this way will result in a security violation.
Use Case 2: Tunneling Proxies - passing of credentials is likely to be interrupted as a proxy constitutes a new connection to the resources.
Use Case 3: Identity Mediation - mediation between different security models is required in a chained workflow, as it is likely different services will use different authentication procedures and providers.
This section briefly discusses the considerations from the ER that are relevant to the work presented here.
One work item in the Testbed-13 Security ER addressed how to enable WPS to be authenticated using OAuth tokens. Following on from this, Testbed-14 includes security to authenticate workflows in all scenarios.
Figure 1 outlines the abstract OAuth flow including the resource owner, the authorization service, and the resource server. In the flow, the client requests access to the resource via the resource owner. The owner grants authorization to the client who then requests a token from the authorization server using the grant from the resource owner. The authorization server then provides the client with an access token that is sent to the resource server to gain access to the resource.
A key aspect from this ER is the Authorization Code Grant Flow, which is the complete flow described in Figure 1. In this scenario, credentials are granted prior to resource access requests, therefore removing the requirement to register the application. Authorization can be done using a private authorization server or, as in Testbed-13, using the Auth0 service (https://auth0.com). One of the motivating requirements for this ER is to explore the use of code grant flow in workflows. This requirement will be addressed later in the document.
The Workflows ER provided an account of the work done in Testbed-13 workflows and included information on workflow construction and execution, an overview of workflow engines, transactional web processing services for accepting BPMN workflow documents, data quality services, conflation services and clients for constructing and executing the workflows. The scenario from Testbed-13 is shown in Figure 2, where the workflow client stored existing workflows in a Catalog Service for the Web (CSW) and submitted the workflow to the workflow engine for execution. The workflow engine then orchestrated and executed the three WPS services to produce a result. The Web Feature Service (WFS) is utilized as a data repository within the workflow, but not formally by the workflow engine, as the data is passed to the first WPS in the chain through configuration of WPS parameters.
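The abstract flow described above can be sketched as a small in-memory simulation. This is an illustrative sketch only: the class names, method names and the `workflow-client` identifier are hypothetical and do not come from the Testbed implementations, and no real authorization service such as Auth0 is contacted. As a simplification, the resource owner asks the authorization server to record the grant on its behalf.

```python
import secrets


class AuthorizationServer:
    """Issues one-time grants and exchanges them for access tokens."""

    def __init__(self):
        self._grants = {}    # grant value -> client it was issued to
        self._tokens = set()  # access tokens that are currently valid

    def register_grant(self, client_id):
        grant = secrets.token_urlsafe(16)
        self._grants[grant] = client_id
        return grant

    def exchange(self, client_id, grant):
        # A grant is usable once, and only by the client it was issued to.
        if self._grants.pop(grant, None) != client_id:
            raise PermissionError("invalid authorization grant")
        token = secrets.token_urlsafe(32)
        self._tokens.add(token)
        return token

    def is_valid(self, token):
        return token in self._tokens


class ResourceOwner:
    """Approves a client's request and hands back an authorization grant."""

    def __init__(self, authorization_server):
        self._server = authorization_server

    def authorize(self, client_id):
        return self._server.register_grant(client_id)


class ResourceServer:
    """Releases the protected resource only for a valid access token."""

    def __init__(self, authorization_server, resource):
        self._server = authorization_server
        self._resource = resource

    def get_resource(self, token):
        if not self._server.is_valid(token):
            raise PermissionError("invalid access token")
        return self._resource


def run_flow(client_id="workflow-client"):
    """Walk through the flow: request, grant, token, resource."""
    auth_server = AuthorizationServer()
    owner = ResourceOwner(auth_server)
    resource_server = ResourceServer(auth_server, "protected WPS process list")

    grant = owner.authorize(client_id)              # owner grants authorization
    token = auth_server.exchange(client_id, grant)  # grant exchanged for token
    return resource_server.get_resource(token)      # token presented for access
```

Calling `run_flow()` returns the protected resource, whereas presenting a forged token or reusing a grant raises `PermissionError`. In a workflow setting, the client role in this sketch would correspond to whichever component requests the token before the workflow is executed.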
The workflow engine has no knowledge of the concept of data objects, either in the formal BPMN definition or informally as part of service tasks, as data endpoints are passed as string references via a Service Task parameter. Although this method is acceptable in terms of the BPMN standard, data objects should be modeled formally within a BPMN diagram where the implementation supports it. This represents one of the challenges addressed in the ER, as BPMN has a well-defined set of methods for handling data objects whether at rest or in flux. Understanding how to decouple data objects from processing capability so that they can be reused, if necessary, within a workflow is also an objective of this ER and is discussed in detail later in the document. One of the issues mentioned in Testbed-13 was that the encoding and decoding of data is a hurdle to interoperability, as BPMN workflow engines tend to handle the encoding and decoding procedures differently. The best practices outlined in this ER accept this as being part of an implementation standard and do not seek to mandate any specific encoding or decoding procedure, but instead rely on the capabilities of the orchestrated services to deal with data inputs and outputs. The workflow engine should remain neutral in its data format requirements and be used to orchestrate existing capability rather than introducing new processing capability. Alternatively, if orchestrated services do not share a common format, then the workflow engine may call upon a partner WPS to perform the translation.
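To make the distinction between service tasks and formally modeled data concrete, the following sketch inspects a small hand-written BPMN 2.0 fragment and lists its service tasks, data objects and data store references. The element names and namespace are those of the BPMN 2.0 XML schema; the fragment itself, including all ids and names, is hypothetical and is not one of the Testbed workflow documents.

```python
import xml.etree.ElementTree as ET

# Namespace of the BPMN 2.0 model schema.
BPMN_NS = "http://www.omg.org/spec/BPMN/20100524/MODEL"

# Hypothetical fragment: ids and names are illustrative only.
SAMPLE_BPMN = f"""<definitions xmlns="{BPMN_NS}" id="defs"
             targetNamespace="http://example.org/bpmn">
  <dataStore id="featureStore" name="WFS feature store"/>
  <process id="conflation" isExecutable="true">
    <dataObject id="features" name="Feature collection"/>
    <dataStoreReference id="featureStoreRef" dataStoreRef="featureStore"/>
    <serviceTask id="qualityCheck" name="Data quality WPS"/>
    <serviceTask id="conflate" name="Conflation WPS"/>
  </process>
</definitions>"""


def summarize_data_elements(bpmn_xml):
    """Return the ids of service tasks and formally modeled data elements."""
    root = ET.fromstring(bpmn_xml)

    def ids(tag):
        # Collect ids of every element with this BPMN tag, in document order.
        return [el.get("id") for el in root.iter(f"{{{BPMN_NS}}}{tag}")]

    return {
        "service_tasks": ids("serviceTask"),
        "data_objects": ids("dataObject"),
        "data_store_references": ids("dataStoreReference"),
    }
```

A document in which data endpoints are passed only as string parameters of service tasks would yield empty `data_objects` and `data_store_references` lists, which is one simple way a helper class could detect which modeling style a submitted workflow uses.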
The scenario outlined in the Testbed-13 Workflows ER consists of two use cases and corresponding user groups. These are:
The expert/workflow configurator.
The user/workflow executor.
These distinctions are not necessarily mutually exclusive, as it is envisaged that users will be able to construct or at least tweak existing workflows to their requirements. Likewise, the expert may be required to execute workflows that either they or another expert has created. In the scenario, the expert creates and configures the workflow via querying a CSW for processes that fit their needs. They then compose the workflow and configure all of the parameters including the variables and end points. They then send the composed document to the WPS-T, which calls the workflow engine for execution. The cataloged processes are protected by an OAuth Security Service. Once authorization takes place, then the workflow engine WPS harvests the processes and data and the workflow is constructed (Figure 3).
A notable issue with this use case is that there is no workflow execution and no testing of whether the workflow is valid prior to making it available in the catalog. Therefore, the expert (or user for that matter) does not know whether the workflow will execute in the given scenario, and also does not know the scope that the workflow will operate within. For example, if the user (non-expert) retrieves the workflow, executes it and then receives an error, this will likely prove problematic for issue resolution. One method to address this is to use the getErrors() call within the WPS to return any errors to the user. However, the mechanism for returning errors to the WPS is not yet determined and will likely be represented by well-defined error catching mechanisms from within the BPMN standard.
Decoupling error reporting and handling from the WPS is an approach that fully utilizes the BPMN standard whilst ensuring an implementation is service agnostic for maximum flexibility.
The capabilities of the workflow engine have been somewhat enhanced in the Testbed-14 version of the engine, as it no longer requires the user to set up a workflow and then execute it with a set of parameters. Instead, the BPMN document can be submitted and executed by the workflow engine regardless of whether the workflow engine has prior knowledge of what is to be executed. This added functionality is due to the choice of base implementation rather than anything to do with the BPMN language. This has implications for security, as a completely unsecured workflow engine service would in theory be susceptible to misuse: it could be instructed to execute anything a client sent it. This issue is resolved in the demonstrator by only allowing execution of a single workflow type and variable passing within a well-defined scope.
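A restriction of this kind can be sketched as a check performed before a submitted document reaches the engine. The sketch below is an assumption about how such a guard might look, not a description of the demonstrator: the permitted process id, the permitted variable names and the `SubmissionRejected` exception are all hypothetical.

```python
import xml.etree.ElementTree as ET

BPMN_NS = "http://www.omg.org/spec/BPMN/20100524/MODEL"

# Hypothetical policy: one permitted workflow type and a fixed variable scope.
ALLOWED_PROCESS_ID = "ogcServiceChain"
ALLOWED_VARIABLES = frozenset(
    {"wpsEndpoint", "processIdentifier", "inputReference"}
)


class SubmissionRejected(ValueError):
    """Raised when a submitted workflow falls outside the permitted scope."""


def check_submission(bpmn_xml, variables):
    """Accept only the single permitted process and in-scope variables."""
    # Note: a service exposed to untrusted clients would also need an XML
    # parser hardened against malicious input; that is out of scope here.
    root = ET.fromstring(bpmn_xml)

    process_ids = [p.get("id") for p in root.iter(f"{{{BPMN_NS}}}process")]
    if process_ids != [ALLOWED_PROCESS_ID]:
        raise SubmissionRejected(
            f"unexpected process definitions: {process_ids}"
        )

    unexpected = sorted(set(variables) - ALLOWED_VARIABLES)
    if unexpected:
        raise SubmissionRejected(
            f"variables outside the permitted scope: {unexpected}"
        )
    return True
```

Checking only the process id is deliberately coarse. A stricter guard might compare the submitted document against a stored template, at the cost of reducing the flexibility that wholesale submission of BPMN documents is intended to provide.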
The Testbed-13 workflow engine was based on Camunda fronted by a WPS-T to enable access to the standardized OGC calls as well as a new insertProcess call that enables a process to be inserted into the WPS using the HTTP POST operation. A diagram of the workflow component setup can be found in Figure 4.
Fronting a workflow engine with a WPS enables standardized calls to be used to execute workflows. However, it should be noted that it introduces a level of verbosity that is not found when using the raw workflow engines, many of which are already fronted by a Representational State Transfer (REST) Application Programming Interface (API). Additionally, the construction of a helper class is required for workflow engines to understand the different data models and types relevant to geoprocessing. In the Camunda example, the helper class parsed data by utilizing a GeoTools internal construct called a FeatureCollection. This concept of a collection of features is prevalent throughout geographic data endpoints for vector data, however, coverages are delivered in a number of file formats that should be considered when defining best practices for OGC workflows. The jBPM BPMN workflow engine uses a combination of the Git protocol and a REST API to submit and execute workflows. The detail of this is discussed in a later section.
There are several recommendations that were made at the culmination of the ER, many of which have made it into requirements for the work described in this ER. Briefly, the relevant workflow recommendations are described below with the responses to the recommendations including actions taken:
Investigate how client functionality to communicate with OGC web services can be made usable by different workflow engines. This recommendation is the cornerstone of the efforts made in this ER. Understanding how BPMN as a language relates to OGC services, with a focus on data models, will likely solve re-usability and verbosity issues in workflows whilst enabling interoperability between workflow engines beyond the demonstrators in Testbed-13 and Testbed-14. This is initiated by returning to the language, making recommendations in the form of best practices and then demonstrating these best practices in an implementation. The solution to this requirement outlined here is to submit the BPMN document wholesale and then implement functionality via the helper class to register work items and produce the result. Note that a BPMN language mapping exercise was started, but was not tested due to the restricted set of WPS services made available.
Investigate a common approach to handle inputs and outputs defined in BPMN. The data modeling aspect of interoperability will help towards realizing the first recommendation in this list. BPMN essentially has two data concepts: the Data Object, which models data in flux, and the Data Store, which models data at rest. OGC services have a subtle but important distinction in that data are supplied as services. Inputs to WPS and other services are provided as parameters, rather than existing as standalone objects. BPMN does allow references to data to be passed as a parameter. However, data objects should be utilized to model the flow of data between services. Reconciliation between these two concepts is described later in the document. It should be noted that OGC architectures do not really account for data objects as such; instead they deal with services and, more recently, resources. These concepts are covered by the BPMN service task. Data objects should not be used to refer to external resources, but they can optionally be used to orchestrate processes internally.
Investigate process discovery mechanisms. Process discovery in Testbed-13 was done in a catalog; however, the processes were presented to the client via an identifier, which is suitable for a user who understands what all of the processes do (an extreme case of the expert user use case in the Testbed-13 Workflows document), but potentially problematic for other users. It has been suggested that the user should be able to discover processes from the catalog using metadata; however, a profile for WPS processes has not been agreed upon. There are ISO standards that are likely to fit the requirements, including ISO 19119 - Geographic Information - Services, but further discussion is required. Process discovery in Testbed-14 is done through the CSW that stores the registered WPS processes, which is also populated with BPMN workflows described as WPS processes. Therefore, discovery should be completed in the recognized OGC way, through a CSW.
Investigate OAuth Code Grant Flow for dynamic authorization. OAuth for authorization has been gaining traction within the OGC, along with SAML and some other technologies. This recommendation largely follows some of the work completed in the COBWEB project, where the concept of a session in terms of security was discussed. Essentially, a session can exist at three different levels within a workflow:
The process level.
The workflow execution level.
The workflow composition level.
These three levels offer different security grains in ascending order. Security at the process level is a useful model if processes require different levels of access to execute. In the Testbed-13 setup, it was prudent to deny access at the workflow composition and orchestration phase, rather than upon execution. The workflow execution level means that the user authenticates as soon as the workflow is executed and the session exists for the duration of the workflow execution. This concept notably lends itself to synchronous rather than asynchronous processes. The workflow composition level means that the user authenticates in a federated fashion, i.e. once they are authenticated, then they have access to everything. This scenario appears to be most suited to the motivating use cases for this ER.
For Testbed-14, it was decided that the workflow engine would simply pass the security information in the WPS execute requests generated during workflow execution. Although this is a simplified use case, it shows the beginnings of truly secured workflows with options for resources sitting in different federations that should be explored in a future Testbed. In terms of the levels of security, this is an example of authentication at the process level.
Investigate security encoding aspects in BPMN.
Encoding security in a BPMN document is likely to require a best practices approach, as the BPMN standard does not mention security in any meaningful way beyond a normative reference. However, the standard is flexible enough to be able to support security, even if it is carried as a parameter in a service task. It is possible to extend the standard to enable further execution semantics; however, the support for extensions in implementations is unknown, and a strategy for implementing security in an extension is somewhat of an undertaking. In this ER, authorization tokens are passed along with the HTTP header information in an execute request generated by the workflow engine. In this instance, the BPMN engine is acting as a pass-through for security tokens, as authentication to processes and data will have been done prior to execution via the request of a token. A more thorough examination of security models will likely be the work of future Testbeds and will span more than just a piece on workflows.
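The pass-through behavior described above can be sketched with the Python standard library. The function name, endpoint URL and request body used here are hypothetical, and the use of the `Bearer` scheme in the `Authorization` header is an assumption based on common OAuth 2.0 practice rather than something stated for the demonstrator. The sketch only constructs the request; it does not send it.

```python
import urllib.request


def build_execute_request(wps_url, execute_xml, access_token):
    """Build a WPS execute request that carries the token in an HTTP header.

    The token is forwarded unchanged: the engine does not authenticate the
    user itself, it only relays a token that was obtained beforehand.
    """
    request = urllib.request.Request(
        wps_url,
        data=execute_xml.encode("utf-8"),
        method="POST",
    )
    request.add_header("Content-Type", "application/xml")
    # Assumed header form; the source only states that the token travels
    # with the HTTP header information.
    request.add_header("Authorization", f"Bearer {access_token}")
    return request
```

A helper class could call such a function once per service task, supplying the token carried in the submitted workflow. Because the token is copied verbatim into every request, the concerns raised elsewhere in this report about token expiry and about obfuscating tokens in the BPMN document apply directly to this approach.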
Documents outside the Testbed initiative have acted as motivating work for this ER and ERs from previous Testbeds. The two documents to be considered are:
These two documents created the foundation for bringing BPMN into the OGC as a language for expressing service orchestration. The use case for OGC service orchestration has always focused on the data quality domain, potentially because of the requirement for configured, finely grained services to establish data fitness for purpose. Although this is a suitable use case for orchestration, several points highlighted in the documentation merit consideration:
Configuring processes is time consuming and complicated and generally only undertaken by an expert.
Workflow orchestration and execution rely on external services that the engine may not be able to validate remotely, therefore errors within the workflow engine might be uncontrollable and cause the workflow engine to fail.
Security was not considered within these documents either in terms of the workflow engine security beyond basic authentication or in terms of secured services.
The output mainly consisted of metadata, which meant that the input data was not changed during the workflow. Ramifications of altering the data during the workflow are discussed later in this document.
Previous Testbeds have attempted workflows using Business Process Execution Language (BPEL), which provided some success but suffered from the following fundamental issues:
Lack of a standardized graphical interface.
Issues with execution due to lack of correct WSDL bindings.
Complexity and skills required to execute.
A general lack of uptake.
Therefore, BPMN was put forward as an alternative as it addresses the stated issues and has full interoperability with BPEL as conversion can be done losslessly. There have also been other suggestions to enable orchestration of services within OGC, notably to return to the WPS standard and make orchestration and chaining fundamental to the standard, rather than optional.
Testbed-13 made recommendations that have in turn been reiterated as requirements in Testbed-14. However, there are some notable omissions from this set that are discussed in this section. These suggestions and observations are drawn from a combination of prior experience in projects such as Citizen Observatory Web (COBWEB) and LandSense, and from the other noted documents in this space.
When a user constructs a workflow for execution they currently do not know if the workflow will run or fail. This problem is exacerbated when there are workflow processes with dependencies on inputs and outputs. There are in general two types of processes that should be considered here:
Processes that do not change the input data - unordered. These processes are likely to consist of information gathering algorithms (for example, processes that generate metadata, as seen in the data quality WPS in Testbeds-12 and 13). Here the input data does not change in the process, and all processes following a process of this type do not need to consider the order in which processes are executed, i.e. the processes can be executed in any order to produce the same result.
Processes that do change the input data - ordered. In this scenario, the order that processes are executed in will likely provide differing results, null results or an error. For example, there are two processes:
process 1 - filters a dataset according to input variables
process 2 - selects data within a distance of a set of input points
If a workflow were constructed where process 1 is followed by process 2, this could potentially produce a different resultant data set than if process 2 were followed by process 1. A likely occurrence is that if the filtering or selection criteria are too strict, the first process in the chain produces a null result. This would then result in an error message and workflow failure.
Managing the ordering and production of results for processes is an on-going issue that is likely to remain unsolved in this Testbed as it is out of scope. However, solutions should be considered to mitigate, be it through sampling, some sort of semantic validity of workflows and/or suitable error messaging to make the user aware of process failures and a reason for the failure. Construction of some OGC specific error messaging should be considered in future Testbeds.
As mentioned previously, the data modeling aspects of this ER are key to the motivating requirements and the best practices endeavor. BPMN has the concept of data objects and data stores. The former is for modeling data in flux and the latter is for modeling data at rest. OGC services, and in particular WPS, take data as parameters either by reference or as a raw type. When orchestrating OGC services, ideally this should be de-coupled and data passed by reference as a BPMN data object or store, rather than as an input parameter to a WPS process. The data object as a reference could then be mapped to a WPS process parameter, making the data object reusable throughout the workflow. Additionally, this method of mapping data objects enables users to graphically model data flows as well as process flows within the BPMN workflow diagram. Generally, the BPMN constructs for data modeling are suited to internal orchestration of data and references to data, rather than providing methods to access external data. BPMN also does not have a formal method for querying internal process variables after process completion, i.e. workflows are self-contained. Getting data in and out of a workflow is therefore defined by the services that form part of the orchestration effort.
The implementation of demonstrators supporting the workflow engine work items has had to include a helper class. In this implementation, the helper class is simplified to act as a WPS client, thus offloading the capabilities required for executing OGC services to resources that already have them.
The issue with geospatial data structures is experienced when workflows are executed synchronously, as the data are retrieved from an executed process and then passed to the next process. This could potentially be mitigated through passing of links in an asynchronous fashion. However, this would restrict the functionality of the system as it would make demands upon it that are not OGC compliant. The data inputs and outputs point relates back to the concept of data objects, which has been covered in previous sections. This helper class by default passes the results as a reference from one process to another, thus removing the requirement for the workflow engine to constantly be passing data back and forth from the engine to the next service in the chain. This approach does have a drawback in that every service in the chain is acting as both a client and a server. However, this is preferable to putting data load onto the workflow engine as it could easily form a bottleneck for processing and data.
A consideration for future work in this area is to have a system that tests for data compatibility within the workflow to make a decision on the approach to take, that is, passing by reference or passing by value. This would ensure full data compatibility throughout the workflow while making the most efficient use of computational resources.
BPMN 2.0 is an Object Management Group (OMG) standard ratified in 2011 and designed primarily to model business processes through a graphical interface. In addition to graphical modeling, it is also able to execute services and model human interjection into an automated process. It also contains typical systems type operations such as error catching and messaging and multiple exit paths from a system. Due to its flexibility and simple user interface, it can be utilized by both novices and experts to communicate a workflow.
As BPMN is a normative standard, it builds upon rules to ensure harmonization across implementations. This section does not recreate the normative documentation. Instead, it reviews some of the aspects that are of consequence to an OGC best practices effort that can be implemented across BPMN workflow engines via the aforementioned helper class.
BPMN contains the concept of Activities. These are essentially how the standard models work being undertaken within the workflow, either in an automated fashion or by manual intervention by a user. There are several Tasks within the Activities grouping that have relevance to orchestrating OGC services. These are:
BPMN describes Service Tasks as those done by software. Essentially the user configures a service task with parameters required by the external service which is then executed as part of the workflow on behalf of the user. A key aspect of a service task is that it is synchronous, therefore the workflow will wait for a result before continuing. This can of course cause problems if a web service times out or is non-responsive. This can be mitigated using an Event which is described in a following section.
From an OGC perspective, Service tasks are best used for orchestrating OGC web services that are synchronous, i.e. the workflow engine configures a process, executes the process and then passes the result on to the next process. This requires the workflow engine to understand spatial data structures and concepts, which is achieved through the helper class for the software package being used.
Service tasks should be used for processing rather than internal data management as these are best described in data objects and data stores which are discussed later and used for internal workflow data management. Therefore, OGC services such as WPS and Web Coverage Processing Service (WCPS) are best represented using Service tasks. Note that service tasks are different from Script tasks that execute a simple script upon execution. Script task could feasibly be used to interact with OGC services. However, the script task is designed as more of a catch-all to execute a script within the workflow rather than to interact with well-defined services.
Send tasks are used to send messages between processes and swim lanes. These are generally used to initiate another part of the workflow via sending a message to do so. This type of task does not have the ability to await a response from the service task and is completed as soon as the message is sent. A typical use case for a send task is to initiate a branch of the workflow that is separate from the primary swim lane. As with the data modeling aspects of BPMN, send tasks refer to internal workflow orchestration rather than messaging external services that are orchestrated using the service task.
Receive tasks are the Send task counter-part and will await a message sent (usually from a send task) before initiating. This task is usually used at the beginning of a swim lane to initiate tasks after instantiation. The task is considered complete as soon as the message is received. As with send tasks, these are designed for internal orchestration of workflows.
BPMN has the ability to model multiple instances of a task in a workflow as well as using loops for iterative execution of a task. These can be used as normal in OGC services, as the two examples are simply executing instances of a process with a given pattern.
Swim lanes and Pools are ways of organizing a BPMN diagram to describe collaboration between entities. In the case of OGC services, swim lanes can be used to model the location of different processing entities. For example, should the workflow engine have access to different WPS instances, then each of these instances could be represented by a Swim lane to model result passing between the different services.
There are several start events that can be used to orchestrate OGC services. Start events define how a process is initiated. It might be via a manual process (i.e. someone calling the workflow from a REST endpoint), or it might be via a message received from another process or third-party application. For the most part, the standard BPMN guidance documentation is sufficient. However, there are some specific aspects of BPMN where extra notation is necessary.
BPMN does not differentiate conceptually between events that can start and end a process. For example, it is possible to both start and end a workflow using a message event or any other type of event for that matter. In addition to start and end events, there are also intermediate events that may or may not interrupt a workflow according to certain conditions.
Message start events are those that initiate a workflow based upon some external event. A typical example of this is that the workflow is called by an external process. This is in contrast to a standard start event where the workflow is called by a client interacting with the API of the workflow engine. Additionally, the Message start event does not discern between users or calling systems; it simply executes having received the message. In contrast to this is the signal event, where the workflow is awaiting a specific signal to initiate. This is likely to be a specific client or execution pattern rather than the general one that messaging is suited to. Signal events also have more flexibility for communicating aspects other than simple error messages, as is expected with the Message event. Likewise, an end event will act in the same way by throwing either a signal or message to a chosen service to end the workflow. Note that an end event is different than the email event, which is specifically set up to email results to a configured account.
Intermediate events for messages and signals are triggered in much the same way as the start and end events. However, they are triggered before the workflow has completed.
Signaling functions are available from outside the workflow during execution and are often used to hold the workflow until an external task has been completed. This has crossovers with the Human Task aspect of BPMN, which relies on an external actor to provide information to allow the workflow to continue.
Error events capture and report on errors based upon a condition. For example, a workflow error might be that a process has failed for some reason, be it a coding error or unexpected behavior caused by the input parameters. Error messaging is a catch-all in terms of BPMN, but should be used to communicate the errors from processes or data objects. Typical uses of an Error event include misconfiguration of processes, reporting a lack of authentication for services, and data type conflicts. Error catching should be used to report back to the workflow executor in human readable language. An example of where this is useful is to watch for a null result from an external process and report back accordingly, rather than simply throwing a language specific error.
It is difficult to predict how workflows will behave with different sets of input data and parametrization of processes. For example, a workflow that executes for one set of data may fail for another set. This could be due to an Error in the typical sense (i.e. there is something wrong with one of the processes) or it could be that there is no error and the processes executed correctly but it failed for another, data dependent reason.
Consider a workflow that has two dependent processes expressed as Service tasks, with the second process configured to use the outputs from the first process. If the first process produces an output that contains zero or a null result, then it is likely that the second process would fail. Although the workflow may fail at this point, this should not be considered an error, as everything in the workflow acted exactly as it should. A compensation event tells the workflow how to behave in this eventuality. It might send a message to the user informing them of the reason for the event being triggered, or it may provide an alternative execution path for results that conform to the compensation requirements.
Orchestrating services that include compensation is prudent when the workflow is complex with many branches. Compensation may also be considered an extension of the error event: instead of reporting an error, it activates an alternative path to compensate for the error.
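The distinction between an error and a compensation path can be sketched as follows. This is a minimal illustration in Python, not the Testbed-14 implementation (which is Java-based). The names `run_chain`, `StepResult` and `ProcessError`, and the stand-in processes, are hypothetical. It assumes each process is a callable that returns a list of features and that an empty list represents the null result described above.

```python
from dataclasses import dataclass, field
from typing import Callable, List


class ProcessError(Exception):
    """A genuine failure inside a process (the Error event case)."""


@dataclass
class StepResult:
    status: str                      # "ok", "compensated" or "error"
    features: List[dict] = field(default_factory=list)
    message: str = ""


def run_chain(first: Callable, second: Callable,
              on_empty: Callable, data: List[dict]) -> StepResult:
    """Run two dependent processes; divert to on_empty on a null result."""
    try:
        intermediate = first(data)
    except ProcessError as exc:
        return StepResult("error", message=f"First process failed: {exc}")
    if not intermediate:
        # Nothing went wrong, but there is nothing to pass on:
        # take the compensation path instead of letting `second` fail.
        return on_empty(data)
    try:
        return StepResult("ok", features=second(intermediate))
    except ProcessError as exc:
        return StepResult("error", message=f"Second process failed: {exc}")


# Hypothetical stand-ins for two chained WPS processes.
def strict_filter(features):
    return [f for f in features if f["quality"] > 0.95]


def loose_filter(features):
    return [f for f in features if f["quality"] > 0.5]


def mark_checked(features):
    return [dict(f, checked=True) for f in features]


def notify_user(_features):
    return StepResult("compensated",
                      message="No features passed the filter; relax the criteria.")


points = [{"id": 1, "quality": 0.4}, {"id": 2, "quality": 0.9}]
strict = run_chain(strict_filter, mark_checked, notify_user, points)
loose = run_chain(loose_filter, mark_checked, notify_user, points)
```

With the strict filter the chain ends on the compensation path with a human readable message. With the loose filter the second process runs normally.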
The data modeling aspect of BPMN is key to creating best practices as it is one of the aspects that differ between implementations. As alluded to in the review sections of this document, providing a suitable method of modeling data from OGC services will increase the interoperability of workflow engines, remove verbosity from business processes, and enable re-usability of data end points within workflows. BPMN has four ways of describing data within a workflow:
Data objects describe data requirements for activities to be performed. The data object element can represent a singular object or a collection of objects. The life-cycle of the data object is implicitly tied to the containing process or sub-process. In fact, a data object in its raw form cannot exist without a containing process or sub-process. Data object references are related to data objects in that they enable reference to the same data object across a workflow. The advantage of data object references is that they are state-aware across the diagram, whereas data objects do not have a state and follow the life-cycle of the containing process. Data objects also have a class-like hierarchy of access, much like variable scope. A process and its sub-processes may all have data objects and references. However, their accessibility depends on their location in the process/sub-process hierarchy. Figures 5 and 6 show the BPMN representation for a data object and a data object collection.
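The variable-scope analogy can be illustrated with a short Python sketch. The class `Scope` and the data object names are hypothetical. The sketch assumes that a data object declared in a process is visible to its sub-processes, while a data object declared in a sub-process is not visible to the containing process.

```python
from typing import Any, Dict, Optional


class Scope:
    """A process or sub-process holding data objects (illustrative only)."""

    def __init__(self, name: str, parent: Optional["Scope"] = None):
        self.name = name
        self.parent = parent
        self._objects: Dict[str, Any] = {}

    def declare(self, key: str, value: Any) -> None:
        self._objects[key] = value

    def resolve(self, key: str) -> Any:
        # Look in this scope first, then walk up through the containers.
        scope: Optional[Scope] = self
        while scope is not None:
            if key in scope._objects:
                return scope._objects[key]
            scope = scope.parent
        raise KeyError(f"data object '{key}' is not visible from '{self.name}'")


process = Scope("process")
sub_process = Scope("sub-process", parent=process)

process.declare("area_of_interest", "http://example.org/data/aoi")
sub_process.declare("buffered", "http://example.org/data/buffered")

# The sub-process sees its own data object and that of its container.
visible = sub_process.resolve("area_of_interest")
```

Calling `process.resolve("buffered")` raises a `KeyError` in this sketch, mirroring the restriction that the containing process cannot reach into a sub-process.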
A data store is used when data will persist beyond the life-cycle of the containing process. Data stores also have the concept of a reference. However, the reference to the data store is simply a convenient way to access the data store without replicating the icon, as the reference is, for all intents and purposes, the same as the store. It is not appropriate to represent OGC services as data stores, as a data store refers to a traditional database. Data from OGC services is requested via a query to a standardized interface, which may refer to a set of databases or indeed other services behind the scenes. It is more likely that access to data should be treated like any other service task in BPMN, as it is essentially performing the same operation.
This section outlines the draft best practices for orchestrating OGC services using BPMN. The main focus of these best practices is:
Representation of processes and tasking within BPMN.
How to configure processes.
Representation of workflow outputs.
BPMN contains several methods of representing processes and tasking in the form of Activity Tasks. There are several options for tasking that enable the workflow to perform functions such as:
Sending and receiving information.
Implementing business rules.
OGC processing tasks through services such as WPS and WCPS should therefore be represented as a Service Task, as they are designed to represent executable processes.
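As an illustration of this recommendation, the following Python sketch generates a minimal BPMN 2.0 process in which each OGC processing step is a `serviceTask` joined by sequence flows. The function name `build_process`, the element ids and the `targetNamespace` value are hypothetical. Engine-specific attributes (for example, how jBPM binds a task to a work item handler) are deliberately omitted, so the output is a structural sketch rather than a document that an engine would execute as-is.

```python
import xml.etree.ElementTree as ET

# Namespace of the BPMN 2.0 model schema.
BPMN_NS = "http://www.omg.org/spec/BPMN/20100524/MODEL"


def build_process(process_id, task_names):
    """Return BPMN XML with one serviceTask per name, chained in order."""
    ET.register_namespace("bpmn", BPMN_NS)
    definitions = ET.Element(
        f"{{{BPMN_NS}}}definitions",
        {"targetNamespace": "http://example.org/ogc-workflow"},  # placeholder
    )
    process = ET.SubElement(
        definitions, f"{{{BPMN_NS}}}process",
        {"id": process_id, "isExecutable": "true"},
    )

    node_ids = ["start"]
    ET.SubElement(process, f"{{{BPMN_NS}}}startEvent", {"id": "start"})
    for index, name in enumerate(task_names):
        task_id = f"task_{index}"
        ET.SubElement(process, f"{{{BPMN_NS}}}serviceTask",
                      {"id": task_id, "name": name})
        node_ids.append(task_id)
    ET.SubElement(process, f"{{{BPMN_NS}}}endEvent", {"id": "end"})
    node_ids.append("end")

    # Connect consecutive nodes with sequence flows.
    for index, (source, target) in enumerate(zip(node_ids, node_ids[1:])):
        ET.SubElement(process, f"{{{BPMN_NS}}}sequenceFlow",
                      {"id": f"flow_{index}",
                       "sourceRef": source, "targetRef": target})
    return ET.tostring(definitions, encoding="unicode")


xml_text = build_process("ogc_chain",
                         ["Filter features", "Select within distance"])
```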
Understanding how to manage data objects across the OGC services suite and as part of a workflow is a key requirement for this piece of work. As mentioned previously, there are two main types of data objects that need to be represented:
Data that persists beyond the life-cycle of a workflow.
Data that exists in the life-cycle of a workflow only.
These different data objects map well to the BPMN concept of a data store and a data object. However, there is a conceptual difference between the data object in BPMN and passing parameters. The data object should only exist according to the life-cycle of a process. Therefore, a process that includes an input from a previous process’s output should be modeled separately as data outputs and then data inputs.
Data that is to be utilized beyond the life-cycle of the workflow via exposure as global processes.
Internal mapping of data should be represented by a data object where supported and mapped where required.
Where data formats are compatible within a workflow, data should be passed by reference.
Where data formats are incompatible within a workflow, data should be passed by value and encoded in an appropriate format.
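The last two rules can be expressed as a small decision function. This Python sketch is illustrative only. The function name `choose_passing_mode` is hypothetical, and it assumes that each service advertises its supported formats as a list of media types in order of preference, and that an incompatible output is re-encoded into the consumer's preferred format.

```python
from typing import List, Tuple


def choose_passing_mode(output_formats: List[str],
                        input_formats: List[str]) -> Tuple[str, str]:
    """Decide how to pass a result from one process to the next.

    Returns ("reference", fmt) when producer and consumer share a format,
    otherwise ("value", fmt) where fmt is the consumer's preferred format
    into which the data would have to be encoded.
    """
    if not input_formats:
        raise ValueError("the consuming process advertises no input formats")
    for fmt in output_formats:
        if fmt in input_formats:
            return ("reference", fmt)
    return ("value", input_formats[0])


compatible = choose_passing_mode(
    ["application/gml+xml", "application/geo+json"],
    ["application/geo+json"],
)
incompatible = choose_passing_mode(
    ["application/x-zipped-shp"],
    ["application/gml+xml"],
)
```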
A data store is used for a persistent database, which is rarely the case in OGC services. Instead, data are posted and accessed through services, as the data architecture behind the services is opaque. Therefore, a data store should not be used to invoke an OGC service, as they are completely different concepts, but it may be used to represent a local or remote traditional database.
The data object is used to graphically represent the data inputs and outputs from a single Task. The BPMN specification states that usage of the data object is restricted to a single Task and, as such, the data object is destroyed with the completion of the Task. Data objects should be used in the following circumstances:
Describing relevant inputs and outputs of singular Tasks.
Additionally, data objects should be used to represent certain data types, notably, complex types that include:
Extensible Markup Language (XML)/ Geography Markup Language (GML).
Any other complex type.
This is done to enable visualization of the data flow within a workflow and between Service Tasks.
By convention, all data objects have states that are referenced after the name of the data object and communicated via square brackets ([]). Data objects in OGC do not require a specific state to be accessed, as the data are all provided through interaction with service tasks. For example, there has been no use case presented so far for workflows that simply request data, as the objective of the BPMN workflow is to provide processing orchestration, with data request orchestration as a by-product.
Collections of data objects are simple representations of one to many data objects within a single state. Collections of data objects are not represented in a service or resource-based architecture, as the data objects are generally provided from a single WFS or WCS interface. What happens behind the scenes in these queries is largely up to the implementer, therefore requesting data objects is largely the work of a process. A related concept is passing multiple instances of an input to a WPS. In this instance, data object collections may be used.
In addition to input data, processes and service tasks also often contain parameters for configuring each service. Unlike data objects, these parameters should be configured directly in the Service Task and not represented by a data object. Simple types typically consist of the following:
The nature of OGC services is to utilize complex data inputs and therefore a helper class has to be created for each implementation. This section does not describe in detail the implementation practices for creating helper classes as it is envisaged that BPMN does not support external data types natively, but instead uses a common internal method of dealing with data objects. Currently, the data parsers are defined by the processing services, as it is likely that this is the entry point for getting data. As mentioned previously, the BPMN engine has no processing capability of its own but relies on the services that it is orchestrating for processing and data parsing power. Therefore, all outputs from a WPS are represented as an Object type. This allows the output of a WPS, be it as a FeatureCollection or reference, to be successfully handled by the workflow engine. It is possible to define the input and output types as their actual types if required. The disadvantage of using Object types is that no validation is done by the workflow engine on the content of an input or output parameter.
Currently, all image types should be referred to via a reference since imagery is not supported natively within BPMN or via the helper class. But as with vector data, services that can support imagery should be used to process and parse that imagery.
Accessing a processing service via a service task is done by passing the variables required to execute the process from the BPMN document to the execute document, irrespective of the specific WPS. A WPS process generally requires the following to execute:
WPS Uniform Resource Locator (URL) - the location of the WPS.
Process description - the specific WPS process to execute.
Authentication information (optional).
Variables - the required variables to execute the process.
The variables required to execute the process are defined in the BPMN document.
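The mapping from these items to an execute document can be sketched in Python as follows. This is a simplified illustration based on the WPS 2.0 XML encoding and has not been validated against the schema. The function name `build_execute` and the example values are hypothetical. The sketch assumes that any input value that looks like an HTTP URL is passed by reference and that every other value is passed as literal data.

```python
import xml.etree.ElementTree as ET

WPS_NS = "http://www.opengis.net/wps/2.0"
OWS_NS = "http://www.opengis.net/ows/2.0"
XLINK_NS = "http://www.w3.org/1999/xlink"


def build_execute(process_id, inputs, output_id, mode="sync"):
    """Build a simplified WPS 2.0 Execute document as a string."""
    for prefix, uri in (("wps", WPS_NS), ("ows", OWS_NS), ("xlink", XLINK_NS)):
        ET.register_namespace(prefix, uri)

    root = ET.Element(
        f"{{{WPS_NS}}}Execute",
        {"service": "WPS", "version": "2.0.0",
         "response": "document", "mode": mode},
    )
    ET.SubElement(root, f"{{{OWS_NS}}}Identifier").text = process_id

    for name, value in inputs.items():
        wps_input = ET.SubElement(root, f"{{{WPS_NS}}}Input", {"id": name})
        if isinstance(value, str) and value.startswith(("http://", "https://")):
            # Assumption: URLs are data passed by reference.
            ET.SubElement(wps_input, f"{{{WPS_NS}}}Reference",
                          {f"{{{XLINK_NS}}}href": value})
        else:
            ET.SubElement(wps_input, f"{{{WPS_NS}}}Data").text = str(value)

    # Ask for the result as a link so the next process can fetch it itself.
    ET.SubElement(root, f"{{{WPS_NS}}}Output",
                  {"id": output_id, "transmission": "reference"})
    return ET.tostring(root, encoding="unicode")


execute_doc = build_execute(
    "echo.process",
    {"features": "http://example.org/wfs?request=GetFeature", "distance": 100},
    output_id="result",
)
```

The WPS URL and the optional authentication information are not part of the document itself; they determine where the document is posted and which HTTP headers accompany it.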
This section outlines the component design and implementation for the BPMN workflow engine component. The design is based off of the overall architecture and the statement of work provided by the sponsors. In general terms, the implementation is designed to reflect the best practices described in this ER with respect to the scenario or set of use cases. The best practices however are intended to be independent of the BPMN engine used.
The implemented workflow helper class is deployed in jBPM, a Java-based BPMN workflow engine. The service utilizes the jBPM Workbench stack, which provides a REST interface for remote execution. This interface is sent a BPMN document by the WPS-T, which was in turn orchestrated by the client. The BPMN engine executes the document and then returns the result (an output or an error) to the WPS-T, which passes it on to the client.
jBPM works with Custom Work Items to orchestrate services and these work items have definitions that describe the inputs, outputs and data types. Once these are described, they are selectable within the web application interface. Enabling the re-usability of work items in jBPM is not a requirement for this piece of work. Therefore the helper class provides the BPMN engine with knowledge of spatial data structures rather than performing functions such as setting up the user interface.
The helper class essentially acts as a WPS client that creates the process request for each WPS from the submitted BPMN document. It is then the job of the workflow engine to map the outputs from one process to the input process that requires it. This is done through a simple local variable definition or, if supported, via a data object.
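The output-to-input mapping through local variables can be sketched as follows. This Python illustration is not the jBPM implementation. The names `Task` and `execute_workflow` are hypothetical, and the two stand-in processes simply return made-up result URLs in place of real WPS calls.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Task:
    name: str
    run: Callable[..., Any]
    inputs: Dict[str, str]   # process parameter name -> workflow variable name
    output: str              # workflow variable that receives the result


def execute_workflow(tasks: List[Task],
                     variables: Dict[str, Any]) -> Dict[str, Any]:
    """Run tasks in order, wiring each output to later inputs by name."""
    variables = dict(variables)          # do not mutate the caller's dict
    for task in tasks:
        kwargs = {param: variables[var] for param, var in task.inputs.items()}
        variables[task.output] = task.run(**kwargs)
    return variables


# Hypothetical stand-ins for WPS processes that return result references.
def buffer_process(features: str, distance: int) -> str:
    return f"http://example.org/results/buffer?src={features}&d={distance}"


def clip_process(features: str) -> str:
    return f"http://example.org/results/clip?src={features}"


workflow = [
    Task("buffer", buffer_process,
         {"features": "input_ref", "distance": "buffer_distance"}, "buffered"),
    Task("clip", clip_process, {"features": "buffered"}, "clipped"),
]
result = execute_workflow(
    workflow, {"input_ref": "roads", "buffer_distance": 50})
```

Only references move through the `variables` dictionary here, which matches the helper class default of passing results by reference.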
Throughout the Testbed process, there has been discussion as to the model of security that should be used to enable differential access to services according to the access credentials of a user. Two main scenarios have been discussed for handling security in the workflow engine:
Encoding the security information in the BPMN document.
Passing security information including tokens in the HTTP request header.
Encoding the security information directly into the BPMN document. This offers the advantage of differential access to services, potentially in separate federations. However, it also creates several points of concern including:
The aforementioned security concerns of the sponsor and others.
The ability to extract data from a protected service and then insert it into an unprotected service by way of result passing within the workflow engine.
The increased complexity of modeling security in this way.
The usage of different types of security. Putting an access token in the BPMN document may be suitable for OAuth2, but it is unclear how this transfers to other types of security.
Exposing security information such as access tokens outside of the HTTP header may be considered a security risk.
The implications of different concepts of security are described in Figure 7. There are three concepts expressed within the diagram, the first is where the BPMN engine sits outside of the security federation and requires credentials represented by the solid line. This scenario offers the advantage of allowing access to the BPMN workflow engine by outside actors, but requires credentials to access the federated, protected services. The approach adopted in Testbed-14 largely conforms to this. A second scenario is described by the dashed line where the workflow engine is inside the federation. This offers the advantage of not requiring credentials to be passed to the workflow engine as the user is already authenticated and therefore has access to the services. A disadvantage with this approach is that the workflow engine is potentially closed to users that are not authenticated and therefore does not offer differential access to services. A third scenario is a workflow that includes services that are both inside and outside the federation represented by the WPS-3 object in the diagram. In this instance, it is possible for the BPMN engine to pass protected data out of the federation to a service that is not authenticated to see that data, therefore posing a potential security risk.
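The risk in the third scenario can be checked mechanically before execution. The following Python sketch is illustrative. The function name `find_leaks` and the service names other than WPS-3 are hypothetical. It assumes the workflow is available as a list of (source, target) result-passing links and that the set of services inside the federation is known.

```python
from typing import Iterable, List, Set, Tuple


def find_leaks(flows: Iterable[Tuple[str, str]],
               secured: Set[str]) -> List[Tuple[str, str]]:
    """Return links where a protected service's result reaches an unprotected one."""
    return [(source, target) for source, target in flows
            if source in secured and target not in secured]


# WPS-1 and WPS-2 are assumed to sit inside the federation, WPS-3 outside.
flows = [("WPS-1", "WPS-2"), ("WPS-2", "WPS-3")]
leaks = find_leaks(flows, secured={"WPS-1", "WPS-2"})
```

A workflow engine could refuse to run, or ask for confirmation, when the returned list is not empty. This checks direct links only; data that passes through several services would need the protected status to be propagated along the chain.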
Although these issues exist, in Testbed-14 it has been decided that the security token would be passed as part of the BPMN document describing each service. This allows differential access to security models and enables the workflow engine to deal with secured and unsecured services in the same workflow.
The BPMN workflow engine component works in a similar manner to the workflow engine component, that is, an open source implementation of a workflow engine coupled with a helper class to enable the workflow engine to understand geographic data constructs. As the workflow engine is used to request either data or processing capability, the workflow engine does not contain any data parsing capabilities (beyond that implemented in the helper class), but instead relies on the external services orchestrated to perform this functionality. Adopting this approach solves the data encoding/decoding issues that arose in the Testbed-13 Workflows implementation and report.
Relying on external services to perform data encoding and decoding keeps the workflow engine agnostic and it therefore does not make any demands on the services being orchestrated. However, the workflow engine still requires an internal construct to be able to manage geographic information internally, this is accomplished through the helper class, which is likely to require some language specific components.
The open source BPMN engine used in this implementation is jBPM (https://www.jbpm.org/), created and maintained as an open source product by Red Hat. jBPM has several components, including a development suite and a web application with a visual front end for orchestrating services. In terms of the OGC implementation, there are three aspects of jBPM to consider:
The workflow engine and helper class.
The REST API for remote execution.
The User Interface (UI).
The objective of this work item is to produce a helper class for the workflow engine, therefore that will be the focus of the implementation. Orchestrating OGC services using the jBPM user interface is not in scope for this piece of work (although it is functional). The BPMN documents are created by the client and then sent to the WPS-T, which in turn hands them off to the workflow engine to actually do the service orchestration and execution. The workflow engine is fronted by a WPS, which provides a standards-based interface for the WPS-T to call. The WPS contains a process called ExecuteWorflow that accepts a BPMN document as a parameter.
The UI aspects of jBPM are mentioned here as they require customization and configuration within the web application to be fully functional. That is, should the user wish to see their custom work items appear in the UI, then they have to define them as part of the Work Item Definition (WID) file. This can be accomplished remotely, but is deemed out of scope for this Testbed. A full explanation of this can be found in the draft OGC 16-091 BPMN 2.0 for Orchestrating OGC Services Discussion Paper.
As mentioned previously, the helper class is a simple implementation that is compiled and added to the jBPM library prior to deploying the web application (WEB-INF/lib/<name of jar>.jar). This gives the web application access to geospatial constructs. As jBPM is written in Java, the geospatial library used is GeoTools (www.geotools.org), and the specific class used to hold and manipulate geospatial information from within the workflow engine is FeatureCollection. There are variants of the FeatureCollection such as SimpleFeatureCollection, which deals specifically with simple features (i.e. flat data structures). However, these are not implemented as specific constructs, as the FeatureCollection can handle any variants or derivatives of this construct.
jBPM requires two types of class to execute WPS processes and, by extension, WFS data. Workflows will usually have a processing component to them, therefore parsing of data objects should be handled by the processing services, with the outputs from those processing services mapped to data objects. This means that a workflow that only contains data inputs and outputs will not make use of the helper class, as a data object will likely only be passed as a reference between BPMN Activities. The data are only parsed when they need to be processed. The two classes that need to be implemented are as follows:
WPS client - to create execute documents, send execute requests, and parse and transmit the results of the execution. The WPS client is 2.0 compliant.
Custom work item handler - this class acts as the interface between the WPS and the workflow engine. To keep it generic, the work item handler for a WPS has the following variables:
The WPS URL (for example http://localhost:8080/wps/)
The WPS process description (for example echo.process)
A hashmap defined as <String>, <Object> - this is a generic class for handling process specific variables. The Object data type allows for casting in the WPS client.
An OAuth Bearer token, optional overall but required if the process being called is secured.
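The four variables listed above can be pictured as a small parameter object. The following is a minimal sketch only: the class name WpsWorkItemParams, its field names and the example.org URL are hypothetical, and the actual handler in the Testbed is a Java class whose code is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class WpsWorkItemParams:
    """Hypothetical container for the four work item handler variables."""

    wps_url: str                      # for example http://localhost:8080/wps/
    process_id: str                   # for example echo.process
    # Generic map of process-specific variables (String -> Object in Java terms)
    inputs: Dict[str, Any] = field(default_factory=dict)
    # Optional overall, but needed when the called process is secured
    bearer_token: Optional[str] = None

    def has_token(self) -> bool:
        # A missing or blank token is treated as "no token supplied"
        return bool(self.bearer_token and self.bearer_token.strip())


params = WpsWorkItemParams(
    wps_url="http://localhost:8080/wps/",
    process_id="echo.process",
    inputs={"inputWFSUrl": "http://example.org/wfs"},  # assumed example value
)
```

Keeping the process-specific values in a single generic map is what lets one handler serve any WPS process, at the cost of deferring type checks to the WPS client.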
These variables are then exposed in each of the workflow service tasks in the BPMN document for population. If the workflow engine interface is being used for orchestration, then it is prudent to configure the workflow engine to take default values for the WPS URL and the process description, as these can be gathered from the WPS services exposed in the workflow. In Testbed-14, however, the workflow documents are configured prior to submission and execution and therefore default values are not required.
The helper class is designed to be generic: it is able to deal with any WPS process that takes inputs. Additionally, processes do not have to be registered with the workflow engine and may be submitted in a BPMN document without registration. This means that the workflow engine should act as a receptacle for a BPMN workflow and not put any further burden on the user to register processes.
This implementation handles security by first testing whether the service requires it. It does this by retrieving the capabilities of the service and looking for <Constraint> tags within the advertised operations.
It is assumed that if the constraint tags are missing or are empty, then there are no security constraints on the service. If there are no security constraints, then the service execution happens as normal without the inclusion of a bearer token in the request. However, if there are found to be constraints, they are currently assumed to be OAuth2. The Bearer token is currently held in the process variables for the BPMN workflow. Originally it was considered to have a single security federation with the OAuth Bearer token inserted into the header of the request to the WPS that fronts the BPMN workflow engine. However, it quickly became apparent that this was an over-simplification of the requirements for even the simplest workflows due to:
The lack of available services to produce a simple demonstrator workflow.
The recognition that security is prevalent throughout many services, therefore a generic BPMN workflow engine needs to be flexible enough to utilize multiple tokens.
The BPMN engine has its own security requirements, one of which is encoding the credentials of the executor in the header of the REST calls to create the container that can execute the workflow. Therefore, passing all credentials in the header becomes problematic.
The helper class then takes the security token as an input and inserts it into the HTTP header of the execute request for the relevant WPS service. If the token is valid then the WPS executes as normal.
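The check and the header insertion described above can be sketched as follows. This is an illustration under stated assumptions rather than the Testbed code: the function names are hypothetical, "empty" is interpreted here as a Constraint element with no attributes, children or text, and the constraint name urn:example:oauth2 is a placeholder.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import Optional

OWS_NS = "http://www.opengis.net/ows/2.0"


def requires_security(capabilities_xml: str) -> bool:
    """Return True if any advertised operation carries a non-empty Constraint."""
    root = ET.fromstring(capabilities_xml)
    for operation in root.iter(f"{{{OWS_NS}}}Operation"):
        for constraint in operation.iter(f"{{{OWS_NS}}}Constraint"):
            has_text = bool((constraint.text or "").strip())
            # Assumption: attributes, child elements or text make it non-empty
            if constraint.attrib or len(constraint) or has_text:
                return True
    return False


def build_execute_request(url: str, body: bytes, secured: bool,
                          token: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) the execute request, adding the token if needed."""
    headers = {"Content-Type": "text/xml"}
    if secured:
        if not token:
            raise ValueError("service advertises constraints but no token was supplied")
        # Constraints are assumed to mean OAuth2, so a Bearer header is used
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


SECURED = f"""<wps:Capabilities xmlns:wps="http://www.opengis.net/wps/2.0" xmlns:ows="{OWS_NS}">
  <ows:OperationsMetadata>
    <ows:Operation name="Execute">
      <ows:Constraint name="urn:example:oauth2"/>
    </ows:Operation>
  </ows:OperationsMetadata>
</wps:Capabilities>"""

request = build_execute_request(
    "http://localhost:8080/wps/", b"<wps:Execute/>",
    secured=requires_security(SECURED), token="example-token",
)
```

Because the request is only constructed and never sent, the sketch can be run without a WPS being available.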
jBPM acts as a Git repository. This is implementation specific and is simply a way of pushing a remotely constructed BPMN document to a running instance of jBPM. Therefore, the workflow engine has a WPS facade in front of it to:
Receive the BPMN document.
Create the Git repository on the jBPM instance.
Execute the workflow.
These functions are performed using a combination of Git calls made in jGit (https://git-scm.com/) and then calls to the jBPM REST API within the WPS process.
It is noted that other BPMN workflow engines perform this task in a different way, i.e. by directly allowing access to the remote execution aspect of the workflow as seen in the Camunda implementation in Testbed-13.
The WPS contains a single process to execute the workflow, with a parameter to receive the BPMN document. This approach replicates some of the work done in the WPS-T aspect of the architecture for simplicity and to assist other implementers. An example Execute request for the ExecuteWorkflow WPS process is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<wps:Execute xmlns:wps="http://www.opengis.net/wps/2.0"
    xmlns:ows="http://www.opengis.net/ows/2.0"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.opengis.net/wps/2.0 http://schemas.opengis.net/wps/2.0/wps.xsd"
    service="WPS" version="2.0.0" response="document" mode="sync">
  <ows:Identifier>testbed14.workflows.ExecuteWorkflow</ows:Identifier>
  <wps:Input id="BPMNDocument">
    <wps:Reference mimeType="text/xml" xlink:href="http://meekbaa1.miniserver.com/dl/testbed-14-gait-process.bpmn"/>
  </wps:Input>
  <wps:Output id="WorkflowResults" transmission="value"/>
</wps:Execute>
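A client could assemble an equivalent request programmatically. The sketch below builds the same element structure with the Python standard library; the function name build_execute_workflow and the example.org reference are assumptions for illustration, and the schemaLocation attribute is omitted for brevity.

```python
import xml.etree.ElementTree as ET

WPS = "http://www.opengis.net/wps/2.0"
OWS = "http://www.opengis.net/ows/2.0"
XLINK = "http://www.w3.org/1999/xlink"

# Register prefixes so the serialized document uses wps:, ows: and xlink:
for prefix, uri in (("wps", WPS), ("ows", OWS), ("xlink", XLINK)):
    ET.register_namespace(prefix, uri)


def build_execute_workflow(bpmn_href: str) -> str:
    """Return a synchronous WPS 2.0 Execute request that passes the BPMN by reference."""
    root = ET.Element(f"{{{WPS}}}Execute", {
        "service": "WPS", "version": "2.0.0",
        "response": "document", "mode": "sync",
    })
    ET.SubElement(root, f"{{{OWS}}}Identifier").text = "testbed14.workflows.ExecuteWorkflow"
    wps_input = ET.SubElement(root, f"{{{WPS}}}Input", {"id": "BPMNDocument"})
    ET.SubElement(wps_input, f"{{{WPS}}}Reference", {
        "mimeType": "text/xml",
        f"{{{XLINK}}}href": bpmn_href,
    })
    ET.SubElement(root, f"{{{WPS}}}Output", {"id": "WorkflowResults", "transmission": "value"})
    return ET.tostring(root, encoding="unicode")


# The URL is a placeholder, not the document used in the Testbed
document = build_execute_workflow("http://example.org/workflow.bpmn")
```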
The WPS is there as a simple way for a client to execute a workflow by executing a WPS process. As mentioned, jBPM uses Git as a method of remotely updating projects within the web application. Therefore, the WPS process simply commits the BPMN document to the relevant project within the jBPM instance. In the Testbed, the jBPM project is already set up and the change committed is the workflow document received by the WPS. jBPM projects contain supporting files beyond the workflow document, such as project configuration that includes registration of service task work items and deployment variables. Although these can be configured remotely via Git, this is not done in the Testbed due to the added complexity.
This highlights an issue where jBPM requires more than a BPMN document to perform executions as there are several supporting files used in the registration process. Therefore in the testbed, projects are updated with changes to the BPMN document and these usually include updates to variable values for the WPS processes executed by the workflow engine.
jBPM offers a REST API as a method of executing workflows remotely; however, this is a multi-step process. The steps are as follows:
Check for container deployment of existing workflow project and if necessary undeploy the container.
Deploy the new container with the updated BPMN document.
Create a new instance of the BPMN process using the deployed container - note that a container has the ability to execute several workflows.
Execute the instance of the workflow and get the workflow execution status - this does not correspond to the results of the workflow. It is a set of messages that indicate whether the workflow has executed correctly or not.
Executing jBPM remotely requires basic authentication, which is passed in the header information. Additionally, content-type must also be set appropriately (all examples here are XML, but the system has been tested using JSON).
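As an illustration of the header requirements, the following sketch assembles a Basic authentication header together with the content type. The function name kie_headers and the credentials are placeholders; the Accept header is an added assumption so that responses come back in the same format as the request.

```python
import base64
from typing import Dict


def kie_headers(user: str, password: str,
                content_type: str = "application/xml") -> Dict[str, str]:
    """Build the headers for a remote jBPM call using HTTP Basic authentication."""
    # Basic authentication is base64("user:password") prefixed with "Basic "
    credentials = f"{user}:{password}".encode("utf-8")
    encoded = base64.b64encode(credentials).decode("ascii")
    return {
        "Authorization": f"Basic {encoded}",
        "Content-Type": content_type,
        "Accept": content_type,  # assumption: ask for the same format back
    }


xml_headers = kie_headers("user", "pass")
json_headers = kie_headers("user", "pass", content_type="application/json")
```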
All of this is managed through the REST API with the following example requests.
Deleting an existing deployment is required to remove any existing container deployment of a project. Note that containers need to be rebuilt when the underlying project is changed via Git. An example REST request using HTTP DELETE is as follows:
This process is called regardless of whether there is a running container or not. In an operational system, it is likely that existing containers would have to be checked for running processes prior to removal.
The new container is deployed with the updated project using HTTP PUT operation:
<kie-container container-id="Testbed_14_secured_1.0.0">
  <release-id>
    <artifact-id>Testbed_14_secured</artifact-id>
    <group-id>com.myspace</group-id>
    <version>1.0.0</version>
  </release-id>
</kie-container>
This process simply rebuilds the project using the updated BPMN documents that have been committed via Git.
Process execution via REST has the ability to take variables as arguments for execution. However, all variables are described in the submitted BPMN document and execution without further arguments is therefore sufficient. This endpoint is executed using an HTTP POST operation with a blank body.
When called, this endpoint returns a process ID to the client for reference.
After the process has completed (in this case it is run synchronously), the results are made available via the following HTTP GET call:
The results are then returned to the calling WPS and inserted as WPS output in the WorkflowResults variable.
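The order of calls in this section can be summarized in a short sketch that builds, but does not send, the requests. The original request URLs are not reproduced in this text, so the base URL and path layout below are assumptions modelled on the general shape of the KIE Server REST API, and the process identifier Testbed_14_secured.workflow is hypothetical. Only the HTTP methods, the container body and the ordering come from the description above.

```python
from typing import List, NamedTuple, Optional


class RestCall(NamedTuple):
    method: str
    url: str
    body: Optional[str] = None


def container_body(container_id: str, group_id: str, artifact_id: str, version: str) -> str:
    """Serialize the kie-container document sent with the PUT request."""
    return (
        f'<kie-container container-id="{container_id}">'
        f"<release-id><artifact-id>{artifact_id}</artifact-id>"
        f"<group-id>{group_id}</group-id><version>{version}</version></release-id>"
        "</kie-container>"
    )


def deployment_calls(base: str, container_id: str, process_id: str, body: str) -> List[RestCall]:
    """Undeploy, redeploy, then start a process instance (paths are assumed)."""
    container = f"{base.rstrip('/')}/containers/{container_id}"
    return [
        RestCall("DELETE", container),                                     # remove old container
        RestCall("PUT", container, body),                                  # deploy rebuilt project
        RestCall("POST", f"{container}/processes/{process_id}/instances", ""),  # blank body
    ]


def results_call(base: str, container_id: str, instance_id: int) -> RestCall:
    """Fetch results using the process ID returned by the POST (path is assumed)."""
    container = f"{base.rstrip('/')}/containers/{container_id}"
    return RestCall("GET", f"{container}/processes/instances/{instance_id}/variables")


BASE = "http://localhost:8080/kie-server/services/rest/server"  # assumed base URL
body = container_body("Testbed_14_secured_1.0.0", "com.myspace", "Testbed_14_secured", "1.0.0")
calls = deployment_calls(BASE, "Testbed_14_secured_1.0.0", "Testbed_14_secured.workflow", body)
```

The results request is kept separate because it depends on the process ID returned by the POST call.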
The architecture has changed a little from the Statement of Requirement (SOR) as the main addition is the WPS that sits in front of the workflow engine. It is felt that including a WPS on the workflow engine side is simpler than attempting to write a WPS process for the WPS-T that is under the control of another vendor. This way, a standard interface is provided for anyone to execute the BPMN workflow engine beyond the WPS-T, therefore adhering to a loosely coupled architecture. The crude revised architecture can be found in Figure 8.
The behavior of the system requires implementation of the use cases found in Figure 9. The Actors on the system are the WPS-T and the generic client. The generic client is included to show that the revised interfaces are standards based, so any client can interact with the WPS and therefore the workflow engine. The use cases involve receiving and parsing a BPMN document to orchestrate the processes. This is not a simple command, as it involves treating the workflow web application as a Git repository. Doing so enables dependencies to be incorporated into a workflow command (through Maven), but also requires aspects such as authentication.
The sequence of executions is much like that seen in Testbed-13. The sequence diagram for the implementation is found in Figure 10. The system receives the BPMN diagram and executes the workflow. Access to services is checked as the access token is simply passed in the header of a POST request. The BPMN system does not perform any token authentication itself; it simply passes the token into the execute document for the WPS services to initiate a session. If the user is not authenticated, then the BPMN service receives an error that is passed on to the client. After the workflow has executed, there are two possible outcomes: either the workflow executes all the way to the end, or it does not. If it does, then a result is generated and passed to the client. If it does not, then an error message is passed to the client.
jBPM is an implementation of a BPMN workflow engine and has its own methods of committing workflows to be executed, deploying containers, executing tasks and retrieving results. Figure 11 describes the jBPM-specific sequence for receiving a BPMN document, committing the document to the workflow engine and performing the execution. jBPM uses the Git protocol to create and update projects. The specific REST calls used are described in the previous section. The local machine contains a Git project that is changed according to a newly submitted BPMN document through the WPS. The changes are committed to the local Git repository and then pushed to the remote Git repository, i.e. the jBPM web application. This requires authentication, which is hard coded into the WPS process as a temporary solution. After the project is committed, old versions of the deployment have to be undeployed, which is done through a simple REST call. Then a new version is deployed and built, and the process is executed by a further REST call and assigned a task number. This enables multiple instances of a process to be run sequentially or simultaneously. It is possible to inject variables into the REST call; however, this is not required in this use case because all of the variable updates are included in the submitted BPMN document. After the process is completed, the results are made available by a further REST call, with the previously assigned task number used as the lookup.
Unlike previous Testbed initiatives on workflows, this piece of work has no grounding scenario and no services to execute as part of the CFP. Therefore, a use case for workflow execution has been created in collaboration with the sponsors and thread partners. The BPMN engine will be responsible for executing a data quality workflow using the National Geospatial-Intelligence Agency (NGA) Gait tools fronted by a WPS facade.
The workflow has been set up to include two processes:
NGA Gait Tools process:
<wps:ProcessOfferings xmlns:wps="http://www.opengis.net/wps/2.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:ows="http://www.opengis.net/ows/2.0"
    xsi:schemaLocation="http://www.opengis.net/wps/2.0 http://schemas.opengis.net/wps/2.0/wps.xsd">
  <wps:ProcessOffering processVersion="1.0.0" jobControlOptions="sync-execute async-execute" outputTransmission="value reference">
    <wps:Process>
      <ows:Title>A simple demonstrator process to execute a workflow</ows:Title>
      <ows:Abstract>Takes a BPMN document, executes the workflow and produces the result</ows:Abstract>
      <ows:Identifier>testbed14.workflows.ExecuteWorkflow</ows:Identifier>
      <wps:Input minOccurs="1" maxOccurs="1">
        <ows:Title>The BPMN Document to execute (this process only allows for workflows with known handlers)</ows:Title>
        <ows:Identifier>BPMNDocument</ows:Identifier>
        <wps:ComplexData xmlns:ns="http://www.opengis.net/wps/2.0">
          <ns:Format default="true" mimeType="text/xml"/>
        </wps:ComplexData>
      </wps:Input>
      <wps:Output>
        <ows:Title>The results from the workflow engine</ows:Title>
        <ows:Identifier>WorkflowResults</ows:Identifier>
        <wps:ComplexData xmlns:ns="http://www.opengis.net/wps/2.0">
          <ns:Format default="true" mimeType="text/xml"/>
        </wps:ComplexData>
      </wps:Output>
    </wps:Process>
  </wps:ProcessOffering>
</wps:ProcessOfferings>
A sample process that returns the data given to it. Although it performs no processing on the data, it demonstrates:
That processes can be chained.
That a workflow engine can work with both secured and unsecured processes.
That the workflow engine can work with data generated by different WPS instances.
<wps:ProcessOfferings xmlns:wps="http://www.opengis.net/wps/2.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:ows="http://www.opengis.net/ows/2.0"
    xsi:schemaLocation="http://www.opengis.net/wps/2.0 http://schemas.opengis.net/wps/2.0/wps.xsd">
  <wps:ProcessOffering processVersion="1.0.0" jobControlOptions="sync-execute async-execute" outputTransmission="value reference">
    <wps:Process>
      <ows:Title>storage.geoserver.GetWFSData</ows:Title>
      <ows:Identifier>storage.geoserver.GetWFSData</ows:Identifier>
      <wps:Input minOccurs="1" maxOccurs="1">
        <ows:Title>Input WFS Data</ows:Title>
        <ows:Identifier>inputWFSUrl</ows:Identifier>
        <wps:ComplexData xmlns:ns="http://www.opengis.net/wps/2.0">
          <ns:Format default="true" mimeType="application/x-zipped-shp"/>
          <ns:Format mimeType="text/xml; subtype=gml/3.1.1" schema="http://schemas.opengis.net/gml/3.1.1/base/feature.xsd"/>
          <ns:Format mimeType="text/xml; subtype=gml/3.1.0" schema="http://schemas.opengis.net/gml/3.1.0/base/feature.xsd"/>
          <ns:Format mimeType="application/json"/>
        </wps:ComplexData>
      </wps:Input>
      <wps:Output>
        <ows:Title>Vector Data</ows:Title>
        <ows:Identifier>vectorData</ows:Identifier>
        <wps:ComplexData xmlns:ns="http://www.opengis.net/wps/2.0">
          <ns:Format default="true" mimeType="application/x-zipped-shp"/>
          <ns:Format mimeType="text/xml; subtype=gml/3.1.1" schema="http://schemas.opengis.net/gml/3.1.1/base/gml.xsd"/>
          <ns:Format mimeType="text/xml; subtype=gml/3.1.0" schema="http://schemas.opengis.net/gml/3.1.0/base/feature.xsd"/>
          <ns:Format mimeType="application/json"/>
        </wps:ComplexData>
      </wps:Output>
    </wps:Process>
  </wps:ProcessOffering>
</wps:ProcessOfferings>
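When chaining processes, it can be useful to read a process description and list the identifiers and formats each input and output supports. The sketch below does this for a trimmed copy of the description above; the function name summarise_process is hypothetical, and the sample keeps only two of the four advertised formats for brevity.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

NS = {
    "wps": "http://www.opengis.net/wps/2.0",
    "ows": "http://www.opengis.net/ows/2.0",
}

# Trimmed copy of the GetWFSData description, for illustration only
GET_WFS_DATA = """<wps:ProcessOfferings xmlns:wps="http://www.opengis.net/wps/2.0" xmlns:ows="http://www.opengis.net/ows/2.0">
  <wps:ProcessOffering processVersion="1.0.0">
    <wps:Process>
      <ows:Identifier>storage.geoserver.GetWFSData</ows:Identifier>
      <wps:Input minOccurs="1" maxOccurs="1">
        <ows:Identifier>inputWFSUrl</ows:Identifier>
        <wps:ComplexData>
          <wps:Format default="true" mimeType="application/x-zipped-shp"/>
          <wps:Format mimeType="application/json"/>
        </wps:ComplexData>
      </wps:Input>
      <wps:Output>
        <ows:Identifier>vectorData</ows:Identifier>
        <wps:ComplexData>
          <wps:Format default="true" mimeType="application/x-zipped-shp"/>
          <wps:Format mimeType="application/json"/>
        </wps:ComplexData>
      </wps:Output>
    </wps:Process>
  </wps:ProcessOffering>
</wps:ProcessOfferings>"""


def summarise_process(offerings_xml: str) -> dict:
    """Map each input and output identifier to its advertised MIME types."""
    process = ET.fromstring(offerings_xml).find(".//wps:Process", NS)

    def collect(kind: str) -> Dict[str, List[str]]:
        items = {}
        for item in process.findall(f"wps:{kind}", NS):
            identifier = item.find("ows:Identifier", NS).text
            formats = item.findall("wps:ComplexData/wps:Format", NS)
            items[identifier] = [fmt.get("mimeType") for fmt in formats]
        return items

    return {
        "id": process.find("ows:Identifier", NS).text,
        "inputs": collect("Input"),
        "outputs": collect("Output"),
    }


summary = summarise_process(GET_WFS_DATA)
```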
The BPMN document used in Testbed-14 chains the two processes above and is what was used in the Technology Integration Experiment (TIE) testing aspects of the workflow. The BPMN document was intended to demonstrate the security token parameter being changed in order to show transactional BPMN document execution. It can be found in the annex.
jBPM has two different environments for orchestrating and executing processes: the development environment, based upon the Eclipse Integrated Development Environment (IDE), and the web application, which is a set of standalone modules running on JBOSS Wildfly 11. There are several differences between the two and they are not particularly interoperable. Both have been included in this document to show the mapping of data objects. The web application does not support data object icons, but the Eclipse IDE environment does. From a BPMN perspective, both methods of mapping variables produce valid BPMN (both conceptually and via a validator), but the formal mapping of objects is preferred.