Changes between Version 6 and Version 7 of CEDAWPS/Overview

16/09/10 10:58:20 (10 years ago)



  • CEDAWPS/Overview

    v6 v7  
    77== Introduction == 
    9 This page provides an overview of what the CEDA Web Processing Service is, its various components and how it can be used. 
    11 == What is the WPS Spec? == 
    13 == Why should we bother with WPS? == 
    15 == Is our WPS standards-compliant? == 
    17 == When we say "WPS" we mean more than a "WPS" == 
    19 === "Our" WPS architecture === 
     9This page provides an overview of what the CEDA Web Processing Service (CEDA-WPS) is, its various components and how it can be used. 
     11== The WPS Standard == 
     13=== What is the WPS Spec? === 
     15WPS is an Open Geospatial Consortium (OGC) specification. As with many OGC specifications the main focus is on building a web-application to respond to a standardised interface in order to aid interoperability. 
     17At its simplest level, the WPS spec provides a ''generic'' framework for deploying ''any old process'' in an OGC-compliant manner. If you have an OGC stack (such as COWS) then the WPS can be built on this, hooking into the existing deployment/testing environment and making use of such features as OGC-style exceptions. 
     19Some key features of the WPS spec are: 
     21 * encode requests/responses for/from process execution 
     22 * embed data and metadata in process execution inputs/outputs 
     23 * reference web-accessible data inputs/outputs 
     24 * support long-running (asynchronous) processes 
     25 * return process status information 
     26 * return processing errors 
     27 * request storage of process outputs 
     29'''The spec consists of 3 main methods:''' 
     31 * GetCapabilities - tells you what processes are available 
     32 * DescribeProcess - tells you about the inputs, outputs and metadata relating to a given process 
     33 * Execute - runs a given process 
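As a rough illustration of the three methods, the sketch below builds WPS 1.0.0 key-value-pair GET URLs. The endpoint {{{http://example.org/wps}}} and the helper name {{{build_wps_url}}} are assumptions made for this example; they are not part of the CEDA-WPS.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration.
WPS_ENDPOINT = "http://example.org/wps"


def build_wps_url(request, identifier=None, data_inputs=None):
    """Build a WPS 1.0.0 key-value-pair (KVP) GET URL for one of the three methods."""
    params = [("service", "WPS"), ("request", request)]
    if request != "GetCapabilities":
        # DescribeProcess and Execute need a version and a process identifier.
        if identifier is None:
            raise ValueError(f"{request} requires a process identifier")
        params.append(("version", "1.0.0"))
        params.append(("identifier", identifier))
    if data_inputs:
        # Execute inputs are passed as semicolon-separated key=value pairs.
        encoded = ";".join(f"{key}={value}" for key, value in data_inputs.items())
        params.append(("datainputs", encoded))
    return WPS_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_wps_url("GetCapabilities"))
    print(build_wps_url("DescribeProcess", "JustADemo"))
    print(build_wps_url("Execute", "JustADemo", {"Name": "test"}))
```

The semicolons and equals signs inside {{{datainputs}}} are percent-encoded by {{{urlencode}}}; a server decodes them before splitting the inputs.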
     35=== Uptake and compliance issues === 
     37At the time of writing it is fair to say that WPS has not been taken up in the community like the WMS, WCS and WFS specs. However, this is understandable due to its more complex and less focussed nature. It is probable that WPS fits into "service chains" somewhere in the future. 
     39A working group (WPS-2.0.SWG) is continuing to refine the WPS specification, including adding some features already present in the CEDA WPS, for instance cancelling jobs. Although the future of WPS as a widely used standard within OGC is uncertain, the use of asynchronous processing services is not.
     41Within OGC there are several service specifications competing to provide this functionality.  There has recently been a debate within WPS-2.0.SWG about harmonising WPS, WfCS (Workflow chaining service) and SPS (sensor planning service).  As always with the OGC there is a lot of innovation going on so tracking the WPS standard too closely risks wasting effort.  This encourages us to think we have the right strategy of using the general features of WPS as a blueprint for a service that is really useful. 
     43> Sept 2010: At the FOSS4G 2010 conference there was a presentation that compared interoperability between WPS server and WPS client implementations, and also their compliance with OGC schemas. Many implementations fell far short of "compliance".
     45=== Versions === 
     47At present we are working to the 1.0.0 specification (2007). There are no subsequent versions but much is likely to change as implementers feed back issues with this spec. 
     49== The CEDA WPS: Overview == 
     51=== The benefits of developing a WPS === 
     53The WPS is a relatively new spec and the opportunities of real interoperability are currently limited. However, we do: 
     55 * have an OGC codebase (COWS) which makes deploying OGC services straightforward 
     56 * have a requirement for a generalised web-service framework (to deploy a range of services) 
     57 * have a requirement for a ''batch processing'' system that runs asynchronous jobs, which in turn means we... 
     58 * have a requirement for a method of handling asynchronous requests and responses 
     60=== When we say "WPS", we mean more than the WPS spec === 
     62Traditionally we have developed services for one project or another that share many common functions. However, deploying on different languages, platforms and by different developers has led to these functions being duplicated for each service. The purpose of the CEDA-WPS is to create a '''deployment framework for any new service''' (but in WPS-speak we refer to "processes" instead of "services"). 
     64By using the WPS specification to define the interface we have built a deployment framework that has common hooks for: 
     66 * Any Web Service code - automatically providing a web service interface to any callable function  
     67 * OGC-style exceptions - so all exceptions are wrapped and returned usefully to the client 
     68 * Offline job scheduler - allowing large offline processes to be run 
     69 * A User Interface that auto-generates submission forms and presents responses 
     70 * Zipping of output files 
     71 * Notification of job completion - e-mail to user 
     72 * Writing to "/requests" directories (or wherever you like) 
     73 * Estimating size and duration of a job (a dry-run) 
     74 * Caching of outputs (per process) 
     75 * Ability to run some ''quick'' jobs inside the current process, and to schedule larger jobs on other servers 
     76 * Connection to the CEDA archive 
     77 * Robust parallelised service 
     78 * Querying of current/cached/old jobs 
     79 * Integration with CEDA (NDG) Security 
     80 * Scalability - due to deployment on virtual machines 
     81 * Development environment separate from the deployment environment 
     83We have developed a framework to deal with these issues which should allow non-developers to deploy "useful processes" with ease. 
     85=== CEDA-WPS interoperability? === 
     87With our attention being focussed on the above list of ''operational requirements'' it is fair to say that we have not been driven by interoperability. However, we continue to work towards greater compliance with the WPS standard and to feed into the development of the standard. 
     89== The CEDA-WPS: architecture == 
     91One way to understand the CEDA-WPS architecture is through its deployment view. In the figure below it is easiest to work from the bottom up. Firstly, there are ''n'' instances of batch processing virtual machines (VMs), receiving instructions from the scheduler that manages communication from the WPS layer. There are many instances of the WPS layer, which run small jobs and schedule large jobs. The WPS layer also provides the database interactions required to manage jobs, as well as a secured service. 
     93On the right are multiple instances of the WPS UI (which currently runs in the same Pylons application as the WPS - but doesn't have to) that provide browser interaction with the WPS. However, the "Process-specific UI" boxes on the top-left show that it is also possible to develop any number of user interfaces/applications/portals to act as a client to the WPS. In some cases these applications could be part of another OGC interface which uses the WPS for service-chaining. 
     95> NOTE: At present the missing piece of this diagram is the ''state server'' that holds the output cache disk and the underlying database.  
    2197[[Image(ceda_wps_architecture.png, border=1)]] 
     99The next section talks in more detail about each layer. 
    23101== The WPS layer == 
    25 == The "generic" User Interface layer == 
    27 == The batch processing layer == 
    29 == The concept of "processes" == 
    31 === Adding a new process === 
     103The WPS is a Python application using Pylons running inside of the mod_wsgi Apache module. The WPS will either farm a request out to one of its multi-processes, dispatching a response within 15 seconds, or it will schedule the job with the offline processing layer.  
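The choice between answering directly and scheduling offline can be sketched as below. This is a minimal illustration, not the CEDA-WPS code: the names {{{dispatch}}}, {{{is_async}}} and {{{schedule_job}}} are hypothetical, and a thread pool stands in for the worker processes.

```python
from concurrent.futures import ThreadPoolExecutor

SYNC_TIMEOUT_SECONDS = 15  # response window quoted in the text above


def dispatch(process_func, inputs, is_async, schedule_job, pool):
    """Run a quick job in the pool, or hand a large job to the offline scheduler.

    `schedule_job` is a hypothetical callable that queues the job and returns a job id.
    """
    if is_async:
        job_id = schedule_job(process_func, inputs)
        return {"status": "accepted", "job_id": job_id}
    future = pool.submit(process_func, **inputs)
    # Raises concurrent.futures.TimeoutError if the job overruns the window.
    result = future.result(timeout=SYNC_TIMEOUT_SECONDS)
    return {"status": "succeeded", "result": result}


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=2) as workers:
        print(dispatch(lambda x, y: x + y, {"x": 1, "y": 2}, False, None, workers))
        print(dispatch(lambda x: x, {"x": 1}, True, lambda func, args: "job-1", workers))
```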
     105== The UI layer == 
     107The UI web-interface layer is also deployed under Pylons and is a mixture of python templating and JavaScript. It provides the following components: 
     109 * view of each process 
     110 * submission form for each process - automatically generated from the configuration file 
     111 * handling of submission - including confirmation page for async requests 
     112 * monitoring of async responses 
     113 * results page - showing results of a job (XML and other views) 
     114 * jobs page - queryable listing of all jobs 
     115 * cancellation of jobs 
     116 * admin interface - simple admin view of jobs page 
     118The current UI is deployed at: 
     122'''Why build a User Interface?''' 
     124The WPS does a number of useful things, but interfacing with URL requests and XML responses can be difficult for the most competent of programmers, let alone users! The UI layer is currently a bespoke WPS client for our WPS. As the specification develops we will be able to migrate the UI to work with !GetCapabilities and !DescribeProcess responses. At present the UI ''cheats'' - it builds its contents based on the WPS process configuration files rather than grabbing the !DescribeProcess response. 
     126The power of the UI comes in auto-generating forms for each process. This is important because it removes the need to create new web forms for every single process we want to deploy. The UI also includes validation and type-checking of inputs for: 
     128 * strings 
     129 * integers 
     130 * floats 
     131 * regular expression matches 
     132 * bounding box inputs 
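The kind of checks listed above might look like the following sketch. It is an assumption-laden illustration rather than the UI's actual code: the function name {{{validate_input}}}, the {{{kind}}} labels and the comma-separated {{{west,south,east,north}}} bounding box format are all hypothetical.

```python
import re


def validate_input(value, kind, pattern=None):
    """Return the converted value, or raise ValueError if it is not valid."""
    if kind == "string":
        return str(value)
    if kind == "integer":
        return int(value)  # int() raises ValueError for text such as "3.5"
    if kind == "float":
        return float(value)
    if kind == "regex":
        if pattern is None or re.fullmatch(pattern, value) is None:
            raise ValueError(f"{value!r} does not match {pattern!r}")
        return value
    if kind == "bbox":
        # Assumed format: "west,south,east,north" in decimal degrees.
        parts = [float(part) for part in value.split(",")]
        if len(parts) != 4:
            raise ValueError("a bounding box needs four comma-separated numbers")
        west, south, east, north = parts
        if south > north:
            raise ValueError("south must not exceed north")
        return (west, south, east, north)
    raise ValueError(f"unknown input kind: {kind!r}")
```

For instance, {{{validate_input("-10,40,5,60", "bbox")}}} returns a tuple of four floats, while {{{validate_input("abc", "integer")}}} raises {{{ValueError}}}.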
     134The UI also provides a jobs page that is really just a presentation of the underlying database tables. Users can interrogate old jobs, cancel current jobs and re-extract previous outputs. 
     136=== Offline processing layer === 
     138Any processes that take a significant amount of time/resource to run are labelled in their configuration files as asynchronous. This layer is separated out because it has no web-based services. Its role is to run large jobs that will take between minutes and hours to run. It is managed by Sun Grid Engine (SGE) which schedules jobs from the WPS layer. The offline processing layer limits access on a one job-per-user basis. 
     140=== Managing state: databases and common disk === 
     142The WPS and UI layers require interaction with the WPS-db in order to manage jobs and requests. The postgres db is currently running on {{{}}} and is accessed from all instances of the WPS and UI. 
     144Outputs are currently written to a cache disk but we will move over to using the "/requests" area soon. The output area is mounted across all WPS and offline processing layers so that every layer can access the outputs. 
     146== WPS "Processes" == 
     148=== The concept of "processes" === 
     150A "WPS process" translates to a piece of callable code that returns something. In the past we might have developed a "service" to do this; with the WPS we can think of "adding the process" to the CEDA-WPS. 
     152Each process is defined within the WPS layer with two files: 
     154 * a process configuration file 
     155 * a python module 
     157These files can be generated from templates using a simple script. The example below shows how this can be done. 
     159=== Adding a new process: example === 
     161The following commands need to be run to add a simple process: 
    34 cwps@ceda-wps1:/usr/local/cwps/cows_wps> ./ just_a_demo JustADemo 
     164$ ./ just_a_demo JustADemo 
    35165Wrote new process module: process_modules/ 
    36166Wrote new process config: process_configs/JustADemo.ini 
    38168# Re-start the WSGI app (to pick up the changes) 
    39 cwps@ceda-wps1:/usr/local/cwps/cows_wps> touch /usr/local/apache2/wsgi_scripts/cows_wps.wsgi 
     169$ touch /usr/local/apache2/wsgi_scripts/cows_wps.wsgi 
    42172See the [wiki:CEDAWPS/Overview/JustADemo Just A Demo walk-through page] for how the UI would present this. 
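To make the pairing of a configuration file and a Python callable more concrete, here is a purely hypothetical sketch. The section name {{{wps_interface}}}, the keys {{{process_name}}} and {{{is_async}}}, and the registry {{{PROCESS_CALLABLES}}} are invented for illustration; the real layout of {{{JustADemo.ini}}} and its module is not shown on this page.

```python
import configparser

# Hypothetical config content; the real JustADemo.ini layout is not shown here.
EXAMPLE_CONFIG = """
[wps_interface]
process_name = JustADemo
is_async = false
"""


def just_a_demo(name):
    """A callable that returns something: the core of a process."""
    return f"Hello, {name}"


# Hypothetical registry mapping process names to their callables.
PROCESS_CALLABLES = {"JustADemo": just_a_demo}


def load_process(config_text):
    """Pair the settings from an INI config with the matching callable."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    section = parser["wps_interface"]
    name = section["process_name"]
    return {
        "name": name,
        "is_async": section.getboolean("is_async"),
        "callable": PROCESS_CALLABLES[name],
    }
```

Keeping the settings in a config file and the logic in a module is what lets a generic UI build a submission form without importing any process code.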
    44 == Integrating with NDG security == 
     174== The CEDA-WPS: Deployment issues == 
     176=== How do we deploy the WPS? === 
     178Our WPS is deployed on VMs with NDG-security provided as a middleware layer that processes each URL to decide if access should be secured.  
     182At present we have two VMs running on {{{}}}, these are: 
     184 * {{{}}} - which runs the WPS and UI layers in one Pylons app 
     185 * {{{}}} - which runs the offline processing layer 
     187We are currently developing the second set of VMs on {{{}}}. 
     191The WSGI app is re-started simply by touching the WSGI file: 
     194$ touch /usr/local/apache2/wsgi_scripts/cows_wps.wsgi 
     197=== Integration with NDG security === 
     199The WPS is secured by NDG security as documented on the [wiki:???? securing WPS page]. 
     201The {{{policy.xml}}} file allows per-process access to files. 
     203Access to outputs also needs to be restricted. This is done by ...TBD 
    46205=== Giving processes access to the archive === 
    48 === The policy.xml file === 
    50 === Securing outputs === 
    52 == Scalability == 
    54 == The test environment == 
     207In general our processes require read access to the CEDA archives. The "cwps" user has been created and is currently used to run the WPS (and entire Apache service). This user has read access to everything under "/badc".  
     209=== The test environment === 
     211We are currently working out how best to deploy the test/development environment to avoid degradation of the operational service. A major issue is that the entire environment needs to be set up the same as the deployment environment to allow processes to connect to datasets, security to work, etc. 
     213=== Deployed uses of the WPS === 
     217The UK Climate Projections (UKCP09) User Interface sits in front of a WPS which is employed to serve up the following processes: 
     219 * generation of plots (line graphs, maps, contour plots, box-and-whisker plots) - synchronously 
     220 * generation of data products (CSV and NetCDF outputs) - sync and async 
     221 * scheduling large Weather Generator model runs - async 
     223It also informs the UI about: 
     225 * previous and current jobs related to each user 
     229In the NERC-funded??? !MashMyData project we are collaborating with the Reading eScience Centre??? to embed Java processes under the CEDA-WPS that will be called from the externally hosted !MashMyData processing engine. 
     231== The future == 
     233 1. How to define ''dynamic possible values'' for WPS inputs? E.g. the possible values for the "County" parameter are found at: {{{}}} 
     235 2. How to define ''dynamic parameter selection'' for WPS inputs? E.g. the possible parameters that need selecting for the "SubsetData" process are provided by: {{{${Dataset}::xml://Response/Results/Domain/.*}}}