
Version 8 (modified by astephen, 9 years ago)


CEDA WPS - Overview


Introduction

This page provides an overview of what the CEDA Web Processing Service (CEDA-WPS) is, its various components and how it can be used.

The WPS Standard

What is the WPS Spec?

WPS is an Open Geospatial Consortium (OGC) specification. As with many OGC specifications, the main focus is on building a web application that responds to a standardised interface, in order to aid interoperability.

At its simplest level, the WPS spec provides a generic framework for deploying an arbitrary process in an OGC-compliant manner. If you have an OGC stack (such as COWS) then the WPS can be built on top of it, hooking into the existing deployment/testing environment and making use of features such as OGC-style exceptions.

Some key features of the WPS spec are:

  • encode requests/responses for/from process execution
  • embed data and metadata in process execution inputs/outputs
  • reference web-accessible data inputs/outputs
  • support long-running (asynchronous) processes
  • return process status information
  • return processing errors
  • request storage of process outputs
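As an illustration of the status and error reporting features listed above, here is a minimal Python sketch that reads the Status element from a WPS 1.0.0 ExecuteResponse document. The sample XML is hand-written for this page (the statusLocation URL and the message are made up), and the read_status helper is hypothetical rather than part of the CEDA-WPS code.

```python
import xml.etree.ElementTree as ET

WPS_NS = "http://www.opengis.net/wps/1.0.0"

# Hand-written sample, trimmed to the parts used here (URL is hypothetical).
SAMPLE_RESPONSE = """
<wps:ExecuteResponse xmlns:wps="http://www.opengis.net/wps/1.0.0"
    service="WPS" version="1.0.0"
    statusLocation="http://wps.example.org/status/job-42.xml">
  <wps:Status creationTime="2010-09-01T12:00:00Z">
    <wps:ProcessStarted percentCompleted="40">Subsetting data</wps:ProcessStarted>
  </wps:Status>
</wps:ExecuteResponse>
"""


def read_status(xml_text):
    """Return the job state, message, percentage and status URL as a dict."""
    root = ET.fromstring(xml_text.strip())
    info = {
        "state": None,
        "message": "",
        "percent": None,
        "status_location": root.get("statusLocation"),
    }
    status = root.find("{%s}Status" % WPS_NS)
    if status is None or len(status) == 0:
        return info
    # The single child of Status names the state, e.g. ProcessAccepted,
    # ProcessStarted, ProcessSucceeded or ProcessFailed.
    state_el = status[0]
    info["state"] = state_el.tag.split("}", 1)[1]
    info["message"] = (state_el.text or "").strip()
    percent = state_el.get("percentCompleted")
    if percent is not None:
        info["percent"] = int(percent)
    return info
```

A client monitoring an asynchronous job would fetch the document at statusLocation repeatedly and call a function like this until the state is ProcessSucceeded or ProcessFailed.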

The spec consists of three main operations:

  • GetCapabilities - tells you what processes are available
  • DescribeProcess - tells you about the inputs, outputs and metadata relating to a given process
  • Execute - runs a given process
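The three operations can be called with simple key-value-pair GET requests. The Python sketch below builds such URLs following the WPS 1.0.0 conventions. The endpoint address and the input names (Dataset, Year) are assumptions made for illustration, not the real CEDA-WPS address or inputs.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration.
ENDPOINT = "http://wps.example.org/wps"


def get_capabilities_url(endpoint):
    """URL asking the server which processes it offers."""
    return endpoint + "?" + urlencode({"service": "WPS", "request": "GetCapabilities"})


def describe_process_url(endpoint, identifier):
    """URL asking for the inputs and outputs of one process."""
    params = {
        "service": "WPS",
        "version": "1.0.0",
        "request": "DescribeProcess",
        "identifier": identifier,
    }
    return endpoint + "?" + urlencode(params)


def execute_url(endpoint, identifier, inputs):
    """URL that runs a process; inputs are joined as name=value;name=value."""
    data_inputs = ";".join("%s=%s" % (name, value) for name, value in inputs.items())
    params = {
        "service": "WPS",
        "version": "1.0.0",
        "request": "Execute",
        "identifier": identifier,
        "DataInputs": data_inputs,
    }
    # urlencode percent-encodes the ';' and '=' inside the DataInputs value.
    return endpoint + "?" + urlencode(params)
```

For example, execute_url(ENDPOINT, "SubsetData", {"Dataset": "demo", "Year": 2010}) produces a single GET URL with the two inputs packed into the DataInputs parameter.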

Uptake and compliance issues

At the time of writing it is fair to say that WPS has not been taken up by the community as widely as the WMS, WCS and WFS specs. This is understandable given its more complex and less focussed nature. It is probable that WPS will fit into "service chains" at some point in the future.

A working group (WPS-2.0.SWG) is continuing to refine the WPS specification, including adding some features already present in the CEDA WPS, for instance cancelling jobs. Although the future of WPS as a widely used standard within OGC is uncertain, the use of asynchronous processing services is not.

Within OGC there are several service specifications competing to provide this functionality. There has recently been a debate within WPS-2.0.SWG about harmonising WPS, WfCS (Workflow Chaining Service) and SPS (Sensor Planning Service). As always with the OGC, there is a lot of innovation going on, so tracking the WPS standard too closely risks wasting effort. This encourages us to think we have the right strategy: using the general features of WPS as a blueprint for a service that is really useful.

Sept 2010: At the FOSS4G 2010 conference a presentation compared interoperability between WPS server and WPS client implementations, as well as their compliance with the OGC schemas. Many implementations fell far short of "compliance".

Versions

At present we are working to the 1.0.0 specification (2007). There are no subsequent versions but much is likely to change as implementers feed back issues with this spec.

The CEDA WPS: Overview

The benefits of developing a WPS

The WPS is a relatively new spec and the opportunities for real interoperability are currently limited. However, we do:

  • have an OGC codebase (COWS) which makes deploying OGC services straightforward
  • have a requirement for a generalised web-service framework (to deploy a range of services)
  • have a requirement for a batch processing system that runs asynchronous jobs, which in turn means we...
  • have a requirement for a method of handling asynchronous requests and responses

When we say "WPS", we mean more than the WPS spec

Traditionally we have developed services for one project or another that share many common functions. However, deploying on different languages, platforms and by different developers has led to these functions being duplicated for each service. The purpose of the CEDA-WPS is to create a deployment framework for any new service (but in WPS-speak we refer to "processes" instead of "services").

By using the WPS specification to define the interface we have built a deployment framework that has common hooks for:

  • Any Web Service code - automatically providing a web service interface to any callable function
  • OGC-style exceptions - so all exceptions are wrapped and returned usefully to the client
  • Offline job scheduler - allowing large offline processes to be run
  • A User Interface that auto-generates submission forms and presents responses
  • Zipping of output files
  • Notification of job completion - e-mail to user
  • Writing to "/requests" directories (or wherever you like)
  • Estimating size and duration of a job (a dry-run)
  • Caching of outputs (per process)
  • Ability to run some quick jobs inside the current process, and to schedule larger jobs on other servers
  • Connection to the CEDA archive
  • Robust parallelised service
  • Querying of current/cached/old jobs
  • Integration with CEDA (NDG) Security
  • Scalability - due to deployment on virtual machines
  • Development environment separate from the deployment environment

We have developed a framework to deal with these issues which should allow non-developers to deploy "useful processes" with ease.
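As one illustration of the hooks listed above, per-process caching of outputs could be keyed on the process name plus a hash of its inputs. The Python sketch below is an assumption about how such a cache might look, not the CEDA-WPS implementation. The names cache_key and OutputCache are hypothetical, and the in-memory dictionary stands in for the real cache disk.

```python
import hashlib
import json


def cache_key(process_name, inputs):
    """Build a stable key: the same process and inputs always give the same key."""
    # Sorting the keys makes the result independent of argument order.
    canonical = json.dumps(inputs, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return "%s-%s" % (process_name, digest[:16])


class OutputCache:
    """In-memory stand-in for a cache of process outputs."""

    def __init__(self):
        self._store = {}

    def get_or_run(self, process_name, inputs, run):
        """Return the cached output, calling run(**inputs) only on a miss."""
        key = cache_key(process_name, inputs)
        if key not in self._store:
            self._store[key] = run(**inputs)
        return self._store[key]
```

Keeping the process name in the key means two processes that happen to take identical inputs never share a cached output.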

CEDA-WPS interoperability?

With our attention being focussed on the above list of operational requirements it is fair to say that we have not been driven by interoperability. However, we continue to work towards greater compliance with the WPS standard and to feed into the development of the standard.

The CEDA-WPS: architecture

One way to understand the CEDA-WPS architecture is to look at the deployment view. In the figure below it is easiest to work from the bottom up. Firstly, there are n instances of batch processing virtual machines (VMs), receiving instructions from the scheduler that manages communication from the WPS layer. There are many instances of the WPS layer, which run small jobs and schedule large jobs. The WPS layer also provides the database interactions required to manage jobs, as well as a secured service.

On the right are multiple instances of the WPS UI (which currently runs in the same Pylons application as the WPS - but doesn't have to) that provide browser interaction with the WPS. However, the "Process-specific UI" boxes on the top-left show that it is also possible to develop any number of user interfaces/applications/portals to act as a client to the WPS. In some cases these applications could be part of another OGC interface which uses the WPS for service-chaining.

NOTE: At present the missing piece of this diagram is the state server that holds the output cache disk and the underlying database.

The next section talks in more detail about each layer.

The WPS layer

The WPS is a Python application built on Pylons and running under the mod_wsgi Apache module. It either farms a request out to one of its worker processes, dispatching a response within 15 seconds, or schedules the job with the offline processing layer.

The UI layer

The UI web-interface layer is also deployed under Pylons and is a mixture of Python templating and JavaScript. It provides the following components:

  • view of each process
  • submission form for each process - automatically generated from the configuration file
  • handling of submission - including confirmation page for async requests
  • monitoring of async responses
  • results page - showing results of a job (XML and other views)
  • jobs page - queryable listing of all jobs
  • cancellation of jobs
  • admin interface - simple admin view of jobs page

The current UI is deployed at:

 http://ceda-wps1.badc.rl.ac.uk/ui/home

Why build a User Interface?

The WPS does a number of useful things, but interfacing with URL requests and XML responses can be difficult for the most competent of programmers, let alone users! The UI layer is currently a bespoke WPS client for our WPS. As the specification develops we will be able to migrate the UI to work with the GetCapabilities and DescribeProcess responses. At present the UI cheats - it builds its contents based on the WPS process configuration files rather than grabbing the DescribeProcess response.

The power of the UI comes in auto-generating forms for each process. This is important because it removes the need to create new web forms for every single process we want to deploy. Also, the UI includes validation and type-checking of inputs for:

  • strings
  • integers
  • floats
  • regular expression matches
  • bounding box inputs
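The checks listed above can be sketched in Python as a single validation function. This is an illustrative sketch, not the UI's actual code: the validate_input name, the type labels and the "west,south,east,north" bounding box order are all assumptions.

```python
import re


def validate_input(kind, raw, pattern=None):
    """Convert a raw form value to its declared type, or raise ValueError."""
    if kind == "string":
        return raw
    if kind == "integer":
        return int(raw)  # int() raises ValueError on bad input
    if kind == "float":
        return float(raw)
    if kind == "regex":
        if pattern is None or re.fullmatch(pattern, raw) is None:
            raise ValueError("%r does not match %r" % (raw, pattern))
        return raw
    if kind == "bbox":
        # Assumed order: west, south, east, north (comma separated).
        parts = [float(part) for part in raw.split(",")]
        if len(parts) != 4:
            raise ValueError("a bounding box needs four numbers")
        west, south, east, north = parts
        if south > north:
            raise ValueError("south must not exceed north")
        return (west, south, east, north)
    raise ValueError("unknown input type: %r" % kind)
```

Raising ValueError for every failure lets the form handler catch one exception type and show the message next to the offending field.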

The UI also provides a jobs page that is really just a presentation of the underlying database tables. Users can interrogate old jobs, cancel current jobs and re-extract previous outputs.

Offline processing layer

Any processes that take a significant amount of time/resource to run are labelled in their configuration files as asynchronous. This layer is separated out because it has no web-based services. Its role is to run large jobs that will take between minutes and hours to run. It is managed by Sun Grid Engine (SGE) which schedules jobs from the WPS layer. The offline processing layer limits access on a one job-per-user basis.
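The routing described above can be sketched in Python as follows. The real scheduling is done by Sun Grid Engine. The OfflineQueue class here is only an in-memory stand-in, and the "async" and "name" configuration keys are assumed names, not the actual configuration format.

```python
class OfflineQueue:
    """In-memory stand-in for the scheduler, enforcing one job per user."""

    def __init__(self):
        self._active = {}  # user -> job id of that user's running job
        self._next_id = 1

    def submit(self, user, process_name):
        """Queue a job, refusing if the user already has one active."""
        if user in self._active:
            raise RuntimeError("%s already has an active job" % user)
        job_id = "%s-%d" % (process_name, self._next_id)
        self._next_id += 1
        self._active[user] = job_id
        return job_id

    def finish(self, user):
        """Mark the user's job as complete so they can submit another."""
        self._active.pop(user, None)


def dispatch(process_config, user, run_sync, queue):
    """Run quick processes inline; hand asynchronous ones to the queue."""
    if process_config.get("async"):
        return ("accepted", queue.submit(user, process_config["name"]))
    return ("done", run_sync())
```

The caller gets back either the finished result or a job identifier that can be used later to monitor the job.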

Managing state: databases and common disk

The WPS and UI layers require interaction with the WPS-db in order to manage jobs and requests. The PostgreSQL database is currently running on bora.badc.rl.ac.uk and is accessed from all instances of the WPS and UI.

Outputs are currently written to a cache disk but we will move over to using the "/requests" area soon. This disk is mounted across all WPS and offline processing layers so that every layer can access the outputs.

WPS "Processes"

The concept of "processes"

A "WPS process" translates to a piece of callable code that returns something. In the past we might have developed a "service" to do this; with the WPS we can think of "adding the process" to the CEDA-WPS.

Each process is defined within the WPS layer with two files:

  • a process configuration file
  • a python module

These files can be generated from templates using a simple script. The example below shows how this can be done.

Adding a new process: example

The following commands need to be run to add a simple process:

$ ./create_process.sh just_a_demo JustADemo
Wrote new process module: process_modules/just_a_demo.py
Wrote new process config: process_configs/JustADemo.ini

# Re-start the WSGI app (to pick up the changes)
$ touch /usr/local/apache2/wsgi_scripts/cows_wps.wsgi

See the Just A Demo walk-through page for how the UI would present this.
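To give a feel for how a submission form can be derived from a process configuration file, the Python sketch below parses a made-up INI layout and lists the fields a form would need. The section and option names ([process], [input:...], type, label) are invented for illustration and are not the real CEDA-WPS configuration format.

```python
import configparser

# Hypothetical configuration layout, invented for this example.
EXAMPLE_CONFIG = """
[process]
name = JustADemo
async = false

[input:Name]
type = string
label = Your name

[input:Repeat]
type = integer
"""


def form_fields(config_text):
    """Return one dict per declared input, in the order they appear."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    fields = []
    for section in parser.sections():
        if not section.startswith("input:"):
            continue
        name = section.split(":", 1)[1]
        fields.append({
            "name": name,
            "type": parser.get(section, "type"),
            # Fall back to the input name when no label is given.
            "label": parser.get(section, "label", fallback=name),
        })
    return fields
```

A template could then loop over the returned list to render one form control per input, which is what removes the need to hand-write a form for each process.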

The CEDA-WPS: Deployment issues

How do we deploy the WPS?

Our WPS is deployed on VMs with NDG-security provided as a middleware layer that processes each URL to decide if access should be secured.

Hardware

At present we have two VMs running on kona.badc.rl.ac.uk:

  • ceda-wps1.badc.rl.ac.uk - which runs the WPS and UI layers in one Pylons app
  • ceda-batch1.badc.rl.ac.uk - which runs the offline processing layer

We are currently developing the second set of VMs on hurricane.badc.rl.ac.uk.

Re-starting

The WSGI app is re-started simply by touching the WSGI file:

$ touch /usr/local/apache2/wsgi_scripts/cows_wps.wsgi

Integration with NDG security

The WPS is secured by NDG security as documented on the securing WPS page.

The policy.xml file allows per-process access to files.

Access to outputs also needs to be restricted. This is done by ...TBD

Giving processes access to the archive

In general our processes require read access to the CEDA archives. The "cwps" user has been created and is currently used to run the WPS (and entire Apache service). This user has read access to everything under "/badc".

The test environment

We are currently working out how best to deploy the test/development environment to avoid degradation of the operational service. A major issue is that the entire environment needs to be set up the same as the deployment environment to allow processes to connect to datasets, security to work, etc.

Deployed uses of the WPS

UKCP09

The UK Climate Projections (UKCP09) User Interface sits in front of a WPS which is employed to serve up the following processes:

  • generation of plots (line graphs, maps, contour plots, box-and-whisker plots) - synchronously
  • generation of data products (CSV and NetCDF outputs) - sync and async
  • scheduling large Weather Generator model runs - async

It also informs the UI about:

  • previous and current jobs related to each user

MashMyData

In the NERC-funded??? MashMyData project we are collaborating with the Reading eScience Centre??? to embed Java processes under the CEDA-WPS that will be called from the externally hosted MashMyData processing engine.

The future

  1. How to define dynamic possible values for WPS inputs? E.g. the possible values for the "County" parameter are found at: http://uk-county.co.uk/services?GetCountyList=England::xml://Response/Results/Counties
  2. How to define dynamic parameter selection for WPS inputs? E.g. the possible parameters that need selecting for the "SubsetData" process are provided by: http://data.data.org.uk/serv/data?dataset=${Dataset}::xml://Response/Results/Domain/.*
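One possible reading of the notation in these examples is "URL template, then "::", then a selector into the XML response". Under that assumption, which is ours and not a settled design, a Python sketch for resolving such a definition might look like this. The resolve_value_source name and the "ukcp09" dataset value are hypothetical.

```python
from string import Template


def resolve_value_source(spec, selections):
    """Split 'url::selector' and fill ${Name} placeholders in the URL part."""
    url_template, _, selector = spec.partition("::")
    # substitute() raises KeyError if a placeholder has no selected value yet.
    url = Template(url_template).substitute(selections)
    return url, selector


SPEC = "http://data.data.org.uk/serv/data?dataset=${Dataset}::xml://Response/Results/Domain/.*"
url, selector = resolve_value_source(SPEC, {"Dataset": "ukcp09"})
```

The KeyError on a missing placeholder is useful here: it signals that an earlier input (such as the dataset) must be chosen before the dependent values can be fetched.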

Technologies

The CEDA-WPS runs using the following technologies:

  Pylons
  WebHelpers==0.6.4
  PasteScript==1.7.3
  WebOb
  cows
  cdat_lite==5.0-0.2.9pre2
  csml
  SQLAlchemy>=0.5.0rc1
  geoplot
  cdms-utils
  nappy
  nose
  TileCache==2.01dims-0.3
  genshi
  lxml
  vncctrl
  multiprocessing
  cElementTree
  cows_wps
  matplotlib
  basemap
  psycopg2
  ndg_security_server

And we use Sun Grid Engine for scheduling.
