Changes between Version 6 and Version 7 of CEDAWPS/Overview


Timestamp:
16/09/10 10:58:20 (9 years ago)
Author:
astephen
Comment:

--

  • CEDAWPS/Overview

    v6 v7  
    77== Introduction == 
    88 
    9 This page provides an overview of what the CEDA Web Processing Service is, its various components and how it can be used. 
    10  
    11 == What is the WPS Spec? == 
    12  
    13 == Why should we bother with WPS? == 
    14  
    15 == Is our WPS standards-compliant? == 
    16  
    17 == When we say "WPS" we mean more than a "WPS" == 
    18  
    19 === "Our" WPS architecture === 
     9This page provides an overview of what the CEDA Web Processing Service (CEDA-WPS) is, its various components and how it can be used. 
     10 
     11== The WPS Standard == 
     12 
     13=== What is the WPS Spec? === 
     14 
     15WPS is an Open Geospatial Consortium (OGC) specification. As with many OGC specifications the main focus is on building a web-application to respond to a standardised interface in order to aid interoperability. 
     16 
     17At its simplest level, the WPS spec provides a ''generic'' framework for deploying ''any old process'' in an OGC-compliant manner. If you have an OGC stack (such as COWS) then the WPS can be built on this, hooking into the existing deployment/testing environment and making use of such features as OGC-style exceptions. 
     18 
     19Some key features of the WPS spec are: 
     20 
     21 * encode requests/responses for/from process execution 
     22 * embed data and metadata in process execution inputs/outputs 
     23 * reference web-accessible data inputs/outputs 
     24 * support long-running (asynchronous) processes 
     25 * return process status information 
     26 * return processing errors 
     27 * request storage of process outputs 
     28 
     29'''The spec consists of 3 main methods:''' 
     30 
     31 * GetCapabilities - tells you what processes are available 
     32 * DescribeProcess - tells you about the inputs, outputs and metadata relating to a given process 
     33 * Execute - runs a given process 
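As an illustration, the three methods can be issued as key-value-pair GET requests. The sketch below only builds the request URLs. The endpoint {{{http://example.org/wps}}}, the helper name {{{wps_url}}} and the example inputs are placeholders for illustration, not part of the CEDA-WPS.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only.
WPS_ENDPOINT = "http://example.org/wps"


def wps_url(request, identifier=None, inputs=None, endpoint=WPS_ENDPOINT):
    """Build a WPS 1.0.0 key-value-pair (GET) request URL."""
    params = [("service", "WPS"), ("request", request)]
    if request != "GetCapabilities":
        # DescribeProcess and Execute carry an explicit version.
        params.append(("version", "1.0.0"))
    if identifier is not None:
        params.append(("identifier", identifier))
    if inputs:
        # DataInputs are "name=value" pairs separated by ";".
        data_inputs = ";".join(f"{k}={v}" for k, v in inputs.items())
        params.append(("datainputs", data_inputs))
    # urlencode percent-encodes the "=" and ";" inside the datainputs value.
    return endpoint + "?" + urlencode(params)


capabilities = wps_url("GetCapabilities")
describe = wps_url("DescribeProcess", "JustADemo")
execute = wps_url("Execute", "JustADemo", {"a": 1})
```

A client would fetch each URL in turn: first to list the processes, then to read the inputs of one process, then to run it.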
     34 
     35=== Uptake and compliance issues === 
     36 
     37At the time of writing it is fair to say that WPS has not been taken up in the community like the WMS, WCS and WFS specs. However, this is understandable due to its more complex and less focussed nature. It is probable that WPS fits into "service chains" somewhere in the future. 
     38 
     39A working group (WPS-2.0.SWG) is continuing to refine the WPS specification, including adding some features already present in the CEDA WPS, for instance cancelling jobs.  Although the future of WPS as a widely used standard within OGC is uncertain, the use of asynchronous processing services is not.   
     40 
     41Within OGC there are several service specifications competing to provide this functionality.  There has recently been a debate within WPS-2.0.SWG about harmonising WPS, WfCS (Workflow chaining service) and SPS (sensor planning service).  As always with the OGC there is a lot of innovation going on so tracking the WPS standard too closely risks wasting effort.  This encourages us to think we have the right strategy of using the general features of WPS as a blueprint for a service that is really useful. 
     42 
     43> Sept 2010: At the FOSS4G 2010 conference a presentation compared interoperability between WPS server and WPS client implementations, and also their compliance with the OGC schemas. Many implementations fell far short of "compliance".  
     44 
     45=== Versions === 
     46 
     47At present we are working to the 1.0.0 specification (2007). There are no subsequent versions but much is likely to change as implementers feed back issues with this spec. 
     48 
     49== The CEDA WPS: Overview == 
     50 
     51=== The benefits of developing a WPS === 
     52 
     53The WPS is a relatively new spec and the opportunities of real interoperability are currently limited. However, we do: 
     54 
     55 * have an OGC codebase (COWS) which makes deploying OGC services straightforward 
     56 * have a requirement for a generalised web-service framework (to deploy a range of services) 
     57 * have a requirement for a ''batch processing'' system that runs asynchronous jobs, which in turn means we... 
     58 * have a requirement for a method of handling asynchronous requests and responses 
     59 
     60=== When we say "WPS", we mean more than the WPS spec === 
     61 
     62Traditionally we have developed services for one project or another that share many common functions. However, deploying on different languages, platforms and by different developers has led to these functions being duplicated for each service. The purpose of the CEDA-WPS is to create a '''deployment framework for any new service''' (but in WPS-speak we refer to "processes" instead of "services"). 
     63 
     64By using the WPS specification to define the interface we have built a deployment framework that has common hooks for: 
     65 
     66 * Any Web Service code - automatically providing a web service interface to any callable function  
     67 * OGC-style exceptions - so all exceptions are wrapped and returned usefully to the client 
     68 * Offline job scheduler - allowing large offline processes to be run 
     69 * A User Interface that auto-generates submission forms and presents responses 
     70 * Zipping of output files 
     71 * Notification of job completion - e-mail to user 
     72 * Writing to "/requests" directories (or wherever you like) 
     73 * Estimating size and duration of a job (a dry-run) 
     74 * Caching of outputs (per process) 
     75 * Ability to run some ''quick'' jobs inside the current process, and to schedule larger jobs on other servers 
     76 * Connection to the CEDA archive 
     77 * Robust parallelised service 
     78 * Querying of current/cached/old jobs 
     79 * Integration with CEDA (NDG) Security 
     80 * Scalability - due to deployment on virtual machines 
     81 * Development environment separate from the deployment environment 
     82 
     83We have developed a framework to deal with these issues which should allow non-developers to deploy "useful processes" with ease. 
     84 
     85=== CEDA-WPS interoperability? === 
     86 
     87With our attention being focussed on the above list of ''operational requirements'' it is fair to say that we have not been driven by interoperability. However, we continue to work towards greater compliance with the WPS standard and to feed into the development of the standard. 
     88 
     89== The CEDA-WPS: architecture == 
     90 
     91One view of the CEDA-WPS architecture is the deployment view. In the figure below it is easiest to work from the bottom up. Firstly, there are ''n'' instances of batch processing virtual machines (VMs), receiving instructions from the scheduler that manages communication from the WPS layer. There are many instances of the WPS layer, which run small jobs and schedule large jobs. The WPS layer also provides the database interactions required to manage jobs as well as a secured service. 
     92 
     93On the right are multiple instances of the WPS UI (which currently runs in the same Pylons application as the WPS - but doesn't have to) that provide browser interaction with the WPS. However, the "Process-specific UI" boxes on the top-left show that it is also possible to develop any number of user interfaces/applications/portals to act as a client to the WPS. In some cases these applications could be part of another OGC interface which uses the WPS for service-chaining. 
     94 
     95> NOTE: At present the missing piece of this diagram is the ''state server'' that holds the output cache disk and the underlying database.  
    2096 
    2197[[Image(ceda_wps_architecture.png, border=1)]] 
    2298 
     99The next section talks in more detail about each layer. 
     100 
    23101== The WPS layer == 
    24102 
    25 == The "generic" User Interface layer == 
    26  
    27 == The batch processing layer == 
    28  
    29 == The concept of "processes" == 
    30  
    31 === Adding a new process === 
     103The WPS is a Python application using Pylons, running inside the mod_wsgi Apache module. The WPS will either farm a request out to one of its multiple processes, dispatching a response within 15 seconds, or it will schedule the job with the offline processing layer.  
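The choice between answering directly and scheduling can be sketched as below. This is a minimal illustration under assumptions, not the CEDA-WPS code: the function {{{dispatch}}}, its arguments and the returned dictionaries are all hypothetical, and the sketch only records whether the inline run stayed within the 15 second window rather than enforcing it.

```python
import time

SYNC_TIME_LIMIT = 15  # seconds, the response window mentioned above


def dispatch(process_is_async, run_inline, schedule_offline):
    """Run a job inline or hand it to the offline layer.

    All three arguments are hypothetical: a flag taken from the process
    configuration and two callables supplied by the caller.
    """
    if process_is_async:
        # Large job: schedule it and return immediately with a job id.
        job_id = schedule_offline()
        return {"status": "accepted", "job_id": job_id}
    # Small job: run it now and note how long it took.
    started = time.monotonic()
    result = run_inline()
    elapsed = time.monotonic() - started
    return {
        "status": "completed",
        "result": result,
        "within_limit": elapsed <= SYNC_TIME_LIMIT,
    }


quick = dispatch(False, lambda: 42, lambda: "unused")
large = dispatch(True, lambda: None, lambda: "job-1")
```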
     104 
     105== The UI layer == 
     106 
     107The UI web-interface layer is also deployed under Pylons and is a mixture of python templating and JavaScript. It provides the following components: 
     108 
     109 * view of each process 
     110 * submission form for each process - automatically generated from the configuration file 
     111 * handling of submission - including confirmation page for async requests 
     112 * monitoring of async responses 
     113 * results page - showing results of a job (XML and other views) 
     114 * jobs page - queryable listing of all jobs 
     115 * cancellation of jobs 
     116 * admin interface - simple admin view of jobs page 
     117 
     118The current UI is deployed at: 
     119 
     120 http://ceda-wps1.badc.rl.ac.uk/ui/home 
     121 
     122'''Why build a User Interface?''' 
     123 
     124The WPS does a number of useful things but interfacing with URL requests and XML responses can be difficult for the most competent of programmers, let alone users! The UI layer is currently a bespoke WPS client for our WPS. As the specification develops we will be able to migrate the UI to work with the !GetCapabilities and !DescribeProcess responses. At present the UI ''cheats'' - it builds its contents based on the WPS process configuration files rather than grabbing the !DescribeProcess response. 
     125 
     126The power of the UI comes in auto-generating forms for each process. This is important because it removes the need to create new web forms for every single process we want to deploy. The UI also includes validation and type-checking of inputs for: 
     127 
     128 * strings 
     129 * integers 
     130 * floats 
     131 * regular expression matches 
     132 * bounding box inputs 
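The kinds of check listed above could look roughly like the following sketch. The function {{{validate_input}}}, the type names and the assumed "west,south,east,north" bounding box form are illustrative only. They are not the CEDA-WPS configuration keys or its validation code.

```python
import re


def validate_input(value, kind, pattern=None):
    """Return the converted value, or raise ValueError if it is invalid.

    `kind` is one of "string", "integer", "float", "regex", "bbox".
    These names are illustrative, not the CEDA-WPS configuration keys.
    """
    if kind == "string":
        return str(value)
    if kind == "integer":
        return int(value)  # raises ValueError for "abc" or "3.5"
    if kind == "float":
        return float(value)
    if kind == "regex":
        if pattern is None or re.fullmatch(pattern, value) is None:
            raise ValueError(f"{value!r} does not match {pattern!r}")
        return value
    if kind == "bbox":
        # Assumed form: "west,south,east,north" as decimal numbers.
        parts = [float(p) for p in value.split(",")]
        if len(parts) != 4:
            raise ValueError("bounding box needs four numbers")
        west, south, east, north = parts
        if south > north:
            raise ValueError("south must not exceed north")
        return (west, south, east, north)
    raise ValueError(f"unknown input type: {kind}")
```

For instance, {{{validate_input("-10,49,2,61", "bbox")}}} returns a tuple of four floats, while {{{validate_input("abc", "integer")}}} raises {{{ValueError}}}, which a form handler could turn into a message next to the offending field.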
     133 
     134The UI also provides a jobs page that is really just a presentation of the underlying database tables. Users can interrogate old jobs, cancel current jobs and re-extract previous outputs. 
     135 
     136=== Offline processing layer === 
     137 
     138Any processes that take a significant amount of time/resource to run are labelled in their configuration files as asynchronous. This layer is separated out because it has no web-based services. Its role is to run large jobs that will take between minutes and hours to run. It is managed by Sun Grid Engine (SGE) which schedules jobs from the WPS layer. The offline processing layer limits access on a one job-per-user basis. 
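The one-job-per-user rule can be pictured with the small in-memory sketch below. It is only an illustration of the rule. The class {{{OneJobPerUser}}} is hypothetical, and the source does not say how the offline layer or SGE actually enforces the limit.

```python
class OneJobPerUser:
    """Track running jobs and refuse a second job for the same user."""

    def __init__(self):
        self._running = {}  # user name -> id of that user's running job

    def submit(self, user, job_id):
        """Return True if the job is accepted, False if the user is busy."""
        if user in self._running:
            return False
        self._running[user] = job_id
        return True

    def finish(self, user):
        """Mark the user's job as finished so they can submit another."""
        self._running.pop(user, None)


queue = OneJobPerUser()
first = queue.submit("alice", "job-1")   # accepted
second = queue.submit("alice", "job-2")  # refused while job-1 is running
queue.finish("alice")
third = queue.submit("alice", "job-3")   # accepted again
```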
     139 
     140=== Managing state: databases and common disk === 
     141 
     142The WPS and UI layers require interaction with the WPS-db in order to manage jobs and requests. The postgres db is currently running on {{{bora.badc.rl.ac.uk}}} and is accessed from all instances of the WPS and UI. 
     143 
     144Outputs are currently written to a cache disk but we will move over to using the "/requests" area soon. This disk is mounted across all WPS and offline processing layers so that the outputs can be accessed from any of them. 
     145 
     146== WPS "Processes" == 
     147 
     148=== The concept of "processes" === 
     149 
     150A "WPS process" translates to a piece of callable code that returns something. In the past we might have developed a "service" to do this; with the WPS we can think of "adding the process" to the CEDA-WPS. 
     151 
     152Each process is defined within the WPS layer with two files: 
     153 
     154 * a process configuration file 
     155 * a python module 
     156 
     157These files can be generated from templates using a simple script. The example below shows how this can be done. 
     158 
     159=== Adding a new process: example === 
     160 
     161The following commands need to be run to add a simple process: 
    32162 
    33163{{{ 
    34 cwps@ceda-wps1:/usr/local/cwps/cows_wps> ./create_process.sh just_a_demo JustADemo 
     164$ ./create_process.sh just_a_demo JustADemo 
    35165Wrote new process module: process_modules/just_a_demo.py 
    36166Wrote new process config: process_configs/JustADemo.ini 
    37167 
    38168# Re-start the WSGI app (to pick up the changes) 
    39 cwps@ceda-wps1:/usr/local/cwps/cows_wps> touch /usr/local/apache2/wsgi_scripts/cows_wps.wsgi 
     169$ touch /usr/local/apache2/wsgi_scripts/cows_wps.wsgi 
    40170}}} 
    41171 
    42172See the [wiki:CEDAWPS/Overview/JustADemo Just A Demo walk-through page] for how the UI would present this. 
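To make the two-file idea concrete, the sketch below reads a process configuration and reports which module and callable it points at. The layout of the configuration text and the keys {{{module}}}, {{{callable}}} and {{{is_async}}} are assumptions made for this example. The files written by {{{create_process.sh}}} may look quite different.

```python
import configparser

# Hypothetical config text; the real JustADemo.ini layout may differ.
CONFIG_TEXT = """
[JustADemo]
module = just_a_demo
callable = run
is_async = false
"""


def load_process_config(text, name):
    """Read one process section and return its settings as a dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser[name]
    return {
        "module": section["module"],
        "callable": section["callable"],
        # getboolean understands true/false, yes/no, on/off and 1/0.
        "is_async": section.getboolean("is_async"),
    }


demo_config = load_process_config(CONFIG_TEXT, "JustADemo")
```

With settings like these, a framework could import the named module, look up the callable and decide from the flag whether to run it directly or schedule it.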
    43173 
    44 == Integrating with NDG security == 
     174== The CEDA-WPS: Deployment issues == 
     175 
     176=== How do we deploy the WPS? === 
     177 
     178Our WPS is deployed on VMs with NDG-security provided as a middleware layer that processes each URL to decide if access should be secured.  
     179 
     180'''Hardware''' 
     181 
     182At present we have two VMs running on {{{kona.badc.rl.ac.uk}}}: 
     183 
     184 * {{{ceda-wps1.badc.rl.ac.uk}}} - which runs the WPS and UI layers in one Pylons app 
     185 * {{{ceda-batch1.badc.rl.ac.uk}}} - which runs the offline processing layer 
     186 
     187We are currently developing the second set of VMs on {{{hurricane.badc.rl.ac.uk}}}. 
     188 
     189'''Re-starting''' 
     190 
     191The WSGI app is re-started simply by touching the WSGI file: 
     192 
     193{{{ 
     194$ touch /usr/local/apache2/wsgi_scripts/cows_wps.wsgi 
     195}}} 
     196 
     197=== Integration with NDG security === 
     198 
     199The WPS is secured by NDG security as documented on the [wiki:???? securing WPS page]. 
     200 
     201The {{{policy.xml}}} file allows per-process access to files. 
     202 
     203Access to outputs also needs to be restricted. This is done by ...TBD 
    45204 
    46205=== Giving processes access to the archive === 
    47206 
    48 === The policy.xml file === 
    49  
    50 === Securing outputs === 
    51  
    52 == Scalability == 
    53  
    54 == The test environment == 
    55  
     207In general our processes require read access to the CEDA archives. The "cwps" user has been created and is currently used to run the WPS (and entire Apache service). This user has read access to everything under "/badc".  
     208 
     209=== The test environment === 
     210 
     211We are currently working out how best to deploy the test/development environment to avoid degradation of the operational service. A major issue is that the entire environment needs to be set up the same as the deployment environment to allow processes to connect to datasets, security to work, etc. 
     212 
     213=== Deployed uses of the WPS === 
     214 
     215'''UKCP09''' 
     216 
     217The UK Climate Projections (UKCP09) User Interface sits in front of a WPS which is employed to serve up the following processes: 
     218 
     219 * generation of plots (line graphs, maps, contour plots, box-and-whisker plots) - synchronously 
     220 * generation of data products (CSV and NetCDF outputs) - sync and async 
     221 * scheduling large Weather Generator model runs - async 
     222 
     223It also informs the UI about: 
     224 
     225 * previous and current jobs related to each user 
     226 
     227'''!MashMyData''' 
     228 
     229In the NERC-funded??? !MashMyData project we are collaborating with the Reading eScience Centre??? to embed Java processes under the CEDA-WPS that will be called from the externally hosted !MashMyData processing engine. 
     230 
     231== The future == 
     232 
     233 1. How to define ''dynamic possible values'' for WPS inputs? E.g. the possible values for the "County" parameter are found at: {{{http://uk-county.co.uk/services?GetCountyList=England::xml://Response/Results/Counties}}} 
     234 
     235 2. How to define ''dynamic parameter selection'' for WPS inputs? E.g. the possible parameters that need selecting for the "SubsetData" process are provided by: {{{http://data.data.org.uk/serv/data?dataset=${Dataset}::xml://Response/Results/Domain/.*}}} 
     236