Changes between Version 9 and Version 10 of NDGRoom101Meeting


Timestamp:
03/09/08 14:50:31 (11 years ago)
Author:
mpritcha
Comment:

--

  • NDGRoom101Meeting

    v9 v10  
    128128MOLES : There seemed to be someone in each institution trying to use it. But many people were trying to understand DIFs, let alone MOLES (some "fear" of it), even worse with CSML. Perhaps should have implemented (& made deployable) early on to get people on board [''...by showing what benefits were obtained rather than what overhead it entailed?''] 
    129129 
     130=== Logging === 
     131 * Need more logging built in to code that we write 
     132 * This was one module that we always planned to build but never did 
     133 * If we use a good logging framework (e.g. Log4J, or a Python equivalent), we can send messages down the logging pipe (labelled appropriately) without having to worry about where they end up. External configuration then decides which level of messages goes where. Much better solution. 
     134 * For mature services, need to have monitoring systems in place that are aware of error & warning messages that get generated, perhaps linked to some alert handler. 
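A minimal sketch of the logging idea above, assuming Python's standard `logging` module as the "python equivalent" of Log4J. The logger name `ndg`, the helper `get_logger` and the two handlers are hypothetical; in practice the configuration dictionary would be loaded from an external file so that routing can change without touching the code.

```python
import logging
import logging.config

# Hypothetical configuration. In a deployment this would live in an
# external file (e.g. JSON) and be loaded at start-up.
LOGGING_CONFIG = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "plain": {"format": "%(asctime)s %(name)s %(levelname)s %(message)s"},
    },
    "handlers": {
        # INFO and above go to standard output.
        "console": {
            "class": "logging.StreamHandler",
            "level": "INFO",
            "formatter": "plain",
            "stream": "ext://sys.stdout",
        },
        # ERROR and above also go to standard error; a mature service
        # might swap this for a handler that feeds an alert system.
        "alerts": {
            "class": "logging.StreamHandler",
            "level": "ERROR",
            "formatter": "plain",
            "stream": "ext://sys.stderr",
        },
    },
    "loggers": {
        "ndg": {
            "level": "INFO",
            "handlers": ["console", "alerts"],
            "propagate": False,
        },
    },
}


def get_logger(component):
    """Return a logger labelled with the component name (hypothetical helper)."""
    return logging.getLogger("ndg." + component)


logging.config.dictConfig(LOGGING_CONFIG)

log = get_logger("discovery")
log.debug("not emitted: below the configured INFO level")
log.info("harvest started")
log.error("harvest failed for one provider")
```

The calling code only labels and emits messages; which level goes to which destination is decided entirely by `LOGGING_CONFIG`.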
     135 
     136=== Testing === 
     137 * Decent test environment really important for good development 
     138   * Feasible to set up traffic-light system to alert when things not OK. 
     139   * NOSE ''?'' similar to how done in JUnit. Simple server available 
     140     * Module-level tests could be run overnight 
     141 * Setting this up is not really possible until we have unit tests built into code 
     142 * Responsibility of coders needs to extend beyond simply writing code. 
     143 * Unit tests tell you (6 months down the line): 
     144   * Whether it's working 
     145   * What you were trying to achieve 
     146 * Writing tests forces you to write code in a way that's testable  
     147   * ''e.g. where 1 component does one thing only, ...and can be demonstrably good at it'' 
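A minimal sketch of the point about testable code, assuming Python's standard `unittest` module (nose can collect tests written this way). The function `parse_bbox` is a hypothetical component that does one thing only; the tests record both whether it works and what it was meant to achieve.

```python
import unittest


def parse_bbox(text):
    """Parse 'west,south,east,north' into four floats (hypothetical component)."""
    parts = [float(p) for p in text.split(",")]
    if len(parts) != 4:
        raise ValueError("expected 4 comma-separated numbers")
    west, south, east, north = parts
    if west > east or south > north:
        raise ValueError("bounding box is inverted")
    return west, south, east, north


class ParseBboxTest(unittest.TestCase):
    def test_valid_box_is_returned_as_floats(self):
        self.assertEqual(parse_bbox("-10,50,2,60"), (-10.0, 50.0, 2.0, 60.0))

    def test_wrong_number_of_values_is_rejected(self):
        with self.assertRaises(ValueError):
            parse_bbox("1,2,3")

    def test_inverted_box_is_rejected(self):
        with self.assertRaises(ValueError):
            parse_bbox("2,50,-10,60")
```

Tests like these can be run with `python -m unittest`, which is the kind of module-level run that could be scheduled overnight and fed into a traffic-light display.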
     148 
    130149== Review of components == 
    131150=== COWS === 
     
    162181   * There always seemed to be 1 thing in the jigsaw that was in such a state of flux as to prevent the whole from working at any time. 
    163182   * We have to think about deploying earlier, accepting that some bugs need to be left, to be fixed in a later version. 
     183   * Need more info (from Ag?) about progress with GeoServer at UKMO : should pursue this. 
    164184 
    165185=== CSML === 
     
    172192==== Weaknesses ==== 
    173193 * Hasn't really been implemented ''(deployed?)'' yet. Working in prototype but no effort on part of data scientists. All the bits are there but...  
    174  
    175 (Stephen) Not convinced that Feature Type approach is the way forward. See GML : simple features. How is this ever going to join up with Feature Types? This is what INSPIRE is mandating... 
    176  
    177 (Bryan) Problem with CSML historically : nothing changes except through Andrew Woolf 
    178  
    179 CSML was designed to have different I/O layers & be lossy. It would never be as good as reading netCDF directly, '''but''' could read netCDF & PNG files at the same time. 
    180  
    181 HDF Example : It would be great to plot Grape + ERA40 data on same plot : that would be something really new ''i.e. an easy win for demonstrating CSML's capability''. 
    182  
    183 (Bryan) We have the middleware layer, but now need to get the benefit of it. We should make a big effort to get ~50 datasets working with CSML (albeit if all 150 is impossible for now). 
    184  
    185 Granulite concept should help in trying to deploy some benefits (e.g. getting parameters into MOLES records), rather than all aspects (e.g. visualisation). 
     194 * (Stephen) Not convinced that Feature Type approach is the way forward. See GML : simple features. How is this ever going to join up with Feature Types? This is what INSPIRE is mandating... 
     195 * (Bryan) Problem with CSML historically : nothing changes except through Andrew Woolf 
     196 * CSML was designed to have different I/O layers & be lossy. It would never be as good as reading netCDF directly, '''but''' could read netCDF & PNG files at the same time. 
     197 * HDF Example : It would be great to plot Grape + ERA40 data on same plot : that would be something really new ''i.e. an easy win for demonstrating CSML's capability''. 
     198 * (Bryan) We have the middleware layer, but now need to get the benefit of it. We should make a big effort to get ~50 datasets working with CSML (albeit if all 150 is impossible for now). 
     199 * Granulite concept should help in trying to deploy some benefits (e.g. getting parameters into MOLES records), rather than all aspects (e.g. visualisation). 
    186200 
    187201=== MOLES === 
     
    198212 
    199213==== Lessons ==== 
    200 (NDG team) No point producing schema unless accompanied by lightweight tool to help users populate example instance documents. 
    201  
    202 (Data centres) Need to define tools early on that satisfy the requirements of the data centre in terms of populating metadata records. 
    203  
     214 * (NDG team) No point producing schema unless accompanied by lightweight tool to help users populate example instance documents. 
     215 * (Data centres) Need to define tools early on that satisfy the requirements of the data centre in terms of populating metadata records. 
     216 
     217=== Discovery === 
     218==== Overview ==== 
     219(Could do with presentation from Calum (& Steve?) about how new Discovery service works) 
     220Provides search facility against metadata records harvested via OAI from data providers 
     221 
     222==== Strengths/Successes ==== 
     223 * Used successfully in NERC Portals project and by MDIP as well as by NERC Data Discovery Service. 
     224 * Despite delays in making it "operational", now provides useful service to NERC 
     225 * Metadata subgroup now formed, to talk about these issues, so Data Centres are now interested and working together on these. 
     226 * DMAG now like it. Could do with more usage stats, but satisfies FOI requirement for way of finding out what information an organisation holds. 
     227 
     228==== Weaknesses ==== 
     229 * Performance. This has been addressed in Calum's re-write, largely by transforming to all required export formats on ingest rather than on-the-fly during Present operation of web service. 
     230 * Could do with including context of hit with result (i.e. why was this document a hit?), plus returning more than just the document id (e.g. abstract or "summary Present")  
     231 * SOAP : yesterday's technology that we got stuck with. 
     232 * Looking at supporting OpenSearch, OGC WCS, etc. Hopefully new revision by Calum should make adaptation to provide these interfaces easier. 
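A rough sketch of the performance fix described above: transform each record to every export format once, on ingest, so that the Present operation becomes a lookup. This is not the actual Discovery code; `RecordStore`, the exporter functions and the record fields are all hypothetical.

```python
def to_summary(record):
    """Hypothetical export format: identifier and title only."""
    return {"id": record["id"], "title": record["title"]}


def to_text(record):
    """Hypothetical export format: one line of plain text."""
    return "{title}: {abstract}".format(**record)


EXPORTERS = {"summary": to_summary, "text": to_text}


class RecordStore:
    """Holds pre-transformed copies of each harvested record."""

    def __init__(self, exporters):
        self._exporters = exporters
        self._exports = {}

    def ingest(self, record):
        # Pay the transformation cost once, at harvest time.
        self._exports[record["id"]] = {
            name: export(record) for name, export in self._exporters.items()
        }

    def present(self, record_id, fmt):
        # No transformation here: just return the stored result.
        return self._exports[record_id][fmt]


store = RecordStore(EXPORTERS)
store.ingest({"id": "rec-1", "title": "ERA40", "abstract": "Reanalysis data"})
print(store.present("rec-1", "text"))
```

The trade-off is more storage and a slower ingest in exchange for a fast, predictable Present call.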
     233 
     234==== Lessons ==== 
     235 * Bryan should never be in the critical path of any development work! 
     236 * Even if NERC DDS isn't creating huge usage stats, we should make sure that the Data Centres (esp NEODC, BADC) actually use Discovery as their own search tool on their public-facing website(s). '''Should do this now'''. 
     237 
     238=== Security === 
     239==== Overview ==== 
     240 Key concept = role mapping 
     241 
     242==== Strengths / Successes ==== 
     243 * Security showed early on that it was possible to build quite sophisticated system based on Web Services. 
     244 * Things have come a long way, in particular, good progress has been made with OpenId in Java & Python. 
     245 
     246==== Weaknesses ==== 
     247 * Too tied to what was available at the time (Globus, Proxy, certificates etc) 
     248 * Problems of integrating this with normal user/password system used by Data Centres. In fact this was overkill compared to what was needed. 
     249 * Went down a blind alley with MyProxy (good tool, but made things too complicated in this context) 
     250 * Personal User Certificates : didn't need these (can do same job by asserting someone's identity & using something like SAML) 
     251 * Single sign on : wasn't a big requirement at the start but ended up spending lots of time on it. 
     252 
     253==== Technology ==== 
     254 * Lots of immature tooling hence lots of time wasted trying to get things to work. 
     255 * WS-Security : spent too much time on this. In the end people voted with their feet & just used SSL instead. Huge problems of implementation even between different Java toolkits. 
     256 * WSGI is a really important tool for layering services & for encouraging modular development 
     257 * Pylons : good for security but perhaps more difficult to pull other things out of it. 
     258 * OGC side : big challenges remain. People are still breeding security solutions and having headaches getting them to be interoperable. 
     259 * GEO-DRM experiment : largely SOAP-based. Didn't want to hinder OGC protocol, but at same time, the only emerging technology out there was SOAP. 
     260 * There still isn't an established technology for HTTP-based security 
     261   * Big corporations using SOAP (...works) 
     262   * "Rebels" using RESTful systems ''...?'' 
     263 
     264COWS context :  
     265 * Got stuck in security but had to move to OGC-style services 
     266 * Came up with workable solution. 
     267 * Could in theory create a WSGI layer for security (& Stephen wouldn't need to know too much about what was inside it) 
     268 * Still need database with record of what resource has what security policy 
     269 * Phil would have liked more time to look at XACML or GeoXACML (e.g. enabling restriction by bounding box) 
     270 * One thing we did wrong was to assume that access control would be in the MOLES docs. 
     271   * Much better if this information is in some out-of-band database (FTP needs to talk to it, too) 
     272     * Could easily write plugin for ProFTPd to do this ...should look at this for NEODC/BADC now 
     273 * Shibboleth is going to be key as is OpenId 
     274   * Is Shibboleth a "winner" technology that we should have anticipated? Doesn't meet NDG requirements but does meet those of BADC. 
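A minimal sketch of the WSGI security layer idea from the list above, with access policy held outside the MOLES documents. Everything named here is an assumption for illustration: the `POLICIES` dictionary stands in for the out-of-band database, and the `ndg.roles` environ key is assumed to be set by an earlier authentication layer.

```python
# Hypothetical out-of-band policy store: path prefix -> role required.
POLICIES = {"/data/restricted": "badc_user"}


def required_role(path, policies):
    """Return the role needed for this path, or None if it is open."""
    for prefix, role in policies.items():
        if path.startswith(prefix):
            return role
    return None


class AccessControlMiddleware:
    """WSGI layer that checks roles before calling the wrapped application."""

    def __init__(self, app, policies):
        self.app = app
        self.policies = policies

    def __call__(self, environ, start_response):
        role = required_role(environ.get("PATH_INFO", ""), self.policies)
        # Assumption: an authentication layer has already stored the
        # user's roles under this (hypothetical) environ key.
        roles = environ.get("ndg.roles", ())
        if role is not None and role not in roles:
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return self.app(environ, start_response)


def data_app(environ, start_response):
    """Stand-in for an OGC-style service that knows nothing about security."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"data\n"]


app = AccessControlMiddleware(data_app, POLICIES)
```

The wrapped service needs no knowledge of what is inside the security layer, and another front end (such as an FTP plugin) could consult the same policy store.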
     275 
     276=== Vocab Server === 
     277==== Overview ==== 
     278Enables query by term, responds with "narrower than" terms that are useful for searching. 
     279Place to maintain lists of vocab terms : essential part of building domain vocabs (and ontologies) 
     280==== Strengths / Successes ==== 
     281 * Key piece of NDG, one of its only operational components. In use in the discovery service. 
     282 * Stimulated lots of interest 
     283 * This was the one part of the jigsaw driven by BODC (& it works well) 
     284==== Weaknesses ==== 
     285 * BODC lost some key staff & found it difficult to spend small amounts of money on required development. 
     286   * Future progress possibly difficult. Maybe Steve D can take on some of this work? 
     287 * Breaks if query contains 2 terms (but otherwise very easy to use) 
     288==== Lessons learned ==== 
     289* Roy shouldn't be in the critical path of a project, either! 
     290* Andrew's group is part of CEDA in wider sense, but need to interact with them as much as possible. We now know what their skills (& weaknesses) are. They are important contacts, even if we don't always agree on things. 
     291 
     292== Going forward == 
     293=== NDG MSI === 
     294Aim : to get across the message of data modelling to those parts of NERC that haven't been exposed to it before. 
     295 
     296Note that application schema (e.g. CSML) is '''our''' viewpoint of our data, i.e. corresponding with our application. 
     297 
     298==== Activities ==== 
     299 * New MOLES (v3) atom serialization 
     300   * addressing "when" for the first time (Simon Cox to be brought in for this & offer up result to standards body). He's an Earth Scientist so hopefully should get buy-in to results from e.g. BGS. 
     301 * National Capability 
     302   * Workshops about understanding of Data Modelling 
     303 * Security 
     304 * COWS 
     305 * Discovery Service 
     306   * Improve GUI, implement OpenSearch, Inspire requirements, improve spatial searching (e.g. proximity ranking) 
     307 * Vocab service 
     308 
     309MOLES v2+v3 activities both supported. By end of FY should have v3. 
     310 
     311Bid has now been funded. Intend to get James Doughty to organise work, workshops, keep things on track. 
     312 
     313Not enough money to fix all flaws with COWS... 
     314 
     315(Stephen) At its core, it is a library to help in creating OGC services. Code needs maturing so hope MSI will find some time for this. 
     316 
     317(General comment) : So far, have tended to develop new things on a code branch (because it's expedient to do so). Need to move to situation where expediency means developing on the trunk rather than a branch. 
     318 
     319