Version 18 (modified by lawrence, 12 years ago)

A new egg dependency for browse

How to Install the NDG Browse Code

[This page is NOT FINISHED ... and is not a real guide to do anything YET, further, I *KNOW* this doesn't work as of January 25th]

(Intended for *full* NDG data providers: that is, data providers who *do* have their own browse repository of MOLES documents.)

Note that the new browse and discovery code is integrated using paste, so you will need to ensure that all the paste code is installed in your python. The following instructions assume that you have access to your python, and that it exists at /your/path/bin/python. Depending on the state of your system you may not need all of these steps, and you may need to set an http proxy variable to get through a firewall. Note also that much of this will be vastly simplified when we eggify the installation!

  1. Set up fastcgi.

On RHEL4, this just involved downloading the fastcgi module from http://www.fastcgi.com/, editing Makefile.AP2 (top_dir should be '/usr/lib/httpd'), renaming it to Makefile and running make ; make install

Your apache configuration should have something like this in it:

<IfModule mod_fastcgi.c>
  Alias /retrieve "/var/www/fastcgi/ndg.fcgi/retrieve"
  Alias /browse "/var/www/fastcgi/ndg.fcgi/browse"
  Alias /discovery "/var/www/fastcgi/ndg.fcgi/discovery"
  Alias /layout "/var/www/fastcgi/ndg.fcgi/layout"
  <Directory /var/www/fastcgi>
    Options +ExecCGI
    SetHandler fastcgi-script
    Order allow,deny
    Allow from all
  </Directory>
  # following socket ought to be setup by paste via ndg.ini (bnl)
  FastCgiExternalServer /var/www/fastcgi/ndg.fcgi -socket /tmp/ndg_fastcgi.soc
</IfModule>

Note that the /var/www/fastcgi (or wherever) directory must exist, but /var/www/fastcgi/ndg.fcgi must not.

Also note that /tmp might be a risky place to keep the socket file if your /tmp directory is regularly cleaned up (e.g. by tmpwatch).
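
The two path constraints above can be checked before restarting apache. The following is a minimal Python sketch and is not part of the NDG code: the function name check_fastcgi_paths and its arguments are made up for illustration, and the default paths are simply the ones used in the example configuration.

```python
import os


def check_fastcgi_paths(fcgi_dir="/var/www/fastcgi", script_name="ndg.fcgi"):
    """Return a list of problems with the fastcgi directory layout.

    The directory used in the Alias lines must exist, but the virtual
    script inside it (handled by FastCgiExternalServer) must not.
    """
    problems = []
    if not os.path.isdir(fcgi_dir):
        problems.append("missing directory: %s" % fcgi_dir)
    script_path = os.path.join(fcgi_dir, script_name)
    # lexists also catches a dangling symlink left at that path.
    if os.path.lexists(script_path):
        problems.append("must not exist: %s" % script_path)
    return problems
```

An empty list means that both constraints hold; anything else names the path that needs attention.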

  1. Make sure the pieces we need from paste are installed:
    /your/path/bin/easy_install paste
    /your/path/bin/easy_install PasteDeploy
    /your/path/bin/easy_install PasteScript
    /your/path/bin/easy_install flup
    /your/path/bin/easy_install wsgiutils
    /your/path/bin/easy_install wsgilog
    
  2. If your python is 2.4 or earlier, you'll need elementtree (it comes as standard in 2.5):
    /your/path/bin/easy_install elementtree
    
  3. And now for the big one: we need a recent version of ZSI. I've used the default today, which pulled ZSI-2.0RC3. Watch out for this one, because a later version might break. Let me know how this goes!
    /your/path/bin/easy_install ZSI
    
  4. Get the contents of TI07-MOLES/trunk/PythonCode/wsgi and put them into a suitable directory on your webserver, and change the ownership of the directory to apache (or whatever account you run your website under). This can most easily be done with the following command (which will give you a subversion working copy):
    svn co http://glue.badc.rl.ac.uk:/ndgsvn/TI07-MOLES/trunk/PythonCode/wsgi
    
  5. cd into that directory!
  6. Modify 001deploy.sh so that MYPYTHONBIN points to your python:
    MYPYTHONBIN=/your/path/bin/
    
  7. Modify the contents of ndg.ini so that the socket matches the one named in your fastcgi setup, and so that it has the right configDir in the [DEFAULT] section. Here, for example, is what is running on glue (where the wsgi code is in directory /var/www/ndg):
    [server:main]
    #use = egg:PasteScript#wsgiutils
    #host = localhost.localdomain
    #port = 8001
    use = egg:PasteScript#flup_fcgi_thread
    socket = /tmp/ndg_fastcgi.soc
    
    [DEFAULT]
    configDir = /var/www/ndg/
    
  8. Create a passwords.txt file in the directory from which you run the deploy script (and check that this directory cannot be reached via the web!). The file gives the browse code access to your eXist repository and should have the following format:
    existhost.your.domain.ac.uk exist_access_userid password
    
  9. You will need to modify the ndgDiscovery.config file as well. You should only need to modify the entries in the default and layout sections.
  10. At this point you can check that the underlying transport works. Try
    python ndgSearch.py
    
    1. It should run some unittests (at the moment one might fail, but some should succeed). If not, then you have a problem with the underlying transport.
    2. You may find that you have a firewall problem. If so, set an http_proxy variable to get through, e.g.:
      http_proxy=http://wwwcache.rl.ac.uk:8080;export http_proxy
      
  11. You may need to make changes to 001deploy.sh (in particular the path to your python and, if you are not running apache under the user apache, the user name).
  12. Make sure that the directory that holds all this code provides read/write access to the user apache (or whatever user your webserver runs under).
  13. Now you can try and run the service:
    ./001deploy.sh
    
  14. At the moment one needs to restart the process by hand after a system reboot.
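
A mismatch between the socket named in the apache configuration and the one in ndg.ini is an easy mistake to make in the steps above. The following Python sketch compares the two. It is an illustration only and not part of the NDG code: the function names are made up, and it assumes that the apache side uses a single FastCgiExternalServer line with a -socket argument, as in the example earlier on this page.

```python
import configparser
import re


def apache_socket(apache_conf_text):
    """Return the -socket path from a FastCgiExternalServer line, or None."""
    match = re.search(
        r"^\s*FastCgiExternalServer\s+\S+\s+-socket\s+(\S+)",
        apache_conf_text,
        re.MULTILINE,
    )
    return match.group(1) if match else None


def paste_socket(ini_text):
    """Return the socket setting from the [server:main] section, or None."""
    # Interpolation is switched off so that a '%' in a value is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    return parser.get("server:main", "socket", fallback=None)


def sockets_match(apache_conf_text, ini_text):
    """True only if both files name a socket and the two paths are identical."""
    from_apache = apache_socket(apache_conf_text)
    from_paste = paste_socket(ini_text)
    return from_apache is not None and from_apache == from_paste
```

Pass in the text of the two files, for example sockets_match(open(apache_conf).read(), open("ndg.ini").read()), where apache_conf stands for wherever your fastcgi configuration lives.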

Troubleshooting

The last step should produce a log file in your directory, and a file called paster.pid, which holds the pid of the process. If paster.pid doesn't exist, something has gone wrong with your deployment; check the log files in your directory. If they don't exist either, suspect that your webserver has failed to fire up fastcgi, or that apache can't write into your directory!
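
The pid file check can be scripted. Below is a minimal Python sketch for Unix-like systems; the function name paster_status and its return strings are made up for illustration, and it assumes that paster.pid contains nothing but the process id.

```python
import os


def paster_status(pid_file="paster.pid"):
    """Return "no pid file", "not running" or "running" for a pid file."""
    try:
        with open(pid_file) as handle:
            pid = int(handle.read().strip())
    except FileNotFoundError:
        return "no pid file"
    try:
        # Signal 0 delivers nothing; it only tests whether the pid exists.
        os.kill(pid, 0)
    except ProcessLookupError:
        return "not running"
    except PermissionError:
        # The process exists but belongs to another user (e.g. apache).
        return "running"
    return "running"
```

A result of "not running" means that a pid file was left behind by a process that has since died, which points at the log files rather than at the webserver setup.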

If you have trouble setting up wsgi and fastcgi, see if you can get the wsgiEnvTest code working. You will need to modify wsgiEnvTest.sh for your python path and possibly your web server user, and modify wsgiEnvTest.ini for your wsgi server environment. You should then be able to point your browser at http://yourhost/browse and see the wsgi environment variables!
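
For orientation, an application that shows the wsgi environment can be very small. The sketch below is not the wsgiEnvTest code; it is an independent illustration using only the Python standard library, and the names show_environ and serve are made up. It bypasses apache and fastcgi altogether, so it only tells you whether the python side works.

```python
from wsgiref.simple_server import make_server


def show_environ(environ, start_response):
    """WSGI application that lists the environment it was called with."""
    lines = ["%s = %r" % (key, environ[key]) for key in sorted(environ)]
    body = "\n".join(lines).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def serve(host="localhost", port=8001):
    """Serve show_environ until interrupted (port 8001 is an arbitrary choice)."""
    with make_server(host, port, show_environ) as server:
        server.serve_forever()
```

Calling serve() and pointing a browser at http://localhost:8001/ should list entries such as PATH_INFO and REQUEST_METHOD.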

SELinux

A common problem on modern distributions is running into SELinux. SELinux is a set of kernel-level hooks that allow system policy to be very specific in controlling the behavior of applications. It is commonly used to harden externally accessible services: if they are compromised, the harm they can do is strictly limited to the parts of the filesystem (and the other services) that the policy allows them to access. In practice, this will manifest as bizarre-seeming denials of permission to do things, despite the standard Unix permissions appearing to be correct. You can normally check for this more specifically with something like dmesg, which will list errors like:

audit(1177519136.261:269): avc:  denied  { getattr } for  pid=22315 comm="httpd" name="www" dev=sda5 ino=7554543 scontext=root:system_r:httpd_t tcontext=root:object_r:var_t tclass=lnk_file

(note httpd in the above)
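
When there are many such messages it can help to pull out the fields that matter (the command, the permission and the target class). The following Python sketch parses a line in the format shown above. It is an illustration only: the function name is made up, and the pattern is based solely on that one example line, so other distributions may log a different layout.

```python
import re

# Matches "avc:  denied  { perm ... } for  key=value key=value ..."
AVC_PATTERN = re.compile(
    r"avc:\s+denied\s+\{\s*(?P<permissions>[^}]*?)\s*\}\s+for\s+(?P<fields>.*)"
)


def parse_avc_denial(line):
    """Split an 'avc: denied' message into a dictionary, or return None."""
    match = AVC_PATTERN.search(line)
    if match is None:
        return None
    result = {"permissions": match.group("permissions").split()}
    # Values are either quoted strings or runs of non-space characters.
    for key, value in re.findall(r'(\w+)=("[^"]*"|\S+)', match.group("fields")):
        result[key] = value.strip('"')
    return result
```

For the example line, the result has comm set to httpd and tclass set to lnk_file, which is the quickest way to confirm that the denial concerns the webserver.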

Not all distributions will report in the same way - RHEL4 dumps errors to /var/log/messages and console (dmesg), Fedora Core 6 hides them away in /var/log/audit (maybe needs an additional audit daemon running too, not sure).

A typical response to SELinux is to turn it off (or switch to permissive rather than enforcing mode - this just warns but doesn't block). The "correct" fixes are either to arrange things to match the expectations of your policy (e.g. put webserver files in specific locations, marked with correct security contexts) or, if necessary, to write additional policy rules within the framework your distribution has provided that allow the specific actions you wish to do.

To test whether SELinux is definitely enabled, run sestatus (as root) or cat /selinux/enforce (1 = active and enforcing). Disabling it is distribution specific (RHEL & FC: run system-config-securitylevel, go to the SELinux tab). Finding out what you should be doing re: system policy is also system specific, but some good documents are:

(mggr) Speaking as someone with a fair degree of SELinux experience, it's pretty hard going ;) This is something we should probably resolve at a distribution level and include instructions as needed (possibly just saying "mail us if you figure it out ;)").