Changeset 4495 for exist/trunk


Timestamp: 26/11/08 15:14:27
Author: cbyrom
Message:

Store a cache of validated URLs and vocab terms in the validator, to avoid repeating time-consuming lookups of the same data. Also simplify term validation by adding a reusable method that handles both the category data and the link data.
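The caching idea in this message can be sketched as follows. This is a minimal Python 3 illustration, not the changeset's own code (which is Python 2 and keeps plain lists). The class name `PositiveResultCache` and the injected `check` callable are hypothetical stand-ins for a slow lookup such as `simpleURLCheck` or `isValidTermURI`. Only successful results are cached, so a value that fails is looked up again the next time it appears.

```python
from typing import Callable, Set


class PositiveResultCache:
    """Hypothetical helper: remember values that passed a slow check.

    Only successes are cached, so a failing value is checked (and can be
    reported) again each time it is seen.
    """

    def __init__(self, check: Callable[[str], bool]) -> None:
        self._check = check  # e.g. a URL or vocab-term lookup (assumed)
        self._valid: Set[str] = set()

    def is_valid(self, value: str) -> bool:
        # Skip the expensive lookup for values already known to be valid.
        if value in self._valid:
            return True
        ok = self._check(value)
        if ok:
            self._valid.add(value)
        return ok
```

A `set` is used here because membership tests are constant time on average; the changeset's lists (`self._validLinks`, `self._validVocabTerms`) give the same results but with linear-time lookups.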

Files: 1 edited

  • exist/trunk/python/ndgUtils/lib/atomvalidator.py

    --- exist/trunk/python/ndgUtils/lib/atomvalidator.py (r4494)
    +++ exist/trunk/python/ndgUtils/lib/atomvalidator.py (r4495)
    @@ -68,4 +68,9 @@
             self._nl = newLineChar
     
    +        # collections to effectively cache positive results - to avoid multiple
    +        # (time consuming) lookups of the same data
    +        self._validLinks = []
    +        self._validVocabTerms = []
    +
             # set up connection to eXist and postgres DBs
             if dbConfigFile:
    @@ -219,6 +224,13 @@
                 if link.hasValue():
                     try:
    +                    # don't lookup link, if it has already been validated before
    +                    if link.href in self._validLinks:
    +                        continue
    +
                         if not simpleURLCheck(link.href):
                             self.__addError(self.BROKEN_LINKS, "Broken link: '%s'" %link.href)
    +                    else:
    +                        self._validLinks.append(link.href)
    +
                     except Exception, e:
                         self.__addError(self.BROKEN_LINKS, e.message)
    @@ -233,16 +245,32 @@
             logging.info("Validating atom vocab data")
             for category in self._atom.parameters:
    -            if not isValidTermURI(category.scheme):
    -                self.__addError(self.INVALID_VOCAB_TERM, \
    -                                "Invalid vocab term: '%s'" %category.scheme)
    +            self.__validateTermURL(category.scheme)
     
             # also check the terms used in the links
             for link in self._atom.relatedLinks:
                 if link.hasValue():
    -                if link.rel not in self.VALID_RELS:
    -                    if not isValidTermURI(link.rel):
    -                        self.__addError(self.INVALID_VOCAB_TERM, \
    -                                        "Invalid vocab term: '%s'" %link.rel)
    +                self.__validateTermURL(link.rel)
             logging.info("Completed link validation")
    +
    +
    +    def __validateTermURL(self, url):
    +        '''
    +        Check the specified vocab url - and add any encountered errors
    +        to the global error collection.  Also add any validated urls
    +        to the global valid term collection.
    +        @param url: url string representing a vocab term
    +        '''
    +        # don't lookup link, if it has already been validated before
    +        if url in self._validVocabTerms or url in self.VALID_RELS:
    +            logging.info("- term is valid")
    +            return
    +
    +        if not isValidTermURI(url):
    +            logging.info("- term is invalid")
    +            self.__addError(self.INVALID_VOCAB_TERM, \
    +                            "Invalid vocab term: '%s'" %url)
    +        else:
    +            logging.info("- term is valid")
    +            self._validVocabTerms.append(url)
     
     
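The last hunk routes both `category.scheme` and `link.rel` through one helper, `__validateTermURL`. A minimal Python 3 sketch of that structure follows, assuming a hypothetical `TermValidator` class, an injected `is_valid_term_uri` function in place of `isValidTermURI`, and placeholder `VALID_RELS` values (the real ones are not shown in this changeset). Errors are collected in a plain list rather than through `__addError`.

```python
from typing import Callable, Iterable, List, Set

# Placeholder values (assumption): the validator's real VALID_RELS
# are not part of this changeset.
VALID_RELS = {"self", "related"}


class TermValidator:
    """Hypothetical sketch of one shared term check for categories and links."""

    def __init__(self, is_valid_term_uri: Callable[[str], bool]) -> None:
        self._is_valid_term_uri = is_valid_term_uri  # stands in for isValidTermURI
        self._valid_vocab_terms: Set[str] = set()    # cache of positive results
        self.errors: List[str] = []

    def _validate_term_url(self, url: str) -> None:
        # Known-good terms and standard rels need no lookup.
        if url in self._valid_vocab_terms or url in VALID_RELS:
            return
        if self._is_valid_term_uri(url):
            self._valid_vocab_terms.add(url)
        else:
            self.errors.append("Invalid vocab term: '%s'" % url)

    def validate(self, category_schemes: Iterable[str],
                 link_rels: Iterable[str]) -> None:
        # The same helper serves both inputs.
        for scheme in category_schemes:
            self._validate_term_url(scheme)
        for rel in link_rels:
            self._validate_term_url(rel)
```

One consequence of sharing the helper, visible in the diff itself: the `VALID_RELS` short-circuit, previously applied only to link relations, now also applies to category schemes.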