This document describes the features of the Query Results Cache (QRC), including the heuristics implemented to invalidate cache entries and the strategies used to update them. The second part of this document is an administrator's guide for installing and configuring QRC.
Applies to version: Halo extension 1.5.3, Halo extension 1.5.6
Introduction
The Query Results Cache (QRC) is a component of the SMWHalo extension. Its goal is to improve the response time of article views while still displaying the most current query results to the user. QRC achieves this by caching the results of inline queries, which are formulated via the #ask or the #sparql parser functions. It then monitors the activities in the wiki to determine whether the cached results of a query are still valid or should be recomputed.
Features Overview
The implementation of the QRC is split into two different components. The main features are implemented as a MediaWiki extension (the QRC is part of the SMWHalo extension). The Update Triggerer is implemented as a Java-based application, which is deployed together with the Triple Store Connector (TSC). We decided to split the features into two in order to allow the Update Triggerer to work as a background process independent of the activities of the wiki.
Nevertheless, QRC can be used without a TSC (and thus without the Java-based Update Triggerer). In such a setting, users still benefit from improved response times when viewing articles with queries. Cache entries, however, are only updated when a user edits or purges an article (which is equivalent to the situation where QRC is not installed) or when the Cache Invalidator determines that a query's cache entry has become invalid. The Java-based Update Triggerer is required if cache entries should be updated continuously, so that users are presented with query results that are as current as possible.
The features of the QRC, which we will describe in this section, are displayed in the UML component diagram in Figure 1 below.
Cache Core
The Cache Core is implemented on top of SMW's Storage Layer and is transparent to all other MediaWiki features that use SMW's Storage Layer for storing or querying semantic data. A MySQL database back-end is used for storing cached query results.
Queries that are sent to the Storage Layer because of an article edit or purge action are always processed by SMW or the Triple Store and are never answered by the cache. The results of the query processing are, however, stored in the cache. Queries that are sent to the cache because of an article view action are answered by the cache if an appropriate cache entry exists. A cache entry is appropriate if either of the following holds:
- The cache entry exists and has not been marked as invalid.
- The cache entry exists, has been marked as invalid, and the QRC option $showInvalidatedCacheEntries is set to true. An administrator can configure this option depending on whether timely query results or improved response times are more important in the wiki environment.
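The lookup rules above can be summarized as a small decision function. The following Python sketch is illustrative only: the names `CacheEntry` and `answer_from_cache` and the action strings are assumptions made for this example and are not taken from the QRC source code, which is implemented in PHP.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheEntry:
    """Hypothetical stand-in for a stored query result."""
    result: str
    invalid: bool = False


def answer_from_cache(action: str,
                      entry: Optional[CacheEntry],
                      show_invalidated: bool) -> bool:
    """Return True if the cache may answer the query.

    `action` is one of "view", "edit" or "purge" (names assumed here).
    `show_invalidated` mirrors the $showInvalidatedCacheEntries option.
    """
    # Edit and purge actions are always processed by SMW or the Triple Store.
    if action != "view":
        return False
    # Without a cache entry the query has to be processed.
    if entry is None:
        return False
    # A valid entry is always appropriate; an invalidated entry only
    # if the administrator chose to show invalidated entries.
    return (not entry.invalid) or show_invalidated
```

In this sketch, setting `show_invalidated` to true trades accuracy for response time, because an invalidated entry is still served on article views.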
Query Analyser
The Query Analyser is called by the Cache Core when it receives a new query. It detects which properties and categories are used in the query. This information is stored as query metadata in SMW's Storage Layer, as hidden annotations of the article that contains the query.
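To give a rough idea of what detecting properties and categories in a query can look like, here is a simplified Python sketch based on regular expressions. It is an assumption-laden illustration, not the actual Query Analyser: the function name `analyse_query` is hypothetical, only basic #ask syntax is handled (no subqueries, property chains or #sparql queries), and the example query is made up.

```python
import re

# [[Category:Name]] conditions
CATEGORY_RE = re.compile(r"\[\[\s*Category\s*:\s*([^\]|]+?)\s*\]\]")
# [[Property::value]] conditions (note the double colon)
PROPERTY_RE = re.compile(r"\[\[\s*([^\[\]:|]+?)\s*::")
# |?Property printout statements
PRINTOUT_RE = re.compile(r"\|\s*\?\s*([^|=#}]+)")


def analyse_query(query: str):
    """Return (properties, categories) mentioned in a simple #ask query."""
    categories = {name.strip() for name in CATEGORY_RE.findall(query)}
    properties = {name.strip() for name in PROPERTY_RE.findall(query)}
    properties |= {name.strip() for name in PRINTOUT_RE.findall(query)}
    return properties, categories


example = "{{#ask: [[Category:Person]] [[Works for::ACME]] | ?Has age | format=table }}"
properties, categories = analyse_query(example)
print(sorted(properties), sorted(categories))
```

A real implementation would more plausibly reuse the query object produced by SMW's own query parser instead of matching on the wikitext.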
Cache Invalidator
The Cache Invalidator is called when the semantic data of an article is sent to SMW's Storage Layer (because of an article edit or create action). It determines which property and category annotations have been added, removed or modified in the new version of the article and then invalidates the cache entries of all queries that make use of those categories and properties.
Please note that the Cache Invalidator currently does not do any inferencing and thus can only be seen as a heuristic component. For example, if a query asks for all instances of the category Person, and a new article about an Employee (a subclass of Person) is created, then the query will not be invalidated. Please also note that no cache invalidation is done if triples are modified directly in the Triple Store and not within the wiki through one of Semantic MediaWiki's features.
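The invalidation heuristic and its limitation can be illustrated with a short Python sketch. The data model is an assumption made for this example: annotations are represented as a set of (name, value) pairs, query metadata as a mapping from a query ID to the names the query uses, and the function names `changed_names` and `invalidate` are hypothetical.

```python
def changed_names(old: set, new: set) -> set:
    """Names of annotations that were added, removed or modified.

    `old` and `new` are sets of (name, value) pairs. The symmetric
    difference contains every pair present in only one version.
    """
    return {name for name, _value in old ^ new}


def invalidate(query_metadata: dict, changed: set) -> set:
    """Return the IDs of all queries that use at least one changed name."""
    return {qid for qid, used in query_metadata.items() if used & changed}


# A new article about an employee is created (no previous annotations).
old_version = set()
new_version = {("Category:Employee", None), ("Works for", "ACME")}

# Metadata as the Query Analyser might have stored it (made-up IDs).
metadata = {
    "q1": {"Category:Person"},
    "q2": {"Works for"},
}

changed = changed_names(old_version, new_version)
print(invalidate(metadata, changed))
```

Only `q2` is invalidated here. The query `q1` asks for Person, and because the names are compared literally, without subclass inferencing, the new Employee annotation does not affect it.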
An enabled parser cache can prevent current QRC entries from being displayed when a user views an article, because the parser cache stores the results of parser functions such as #ask and #sparql. If the QRC is configured to use cache entries only if they have not been invalidated, an Administrator can configure the QRC to also invalidate the parser cache of an article (option $invalidateParserCache), so that the article is parsed again completely and the most current query cache entry is displayed.
Please note that invalidating the parser cache can also hurt performance. Imagine, for example, an article that uses several complex parser functions and one simple query. Invalidating the parser cache of that article means that all the complex parser functions are evaluated again the next time a user views the article.
Cache Updator
The Cache Updator is triggered by the Java-based Update Triggerer. It provides an API that offers two methods. The first method, named getQueries, enables the Update Triggerer to ask for a list of queries, which should be updated during the next update cycle. The second method, named updateQuery, then enables the Update Triggerer to actually update the cached results of that query.
The list of queries returned by getQueries is built using a heuristic formula that determines each query's Update Priority (UP). The formula is based on the following characteristics and parameters:
- Cache entry age (CA): The CA value of a query is the difference between the current timestamp and the timestamp at which the query was last updated, either by an article edit or purge action or by the Cache Updator.
- Access frequency (AF): The AF value of a query is increased by one whenever an article that contains the query is accessed by a user via a view, edit or purge action. When the query's cache entry is updated by the Cache Updator, the AF value is either reset to zero or lowered, depending on a parameter that the Administrator can configure. Lowering it gradually reduces the priority of a query that was used often in the past but is now used rarely.
- Invalidation frequency (IF): The IF value of a query is increased by one whenever its cache entry is invalidated by the Cache Invalidator. A configurable parameter is also available to decrease the IF value when the query is updated by the Cache Updator.
- Invalidated (I): I denotes whether a cache entry has been invalidated, where '1' denotes an invalid entry and '0' denotes a valid cache entry.
- Cache entry age weight (CAW): This is a configurable parameter that an Administrator can use to set the importance of the CA value when computing a query's Update Priority.
- Access frequency weight (AFW): This is a configurable parameter that an Administrator can use to set the importance of the AF value when computing a query's Update Priority.
- Invalidation frequency weight (IFW): This is a configurable parameter that an Administrator can use to set the importance of the IF value when computing a query's Update Priority.
- Invalidated weight (IW): This configurable parameter enables an Administrator to prioritize cache entries that have been invalidated.
The formula for computing a query's Update Priority is as follows:
UP = CA*CAW + AF*AFW + IF*IFW + I*IW
The Cache Updator returns the top X queries, that is, the X queries with the highest computed Update Priority.
The Cache Updator updates a cache entry when it receives the ID of a query via its second API method, updateQuery. It then executes the query and stores the result in the cache. If configured by the administrator, the parser cache entries of affected articles are invalidated as described above.
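The Update Priority formula and the selection of the top X queries can be sketched in a few lines of Python. This is a minimal illustration, not the QRC implementation: the class and function names are assumptions, and the default weights are simply the values listed for QRC_Settings.php in the configuration section of this document.

```python
import heapq
from dataclasses import dataclass


@dataclass
class QueryStats:
    query_id: str
    cache_age: float          # CA, in seconds
    access_freq: float        # AF
    invalidation_freq: float  # IF
    invalidated: bool         # I


@dataclass
class Weights:
    caw: float = 1             # $cacheEntryAgeWeight
    afw: float = 60 * 60       # $accessFrequencyWeight
    ifw: float = 60 * 60       # $invalidationFrequencyWeight
    iw: float = 60 * 60 * 10   # $invalidWeight


def update_priority(q: QueryStats, w: Weights) -> float:
    """UP = CA*CAW + AF*AFW + IF*IFW + I*IW"""
    return (q.cache_age * w.caw
            + q.access_freq * w.afw
            + q.invalidation_freq * w.ifw
            + int(q.invalidated) * w.iw)


def top_queries(stats, w: Weights, limit: int):
    """Return the `limit` queries with the highest Update Priority."""
    return heapq.nlargest(limit, stats, key=lambda q: update_priority(q, w))


stats = [
    QueryStats("a", cache_age=7200, access_freq=1, invalidation_freq=0, invalidated=False),
    QueryStats("b", cache_age=600, access_freq=0, invalidation_freq=1, invalidated=True),
    QueryStats("c", cache_age=100, access_freq=2, invalidation_freq=0, invalidated=False),
]
for q in top_queries(stats, Weights(), limit=2):
    print(q.query_id, update_priority(q, Weights()))
```

With these weights, one recorded access or invalidation counts as much as one hour of cache age, and an invalidated entry gains the equivalent of ten hours, which is why query `b` ranks first despite its young cache entry.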
Update Triggerer
The Java-based Update Triggerer is deployed together with the Triple Store Connector. Every QU.RequestInterval minutes, it asks the Cache Updator for a list of QU.RequestLimit queries. It then sends the queries back to the Cache Updator one after the other so that they are updated, before asking for the next batch of queries to update.
The Java-based Update Triggerer is required because a long-running process (which is needed for the cache updates) cannot be reliably executed by PHP in the background.
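The polling cycle described above can be sketched as follows. The real Update Triggerer is a Java application whose internals are not described here, so this Python version is only an assumption-based illustration: `get_queries` and `update_query` are hypothetical callables standing in for the two API methods, the loop is bounded by a `cycles` argument so that it terminates, and the interval is assumed to be counted from the end of one cycle to the start of the next.

```python
import time
from typing import Callable, Iterable


def run_update_cycles(get_queries: Callable[[int], Iterable[str]],
                      update_query: Callable[[str], None],
                      request_limit: int = 30,          # QU.RequestLimit
                      request_interval_min: float = 10,  # QU.RequestInterval
                      request_delay_ms: float = 0,       # QU.RequestDelay
                      cycles: int = 1,
                      sleep: Callable[[float], None] = time.sleep) -> int:
    """Run a fixed number of update cycles and return the update count."""
    updated = 0
    for cycle in range(cycles):
        # Ask for the queries with the highest update priority ...
        for query_id in get_queries(request_limit):
            # ... and send them back one after the other.
            update_query(query_id)
            updated += 1
            sleep(request_delay_ms / 1000.0)
        # Wait before requesting the next batch (not after the last cycle).
        if cycle < cycles - 1:
            sleep(request_interval_min * 60)
    return updated
```

Passing the `sleep` function as a parameter keeps the sketch testable without actually waiting. A production loop would run indefinitely and would also need error handling for failed HTTP requests.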
Administrator's Guide
The administrator's guide describes how to install and configure the Query Results Cache. Installation and configuration of the MediaWiki part and the Java part are described separately. Please note that the Query Results Cache also works without the Java part, as described above, although the cyclic updates of the query cache are then not available.
Installation
MediaWiki Side
- The Query Results Cache is part of the Halo Extension (since version 1.5.1), so please install the Halo Extension first.
- Edit your LocalSettings.php and add enableQueryResultsCache(); after the Halo Extension has been initialized.
- Run the Halo Extension's database setup script, which is located in extensions/SMWHalo/maintenance/SMW_setup.php.
Java Application
- The Update Triggerer is delivered together with TSC 1.5.1 (in both the basic and the professional version).
- Windows starter: qupdator.exe
- Linux starter: qupdator.sh
- No parameters are required (see configuration below).
Configuration
MediaWiki Configuration
The QRC can be configured in /extensions/SMWHalo/includes/QueryResultsCache/QRC_Settings.php.
    <?php
    /*
     * Weight the cache entry's age when computing a query's update priority.
     *
     * The cache entry's age in seconds * $cacheEntryAgeWeight will be added to the query's update priority.
     */
    $cacheEntryAgeWeight = 1;

    /*
     * Weight the access frequency of a cache entry when computing a query's update priority.
     *
     * The cache entry's access frequency * $accessFrequencyWeight will be added to the query's update priority.
     */
    $accessFrequencyWeight = 60*60;

    /*
     * Weight the invalidation frequency of a cache entry when computing a query's update priority.
     *
     * The cache entry's invalidation frequency * $invalidationFrequencyWeight will be added to the query's update priority.
     */
    $invalidationFrequencyWeight = 60*60;

    /*
     * Weight invalid cache entries when computing a query's update priority.
     *
     * The cache entry's invalidation status (1 or 0) * $invalidWeight will be added to the query's update priority.
     */
    $invalidWeight = 60*60*10;

    /*
     * Decrease a cache entry's access frequency when the cache entry is updated by the Cache Updator.
     *
     * The cache entry's new access frequency will be set to old access frequency * $accessFrequencyAgingFactor.
     */
    $accessFrequencyAgingFactor = 0.9;

    /*
     * Decrease a cache entry's invalidation frequency when the cache entry is updated by the Cache Updator.
     *
     * The cache entry's new invalidation frequency will be set to old invalidation frequency * $invalidationFrequencyAgingFactor.
     */
    $invalidationFrequencyAgingFactor = 0.9;

    /*
     * Denotes whether cache entries will be displayed although they are invalidated.
     *
     * Choose between performance (true) and accuracy (false).
     */
    $showInvalidatedCacheEntries = true;

    /*
     * Denotes whether the Parser Cache of an article will be invalidated if a cache entry for a query in this
     * article has been updated or invalidated.
     *
     * Choose between performance (false) and accuracy (true).
     */
    $invalidateParserCache = true;
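To get a feeling for the two aging factors, the following Python sketch computes how a frequency value decays when it is multiplied by the factor on every update by the Cache Updator. The function names are hypothetical and the starting value of 100 is made up for the example; only the factor 0.9 is taken from the settings above.

```python
def aged_frequency(frequency: float, aging_factor: float, updates: int) -> float:
    """Frequency after `updates` cache updates, each multiplying by the factor."""
    return frequency * aging_factor ** updates


def updates_until_below(frequency: float, aging_factor: float, threshold: float) -> int:
    """Count the updates needed until the frequency drops below `threshold`."""
    if not 0 < aging_factor < 1:
        raise ValueError("aging_factor must be between 0 and 1 (exclusive)")
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    updates = 0
    while frequency >= threshold:
        frequency *= aging_factor
        updates += 1
    return updates


# With the factor 0.9, an access frequency of 100 needs 7 updates
# to fall below half of its original value.
print(updates_until_below(100, 0.9, 50))
```

A factor of 0 would correspond to resetting the value to zero on every update, while a factor close to 1 keeps historical accesses or invalidations influential for a long time.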
Java Configuration
The config file is located in {$tsc-dir}/config/qupdator.properties
    #
    # Required options
    #
    WikiHost=http://localhost/mediawiki

    #
    # Basic HTTP authentication (optional)
    #
    #WikiUser=
    #WikiPassword=

    #
    # Periodic updator options
    #

    # QU.RequestInterval denotes the update interval between 2 update requests (in minutes).
    QU.RequestInterval=10

    # QU.RequestDelay denotes the pause between 2 subsequent update operations (in milliseconds).
    QU.RequestDelay=0

    # QU.RequestLimit denotes the number of query cache entries at one request.
    QU.RequestLimit=30