SMW+ ships with several caches to minimize latency. Nevertheless, the wiki administrator should understand how these caches are configured so that the configuration can be tuned if the responsiveness of the wiki degrades. This article explains the caches SMW+ ships with and how to install Squid in front of it.
If you are using an earlier version, please go to the product documentation SMW+ and select the relevant version.
Applies to version: SMW+ 1.5.3
Speeding up the performance of a wiki using SMW and SMW+
This article describes how the performance of an SMW+ installation can be measured and improved. It is the result of a performance evaluation of SMWforum, and some details are specific to that installation.
Measuring the performance of SMW+
You can measure the wiki performance with the Net Panel of Firebug. Several actions of the wiki should be tested.
Install Firebug for Firefox
You can measure the access times to a web page in Firefox with the plugin Firebug. It contains a net panel in the tab Net that gives detailed information about the load times of a web page and all of its pieces. Please read
- the Introduction to the Net Panel and
- the Net Panel Timings.
With this tool and knowing how to read its diagrams you are well equipped to measure the performance of your wiki.
Besides measuring the performance manually you can also automate this task, for example with Apache JMeter. However, this is far beyond the scope of this article.
Testing different actions in the wiki
The wiki can do several things for you. You can read articles, edit and save them, log in and out, work interactively with toolbars that use Ajax calls, etc.
Besides the detailed waterfall graph, the Net Panel shows the total time for loading a page at the bottom. When it reports something like "5.13s (onload: 3.45s)", the first value is the one that matters to the user: only after this time is the page really complete and ready to work with.
Here is an open-ended list of actions you can test in your wiki:
- View a simple article with just some lines of text.
- View a large article with complex templates, images etc.
- View an article with semantic queries, both with and without a TSC connected.
- Open an article in wikitext edit mode.
- Open an article in WYSIWYG edit mode.
- Open an article in the Advanced Annotation Mode.
- Save an article in the different edit modes.
- ...
You can repeat these tests with different extensions enabled. Be aware that each extension will add an extra penalty to the response times. So keep your wiki as lean as possible.
You should measure each action you are interested in several times (e.g. 5 times) to get a meaningful average of access times. Normally the first request for an action is the slowest because the browser caches are still being filled. From then on the action should be much faster.
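The averaging described above can be sketched in a few lines of Python. This is only an illustration and not part of SMW+: the function name summarize_timings and the sample values are made up, and the input is assumed to be the total load times read manually from the Net Panel.

```python
from statistics import mean


def summarize_timings(seconds):
    """Summarize repeated load times (in seconds) of one wiki action.

    The first measurement is reported separately because the browser
    caches are still empty at that point (cold load).
    """
    if len(seconds) < 2:
        raise ValueError("need at least two measurements")
    cold, warm = seconds[0], seconds[1:]
    return {
        "cold": cold,
        "warm_mean": round(mean(warm), 2),
        "warm_min": min(warm),
        "warm_max": max(warm),
    }


# Hypothetical totals read from the Net Panel for five views of one page.
samples = [5.13, 2.40, 2.65, 2.30, 2.45]
summary = summarize_timings(samples)
print(summary)
```

Keeping the cold load out of the average avoids one slow first request distorting the figure you compare against the table of usual response times.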
Usual response times
We tested a fresh installation of SMW+ 1.5.3 Community Edition on a dual core laptop that acts both as server and client in the following configuration:
- SMW+ 1.5.3 Community Edition
- Collaboration extension
- Triple Store (ontobroker)
- SOLR server for Faceted Search
The actions were performed on the Main Page and a small article with just a few lines of text.
Here is a table of the response times you can expect for this configuration:
| Action | Response time in seconds |
|---|---|
| View a page | 2.0 - 3.0 |
| View a page with complex queries (with or without TSC) | 3.5 - 4.5 |
| Open an article in wikitext edit mode | 2 - 3.5 |
| Save an article in wikitext edit mode | 3 - 5 |
| Open an article in WYSIWYG edit mode | 3 - 5 |
| Save an article in WYSIWYG edit mode | 3 - 5 |
| Open an article in Advanced Annotation mode | 2.5 - 3.5 |
| Save annotations in Advanced Annotation mode | 1 - 2 |
Please be aware that these measurements were taken on a single machine, i.e. the browser ran on the server. A slow network connection will add further delays.
Check your installation
Before we describe the various ways to optimize the performance of your system in the next sections, you should make sure that your system is set up properly. If you experience access times that are far slower than those in the table above, check the following:
- Are the services running?
- SMW+ can be configured to use a Triple Store or the SOLR server. If this is the case, you must make sure that these services are actually running. Otherwise the system has to wait for timeouts, which causes a much higher latency.
- Are there needless extensions?
- Every extension that is added to the system has a performance footprint. The more extensions are installed, the slower the system will react. Get rid of extensions that you do not use.
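Whether the configured services are reachable can be checked with a small script such as the following sketch. It only tests whether a TCP port accepts connections. Port 8092 is the TSC endpoint used in the troubleshooting section of this article; port 8983 for SOLR is an assumption (the usual Solr default) and may differ in your installation. The names is_port_open and check_services are made up for this example.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# 8092: TSC endpoint as configured later in this article.
# 8983: ASSUMPTION (common Solr default); adjust to your setup.
SERVICES = {
    "Triple Store": ("127.0.0.1", 8092),
    "SOLR": ("127.0.0.1", 8983),
}


def check_services(services=SERVICES):
    """Map each service name to True (reachable) or False (not reachable)."""
    return {
        name: is_port_open(host, port)
        for name, (host, port) in services.items()
    }


if __name__ == "__main__":
    for name, ok in check_services().items():
        print(f"{name}: {'running' if ok else 'NOT reachable'}")
```

An open port does not prove that the service works correctly, but a closed one explains timeouts immediately.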
Performance tuning documentation
A lot of information about accelerating wikis and web servers is available on the web. You should at least read the first article of the following list before reading on:
- Caching in MediaWiki:
- Basics for caching in MediaWiki: http://www.mediawiki.org/wiki/Manual:Cache
- Configuring MediaWiki's caches: http://www.mediawiki.org/wiki/Manual:Configuration_settings#Cache
- Apache:
- Google caching recommendations: http://code.google.com/intl/de-DE/speed/page-speed/docs/caching.html
- Background information about HTTP caching in general: http://www.mnot.net/cache_docs/
Caches
Several caches accelerate the wiki in the following order according to the flow of the request for an article:
- Squid
- Apache cache
- MediaWiki's parser cache
- Query result cache
We will explain these caches in the following sections.
Squid
Squid is a high-performance proxy server that can also be used as an HTTP accelerator for the web server. Squid stores a copy of the pages served by the web server, and the next time the same page is requested, Squid serves the copy.
Squid is only effective when articles are requested by anonymous users. At least, this holds for the HTML part of an article. However, requests for articles usually consist of several files like stylesheets (css), scripts (js) and images. These can be served by Squid as well (for logged-in users as well as anonymous users).
If Squid can serve articles, the performance is drastically improved. In SMWforum the request for the article's body can be answered in under 100ms instead of a few seconds.
See Squid caching for installation instructions.
Apache cache
When a request cannot be answered by Squid, it is sent to Apache. Static files can be cached by Apache if the cache is turned on. See the section Web server settings for the configuration of Apache.
Parser Cache
The next cache in the chain is MediaWiki's parser cache. It stores the complete HTML that the parser has generated from the wiki text of an article and thus can speed up page access considerably. On complex pages this can save several seconds.
The parser cache has to be activated in LocalSettings.php. See Parser Cache.
However, in SMWforum the parser cache is disabled by several extensions. Since the cache speeds up complex articles considerably, it should be verified whether disabling it is really necessary in the following cases:
- /extensions/HashingFunctions/HashingFunctions.php
- lines 50 and 54
- /extensions/ApplicationProgramming/URLArguments/URLArguments.php
- line 46
- Question: Is the parser function actually used in an article or can this extension be disabled?
- /extensions/HTMLets/HTMLets.php
- line 67
- needed for the banner on the main page but seems to work also if option nocache="yes" is not set.
- /extensions/NoTitle/notitle.php
- line 50
We recommend finding all extensions that disable the parser cache and evaluating whether they do so for a valid reason.
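A search like this can be automated. The following sketch walks an extensions directory and lists every line in a PHP file that appears to disable the parser cache. It assumes that extensions do this by calling disableCache() on the parser object; extensions that use another mechanism would need additional patterns. The function name find_cache_disablers and the relative path "extensions" are made up for this example.

```python
import os
import re

# ASSUMPTION: the parser cache is disabled via a call to disableCache().
PATTERN = re.compile(r"disableCache\s*\(")


def find_cache_disablers(extensions_dir):
    """Yield (path, line_number, line) for each matching line in *.php files."""
    for root, _dirs, files in os.walk(extensions_dir):
        for name in sorted(files):
            if not name.endswith(".php"):
                continue
            path = os.path.join(root, name)
            with open(path, encoding="utf-8", errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if PATTERN.search(line):
                        yield path, number, line.strip()


if __name__ == "__main__":
    # Hypothetical location; run this from the MediaWiki installation directory.
    for path, number, line in find_cache_disablers("extensions"):
        print(f"{path}:{number}: {line}")
```

Each hit is only a candidate: the surrounding code still has to be read to decide whether the call is needed.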
Query result cache
The query result cache (QRC) was implemented by ontoprise. Its features and the installation process are described in query result cache. We will give a short overview of the benefits of this cache.
As the name indicates, the QRC caches the results of a query. The idea behind this is that queries can take a long time if they retrieve lots of data or if they are complex. In this case the QRC simply returns a previously calculated result which is then processed as usual by the query result printers. Queries that are processed quickly are barely accelerated by the QRC.
When is the query result cache not used?
The QRC is not used when the content of a page is recalculated. This happens when it is
- edited and saved, (To be precise, this depends on the circumstances under which the article is saved. If the parser cache is disabled or invalidated, an article is parsed twice when it is saved. During the first parse cycle, the QRC is not used, because users should get the most current query results if they edit an article. During the second parse cycle, the QRC is used, if the parser cache does not provide valid entries.)
- purged (e.g. with ?action=purge),
- returned from the parser cache,
- returned by Squid.
When is the query result cache used and effective?
The QRC is used if all of the following conditions are met:
- There is no Squid, or Squid is bypassed because the user is logged in.
- The parser cache is empty, or it is disabled, either in general or depending on the content of an article. In the latter case extensions disable the parser cache temporarily. (However, instead of relying on the QRC you should investigate which extensions disable the parser cache and whether this is really mandatory. The parser cache accelerates the wiki more than the QRC.)
- A page contains queries and these have been processed at least once.
- Processing of the queries by the Triple Store or SMW store takes a long time. (Otherwise the effect of the QRC is hardly noticeable.)
- Example
- We profiled the article Help:SMW+ which contains 42 queries. 11 of them are answered by the triple store and 31 by the SMW store. The triple store needed 1.8 seconds and SMW 0.5 seconds when the QRC was empty. The second time the page was opened, the QRC was filled and all queries were answered in 0.5 seconds.
Other benefits of the QRC
Even if the QRC cannot accelerate the wiki because of the restrictions explained above, it has another great benefit: it can keep all wiki articles that contain queries up to date, even if they are cached.
Normally the caching strategy fails if the result of a query changes, as the cache is unaware of this. Let's assume that article A contains a query for facts that are defined in article B. If these facts are modified, the cache of B will be purged as the system knows that B was modified by an edit operation. However, article A will be delivered in its outdated form as the cache does not know about its dependency on B.
The QRC knows which properties affect the query result in an article and it can invalidate the page's cache when such a property is changed. In this case A will be updated after the facts in B were changed.
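The idea of property-based invalidation can be illustrated with a toy model. The sketch below is not the QRC implementation; it only shows the bookkeeping involved under simplifying assumptions. The class name, the method names and the property "Population" are invented for this example.

```python
from collections import defaultdict


class QueryDependencyIndex:
    """Toy model: remember which pages contain queries on which properties."""

    def __init__(self):
        self._pages_by_property = defaultdict(set)
        self.page_cache = {}  # page title -> cached HTML

    def register_query(self, page, properties):
        """Record that `page` contains a query that reads `properties`."""
        for prop in properties:
            self._pages_by_property[prop].add(page)

    def property_changed(self, prop):
        """Drop the cached HTML of every page whose queries read `prop`."""
        stale = sorted(self._pages_by_property.get(prop, ()))
        for page in stale:
            self.page_cache.pop(page, None)
        return stale


index = QueryDependencyIndex()
index.page_cache = {"A": "<p>old result</p>", "C": "<p>no queries</p>"}
index.register_query("A", ["Population"])  # A queries facts defined in B

# Editing B changes a value of "Population": A must be rendered again.
print(index.property_changed("Population"))  # prints ['A']
```

Page C stays cached because none of its content depends on the changed property.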
Browser cache
One of the most important caches is that of the user's browser. The body of a wiki article is never cached, but all other files like stylesheets, scripts and images are. The browser cache can only be applied if the HTTP headers of these files are set correctly by the server (see section Web server settings).
Furthermore, users should be aware of how to control the cache of their browser.
- Load a page
- If the user clicks a link to a wiki article, only the corresponding HTML files are loaded from the server. All other auxiliary files are fetched from the browser cache (if they are stored there). This is the fastest way of loading articles with only few requests.
- Refresh a page
- If the user refreshes a page (presses F5), the browser asks the server whether any of the files the page consists of have changed. This can lead to a large number of requests which are answered with 304 Not Modified. In this case these files are fetched from the browser cache. However, these validity checks can take a long time. With a fast internet connection this may be only a few milliseconds, but with a slow one it can take several seconds (see screenshots).
- CTRL-refresh a page
- If the user triggers a complete refresh (presses CTRL+F5), the browser ignores its cached copies of the page's files and reloads all of them from the server.
See Browser behavior for further information.
Improving PHP performance
This section describes how the performance of the PHP interpreter can be improved.
eAccelerator
The whole server-side wiki software is written in PHP. For every request the main entry script index.php has to load up to several hundred PHP files, which have to be compiled to byte code before they can be executed by the PHP interpreter. There are several PHP accelerators that cache the byte code and provide it quickly when the interpreter needs it.
eAccelerator is one of these accelerators. Find the installation instructions at
With this acceleration a typical request is processed about twice as fast.
Memcache
MediaWiki has several caches of its own, such as the parser cache, a message cache and more. With Memcached as the store, the cached data can be kept in memory across different requests and even across different servers. Its installation is recommended for servers with a high load.
See Memcached for installation instructions.
A precompiled binary version of Memcached is available for Windows. It can be downloaded at [1]. The easiest way to install Memcached is the following:
- Download the zip archive memcached-1.2.6-win32-bin.zip and extract it into an arbitrary directory.
- Open a command line window and change to the directory into which memcached.exe was extracted.
- Run "memcached.exe -d install" once, which installs the Memcached server as a service.
- Then run "memcached.exe -d start" to start the service.
On the next reboot Memcached is started automatically.
The Memcached server appears in the Windows list of services and can be stopped and restarted there. To remove Memcached from your system:
- stop the Memcached server (via the Windows administration panel or on the command line with "memcached.exe -d stop")
- remove the service by running on the command line "memcached.exe -d uninstall"
- delete the executable
Apache configuration
Normally Apache is used as the web server for MediaWiki. Some settings can improve the performance of Apache. If you change any settings of the Apache web server or of the PHP installation, you must restart the web server for the changes to take effect. On Linux this is done with /etc/init.d/apache2 restart. If you use Windows with the XAMPP installation, go to the XAMPP installation directory and double-click xampp_restart.exe.
- Enable compression for Apache.
- All modern browsers can receive and process compressed files. Apache should be configured to compress them on demand. See: mod_deflate
- Enable it by:
a2enmod deflate
- or in a XAMPP installation remove the # of the line:
LoadModule deflate_module modules/mod_deflate.so
- You also need to specify in the Apache settings which files should be compressed. To use compression on CSS, JavaScript, HTML, XML and plain text files, edit the httpd.conf file and add:
<IfModule deflate_module>
AddOutputFilterByType DEFLATE text/html text/plain text/xml text/css application/x-javascript text/javascript
</IfModule>
- Finally restart Apache.
- Enable the cache of Apache.
- Static files that are often delivered by Apache can be cached. See mod_cache for more details. The following command lines activate a memory cache:
cd /etc/apache2/
a2enmod cache mem_cache
/etc/init.d/apache2 restart
- Before you can use the cache, it must also be configured in the httpd.conf file.
- Add expiration to HTTP header of css, js and images
- Certain HTTP headers are required by the browser cache. The field "Expires" can be added with the following command lines. See mod_expires for more details.
a2enmod expires
- or in an XAMPP installation edit the httpd.conf file and remove the # of the line:
LoadModule expires_module modules/mod_expires.so
- Then edit the Apache config file:
# enable expirations
ExpiresActive On
ExpiresByType text/* "access plus 2 months"
ExpiresByType application/x-javascript "access plus 2 months"
ExpiresByType image/* "access plus 2 months"
- And then restart Apache.
- Note: we have observed that files that are compressed by the deflate module can no longer be cached by the browser. So if you do not use Squid or another proxy, it may be useful to find a good trade-off between compressing content and setting an Expires header. One suggestion is to compress only HTML and text, which is normally generated by MediaWiki anyway, so that css, JavaScript and image files are not compressed but can be cached by the browser.
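To see which headers a static file actually receives after these changes, you can request it the way a gzip-capable browser would and inspect the response. The following sketch uses only the Python standard library. The function names and the URL in the comment are made up, and is_browser_cacheable is a deliberately rough heuristic (it only looks for an explicit freshness lifetime), not a full implementation of the HTTP caching rules.

```python
import re
import urllib.request

CACHE_HEADERS = ("Content-Encoding", "Expires", "Cache-Control",
                 "ETag", "Last-Modified")


def cache_headers(url, timeout=10):
    """Request `url` with gzip support announced; return its caching headers."""
    request = urllib.request.Request(
        url, headers={"Accept-Encoding": "gzip, deflate"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return {name: response.headers.get(name) for name in CACHE_HEADERS}


def is_browser_cacheable(headers):
    """Rough check: does the response carry an explicit freshness lifetime?"""
    cache_control = (headers.get("Cache-Control") or "").lower()
    if "no-store" in cache_control or "no-cache" in cache_control:
        return False
    max_age = re.search(r"max-age=(\d+)", cache_control)
    if max_age:
        return int(max_age.group(1)) > 0
    return bool(headers.get("Expires"))


# Hypothetical URL; replace it with a stylesheet or script of your own wiki:
# headers = cache_headers("http://localhost/mediawiki/skins/common/shared.css")
# print(headers, is_browser_cacheable(headers))
```

If a file shows a Content-Encoding but no Expires or Cache-Control header, it is affected by the trade-off described in the note above.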
MediaWiki settings
Set the following variables in LocalSettings.php:
- $wgUseETag = true; // Whether MediaWiki should send an ETag header. [2]
- $wgUseGzip = true; // Use GZip to store cached pages. [3]
Troubleshooting
Sometimes wrong system configurations (server or MediaWiki) lead to slow response times.
Query processing is slow
The wiki accesses the TSC through a web service which is normally configured like this:
$smwgWebserviceEndpoint='localhost:8092';
But sometimes resolving the IP address of 'localhost' takes a long time, as the system may first try to find the IPv6 address and only fall back to the IPv4 address if that fails. Specifying the IP address directly may solve this:
$smwgWebserviceEndpoint='127.0.0.1:8092';
Under Windows it is also possible to define the IP address in the 'hosts' file (typically located here: c:\Windows\System32\drivers\etc\hosts):
# localhost name resolution is handled within DNS itself.
127.0.0.1 localhost # (IP-v4)
#::1 localhost # (IP-v6)
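To find out whether name resolution is the bottleneck, you can time it on the server. The sketch below measures how long the system resolver needs for 'localhost' compared to the literal address. It is only an approximation of what PHP experiences, since PHP may resolve names differently. The function name time_resolution is made up, and port 8092 is taken from the endpoint configuration above.

```python
import socket
import time


def time_resolution(host, port=8092):
    """Return (seconds, addresses) for resolving `host` via the system resolver."""
    start = time.perf_counter()
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        infos = []  # the name could not be resolved at all
    elapsed = time.perf_counter() - start
    # Each entry is (family, type, proto, canonname, sockaddr);
    # the first element of sockaddr is the IP address.
    addresses = [info[4][0] for info in infos]
    return elapsed, addresses


if __name__ == "__main__":
    for name in ("localhost", "127.0.0.1"):
        seconds, addresses = time_resolution(name)
        print(f"{name}: {seconds * 1000:.1f} ms -> {addresses}")
```

If 'localhost' takes noticeably longer than '127.0.0.1', or if an IPv6 address such as ::1 is listed first, the fixes described above are worth trying.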
Use the latest browser generation
Old browsers are usually very slow in processing complex script files. The latest browser generation is highly optimized for scripts and speeds them up considerably. Check out this comparison chart from TechCrunch as of August 2011.
DNS lookup problem
Sometimes the browser has problems resolving the domain name of the wiki site to an IP address. Watch out for the DNS lookup times in Firebug. If problems occur, you should use the IP address of the site or, preferably, contact your system administrator.