Being a sole developer is so much harder than working within the supportive framework of, say, a university. Whereas in the past I could pop along to my mates in IT support and ask them to set up a new server, and grizzle at them when it went down, now it's all up to me. Both paid consultancy and hobby projects have really stretched my limited UNIX skills over the past few months.
Today, however, I feel elated. After a few days of struggle, I can now set up virtual hosts with Apache, Jetty and eXist-db. This means that I can at long last provide a clean URL for The Gloucester Road Story and a few others.
Here for the record is a summary of my approach.
Choose a VPS host
There is a confusing number of companies offering UNIX VPS now. I've tried Amazon EC2 (thanks guys for the year's free micro-instance), ElasticHost (very helpful people but a tad expensive) and now I'm trialling BitFolk on a friend's recommendation. The server has about 1 GB of memory, a 20 GB disk and 1 IP address, and costs about ...
Choose a UNIX distro
I've been using CentOS on servers but Ubuntu on desktops, so this time I installed Ubuntu Lucid Lynx to reduce confusion.
Choose Software
Apache2 (even if the tide is going Nginx's way), Java OpenJDK, eXist-db 1.4.2
Configure eXist
eXist is installed from the .jar into /usr/local/eXist. The only problem is in setting passwords for the guest and admin users: it seems almost impossible to get these set correctly using the web admin screens, so I had to resort to using the Java client.
The only changes I made were to enable (in /usr/local/eXist/conf.xml) some additional modules I use, for example math and compression. I also created a new database user for each application.
The resources for each site are all stored in the database. So the gloucesterroadstory site is stored in the collection /db/apps/theroad.
Configure Apache
In addition to the default enabled modules, the following also need to be enabled:
I created files for each site in /etc/apache2/sites-available and made them live with symbolic links in sites-enabled. Here is the configuration I created for the Gloucester Road site:
<VirtualHost *:80>
    ServerAdmin kit.wallace@gmail.com
    ProxyRequests off
    ServerName thegloucesterroadstory.org
    ServerAlias www.thegloucesterroadstory.org
    <Proxy *>
        Allow from all
    </Proxy>
    ProxyPass / http://localhost:8080/exist/rest/db/apps/theroad/
    ProxyPassReverse / http://localhost:8080/exist/rest/db/apps/theroad/
    ProxyPassReverseCookieDomain localhost thegloucesterroadstory.org
    ProxyPassReverseCookiePath / /
    RewriteEngine on
    RewriteRule ^/$ /home.xq [P]
    RewriteRule ^/system - [F]
</VirtualHost>
ServerName thegloucesterroadstory.org
The site's domain name, with a DNS entry pointing to the server's IP address. I'm using 123-reg for DNS management.
<Proxy *> Allow from all </Proxy>
The proxy.conf file denies proxying to all hosts, so that must be overridden here.
ProxyPass / http://localhost:8080/exist/rest/db/apps/theroad/
This is the host, port and path to the application in the eXist database via the REST interface.
ProxyPassReverse / http://localhost:8080/exist/rest/db/apps/theroad/
URLs in headers in HTTP responses are rewritten using this rule. (This directive would make more sense if the arguments were reversed, since that's how they are used.)
ProxyPassReverseCookieDomain localhost thegloucesterroadstory.org
The domain under which cookies are stored on the client needs to be the site domain name, not localhost. In my applications cookies are used for the session identifier because sessions are needed for user login.
ProxyPassReverseCookiePath / /
The path attached to the cookie: just root here (not /exist, which is the default).
RewriteRule ^/$ /home.xq [R]
The domain name alone invokes the main page, home.xq.
RewriteRule ^/system - [F]
Forbid access to the system subcollection. All other paths are passed unchanged.
XQuery coding
Redirects
Use request:get-uri() to get the internal URI (e.g. http://localhost:8080/exist/rest/db/apps/theroad/home.xq) which will then be rewritten using ProxyPassReverse to thegloucesterroadstory.org/home.xq. I use it in this construction to transfer to a different page:
response:redirect-to(xs:anyURI(concat(request:get-uri(),"?action=login-form")))
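Where several pages need the same kind of transfer, the construction above could be wrapped in a small function. This is only a sketch: local:redirect-with-action is a hypothetical name, and it assumes the request and response modules are available under their usual eXist namespaces. The action value is passed through encode-for-uri() so that it is safe to place in a query string.

```xquery
xquery version "1.0";

import module namespace request = "http://exist-db.org/xquery/request";
import module namespace response = "http://exist-db.org/xquery/response";

(: Hypothetical helper, not part of eXist: redirect to the current
   script with an "action" parameter appended to its URI. :)
declare function local:redirect-with-action($action as xs:string) {
    response:redirect-to(
        xs:anyURI(concat(request:get-uri(), "?action=", encode-for-uri($action)))
    )
};

(: Equivalent to the one-line construction shown above. :)
local:redirect-with-action("login-form")
```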
Resource locations
Documents referenced by the HTML page, such as CSS, JavaScript and image files, need to be held in the application collection, since any path above this will fail. This was not the case when the application was called with the full URL. XQuery library modules, for example common library functions, can be placed anywhere.
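As an illustration, a page script might reference its resources with paths relative to the script itself. The file and subcollection names here (css/site.css, images/logo.png) are hypothetical and only assume that those resources have been stored under /db/apps/theroad.

```xquery
xquery version "1.0";

(: Hypothetical file names. Both references are relative to this script,
   so the browser requests them from the site root and the proxy maps
   them into /db/apps/theroad. A reference such as ../shared/site.css
   would point above the proxied collection and could not be served. :)
<html>
    <head>
        <title>Example page</title>
        <link rel="stylesheet" type="text/css" href="css/site.css"/>
    </head>
    <body>
        <img src="images/logo.png" alt="Site logo"/>
    </body>
</html>
```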
Logging transactions
I usually monitor access to sites from within scripts so that application data such as elapsed time can be recorded along with the query string. Before proxying, I logged the host with
request:get-host()
but now I have to log the X-Forwarded-For IP address. I'm only interested in the first in the chain so now I use
tokenize(request:get-header("X-Forwarded-For"),", ")[1]
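The header is absent when a request reaches Jetty directly rather than through Apache, and whitespace after the comma is optional in HTTP list headers. A slightly more defensive variant might look like the sketch below; local:first-forwarded is a hypothetical helper name, and the only eXist function it relies on is the request:get-header() call already used above.

```xquery
xquery version "1.0";

import module namespace request = "http://exist-db.org/xquery/request";

(: Hypothetical helper: return the first address in an X-Forwarded-For
   value, or the empty sequence if the header is absent or blank. :)
declare function local:first-forwarded($header as xs:string?) as xs:string? {
    (: tokenize() yields the empty sequence for a missing or empty header,
       and normalize-space() turns that into the zero-length string. :)
    let $first := normalize-space(tokenize($header, ",")[1])
    return
        if ($first ne "") then $first else ()
};

local:first-forwarded(request:get-header("X-Forwarded-For"))
```

Splitting on the bare comma and trimming afterwards means the result is the same whether the proxy writes "a, b" or "a,b".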
Reflection
I used a lot of sources to get this working. It's difficult to know where to invest time: there are a couple of pages of documentation on the eXist site (here and here) but they don't deal with applications in the database and are incomplete; there is a ton of documentation online; mailing lists to ask; friends to badger; Google to query. All helpful, but in the end careful experimentation is vital. The Apache error log, the Firefox Live HTTP Headers add-on and the Firefox cookie view were valuable tools in debugging.
More work to do, especially on access control, but I'm a happy man today.
Keep up the good documentation!
Apache or Nginx in front of Jetty/eXist-db. I also found that I could not get XQuery scripts to execute in the URL-rewriting pipeline when the scripts were in the database. I need to revisit that issue.