Virtual Hosts with Apache and eXist-db

Being a sole developer is so much harder than working within the supportive framework of, say, a university. Whereas in the past I could pop along to my mates in IT support and ask them to set up a new server, and grizzle at them when it went down, now it's all up to me. Both paid consultancy and hobby projects have really stretched my limited UNIX skills over the past few months.

Today, however, I feel elated. After a few days of struggle, I can now set up virtual hosts with Apache, Jetty and eXist-db. This means that I can at long last provide a clean URL for The Gloucester Road Story and a few others.

Here, for the record, is a summary of my approach.

Choose a VPS host

There is a confusing number of companies offering UNIX VPS hosting now. I've tried Amazon EC2 (thanks, guys, for the year's free micro-instance) and ElasticHost (very helpful people but a tad expensive), and now I'm trialing BitFolk on a friend's recommendation. The server has about 1 GB, 20 GB disk, 1 IP address and costs about ...

Choose a UNIX distro

I've been using CentOS on servers but Ubuntu on desktops, so this time I installed Ubuntu Lucid Lynx to reduce confusion.

Choose Software

Apache2 (even if the tide is going Nginx's way), Java OpenJDK, eXist-db 1.4.2

Configure eXist

eXist is installed from the .jar into /usr/local/eXist. The only problem is in setting passwords for the guest and admin users. It seems almost impossible to get these set right using the web admin screens, so I had to resort to using the Java client.

The only changes I made were to enable (in /usr/local/eXist/conf.xml) some additional modules I use, math and compression for example. I also created a new database user for each application.

The resources for each site are all stored in the database. So the gloucesterroadstory site is stored in the collection /db/apps/theroad.

Configure Apache

In addition to the default enabled modules, the following also need to be enabled: 

  • proxy.conf
  • proxy.load
  • proxy_http.load
  • rewrite.load

I created files for each site in /etc/apache2/sites-available and made them live with symbolic links in sites-enabled. Here is the configuration I created for the Gloucester Road site:

 <VirtualHost *:80>
    ServerAdmin kit.wallace@gmail.com
    ProxyRequests off
    ServerName thegloucesterroadstory.org
    ServerAlias www.thegloucesterroadstory.org
    <Proxy *>
         Allow from all
    </Proxy>
    ProxyPass / http://localhost:8080/exist/rest/db/apps/theroad/ 
    ProxyPassReverse / http://localhost:8080/exist/rest/db/apps/theroad/ 
    ProxyPassReverseCookieDomain localhost thegloucesterroadstory.org 
    ProxyPassReverseCookiePath / / 
    RewriteEngine on 
    RewriteRule ^/$   /home.xq [P] 
    RewriteRule ^/system  -  [F]  
 </VirtualHost> 

ServerName thegloucesterroadstory.org   The site's domain name, with a DNS entry pointing to the server's IP address. I'm using 123-reg for DNS management.

<Proxy *> Allow from all </Proxy>   The proxy.conf file denies proxying to all hosts, so that must be overridden here.

ProxyPass / http://localhost:8080/exist/rest/db/apps/theroad/   This is the host, port and path to the application in the eXist database via the REST interface.

ProxyPassReverse / http://localhost:8080/exist/rest/db/apps/theroad/   URLs in the headers of HTTP responses are rewritten using this rule. (This command would make more sense if the arguments were reversed, since that's how they are used.)

ProxyPassReverseCookieDomain localhost thegloucesterroadstory.org   The domain under which cookies are stored on the client needs to be the site domain name, not localhost. In my applications, cookies are used for the session identifier because sessions are needed for user login.

ProxyPassReverseCookiePath / /   The path attached to the cookie: just root here (not /exist, which is the default).

RewriteRule ^/$ /home.xq [P]   The domain name alone invokes the main page, home.xq.

RewriteRule ^/system - [F]   Forbid access to the system subcollection. All other paths are passed unchanged.

XQuery coding

Redirects

Use request:get-uri() to get the internal URI (e.g. http://localhost:8080/exist/rest/db/apps/theroad/home.xq), which is then rewritten by ProxyPassReverse to thegloucesterroadstory.org/home.xq. I use it in this construction to redirect to a different page:

response:redirect-to(xs:anyURI(concat(request:get-uri(),"?action=login-form")))

Resource locations

Documents referenced by the HTML page, such as CSS, JavaScript and image files, need to be held in the application collection, since any path above this will fail. This was not the case when the application was called with the full URL. XQuery library modules, for example common library functions, can be placed anywhere.
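As a rough illustration of the point about relative paths, a page script in /db/apps/theroad might reference its resources like the sketch below. The css, js and images subcollections and the file names are made up for the example; they are not taken from the real site.

```xquery
xquery version "1.0";

(: Sketch only: css/site.css, js/site.js and images/logo.png are
   hypothetical resources stored below /db/apps/theroad. :)
declare option exist:serialize "method=xhtml media-type=text/html";

<html>
    <head>
        <title>The Gloucester Road Story</title>
        <!-- relative paths resolve inside the application collection -->
        <link rel="stylesheet" type="text/css" href="css/site.css"/>
        <script type="text/javascript" src="js/site.js"></script>
    </head>
    <body>
        <img src="images/logo.png" alt="logo"/>
    </body>
</html>
```

Through the proxy, the browser asks for /css/site.css on the site domain, and ProxyPass maps that to /db/apps/theroad/css/site.css. A reference that climbs above the application collection has nothing to map to, because the proxied root is the collection itself.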

Logging transactions

I usually monitor access to sites from within scripts so that application data such as elapsed time can be recorded along with the query string. Before proxying, I logged the host with

request:get-host()

but now I have to log the X-Forwarded-For IP address. I'm only interested in the first in the chain, so now I use

tokenize(request:get-header("X-Forwarded-For"),", ")[1]
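The header is only present when a request arrives through the proxy, so a slightly more defensive version may be worth having. The following is a minimal sketch, not the code running on the site: local:client-ip() is a hypothetical helper, the access element and its attribute names are invented for illustration, and it assumes request:get-remote-addr() and util:system-dateTime() are available in the eXist version in use.

```xquery
xquery version "1.0";

declare namespace request = "http://exist-db.org/xquery/request";
declare namespace util = "http://exist-db.org/xquery/util";

(: Hypothetical helper: first address in X-Forwarded-For, falling back
   to the connecting address when the header is absent, for example
   when the script is called directly on port 8080. :)
declare function local:client-ip() as xs:string {
    let $forwarded := request:get-header("X-Forwarded-For")
    return
        if (normalize-space($forwarded) ne "")
        then normalize-space(tokenize($forwarded, ",")[1])
        else request:get-remote-addr()
};

(: Illustrative log record; element and attribute names are made up. :)
<access ip="{local:client-ip()}"
        uri="{request:get-uri()}"
        query="{request:get-query-string()}"
        at="{util:system-dateTime()}"/>
```

Splitting on the comma alone and then trimming with normalize-space() copes with both "a, b" and "a,b" forms of the header.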

Reflection

I used a lot of sources to get this working. It's difficult to know where to invest time: there are a couple of pages of documentation on the eXist site, but they don't deal with applications in the database and are incomplete; there is a ton of documentation online; mailing lists to ask; friends to badger; Google to query. All helpful, but in the end careful experimentation is vital. The Apache error log, the Firefox Live HTTP Headers add-on and the Firefox cookie view were valuable tools in debugging.

More work to do, especially on access control, but I'm a happy man today.

Glad to see you have had success. I have the same problem as a single developer. Every month or so I have to put on my "Unix" hat to configure everything. It is interesting to see you use Apache to do this. Many others are doing it inside eXist, using the collection-config.xml and URL rewriting. But I think that using Apache also has advantages.

Keep up the good documentation!

Thanks Dan. One reason for using Apache is that I need to host a plain HTML site, updated via FTP, in addition to the eXist-db sites. There is also a lack of consensus in the eXist-db documentation on best practice. Adam's advice is still to use Apache or Nginx in front of Jetty/eXist-db. I also found that I could not get XQuery scripts to execute in the URL-rewriting pipeline when the scripts were in the database. I need to revisit that issue.