The Wallace Line.

regexp parsing in XQuery with replace()

I've often bemoaned the lack of XSLT's xsl:analyze-string construct in XQuery. Amongst the workarounds I've used is the rather awful technique of generating a fragment of XSLT and then executing it with the eXist-specific transform module. Priscilla Walmsley also has some functions in her functx module, e.g. functx:get-matches. But I wasn't happy.

Recently I've realized that I can use replace() more intelligently to do the job. Here's the idea encapsulated in a function, which also generates a custom XML node, a bit like the list function of PHP which I also miss in XQuery.



declare function local:match(
       $source as xs:string,
       $pattern as xs:string,
       $names as xs:string+,
       $sep as xs:string) as element(data)? {
   (: build a replacement string such as "$1;;$2", one group reference per name :)
   let $target := string-join(for $i in (1 to count($names)) return concat("$", $i), $sep)
   let $filledTarget := replace($source, $pattern, $target)
   return
     if (matches($source, $pattern)) (: replace() returns $source unchanged when nothing matches :)
     then
       element data
        { for $string at $i in tokenize($filledTarget, $sep)
          return
            element {$names[$i]} {normalize-space($string)}
        }
     else () (: no match: empty sequence, hence the optional return type :)
};

It is used like this:



let $s := "x: 123 y:678"
let $data := local:match($s,"x:\s*(\d+)\s*y:\s*(\d+)",("x","y"),";;")
return $data

returns



<data>
 <x>123</x>
 <y>678</y>
</data>

The parsed values can then be accessed as $data/x and $data/y.

Here $target is created as "$1;;$2". After a successful match, the matched groups are inserted into the target, so $filledTarget = "123;;678". tokenize() separates this into ("123","678"). Finally the element "data" is constructed with the named subelements. Choose a separator that doesn't occur in the source string and, because tokenize() treats its second argument as a regular expression, one that contains no regex metacharacters.
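To make the intermediate values concrete, here is a small trace of the two replace() outcomes. It is only a sketch: the trace, target, hit and miss element names and the non-matching sample string are made up for illustration and are not part of the function above.

```xquery
xquery version "1.0";

(: Illustrative trace only; element names and the second sample string are assumptions. :)
let $pattern := "x:\s*(\d+)\s*y:\s*(\d+)"
let $sep := ";;"
(: one group reference per expected value, joined by the separator :)
let $target := string-join(for $i in (1 to 2) return concat("$", $i), $sep)
return
  <trace>
    <target>{$target}</target>
    <hit>{replace("x: 123 y:678", $pattern, $target)}</hit>
    <miss>{replace("no coordinates here", $pattern, $target)}</miss>
  </trace>
```

The target element holds $1;;$2 and hit holds 123;;678, while miss holds the input string unchanged. replace() leaves a non-matching string untouched, which is why the function has to test for a match before building the data element.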

I've recently found this poor man's parser very useful in writing scrapers.
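As a rough idea of how this might look in a scraper, the sketch below runs the function over several lines of text and keeps only the lines that parse. The sample lines are invented, and local:match is repeated here (testing for a match with matches()) purely so that the query runs on its own.

```xquery
xquery version "1.0";

(: Same technique as local:match, repeated so the sketch is self-contained. :)
declare function local:match(
       $source as xs:string,
       $pattern as xs:string,
       $names as xs:string+,
       $sep as xs:string) as element(data)? {
   let $target := string-join(for $i in (1 to count($names)) return concat("$", $i), $sep)
   return
     if (matches($source, $pattern))
     then
       element data
        { for $string at $i in tokenize(replace($source, $pattern, $target), $sep)
          return element {$names[$i]} {normalize-space($string)}
        }
     else ()
};

(: Hypothetical scraped text: two readings and one line that does not match. :)
let $lines := ("x: 1 y: 2", "heading", "x: 30 y: 40")
for $line in $lines
return local:match($line, "x:\s*(\d+)\s*y:\s*(\d+)", ("x", "y"), ";;")
```

Because a failed match yields the empty sequence, the "heading" line simply drops out of the result, leaving one data element for each of the two lines that parsed.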