I've often bemoaned the lack of the xsl:analyze-string construct in XQuery. Amongst the workarounds I've used is the rather awful technique of generating a fragment of XSLT and then executing it with the eXist-specific transform module. Priscilla Walmsley also has some functions in her functx module, e.g. functx:get-matches. But I wasn't happy.
Recently I've realized that I can use replace() more intelligently to do the job. Here's the idea encapsulated in a function, which also returns the captured groups as a custom XML element with named children, a bit like PHP's list() construct, which I also miss in XQuery.
    declare function local:match(
        $source  as xs:string,
        $pattern as xs:string,
        $names   as xs:string+,
        $sep     as xs:string
    ) as element(data)? {
        (: build a replacement string such as "$1;;$2", one group reference per name :)
        let $target := string-join(for $i in (1 to count($names)) return concat("$", $i), $sep)
        let $filledTarget := replace($source, $pattern, $target)
        return
            if (contains($filledTarget, $sep))  (: it was matched :)
            then
                element data {
                    for $string at $i in tokenize($filledTarget, $sep)
                    return element {$names[$i]} {normalize-space($string)}
                }
            else ()
    };
and used like this:

    let $s := "x: 123 y:678"
    let $data := local:match($s, "x:\s*(\d+)\s*y:\s*(\d+)", ("x", "y"), ";;")
    return $data
it returns

    <data>
        <x>123</x>
        <y>678</y>
    </data>
and you can access the parsed values as $data/x and $data/y.
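As a small illustration of working with the result, here is a minimal sketch that builds the same data element by hand (so that it runs on its own, without the function) and sums the two values. The xs:integer casts are my own addition; the element content is untyped text, so casting makes the intent explicit.

```xquery
(: the element that local:match returns for the example above, built by hand :)
let $data := <data><x>123</x><y>678</y></data>
(: child elements hold untyped text, so cast before doing arithmetic :)
return xs:integer($data/x) + xs:integer($data/y)
(: → 801 :)
```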
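In this example $target is created as "$1;;$2". After a successful match, the matched groups are inserted into the target, so $filledTarget = "123;;678". tokenize() separates this into ("123", "678"). Finally the element "data" is constructed with the named subelements. If the pattern doesn't match, replace() returns the source unchanged, so the separator is absent and the function returns the empty sequence. Choose a separator which doesn't occur in the source string. Note also that replace() only substitutes the part of the source that the pattern matches, so the pattern should cover the whole string; any unmatched text is left in place and ends up in the first or last value.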
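The intermediate steps can be tried on their own with just the standard replace() and tokenize() functions. This is a minimal sketch using the same input, pattern and separator as the example; the variable names $source and $filled are only for illustration.

```xquery
let $source := "x: 123 y:678"
(: each captured group is substituted for its $n reference in the replacement :)
let $filled := replace($source, "x:\s*(\d+)\s*y:\s*(\d+)", "$1;;$2")
(: $filled is now "123;;678"; splitting on the separator recovers the groups :)
return tokenize($filled, ";;")
(: → ("123", "678") :)
```

Bear in mind that tokenize() treats its second argument as a regular expression and replace() gives "$" and "\" a special meaning in the replacement string, so a separator made of plain characters such as ";;" is the safest choice.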
I've recently found this poor man's parser very useful in writing scrapers.