Converting an indented list to a tree - the power of util:parse #existdb

Joe Wicentowski posted a nice example of the power of XQuery to do text processing.  The problem concerns converting an indented list to a tree.

I find that eXist's util:parse() can be helpful with this kind of transformation. I used this approach in a function to convert from JSON to XML: first generate the XML as a string, then parse the string to XML.

As in Joe's approach, the indented list is first converted to a sequence of line elements ($lines), each of which has a level attribute, e.g.

                                                
<line level="0">The President left at 8:48 am</line>
<line level="1">Administration recommendations on Capitol Hill</line>
<line level="1">Improvements</line>
<line level="1">Richardson’s trip to New York</line>
<line level="1">Health programs</line>
<line level="2">Goals</line>
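
The conversion to $lines itself is not shown here. A minimal sketch of that step might look like the following. It assumes that each level of nesting is marked by exactly one leading tab (the real source text may be indented differently), and local:lines is a hypothetical helper name, not something from Joe's post:

```xquery
xquery version "1.0";

(: Hypothetical helper, not from the original post.
   Assumption: each level of nesting is one leading tab character. :)
declare function local:lines($text as xs:string) as element(line)* {
  for $raw in tokenize($text, "\r?\n")
  (: strip the leading tabs and measure how many characters went away :)
  let $body := replace($raw, "^\t+", "")
  let $level := string-length($raw) - string-length($body)
  where normalize-space($raw) ne ""   (: skip blank lines :)
  return <line level="{$level}">{normalize-space($body)}</line>
};

(: in a string literal, character reference 9 is a tab and 10 a newline :)
local:lines("The President left at 8:48 am&#10;&#9;Health programs&#10;&#9;&#9;Goals")
```

For that three-line input the call should yield three line elements with levels 0, 1 and 2. Note that normalize-space() also collapses runs of whitespace inside the text, which may or may not be wanted.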
 

A recursive function generates nested lists and items as a string, using one-item look-ahead:

                                                
declare function local:nest($lines) {
  let $this := $lines[1]
  let $next := $lines[2]
  (: cast the levels once so they are compared as numbers, not as strings :)
  let $this-level := xs:integer($this/@level)
  let $next-level := xs:integer($next/@level)
  return
      if (empty($next))
      then (: last line: close every list that is still open :)
         concat("<item>", $this, "</item>", string-pad("</list></item>", $this-level))
      else
         if ($this-level = $next-level)  (: in the same list :)
         then concat("<item>", $this, "</item>", local:nest(subsequence($lines, 2)))
         else if ($this-level < $next-level) (: going down :)
         then concat("<item>", $this, "<list>",
                     (: a jump of more than one level needs wrapper items,
                        so that every nested list sits inside an item :)
                     string-pad("<item><list>", $next-level - $this-level - 1),
                     local:nest(subsequence($lines, 2))
                     )
         else  (: $this-level > $next-level, so going up :)
            concat("<item>", $this, "</item>",
                     string-pad("</list></item>", $this-level - $next-level),
                     local:nest(subsequence($lines, 2))
                    )
};

Finally, the string is parsed to XML to create the tree:

   util:parse(local:nest($lines))
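
On an eXist release that supports XQuery 3.0, the standard function fn:parse-xml() should be able to do the same job without the eXist-specific util module. The sketch below uses a hand-written string as a stand-in for the output of local:nest($lines); whether util:parse() is still available depends on the version, so check the function documentation for your release:

```xquery
xquery version "3.0";

(: Sketch only: this literal stands in for the string built by local:nest($lines) :)
let $xml-string := "<item>Health programs<list><item>Goals</item></list></item>"
(: parse-xml() returns a document node, so navigation starts at the root item :)
return parse-xml($xml-string)/item/list/item/string()
```

This should return the string Goals, which shows that the parsed result is a real tree that can be queried with ordinary path expressions.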

So Joe's text converted looks like this

The function string-pad($s, $n) (deprecated in eXist's XPath function namespace) creates a string of $n copies of $s. You need to be careful to ensure that the string is well-formed XML (I forgot that level changes may not be just +/- 1), so it's a bit tricky to build the string correctly, but at least this kind of processing is very fast in eXist.
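
One more thing that can break well-formedness is the line text itself: concat() inserts it as a plain string, so an ampersand or a less-than sign in a line would make the parse fail. A minimal sketch of a guard, using a hypothetical helper local:escape that is not part of the code above:

```xquery
xquery version "1.0";

(: Hypothetical helper, not part of the original post.
   Escapes the two characters that would break the generated markup.
   The ampersand is handled first, so that the ampersand introduced by
   the less-than escape is not escaped a second time. :)
declare function local:escape($text as xs:string) as xs:string {
  replace(replace($text, "&amp;", "&amp;amp;"), "<", "&amp;lt;")
};

(: the literal below is the text  R&D < budget  written as an XQuery string :)
concat("<item>", local:escape("R&amp;D < budget"), "</item>")
```

In local:nest this would mean writing local:escape($this) wherever $this is concatenated into the string.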

Thanks to both of you! I'm just writing a drill-in menu in XForms and my source data for the choices is text. Now I have not just one, but two approaches.

@Chris: Thanks so much for your kind words and for posting your approach. I intend to look more into string-pad(); we'll have to find an alternative now that it is deprecated.

@Mark: Glad to hear these are helpful, Mark.

@Joe and @Mark - glad you found it useful.

@Joe I don't think string-pad($s, $n) is anything more than

string-join(for $i in 1 to $n return $s,"")