Best xml questions in July 2011

Add restrictions to WCF in method/data member

8 votes

I'm new to WCF and I'm trying to add a restriction to a data member.

For example, on this property:

[DataMember]
public string StringValue
{
    get { return stringValue; }
    set { stringValue = value; }
}

I want to set max and min length. I know how to add the restriction to the XML code

<xs:restriction base="xs:string">
  <xs:minLength value="2"/>
  <xs:maxLength value="10"/>
</xs:restriction>

but is there a way to add a restriction straight from the code?

According to MSDN, facets such as maxLength, minLength and length are ignored. There is no declarative way to enforce what you're asking for, as much as I wish there was. This is one of those places where the cracks between the .NET and XML worlds show. The only method I've found for enforcement is to build a message inspector and apply the transform in there.

Conditionally include attribute in XML literal

8 votes

I have the following XML literal:

<input type='radio'
       name={funcName}
       value='true' />

I'd like to include checked='checked' if cond is true.

I've tried this,

<input type='radio'
       name={funcName}
       value='true'
       { if (cond) "checked='checked'" else "" } />

but it doesn't work.

(I'd really like to avoid repeating the whole tag.)

Option also works, which reduces unnecessary use of null:

scala> val checked:Option[xml.Text] = None
checked: Option[scala.xml.Text] = None

scala> val xml = <input checked={checked} />
xml: scala.xml.Elem = <input ></input>

Sorting XML with XSLT - entire XML-schema is not known

7 votes

I am wondering whether XSLT makes it possible to sort an XML file if I don't know the entire XML-schema.

For example I would like to sort the following XML file.
Sort /CATALOG/CD elements by /CATALOG/CD/TITLE

<CATALOG attrib1="value1">
  <DVD2>
    <TITLE>The Godfather2</TITLE>
  </DVD2>
  <CD>
    <TITLE>Hide your heart</TITLE>
    <ARTIST>Bonnie Tyler</ARTIST>
    <COUNTRY>UK</COUNTRY>
    <COMPANY>CBS Records</COMPANY>
    <PRICE>9.90</PRICE>
    <YEAR>1988</YEAR>
  </CD>
  <CD attrib4="value4">
    <TITLE>Empire Burlesque</TITLE>
    <ARTIST>Bob Dylan</ARTIST>
    <COUNTRY>USA</COUNTRY>
    <COMPANY>Columbia</COMPANY>
    <PRICE>
      <CATALOG>
        <CD><TITLE>E</TITLE></CD>
        <CD><TITLE>I</TITLE></CD>
        <CD><TITLE>D</TITLE></CD>
      </CATALOG>
    </PRICE>
    <YEAR>1985</YEAR>
  </CD>
  <CD attrib2="value2">
    <TITLE attrib3="value3">Greatest Hits</TITLE>
    <ARTIST>Dolly Parton</ARTIST>
    <COUNTRY>USA</COUNTRY>
    <COMPANY>RCA</COMPANY>
    <PRICE>9.90</PRICE>
    <YEAR>1982</YEAR>
  </CD>
  <DVD>
    <TITLE>The Godfather1</TITLE>
  </DVD>
</CATALOG>

The output should be:

<CATALOG attrib1="value1">
  <CD attrib4="value4">
    <TITLE>Empire Burlesque</TITLE>
    <ARTIST>Bob Dylan</ARTIST>
    <COUNTRY>USA</COUNTRY>
    <COMPANY>Columbia</COMPANY>
    <PRICE>
      <CATALOG>
        <CD><TITLE>E</TITLE></CD>
        <CD><TITLE>I</TITLE></CD>
        <CD><TITLE>D</TITLE></CD>
      </CATALOG>
    </PRICE>
    <YEAR>1985</YEAR>
  </CD>
  <CD attrib2="value2">
    <TITLE attrib3="value3">Greatest Hits</TITLE>
    <ARTIST>Dolly Parton</ARTIST>
    <COUNTRY>USA</COUNTRY>
    <COMPANY>RCA</COMPANY>
    <PRICE>9.90</PRICE>
    <YEAR>1982</YEAR>
  </CD>
  <CD>
    <TITLE>Hide your heart</TITLE>
    <ARTIST>Bonnie Tyler</ARTIST>
    <COUNTRY>UK</COUNTRY>
    <COMPANY>CBS Records</COMPANY>
    <PRICE>9.90</PRICE>
    <YEAR>1988</YEAR>
  </CD>
  <DVD2>
    <TITLE>The Godfather2</TITLE>
  </DVD2>
  <DVD>
    <TITLE>The Godfather1</TITLE>
  </DVD>
</CATALOG>

The following is one of the many attempts I made:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <!--<CATALOG>-->
    <xsl:for-each select="CATALOG/CD">
      <xsl:sort select="TITLE" />
      <xsl:copy-of select="."/>
    </xsl:for-each>
    <!--</CATALOG>-->
  </xsl:template>
</xsl:stylesheet>

The problem is that, with this XSLT, XML parts outside the CD list are not displayed.
I could uncomment the two commented-out parts of code, but that's exactly what I want to avoid.
In that case if any attributes are added to the CATALOG element, they would not be copied to output XML.
I don't want to re-build the XML file: I just want to do a sort knowing exact information only about some part of the XML-schema.

This functionality is easy to implement using, for example, .NET (with XmlDocument and XmlNode objects) or Python's lxml library, but is it possible with XSLT?

Thanks!

Note: It is not easy to find a sample input XML which will avoid misunderstanding the question in all cases. But I will try to detail the problem as much as I can:

  • only CD elements right under CATALOG should be sorted (for example CD elements under the Bob Dylan section should be left untouched)
  • it is all the same whether elements other than CD (for example DVD and DVD2) are in the beginning or end of the list
  • no elements, attributes, values, or comments should be missing from the output XML; nothing at all should be lost
  • non-CD elements (for example DVD and DVD2) should not be sorted by the TITLE subelement

Keeping on the line of just modifying the identity transformation (which might not be really safe), I think that the following should be equivalent to @Tim's answer.

NOTE: I'm not promoting this technique at all, unless you understand the general behavior of the identity transformation.

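<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- Identity transformation, except that the CD children of the
         top-level CATALOG are emitted last, sorted by TITLE. The
         [not(parent::*)] test keeps CD elements of a nested CATALOG
         (such as the one under PRICE) in their original order. -->
    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@*
                | node()[not(self::CD[parent::CATALOG[not(parent::*)]])]"/>
            <xsl:apply-templates select="CD[parent::CATALOG[not(parent::*)]]">
                <xsl:sort select="TITLE"/>
            </xsl:apply-templates>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>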

or, if you care about the other elements DVD and DVD2, you can do:

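<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- Identity transformation, except that the CD children of the
         top-level CATALOG are emitted first, sorted by TITLE. The
         [not(parent::*)] test keeps CD elements of a nested CATALOG
         (such as the one under PRICE) in their original order. -->
    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@*"/>
            <xsl:apply-templates select="CD[parent::CATALOG[not(parent::*)]]">
                <xsl:sort select="TITLE"/>
            </xsl:apply-templates>
            <xsl:apply-templates
                select="node()[not(self::CD[parent::CATALOG[not(parent::*)]])]"/>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>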
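
For comparison, the DOM-style approach the question mentions can be sketched in Python with only the standard library. This is a minimal sketch and not part of either answer: the function name sort_top_level_cds is hypothetical, it uses xml.etree.ElementTree rather than lxml, and it follows the "sorted CD elements first, everything else after" ordering. Comments are kept through TreeBuilder(insert_comments=True), which assumes Python 3.8 or later.

```python
import xml.etree.ElementTree as ET


def sort_top_level_cds(xml_text: str) -> str:
    """Sort the CD children of the root element by TITLE; copy the rest as-is."""
    # insert_comments=True keeps comment nodes in the tree (Python 3.8+).
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(xml_text, parser=parser)

    children = list(root)
    # Only direct children of the root are considered, so CD elements of a
    # nested CATALOG deeper in the tree keep their original order.
    cds = [child for child in children if child.tag == "CD"]
    others = [child for child in children if child.tag != "CD"]
    cds.sort(key=lambda cd: cd.findtext("TITLE", default=""))

    # Replace the child list in place; attributes on the root are untouched.
    root[:] = cds + others
    return ET.tostring(root, encoding="unicode")
```

Because ElementTree stores the whitespace after an element on that element, the indentation moves along with each reordered child, so the result may need re-indenting if the layout matters.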

Dynamic XML creation in Java

7 votes

I am trying to dynamically create an XML file in Java to display a timetable. I have created a DTD for my XML file and I have an XSL file I would like to use to transform the XML. I don't know exactly how to continue.

What I've tried so far: on the click of a button, a Servlet is called which generates the content of the XML file as a String (inserting the dynamic parts of the XML into the String). I now have a String containing the content of the XML file. I would now like to transform the XML using an XSL file I have on my server and display the result in the page which called the Servlet (doing this via AJAX).

I'm not sure if I'm heading in the right direction; perhaps I shouldn't even create the XML in String form to begin with. So my question is, how do I continue from here? How do I transform the XML string, using the XSL file, and send it as a response to the AJAX call so I can insert the generated code into the page? Or, if this is not the way to do it, how do I create a dynamic XML file in a different way that produces the same result?

You can use JAXP for this. It's part of the standard Java SE API.

import java.io.*;
import javax.xml.transform.*;
import javax.xml.transform.stream.*;

StringReader xmlInput = new StringReader(xmlStringWhichYouHaveCreated);
// ServletContext resource paths are relative to the webapp root and must start with "/".
InputStream xslInput = getServletContext().getResourceAsStream("/file.xsl"); // Or wherever it is. As long as you have it as an InputStream, it's fine.

Source xmlSource = new StreamSource(xmlInput);
Source xslSource = new StreamSource(xslInput);
Result xmlResult = new StreamResult(response.getOutputStream()); // XML result will be written to HTTP response.

Transformer transformer = TransformerFactory.newInstance().newTransformer(xslSource);
transformer.transform(xmlSource, xmlResult);

Java : parsing xml with DOM, DOCTYPE gets erased !

6 votes

How come DOM in Java erases the DOCTYPE when editing XML?

I have this XML file:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE map[ <!ELEMENT map (station*) >
                <!ATTLIST station  id   ID    #REQUIRED> ]>
<favoris>
<station id="5">test1</station>
<station id="6">test1</station>
<station id="8">test1</station>
</favoris> 

My function is very basic:

public static void EditStationName(int id, InputStream is, String path, String name)
        throws ParserConfigurationException, SAXException, IOException,
               TransformerFactoryConfigurationError, TransformerException {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = factory.newDocumentBuilder();
    Document dom = builder.parse(is);

    Element e = dom.getElementById(String.valueOf(id));
    e.setTextContent(name);

    // Write the DOM document to the file
    Transformer xformer = TransformerFactory.newInstance().newTransformer();
    FileOutputStream fos = new FileOutputStream(path);
    Result result = new StreamResult(fos);
    Source source = new DOMSource(dom);

    xformer.setOutputProperty(OutputKeys.STANDALONE, "yes");
    xformer.transform(source, result);
}

It works, but the DOCTYPE gets erased! I get the whole document back without the DOCTYPE part, which is important to me because it is what lets me retrieve elements by ID. How can I keep the DOCTYPE, and why is it erased? I tried several solutions, for example with OutputKeys or DOMImplementation.createDocumentType, but none of them worked...

Thank you!

(This response is, in a way, only a supplement to @Grzegorz Szpetkowski's answer, explaining why it works.)

You lose the doctype definition because you use the Transformer class, which performs an XSL transformation. There is no DOCTYPE declaration or doctype definition object/node in the XSLT tree model. When a parser hands over the document to an XSLT processor, the doctype info is lost and therefore cannot be retained or duplicated. XSLT offers some control over the serialization of the output tree, including adding an <!DOCTYPE ... > declaration with a public or system identifier. The values for these identifiers need to be known beforehand and cannot be read from the input tree. Creating or retaining an embedded DTD or entity declarations is also not supported (although one workaround for this obstacle is to output it as text with disable-output-escaping="yes").

In order to preserve the DTD you need to output your document with an XML serializer instead of an XSL transformation, as Grzegorz already suggested.

Which is the Best Tool for Creating XSL File?

6 votes

Which is the best tool for creating XSLT files?

We use and love Altova XML Editor: not expensive, nice UI, and powerful.

Problem with conversion of org.dom4j.Document to org.w3c.dom.Document and XML Signature

6 votes

I have some classes that already use DOM4J to read XML files and provide getter methods to the data. Now, I need to add the possibility of checking XML digital signatures.

Using org.w3c.dom and following http://java.sun.com/developer/technicalArticles/xml/dig_signature_api/ everything works correctly.

So, I tried to use DOMWriter to convert from org.dom4j.Document to org.w3c.dom.Document, but after this the signature validation doesn't work. I think it happens because DOMWriter is changing the XML tree (as doc4.asXML() seems to show).

I tried to find a setting that would maintain the integrity of the document, but DOMWriter doesn't have such methods.

Below is the code demonstrating the asymmetric conversion.

The file used for tests is http://www.robertodiasduarte.com.br/files/nfe/131090007910044_v1.10-procNFe.xml

Does someone know reasons/workarounds to this?

Thanks (and sorry for my poor English).

package testevalidanfe;

import java.io.File;
import java.io.FileWriter;
import java.io.PrintWriter;
import javax.swing.JOptionPane;
import javax.xml.crypto.dsig.XMLSignature;
import javax.xml.crypto.dsig.XMLSignatureFactory;
import javax.xml.crypto.dsig.dom.DOMValidateContext;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.dom4j.io.XMLWriter;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

public class Testevalidanfe {

    public static void main(String[] args) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        DocumentBuilder db = dbf.newDocumentBuilder();
        Document d = db.parse("exemplo-nfe.xml");

        Node no = d.getElementsByTagNameNS(XMLSignature.XMLNS, "Signature").item(0);

        DOMValidateContext valContext = new DOMValidateContext(new X509KeySelector(), no);
        XMLSignatureFactory fac = XMLSignatureFactory.getInstance("DOM");
        XMLSignature signature = fac.unmarshalXMLSignature(valContext);

        JOptionPane.showMessageDialog(null, "Validation using org.w3c.dom: " + signature.validate(valContext));
        org.dom4j.io.DOMReader domreader = new org.dom4j.io.DOMReader();
        org.dom4j.Document doc4 = domreader.read(d);
        org.dom4j.io.DOMWriter domwriter = new org.dom4j.io.DOMWriter();
        d = domwriter.write(doc4);

        String after = doc4.asXML();

        PrintWriter writer = new PrintWriter(new File("after-convertion.xml"));
        writer.print(after);
        writer.close();

        no = d.getElementsByTagNameNS(XMLSignature.XMLNS, "Signature").item(0);

        valContext = new DOMValidateContext(new X509KeySelector(), no);
        fac = XMLSignatureFactory.getInstance("DOM");
        signature = fac.unmarshalXMLSignature(valContext);

        JOptionPane.showMessageDialog(null, "Validation after convert: " + signature.validate(valContext));
    }
}

package testevalidanfe;

import java.security.Key;
import java.security.PublicKey;
import java.security.cert.X509Certificate;
import java.util.Iterator;
import javax.xml.crypto.AlgorithmMethod;
import javax.xml.crypto.KeySelector;
import javax.xml.crypto.KeySelectorException;
import javax.xml.crypto.KeySelectorResult;
import javax.xml.crypto.XMLCryptoContext;
import javax.xml.crypto.XMLStructure;
import javax.xml.crypto.dsig.SignatureMethod;
import javax.xml.crypto.dsig.keyinfo.KeyInfo;
import javax.xml.crypto.dsig.keyinfo.X509Data;

public class X509KeySelector extends KeySelector {
    public KeySelectorResult select(KeyInfo keyInfo,
                                KeySelector.Purpose purpose,
                                AlgorithmMethod method,
                                XMLCryptoContext context)
    throws KeySelectorException {
        Iterator ki = keyInfo.getContent().iterator();
        while (ki.hasNext()) {
            XMLStructure info = (XMLStructure) ki.next();
            if (!(info instanceof X509Data))
                continue;
            X509Data x509Data = (X509Data) info;
            Iterator xi = x509Data.getContent().iterator();
            while (xi.hasNext()) {
                Object o = xi.next();
                if (!(o instanceof X509Certificate))
                    continue;
                final PublicKey key = ((X509Certificate)o).getPublicKey();
                if (algEquals(method.getAlgorithm(), key.getAlgorithm())) {
                    return new KeySelectorResult() {
                        public Key getKey() { return key; }
                    };
                }
           }
       }
       throw new KeySelectorException("No key found!");
    }

    static boolean algEquals(String algURI, String algName) {
        if ((algName.equalsIgnoreCase("DSA") &&
            algURI.equalsIgnoreCase(SignatureMethod.DSA_SHA1)) ||
            (algName.equalsIgnoreCase("RSA") &&
            algURI.equalsIgnoreCase(SignatureMethod.RSA_SHA1))) {
            return true;
        } else {
            return false;
        }
    }
}

For example, if the original XML starts with:

<nfeProc versao="1.10" xmlns="http://www.portalfiscal.inf.br/nfe">
<NFe xmlns="http://www.portalfiscal.inf.br/nfe">
<infNFe Id="NFe31090807301671000131550010001000216008030809" versao="1.10" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
...

doc4.asXML() returns this:

<nfeProc xmlns="http://www.portalfiscal.inf.br/nfe" versao="1.10">
<NFe>
<infNFe xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" Id="NFe31090807301671000131550010001000216008030809" versao="1.10">
...

I had a closer look at this, and it turns out that the DOM4J DOMWriter is doing something odd w.r.t. namespaces that obviously confuses the canonicalization process. I haven't pinpointed the exact reason, but I think it has to do with DOMWriter inserting extra xmlns attributes in the DOM elements. You can see the effect if you turn on logging for the XML digital signature API (as described in the article you refer to): the canonicalized <SignedInfo> element lacks the namespace declaration in the DOM document produced by DOM4J.

However, instead of using DOMWriter, you can produce a DOM document by transformation, using a DOM4J DocumentSource and a DOMResult.

/**
 * Create a DOM document from a DOM4J document 
 */
static Document copy(org.dom4j.Document orig) {
    try {
        TransformerFactory tf = TransformerFactory.newInstance();
        Transformer t = tf.newTransformer();
        DOMResult result = new DOMResult();
        t.transform(new DocumentSource(orig), result);
        return (Document) result.getNode();
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}

Using the resulting DOM document, the validation works.

http POST error

5 votes

I'm trying to send XML to another web server through RestClient's HTTP POST request. This is the code:

response =  RestClient.post 'https://secure.rowebooks.co.uk/testorders/orders.aspx', :content_type => "text/xml", :myfile => File.read("#{Rails.root}/public/shared/#{@book}.xml")

But I'm getting this error

ERROR 2 Data at the root level is invalid. Line 1, position 1.ERROR3 Object reference not set to an instance of an object.

I've been told that I am receiving that error because the XML file is not in the content of the call. It must be in the content. I have no idea what this means.

Any suggestion / clue will be greatly appreciated.

Thanks

You should be doing it like this:

response =  RestClient.post( 'https://secure.rowebooks.co.uk/testorders/orders.aspx', 
File.read("#{Rails.root}/public/shared/#{@book}.xml"), 'Content-Type' => 'text/xml' )
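
The difference between the two calls is where the XML ends up: in the question it is sent as a form field named myfile, while the server apparently expects it as the raw request body. As a language-neutral illustration, here is a minimal sketch using Python's standard library that builds such a request without sending it. The helper name build_xml_post and the example URL are hypothetical.

```python
import urllib.request


def build_xml_post(url: str, xml_text: str) -> urllib.request.Request:
    """Build a POST request whose body is the XML document itself."""
    return urllib.request.Request(
        url,
        data=xml_text.encode("utf-8"),         # raw body, not a form field
        headers={"Content-Type": "text/xml"},  # tells the server the body is XML
        method="POST",
    )


request = build_xml_post("https://example.invalid/orders", "<order/>")
print(request.get_method())  # POST
print(request.data)          # b'<order/>'
# Sending it would be: urllib.request.urlopen(request)
```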

XPath last occurrence of each element

5 votes

I have XML like

<root>
    <a>One</a>
    <a>Two</a>
    <b>Three</b>
    <c>Four</c>
    <a>Five</a>
    <b>
        <a>Six</a>
    </b>
</root>

and need to select the last occurrence of any child node name in root. In this case, the desired resulting list would be:

<c>Four</c>
<a>Five</a>
<b>
    <a>Six</a>
</b>

Any help is appreciated!

XSLT-based solution:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="xml" indent="yes" omit-xml-declaration="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:template match="root/*">
        <xsl:variable name="n" select="name()"/>
        <xsl:copy-of
            select=".[not(following-sibling::node()[name()=$n])]"/>
    </xsl:template>
</xsl:stylesheet>

Produced output:

<c>Four</c>
<a>Five</a>
<b>
   <a>Six</a>
</b>

Second solution (you can use it as a single XPath expression):

<xsl:template match="/root">
    <xsl:copy-of select="a[not(./following-sibling::a)]
        | b[not(./following-sibling::b)]
        | c[not(./following-sibling::c)]"/>
</xsl:template>
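
Outside XSLT, the same selection is easy to express with a short loop. The following is a minimal Python sketch using the standard library (the function name last_of_each_name is hypothetical). It keeps a child only when no later sibling shares its name, which is the condition the not(following-sibling::...) predicates test.

```python
import xml.etree.ElementTree as ET


def last_of_each_name(parent: ET.Element) -> list:
    """Return the children of `parent` that have no later sibling with the same tag."""
    children = list(parent)
    result = []
    for index, child in enumerate(children):
        # Tags of all siblings that come after this child.
        later_tags = {sibling.tag for sibling in children[index + 1:]}
        if child.tag not in later_tags:
            result.append(child)
    return result


root = ET.fromstring(
    "<root><a>One</a><a>Two</a><b>Three</b><c>Four</c>"
    "<a>Five</a><b><a>Six</a></b></root>"
)
for element in last_of_each_name(root):
    print(ET.tostring(element, encoding="unicode"))
```

Unlike the second XSLT solution, this does not need the element names (a, b, c) to be known in advance.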

Open Source XPath Filter 2.0 implementation

5 votes

Does anyone know of an open source implementation of XPath Filter 2.0, preferably in Java? But any other language would also be fine. The standard is not that new, so something should exist, but I can't find anything...

If there really is nothing adequate, has anyone ever implemented it and could tell me how difficult it is getting there with standard means (DOM model plus XPath)? Just a rough estimate, would it be a matter of days or rather of weeks for 2 people working full time on it?

Did you have a look at the Apache Santuario library?

It comes with a class that implements XML Signature XPath Filter v2.0:

TransformXPath2Filter

Further implementations are listed here (though I haven't checked any of these):

XML Signature XPath Filter2 Interop Report

XML,XSLT transformation

5 votes

I have 2 strings, an XML string I constructed using Java's DOM interface, and an external XSL file I want to bind to that XML file. I tried using Java's transform methods, but without luck (Meaning I can't seem to find any solution for this on the web).

How do I take an XML file and an XSL file and make an html string out of them both?

What I'm trying to do is to inject an XML page into my JSP page.

Just to clarify: This is done in a servlet, not in javascript.

A little more information:

I create the XML at runtime as a string, and the XSL file is stored on the server. What I want to do is display the XML, as transformed by the XSL file, to the user when he clicks on a certain link on the site, and I want to embed that inside an existing JSP page (in order to maintain the standard look of the site).

This is what I've got so far:

String convertedXML = new String();
TransformerFactory factory1 = 
    TransformerFactory.newInstance();
Source xsl = new StreamSource("my.xsl");
Result result11 = null;
try {
    Templates template = factory1.newTemplates(xsl);
    Transformer transformer1 = template.newTransformer();
    Source xml = new StreamSource(xmlString);
    result11 = new StreamResult(convertedXML);
    transformer1.transform(xml, result11);
} catch(Exception e) {
   System.out.println("Not Good");
}

The last line before the catch throws the following error:

javax.xml.transform.TransformerException: java.io.FileNotFoundException:
    at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.getOutputHandler(Unknown Source)
    at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(Unknown Source)
    at controllers.UserController.schedulePage(UserController.java:394)

Could you put your file into WEB-INF and try the following:

String path = "/WEB-INF/my.xsl";
ServletContext context = getServletContext();
InputStream xslIs = context.getResourceAsStream(path);
Source xsl = new StreamSource(xslIs);

Linq-to-XML XElement.Remove() leaves unwanted whitespace

5 votes

I have an XDocument that I create from a byte array (received over tcp/ip).

I then search for specific XML nodes (XElements) and, after retrieving the value, 'pop' it off the XDocument by calling XElement.Remove(). After all of my parsing is complete, I want to be able to log the XML that I did not parse (the remaining XML in the XDocument). The problem is that there is extra whitespace that remains when XElement.Remove() is called. I want to know the best way to remove this extra whitespace while preserving the rest of the format in the remaining XML.

Example/Sample Code

If I receive the following xml over the socket:

<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>44.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications with XML.</description>
   </book>
</catalog>

And I use the following code to parse this xml and remove a number of the XElements:

private void socket_messageReceived(object sender, MessageReceivedEventArgs e)
{
     XDocument xDoc;
     try
     {
         using (MemoryStream xmlStream = new MemoryStream(e.XmlAsBytes))
         using (XmlTextReader reader = new XmlTextReader(xmlStream))
         {
             xDoc = XDocument.Load(reader);
         }

         XElement Author = xDoc.Root.Descendants("author").FirstOrDefault();
         XElement Title  = xDoc.Root.Descendants("title").FirstOrDefault();
         XElement Genre  = xDoc.Root.Descendants("genre").FirstOrDefault();

         // Do something with Author, Title, and Genre here...

         if (Author != null) Author.Remove();
         if (Title  != null) Title.Remove();
         if (Genre  != null) Genre.Remove();

         LogUnparsedXML(xDoc.ToString());

     }
     catch (Exception ex)
     {
         // Exception Handling here...
     }
}

Then the resulting string of XML sent to the LogUnparsedXML method would be:

<?xml version="1.0"?>
<catalog>
   <book id="bk101">



      <price>44.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications with XML.</description>
   </book>
</catalog>

In this contrived example it may not seem like a big deal, but in my actual application the leftover xml looks pretty sloppy. I have tried using the XDocument.ToString overload that takes a SaveOptions enum to no avail. I have also tried to call xDoc.Save to save out to a file using the SaveOptions enum. I did try experimenting with a few different linq queries that used XElement.Nodes().OfType<XText>() to try to remove the whitespace, but often I ended up taking the whitespace that I wish to preserve along with the whitespace that I am trying to get rid of.

Thanks in advance for assistance.

Joe

It's not easy to answer in a portable way, because the solution heavily depends on how XDocument.Load() generates whitespace text nodes (and there are several implementations of LINQ to XML around that might disagree about that implementation detail).

That said, it looks like you're never removing the last child (<description>) from the <book> elements. If that's indeed the case, then we don't have to worry about the indentation of the parent element's closing tag, and we can just remove the element and all its following text nodes until we reach another element. TakeWhile() will do the job.

EDIT: Well, it seems you need to remove the last child after all. Therefore, things will get more complicated. The code below implements the following algorithm:

  • If the element is not the last element of its parent:
    • Remove all following text nodes until we reach the next element.
  • Otherwise:
    • Remove all following text nodes until we find one containing a newline,
    • If that node only contains a newline:
      • Remove that node.
    • Otherwise:
      • Create a new node containing only the whitespace found after the newline,
      • Insert that node after the original node,
      • Remove the original node.
  • Remove the element itself.

The resulting code is:

public static void RemoveWithNextWhitespace(this XElement element)
{
    IEnumerable<XNode> textNodes
        = element.NodesAfterSelf().TakeWhile(node => node is XText);
    if (element.ElementsAfterSelf().Any()) {
        // Easy case, remove following text nodes.
        textNodes.ToList().ForEach(node => node.Remove());
    } else {
        // Remove trailing whitespace.
        textNodes.Cast<XText>().TakeWhile(text => !text.Value.Contains("\n"))
                 .ToList().ForEach(text => text.Remove());
        // Fetch text node containing newline, if any.
        XText newLineTextNode
            = element.NodesAfterSelf().OfType<XText>().FirstOrDefault();
        if (newLineTextNode != null) {
            string value = newLineTextNode.Value;
            if (value.Length > 1) {
                // Composite text node, trim until newline (inclusive).
                newLineTextNode.AddAfterSelf(
                    new XText(value.Substring(value.IndexOf('\n') + 1)));
            }
            // Remove original node.
            newLineTextNode.Remove();
        }
    }
    element.Remove();
}

From there, you can do:

if (Author != null) Author.RemoveWithNextWhitespace();
if (Title  != null) Title.RemoveWithNextWhitespace();
if (Genre  != null) Genre.RemoveWithNextWhitespace();

Though I would suggest you replace the above with something like a loop fed from an array, or a params method call, to avoid code redundancy.

Getting Data from a Simple XML

4 votes

I am trying to extract some data from an XML input with 6 lines, using HXT. I want to keep HXT, too, because of the Curl integration and because I have other XML files with thousands of lines, later.

My XML looks like this:

<?xml version = "1.0" encoding = "UTF-8"?>
<find>
    <set_number>228461</set_number>
    <no_records>000000008</no_records>
    <no_entries>000000008</no_entries>
</find>

I've been trying to work out how to parse that. Unfortunately, the HXT wiki page has not been a big help (or I just overlooked something).

data FindResult = FindResult {
        resultSetNumber :: String,
        resultNoRecords :: Int,
        resultNoEntries :: Int
    } deriving (Eq, Show)

resultParser :: ArrowXml a => a XmlTree FindResult
resultParser = hasName "find" >>> getChildren >>> proc x -> do
    setNumber <- isElem >>> hasName "set_number" >>> getChildren >>> getText -< x
    noRecords <- isElem >>> hasName "no_records" >>> getChildren >>> getText -< x
    noEntries <- isElem >>> hasName "no_entries" >>> getChildren >>> getText -< x
    returnA -< FindResult setNumber (read noRecords) (read noEntries)

find str = return . head =<< (runX $ readDocument [withValidate no, withCurl []] query >>> resultParser)
    where query = "http://" ++ server ++ "/find?request=" ++ str

What I always get is

*** Exception: Prelude.head: empty list

so, I guess, the parsing must go horribly wrong, since I checked and correctly get the XML from the query.

The following works for me (modelled after this example):

{-# LANGUAGE Arrows #-}

module Main
       where

import Text.XML.HXT.Core
import System.Environment

data FindResult = FindResult {
        resultSetNumber :: String,
        resultNoRecords :: Int,
        resultNoEntries :: Int
    } deriving (Eq, Show)

resultParser :: ArrowXml a => a XmlTree FindResult
resultParser =
  deep (isElem >>> hasName "find") >>> proc x -> do
    setNumber <- getText <<< getChildren <<< deep (hasName "set_number") -< x
    noRecords <- getText <<< getChildren <<< deep (hasName "no_records") -< x
    noEntries <- getText <<< getChildren <<< deep (hasName "no_entries") -< x
    returnA -< FindResult setNumber (read noRecords) (read noEntries)

main :: IO ()
main = do [src] <- getArgs
          res <- runX $ ( readDocument [withValidate no] src >>> resultParser)
          print . head $ res

Testing:

$ dist/build/test/test INPUT
FindResult {resultSetNumber = "228461", resultNoRecords = 8, resultNoEntries = 8}

Howto assign HTML semantics to my XML elements to get them rendered in my web browser?

4 votes

I have an xml file like this:

<?xml version="1.0" encoding="UTF-8"?>
<todo>
    <list><item>first</item><item>second</item></list>
    <list><item>first</item><item>second</item></list>
</todo>

Now I want to view the file in a browser. I want the <list> element rendered like a <ul> HTML element and the <item> elements like <li> HTML elements. I know that I can use XSLT to transform the XML into an HTML document, but is there a way to directly assign HTML semantics to the elements of my list, e.g. with CSS (something like list{display:ul}) or a DTD?

Yes, this is possible.

See W3C-Website Style Sheets with XML.

You can use CSS to declare for each XML element how the browser should display it. But you have to be more verbose than in HTML, because for plain XML there are no predefined styles.

In the XML header add a reference to your CSS file:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="todo.css"?>
<todo>
   <list><item>first</item><item>second</item></list>
   <list><item>first</item><item>second</item></list>
</todo>

Here is an example CSS file (todo.css):

todo {
  display: block;
}

list {
  display: block;
  padding-left: 5mm;
  margin-top: 1cm;
}

item {
  display: list-item;
  list-style-type: circle;
}

For each element you can define the display style (block, inline, none, list-item).

For display: list-item you can additionally use the styles

  • list-style-image: url(bullet.gif) to declare a bullet icon graphic
  • list-style-position with values inside or outside
  • list-style-type with values circle, disc, square, none

InvalidOperationException thrown when calling XmlReader::ReadStartElement

4 votes

I wrote an application in C++ which generates an XML file out of class members. Now I want to read the generated file again and save all attributes and values back to the C++ classes.

My XML writer (writes with success):

void TDescription::WriteXml( XmlWriter^ writer )
{
    writer->WriteStartElement( "Description" );
    writer->WriteAttributeString( "Version", m_sVersion );
    writer->WriteAttributeString( "Author", m_sAuthor );
    writer->WriteString( m_sDescription );
    writer->WriteEndElement();
}

My XML reader (causes an exception):

void TDescription::ReadXml( XmlReader^ reader )
{
    reader->ReadStartElement( "Description" );
    m_sVersion = reader->GetAttribute( "Version" );
    m_sAuthor = reader->GetAttribute( "Author" );
    m_sDescription = reader->ReadString();
    reader->ReadEndElement();
}

My generated XML file:

<?xml version="1.0" encoding="utf-8"?>
<root Name="database" Purpose="try" Project="test">
     <!--Test Database-->
     <Description Version="1.1B" Author="it">primary</Description>
</root>

Here is the exception caused by the reader:

An unhandled exception of type 'System.InvalidOperationException' occurred in System.Xml.dll

Additional information: There is an error in XML document (2, 2).

What's the problem with the code? I think that the XmlReader methods were not used the right way!?

Following answer 1, I have changed the code:

reader->ReadStartElement( "root" );
reader->ReadStartElement( "Description" );
m_sVersion = reader->GetAttribute( "Version" );
m_sAuthor = reader->GetAttribute( "Author" );
m_sDescription = reader->ReadString();
reader->ReadEndElement();
reader->ReadEndElement();

Now, I don't get an exception and m_sDescription gets the right value but m_sVersion and m_sAuthor are still empty.

You have to call ReadStartElement for "root" before that.

reader->ReadStartElement( "root" );     
reader->ReadStartElement( "Description" );

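Edit: To read the attributes, the reader has to be positioned on the Description start tag. ReadStartElement consumes the start tag and advances to the element's content, so a later GetAttribute call no longer finds the attributes. Move to the element without consuming it and read the attributes from there: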

reader->ReadToFollowing( "Description" );
reader->MoveToFirstAttribute();
String ^ m_sVersion = reader->Value;     
reader->MoveToNextAttribute();
String ^ m_sAuthor = reader->Value;           
String ^ m_sDescription = reader->ReadString();     
reader->ReadEndElement();

xslt V1.0 - subtemplate with recursive loop returns empty value

4 votes

I'm trying to get the highest value of the sum of the children of each cluster.

  • cluster1 : 10 + 20 = 30

  • cluster2 : 20 + 30 = 50 --> 50 is highest value

Problem: The return value of the subtemplate is "".
Why? The variable tempMax is getting a node with my number in it instead of just a number.

$tempMax = {Dimension:[1]}
+ [1] = /
+ + node()[1] = 50

How can I fix this? (xslt v1.0).


xml:

<?xml version="1.0"?>
<column-chart-stacked-full>
<clusters>
    <cluster number="1">
        <bar>
            <value>10</value>
        </bar>
        <bar>
            <value>20</value>
        </bar>
    </cluster>
    <cluster number="2">
        <bar>
            <value>20</value>
        </bar>
        <bar>
            <value>30</value>
        </bar>
    </cluster>
</clusters>
</column-chart-stacked-full>

my xsl:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation" xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml">

    <xsl:variable name="highestClusterVal">
        <xsl:call-template name="findMaxClusterVal"/>
    </xsl:variable>

    <xsl:template name="findMaxClusterVal">
        <xsl:param name="count" select="count(column-chart-stacked-  full/clusters/cluster)"/>
        <xsl:param name="limit" select="$count"/>
        <xsl:param name="max" select="0"/>
        <xsl:choose>
          <xsl:when test="$count &gt; 0">
            <xsl:variable name ="barSum" select="sum(column-chart-stacked-full/clusters/cluster[$count]/bar/value)"/>
            <xsl:variable name="tempMax">
              <xsl:choose>
                <xsl:when test="$max &lt; $barSum">
                  <xsl:value-of select="$barSum"/>
                </xsl:when>
                <xsl:otherwise>
                  <xsl:value-of select="$max"/>
                </xsl:otherwise>
              </xsl:choose>
            </xsl:variable>
            <!-- recursive loop -->
            <xsl:call-template name="findMaxClusterVal">
              <xsl:with-param name="count" select="$count - 1"/>
              <xsl:with-param name="limit" select="$limit"/>
              <xsl:with-param name="max" select="$tempMax"/>
            </xsl:call-template>
          </xsl:when>
          <xsl:otherwise>
            <!-- return max value -->
            <xsl:value-of select="$max"/>
         </xsl:otherwise>
        </xsl:choose>
    </xsl:template>

</xsl:stylesheet>

return of $max

$max = {Dimension:[1]}
+ [1] = /
+ + node()[1] = 50

You are missing the opposite case in assigning tempMax:

        <xsl:variable name="tempMax">
            <xsl:if test="$max &lt; $barSum">
                <xsl:value-of select="$barSum"/>
            </xsl:if>      
            <xsl:if test="$max >= $barSum">
                <xsl:value-of select="$max"/>
            </xsl:if>
        </xsl:variable>

This is how I've tested it (changed to use xsl:choose as suggested by @Mads, even though it is logically equivalent).

[XSLT 1.0] Tested with Saxon 6.5

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output indent="yes"/>

    <xsl:template match="/">
        <xsl:call-template name="findMaxClusterVal"/>
    </xsl:template>

    <xsl:template name="findMaxClusterVal">
        <xsl:param name="count" select="count(column-chart-stacked-full/clusters/cluster)"/>
        <xsl:param name="limit" select="$count"/>
        <xsl:param name="max" select="0"/>
        <xsl:if test="$count &gt; 0">
            <xsl:variable name ="barSum" select="sum(column-chart-stacked-full/clusters/cluster[$count]/bar/value)"/>
            <xsl:variable name="tempMax">
                <xsl:choose>
                    <xsl:when test="$max &lt; $barSum">
                        <xsl:value-of select="$barSum"/>
                    </xsl:when>
                    <xsl:otherwise>
                        <xsl:value-of select="$max"/>
                    </xsl:otherwise>
                </xsl:choose>
            </xsl:variable>
            <!-- recursive loop -->
            <xsl:call-template name="findMaxClusterVal">
                <xsl:with-param name="count" select="$count - 1"/>
                <xsl:with-param name="limit" select="$limit"/>
                <xsl:with-param name="max" select="$tempMax"/>
            </xsl:call-template>
        </xsl:if>
        <!-- return max value -->
        <xsl:if test="$count = 0">
            <xsl:value-of select="$max"/>
        </xsl:if>
    </xsl:template>

</xsl:stylesheet>

Applied to the input provided in the question, it returns 50.

Applied to this changed input:

<column-chart-stacked-full>
    <clusters>
        <cluster number="1">
            <bar>
                <value>10</value>
            </bar>
            <bar>
                <value>20</value>
            </bar>
        </cluster>
        <cluster number="2">
            <bar>
                <value>20</value>
            </bar>
            <bar>
                <value>30</value>
            </bar>
        </cluster>
                <cluster number="1">
            <bar>
                <value>10</value>
            </bar>
            <bar>
                <value>20</value>
            </bar>
        </cluster>
        <cluster number="2">
            <bar>
                <value>70</value>
            </bar>
            <bar>
                <value>30</value>
            </bar>
        </cluster>
    </clusters>
</column-chart-stacked-full>

Returns 100.
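
For comparison, here is a minimal sketch of the same recursion outside XSLT, written in Python with the standard xml.etree.ElementTree module. The function name find_max_cluster_val and the inlined DOC string are illustrative assumptions, not part of the original stylesheet. It only mirrors the template's logic: walk the clusters from last to first and carry the running maximum as a parameter.

```python
import xml.etree.ElementTree as ET

# Input from the question, condensed onto fewer lines.
DOC = """<column-chart-stacked-full>
<clusters>
    <cluster number="1"><bar><value>10</value></bar><bar><value>20</value></bar></cluster>
    <cluster number="2"><bar><value>20</value></bar><bar><value>30</value></bar></cluster>
</clusters>
</column-chart-stacked-full>"""


def find_max_cluster_val(clusters, count=None, max_val=0):
    """Hypothetical mirror of the findMaxClusterVal template."""
    if count is None:
        # Same default as the template's "count" parameter.
        count = len(clusters)
    if count == 0:
        # Base case: return the maximum collected so far.
        return max_val
    # cluster[$count] in XPath is 1-based, a Python list is 0-based.
    bar_sum = sum(float(v.text) for v in clusters[count - 1].findall("bar/value"))
    temp_max = bar_sum if max_val < bar_sum else max_val
    # Recursive step with count - 1 and the updated maximum.
    return find_max_cluster_val(clusters, count - 1, temp_max)


root = ET.fromstring(DOC)
highest = find_max_cluster_val(root.findall("clusters/cluster"))
print(highest)
```

Because temp_max is a plain number here, the question's problem of passing a node (a result tree fragment) instead of a number does not arise in this sketch.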

XPATH Selecting by position returns incoherent results

2 votes

I need some help with an issue I can't figure out.

I have the following xml:

<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet type="text/xsl" href="prueba.xsl"?>
<ficha>
<titulo></titulo>
<bloque>
    <texto></texto>
    <pregunta id="1" tipo="checkSN">
        <texto>Acredita curso bienestar animal minimo 20 h</texto>
    </pregunta>
    <pregunta id="2" tipo="texto">
        <texto>Sistemática inspección</texto>
    </pregunta>
    <grupo> 
        <texto>trato adecuado enfermos</texto>          
        <pregunta id="3" tipo="desplegableSNP">
            <texto>Recetas correspondientes</texto>
        </pregunta>
        <pregunta id="4" tipo="multiple">
            <texto>Disponen de comida y bebida</texto>
        </pregunta> 
    </grupo>
    <grupo>
        <texto>
            Heridos/Enfermos
        </texto>
        <pregunta id="5" tipo="multiple">
            <texto>Se aprecian heridos o enfermos momento inspeccion</texto>            
        </pregunta>
        <pregunta id="6" tipo="multiple">
            <texto>Separados del resto</texto>          
        </pregunta>
        <pregunta id="7" tipo="multiple">
            <texto>Disponen de comida y bebida</texto>          
        </pregunta>
        <pregunta id="8" tipo="multiple">
            <texto>Disponen de comida y bebida</texto>          
        </pregunta> 
    </grupo>        
</bloque>
<bloque>
    <texto>Condiciones específicas de alojamiento y manejo</texto>  
</bloque>
</ficha>

And the following XSL stylesheet:

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="/">
 <html>
    <head>
      <link rel="stylesheet" type="text/css" href="prueba.css" />
    </head>
    <body>  
  <h2><xsl:value-of select="/ficha/titulo"/></h2>

  <h3>1: <xsl:value-of select="//pregunta[1]/@id"/></h3>
  <h3>2: <xsl:value-of select="//pregunta[2]/@id"/></h3>
  <h3>3: <xsl:value-of select="//pregunta[3]/@id"/></h3>
  <h3>4: <xsl:value-of select="//pregunta[4]/@id"/></h3>
  <h3>5: <xsl:value-of select="//pregunta[5]/@id"/></h3>
  <h3>6: <xsl:value-of select="//pregunta[6]/@id"/></h3>
  <h3>7: <xsl:value-of select="//pregunta[7]/@id"/></h3>
  <h3>8: <xsl:value-of select="//pregunta[8]/@id"/></h3>
  <h3>c: <xsl:value-of select="count(//pregunta)"/></h3>

     </body>
  </html>
</xsl:template>
</xsl:stylesheet>

When I load them I get this result:

1: 1 2: 2 3: 7 4: 8 5: 6: 7: 8: c: 8

I don't understand why it's ignoring some nodes. If I add new nodes or move them, it always shows four results; for positions 5 to 8 it never shows anything. I need to use this type of selection because it's called from a Java application; the stylesheet is just for testing.

Put //pregunta in parentheses. Change your XPath expressions to (//pregunta)[1]/@id, (//pregunta)[2]/@id ...

Without parentheses, the predicate applies to the child step: //pregunta[4] is short for /descendant-or-self::node()/child::pregunta[4], so it selects every pregunta element that is the fourth pregunta child of its parent. Only the last grupo has that many, which is why positions 5 to 8 select nothing.

However, (//pregunta)[4] first builds the node-set of all pregunta elements in document order and then takes the fourth element of that set.
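
The difference can be reproduced with a small sketch in Python's standard xml.etree.ElementTree, whose limited XPath subset also applies a position predicate per parent. The DOC string is a condensed version of the question's file (texto elements omitted) and the helper ids is a name made up for this example. ElementTree does not support the parenthesised form, so the document-order lookup is done by indexing the result list.

```python
import xml.etree.ElementTree as ET

# Condensed version of the question's document: same nesting of pregunta elements.
DOC = """<ficha>
  <bloque>
    <pregunta id="1"/>
    <pregunta id="2"/>
    <grupo>
      <pregunta id="3"/>
      <pregunta id="4"/>
    </grupo>
    <grupo>
      <pregunta id="5"/>
      <pregunta id="6"/>
      <pregunta id="7"/>
      <pregunta id="8"/>
    </grupo>
  </bloque>
</ficha>"""

root = ET.fromstring(DOC)


def ids(elements):
    """Return the id attribute of each element, in order."""
    return [e.get("id") for e in elements]


# Position counted among the pregunta children of each parent.
per_parent_first = ids(root.findall(".//pregunta[1]"))
per_parent_third = ids(root.findall(".//pregunta[3]"))
per_parent_fifth = ids(root.findall(".//pregunta[5]"))

# Equivalent of (//pregunta)[4]: collect all in document order, then index.
all_preguntas = root.findall(".//pregunta")
fourth_in_document = all_preguntas[3].get("id")

print(per_parent_first, per_parent_third, per_parent_fifth, fourth_in_document)
```

The per-parent results explain the output in the question: xsl:value-of prints only the first node of each selection, and no parent has a fifth pregunta child.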

Select elements with unique values

2 votes

I'm trying to parse an OpenOffice spreadsheet to obtain rows with unique values in the first column.

That is, I would like to retrieve from the following XML fragment all <table:table-row> elements with unique <text:p> values in the first child <table:table-cell>.

    <table:table table:name="foo">
        <table:table-row>
            <table:table-cell>
                <text:p>1</text:p>
            </table:table-cell>
            <table:table-cell>
                <text:p>foo</text:p>
            </table:table-cell>
        </table:table-row>
        <table:table-row>
            <table:table-cell>
                <text:p>2</text:p>
            </table:table-cell>
            <table:table-cell>
                <text:p>bar</text:p>
            </table:table-cell>
        </table:table-row>
        <table:table-row>
            <table:table-cell>
                <text:p>1</text:p>
            </table:table-cell>
            <table:table-cell>
                <text:p>baz</text:p>
            </table:table-cell>
        </table:table-row>
    </table:table>

I'd like to get the output below as nodes:

        <table:table-row>
            <table:table-cell>
                <text:p>1</text:p>
            </table:table-cell>
            <table:table-cell>
                <text:p>foo</text:p>
            </table:table-cell>
        </table:table-row>
        <table:table-row>
            <table:table-cell>
                <text:p>2</text:p>
            </table:table-cell>
            <table:table-cell>
                <text:p>bar</text:p>
            </table:table-cell>
        </table:table-row>

How can I do this with XPath?

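This XPath produces the desired output:

/table:table/table:table-row[not(./table:table-cell[1]/text:p/text() = preceding-sibling::table:table-row/table:table-cell[1]/text:p/text())]

It keeps a row only if the text in its first cell does not equal the first-cell text of any preceding row, so the first occurrence of each value survives.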
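
As a rough sketch of the same filter outside XPath: Python's standard xml.etree.ElementTree has no preceding-sibling axis, so the version below swaps in a "seen" set to remember first-cell values. The function name first_rows_per_key is hypothetical, and the namespace URIs are the OpenDocument table and text namespaces, which the fragment in the question is assumed to use.

```python
import xml.etree.ElementTree as ET

# Assumed OpenDocument namespaces for the table: and text: prefixes.
NS = {
    "table": "urn:oasis:names:tc:opendocument:xmlns:table:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}

DOC = """<table:table
    xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
    table:name="foo">
  <table:table-row>
    <table:table-cell><text:p>1</text:p></table:table-cell>
    <table:table-cell><text:p>foo</text:p></table:table-cell>
  </table:table-row>
  <table:table-row>
    <table:table-cell><text:p>2</text:p></table:table-cell>
    <table:table-cell><text:p>bar</text:p></table:table-cell>
  </table:table-row>
  <table:table-row>
    <table:table-cell><text:p>1</text:p></table:table-cell>
    <table:table-cell><text:p>baz</text:p></table:table-cell>
  </table:table-row>
</table:table>"""


def first_rows_per_key(table):
    """Keep the first row for each distinct text value in the first cell."""
    seen = set()
    rows = []
    for row in table.findall("table:table-row", NS):
        cells = row.findall("table:table-cell", NS)
        # Rows without cells or without text:p fall back to the empty key.
        key = cells[0].findtext("text:p", default="", namespaces=NS) if cells else ""
        if key not in seen:
            seen.add(key)
            rows.append(row)
    return rows


unique_rows = first_rows_per_key(ET.fromstring(DOC))
labels = [row.findall("table:table-cell", NS)[1].findtext("text:p", namespaces=NS)
          for row in unique_rows]
print(labels)
```

One difference from the XPath: rows whose first cell has no text all share the empty key here, so only the first of them is kept.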

Getting XML with Grails REST plugin

2 votes

I'm having trouble working with the REST plugin in grails. Specifically I am trying to convert xml from a get request into a Map in a controller.

The data source I am trying to get data from returns XML that looks like this (this is shortened for simplicity):

<process id="345">
    <correctedBy>Joanne W.</correctedBy>
    <editBy>Joanne W.</editBy>
    <editDate>2009-12-23 00:00:00.0 EST</editDate>
    <produceBy>Stephen</produceBy>
    <produceDate>2010-01-14 00:00:00.0 EST</produceDate>
</process>

In my controller I have code to make the get request to this service

def getRest = {
        def wfRequest
        withHttp(uri: "http://myurl:8080") {
               wfRequest = get(path : '/application/controller/' + params.id,
                   requestContentType: XML) { resp, xml ->
                        render xml
                   }
        }
}

Ok so far, this will return the data from the xml, but all the tags are gone:

Joanne W.Joanne W.2009-12-23 00:00:00.0 ESTStephen2010-01-14 00:00:00.0 EST

Can anyone point me in the right direction on how to access the XML that is returned from this request? I'd like to step through each key/value pair in the "process" node of the XML and populate a map that would look like

[correctedBy: Joanne W., editBy: Joanne W., editDate: 2009-12-23 00:00:00.0 EST, produceBy: Stephen, produceDate: 2010-01-14 00:00:00.0 EST]

I'm finding the rest plugin documentation a little confusing, any help would be GREATLY appreciated.

Thanks!

Donald

It makes sense that render xml doesn't show the tags. At this point, xml is the parsed result produced by XmlSlurper (a GPathResult), so render just calls its toString(), which returns only the concatenated text content.

See this for more information.

So since you have an XmlSlurper, you just need to use it.
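
The idea of "using" the parsed result is to walk the child elements of the process node and collect each element name with its text. The sketch below shows that step in Python with the standard xml.etree.ElementTree module purely as an illustration. It is not the REST plugin's API, and in the Groovy controller the equivalent loop would run over the children of the parsed xml object instead.

```python
import xml.etree.ElementTree as ET

# Sample response from the question.
DOC = """<process id="345">
    <correctedBy>Joanne W.</correctedBy>
    <editBy>Joanne W.</editBy>
    <editDate>2009-12-23 00:00:00.0 EST</editDate>
    <produceBy>Stephen</produceBy>
    <produceDate>2010-01-14 00:00:00.0 EST</produceDate>
</process>"""

process = ET.fromstring(DOC)

# One entry per child element: tag name -> text content.
fields = {child.tag: child.text for child in process}

for name, value in fields.items():
    print(f"{name}: {value}")
```

The attribute on the root element (id="345") is not a child element, so it has to be read separately, here with process.get("id").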

'Characters only' check in xsl string?

2 votes

How can I check whether a string includes only characters (in an XSLT file)?

<xsl:variable name="IsValid1">
  <xsl:choose>
    <xsl:when test="string-length(//FirstName) &gt; 0 and string-length(//LastName) &gt; 0 and substring(//FirstName, 1, 3) != 'TST' and XXXX//FirtName only charactersXXXXX ">
    </xsl:when>
    <xsl:otherwise>
    </xsl:otherwise>
  </xsl:choose>
</xsl:variable>

In XPath 1.0 (XSLT 1.0) there are no regular expressions, so you have to combine string functions such as translate() and contains(). In XPath 2.0 (XSLT 2.0) you can use matches().

For example, to check for alphabetic characters only (no digits, other symbols, or spaces):

matches(//FirstName, '^[a-zA-Z]+$')

or alphanumeric,

matches(//FirstName, '^[a-zA-Z0-9]+$')
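
To make the two approaches concrete, here is a small sketch in Python rather than XSLT. The function names are made up for this example. The first check uses the same pattern as the matches() call above. The second imitates a common XPath 1.0 workaround, which is an assumption about how one might do it there: delete every allowed letter with translate() and test that nothing is left.

```python
import re
import string

# Same character class as in the matches() example; fullmatch anchors both ends.
ALPHA_ONLY = re.compile(r"[a-zA-Z]+")


def is_alpha_regex(value):
    """Regex check, analogous to matches(value, '^[a-zA-Z]+$')."""
    return ALPHA_ONLY.fullmatch(value) is not None


def is_alpha_translate(value):
    """Deletion check: strip all ASCII letters and see whether anything remains."""
    stripped = value.translate(str.maketrans("", "", string.ascii_letters))
    # A non-empty input that is empty after stripping contained letters only.
    return len(value) > 0 and stripped == ""


for sample in ["Alice", "Al1ce", "Mary Ann", ""]:
    print(repr(sample), is_alpha_regex(sample), is_alpha_translate(sample))
```

Both checks accept ASCII letters only, so accented names such as "José" are rejected unless the character set is extended.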