Best xml questions in February 2011

Can IE manipulate XML using jQuery?

9 votes

What I am not trying to do:

  • Simply "read" XML in IE using jQuery. Been there, done that. Works for the most part.
  • Load XML via AJAX. This is a legacy system using XML in a hidden field (oh yah, baby!) between postbacks to store a wizard data structure. Rewriting it would suck.

What I am trying to do:

  • Manipulate the XML document using jQuery in IE
  • Use the same code across all browsers, using the native jQuery functionality

What I would be okay with:

  • Overriding/overloading the same jquery methods to get them to work in IE when manipulating the XML DOM.

It just doesn't work and I feel like it just isn't possible in a 100% cross browser way using plain old jQuery methods.

Case in point:

<!DOCTYPE html>
<html>
<head>
    <title>IE Sucks</title>
    <script src="Scripts/jquery-1.5.min.js" type="text/javascript"></script>
    <script type="text/javascript">
        var xml =
            '<Browsers>' +
                '<CoolBrowsers>' +
                    '<Browser name="Opera"></Browser>' +
                    '<Browser name="Chrome"></Browser>' +
                    '<Browser name="Firefox"></Browser>' +
                '</CoolBrowsers>' +
                '<ShitBrowsers>' +
                    '<Browser name="IE6"></Browser>' +
                '</ShitBrowsers>' +
            '</Browsers>';

        $(function () {

            $("#xml").text(xml);

            var uncoolBrowser = $("<Browser />").attr("name", "IE7");

            // In 1.5, using this...
            var $xml = $($.parseXML(xml));

            // Nope. Works everywhere else, though!
            // var $xml = $(xml);     


            // Throws a "Type mismatch"
            // Works everywhere except IE
            // This is case sensitive (??? WTF ???)
            // Lowercase "shitbrowsers" nothing happens
            // Uppercase "SHITBROWSERS" nothing happens
            // Best part? $xml.find("ShitBrowsers").length === 1
            $xml.find("ShitBrowsers").append(uncoolBrowser);

            // Only way to output XML in IE
            $("#result").text($xml[0].xml);

            // Fuggetaboutit
            // Technically, it does work in IE but not when using $.parseXML()
            // $("#result").text($("<div></div>").append($xml.clone()).html());
        });
    </script>
</head>
<body>
    <pre id="xml"></pre>
    <pre id="result"></pre>
</body>
</html>

Is it possible? Can this simple scenario be done or has IE just forsaken us all? $(xml).everything, etc. works in FF, Opera, Chrome, and Safari.

Update

It is possible using voodoo magicks.

I've created a jQuery plugin that takes care of reconciling the differences between different browser handling of XML. I also made an .xml() function based on similar code elsewhere, though mine fixes an IE-only issue. This works in all browsers, IE7 & IE8 for sure, can't test IE6.

I have posted this on my github. If anyone has suggestions or improvements, let me know. There are several things I've already run into but I have been fixing them as I run into them.

This is more of a guess, as I don't know offhand what $.parseXML does, but IE needs createElement for unknown node names. Can you try document.createElement('ShitBrowsers') for every new node you are going to manipulate?

This is the case with HTML5 and that's why there are shiv scripts. You can try just taking this:

http://html5shiv.googlecode.com/svn/trunk/html5.js

Copy it, append your new node names to var z, and then include it like this:

<!--[if lt IE 9]>
<script src="file.js"></script>
<![endif]-->

Create a graph image (png, jpg ..) from an XML file with Java

6 votes

Hello, I have an XML file and I want to create a graph with some entities, then store this graph in an image, JPG or PNG.

So is there a library in Java that does this? Or are there some tricks for doing it by parsing XML files and ...?

Here is an example XML file:

<?xml version="1.0"?>
<process>
  <p n=1>Tove</p> 
  <p n=2>Jani</p> 
  <p n=2>Bill</p> 
  <p n=4>John</p> 
</process>

And the output will be like this:

[image: expected output, not preserved]

You can extract the names using one of the myriad of Java XML libraries. Here's an example using XPath from a Java DOM:

private static List<String> findNames(Document doc)
                                           throws XPathExpressionException {
  XPath xpath = XPathFactory.newInstance().newXPath();
  NodeList nodes = (NodeList) xpath.evaluate("/process/p", doc, 
                                                    XPathConstants.NODESET);
  List<String> names = new ArrayList<String>();
  for (int i = 0; i < nodes.getLength(); i++) {
    names.add(nodes.item(i).getTextContent());
  }
  return names;
}

Note: it may be a typo, but your XML is not well formed - attribute values must be quoted. XML parsing will fail otherwise.
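
To illustrate the note about quoting, here is a minimal sketch in Python (not the Java from this answer) using the standard library's xml.etree.ElementTree. The names BROKEN, FIXED, find_names and is_well_formed are made up for the illustration. It shows the parser rejecting the unquoted version and accepting the quoted one.

```python
import xml.etree.ElementTree as ET

# The sample from the question, with unquoted attribute values (not well formed).
BROKEN = """<?xml version="1.0"?>
<process>
  <p n=1>Tove</p>
  <p n=2>Jani</p>
</process>"""

# The same data with quoted attribute values.
FIXED = """<?xml version="1.0"?>
<process>
  <p n="1">Tove</p>
  <p n="2">Jani</p>
</process>"""


def is_well_formed(xml_text):
    """Report whether the parser accepts the document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def find_names(xml_text):
    """Return the text of every <p> child of the <process> root."""
    root = ET.fromstring(xml_text)
    return [p.text for p in root.findall("p")]


print(is_well_formed(BROKEN))
print(find_names(FIXED))
```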

[image: some boxes]

You can use the AWT API to draw whatever you want:

private static final int BORDER = 1;
private static final int PADDING = 2;
private static final int SPACER = 5;

private static void draw(Graphics2D g, List<String> names) {
  FontMetrics metrics = g.getFontMetrics();
  Rectangle box = new Rectangle(1, 1, 0, 0);
  box.height = metrics.getHeight() + (PADDING * 2);
  g.setColor(Color.WHITE);
  for (String name : names) {
    box.width = metrics.stringWidth(name) + (PADDING * 2);
    g.drawString(name, box.x + BORDER + PADDING, PADDING + BORDER +
                                                    metrics.getHeight());
    g.drawRect(box.x, box.y, box.width, box.height);
    box.x += box.width + (BORDER * 2) + SPACER;
  }
}

This code just draws the names with some boxes around them. I'm sure my offsets are all over the place, but you probably get the idea.

There is an imageio API that can save in a few popular data formats:

private static void save(List<String> names, File file) throws IOException {
  BufferedImage image = new BufferedImage(600, 50, BufferedImage.TYPE_INT_RGB);
  Graphics2D g = image.createGraphics();
  try {
    draw(g, names);
  } finally {
    g.dispose();
  }
  ImageIO.write(image, "png", file);
}

How do I reference array values within string.Format?

6 votes

I am using XPath to exclude certain nodes within a menu. I want to expand on this to exclude nodes identified within an array.

This works to exclude all the nodes in the menu with id 2905 whose type is not content:

XmlNodeList nextLevelNodeList = currentNode
                                   .SelectNodes(string
                                                   .Format("
                                           Menu[not(MenuId = 2905)]
                                              /Item[
                                                 ItemLevel = {0} 
                                                    and 
                                                 ItemType != 'Javascript'
                                               ] | 
                                           Menu[MenuId = 2905]
                                              /Item[
                                                 ItemLevel = {0} 
                                                    and
                                                 ItemType = 'content'
                                               ]", iLevel));

What I'd like is to store the menuId and several others in an array and then reference that array within the string.Format function

Something like:

int[] excludeSubmenus = {2905, 323};
XmlNodeList nextLevelNodeList = currentNode
                                   .SelectNodes(string
                                                   .Format("
                                         Menu[not(MenuId in excludesubMenus)]
                                            /Item[
                                               ItemLevel={0} 
                                                  and 
                                               ItemType != 'Javascript'
                                             ] | 
                                         Menu[MenuId in excludeSubMenus]
                                            /Item[
                                               ItemLevel={0} 
                                                  and 
                                               ItemType='content'
                                             ]", iLevel));

Any advice would be greatly appreciated!

ta Nathan

Edit - include example xml

<Item>
    <ItemId>322</ItemId> 
    <ItemType>Submenu</ItemType> 
    <ItemLevel>2</ItemLevel> 
    <Menu>
        <MenuId>322</MenuId> 
        <MenuLevel>2</MenuLevel> 
        <Item>
            <ItemId>2905</ItemId> 
            <ItemType>Submenu</ItemType> 
            <ItemLevel>3</ItemLevel> 
            <Menu>
                <MenuId>2905</MenuId> 
                <MenuLevel>3</MenuLevel> 
                <Item>
                    <ItemId>19196</ItemId> 
                    <ItemType>content</ItemType> 
                    <ItemLevel>4</ItemLevel> 
                </Item>
                <Item>
                    <ItemId>19192</ItemId> 
                    <ItemType>Submenu</ItemType> 
                    <ItemLevel>4</ItemLevel> 
                </Item>
            </Menu>
        </Item>
        <Item>
            <ItemId>2906</ItemId> 
            <ItemType>Submenu</ItemType> 
            <ItemLevel>3</ItemLevel> 
            <Menu>
                <MenuId>323</MenuId> 
                <MenuLevel>3</MenuLevel> 
                <Item>
                    <ItemId>2432</ItemId> 
                    <ItemType>content</ItemType> 
                    <ItemLevel>4</ItemLevel> 
                </Item>
                <Item>
                    <ItemId>12353</ItemId> 
                    <ItemType>Submenu</ItemType> 
                    <ItemLevel>4</ItemLevel> 
                </Item>
            </Menu>
        </Item>
    </Menu>
</Item>

Use:

int[] excludeSubmenus = {2905, 323};

string notExpr = string.Empty;

// Build "not(MenuId=2905) and not(MenuId=323)" from the array.
for(int i=0; i < excludeSubmenus.Length; i++)
   {
    notExpr += string.Format("not(MenuId={0})", excludeSubmenus[i]);

    if(i != excludeSubmenus.Length-1)
       notExpr += " and ";
   } 

 XmlNodeList nextLevelNodeList = 
    currentNode.SelectNodes(
       string.Format("//Menu[{0}]/Item
                              [ItemLevel={1} and not(ItemType='Javascript')]",
                      notExpr, iLevel)
                     ); 

Do note: In the above code the strings have been split across several lines to enhance readability. In your code you must either keep each string literal on a single line or join the pieces with the string concatenation operator (+).
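
The idea of generating one not(MenuId=N) test per array entry is independent of C#. As a rough sketch in Python, with build_menu_xpath being a name invented for this illustration, the predicate can be assembled with a join. It only builds the expression string and assumes the list of ids is not empty.

```python
def build_menu_xpath(exclude_ids, level):
    """Build an XPath for items at `level` in menus whose MenuId is not excluded.

    Assumes exclude_ids is non-empty; an empty list would yield an
    invalid empty predicate.
    """
    # One not(MenuId=N) test per excluded id, joined with "and".
    not_expr = " and ".join("not(MenuId={0})".format(i) for i in exclude_ids)
    return ("//Menu[{0}]/Item"
            "[ItemLevel={1} and not(ItemType='Javascript')]").format(not_expr, level)


xpath = build_menu_xpath([2905, 323], 4)
print(xpath)
```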

Google's Indexing XSLT Pages

6 votes

My site has been created with XML as the data store and XSLT as the template. It appears that Google is not very good at indexing sites that are XML/XSLT based. Are there any efficient, easy-to-implement software components that can render the XSLT just for the Googlebot indexer? It would be even better if they worked with PHP.

Take a look at the PHP XSLT processor.

http://php.net/manual/en/class.xsltprocessor.php

Use as follows:

<?php 
$sXml  = "<xml>"; 
$sXml .= "<sudhir>hello sudhir</sudhir>"; 
$sXml .= "</xml>"; 

# LOAD XML FILE 
$XML = new DOMDocument(); 
$XML->loadXML( $sXml ); 

# START XSLT 
$xslt = new XSLTProcessor(); 
$XSL = new DOMDocument(); 
$XSL->load( 'xsl/index.xsl', LIBXML_NOCDATA); 
$xslt->importStylesheet( $XSL ); 
#PRINT 
print $xslt->transformToXML( $XML ); 
?>

(From http://php.net/manual/en/book.xsl.php)

UPDATE

You asked in the comment how to intercept a request from a specific user agent (eg. the Googlebot). There are various ways to do this, depending on the web server technology you are using.

On Apache, one method would be to use mod_rewrite to internally divert the processing of the request to a PHP script containing code similar to what we see above. This script retrieves the XML from the originally requested URL and renders the transformation to the client. The rewrite rule would have a Rewrite Condition that compares the HTTP_USER_AGENT header to Google's. Here is an example of the rule (untested, but you should get the idea):

RewriteCond %{HTTP_USER_AGENT} ^(.*)Googlebot(.*)$ [NC]
RewriteRule ^(.*\.xml.*)$ /renderxslt.php?url=$1 [L]

Briefly, the condition looks for a User-Agent header containing the string "Googlebot" (case-insensitively, because of the [NC] flag), and the rewrite rule matches any URL with the string ".xml" in it, passing the matched URL to the renderxslt.php page as a query string parameter.
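
To make the logic of the two directives concrete, here is a small Python sketch of the same decision. It is only an illustration of the matching, not a replacement for mod_rewrite; the function name rewrite_target is made up, and real request paths may carry a leading slash depending on where the rule lives.

```python
import re

# Mirrors RewriteCond %{HTTP_USER_AGENT} ^(.*)Googlebot(.*)$ [NC]
GOOGLEBOT = re.compile(r"Googlebot", re.IGNORECASE)
# Mirrors RewriteRule ^(.*\.xml.*)$
XML_URL = re.compile(r"^(.*\.xml.*)$")


def rewrite_target(user_agent, path):
    """Return the internal rewrite target, or None if the rule does not apply."""
    match = XML_URL.match(path)
    if match and GOOGLEBOT.search(user_agent or ""):
        return "/renderxslt.php?url=" + match.group(1)
    return None


bot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
print(rewrite_target(bot, "catalog.xml"))
```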

A port of mod_rewrite exists for IIS too (http://www.isapirewrite.com/).

Alternatively, with IIS you could use an ASP.NET HTTP module to intercept the request, again checking the User-Agent header (for example via Request.UserAgent) for Google's signature. You can then proceed in a similar manner to above by reading the HTML generated by your PHP script, or alternatively by using the ASP.NET XML control:

<asp:Xml ID="Xml1" runat="server" DocumentSource="~/cdlist.xml" TransformSource="~/listformat.xsl"></asp:Xml>

Convert the first HTML table row into a heading row for each table using XSLT

5 votes

I have some html content inside my XML. Previously I could just use <xsl:copy-of select="customFields/customField[@name='mainContent']/html"/> to pull the content into the correct area. A new requirement is to convert the first <tr> inside each table's <tbody> into a set of thead/tr/th.

I am confused about how to do the conversion; in fact, I'm not even sure where to start:

...

<customField name="mainContent" type="Html">
    <html>
        <h1>Page Heading</h1>
        <p>Gusto te minim tempor elit quam. Dolore vel accumsan parum option me. Demonstraverunt congue nisl soluta tincidunt seacula. Soluta saepius demonstraverunt praesent claritatem mutationem. Modo te ullamcorper vel augue veniam. Nunc investigationes dolor iriure typi in.</p>
        <p>Gusto te minim tempor elit quam. Dolore vel accumsan parum option me. Demonstraverunt congue nisl soluta tincidunt seacula. Soluta saepius demonstraverunt praesent claritatem mutationem. Modo te ullamcorper vel augue veniam. Nunc investigationes dolor iriure typi in.</p>
        <table cellspacing="0" cellpadding="0" summary="" border="0">
            <tbody>
                <tr>
                    <td>Heading 1</td>
                    <td>Heading 2</td>
                    <td>Heading 3</td>
                </tr>
                <tr>
                    <td>sample</td>
                    <td>sample</td>
                    <td>sample</td>
                </tr>
                <tr>
                    <td>sample</td>
                    <td>sample</td>
                    <td>sample</td>
                </tr>
                <tr>
                    <td>sample</td>
                    <td>sample</td>
                    <td>sample</td>
                </tr>
                <tr>
                    <td>sample</td>
                    <td>sample</td>
                    <td>sample</td>
                </tr>
            </tbody>
        </table>
    </html>
</customField>
...

into:

...
<customField name="mainContent" type="Html">
    <html>
        <h1>Page Heading</h1>
        <p>Gusto te minim tempor elit quam. Dolore vel accumsan parum option me. Demonstraverunt congue nisl soluta tincidunt seacula. Soluta saepius demonstraverunt praesent claritatem mutationem. Modo te ullamcorper vel augue veniam. Nunc investigationes dolor iriure typi in.</p>
        <p>Gusto te minim tempor elit quam. Dolore vel accumsan parum option me. Demonstraverunt congue nisl soluta tincidunt seacula. Soluta saepius demonstraverunt praesent claritatem mutationem. Modo te ullamcorper vel augue veniam. Nunc investigationes dolor iriure typi in.</p>
        <table cellspacing="0" cellpadding="0" summary="" border="0">
            <thead>
                <tr>
                    <th>Heading 1</th>
                    <th>Heading 2</th>
                    <th>Heading 3</th>
                </tr>
            </thead>
            <tbody>
                <tr>
                    <td>sample</td>
                    <td>sample</td>
                    <td>sample</td>
                </tr>
                <tr>
                    <td>sample</td>
                    <td>sample</td>
                    <td>sample</td>
                </tr>
                <tr>
                    <td>sample</td>
                    <td>sample</td>
                    <td>sample</td>
                </tr>
                <tr>
                    <td>sample</td>
                    <td>sample</td>
                    <td>sample</td>
                </tr>
            </tbody>
        </table>
    </html>
</customField>
...

This transformation:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match="tbody/tr[1]">
  <thead>
    <tr>
      <xsl:apply-templates/>
    </tr>
  </thead>
 </xsl:template>

 <xsl:template match="tbody/tr[1]/td">
  <th><xsl:apply-templates/></th>
 </xsl:template>
</xsl:stylesheet>

when applied on the provided XML document:

<customField name="mainContent" type="Html">
    <html>
        <h1>Page Heading</h1>
        <p>Gusto te minim tempor elit quam. Dolore vel accumsan parum option me. Demonstraverunt congue nisl soluta tincidunt seacula. Soluta saepius demonstraverunt praesent claritatem mutationem. Modo te ullamcorper vel augue veniam. Nunc investigationes dolor iriure typi in.</p>
        <p>Gusto te minim tempor elit quam. Dolore vel accumsan parum option me. Demonstraverunt congue nisl soluta tincidunt seacula. Soluta saepius demonstraverunt praesent claritatem mutationem. Modo te ullamcorper vel augue veniam. Nunc investigationes dolor iriure typi in.</p>
        <table cellspacing="0" cellpadding="0" summary="" border="0">
            <tbody>
                <tr>
                    <td>Heading 1</td>
                    <td>Heading 2</td>
                    <td>Heading 3</td>
                </tr>
                <tr>
                    <td>sample</td>
                    <td>sample</td>
                    <td>sample</td>
                </tr>
                <tr>
                    <td>sample</td>
                    <td>sample</td>
                    <td>sample</td>
                </tr>
                <tr>
                    <td>sample</td>
                    <td>sample</td>
                    <td>sample</td>
                </tr>
                <tr>
                    <td>sample</td>
                    <td>sample</td>
                    <td>sample</td>
                </tr>
            </tbody>
        </table>
    </html>
</customField>

produces exactly the wanted, correct result:

<customField name="mainContent" type="Html">
   <html>
      <h1>Page Heading</h1>
      <p>Gusto te minim tempor elit quam. Dolore vel accumsan parum option me. Demonstraverunt congue nisl soluta tincidunt seacula. Soluta saepius demonstraverunt praesent claritatem mutationem. Modo te ullamcorper vel augue veniam. Nunc investigationes dolor iriure typi in.</p>
      <p>Gusto te minim tempor elit quam. Dolore vel accumsan parum option me. Demonstraverunt congue nisl soluta tincidunt seacula. Soluta saepius demonstraverunt praesent claritatem mutationem. Modo te ullamcorper vel augue veniam. Nunc investigationes dolor iriure typi in.</p>
      <table cellspacing="0" cellpadding="0" summary="" border="0">
         <tbody>
            <thead>
               <tr>
                  <th>Heading 1</th>
                  <th>Heading 2</th>
                  <th>Heading 3</th>
               </tr>
            </thead>
            <tr>
               <td>sample</td>
               <td>sample</td>
               <td>sample</td>
            </tr>
            <tr>
               <td>sample</td>
               <td>sample</td>
               <td>sample</td>
            </tr>
            <tr>
               <td>sample</td>
               <td>sample</td>
               <td>sample</td>
            </tr>
            <tr>
               <td>sample</td>
               <td>sample</td>
               <td>sample</td>
            </tr>
         </tbody>
      </table>
   </html>
</customField>

Do note:

The "overridden identity rule" design pattern is used. This is the most fundamental and powerful XSLT design pattern.

UPDATE:

As noticed by Flynn1179, the OP's definition of the problem (above) is inconsistent with the output he provides as wanted result. In this output not only is the first tr inside of the tbody converted to thead/tr (and its td children to th), but the thead is moved outside of the tbody.

In case this is really what the OP wants, here is a modified solution for this case as well:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match="tbody/tr[1]">
  <thead>
   <tr>
    <xsl:apply-templates/>
   </tr>
  </thead>
  <tbody>
   <xsl:apply-templates 
        select="following-sibling::tr"/>
  </tbody>
 </xsl:template>

 <xsl:template match="tbody/tr[1]/td">
  <th>
   <xsl:apply-templates/>
  </th>
 </xsl:template>

 <xsl:template match="tbody">
  <xsl:apply-templates select="tr[1]"/>
 </xsl:template>
</xsl:stylesheet>

when applied on the same XML document, the result is:

<customField name="mainContent" type="Html">
   <html>
      <h1>Page Heading</h1>
      <p>Gusto te minim tempor elit quam. Dolore vel accumsan parum option me. Demonstraverunt congue nisl soluta tincidunt seacula. Soluta saepius demonstraverunt praesent claritatem mutationem. Modo te ullamcorper vel augue veniam. Nunc investigationes dolor iriure typi in.</p>
      <p>Gusto te minim tempor elit quam. Dolore vel accumsan parum option me. Demonstraverunt congue nisl soluta tincidunt seacula. Soluta saepius demonstraverunt praesent claritatem mutationem. Modo te ullamcorper vel augue veniam. Nunc investigationes dolor iriure typi in.</p>
      <table cellspacing="0" cellpadding="0" summary="" border="0">
         <thead>
            <tr>
               <th>Heading 1</th>
               <th>Heading 2</th>
               <th>Heading 3</th>
            </tr>
         </thead>
         <tbody>
            <tr>
               <td>sample</td>
               <td>sample</td>
               <td>sample</td>
            </tr>
            <tr>
               <td>sample</td>
               <td>sample</td>
               <td>sample</td>
            </tr>
            <tr>
               <td>sample</td>
               <td>sample</td>
               <td>sample</td>
            </tr>
            <tr>
               <td>sample</td>
               <td>sample</td>
               <td>sample</td>
            </tr>
         </tbody>
      </table>
   </html>
</customField>
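
For readers who want to see the same restructuring outside XSLT, here is a hedged sketch in Python using xml.etree.ElementTree. It is a different technique from the stylesheet above (direct tree editing rather than template matching), and promote_header_rows and SAMPLE are names made up for the example. It moves the first row of each tbody into a new thead placed before the tbody, and renames that row's td cells to th.

```python
import xml.etree.ElementTree as ET


def promote_header_rows(root):
    """Move the first <tr> of each <tbody> into a new <thead>, turning td into th."""
    # Materialize the list first so the tree is not modified while iterating.
    for table in list(root.iter("table")):
        tbody = table.find("tbody")
        if tbody is None:
            continue
        first_row = tbody.find("tr")
        if first_row is None:
            continue
        tbody.remove(first_row)
        for cell in first_row.findall("td"):
            cell.tag = "th"
        thead = ET.Element("thead")
        thead.append(first_row)
        # Insert the new thead immediately before the tbody.
        table.insert(list(table).index(tbody), thead)
    return root


SAMPLE = """<html><table><tbody>
<tr><td>Heading 1</td><td>Heading 2</td></tr>
<tr><td>sample</td><td>sample</td></tr>
</tbody></table></html>"""

result = promote_header_rows(ET.fromstring(SAMPLE))
print(ET.tostring(result, encoding="unicode"))
```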

How do I add a namespace when creating an XML file?

5 votes

I have to create an XML document in C#.

The root element has to look like this:

<valuation-request 
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
     xsi:noNamespaceSchemaLocation="valuations.xsd">

I'm using the following

XmlElement root = X.CreateElement("valuation-request");
root.SetAttribute("xmlns:xsi", "http://www.w3.org/2001/XMLSchema-instance");
root.SetAttribute("xsi:noNamespaceSchemaLocation", "valuations.xsd");

However this produces

<valuation-request 
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
     noNamespaceSchemaLocation="valuations.xsd"> //missing the xsi:

What am I missing?

Use the overload of SetAttribute that takes the namespace URI as well:

root.SetAttribute("noNamespaceSchemaLocation", 
    "http://www.w3.org/2001/XMLSchema-instance", 
    "valuations.xsd"
); 
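
The underlying point is that a prefixed attribute is identified by its namespace URI plus local name, and the serializer supplies the prefix. A minimal Python sketch of the same idea with xml.etree.ElementTree follows; this is an analogy, not the C# API, and the variable names are chosen for the illustration.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Ask the serializer to use the conventional "xsi" prefix for this URI.
ET.register_namespace("xsi", XSI)

root = ET.Element("valuation-request")
# The attribute is addressed as {namespace-uri}local-name, not as "xsi:...".
root.set("{%s}noNamespaceSchemaLocation" % XSI, "valuations.xsd")

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```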

How can I use Nokogiri to write a HUGE XML file?

5 votes

I have a Rails application that uses delayed_job in a reporting feature to run some very large reports. One of these generates a massive XML file and it can take literally days in the bad, old way the code is written. I thought that, having seen impressive benchmarks on the internet, Nokogiri could afford us some nontrivial performance gains.

However, the only examples I can find involve using the Nokogiri Builder to create an xml object, then using .to_xml to write the whole thing. But there isn't enough memory in my zip code to handle that for a file of this size.

So can I use Nokogiri to stream or write this data out to file?

Nokogiri is designed to build in memory because you build a DOM and it converts it to XML on the fly. It's easy to use, but there are trade-offs, and doing it in memory is one of them.

You might want to look into using Erubis to generate the XML. Rather than gather all the data before processing and keeping the logic in a controller, like we'd do with Rails, to save memory you can put your logic in the template and have it iterate over your data, which should help with the resource demands.

If you need the XML in a file you might need to do that using redirection:

erubis options templatefile.erb > xmlfile

This is a very simple example, but it shows you could easily define a template to generate XML:

<% 
asdf = (1..5).to_a 
%>
<xml>
  <element>
<% asdf.each do |i| %>
    <subelement><%= i %></subelement>
<% end %>
  </element>
</xml>

which, when I call erubis test.erb outputs:

<xml>
  <element>
    <subelement>1</subelement>
    <subelement>2</subelement>
    <subelement>3</subelement>
    <subelement>4</subelement>
    <subelement>5</subelement>
  </element>
</xml>

EDIT:

The string concatenation was taking forever...

Yes, it can, simply because of garbage collection. You don't show any code example of how you're building your strings, but Ruby works better when you use << to append one string to another than when using +.

It also might work better to not try to keep everything in a string, but instead to write it immediately to disk, appending to an open file as you go.

Again, without code examples I'm shooting in the dark about what you might be doing or why things run slow.
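
As a rough illustration of the "write it as you go" idea, here is a sketch in Python rather than Ruby, using the standard library's xml.sax.saxutils.XMLGenerator. The function name write_report and the element names report and row are invented for the example. Each row is written to the output stream as soon as it is produced, so only one row is held in memory at a time.

```python
import io
from xml.sax.saxutils import XMLGenerator


def write_report(rows, stream):
    """Stream (id, value) pairs to `stream` as XML without building a DOM."""
    gen = XMLGenerator(stream, encoding="utf-8")
    gen.startDocument()
    gen.startElement("report", {})
    for row_id, value in rows:
        gen.startElement("row", {"id": str(row_id)})
        gen.characters(value)  # special characters are escaped here
        gen.endElement("row")
    gen.endElement("report")
    gen.endDocument()


# A generator stands in for a large data source; a real job would pass
# an open file instead of the in-memory buffer used here.
rows = ((i, "value & %d" % i) for i in range(3))
buffer = io.StringIO()
write_report(rows, buffer)
print(buffer.getvalue())
```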

How can i create a xml document based on a schema using php?

5 votes

I want to create an XML file from data supplied by users.

I can use SimpleXML or DOMDocument to create XML files, and DOMDocument even has an option to validate the XML document against a schema.

But what I need is this: instead of creating nodes and adding values through the XML classes, can I create an XML file from data stored somewhere else, driven by a schema?

I think .NET has an option to write XML directly from a DataSet, but I couldn't find anything like that in PHP.

Is that possible, and are there any classes for it?

If there are no predefined classes, I'd appreciate any pointers on ways of doing it.

Edit:

I'm editing this question because it seems that my requirement is not clear to some of you.

For example, if the schema is

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
attributeFormDefault="unqualified">
<xs:element name="formpatterns">
<xs:element name="pan" type="pantype"/>
<xs:element name="name" type="nametype"/>
<xs:element name="fatherName" type="nametype"/>
<xs:group ref="address"/>
<xs:element name="dob" type="xs:date"/>
</xs:schema>

and if the user gives the data for pan, name, fathername, address, and dob, then I need to create the XML document automatically by matching the schema and the data.

The schema may change from time to time, so I don't want to edit all the code that creates or changes nodes and attributes. I just need to point to the new schema, so that the code creates the XML based on it.

Have a look at https://github.com/moyarada/XSD-to-PHP it compiles PHP bindings from XSD, and then you can serialize PHP classes to XML.
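
To show what "driven by the schema" can mean in the simplest case, here is a hedged sketch in Python (not PHP, and not the linked library). It assumes a simplified, hypothetical schema with one root element containing a flat xs:sequence of simple elements; SCHEMA and build_from_schema are names made up for this illustration. Real schemas with groups, references and named types would need a proper binding tool.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical, simplified schema: one root element with a flat sequence.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="formpatterns">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="pan" type="xs:string"/>
        <xs:element name="name" type="xs:string"/>
        <xs:element name="dob" type="xs:date"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""


def build_from_schema(schema_text, data):
    """Create a document whose element names and order come from the schema."""
    schema = ET.fromstring(schema_text)
    root_decl = schema.find(XS + "element")
    root = ET.Element(root_decl.get("name"))
    path = "{0}complexType/{0}sequence/{0}element".format(XS)
    for child_decl in root_decl.iterfind(path):
        name = child_decl.get("name")
        if name in data:  # values come from the user-supplied data
            ET.SubElement(root, name).text = str(data[name])
    return root


doc = build_from_schema(SCHEMA, {"dob": "1980-01-01", "pan": "X1", "name": "A"})
print(ET.tostring(doc, encoding="unicode"))
```

Changing the sequence in the schema changes the output without touching the code, which is the behaviour the question asks for, within the limits of this simplified schema shape.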

C# deserializing xml with multiple possible namespaces

5 votes

I created an API wrapper class library for consuming a REST API from a third party.

It was all working until they recently updated the API in the latest version of their product and added a namespace to the root element. Now my deserialization code is failing.

An example of one of my classes:

[Serializable]
[XmlRootAttribute(ElementName = "exit_survey_list")]
public class SupportExitSurveyCollection : ApiResult { .... }

If I set the Namespace property in the XmlRootAttribute to the new namespace being returned, then it works properly again.

But I need to support both versions of the API (namespaced and not) because I cannot be sure which version of the API will be available.

I'd like to get this working without duplicating classes for different versions, but not sure if it's possible.

Thanks for any input/advice.

I don't think that is possible.

You could implement the IXmlSerializable interface, and control serialization yourself - that would work but it is probably not what you want, since it would require you to do a lot of the mapping yourself in code.

Another option would be to pre-process the messages and add the namespace if it is missing. Then you can have a single deserialization process.
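
The pre-processing idea can be sketched as follows, in Python rather than C#, and with a made-up namespace URI (http://example.com/api/v2) standing in for whatever the real API returns. The helper add_default_namespace is a hypothetical name. It leaves already-namespaced documents alone and qualifies every element of un-namespaced ones, so both versions look the same to the code that follows.

```python
import xml.etree.ElementTree as ET

API_NS = "http://example.com/api/v2"  # hypothetical namespace URI


def add_default_namespace(xml_text, namespace):
    """Parse xml_text and qualify its elements with `namespace` if they have none."""
    root = ET.fromstring(xml_text)
    if root.tag.startswith("{"):
        return root  # already namespaced, nothing to do
    for elem in root.iter():
        if isinstance(elem.tag, str) and not elem.tag.startswith("{"):
            elem.tag = "{%s}%s" % (namespace, elem.tag)
    return root


old_style = "<exit_survey_list><survey/></exit_survey_list>"
new_style = ('<exit_survey_list xmlns="%s"><survey/></exit_survey_list>'
             % API_NS)

old_root = add_default_namespace(old_style, API_NS)
new_root = add_default_namespace(new_style, API_NS)
print(old_root.tag == new_root.tag)
```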

Processing a large xml file with perl

5 votes

I have an XML file which is about 200MB in size, and I wish to extract selected information on a line-by-line basis.

I have written a Perl script using the XML::LibXML module to parse the file contents and then loop over them, extracting the information line by line. This is inefficient as it reads the whole file into memory, but I like LibXML because I can use the XPath locations of the information I require.

Can I get suggestions for ways to make my code more efficient?

Through searching I have been made aware of XML::SAX and XML::LibXML::SAX, but I cannot find documentation that explains their usage, and they don't seem to include any type of XPath addressing structure.

Have you considered the XML::Twig module? It is much more efficient for large file processing, as the CPAN module description states:

NAME

XML::Twig - A perl module for processing huge XML documents in tree mode.

SYNOPSIS

...

It allows minimal resource (CPU and memory) usage by building the tree only for the parts of the documents that need actual processing, through the use of the twig_roots and twig_print_outside_roots options.

...
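
The same "build the tree only for the parts you need" approach exists in other languages. As a hedged sketch, here is the idea in Python with xml.etree.ElementTree.iterparse; it is not XML::Twig, and the element names records, record and name, as well as the function iter_records, are invented for the example. Each record is handed over when its end tag is seen and then cleared so its content can be freed.

```python
import io
import xml.etree.ElementTree as ET

SAMPLE = (b"<records>"
          b"<record id='1'><name>alpha</name></record>"
          b"<record id='2'><name>beta</name></record>"
          b"</records>")


def iter_records(source, tag):
    """Yield (id, name) for each <tag> element, freeing it after use."""
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            record = (elem.get("id"), elem.findtext("name"))
            elem.clear()  # drop the subtree so memory stays bounded
            yield record


# A real script would pass a file name or an open file instead of BytesIO.
records = list(iter_records(io.BytesIO(SAMPLE), "record"))
print(records)
```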

SecurityElement.IsValidText returns true on "&" ... why?

5 votes

I have a TextBox whose content is eventually saved in an XML node. I am using SecurityElement.Escape(string2Escape) to escape the invalid characters before saving the XML.

Problem: I tried using IsValidText to test whether I need to run the escape method, but it reports ''' and '&' as valid; then, when you save the XML, the system barfs because they are, in fact, not valid. It seems to return false only on '<' or '>'.

Simple solution, remove the check, but my question is why would this be the case?

The following is my failing code:

private string EscapeXML(string nodeText)
{
    if (!SecurityElement.IsValidText(nodeText))
    {
        return SecurityElement.Escape(nodeText);
    }
    return nodeText;
}

The SecurityElement constructor is apparently already doing some escaping on its own (including the "&" character), so the IsValidText seems to be only checking for the characters the constructor is not already taking care of. As a consequence, it doesn't look safe to use the SecurityElement's IsValidText/Escape combo, unless you're using SecurityElement to build the whole xml.

I'll try to explain better with an example:

using System;
using System.Diagnostics;
using System.Security;

class MainClass
{
    public static void Main (string[] args)
    {
        // the SecurityElement constructor escapes the & all by itself 
        var xmlRoot =
            new SecurityElement("test","test &");

        // the & is escaped without SecurityElement.Escape 
        Console.WriteLine (xmlRoot.ToString());

        // this would throw an exception (the SecurityElement constructor
        // apparently can't escape < or > characters)
        // var xmlRoot2 =
        //    new SecurityElement("test",@"test & > """);

        // so this text needs to be escaped before construction 
        var xmlRoot3 =
            new SecurityElement("test",EscapeXML(@"test & > """));
        Console.WriteLine (xmlRoot3.ToString());

    }

    private static string EscapeXML(string nodeText)
    {
        return (SecurityElement.IsValidText(nodeText))?
            nodeText :
            SecurityElement.Escape(nodeText);
    }
}
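The practical takeaway from the question, dropping the validity check and escaping unconditionally, can be sketched outside .NET as well. The following uses Python's xml.sax.saxutils.escape as a stand-in for SecurityElement.Escape; the helper name escape_xml and the quote mapping are assumptions made for this illustration.

```python
from xml.sax.saxutils import escape

# Extra replacements for quotes; escape() handles &, < and > by default.
QUOTES = {'"': "&quot;", "'": "&apos;"}

def escape_xml(node_text):
    """Escape unconditionally; text without special characters is unchanged."""
    return escape(node_text, QUOTES)

print(escape_xml('test & > "'))
print(escape_xml("plain text"))
```

Because escaping text that contains no special characters returns it unchanged, a separate "is it valid?" test adds nothing except a chance of missing a character.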

How to tag code in C# XML documentation

5 votes

I have this function:

public static string Join(this IEnumerable<string> strings, string separator)
{
    return string.Join(separator, strings.ToArray());
}

which I want to document.
I want the <returns> tag to say string.Join(separator, strings.ToArray()) since to anyone able to read C# code this says more than a thousand words. However, when I use

<returns>string.Join(separator, strings.ToArray())</returns>

then string.Join(separator, strings.ToArray()) will be formatted as plain text, which makes it almost unreadable. So I tried

<returns><code>string.Join(separator, strings.ToArray())</code></returns>

but this always creates a new paragraph...

So here's my question:
Is there a way to format a piece of text so that it appears as if it were code? I'd be satisfied with a fixed-width font.

The <c> tag sounds like it's what you're looking for. Check out MSDN's tag reference for more details.

That said, are you sure you want the documentation to refer directly to the actions performed by the function? What if you decide to change the implementation later? I know this is a pretty trivial example, but food for thought! :)

sorting entire xdocument based on subnodes

5 votes

Hello all,

I have an xml of the following format:

<?xml version="1.0" encoding="utf-8"?>
<contactGrp name="People">
  <contactGrp name="Developers">
    <customer name="Mike" ></customer>
    <customer name="Brad" ></customer>
    <customer name="Smith" ></customer>
  </contactGrp>
  <contactGrp name="QA">
    <customer name="John" ></customer>
    <customer name="abi" ></customer>
  </contactGrp>
</contactGrp>

I'd like to sort the list of customers based on their names, and return the document in the following format:

<?xml version="1.0" encoding="utf-8"?>
<contactGrp name="People">
  <contactGrp name="Developers">
    <customer name="Brad" ></customer>
    <customer name="Mike" ></customer>
    <customer name="Smith" ></customer>
  </contactGrp>
  <contactGrp name="QA">
    <customer name="abi" ></customer>
    <customer name="John" ></customer>
  </contactGrp>
</contactGrp>

I am using C# and, currently, XmlDocument.

thank you

If you are willing to use an XSLT stylesheet to transform the document, then:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:msxsl="urn:schemas-microsoft-com:xslt" exclude-result-prefixes="msxsl"
>
    <xsl:output method="xml" indent="yes"/>

  <!-- copy the root group's own name instead of hard-coding one -->
  <xsl:template match="/contactGrp">
    <contactGrp name="{@name}">
      <xsl:apply-templates select="contactGrp"/>
    </contactGrp>
  </xsl:template>

  <xsl:template match="contactGrp/contactGrp">
    <contactGrp>
      <xsl:attribute name="name">
        <xsl:value-of select="@name"/>
      </xsl:attribute>

      <xsl:for-each select="customer">
        <xsl:sort select="@name"/>
        <xsl:copy-of select="."/>
      </xsl:for-each>

    </contactGrp>
  </xsl:template>

</xsl:stylesheet>
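If a stylesheet is not an option, the same per-group sort can be done directly on a tree. The sketch below uses Python's ElementTree purely as an illustration; the function name sort_customers is made up, and the case-insensitive sort key is an assumption chosen so that "abi" lands before "John" as in the desired output.

```python
import xml.etree.ElementTree as ET

XML = (
    '<contactGrp name="People">'
    '<contactGrp name="Developers">'
    '<customer name="Mike"/><customer name="Brad"/><customer name="Smith"/>'
    '</contactGrp>'
    '<contactGrp name="QA">'
    '<customer name="John"/><customer name="abi"/>'
    '</contactGrp>'
    '</contactGrp>'
)

def sort_customers(root):
    """Sort the <customer> children of every <contactGrp> by name, in place."""
    # list() takes a snapshot so the tree can be modified while looping.
    for group in list(root.iter("contactGrp")):
        customers = sorted(group.findall("customer"),
                           key=lambda c: c.get("name", "").lower())
        for customer in customers:
            group.remove(customer)
        group.extend(customers)  # re-attach in sorted order

root = ET.fromstring(XML)
sort_customers(root)
names = {g.get("name"): [c.get("name") for c in g.findall("customer")]
         for g in root.iter("contactGrp")}
print(names)
```

Note that the re-attached customers end up after any other child elements of the group, which is harmless for a document shaped like the one in the question.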

SyncML with Android and PHP Web Service

4 votes

I was just curious if anyone has used SyncML (Synchronization Markup Language) and if it's a good standard to use.

We'd need it for synchronising information from a tablet device to a web server (via web service) and vice versa.

Is SyncML too bloated? I was looking at some of the SyncML APIs and they were quite daunting. So the big choice is whether to use this standard or build an in-house solution.

Even if I did do it in-house, we'd have to create some sort of way to define the data we're sending up, so definitely looking at building an XML schema, or alternatively use JSON.

Any opinions? Ideas?

SyncML and ActiveSync (and possibly some other ready-made solutions) have a significant advantage: there are existing implementations that are probably stable. Another significant advantage is that the protocols are already designed and tested. If you design your own protocol, you'll have to think through every possible synchronization scenario yourself. So even if your own protocol can be a bit simpler, you will probably have to do more work, and the result may be less stable.

Process XML in C# using external entity file

4 votes

I am processing an XML file (which does not contain any dtd or ent declarations) in C# that contains entities such as &eacute; and &agrave;. I receive the following exception when attempting to load an XML file...

XmlDocument xmlDoc = new XmlDocument();
xmlDoc.LoadXml(record);

Reference to undeclared entity 'eacute'.

I was able to track down the proper ent file here. How do I tell XmlDocument to use this ent file when loading my XML file?

This works ...

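// Needs: using System.IO; using System.Text; using System.Xml;
var settings = new XmlReaderSettings();

// Allow the DOCTYPE to be parsed. ProhibitDtd is obsolete from .NET 4.0 on,
// where the equivalent is settings.DtdProcessing = DtdProcessing.Parse.
settings.ProhibitDtd = false;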

string DTD = @"<!DOCTYPE doc [
    <!ENTITY % iso-lat1 PUBLIC ""ISO 8879:1986//ENTITIES Added Latin 1//EN//XML""
    ""http://www.oasis-open.org/docbook/xmlcharent/0.3/iso-lat1.ent"">
    %iso-lat1;
    ]> ";

string xml = string.Concat(DTD,"<xml><txt>ren&eacute;</txt></xml>");

XmlDocument xd = new XmlDocument();
xd.Load(XmlReader.Create(new MemoryStream(
        UTF8Encoding.UTF8.GetBytes(xml)), settings));
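A variation on the same trick, if fetching the external .ent file at parse time is undesirable, is to declare only the entities the data actually uses in an internal DTD subset. This is a different technique from the answer above (no external file is loaded), sketched here in Python with ElementTree; the sample text is invented for the example.

```python
import xml.etree.ElementTree as ET

# Internal DTD subset declaring just the entities the document uses.
# The code points are the standard ISO Latin 1 values for these names.
DOCTYPE = """<!DOCTYPE doc [
  <!ENTITY eacute "&#233;">
  <!ENTITY agrave "&#224;">
]>"""

record = "<xml><txt>ren&eacute; voil&agrave;</txt></xml>"

# Prepend the DOCTYPE so the parser knows the entities before it meets them.
root = ET.fromstring(DOCTYPE + record)
text = root.find("txt").text
print(text)
```

As in the C# version, the DOCTYPE has to be prepended to the record before parsing, since the record itself carries no declarations.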

Looping over a large XML file

4 votes

I'm having problems looping over an XML file of about 20-30 MB (650,000 rows).

This is my meta-code:

<cffile action="READ" file="file.xml" variable="usersRaw">

<cfset usersXML = XmlParse(usersRaw)>
<cfset advsXML = XmlSearch(usersXML, "/advs/advuser")>
<cfset users = XmlSearch(usersXML, "/advs/advuser/user")>

<cfset numUsers = ArrayLen(users)>
<cfloop index="i" from="1" to="#numUsers#">
    ... some selects...
    ... insert...
    <cfset advs = advsXML[i]["vehicle"]>
    <cfset numAdvs = ArrayLen(advs)> 
    <cfloop index="k" from="1" to="#numAdvs#">        
        ... insert... or ... update...
    </cfloop>
</cfloop>

The structure of the XML file is as follows (yes, it is not very good :-)

<advs>
   <advuser>
      <user>
      </user>
      <vehicle>
      </vehicle>
   </advuser>
</advs>

After ~120,000 rows I get an error: "Out of memory".

How can I improve performance of my script?

How can I diagnose where there is max memory consumption?

@SamG is correct that ColdFusion's built-in XML parsing can't handle this, because it uses a DOM parser that loads the whole document into memory. SAX is painful to work with, though, so use a StAX parser instead; it provides a much simpler iterator-style interface. See the answer I provided to another question for an example of how to do this with ColdFusion.

This is roughly what you'd do for your example:

<cfset fis = createObject("java", "java.io.FileInputStream").init(
    "#getDirectoryFromPath(getCurrentTemplatePath())#/file.xml"
)>
<cfset bis = createObject("java", "java.io.BufferedInputStream").init(fis)>
<cfset XMLInputFactory = createObject("java", "javax.xml.stream.XMLInputFactory").newInstance()>
<cfset reader = XMLInputFactory.createXMLStreamReader(bis)>

<cfloop condition="reader.hasNext()">
    <cfset event = reader.next()>
    <cfif event EQ reader.START_ELEMENT>
        <cfswitch expression="#reader.getLocalName()#">
            <cfcase value="advs">
                <!--- root node, do nothing --->
            </cfcase>
            <cfcase value="advuser">
                <!--- set values used later on for inserts, selects, updates --->
            </cfcase>
            <cfcase value="user">
                <!--- some selects and insert --->
            </cfcase>
            <cfcase value="vehicle">
                <!--- insert or update --->
            </cfcase>
        </cfswitch>
    </cfif>
</cfloop>

<cfset reader.close()>
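For readers who want to try the pull-parsing pattern outside ColdFusion, here is a small sketch using Python's XMLPullParser, which plays a role similar to the StAX reader above. The sample document and the counting logic are placeholders for the real selects, inserts and updates; count_elements and chunk_size are names invented for this example.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample following the structure given in the question.
SAMPLE = b"""<advs>
  <advuser><user/><vehicle/><vehicle/></advuser>
  <advuser><user/><vehicle/></advuser>
</advs>"""

def count_elements(stream, chunk_size=64):
    """Feed the stream in small chunks and react to start/end events."""
    parser = ET.XMLPullParser(events=("start", "end"))
    counts = {"user": 0, "vehicle": 0}

    def drain():
        for event, elem in parser.read_events():
            if event == "start" and elem.tag in counts:
                counts[elem.tag] += 1   # stand-in for the selects/inserts
            elif event == "end" and elem.tag == "advuser":
                elem.clear()            # release the finished sub-tree

    for chunk in iter(lambda: stream.read(chunk_size), b""):
        parser.feed(chunk)
        drain()
    parser.close()
    drain()  # pick up any events produced while closing
    return counts

counts = count_elements(io.BytesIO(SAMPLE))
print(counts)
```

Only one chunk of input and the currently open elements are held at a time, which is what keeps memory use from growing with the number of rows.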

extract DOM childNodes or rename a Element without use of iteration

4 votes

$xml = '<p><a>1</a><b><c>1</c></b></p>';
$dom = new DomDocument;
$dom->loadXML($xml);
$p   = $dom->childNodes->item(0);
echo $dom->saveXML($p);

the above will print back

<p>
  <a>1</a>
  <b><c>1</c></b>
</p>

Assuming I need to rename the p node/element to new_p, what is the ideal way to do it other than a loop like the one below? (The approach below does work.)

$fragment = '';
foreach ($p->childNodes as $a)
{
  $fragment .= $dom->saveXML($a);
}

$new_doc = new DomDocument;
$new_doc->loadXML('<new_node/>');
$f = $new_doc->createDocumentFragment();
$f->appendXML($fragment);
$new_doc->documentElement->appendChild($f);
echo $new_doc->saveXML();

expected results

<new_node><a>1</a><b><c>1</c></b></new_node>

As Mark already pointed out, manipulating XML is easiest with XSLT. And you don't have to write any loops, the thinking is done by the XSLT processor of your choice.

A simple how-to with XSLT

Here's what the XSLT might look like (search for "Identity transform XSLT" for some tutorials).

The basics are simple: this type of XSLT transformation copies everything as-is, unless there's a specific rule (template-match in XSLT) that specifies an exception (in this case for <p> elements). Note: it doesn't matter how deep your p-tags are nested, which makes it ideal for transforming XML.

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- identity transform -->
    <xsl:template match="node() | @*">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- rename "p" to "new_p", copy everything inside p -->
    <xsl:template match="p">
        <new_p>
            <xsl:apply-templates select="@* | node()"/>
        </new_p>
    </xsl:template>

</xsl:stylesheet>

Calling XSLT from PHP

This is relatively straightforward, since PHP has a built-in module for XSL. Here's how you can do it (here's more information):

// create an XSLT processor and load the stylesheet as a DOM 
$xproc = new XsltProcessor();
$xslt = new DomDocument;
$xslt->load('yourstylesheet.xslt');    // this contains the code from above
$xproc->importStylesheet($xslt);


// your DOM or the source XML (copied from your question)
$xml = '<p><a>1</a><b><c>1</c></b></p>';
$dom = new DomDocument;
$dom->loadXML($xml);

// do the transformation
if ($xml_output = $xproc->transformToXML($dom)) {
    echo $xml_output;
} else {
    trigger_error('Oops, XSLT transformation failed!', E_USER_ERROR);
} 

Output is as expected (optional indentation can be set with <xsl:output indent="yes"/>):

<new_p>
    <a>1</a>
    <b><c>1</c></b>
</new_p>

As you can see: no loops or iterations ;)

PS: XSLT is a widely adopted and stable standard. You don't have to worry about proper escaping or about parsing issues with CDATA sections or entities, because an XSLT processor using the xml output method serializes its result as well-formed XML. This saves a whole lot of headaches compared with doing this by hand.
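As an aside, the "copy everything, but rename p wherever it occurs" rule can also be expressed in tree APIs where the element name is writable. The sketch below uses Python's ElementTree, where tag is a plain attribute; it is offered for comparison only, and the helper name rename_all is invented for this example.

```python
import xml.etree.ElementTree as ET

def rename_all(root, old, new):
    """Rename every element called `old`, at any depth, to `new`."""
    # list() takes a snapshot so tags can be changed while looping.
    for elem in list(root.iter(old)):
        elem.tag = new  # children and attributes are left untouched

root = ET.fromstring("<p><a>1</a><b><c>1</c></b></p>")
rename_all(root, "p", "new_p")
result = ET.tostring(root, encoding="unicode")
print(result)
```

Like the template match in the stylesheet, this does not care how deeply the p elements are nested.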

Is it possible to restrict elements to a max occur in a "choice" block?

4 votes

Hi everybody

I need to solve the following problem.

//pseudo algorithm

  • you have four elements: elm1, elm2, elm3, elm4
  • elm1 occurs 0-2 times
  • elm2 occurs 0-1 times
  • elm3 occurs 0-n times
  • elm4 occurs 0-n times
  • they can be ordered in any way, but occur restricted to their given count.

//pseudo end

It seems like a combination of sequence and choice, but both indicators have characteristics that prevent the behavior I want.

sample: elm4 elm1 elm2 elm1 elm3 elm3 elm3 elm4

Please rescue me before I go insane :)

chris

If your n values are not too big and you're desperate, you can write a content model that accounts for every possible combination, but its complexity grows exponentially.

The best solution is to use a tool that supports XML Schema 1.1 (such as Xerces or Saxon), which relaxes the restrictions on occurrence values in xs:all groups. From section G.1.3 of the spec:

  1. Several of the constraints imposed by version 1.0 of this specification on all-groups have been relaxed:

    a. Wildcards are now allowed in all groups.

    b. The value of maxOccurs may now be greater than 1 on particles in an all group. The elements which match a particular particle need not be adjacent in the input.

    c. all groups can now be extended by adding more members to them.

Failing that, the general XML Schema 1.0 solution is to specify a relaxed model in the schema (no limits on the element occurrences) and then enforce the constraints you care about in another layer, which might be custom code or XSLT, for instance.
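The "enforce the constraints in another layer" option could be as small as the following sketch, written in Python as one possible form of custom code. The limits table mirrors the counts given in the question; the function name check_occurrences and the wrapper element root are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# (min, max) occurrences per element name; None means unbounded.
LIMITS = {
    "elm1": (0, 2),
    "elm2": (0, 1),
    "elm3": (0, None),
    "elm4": (0, None),
}

def check_occurrences(parent, limits=LIMITS):
    """Return a list of violation messages for the children of `parent`."""
    counts = {name: 0 for name in limits}
    problems = []
    for child in parent:  # order is ignored, only counts matter
        if child.tag not in limits:
            problems.append(f"unexpected element: {child.tag}")
        else:
            counts[child.tag] += 1
    for name, (low, high) in limits.items():
        if counts[name] < low or (high is not None and counts[name] > high):
            problems.append(f"{name} occurs {counts[name]} times")
    return problems

ok = ET.fromstring(
    "<root><elm4/><elm1/><elm2/><elm1/><elm3/><elm3/><elm3/><elm4/></root>")
bad = ET.fromstring("<root><elm1/><elm1/><elm1/><elm2/><elm2/></root>")
print(check_occurrences(ok))
print(check_occurrences(bad))
```

The relaxed schema would still check that only these four elements appear; a check like this one then covers the counts that XML Schema 1.0 cannot express for an unordered group.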