Best XML questions in June 2012

Deserializing XML into JSON without using the XmlDocument.LoadXml() function

7 votes

Good morning everyone, I have a somewhat unusual problem. I am registering a DLL as an assembly inside a SQL Server database. The method takes in a SqlXml variable, along with two strings, and serializes the data into JSON format.

For reference, here is the method call:

[SqlProcedure]
public static void Receipt(SqlString initiatorPassword,
                           SqlString initiatorId,
                           SqlXml XMLOut,
                           out SqlString strMessge)

If this were any other type of application, I would use Newtonsoft.Json or Jayrock. Normally I would follow the approach from an answer to a similar question and do something like:

XmlReader r = XMLOut.CreateReader();   // SqlXml.CreateReader() already returns an XmlReader
XmlDocument doc = new XmlDocument();
doc.Load(r);

However, since I am using SQLCLR, there are certain rules of the road. One of them is that .Load() and any other inherited methods can't be used. I think the .NET Framework said it best:

System.InvalidOperationException: Cannot load dynamically generated serialization assembly. In some hosting environments assembly load functionality is restricted, consider using pre-generated serializer. Please see inner exception for more information. ---> System.IO.FileLoadException:
LoadFrom(), LoadFile(), Load(byte[]) and LoadModule() have been disabled by the host.

I am not fluent in SQLCLR by any means, but if I understand a blog post on this correctly, this is caused by SQLCLR's rules not allowing .Load() and inherited methods unless the assembly is signed with a strong name. My DLL and the third-party DLLs I'm using do not have a strong name, and I cannot rebuild and sign them myself. So I am stuck trying to complete this task without using .Load() (unless someone knows another way to do this).

The only solution I could come up with is a very ugly while loop that isn't working properly: I keep getting a "Jayrock.Json.JsonException: A JSON member value inside a JSON object must be preceded by its member name" exception. Here is the while loop I wrote (not my best code, I know):

int lastdepth = -1;
Boolean objend = true;
Boolean wt = false;
JsonWriter w = new JsonTextWriter();
//Write Member/Object statements for the header omitted
//m is the XmlReader over the SqlXml input
while (m.Read())
{
    if ((lastdepth == -1) && (m.IsStartElement()))
    {//Checking for root element
        lastdepth = 0;
    }
    if ((m.IsStartElement()) && (lastdepth != -1))
    {//Checking for Start element ( <html> )
        w.WriteMember(m.Name);
        if (objend)
        { //Check if element is new Parent Node, if so, write start object
            w.WriteStartObject();
            objend = false;
        }
    }
    if (m.NodeType == XmlNodeType.Text)
    { //Writes text here.  NOTE: m.Depth > lastdepth here!!!!!!!
        w.WriteString(m.Value);
        wt = true;
    }
    if (m.NodeType == XmlNodeType.Whitespace) //If whitespace, keep on truckin
    { m.Skip(); }
    if ((m.NodeType == XmlNodeType.EndElement) && (wt == false) && (lastdepth > m.Depth))
    {//End element that ends a series of "Child" nodes
        w.WriteEndObject();
        objend = true;
    }
    if ((m.NodeType == XmlNodeType.EndElement) && (wt == true))//Standard end of an el
    { wt = false; }
    lastdepth = m.Depth;
}
w.WriteEndObject();
jout = w.ToString();

My question is: since I can't use .Load() and my while loop is a mess to debug, what would be the best approach here? The other commonly discussed approach is deserializing into an object with matching properties, but the XML coming out of SQL Server is rather large. My loop is an attempt to handle the structure dynamically, since there are ~200 fields being pulled to make this XML. Thank you in advance!

Note: I am using Jayrock and working in .NET Framework 2.0. I cannot change the framework version at this time.

Looking at the Jayrock source code:

Your code is throwing an exception:

A JSON member value inside a JSON object must be preceded by its member name.

This exception comes from the method:

    private void EnsureMemberOnObjectBracket() 
    {
        if (_state.Bracket == JsonWriterBracket.Object)
            throw new JsonException("A JSON member value inside a JSON object must be preceded by its member name.");
    }

That method is called from:

    public sealed override void WriteString(string value)
    {
        if (Depth == 0)
        {
            WriteStartArray(); WriteString(value); WriteEndArray();
        }
        else
        {
            EnsureMemberOnObjectBracket();
            WriteStringImpl(value);
            OnValueWritten();
        }
    }

Your code calls a method that calls EnsureMemberOnObjectBracket from only one place:

if (m.NodeType == XmlNodeType.Text)
{ //Writes text here.  NOTE: m.Depth > lastdepth here!!!!!!!
 w.WriteString(m.Value);
 wt = true;
}

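So the error is in this block: when w.WriteString(m.Value) runs, the writer is directly inside a JSON object and no member name has been written for the value. Perhaps you could add a try/catch here, or refine the logic that decides when WriteMember and WriteStartObject are called.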
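A common way to avoid this kind of flag bookkeeping (lastdepth, objend, wt) is a stack: nothing is written for an element until its end tag, when you know whether it had child elements. Below is a minimal sketch of that technique in Python with the standard library's ElementTree.iterparse, standing in for a forward-only XmlReader; it is not Jayrock or SQLCLR code. The function name xml_to_json is hypothetical, and the sketch assumes attributes and mixed content can be ignored.

```python
import io
import json
import xml.etree.ElementTree as ET


def xml_to_json(xml_text):
    """Convert XML to JSON in one forward pass over start/end events.

    A stack holds one dict per open element. Nothing is emitted for an
    element until its end tag, when we know whether it had children:
    leaf elements become strings, elements with children become objects,
    and repeated sibling names become arrays. Attributes and mixed
    content are ignored in this sketch.
    """
    stack = [{}]  # sentinel frame that will receive the root element
    for event, elem in ET.iterparse(io.StringIO(xml_text), events=("start", "end")):
        if event == "start":
            stack.append({})  # children of elem are collected here
            continue
        children = stack.pop()
        value = children if children else (elem.text or "").strip()
        parent = stack[-1]
        if elem.tag in parent:  # repeated name: switch to an array
            if not isinstance(parent[elem.tag], list):
                parent[elem.tag] = [parent[elem.tag]]
            parent[elem.tag].append(value)
        else:
            parent[elem.tag] = value
        elem.clear()  # the element has been converted; free its content
    return json.dumps(stack[0])


print(xml_to_json("<r><a>1</a><b><c>2</c><c>3</c></b></r>"))
```

Because a member name and its value are produced together at the end tag, the "value without a member name" state cannot arise. The same idea could be carried over to the XmlReader loop by keeping a stack of pending element names instead of the depth flags.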

How do I get the whole content between two xml tags in Python?

7 votes

I'm trying to get the whole content between an opening XML tag and its closing counterpart.

Getting the content in straight cases like title below is easy, but how can I get the whole content between the tags if mixed-content is used and I want to preserve the inner tags?

<?xml version="1.0" encoding="UTF-8"?>
<review>
  <title>Some testing stuff</title>
  <text sometimes="attribute">Some text with <extradata>data</extradata> in it.
  It spans <sometag>multiple lines: <tag>one</tag>, <tag>two</tag> 
  or more</sometag>.</text>
</review>

What I want is the content between the two text tags, including any tags: Some text with <extradata>data</extradata> in it. It spans <sometag>multiple lines: <tag>one</tag>, <tag>two</tag> or more</sometag>.

For now I use regular expressions, but it gets kind of messy and I don't like this approach. I lean towards a solution based on an XML parser. I looked at minidom, etree, lxml and BeautifulSoup but couldn't find a solution for this case (the whole content, including inner tags).

from lxml import etree
t = etree.XML(
b"""<?xml version="1.0" encoding="UTF-8"?>
<review>
  <title>Some testing stuff</title>
  <text>Some text with <extradata>data</extradata> in it.</text>
</review>"""
)
(t.text + ''.join(etree.tostring(c, encoding='unicode') for c in t)).strip()

The trick here is that t is iterable and, when iterated, yields all of its child elements. etree.tostring() serializes each child together with its tail (the text that follows it), but etree does not model text as separate nodes, so you also need to recover the text before the first child tag with t.text. (The input is passed as bytes because lxml rejects Python 3 str input that carries an encoding declaration.)

In [50]: (t.text + ''.join(etree.tostring(c, encoding='unicode') for c in t)).strip()
Out[50]: '<title>Some testing stuff</title>\n  <text>Some text with <extradata>data</extradata> in it.</text>'

Or:

In [6]: e = t.xpath('//text')[0]

In [7]: (e.text + ''.join(etree.tostring(c, encoding='unicode') for c in e)).strip()
Out[7]: 'Some text with <extradata>data</extradata> in it.'
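If lxml is not available, the same trick works with the standard library's xml.etree.ElementTree, whose tostring() also includes each child's tail text. A minimal sketch follows; inner_xml is a hypothetical helper name, and, unlike the lxml snippet above, it escapes the leading .text so that characters such as & come back as &amp;.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def inner_xml(elem):
    """Return everything between elem's start and end tags, inner tags included."""
    # elem.text holds the text before the first child; tostring() of each child
    # also writes that child's tail, so joining them rebuilds the full content.
    parts = [escape(elem.text or "")]
    parts.extend(ET.tostring(child, encoding="unicode") for child in elem)
    return "".join(parts)


doc = ET.fromstring(
    "<review><title>Some testing stuff</title>"
    '<text sometimes="attribute">Some text with <extradata>data</extradata> in it.</text>'
    "</review>"
)
print(inner_xml(doc.find("text")))
```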

XPath select all elements between two specific elements

5 votes

I have a following xml:

<doc>
    <divider />
    <p>text</p>
    <p>text</p>
    <p>text</p>
    <p>text</p>
    <p>text</p>
    <divider />
    <p>text</p>
    <p>text</p>
    <divider />
    <p>text</p>
    <divider />
</doc>

I want to select all p nodes after the first divider element, up to the next occurrence of a divider element. I tried the following XPath:

//divider[1]/following-sibling::p[following::divider]

but the problem is that it selects all p elements before the last divider element. I'm not sure how to do this using XPath 1.0.

Same concept as bytebuster's answer, but a different XPath:

/*/p[count(preceding-sibling::divider)=1]
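The XPath works because each p's group is identified by how many divider siblings precede it. Python's built-in ElementTree has no preceding-sibling axis, so here is a sketch of the same counting idea done by walking the children in order; paragraphs_after_divider is a hypothetical name, and n=1 corresponds to count(...)=1.

```python
import xml.etree.ElementTree as ET

DOC = """<doc>
    <divider />
    <p>text</p>
    <p>text</p>
    <p>text</p>
    <p>text</p>
    <p>text</p>
    <divider />
    <p>text</p>
    <p>text</p>
    <divider />
    <p>text</p>
    <divider />
</doc>"""


def paragraphs_after_divider(root, n=1):
    """Mimic /*/p[count(preceding-sibling::divider)=n]: walk the children in
    document order, keeping a running count of the dividers seen so far."""
    dividers_seen = 0
    selected = []
    for child in root:
        if child.tag == "divider":
            dividers_seen += 1
        elif child.tag == "p" and dividers_seen == n:
            selected.append(child)
    return selected


root = ET.fromstring(DOC)
print(len(paragraphs_after_divider(root)))
```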

What is the proper way to store a file name in XML?

5 votes

I'm using XDocument to cache a list of files.

<file id="20" size="244318208">a file with an &amp;ersand.txt</file>

In this example, I used XText and let it automatically escape characters in the file name, such as turning & into &amp;.

<file id="20" size="244318208"><![CDATA[a file with an &ersand.txt]]></file>

In this one, I used XCData to let me use a literal string rather than an escaped one, so it appears in the XML as it would in my application.

I'm wondering if either of them is better than the other under any certain conditions, or if it is just personal taste. Also, if it means anything, the file names may or may not contain illegal characters.

Both are essentially the same: an XML parser reports exactly the same text for either form, and there is no specific "best practice".

Personally, I reserve <![CDATA[]]> for large amounts of text that requires lots of escaping (say bits of code or HTML markup).

In this specific case, I would rather escape the & to &amp; as in your first example.
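To see that the choice is cosmetic, here is a small check with Python's standard ElementTree (used only as a convenient parser; the question itself uses XDocument). Note that neither form helps with characters that XML 1.0 does not allow at all, such as most control characters, and a CDATA section cannot contain the sequence ]]>.

```python
import xml.etree.ElementTree as ET

escaped = '<file id="20" size="244318208">a file with an &amp;ersand.txt</file>'
cdata = '<file id="20" size="244318208"><![CDATA[a file with an &ersand.txt]]></file>'

# A conforming parser reports the same character data for both forms.
print(ET.fromstring(escaped).text == ET.fromstring(cdata).text)

# When writing, ElementTree escapes the ampersand itself (it never emits CDATA).
elem = ET.Element("file", id="20", size="244318208")
elem.text = "a file with an &ersand.txt"
print(ET.tostring(elem, encoding="unicode"))
```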

What does "@*|node()" in an XSLT apply-templates select mean?

5 votes

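I read some XSLT examples and found this:

<xsl:apply-templates select="@*|node()"/>

What does that mean?

It selects every attribute of the current node (@*) together with every child node of it (node() matches elements, text nodes, comments and processing instructions), and applies templates to each of them. node() on its own never matches attributes, because attributes are not on the child axis; that is why @* is added. The pattern is best known from the identity transform, which copies a document unchanged.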
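As a rough illustration of what that union selects, here is a Python sketch over the standard ElementTree. It is only an approximation: ElementTree stores text in .text/.tail strings rather than as nodes, and its default parser drops comments and processing instructions. The helper name attrs_and_child_nodes is hypothetical.

```python
import xml.etree.ElementTree as ET


def attrs_and_child_nodes(elem):
    """Approximate select="@*|node()" for elem: its attributes first,
    then its child text and element nodes in document order."""
    selected = [("attribute", name, value) for name, value in elem.attrib.items()]
    if elem.text:  # text before the first child element
        selected.append(("text", elem.text))
    for child in elem:
        selected.append(("element", child.tag))
        if child.tail:  # text between this child and the next one
            selected.append(("text", child.tail))
    return selected


print(attrs_and_child_nodes(ET.fromstring('<a x="1">hi<b/>there</a>')))
```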

How to remove all empty XElements

4 votes

This one is a little tricky. Say I have this XmlDocument

<Object>
    <Property1>1</Property1>
    <Property2>2</Property2>
    <SubObject>
         <DeeplyNestedObject />
    </SubObject>
</Object>

I want to get back this

<Object>
    <Property1>1</Property1>
    <Property2>2</Property2>
</Object>

Since all of the children of <SubObject> are empty elements, I want to get rid of it. What makes it challenging is that you can't remove nodes while you're iterating over them. Any help would be much appreciated.

UPDATE: Here's what I ended up with.

public XDocument Process()
{
    //Load my XDocument
    var xmlDoc = GetObjectXml(_source);

    //Keep track of empty elements
    var childrenToDelete = new List<XElement>();

    //Recursively iterate through each child node
    foreach (var node in xmlDoc.Root.Elements())
        Process(node, childrenToDelete);

    //Any items marked for deletion can safely be removed here
    //Since we're not iterating over the source elements collection
    foreach (var deletion in childrenToDelete)
        deletion.Remove();

    return xmlDoc;
}

private void Process(XElement node, List<XElement> elementsToDelete)
{
    //Walk the child elements
    if (node.HasElements)
    {
        //This is the collection of child elements to be deleted 
        //for this particular node
        var childrenToDelete = new List<XElement>();

        //Recursively iterate each child
        foreach (var child in node.Elements())
            Process(child, childrenToDelete);

        //Delete all children that were marked as empty
        foreach (var deletion in childrenToDelete)
            deletion.Remove();

        //Since we just removed all of this node's empty children
        //delete it if there's nothing left
        if (node.IsEmpty)
            elementsToDelete.Add(node);
    }

    //The current leaf node is empty so mark it for deletion
    else if (node.IsEmpty)
        elementsToDelete.Add(node);
}

If anyone is interested in the use case for this it's for an ObjectFilter project I put together.

It'll be rather slow, but you could do this:

XElement xml;
while (true) {
    var empties = xml.Descendants().Where(x => x.IsEmpty && !x.HasAttributes).ToList();
    if (empties.Count == 0)
        break;

    empties.ForEach(e => e.Remove());
}

To make it faster, you could walk up the parent nodes after the first iteration and see if they're empty.

XElement xml;
var empties = xml.Descendants().Where(x => x.IsEmpty && !x.HasAttributes).ToList();
while (empties.Count > 0) {
    var parents = empties.Select(e => e.Parent)
                         .Where(e => e != null)
                         .Distinct()    //In case we have two empty siblings, don't try to remove the parent twice
                         .ToList();

    empties.ForEach(e => e.Remove());

    //Filter the parent nodes to the ones that just became empty.
    parents.RemoveAll(e => !e.IsEmpty || e.HasAttributes);
    empties = parents;
}
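Both approaches above rely on the same point: collect first, remove afterwards, and let emptiness propagate from the leaves upward. A single post-order pass does this in one go. Here is a sketch of that technique in Python's ElementTree (not LINQ to XML); prune_empty is a hypothetical name, and "empty" is assumed to mean no child elements, no attributes and no non-whitespace text.

```python
import xml.etree.ElementTree as ET


def prune_empty(elem):
    """Bottom-up removal of empty descendants of elem (elem itself is kept).

    list(elem) snapshots the children, so removing from elem never disturbs
    the loop. A removed child's tail text is discarded, which is harmless
    when it is only indentation whitespace.
    """
    for child in list(elem):
        prune_empty(child)  # recurse first so emptiness propagates upwards
        if len(child) == 0 and not child.attrib and not (child.text or "").strip():
            elem.remove(child)


root = ET.fromstring(
    "<Object><Property1>1</Property1><Property2>2</Property2>"
    "<SubObject><DeeplyNestedObject /></SubObject></Object>"
)
prune_empty(root)
print(ET.tostring(root, encoding="unicode"))
```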

Grouping nodes with same value from XML

4 votes

Hi, I am new to XSLT/XML.

I have XML like this:

<entry>
 <attribute1>A</attribute1>
 <attribute2>B</attribute2>
</entry>
<entry>
 <attribute1>A</attribute1>
 <attribute2>B</attribute2>
</entry>
<entry>
 <attribute1>C</attribute1>
 <attribute2>D</attribute2>
</entry>
<entry>
 <attribute1>E</attribute1>
 <attribute2>F</attribute2>
</entry>
...

I need table output:

Attribute1 Attribute2 Qty
   A           B       2
   C           D       1
   E           F       1

I need help: I have no idea how to count unique entries and display each one only once in the table.

I am using XSLT version 1.0

I. Simple XSLT 1.0 transformation:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:key name="kEntryByChildren" match="entry" use="."/>

 <xsl:template match=
 "entry[not(generate-id() = generate-id(key('kEntryByChildren', .)[1]))]"/>

 <xsl:template match="entry">
  <tr>
   <xsl:apply-templates/>
   <td><xsl:value-of select="count(key('kEntryByChildren', .))"/></td>
  </tr>
 </xsl:template>

 <xsl:template match="entry/*">
   <td><xsl:value-of select="."/></td>
 </xsl:template>

 <xsl:template match="/*">
   <table>
     <xsl:apply-templates/>
   </table>
 </xsl:template>
</xsl:stylesheet>

when applied on the provided XML (the fragment is wrapped into a single top element to obtain a well-formed XML document):

<t>
    <entry>
        <attribute1>A</attribute1>
        <attribute2>B</attribute2>
    </entry>
    <entry>
        <attribute1>A</attribute1>
        <attribute2>B</attribute2>
    </entry>
    <entry>
        <attribute1>C</attribute1>
        <attribute2>D</attribute2>
    </entry>
    <entry>
        <attribute1>E</attribute1>
        <attribute2>F</attribute2>
    </entry>
</t>

produces the wanted, correct result:

<table>
   <tr>
      <td>A</td>
      <td>B</td>
      <td>2</td>
   </tr>
   <tr>
      <td>C</td>
      <td>D</td>
      <td>1</td>
   </tr>
   <tr>
      <td>E</td>
      <td>F</td>
      <td>1</td>
   </tr>
</table>

When applied on this tricky XML document (if we used simple concatenation of the children's values, we would incorrectly conclude that the first three entry elements are "same"):

<t>
    <entry>
        <attribute1>AB</attribute1>
        <attribute2>C</attribute2>
    </entry>
    <entry>
        <attribute1>A</attribute1>
        <attribute2>BC</attribute2>
    </entry>
    <entry>
        <attribute1>A</attribute1>
        <attribute2>BC</attribute2>
    </entry>
    <entry>
        <attribute1>C</attribute1>
        <attribute2>D</attribute2>
    </entry>
    <entry>
        <attribute1>E</attribute1>
        <attribute2>F</attribute2>
    </entry>
</t>

the correct result is produced:

<table>
   <tr>
      <td>AB</td>
      <td>C</td>
      <td>1</td>
   </tr>
   <tr>
      <td>A</td>
      <td>BC</td>
      <td>2</td>
   </tr>
   <tr>
      <td>C</td>
      <td>D</td>
      <td>1</td>
   </tr>
   <tr>
      <td>E</td>
      <td>F</td>
      <td>1</td>
   </tr>
</table>

Explanation:

Proper use of the Muenchian grouping method.

Do note:

  1. This solution doesn't depend on the names and number of children of an entry element, and thus can be applied if there are more than two children, or varying number of children with unknown beforehand names.

  2. Here we assume that the concatenation of all children's string values is the same only when the same children have the same values.


II. Full XSLT 1.0 solution:

In case the assumption 2. above cannot be guaranteed, this is one possible XSLT 1.0 solution:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="entry">
  <xsl:variable name="vChildrenFp">
    <xsl:for-each select="*">
     <xsl:value-of select="concat(., '+')"/>
    </xsl:for-each>
  </xsl:variable>

  <xsl:variable name="vPrecedingSame">
    <xsl:for-each select="preceding-sibling::entry">
     <xsl:variable name="vthisFP">
       <xsl:for-each select="*">
         <xsl:value-of select="concat(., '+')"/>
       </xsl:for-each>
     </xsl:variable>

     <xsl:if test="$vthisFP = $vChildrenFp">1</xsl:if>
    </xsl:for-each>
  </xsl:variable>

  <xsl:if test="not(string($vPrecedingSame))">
      <xsl:variable name="vFollowingSame">
        <xsl:for-each select="following-sibling::entry">
         <xsl:variable name="vthisFP">
           <xsl:for-each select="*">
             <xsl:value-of select="concat(., '+')"/>
           </xsl:for-each>
         </xsl:variable>

         <xsl:if test="$vthisFP = $vChildrenFp">1</xsl:if>
        </xsl:for-each>
      </xsl:variable>

      <tr>
       <xsl:apply-templates/>
       <td><xsl:value-of select="string-length($vFollowingSame)+1"/></td>
      </tr>
  </xsl:if>
 </xsl:template>

 <xsl:template match="entry/*">
   <td><xsl:value-of select="."/></td>
 </xsl:template>

 <xsl:template match="/*">
   <table>
     <xsl:apply-templates/>
   </table>
 </xsl:template>
</xsl:stylesheet>

When applied on the first XML document above, the same correct result is produced:

<table>
   <tr>
      <td>A</td>
      <td>B</td>
      <td>2</td>
   </tr>
   <tr>
      <td>C</td>
      <td>D</td>
      <td>1</td>
   </tr>
   <tr>
      <td>E</td>
      <td>F</td>
      <td>1</td>
   </tr>
</table>

Explanation:

  1. For each entry element we generate a "fingerprint" (FP) of its children, and process this entry element only if none of its preceding sibling entry elements has the same fingerprint.

  2. The count of "same" entry elements is done in a similar way -- for any following sibling entry element with the same children's FP value, we output a single character ('1'). The total count is the string-length of the so generated string (of "1"s) plus 1.


III. XSLT 2.0 solution:

<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="my:my" exclude-result-prefixes="my xs">
    <xsl:output omit-xml-declaration="yes" indent="yes"/>
    <xsl:param name="pExoticString" select="'+'"/>

 <xsl:template match="/*">
   <table>
     <xsl:for-each-group select="entry" group-by="my:fingerprint(.)">
       <tr>
           <xsl:apply-templates/>
           <td><xsl:value-of select="count(current-group())"/></td>
       </tr>
     </xsl:for-each-group>
   </table>
 </xsl:template>

 <xsl:template match="entry/*">
   <td><xsl:value-of select="."/></td>
 </xsl:template>

 <xsl:function name="my:fingerprint" as="xs:string">
  <xsl:param name="pParent" as="element()"/>

  <xsl:sequence select="string-join($pParent/*, $pExoticString)"/>
 </xsl:function>
</xsl:stylesheet>

This simple solution easily handles the complicated case. When applied on the last XML document, the wanted, correct result is produced:

<table>
   <tr>
            <td>AB</td>
            <td>C</td>
         <td>1</td>
   </tr>
   <tr>
            <td>A</td>
            <td>BC</td>
         <td>2</td>
   </tr>
   <tr>
            <td>C</td>
            <td>D</td>
         <td>1</td>
   </tr>
   <tr>
            <td>E</td>
            <td>F</td>
         <td>1</td>
   </tr>
</table>

Explanation:

Proper use of xsl:for-each-group, xsl:function, current-group() and string-join().
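Outside XSLT, the same grouping is easy to express with a compound key. The sketch below is Python with ElementTree, not XSLT; it uses a tuple of the children's values as the key, which sidesteps the separator problem that the fingerprint solutions handle with '+' or $pExoticString. The function name group_entries is hypothetical.

```python
import xml.etree.ElementTree as ET


def group_entries(root):
    """Count entry elements whose children have the same sequence of values.

    The key is a tuple of the children's text, so ("AB", "C") and ("A", "BC")
    stay distinct without a separator string. Dicts keep insertion order, so
    groups come out in first-seen order.
    """
    counts = {}
    for entry in root.iter("entry"):
        key = tuple(child.text or "" for child in entry)
        counts[key] = counts.get(key, 0) + 1
    return [key + (count,) for key, count in counts.items()]


TRICKY = """<t>
  <entry><attribute1>AB</attribute1><attribute2>C</attribute2></entry>
  <entry><attribute1>A</attribute1><attribute2>BC</attribute2></entry>
  <entry><attribute1>A</attribute1><attribute2>BC</attribute2></entry>
  <entry><attribute1>C</attribute1><attribute2>D</attribute2></entry>
  <entry><attribute1>E</attribute1><attribute2>F</attribute2></entry>
</t>"""

for row in group_entries(ET.fromstring(TRICKY)):
    print(*row)
```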

SQL SELECT using XML input

4 votes

I've currently got a C# application that responds to HTTP requests. The body of each HTTP request (XML) is passed to SQL Server, where the database engine performs the corresponding instruction. One of the instructions, InvoiceLoad, loads invoice information using the customer's ID:

<InvoiceLoad ControlNumber="12345678901">
   <Invoice>
      <CustomerID>johndoe@gmail.com</CustomerID>
   </Invoice>
</InvoiceLoad>  

I need to perform a SELECT operation against the invoice table (which contains the associated email address).

I've tried using:

SELECT [Date], [Status], [Location]
FROM Invoices
WHERE Email_Address = Invoice.A.value('.', 'varchar(255)')

together with an xml.nodes('InvoiceLoad/Invoice/CustomerID') Invoice(A) call.

However, as this query may run THOUSANDS of times per minute, I want to make it as fast as possible. I'm hearing that one way to do this may be to use CROSS APPLY (which I have never used). Is that the solution? If not, how exactly would I go about making this query as fast as possible? Any and all suggestions are greatly appreciated!

I don't see why you would need a call to .nodes() at all - from what I understand, each XML fragment has just a single entry - right?

So given this XML:

<InvoiceLoad ControlNumber="12345678901">
   <Invoice>
      <CustomerID>johndoe@gmail.com</CustomerID>
   </Invoice>
</InvoiceLoad>  

you can use this SQL to get the value of the <CustomerID> node:

DECLARE @xmlvar XML

SET @xmlvar = '<InvoiceLoad ControlNumber="12345678901">
   <Invoice>
      <CustomerID>johndoe@gmail.com</CustomerID>
   </Invoice>
</InvoiceLoad>'

SELECT 
   @xmlvar.value('(/InvoiceLoad/Invoice/CustomerID)[1]', 'varchar(100)') 

and you can join this against your customer table or whatever you need to do.

If you have the XML stored in a table, and you always need to extract that value from <CustomerID>, you could also think about creating a computed, persisted column on that table that would extract that e-mail address into a separate column, which you could then use for easy joining. This requires a little bit of work - a user-defined function taking the XML as input - but it's really quite a nice way to "surface" certain important snippets of data from your XML.

Step 1: create your function

CREATE FUNCTION dbo.ExtractCustomer (@input XML)
RETURNS VARCHAR(255)
WITH SCHEMABINDING
AS BEGIN
    DECLARE @Result VARCHAR(255)

    SELECT 
        @Result = @input.value('(/InvoiceLoad/Invoice/CustomerID)[1]', 'varchar(255)') 

    RETURN @Result
END

So given your XML, you get the one <CustomerID> node and extract its "inner text" and return it as a VARCHAR(255).

Step 2: add a computed, persisted column to your table

ALTER TABLE dbo.YourTableWithTheXML
ADD CustomerID AS dbo.ExtractCustomer(YourXmlColumnHere) PERSISTED

Now, your table that has the XML column has a new column - CustomerID - which will automagically contain the contents of the <CustomerID> as a VARCHAR(255). The value is persisted, i.e. as long as the XML doesn't change, it doesn't have to be re-computed. You can use that column like any other on your table, and you can even index it to speed up any joins on it!
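The overall shape of the fast path (pull out the single CustomerID once, then run one parameterised lookup that can use an index on the e-mail column) can be sketched outside SQL Server. The sketch below uses Python's sqlite3 and ElementTree purely for illustration; it is not T-SQL, the index name and sample rows are hypothetical, and the table and column names are taken from the question.

```python
import sqlite3
import xml.etree.ElementTree as ET

REQUEST = """<InvoiceLoad ControlNumber="12345678901">
   <Invoice>
      <CustomerID>johndoe@gmail.com</CustomerID>
   </Invoice>
</InvoiceLoad>"""


def load_invoices(conn, request_xml):
    """Extract the single CustomerID, then run one indexed lookup."""
    root = ET.fromstring(request_xml)
    # findtext() returns the first match's text, much like value('(...)[1]', ...)
    customer_id = root.findtext("Invoice/CustomerID")
    return conn.execute(
        "SELECT Date, Status, Location FROM Invoices WHERE Email_Address = ?",
        (customer_id,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Invoices (Email_Address TEXT, Date TEXT, Status TEXT, Location TEXT)")
conn.execute("CREATE INDEX IX_Invoices_Email ON Invoices (Email_Address)")  # hypothetical index
conn.executemany(
    "INSERT INTO Invoices VALUES (?, ?, ?, ?)",
    [  # hypothetical sample rows
        ("johndoe@gmail.com", "2012-06-01", "Paid", "Store 1"),
        ("someone@example.com", "2012-06-02", "Open", "Store 2"),
    ],
)
print(load_invoices(conn, REQUEST))
```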

C# linq to xml, nested query with attributes

4 votes

I'm really struggling to get my head around this.

I'm using C#.

I want to get back an IEnumerable of products from an XML file.

Below is a sample of the XML structure.

I need to get a list of products that have the productEnriched custom attribute set to true.

Some products won't have any custom-attributes section at all.

My head has started to hurt just thinking about it!

<?xml version="1.0" encoding="UTF-8"?>
<catalog xmlns="http://www.mynamespace.com" catalog-id="MvgCatalog">
    <product>
        <custom-attributes>
            <custom-attribute attribute-id="productEnriched">true</custom-attribute>
        </custom-attributes>
    </product>
</category>

Thanks for any help.

To clear things up, I have added a few more items to the example XML.

I need a list of only those products that have a custom-attribute element with attribute-id "productEnriched" and a value of true. Some products in the XML won't have any custom-attribute or custom-attributes elements, and some will have it but with a value of false. I just need the products where it exists and has a value of true.

<?xml version="1.0" encoding="UTF-8"?>
<catalog xmlns="http://www.mynamespace.com" catalog-id="MvgCatalog">
    <product>
        <upc>000000000000</upc> 
        <productTitle>My product name</productTitle>
        <custom-attributes>
           <custom-attribute attribute-id="productEnriched">true</custom-attribute>
           <custom-attribute attribute-id="somethingElse">4</custom-attribute>
           <custom-attribute attribute-id="anotherThing">otherdata</custom-attribute>
        </custom-attributes>
    </product>
</category>

var xml = XElement.Load(@"your file.xml");
XNamespace ns = "http://www.mynamespace.com";
var products = xml.Elements(ns + "product");
var filtered = products.Where(
    product =>
        product.Element(ns + "custom-attributes") != null &&
        product.Element(ns + "custom-attributes").Elements(ns + "custom-attribute")
        .Any(
            ca => 
                ca.Value == "true" && 
                ca.Attribute("attribute-id") != null && 
                ca.Attribute("attribute-id").Value == "productEnriched"));

By the way, your XMLs are not valid - your opening tag (catalog) does not match your closing tag (category).

The format by itself is strange - is it your idea?

    <custom-attributes>
       <custom-attribute attribute-id="productEnriched">true</custom-attribute>
       <custom-attribute attribute-id="somethingElse">4</custom-attribute>
       <custom-attribute attribute-id="anotherThing">otherdata</custom-attribute>
    </custom-attributes>

Why put an attribute name as an attribute value and attribute value as an element value? It looks bloated and kind of "reinvents" XML with no clear purpose.

Why not:

    <custom-attributes>
       <custom-attribute productEnriched="true"/>
       <custom-attribute somethingElse="4"/>
       <custom-attribute anotherThing="otherdata"/>
    </custom-attributes>

Or:

    <custom-attributes productEnriched="true" somethingElse="4" anotherThing="otherdata"/>

Or perhaps just use elements:

    <product-parameters>
       <productEnriched>true</productEnriched>
       <somethingElse>4</somethingElse>
       <anotherThing>otherdata</anotherThing>
    </product-parameters>
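For comparison, here is the same namespace-aware filter written with Python's standard ElementTree rather than LINQ to XML. It is a sketch: the prefix "m" is an arbitrary local alias for the question's namespace, the sample catalog is hypothetical (and closed with </catalog> so it parses), and enriched_products is a hypothetical name. As in the C# answer, products without a custom-attributes section simply produce no matches.

```python
import xml.etree.ElementTree as ET

NS = {"m": "http://www.mynamespace.com"}  # "m" is an arbitrary prefix


def enriched_products(root):
    """Return product elements with a custom-attribute whose attribute-id is
    "productEnriched" and whose text is "true"."""
    result = []
    for product in root.findall("m:product", NS):
        for attr in product.findall("m:custom-attributes/m:custom-attribute", NS):
            # attribute-id is unprefixed, so it is not in the namespace
            if attr.get("attribute-id") == "productEnriched" and attr.text == "true":
                result.append(product)
                break
    return result


CATALOG = """<catalog xmlns="http://www.mynamespace.com" catalog-id="MvgCatalog">
  <product><productTitle>Enriched</productTitle>
    <custom-attributes>
      <custom-attribute attribute-id="productEnriched">true</custom-attribute>
    </custom-attributes>
  </product>
  <product><productTitle>Not enriched</productTitle>
    <custom-attributes>
      <custom-attribute attribute-id="productEnriched">false</custom-attribute>
    </custom-attributes>
  </product>
  <product><productTitle>No attributes</productTitle></product>
</catalog>"""

root = ET.fromstring(CATALOG)
print([p.findtext("m:productTitle", namespaces=NS) for p in enriched_products(root)])
```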

Select adjacent sibling elements without intervening non-whitespace text nodes

4 votes

Given markup like:

<p>
  <code>foo</code><code>bar</code>
  <code>jim</code> and then <code>jam</code>
</p>

I need to select the first three <code> elements, but not the last. The logic is: "Select all code elements that have a preceding or following sibling element that is also a code, unless there are one or more text nodes with non-whitespace content between them."

Given that I am using Nokogiri (which uses libxml2) I can only use XPath 1.0 expressions.

Although a tricky XPath expression is desired, Ruby code/iterations to perform the same on a Nokogiri document are also acceptable.

Note that the CSS adjacent sibling selector ignores non-element nodes, and so selecting nokodoc.css('code + code') will incorrectly select the last <code> block.

Nokogiri.XML('<r><a/><b/> and <c/></r>').css('* + *').map(&:name)
#=> ["b", "c"]

Edit: More test cases, for clarity:

<section><ul>
  <li>Go to <code>N</code> and
      then <code>Y</code><code>Y</code><code>Y</code>.
  </li>
  <li>If you see <code>N</code> or <code>N</code> then…</li>
</ul>
<p>Elsewhere there might be: <code>N</code></p>
<p><code>N</code> across parents.</p>
<p>Then: <code>Y</code> <code>Y</code><code>Y</code> and <code>N</code>.</p>
<p><code>N</code><br/><code>N</code> elements interrupt, too.</p>
</section>

All the Y above should be selected. None of the N should be selected. The contents of the <code> elements are used only to indicate which should be selected: you may not use the content to determine whether or not to select an element.

The context elements in which the <code> appear are irrelevant. They may appear in <li>, they may appear in <p>, they may appear in something else.

I want to select all the consecutive runs of <code> at once. It is not a mistake that there is a space character in the middle of one of the sets of Y.

Use:

//code
     [preceding-sibling::node()[1][self::code]
    or
      preceding-sibling::node()[1]
         [self::text()[not(normalize-space())]]
     and
      preceding-sibling::node()[2][self::code]
    or
     following-sibling::node()[1][self::code]
    or
      following-sibling::node()[1]
         [self::text()[not(normalize-space())]]
     and
      following-sibling::node()[2][self::code]
     ]

XSLT-based verification:

<xsl:stylesheet version="1.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     <xsl:output omit-xml-declaration="yes" indent="yes"/>

     <xsl:template match="/">
      <xsl:copy-of select=
       "//code
             [preceding-sibling::node()[1][self::code]
            or
              preceding-sibling::node()[1]
                 [self::text()[not(normalize-space())]]
             and
              preceding-sibling::node()[2][self::code]
            or
             following-sibling::node()[1][self::code]
            or
              following-sibling::node()[1]
                 [self::text()[not(normalize-space())]]
             and
              following-sibling::node()[2][self::code]
             ]"/>
     </xsl:template>
</xsl:stylesheet>

When this transformation is applied on the provided XML document:

<section><ul>
      <li>Go to <code>N</code> and
          then <code>Y</code><code>Y</code><code>Y</code>.
      </li>
      <li>If you see <code>N</code> or <code>N</code> then…</li>
    </ul>
    <p>Elsewhere there might be: <code>N</code></p>
    <p><code>N</code> across parents.</p>
    <p>Then: <code>Y</code> <code>Y</code><code>Y</code> and <code>N</code>.</p>
    <p><code>N</code><br/><code>N</code> elements interrupt, too.</p>
</section>

the contained XPath expression is evaluated and the selected nodes are copied to the output:

<code>Y</code>
<code>Y</code>
<code>Y</code>
<code>Y</code>
<code>Y</code>
<code>Y</code>
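Since Ruby/Nokogiri iteration was also acceptable to the asker, here is the same sibling logic as plain iteration, sketched in Python with the standard ElementTree (not Nokogiri). In ElementTree the text between two sibling elements is stored as the first sibling's .tail, so "only whitespace between them" becomes a check on that tail. One assumed difference from the XPath: ElementTree's default parser discards comments and processing instructions, so a comment between two <code> elements would not break a run here. The names is_blank and adjacent_codes are hypothetical.

```python
import xml.etree.ElementTree as ET

SECTION = """<section><ul>
  <li>Go to <code>N</code> and
      then <code>Y</code><code>Y</code><code>Y</code>.
  </li>
  <li>If you see <code>N</code> or <code>N</code> then…</li>
</ul>
<p>Elsewhere there might be: <code>N</code></p>
<p><code>N</code> across parents.</p>
<p>Then: <code>Y</code> <code>Y</code><code>Y</code> and <code>N</code>.</p>
<p><code>N</code><br/><code>N</code> elements interrupt, too.</p>
</section>"""


def is_blank(text):
    return not (text or "").strip()


def adjacent_codes(root):
    """Select each <code> whose previous or next sibling element is a <code>
    with nothing but optional whitespace text between them."""
    selected = []
    for parent in root.iter():
        kids = list(parent)
        for i, kid in enumerate(kids):
            if kid.tag != "code":
                continue
            prev_ok = i > 0 and kids[i - 1].tag == "code" and is_blank(kids[i - 1].tail)
            next_ok = i + 1 < len(kids) and kids[i + 1].tag == "code" and is_blank(kid.tail)
            if prev_ok or next_ok:
                selected.append(kid)
    return selected


print([c.text for c in adjacent_codes(ET.fromstring(SECTION))])
```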

Deserialize XML to object (need to return a list of objects)

Asked on Thu, 28 Jun 2012 by J-Y c# xml
4 votes

I've started practicing with XML and C#, and I get the error message "There is an error in XML document (3,2)". After looking at the file, I can't see anything wrong with it (mind you, I probably missed something since I'm a noob). I'm using a C# Console Application right now. I'm trying to return a list of Adventurers; as a side note, the Gear element is optional. Here is what I have so far:

XML File - Test1

<?xml version="1.0" encoding="utf-8"?>
<Catalog>
    <Adventurer>
        <ID>001</ID>
        <Name>John Smith</Name>
        <Address>123 Fake Street</Address>
        <Phone>123-456-7890</Phone>
        <Gear>
            <Attack>
                <Item>
                    <IName>Sword</IName>
                    <IPrice>15.00</IPrice>
                </Item> 
                <Item>
                    <IName>Wand</IName>
                    <IPrice>20.00</IPrice>
                </Item>         
            </Attack>
            <Defense>
                <Item>
                    <IName>Shield</IName>
                    <IPrice>5.00</IPrice>
                </Item>
        </Defense>  
        </Gear>
    </Adventurer>
    <Adventurer>
        <ID>002</ID>
        <Name>Guy noone likes</Name>
        <Address>Some Big House</Address>
        <Phone>666-666-6666</Phone>
        <Gear></Gear>
    </Adventurer>
</Catalog>

C# Classes

public class Catalog
{
    List<Adventurer> Adventurers { get; set; }
}

public class Adventurer
{
    public int ID { get; set; }
    public string Name { get; set; }
    public string Address { get; set; }
    public string Phone { get; set; }
    public Gear Gear { get; set; }
}

public class Gear
{
    public List<Item> Attack { get; set; }
    public List<Item> Defense { get; set; }
}

public class Item
{
    public string IName { get; set; }
    public decimal IPrice { get; set; }
}

Deserialization Code - Where the Problem Occurs (Line 5)

Catalog obj = null;
string path = @"C:\Users\Blah\Desktop\test1.xml";
XmlSerializer serializer = new XmlSerializer(typeof(Catalog));
StreamReader reader = new StreamReader(path);
obj = (Catalog)serializer.Deserialize(reader);
reader.Close();

Console.ReadLine();

The issue is the list of Adventurers in Catalog:

<?xml version="1.0" encoding="utf-8"?>
<Catalog>
    <Adventurers> <!-- you're missing this -->
        <Adventurer>
        </Adventurer>
        ...
        <Adventurer>
        </Adventurer>
    </Adventurers> <!-- and missing this -->
</Catalog>

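You don't have the wrapping element for the Adventurers collection. Alternatively, keep your XML as it is and put [XmlElement("Adventurer")] on the list property, which tells XmlSerializer to expect the items without a wrapper. Either way, that property must also be public, because XmlSerializer ignores non-public members.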

EDIT: By the way, I find the easiest way to build the XML structure and make sure it's compatible is to create the object(s) in C#, then run through the built-in XmlSerializer and use its XML output as a basis for any XML I create rather than forming it by hand.
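For comparison, here is how the same document maps onto lists of objects when the mapping is written by hand, sketched in Python with ElementTree and dataclasses (not XmlSerializer). It reads the XML shape as posted in the question, with Adventurer elements directly under Catalog, and treats Gear, Attack and Defense as optional. All class and function names are hypothetical mirrors of the C# classes.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from decimal import Decimal
from typing import List, Optional


@dataclass
class Item:
    name: str
    price: Decimal


@dataclass
class Gear:
    attack: List[Item] = field(default_factory=list)
    defense: List[Item] = field(default_factory=list)


@dataclass
class Adventurer:
    id: int
    name: str
    address: str
    phone: str
    gear: Optional[Gear] = None  # Gear is optional in the XML


def read_items(parent):
    """Map the <Item> children of parent (which may be missing) to Item objects."""
    if parent is None:
        return []
    return [Item(i.findtext("IName"), Decimal(i.findtext("IPrice"))) for i in parent.findall("Item")]


def load_adventurers(xml_text):
    """Read every Adventurer directly under Catalog into a list."""
    adventurers = []
    for a in ET.fromstring(xml_text).findall("Adventurer"):
        gear_elem = a.find("Gear")
        gear = None
        if gear_elem is not None:
            gear = Gear(read_items(gear_elem.find("Attack")), read_items(gear_elem.find("Defense")))
        adventurers.append(
            Adventurer(int(a.findtext("ID")), a.findtext("Name"), a.findtext("Address"), a.findtext("Phone"), gear)
        )
    return adventurers


SAMPLE = """<Catalog>
  <Adventurer><ID>001</ID><Name>John Smith</Name><Address>123 Fake Street</Address><Phone>123-456-7890</Phone>
    <Gear><Attack><Item><IName>Sword</IName><IPrice>15.00</IPrice></Item></Attack></Gear>
  </Adventurer>
  <Adventurer><ID>002</ID><Name>Guy noone likes</Name><Address>Some Big House</Address><Phone>666-666-6666</Phone></Adventurer>
</Catalog>"""

for adv in load_adventurers(SAMPLE):
    print(adv.id, adv.name, adv.gear)
```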