Best xml questions in October 2010

What happens if the minSdkVersion is lower than the targetSdkVersion?

13 votes

I am getting the warning:

Attribute minSdkVersion (3) is lower than the project target API level (8)

How will this affect my app?

You can safely ignore the warning.

It's a weird warning - it means you are building with the tools for API level 8 (Android 2.2/Froyo) while declaring a minSdkVersion of API level 3 (Android 1.5/Cupcake). That warning will always come up unless the minSdkVersion matches the Android release your project targets - in this case, you would have to declare Android 2.2 (API level 8) with your current SDK.

What are some of the pitfalls/tips one could give for developing a web service

11 votes

Looking to develop a web service (api) in PHP to offer customers an easier way to integrate with our platform. There are workflow calls that will be validated with user/pass as well as some reporting options.

Sorry I can't post more details or code on the subject and I have never developed a web service but have had experience in using them via SOAP.

Now I would also need to offer a state or status of the workflow and I think REST would be the best choice here, but still looking for opinions on that.

For reporting I would like to offer different options such as XML or Excel/CSV. Is there any reason I would pick one over the other?

What are some of the pitfalls I should look out for?

What are some gems anyone could offer?

Thanks in advance to any help as this is very important for me to understand.

UPDATE #1:

  • What would be the most secure method?
  • What is the most flexible method (Platform independent)

UPDATE #2: A little bit about the data flow. Each user has creds to use the API and no data is shared between users. Usage is: submit a request, the request is processed, and a result is returned. No updates. (Think Google: a search request is made and results are given, but in my case only one result is given.) I don't know if this is needed, so it's an FYI.

Always handle errors and exceptions.

Problems will always make their presence felt in an application or API, either at the start or during later development. Don't leave error handling as a final task, and make it clear when an error occurs, with well-documented response messages.

Also, if your service will handle many requests and the same resource ID always returns the same resource (independent of the user), be sure to cache the information. This is not only for performance reasons but also for the cases when errors stack up: this way you can at least serve something to the client (possibly useful; more context is required to be explicit).
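
The caching advice above could be sketched roughly like this in Python (the question is about PHP, so treat this as an illustration of the idea only). The class name StaleOnErrorCache, the fetch callback and the ttl_seconds parameter are all hypothetical; the point is that a failed refresh falls back to the last good value instead of failing the request.

```python
import time


class StaleOnErrorCache:
    """Cache keyed by resource id that serves stale data when a refresh fails."""

    def __init__(self, fetch, ttl_seconds=60.0, clock=time.monotonic):
        self._fetch = fetch          # hypothetical callable: resource_id -> value
        self._ttl = ttl_seconds      # how long an entry counts as fresh
        self._clock = clock          # injectable clock, handy for tests
        self._entries = {}           # resource_id -> (stored_at, value)

    def get(self, resource_id):
        now = self._clock()
        entry = self._entries.get(resource_id)
        # Fresh entry: no need to touch the backend at all.
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        try:
            value = self._fetch(resource_id)
        except Exception:
            # Backend failed: serve the stale value if there is one.
            if entry is not None:
                return entry[1]
            raise
        self._entries[resource_id] = (now, value)
        return value
```

Whether serving stale data is acceptable depends on the service; for a workflow status it may be safer to mark the response as stale than to return it silently.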

Removing nodes from XML

8 votes

I want to produce an XML document from another, filtering subnodes that match a specified criterion. How should I do that?

You can use RuleTransformer from scala.xml.transform.

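Suppose the nodes you want to drop have an action attribute with the value "remove":

import scala.xml.{Elem, Node, NodeSeq}
import scala.xml.transform.{RewriteRule, RuleTransformer}

// Rule that drops every element whose action attribute equals "remove"
val removeIt = new RewriteRule {
  override def transform(n: Node): NodeSeq = n match {
    case e: Elem if (e \ "@action").text == "remove" => NodeSeq.Empty
    case other => other
  }
}

new RuleTransformer(removeIt).transform(yourXML)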

Generate XML documentation comments for /// in Visual Studio 2010 C++

7 votes

I need to comment my function prototypes (written in C/C++) with summary, returns and param tags. How can I persuade Visual Studio to insert XML tags after three forward slashes, like in C#? I found one solution: when I rename xx.h to xx.cs in a C++ project, I can use /// to generate XML comments (IntelliSense in XML comments works too). There must be a better way, mustn't there? It would kill me to write it manually. I'll be grateful for every useful comment.

/// <summary>
/// 
/// </summary>
/// <param name="aa"></param>
/// <returns></returns>
bool function1(TypeX aa);

This functionality isn't built in. You can try using Visual Studio add-ins. I haven't used Atomineer Utils Pro Documentation myself, but it looks promising. It generates documentation comments and supports C++. It costs $10 though.

What's the best library for parsing RSS/Atom in Perl?

6 votes

I notice that XML::RSS::Parser hasn't been updated since 2005. Is this still the recommended library for parsing RSS or Atom? Is there a better one or a better way?

I'm not sure it's ever been the "recommended library". If I know which kind of feed I need to parse, I use XML::RSS or XML::Atom as appropriate, but if (as is more likely) I just know it's a web feed, I use XML::Feed.

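Adding an example of using XML::Feed as requested.

use XML::Feed;

# parse() takes a reference to a string holding the feed and returns undef on failure
my $feed = XML::Feed->parse(\$string_containing_feed)
  or die XML::Feed->errstr;

foreach my $entry ($feed->entries) {
  print $entry->title, "\n";
  print $entry->content->body, "\n";
}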

This is all pretty much copied from the module documentation.

Best way to generate xml in python?

5 votes

I'm creating a web API and need a good way to very quickly generate some well-formatted XML. I cannot find any good way of doing this in Python.

Note: Some libraries look promising but either lack documentation or only output to files.

Using lxml:

from lxml import etree

# create XML
root = etree.Element('root')
root.append(etree.Element('child'))
# another child with text
child = etree.Element('child')
child.text = 'some text'
root.append(child)

# pretty string (encoding='unicode' returns str rather than bytes)
s = etree.tostring(root, pretty_print=True, encoding='unicode')
print(s)

Output:

<root>
  <child/>
  <child>some text</child>
</root>

See the tutorial for more information.
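
If installing lxml is not an option, roughly the same thing can be done with the standard library. This is a sketch using xml.etree.ElementTree rather than the lxml API shown above; the function name build_document is made up for the example. It returns the XML as a string, not a file.

```python
from xml.etree import ElementTree as ET


def build_document():
    """Build the same <root> document as above and return it as a str."""
    root = ET.Element("root")
    ET.SubElement(root, "child")            # empty child
    child = ET.SubElement(root, "child")    # another child with text
    child.text = "some text"
    # encoding="unicode" makes tostring() return str instead of bytes
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_document())
```

The standard library has no pretty_print flag on tostring(); on Python 3.9 or later, calling ET.indent(root) before serialising adds the line breaks and indentation.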

Parse large RDF in Python

5 votes

I'd like to parse a very large (about 200MB) RDF file in python. Should I be using sax or some other library? I'd appreciate some very basic code that I can build on, say to retrieve a tag.

Thanks in advance.

If you are looking for fast performance, I'd recommend using Raptor with the Redland Python bindings. Raptor is written in C and performs far better than RDFLib, and the Python bindings let you use it without having to deal with C.

Another tip for improving performance: forget about parsing RDF/XML and go with another flavour of RDF, such as Turtle or N-Triples. Parsing N-Triples in particular is much faster than parsing RDF/XML, because the N-Triples syntax is simpler.

You can transform your RDF/XML into N-Triples using rapper, a tool that comes with Raptor:

rapper -i rdfxml -o ntriples YOUR_FILE.rdf > YOUR_FILE.ntriples

The ntriples file will contain triples like:

<s1> <p> <o> .
<s2> <p2> "literal" .

and parsers tend to be very efficient at handling this structure. It is also more memory-efficient to parse than RDF/XML because, as you can see, each line is a complete triple and there is no nested structure to keep track of.
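
To make the "simpler syntax" point concrete, here is a rough pure-Python sketch that reads triples one line at a time. It is not Raptor or RDFLib, and the regular expression covers only the common cases (IRIs, blank nodes, plain or tagged literals), not the full N-Triples grammar. The name iter_triples is made up for this example.

```python
import re

# Simplified pattern: subject is an <IRI> or _:blank node, predicate is an <IRI>,
# object is an <IRI>, a blank node, or a "literal" with optional @lang / ^^<datatype>.
TRIPLE = re.compile(
    r'^(<[^>]*>|_:\S+)\s+(<[^>]*>)\s+'
    r'(<[^>]*>|_:\S+|"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?)\s*\.$'
)


def iter_triples(lines):
    """Yield (subject, predicate, object) strings, one input line at a time."""
    for line in lines:
        stripped = line.strip()
        # Skip blank lines and comments.
        if not stripped or stripped.startswith("#"):
            continue
        match = TRIPLE.match(stripped)
        if match is None:
            raise ValueError("not a simple N-Triples line: %r" % line)
        yield match.groups()
```

Because it is a generator, passing it an open file (for example the YOUR_FILE.ntriples produced by rapper) keeps only one line in memory at a time. For real data a proper parser is still the better choice.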

The code below is a simple example using the redland python bindings:

import RDF

# As the parser name you can use ntriples, turtle, rdfxml, ...
parser = RDF.Parser(name="ntriples")
model = RDF.Model()
# Load the file into the model; the last argument is the base URI
parser.parse_into_model(model, "file://file_path", "http://your_base_uri.org")
for triple in model:
    print("%s %s %s" % (triple.subject, triple.predicate, triple.object))

The base URI is the URI against which any relative URIs inside your RDF document are resolved. You can check the documentation for the Python Redland bindings API here.

If you don't care much about performance then use RDFLib, it is simple and easy to use.

XML vs YAML vs JSON

5 votes

Assuming I'm starting a project from scratch, which is not dependent on any other project. I would like to use a format to store feeds, something like XML, since XML is not the only available format of its kind, I would like to know: why should I choose one over the rest?

I will be using perl.

'Feed' is a description of a product (name, price, type, short description, up to 120 words).

We can't really answer that without knowing a lot more. You may not currently depend on any other projects, but are you likely to interact with them at some point in the future? If so, what technologies do they prefer? At the BBC, we've had some projects that were "JSON-only", only to find out that Java developers who wanted to access our API were begging us to provide a simple XML API simply because they have so many tools built around XML. They didn't even care about namespaces, attributes, or anything else; they just wanted those angle brackets.

As for "storing feeds", I'm also not sure what you mean there. You explain the data in the feed, but what are you then going to do with those feeds? Parse them? Cache and re-serve them? Write them out to cuneiform tablets? :)

It sounds like what you actually want is a database: persist the data there and later make it serialisable as JSON/YAML/XML or whatever your desired format is. What I'd recommend is to be able to pull the data out into a Perl data structure and then have "formatters" which know how to serialise that data structure to the desired output. That way you can serialise to, say, JSON, and later, if that's not good enough, easily switch to YAML or something else. In fact, if others need your data (one-way data tends not to be useful), they can ask for JSON, YAML, XML or whatever. You have more flexibility and aren't tied to a decision that you made up front.

That being said, I don't know your system, so it's tough to say what the right thing to do is. Also, note that JSON and YAML aren't exactly interchangeable with XML. Subtle differences can and will trip you up.
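
The "formatters" idea could look something like the sketch below. It is written in Python rather than Perl purely for illustration, and every name in it (to_json, to_xml, FORMATTERS, serialise, the product fields) is invented for the example. The data lives in one plain structure and each output format is just a function registered under a name.

```python
import json
from xml.etree import ElementTree as ET


def to_json(product):
    """Serialise the product dict as JSON with stable key order."""
    return json.dumps(product, sort_keys=True)


def to_xml(product):
    """Serialise the product dict as a flat <product> element."""
    root = ET.Element("product")
    for key in sorted(product):
        ET.SubElement(root, key).text = str(product[key])
    return ET.tostring(root, encoding="unicode")


# Adding YAML or anything else later means adding one more entry here.
FORMATTERS = {"json": to_json, "xml": to_xml}


def serialise(product, fmt):
    """Look up the formatter for fmt and apply it to the product."""
    try:
        formatter = FORMATTERS[fmt]
    except KeyError:
        raise ValueError("unsupported format: %r" % fmt)
    return formatter(product)
```

Callers only ever see serialise(product, "json") or serialise(product, "xml"), so the storage decision and the wire format stay independent.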

How can I programmatically determine the XML elements that can be inserted next?

5 votes

When I am editing an XML document that has an XmlSchema, how can I programmatically determine the elements that can be inserted next? I am using C# and I already know which element I am in. Is there an MSXML method I can call or something else? Thanks.

Tarzan,

As I understand it, you are trying to determine the legal XML that can be added at a specific place in the document, based on the schema being used. If that is correct, it is a very difficult problem to solve. If you have an "any" element in your XSD, the complexity increases because it can literally be any element! Also, XSD types can be derived from other types (i.e., an element definition structure based on another structure), which introduces more complexity. There are only a couple of products (Oxygen, Visual Studio) that have attempted this with any success (that I know of).

If your schema is fairly simple, and doesn't include any of these deal breakers, you might be able to use the Schema Object Model to find the legal elements at your current location, but only if you know what portion of the XSD applies to your current element.
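
For the simple case only, the logic might be sketched like this. It is Python rather than C#, it does not use the Schema Object Model, and it assumes the content model is a single flat sequence of distinctly named elements with minOccurs/maxOccurs, with no "any", choice or derived types. The function name allowed_next and the tuple layout are made up for the example.

```python
def allowed_next(sequence, existing):
    """Return the element names that may follow the existing children.

    sequence: list of (name, min_occurs, max_occurs); max_occurs None = unbounded.
    existing: names of the children already present, in document order.
    """
    idx = 0     # index of the particle currently being matched
    count = 0   # occurrences of that particle seen so far
    for name in existing:
        # Move forward to the particle that matches this child.
        while idx < len(sequence) and sequence[idx][0] != name:
            if count < sequence[idx][1]:
                raise ValueError("missing required element %r" % sequence[idx][0])
            idx += 1
            count = 0
        if idx == len(sequence):
            raise ValueError("unexpected element %r" % name)
        count += 1
        upper = sequence[idx][2]
        if upper is not None and count > upper:
            raise ValueError("too many %r elements" % name)

    allowed = []
    while idx < len(sequence):
        name, lower, upper = sequence[idx]
        if upper is None or count < upper:
            allowed.append(name)
        if count < lower:
            break   # this particle is still required; nothing after it is legal yet
        idx += 1
        count = 0
    return allowed
```

For a hypothetical book type with title (1..1), author (1..unbounded), year (0..1) and isbn (1..1), a document that so far contains title and author could take another author, a year, or an isbn next.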

Does this make sense?

Erick

How can I insert an xml comment with Groovy MarkupBuilder?

5 votes

I would like to insert comments into my xml document with a Groovy MarkupBuilder. How is it possible?

You can use mkp.comment like so:

def writer = new StringWriter()
def builder = new groovy.xml.MarkupBuilder( writer )
builder.cars {
    mkp.comment "A comment"
    ford( type:'escort')
    ford( type:'fiesta')
}

println writer

Which prints:

<cars><!-- A comment -->
  <ford type='escort' />
  <ford type='fiesta' />
</cars>

The mkp.XXX methods are described here

Where can I find a detailed comparison of Java XML frameworks?

4 votes

I'm trying to choose an XML-processing framework for my Java projects, and I'm lost in names: XOM, JDOM, etc. Where can I find a detailed comparison of all popular Java XML frameworks?

As Blaise pointed out, stick with the standards. But multiple standards have been created over time to solve different problems and use cases, and which one to choose depends entirely on your requirements. I hope the comparison below helps you choose the right one.

There are two things you have to choose: the API, and the implementation of that API (there are many).

API

SAX: Pros

  • event based
  • memory efficient
  • faster than DOM
  • supports schema validation

SAX: Cons

  • No object model; you have to tap into the events and build your own
  • Single parse of the xml and can only go forward
  • read only api
  • no xpath support
  • little bit harder to use

DOM: Pros

  • in-memory object model
  • preserves element order
  • bi-directional
  • read and write api
  • xml MANIPULATION
  • simple to use
  • supports schema validation

DOM: Cons

  • memory hog for larger XML documents (typically used for XML documents smaller than 10 MB)
  • slower
  • generic model i.e. you work with Nodes

Stax: Pros

  • Best of SAX and DOM i.e. Ease of DOM and efficiency of SAX
  • memory efficient
  • Pull model
  • read and write api
  • supports subparsing
  • can read multiple documents same time in one single thread
  • parallel processing of XML is easier

Stax: Cons

  • no schema validation support (as far as I remember, not sure if they have added it now)
  • can only go forward like sax
  • no xml MANIPULATION
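
The difference between the push (SAX), in-memory (DOM) and pull (StAX) styles is easier to see side by side. The sketch below uses Python's standard library as a stand-in, not the Java APIs themselves: xml.sax for push, xml.dom.minidom for DOM, and ElementTree's iterparse as a loose analogue of a pull parser. The sample document and the function names are invented for the example.

```python
import io
import xml.sax
from xml.dom import minidom
from xml.etree import ElementTree as ET

DOC = b"<cars><ford type='escort'/><ford type='fiesta'/></cars>"


class TypeCollector(xml.sax.ContentHandler):
    """Push style: the parser calls us back; we must keep our own state."""

    def __init__(self):
        super().__init__()
        self.types = []

    def startElement(self, name, attrs):
        if name == "ford":
            self.types.append(attrs.getValue("type"))


def types_via_sax(data):
    handler = TypeCollector()
    xml.sax.parseString(data, handler)
    return handler.types


def types_via_dom(data):
    # In-memory style: the whole tree is built first, then queried freely.
    dom = minidom.parseString(data)
    return [e.getAttribute("type") for e in dom.getElementsByTagName("ford")]


def types_via_pull(data):
    # Pull style: our loop asks the parser for the next event when it wants one.
    types = []
    for _event, elem in ET.iterparse(io.BytesIO(data), events=("start",)):
        if elem.tag == "ford":
            types.append(elem.get("type"))
    return types
```

All three return the same list for this document; what changes is who drives the parse and how much of the document is held in memory at once.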

JAXB: Pros

  • allows you to access and process XML data without having to know XML
  • bi-directional
  • more memory efficient than DOM
  • SAX and DOM are generic parsers, whereas JAXB creates a parser specific to your XML Schema
  • data conversion: JAXB can convert xml to java types
  • supports XML MANIPULATION via object API

JAXB: Cons

  • Requires XML Schema
  • requires a schema compilation step to be added to your build process
  • can only parse valid XML

Trax: For transforming XML from 1 form to another form using XSLT

Implementations

SAX, DOM, StAX and JAXB are just specifications. There are many open source and commercial implementations of these specifications. Most of the time you can just stick with what comes with the JDK or your application server. But sometimes you need to use a different implementation from the one provided by default, and this is where you can appreciate the JAXP wrapper API. JAXP allows you to switch implementations through configuration without the need to modify your code. It also provides a parser/spec-independent API for parsing, transforming, validating and querying XML documents.

Performance and other comparisons of various implementations


Now, standards are good, but once in a while you encounter a crazy use case where you have to support parsing an XML document that is 100 gigabytes in size, or you need ultra-fast processing of XML (maybe you are implementing an XML parser chip), and this is when you need to dump the standards and look for a different way of doing things. It's about using the right tool for the right job! And this is where I suggest you have a look at vtd-xml.

During the early days of SAX and DOM, people wanted simpler APIs than those provided by either of them. JDOM, dom4j, XmlBeans, JiBX and Castor are the ones I know that became popular.

Ignoring specified encoding when deserializing XML

4 votes

I am trying to read some XML received from an external interface over a socket. The problem is that the encoding is specified wrong in the XML-header (it says iso-8859-1, but it is utf-16BE). It is documented that the encoding is utf-16BE, but apparently they forgot to set the correct encoding.

To ignore the encoding when I deserialize I use a StringReader like this:

    private static T DeserializeXmlData<T>(byte[] xmlData)
    {
        var xmlString = Encoding.BigEndianUnicode.GetString(xmlData);
        using (var reader = new StringReader(xmlString))
        {
            reader.ReadLine(); // Eat header line
            using (var xmlReader = XmlReader.Create(reader))
            {
                var serializer = new XmlSerializer(typeof(T));
                return (T)serializer.Deserialize(xmlReader);
            }
        }
    }

The above actually works fine, but I don't like the part where I just skip the header line by calling ReadLine. Is there a less brittle way to bypass the encoding specified in the XML-header?

Solution with StreamReader

By using a StreamReader, I can override the encoding specified in the XML header. Specifying XmlReaderSettings.IgnoreProcessingInstructions or not made no difference. Interestingly, the StreamReader ignores the encoding passed to it if it finds a Unicode byte-order mark.

To recap:

  • If the XmlReader is initialized with a TextReader, XML-header encoding is ignored.
  • If a StringReader is used, the XmlReader fails if a unicode byte-order mark exists.
  • If a StreamReader is used, a unicode byte-order mark overrides the StreamReader encoding.
  • XmlReaderSettings.IgnoreProcessingInstructions = true doesn't make a difference when using a TextReader.

In conclusion, the most robust solution seems to be using a StreamReader, since it uses the byte-order mark, if present.

    private static T DeserializeXmlData<T>(byte[] xmlData)
    {
        using (var xmlDataStream = new MemoryStream(xmlData))
        {
            using (var reader = new StreamReader(xmlDataStream, Encoding.BigEndianUnicode))
            {
                using (var xmlReader = XmlReader.Create(reader))
                {
                    var serializer = new XmlSerializer(typeof (T));
                    return (T) serializer.Deserialize(xmlReader);
                }
            }
        }
    }

I think I'd just use a StreamReader, constructed with the right encoding, and pass that to the XmlReader.Create(TextReader) method:

 using (var sr = new StreamReader(@"c:\temp\bad.xml", Encoding.BigEndianUnicode)) {
     using (var xr = XmlReader.Create(sr, new XmlReaderSettings())) {
         // etc...
     }
 }
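
The same idea (decode the bytes yourself with the encoding you know to be right, then hand text to the parser so that the declared encoding is not used) can be sketched outside .NET as well. The Python version below relies on xml.etree.ElementTree treating str input as already decoded; the function name parse_with_known_encoding and the sample payload are made up, and behaviour in other libraries may differ (lxml, for instance, rejects str input that carries an encoding declaration).

```python
from xml.etree import ElementTree as ET


def parse_with_known_encoding(xml_bytes, encoding="utf-16-be"):
    """Parse XML bytes using the given encoding, ignoring the XML declaration's."""
    text = xml_bytes.decode(encoding)
    # Drop a leading byte-order mark if the sender included one.
    if text.startswith("\ufeff"):
        text = text[1:]
    # Given str input, ElementTree does not re-decode using the declared encoding.
    return ET.fromstring(text)


if __name__ == "__main__":
    raw = '<?xml version="1.0" encoding="iso-8859-1"?>\n<msg>h\u00e9llo</msg>'.encode("utf-16-be")
    print(parse_with_known_encoding(raw).text)
```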