Best scala questions in October 2010

Migrating Java to Scala

22 votes

What are the most important points to be aware of, and the workarounds, when gradually migrating an existing Java codebase to Scala? With a (potentially very long) intermediate phase where both languages are in use.

The sort of things I'm thinking about are:

  • different collection hierarchies
  • Java constructs that Scala can't handle well
  • Scala constructs that are impractical to use in Java
  • build tools
  • compilation order
  • immutability support in frameworks
  • etc.

Scala doesn't like:

  • inner Java classes
  • static methods and variables (especially in super classes)
  • raw types

Java doesn't like:

  • Scala objects traits
  • closures
  • actors (except Scarlett Johansson and Akka Actors since they have a Java API)
  • implicits, especially Manifests
  • advanced type constructs (higher kinded types, structural types, abstract type vars)

Are there any provable real-world languages? (scala?)

16 votes

I was taught about formal systems at university, but I was disappointed how they didn't seem to be used in the real word.

I like the idea of being able to know that some code (object, function, whatever) works, not by testing, but by proof.

I'm sure we're all familiar with the parallels that don't exist between physical engineering and software engineering (steel behaves predictably, software can do anything - who knows!), and I would love to know if there are any languages that can be use in the real word (is asking for a web framework too much to ask?)

I've heard interesting things about the testability of functional languages like scala.

As software engineers what options do we have?

Yes, there are languages designed for writing provably correct software. Some are even used in industry. Spark ADA is probably the most prominent example. I've talked to a few people at Praxis Critical Systems Limited who used it for code running on Boings (for engine monitoring) and it seems quite nice. (Here is a great summary / description of the language.) This language and accompanying proof system uses the second technique described below (it doesn't even allow dynamic memory allocation).


My impression and experience is that there are two techniques for writing correct software:

  • Technique 1: Write the software in a language you're comfortable with (C, C++ or Java for instance). Take a formal specification of such language, and prove your program correct.

    If your ambition is to be 100% correct (which is most often a requirement in automobile / aerospace industry) you'd be spending little time programming, and more time proving.

  • Technique 2: Write the software in a slightly more awkward language (some subset of ADA or tweaked version of OCaml for instance) and write the correctness proof along the way. Programming and proving goes hand in hand (the Curry-Howard correspondence even equates them completely!)

    In these scenarios you'll always end up with a correct program. (A bug will be guaranteed to be rooted in the specification.) You'll be likely to spend more time on programming but on the other hand you're proving it correct along the way.

Note that both approaches hinges on the fact you have a formal specification at hand (how else would you tell what is correct / incorrect behavior), and a formally defined semantics of the language (how else would you be able to tell what the actual behavior of your program is).

Here are a few more examples of formal approaches. If it's "real-world" or not, depends on who you ask :-)

I know of only one "provably correct" web-application language: UR. A Ur-program that "goes through the compiler" is guaranteed not to:

  • Suffer from any kinds of code-injection attacks
  • Return invalid HTML
  • Contain dead intra-application links
  • Have mismatches between HTML forms and the fields expected by their handlers
  • Include client-side code that makes incorrect assumptions about the "AJAX"-style services that the remote web server provides
  • Attempt invalid SQL queries
  • Use improper marshaling or unmarshaling in communication with SQL databases or between browsers and web servers

Removing nodes from XML

8 votes

I want to produce an XML document from another, filtering subnodes that match a specified criterion. How should I do that?

You can use RuleTransformer from scala.xml.transform.

Suppose you have action attribute with "remove" value


val removeIt = new RewriteRule {
    override def transform(n: Node): NodeSeq = n match {
      case e: Elem if (e \ "@action").text == "remove" => NodeSeq.Empty
      case n => n
    }
  }

new RuleTransformer(removeIt).transform(yourXML)

Is using scala on android worth it? Is there a lot of overhead? Problems?

8 votes

I was thinking of building an app on android with Scala instead of the regular Java (or the equivalent I guess). Is it worth it? Any problems and unnecessary headaches?

Working with Scala should be mostly painless, as the dex compiler just works with bytecode - which is exactly what Scala produces.

Your biggest problem then is the dependency on scala-library, as dex expects everything to be in a single Jar. This is best handled with Proguard (which will also remove unused code and give you a smaller executable, ideal for mobile)


Current best practice is to use SBT with the Android plugin; It'll take care of everything for you: http://github.com/jberkel/android-plugin

If you must use Eclipse and the plugin supplied by Google, then you're going to have a non-standard directory structure. I also wrote an article on how to deal with this: http://www.assembla.com/wiki/show/scala-ide/Developing_for_Android

But be warned... it takes a lot more effort that way!

Is there a dbunit-like framework that doesn't suck for java/scala?

6 votes

I was thinking of making a new, light-weight database population framework. I absolutely hate dbunit. Before I do, I want to know if someone already did it.

Things i dislike about dbunit:

1) The simplest format to write and get started is deprecated. They want you to use formats that are bloated. Some even require xml schemas. Yeah, whatever.

2) They populate rows not in the order you write them, but in the order tables are defined in the xml file. This is really bad because you can't order your data in such a way that foreign key constraints won't cause problems. This just forces you to go through the hassle of turning them off altogether.

This also wastes time and bloats up your junit base classes to include code to disable the foreign key constraints. You will probably have to test for the database type (hsqldb, etc.) and disable them in database-specific ways. This is way bad.

It could be better if dbunit helped in disabling foreign key constraints as part of their framework automatically, but they don't do this. They do keep track of dialects... so why not use them for this? Ultimately, all of this does is force the programmer to waste time and not get up and testing quickly.

3) XML is a pain to write. I don't need to say more about this. They also offer so many ways to do it, that I think it just complicates matters. Just offer one really solid way and be done with it.

4) When your data gets large, keeping track of the ids and their consistent/correct relationships is a royal pain.

Also, if you don't work on a project for a month, how are you to remember that user_id 1 was an admin, user_id 2 was a business user, user_id 3 was an engineer and user_id 4 was something else? Going back to check this is wasting more time. There should be a meaningful way to retrieve it other than an arbitrary number.

5) It's slow. I've found that unless hsqldb is used, it is painfully slow. It doesn't have to be. There are also numerous ways to mess up its configuration as it is not easy to do "out of the box". There is a hump that you must go through to get it working right. All this does is encourage people to not use it, or be pissed of when they do start to use it.

6) Some values tend to repeat a lot, likes dates. It'd be nice to specify defaults, or even have the framework put defaults in automatically, even without you telling it to put defaults in there. That way you can create objects just with the values you want, and leave the rest off. This sure beats specifying every nook and cranny of a column if it's not required.

7) Probably the most annoying thing is that the first entry must include ALL the values - even null placeholders - or future rows won't pick the columns that you actually specified.

DBunit doesn't have a sensible default for translating [NULL] to a real null value either. You have to manually add it. Tell me, who hasn't done this with dbunit? Everyone has. It shouldn't be like this!

What this means is that if you have a polymorphic object, you must declare all the foreign keys to the joining tables of each subclass in the first row, even though they are null. If you do a table for all subclasses pattern, you still have to specify all the fields on the first row. This is just awful.

Anything out there to satisfy me, or should I become the next framework developer of a much better database testing framework?

I'm not aware of any real alternative to DbUnit and none of the tools mentioned by @Joe are in my eyes:

  • Incanto: not DB agnostic
  • SQLUnit: a regression and unit testing harness for testing database stored procedures (that's not what DbUnit is about)
  • Cactus: a tool for In-container testing (I fail to see where it helps with databases)
  • Liquibase: a database migration tool (doesn't load/verify data)
  • ORMUnit: can initialize a database but that's all
  • JMock: doesn't compete with DbUnit at all

That being said, I've personally used DbUnit successfully several times, on small and huge projects, and I find it pretty usable, especially when using Unitils and its DbUnit module. This doesn't mean it's perfect and can't be improved but with decent tooling (either custom made or something like Unitils), using it has been a decent experience.

So let me answer some of your points:

1) The simplest format to write and get started is deprecated. They want you to use formats that are bloated. Some even require xml schemas. Yeah, whatever.

DbUnit supports flat or structured XML, XLS, CSV. What revolutionary format would you like to use? By the way, a DTD or schema is not mandatory when using XML. But it gives you nice things like validation and auto-completion, how is that bad? And Unitils can generate it easily for you, see Generate an XSD or DTD of the database structure.

It could be better if dbunit helped in disabling foreign key constraints as part of their framework automatically, but they don't do this. They do keep track of dialects... so why not use them for this? Ultimately, all of this does is force the programmer to waste time and not get up and testing quickly.

They are waiting for your patch.

Meanwhile, Unitils provides support to handle constraints transparently, see Disabling constraints and updating sequences.

3) XML is a pain to write. I don't need to say more about this. They also offer so many ways to do it, that I think it just complicates matters. Just offer one really solid way and be done with it.

I guess pain is subjective but I don't find it painful, especially when using a schema and autocompletion. What is the silver bullet you're suggesting?

4) When your data gets large, keeping track of the ids and their consistent/correct relationships is a royal pain.

Keep them small, that's a know best practice. You're going against a known best practice and then complain...

Also, if you don't work on a project for a month, how are you to remember that user_id 1 was an admin, user_id 2 was a business user, user_id 3 was an engineer and user_id 4 was something else? Going back to check this is wasting more time. There should be a meaningful way to retrieve it other than an arbitrary number.

Yes, task switching is counter productive. But since you're working with low level data, you have to know how they are represented, there is no magic solution unless you use a higher level API of course (but that's not the purpose of DbUnit).

5) It's slow. I've found that unless hsqldb is used, it is painfully slow. It doesn't have to be. There are also numerous ways to mess up its configuration as it is not easy to do "out of the box". There is a hump that you must go through to get it working right. All this does is encourage people to not use it, or be pissed of when they do start to use it.

That's inherent to databases and JDBC, not DbUnit. Use a fast database like H2 if you want things to be as fast as possible (if you have a better agnostic way to do things, I'd be glad to learn about it).

6) Probably the most annoying thing is that the first entry must include ALL the values - even null placeholders - or future rows won't pick the columns that you actually specified.

Not when using Unitils as mentioned in presentations like Unitils - Home - JavaPolis 2008 or Unit testing: unitils & dbmaintain.

Anything out there to satisfy me, or should I become the next framework developer of a much better database testing framework?

If you think you can make things better, maybe contribute to existing solutions. If that's not possible and if you think you can create the killer database testing framework, what can I say, do it. But don't forget, ranting is easy, coming up with solutions using your own solutions is less so.