Best .net questions in July 2011

If strings are immutable in .NET, then why does Substring take O(n) time?

149 votes

Given that strings are immutable in .NET, I'm wondering why they have been designed such that string.Substring() takes O(substring.Length) time, instead of O(1)?

i.e. what were the tradeoffs, if any?

UPDATE: I liked this question so much, I just blogged it. See http://blogs.msdn.com/b/ericlippert/archive/2011/07/19/strings-immutability-and-persistence.aspx


The short answer is: O(n) is O(1) if n does not grow large. Most people extract tiny substrings from tiny strings, so how the complexity grows asymptotically is completely irrelevant.

The long answer is:

An immutable data structure built such that operations on an instance permit re-use of the memory of the original with only a small amount (typically O(1) or O(lg n)) of copying or new allocation is called a "persistent" immutable data structure. Strings in .NET are immutable; your question is essentially "why are they not persistent"?

Because when you look at operations that are typically done on strings in .NET programs, it is in every relevant way hardly worse at all to simply make an entirely new string. The expense and difficulty of building a complex persistent data structure doesn't pay for itself.

People typically use "substring" to extract a short string -- say, ten or twenty characters -- out of a somewhat longer string -- maybe a couple hundred characters. You have a line of text in a comma-separated file and you want to extract the third field, which is a last name. The line will be maybe a couple hundred characters long, the name will be a couple dozen. String allocation and memory copying of fifty bytes is astonishingly fast on modern hardware. That making a new data structure that consists of a pointer to the middle of an existing string plus a length is also astonishingly fast is irrelevant; "fast enough" is by definition fast enough.

The substrings extracted are typically small in size and short in lifetime; the garbage collector is going to reclaim them soon, and they didn't take up much room on the heap in the first place. So using a persistent strategy that encourages reuse of most of the memory is also not a win; all you've done is made your garbage collector get slower because now it has to worry about handling interior pointers.

If the substring operations people typically did on strings were completely different, then it would make sense to go with a persistent approach. If people typically had million-character strings, and were extracting thousands of overlapping substrings with sizes in the hundred-thousand-character range, and those substrings lived a long time on the heap, then it would make perfect sense to go with a persistent substring approach; it would be wasteful and foolish not to. But most line-of-business programmers do not do anything even vaguely like those sorts of things. .NET is not a platform that is tailored for the needs of the Human Genome Project; DNA analysis programmers have to solve problems with those string usage characteristics every day; odds are good that you do not. The few who do build their own persistent data structures that closely match their usage scenarios.

For example, my team writes programs that do on-the-fly analysis of C# and VB code as you type it. Some of those code files are enormous and thus we cannot be doing O(n) string manipulation to extract substrings or insert or delete characters. We have built a bunch of persistent immutable data structures for representing edits to a text buffer that permit us to quickly and efficiently re-use the bulk of the existing string data and the existing lexical and syntactic analyses upon a typical edit. This was a hard problem to solve and its solution was narrowly tailored to the specific domain of C# and VB code editing. It would be unrealistic to expect the built-in string type to solve this problem for us.
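The "pointer to the middle of an existing string plus a length" idea mentioned above can be sketched as a small value type. This is only an illustration under stated assumptions: StringSlice and its members are hypothetical names, not part of the .NET string API, and this is not the data structure the answer's team built.

```csharp
using System;

// Hypothetical O(1) "substring" view: it shares the original string's
// characters instead of copying them. Not part of the .NET string API.
public struct StringSlice
{
    private readonly string _source;
    private readonly int _start;
    private readonly int _length;

    public StringSlice(string source, int start, int length)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (start < 0 || length < 0 || start > source.Length - length)
            throw new ArgumentOutOfRangeException();
        _source = source;
        _start = start;
        _length = length;
    }

    public int Length { get { return _length; } }

    public char this[int index]
    {
        get
        {
            if (index < 0 || index >= _length) throw new IndexOutOfRangeException();
            return _source[_start + index];
        }
    }

    // O(1): no characters are copied, only the offset and length change.
    public StringSlice Slice(int start, int length)
    {
        if (start < 0 || length < 0 || start > _length - length)
            throw new ArgumentOutOfRangeException();
        return new StringSlice(_source, _start + start, length);
    }

    // O(length): materializes a real string, the cost Substring pays up front.
    public override string ToString()
    {
        return _source.Substring(_start, _length);
    }
}
```

Note that a slice keeps the entire source string reachable for as long as the slice lives, which is one of the costs of sharing memory rather than copying.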

What is a good choice of database for a small .NET application?

65 votes

I'm developing a small application with C# in .NET and I want a small local database alongside it where I can save and retrieve records with SQL queries. I don't need anything powerful, just something to use instead of keeping records in files like .txt. What would you suggest? Thanks. P.S. I already tried to use .mdf and .sdf, but with no success. Do you think they work well too? How?

You have a couple of immediately recognisable and free options: SQL Server Compact and SQLite.

The SQL Server Compact download comes with the ADO.NET provider that you will need to reference in code. The SQLite download might not have it so here is a link:

http://sqlite.phxsoftware.com/

They both use SQL, though likely with a few limitations / quirks. Management Studio works with Compact, whereas with SQLite you will need another UI tool such as SQLite Administrator:

http://sqliteadmin.orbmu2k.de/

There are NoSQL alternatives, such as Sterling and RavenDb.

Personally I would avoid using MS Access in the face of other free options. You cannot go wrong with either Compact or SQLite, they are both lovely small databases that run relatively quickly in little RAM - personal preference as to the religious aspects about liking a Microsoft product I suppose :-)

I use Sterling for Windows Phone programming as it is built to use Isolated Storage. I have only seen articles on RavenDb, but I can tell you that it is a JSON based document storage framework.

Not to confuse the situation (go with SQLite or SQL Server Compact), but there are other embedded / local databases out there, some are relational others are object-oriented:

Not all of these are free. SQL / LINQ / in-proc support differs across them all. This list is just for curiosity.

There is now also Karvonite, however the code gallery link is broken. When it's live again I'll be looking into this one for WP7 development.

Is there an alternative to bastard injection? (AKA poor man's injection via default constructor)

47 votes

I am most commonly tempted to use "bastard injection" in a few cases. When I have a "proper" dependency-injection constructor:

public class ThingMaker {
    ...
    public ThingMaker(IThingSource source){
        _source = source;
    }

But then, for classes I am intending as public APIs (classes that other development teams will consume), I can never find a better option than to write a default "bastard" constructor with the most-likely needed dependency:

    public ThingMaker() : this(new DefaultThingSource()) {} 
    ...
}

The obvious drawback here is that this creates a static dependency on DefaultThingSource; ideally, there would be no such dependency, and the consumer would always inject whatever IThingSource they wanted. However, this is too hard to use; consumers want to new up a ThingMaker and get to work making Things, then months later inject something else when the need arises. This leaves just a few options in my opinion:

  1. Omit the bastard constructor; force the consumer of ThingMaker to understand IThingSource, understand how ThingMaker interacts with IThingSource, find or write a concrete class, and then inject an instance in their constructor call.
  2. Omit the bastard constructor and provide a separate factory, container, or other bootstrapping class/method; somehow make the consumer understand that they don't need to write their own IThingSource; force the consumer of ThingMaker to find and understand the factory or bootstrapper and use it.
  3. Keep the bastard constructor, enabling the consumer to "new up" an object and run with it, and coping with the optional static dependency on DefaultThingSource.

Boy, #3 sure seems attractive. Is there another, better option? #1 or #2 just don't seem worth it.

As far as I understand, this question relates to how to expose a loosely coupled API with some appropriate defaults. In this case, you may have a good Local Default, in which case the dependency can be regarded as optional. One way to deal with optional dependencies is to use Property Injection instead of Constructor Injection - in fact, this is sort of the poster scenario for Property Injection.
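As a rough sketch of Property Injection with a Local Default, something like the following could work. The members NextThing and MakeThing and the FixedThingSource class are hypothetical, added only so the example is complete; the question does not show the real shape of IThingSource.

```csharp
using System;

public interface IThingSource
{
    string NextThing(); // hypothetical member, for illustration only
}

// Local Default: assumed to live in the same assembly as ThingMaker.
public class DefaultThingSource : IThingSource
{
    public string NextThing() { return "default thing"; }
}

// A replacement a consumer might write months later.
public class FixedThingSource : IThingSource
{
    public string NextThing() { return "custom thing"; }
}

public class ThingMaker
{
    // The default is in place from the start, so "new ThingMaker()" just works.
    private IThingSource _source = new DefaultThingSource();

    // Property Injection: the dependency is optional and can be swapped later.
    public IThingSource Source
    {
        get { return _source; }
        set
        {
            if (value == null) throw new ArgumentNullException("value");
            _source = value;
        }
    }

    public string MakeThing()
    {
        return "made from " + _source.NextThing();
    }
}
```

A consumer can new up a ThingMaker and use it immediately, and set Source only when the need arises.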

However, the real danger of Bastard Injection is when the default is a Foreign Default, because that would mean that the default constructor drags along an undesirable coupling to the assembly implementing the default. As I understand this question, however, the intended default would originate in the same assembly, in which case I don't see any particular danger.

In any case you might also consider a Facade as described in one of my earlier answers: Dependency Inject (DI) "friendly" library

BTW, the terminology used here is based on the pattern language from my book.

Caching reflection data

30 votes

What's the best way to cache expensive data obtained from reflection? For example most fast serializers cache such information so they don't need to reflect every time they encounter the same type again. They might even generate a dynamic method which they look up from the type.

Before .net 4

Traditionally I've used a normal static dictionary for that. For example:

private static readonly ConcurrentDictionary<Type, Action<object>> cache =
    new ConcurrentDictionary<Type, Action<object>>();

public static void DoSomething(object o)
{
   Action<object> action;
   if(cache.TryGetValue(o.GetType(), out action))
   {//Simple lookup, fast!
     action(o);
   }
   else
   {
     // Do reflection to get the action
     // slow
   }
}

This leaks a bit of memory, but since it does that only once per Type and types lived as long as the AppDomain I didn't consider that a problem.

Since .net 4

But now .net 4 introduced collectible assemblies: http://msdn.microsoft.com/en-us/library/dd554932(VS.100).aspx. If I ever used DoSomething on an object declared in a collectible assembly, that assembly won't ever get unloaded. Ouch.

So what's the best way to cache per type information in .net 4 that doesn't suffer from this problem? The easiest solution I can think of is a:

private static ConcurrentDictionary<WeakReference, TCachedData> cache;

But the IEqualityComparer<T> I'd have to use with that would behave very strangely and would probably violate the contract too. I'm not sure how fast the lookup would be either.

Another idea is to use an expiration timeout. Might be the simplest solution, but feels a bit inelegant.


In the cases where the type is supplied as a generic parameter I can use a nested generic class, which should not suffer from this problem. But this doesn't work if the type is supplied in a variable.

class MyReflection
{
    internal static class Cache<T>
    {
      internal static TData data;
    }

    void DoSomething<T>()
    {
      DoSomethingWithData(Cache<T>.data);
      //Obviously simplified, should have similar creation logic to the previous code.
    }
}

Update: One idea I've just had is using Type.AssemblyQualifiedName as the key. That should uniquely identify that type without keeping it in memory. I might even get away with using referential identity on this string.

One problem that remains with this solution is that the cached value might keep a reference to the type too. And if I use a weak reference for that it will most likely expire far before the assembly gets unloaded. And I'm not sure how cheap it is to Get a normal reference out of a weak reference. Looks like I need to do some testing and benchmarking.

ConcurrentDictionary<WeakReference, CachedData> is incorrect in this case. Suppose we are trying to cache info for type T, so WeakReference.Target==typeof(T). CachedData will most likely contain a reference to typeof(T) as well. Because ConcurrentDictionary<TKey, TValue> stores items in an internal collection of Node<TKey, TValue>, you will have a chain of strong references: ConcurrentDictionary instance -> Node instance -> Value property (CachedData instance) -> typeof(T). In general it is impossible to avoid a memory leak with WeakReference when values can hold references to their keys.

Support for ephemerons had to be added to make such a scenario possible without memory leaks. Fortunately .NET 4.0 supports them through the ConditionalWeakTable<TKey, TValue> class. It seems the reasons for introducing it are close to your task.

This approach also solves the problem mentioned in your update, because the reference to the Type will live exactly as long as the Assembly is loaded.
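A minimal sketch of a per-type cache built on ConditionalWeakTable might look like this. ReflectionCache, CachedData and its fields are hypothetical names chosen for illustration; a real cache would hold whatever reflection results you need.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class ReflectionCache
{
    // Hypothetical per-type payload; a real cache would hold delegates,
    // member lists and so on.
    public sealed class CachedData
    {
        public readonly Type Type;      // a reference back to the key is fine here
        public readonly string Summary;

        public CachedData(Type type)
        {
            Type = type;
            Summary = type.Name + " has " + type.GetProperties().Length + " public properties";
        }
    }

    // Keys are held weakly; a value stays alive only while its key is alive,
    // even if the value refers back to the key.
    private static readonly ConditionalWeakTable<Type, CachedData> Cache =
        new ConditionalWeakTable<Type, CachedData>();

    public static CachedData Get(Type type)
    {
        // Creates the entry on first use. Under contention the callback may
        // run more than once, but only one value is kept in the table.
        return Cache.GetValue(type, t => new CachedData(t));
    }
}
```

Repeated calls to ReflectionCache.Get for the same Type return the same CachedData instance, so the expensive reflection work runs once per type in the common case.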

29 votes

I am trying to do operator overloads for "+=", but I can't. I can only make an operator overload for "+".

How come?

Edit

The reason this is not working is that I have a Vector class (with an X and Y field). Consider the following example.

vector1 += vector2;

If my operator overload is set to:

public static Vector operator +(Vector left, Vector right)
{
    return new Vector(right.x + left.x, right.y + left.y);
}

Then the result won't be added to vector1, but instead, vector1 will become a brand new Vector by reference as well.

Updated the answer, see below.


Overloadable Operators, from MSDN:

Assignment operators cannot be overloaded, but +=, for example, is evaluated using +, which can be overloaded.

What's more, none of the assignment operators can be overloaded. I think this is because overloading them would have some effect on garbage collection and memory management, which is a potential security hole in the CLR's strongly typed world.

Nevertheless, let's see what exactly operators are. According to Jeffrey Richter's famous book, each programming language has its own list of operators, which are compiled into special method calls, and the CLR itself doesn't know anything about operators. So let's see what exactly lies behind the + and += operators.

See this simple code:

    Decimal d = 10M;
    d = d + 10M;
    Console.WriteLine(d);

Let's view the IL code for these instructions:

  IL_0000:  nop
  IL_0001:  ldc.i4.s   10
  IL_0003:  newobj     instance void [mscorlib]System.Decimal::.ctor(int32)
  IL_0008:  stloc.0
  IL_0009:  ldloc.0
  IL_000a:  ldc.i4.s   10
  IL_000c:  newobj     instance void [mscorlib]System.Decimal::.ctor(int32)
  IL_0011:  call       valuetype [mscorlib]System.Decimal [mscorlib]System.Decimal::op_Addition(valuetype [mscorlib]System.Decimal,
                                                                                                valuetype [mscorlib]System.Decimal)
  IL_0016:  stloc.0

Now let's see this code:

Decimal d1 = 10M;
d1 += 10M;
Console.WriteLine(d1);

And IL-code for this:

  IL_0000:  nop
  IL_0001:  ldc.i4.s   10
  IL_0003:  newobj     instance void [mscorlib]System.Decimal::.ctor(int32)
  IL_0008:  stloc.0
  IL_0009:  ldloc.0
  IL_000a:  ldc.i4.s   10
  IL_000c:  newobj     instance void [mscorlib]System.Decimal::.ctor(int32)
  IL_0011:  call       valuetype [mscorlib]System.Decimal [mscorlib]System.Decimal::op_Addition(valuetype [mscorlib]System.Decimal,
                                                                                                valuetype [mscorlib]System.Decimal)
  IL_0016:  stloc.0

They are equal! So the += operator is just syntactic sugar in C#, and you can simply overload the + operator.

For example:

    class Foo
    {
        private int c1;

        public Foo(int c11)
        {
            c1 = c11;
        }

        public static Foo operator +(Foo c1, Foo x)
        {
            return new Foo(c1.c1 + x.c1);
        }
    }

    static void Main(string[] args)
    {
        Foo d1 =  new Foo (10);
        Foo d2 = new Foo(11);
        d2 += d1;
    }

This code will be compiled and successfully run as:

  IL_0000:  nop
  IL_0001:  ldc.i4.s   10
  IL_0003:  newobj     instance void ConsoleApplication2.Program/Foo::.ctor(int32)
  IL_0008:  stloc.0
  IL_0009:  ldc.i4.s   11
  IL_000b:  newobj     instance void ConsoleApplication2.Program/Foo::.ctor(int32)
  IL_0010:  stloc.1
  IL_0011:  ldloc.1
  IL_0012:  ldloc.0
  IL_0013:  call       class ConsoleApplication2.Program/Foo ConsoleApplication2.Program/Foo::op_Addition(class ConsoleApplication2.Program/Foo,
                                                                                                          class ConsoleApplication2.Program/Foo)
  IL_0018:  stloc.1

Update:

According to your update - as @Eric Lippert says, you really should have the vectors as immutable objects. The result of adding two vectors is a new vector, not the first one with different sizes.

If for some reason you need to change the first vector, you can use this overload (but to me, this is very strange behaviour):

public static Vector operator +(Vector left, Vector right)
{
    left.x += right.x;
    left.y += right.y;
    return left;
}
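For comparison, the immutable approach recommended above could be sketched as follows. The use of double for the coordinates and the VectorDemo helper are assumptions for illustration; the question does not show the full Vector class.

```csharp
public sealed class Vector
{
    private readonly double _x;
    private readonly double _y;

    public Vector(double x, double y) { _x = x; _y = y; }

    public double X { get { return _x; } }
    public double Y { get { return _y; } }

    // Returns a new instance; neither operand is modified.
    public static Vector operator +(Vector left, Vector right)
    {
        return new Vector(left._x + right._x, left._y + right._y);
    }
}

public static class VectorDemo
{
    public static bool CompoundAssignmentRebindsVariable()
    {
        Vector vector1 = new Vector(1, 2);
        Vector alias = vector1;           // second reference to the same object
        vector1 += new Vector(10, 20);    // evaluated as: vector1 = vector1 + ...
        // vector1 now refers to a new object; the original is untouched.
        return vector1.X == 11 && vector1.Y == 22 && alias.X == 1 && alias.Y == 2;
    }
}
```

With this design, += rebinds the variable to the sum and any other reference to the old vector still sees the old values.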

C# and immutability and readonly fields... a lie?

22 votes

I have found that people claim that using all readonly fields in a class does not necessarily make that class's instances immutable, because there are "ways" to change the readonly field values even after initialization (construction).

How? What ways?

So my question is when can we really have a "real" immutable object in C#, that I can safely use in threading?

Also, do anonymous types create immutable objects? And some say LINQ uses immutable objects internally. How exactly?

You've asked like five questions in there. I'll answer the first one:

Having all readonly fields in a class does not necessarily make that class's instance immutable because there are "ways" to change the readonly field values even after construction. How?

Is it possible to change a readonly field after construction?

Yes, if you are sufficiently trusted to break the rules of read-only-ness.

How does that work?

Every bit of user memory in your process is mutable. Conventions like readonly fields might make certain bits appear to be immutable, but if you try hard enough, you can mutate them. For example, you can take an immutable object instance, obtain its address, and change the raw bits directly. Doing so might require a great deal of cleverness and knowledge of the internal implementation details of the memory manager, but somehow the memory manager manages to mutate that memory, so you can too if you try hard enough. You can also use "private reflection" to break various parts of the safety system if you are sufficiently trusted.

By definition, fully trusted code is allowed to break the rules of the safety system. That's what "fully trusted" means. If your fully trusted code chooses to use tools like private reflection or unsafe code to break the memory safety rules, fully trusted code is allowed to do that.
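To make the "private reflection" route concrete, here is a small sketch. The Point class and ReadonlyDemo helper are hypothetical, and the behaviour shown assumes the code runs with enough trust for reflection over non-public members to be allowed.

```csharp
using System;
using System.Reflection;

public sealed class Point
{
    private readonly int _x;
    public Point(int x) { _x = x; }
    public int X { get { return _x; } }
}

public static class ReadonlyDemo
{
    public static int MutateViaPrivateReflection()
    {
        Point p = new Point(1);

        // Look up the private readonly instance field by name.
        FieldInfo field = typeof(Point).GetField(
            "_x", BindingFlags.Instance | BindingFlags.NonPublic);

        // The readonly rule is enforced by the compiler for ordinary code;
        // reflection writes to the field anyway.
        field.SetValue(p, 42);
        return p.X;
    }
}
```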

Please don't. Doing so is dangerous and confusing. The memory safety system is designed to make it easier to reason about the correctness of your code; deliberately violating its rules is a bad idea.

So, is "readonly" a lie? Well, suppose I told you that if everyone obeys the rules, everyone gets one slice of cake. Is the cake a lie? That claim is not the claim "you will get a slice of cake". That's the claim that if everyone obeys the rules, you'll get a slice of cake. If someone cheats and takes your slice, no cake for you.

Is a readonly field of a class readonly? Yes but only if everyone obeys the rules. So, readonly fields are not "a lie". The contract is, if everyone obeys the rules of the system then the field is observed to be readonly. If someone breaks the rules, then maybe it isn't. That doesn't make the statement "if everyone obeys the rules, the field is readonly" a lie!

A question you did not ask, but perhaps should have, is whether "readonly" on fields of a struct is a "lie" as well. See Does using public readonly fields for immutable structs work? for some thoughts on that question. Readonly fields on a struct are much more of a lie than readonly fields on a class.

As for the rest of your questions -- I think you'll get better results if you ask one question per question, rather than five questions per question.

How to handle vague dates in .Net

21 votes

I have a system that takes information from an external source and then stores it to be displayed later.

One of the data items is a date. On the source system they have the concept of a fuzzy date i.e. not accurate to a specific day or sometimes not to a month as well. So I get dates in the format:

dd/mm/yyyy
mm/yyyy
yyyy

I can parse these to DateTime objects and work with them, but when rendering later I need to be able to determine the accuracy of the date, since parsing "2010" will result in a date of "01/01/2010". I want to show just the year, so I need to know its original accuracy.

I've mocked up a quick class to deal with this:

public class FuzzyDate
{
    public DateTime Date { get; set; }
    public DateType Type { get; set; }
}

public enum DateType
{
    DayMonthYear,
    MonthYear,
    Year
}

This will do the job for me and I can do something on the parse to handle it but I feel like this is probably quite a common problem and there is probably an existing cleaner solution.

Is there something built into .Net to do this? I had a look at the culture stuff but that didn't quite seem right.

Any help would be appreciated.

To answer your question: There is nothing built into .NET to handle this gracefully.

Your solution is as valid as any I've seen. You will probably wish to embellish your class with overrides to the ToString() method that will render your date appropriately based on the DateType.
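One possible shape for that ToString() override, together with a parse helper for the three input formats from the question, is sketched below. The Parse method and the choice of the invariant culture are assumptions for illustration, not something the question or answer prescribes.

```csharp
using System;
using System.Globalization;

public enum DateType { DayMonthYear, MonthYear, Year }

public class FuzzyDate
{
    public DateTime Date { get; set; }
    public DateType Type { get; set; }

    // Hypothetical helper: tries the most precise format first.
    public static FuzzyDate Parse(string text)
    {
        string[] formats = { "dd/MM/yyyy", "MM/yyyy", "yyyy" };
        DateType[] types = { DateType.DayMonthYear, DateType.MonthYear, DateType.Year };
        for (int i = 0; i < formats.Length; i++)
        {
            DateTime parsed;
            if (DateTime.TryParseExact(text, formats[i], CultureInfo.InvariantCulture,
                                       DateTimeStyles.None, out parsed))
            {
                return new FuzzyDate { Date = parsed, Type = types[i] };
            }
        }
        throw new FormatException("Unrecognised fuzzy date: " + text);
    }

    // Renders only the parts of the date that were originally supplied.
    public override string ToString()
    {
        switch (Type)
        {
            case DateType.Year:
                return Date.ToString("yyyy", CultureInfo.InvariantCulture);
            case DateType.MonthYear:
                return Date.ToString("MM'/'yyyy", CultureInfo.InvariantCulture);
            default:
                return Date.ToString("dd'/'MM'/'yyyy", CultureInfo.InvariantCulture);
        }
    }
}
```

Parsing and then rendering a value such as "2010" or "03/2010" gives back the same text, so the original accuracy survives the round trip.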

Here are a couple other threads that attempt to address this question:

Good luck!

string.Empty vs null. Which one do you use?

20 votes

Recently a colleague at work told me not to use string.Empty when setting a string variable but use null as it pollutes the stack?

He says don't do

string myString=string.Empty; but do string mystring=null;

Does it really matter? I know string is an object so it sort of makes sense.

I know it's a silly question, but what is your view?

null and Empty are very different, and I don't suggest arbitrarily switching between them. But neither has any extra "cost", since Empty is a single fixed reference (you can use it any number of times).

There is no "pollution" on the stack caused by a ldsfld - that concern is.... crazy. Loading a null is arguably marginally cheaper, but could cause null-reference exceptions if you aren't careful about checking the value.

Personally, I use neither... If I want an empty string I use "" - simple and obvious. Interning means this also has no per-usage overhead.
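A short sketch of how null and the empty string differ in practice is given below. EmptyVsNull and its method names are hypothetical and exist only to make the comparisons easy to read.

```csharp
public static class EmptyVsNull
{
    // "" and string.Empty compare equal by value.
    public static bool LiteralEqualsEmpty()
    {
        return "" == string.Empty;
    }

    // null is not an empty string; it is the absence of a string.
    public static bool NullIsNotEmpty()
    {
        string s = null;
        return s != string.Empty;
    }

    // IsNullOrEmpty treats both cases alike when that is what you want.
    public static bool BothCountAsNullOrEmpty()
    {
        return string.IsNullOrEmpty(null) && string.IsNullOrEmpty("");
    }

    // Calling s.Length on null would throw NullReferenceException, so check first.
    public static int SafeLength(string s)
    {
        return s == null ? 0 : s.Length;
    }
}
```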

When is a .NET namespace implemented by a .NET Framework component?

20 votes

(Yet another question from my "Clearly I'm the only idiot out here" series.)

When I need to use a class from the .NET Framework, I dutifully look up the documentation to determine the corresponding namespace and then add a "using" directive to my source code:

using System.Text.RegularExpressions;

Usually I'm good to go at this point, but sometimes Intellisense doesn't recognize the new class and the project won't build. A quick check in the Object Browser confirms that I have the right namespace. Frustration ensues.

Using HttpUtility.UrlEncode() involved adding the appropriate directive:

using System.Web;

But it also required adding a reference to .NET Framework Component for System.Web, i.e. right-click the project in Solution Explorer, select Add Reference and add System.Web from the .NET tab.

How might I discern from the documentation whether a .NET namespace is implemented by a .NET Framework Component that must be referenced? I'd rather not hunt through the available components every time I use a namespace on the off chance that a reference is needed.

(For those who like to stay after class and clean the erasers: Will Organize Usings > Remove and Sort also remove references to components that are not used elsewhere in the project? How do you clean up unnecessary references?)

Check out this link for UrlEncode:

Namespace: System.Web

Assembly: System.Web (in System.Web.dll)

The Assembly line tells you which dll to reference.

Enum addition vs subtraction and casting

19 votes

Why does addition require a cast but subtraction works without a cast? See the code below to understand what I am asking

public enum Stuff
{
    A = 1,
    B = 2,
    C = 3
}

var resultSub = Stuff.A - Stuff.B; // Compiles
var resultAdd = Stuff.A + Stuff.B; // Does not compile
var resultAdd2 = (int)Stuff.A + Stuff.B; // Compiles     

note: For both addition and subtraction it does not matter whether result is out of range (of the enum) or not in all three examples above.

Good question - I was surprised that the first and third lines worked.

However, they are supported in the C# language specification - in section 7.8.4, it talks about enumeration addition:

Every enumeration type implicitly provides the following pre-defined operators, where E is the enum type and U is the underlying type of E:

E operator +(E x, U y)
E operator +(U x, E y)

At runtime, these operators are evaluated exactly as (E)((U)x + (U)y)

And in section 7.8.5:

Every enumeration type implicitly provides the following predefined operator, where E is the enum type and U is the underlying type of E:

U operator -(E x, E y)

This operator is evaluated exactly as (U)((U)x - (U)y). In other words, the operator computes the difference between the ordinal values of x and y, and the type of the result is the underlying type of the enumeration.

E operator -(E x, U y);

This operator is evaluated exactly as (E)((U)x - y). In other words, the operator subtracts a value from the underlying type of the enumeration, yielding a value of the enumeration.

So that's why the compiler behaves like that - because it's what the C# spec says to do :)
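The predefined operators quoted above can be exercised with a small sketch. EnumOperatorDemo and its method names are hypothetical; the Stuff enum is the one from the question.

```csharp
public enum Stuff { A = 1, B = 2, C = 3 }

public static class EnumOperatorDemo
{
    // U operator -(E x, E y): two enum values subtract to the underlying type.
    public static int Difference()
    {
        return Stuff.A - Stuff.B;
    }

    // E operator +(E x, U y): enum plus underlying value yields the enum type.
    public static Stuff AddUnderlying()
    {
        return Stuff.A + 1;
    }

    // E operator +(U x, E y): the cast on the left operand selects this overload,
    // which is why the third line in the question compiles.
    public static Stuff AddWithCast()
    {
        return (int)Stuff.A + Stuff.B;
    }

    // E operator -(E x, U y): enum minus underlying value yields the enum type.
    public static Stuff SubtractUnderlying()
    {
        return Stuff.C - 1;
    }
}
```

There is no E operator +(E x, E y) in the list, which matches the compile error for Stuff.A + Stuff.B.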

I wasn't aware that any of these operators exist, and I've never knowingly seen them used. I suspect the reasons for their existence are buried somewhere in the language design meeting notes that Eric Lippert occasionally dives into - but I also wouldn't be surprised if they were regretted as adding features for little benefit. Then again, maybe they're really useful in some situations :)

low priority http upload in .net

18 votes

I'm writing a program that uploads huge amounts of data and I need to limit its interference with web browsing and other user activities.

The upload is composed of many large-ish files that are transferred individually, the connection must be a standard HTTP POST (I have no control of the server) and I need to control the HTTP headers (the server uses them for authentication and metadata)

It's important that the upload will resume full speed when the user is no longer using the internet because otherwise it will never finish (I expect it will need to run for a week or more at full speed to complete).

I want to solve this problem by somehow making my HTTP connection low priority. Detecting open browser windows and slowing down does not solve the problem because (a) the user may be using a non-browser app (FTP, twitter client, e-mail, etc.) and (b) I don't want to slow down if there's an open idle web browser window.

I've found BITS but I think it's not relevant for me since I need it to be a standard HTTP POST.

I'm using .net 3.5, the program is written in C# and I'm currently using HttpWebRequest for the upload.

Clarification: I’m writing consumer software that will run on the customer’s personal computer at home. My beta testers complain that the internet is slow when they run my program (understandable, since I am using all their bandwidth) so I want to give higher priority to other programs so their internet is no longer slow.

There is no fancy network infrastructure that can prioritize packets on the network and no IT team to install and configure anything. I expect most customers will have a cheap wireless router they got for free from their ISP.

Simultaneously keep track of the number of bytes your app sends and the total bytes sent on the network, using the BytesSent property of the System.Net.NetworkInformation.IPv4InterfaceStatistics class, at a given interval. Subtract the total bytes your app has sent in that interval from the total bytes sent on the network (during the same interval). If the difference is high enough that you need to throttle your uploading, then do so. Once the difference becomes small enough, crank up the uploading.
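A rough sketch of that measurement and decision step follows. UploadThrottle, its method names and the thresholdBytes parameter are hypothetical; the right threshold and sampling interval would have to be found by experiment.

```csharp
using System;
using System.Net.NetworkInformation;

public static class UploadThrottle
{
    // Sum of bytes sent on all interfaces that are up, as reported by the OS.
    // Sample this at the start and end of each interval and subtract.
    public static long TotalBytesSent()
    {
        long total = 0;
        foreach (NetworkInterface nic in NetworkInterface.GetAllNetworkInterfaces())
        {
            if (nic.OperationalStatus == OperationalStatus.Up)
            {
                total += nic.GetIPv4Statistics().BytesSent;
            }
        }
        return total;
    }

    // Decision step: traffic that was not ours during the interval is
    // assumed to belong to other programs the user is running.
    public static bool ShouldThrottle(long totalSentDelta, long ownSentDelta, long thresholdBytes)
    {
        long otherTraffic = Math.Max(0, totalSentDelta - ownSentDelta);
        return otherTraffic > thresholdBytes;
    }
}
```

The upload loop would count its own bytes per interval, call ShouldThrottle with the two deltas, and pause or shrink its writes while the result is true.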

Why is List.Sort() an instance method but Array.Sort() static?

18 votes

I'm trying to understand the design decision behind this part of the language. I admit I'm very new to it all, but this is something which caught me out initially and I was wondering if I'm missing an obvious reason. Consider the following code:

List<int> MyList = new List<int>() { 5, 4, 3, 2, 1 };
int[] MyArray = {5,4,3,2,1};


//Sort the list
MyList.Sort();
//This was an instance method


//Sort the Array
Array.Sort(MyArray);
//This was a static method

Why are they not both implemented in the same way - intuitively to me it would make more sense if they were both instance methods?

The question is interesting because it reveals details of the .NET type system. Like value types, string and delegate types, array types get special treatment in .NET. The most notable oddish behavior is that you never explicitly declare an array type. The compiler takes care of it for you with ample helpings of the jitter. System.Array is an abstract type, you'll get dedicated array types in the process of writing code. Either by explicitly creating a type[] or by using generic classes that have an array in their base implementation.

In a largish program, having hundreds of array types is not unusual. Which is okay, but there's overhead involved for each type. It is storage required for just the type, not the objects of it. The biggest chunk of it is the so-called 'method table'. In a nutshell, it is a list of pointers to each instance method of the type. Both the class loader and the jitter work together to fill this table. This is commonly known as the 'v-table' but isn't quite a match, the table contains pointers to methods that are both non-virtual and virtual.

You can perhaps see where this leads: the designers were worried about having lots of types with big method tables, so they looked for ways to cut down on the overhead.

Array.Sort() was an obvious target.

The same issue is not relevant for generic types. One method table can handle all of the method pointers for a reference type. Plus the specializations for value types.
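The point that every element type gets its own array type, all deriving from System.Array, can be checked with a short sketch. ArrayTypeDemo and its method names are hypothetical.

```csharp
using System;

public static class ArrayTypeDemo
{
    // int[] and long[] are separate runtime types, each with System.Array
    // as its base type.
    public static bool ArrayTypesAreDistinct()
    {
        return typeof(int[]) != typeof(long[])
            && typeof(int[]).BaseType == typeof(Array)
            && typeof(long[]).BaseType == typeof(Array);
    }

    // Sorting goes through the static helper on System.Array rather than
    // an instance method on each array type.
    public static int[] SortedCopy(int[] input)
    {
        int[] copy = (int[])input.Clone();
        Array.Sort(copy);
        return copy;
    }
}
```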

C# (not ASP/MVC/WinForms) - Catch all exceptions in a class

15 votes

Some background info

I am programming in a system that uses a proprietary programming language, with the option of using specially attributed .Net classes in the proprietary code.

Unfortunately, the system doesn't handle unhandled exceptions bubbling up from .Net code well, in fact not at all; the system crashes with no explanation. This is annoying, because we often want to handle exceptions in the proprietary system, not in .Net code. The solution offered by the vendor of the system is to repackage the exception into a special object that the system does handle.

Our .Net code is written in a façade pattern, and the problem is that to make sure every exception that bubbles up from the .Net code is handled, every method in the facade must include a try/catch block that repackages any exceptions that may occur.

The question

I've read a lot of threads here describing similar scenarios, most of them WinForms- or web-related. Because our code is neither, the question is if there is some way to catch all exceptions in a class, so that we can repackage it and rethrow a modified version of them?

Obviously, the interface between the .Net dll's containing the classes and the proprietary language is completely beyond our control.

Edit

I tried the currentDomain.UnhandledException method suggested by @VMAtm, unfortunately to no avail. The event handler didn't fire, and the parent system got hold of the exception and then misbehaved as usual. That led me onto Google once more, and I found this paragraph here:

The first thing to understand is that the UnhandledException event is not an unhandled exception "handler". Registering for the event, contrary to what the documentation says :-(, does not cause unhandled exceptions to be handled. (Since then they wouldn't be unhandled, but I'll stop with the circular reasoning already...) The UnhandledException event simply notifies you that an exception has gone unhandled, in case you want to try to save state before your thread or application dies.

Jonathan Keljo, CLR Exceptions PM

That was too bad, I liked the idea of having a "global" try/catch block. What I guess it means is that I'm not successful in hiding the exception from the parent system. Since I don't know the first thing about how this is implemented in that system (and frankly, I don't know the first thing about how I'd go on to implement it myself) I'm on really thin ice with my assumptions, so if anyone can correct me in any way, please go ahead!

Ohh, the error I'm getting in the parent system is Exception has been thrown by the target of an invocation., which is as far as I know the message from the outer .Net exception occurring. If it's possible to read anything out of that, I don't know.
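That message is the standard text of System.Reflection.TargetInvocationException, which .NET throws when a method called through reflection fails; the original exception is kept in InnerException. A minimal sketch of how that wrapping looks, assuming (and it is only an assumption) that the parent system calls into the .Net classes through reflection. The names InvocationDemo, Fail and InvokeAndUnwrap are made up for illustration:

```csharp
using System;
using System.Reflection;

public static class InvocationDemo
{
    // Hypothetical stand-in for a facade method that fails.
    public static void Fail()
    {
        throw new InvalidOperationException("real cause");
    }

    // Invokes a static, parameterless method through reflection and returns
    // the exception the method itself threw, or null if it completed normally.
    public static Exception InvokeAndUnwrap(MethodInfo method)
    {
        try
        {
            method.Invoke(null, null);
            return null;
        }
        catch (TargetInvocationException wrapper)
        {
            // wrapper.Message is the generic "Exception has been thrown by the
            // target of an invocation."; the real failure is the inner exception.
            return wrapper.InnerException;
        }
    }

    public static void Main()
    {
        MethodInfo fail = typeof(InvocationDemo).GetMethod("Fail");
        Exception inner = InvokeAndUnwrap(fail);
        Console.WriteLine(inner.GetType().Name + ": " + inner.Message);
        // prints "InvalidOperationException: real cause"
    }
}
```

If the parent system only ever reports the outer message, the interesting details are probably sitting in InnerException, which is one more reason to repackage on the .Net side.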

I'll have a go at the Castle Dynamic Proxy suggested by @jlew as well, but it looked a lot harder than the two AppDomain lines and scared me quite a bit :)

Solution

If you are having the same problem as I had, you should try the currentDomain.UnhandledException method suggested by @VMAtm first; it only failed for me because my parent system is especially fussy.

I got it working by using the Castle DynamicProxy setup. It was really very easy to set up. My test case was a façade class encapsulating the XmlAttribute class. The first thing I had to do was to write the proxy class:

using System;
using Castle.DynamicProxy;

public class AttribInterceptor : IInterceptor
{
    public void Intercept(IInvocation invocation)
    {
        try
        {
            // Run the intercepted backend method
            invocation.Proceed();
        }
        catch (Exception e)
        {
            // Custom exception repackaging here
        }
    }
}

Then I had to instruct the façade object to actually use the proxy. I kept my old backend field, but added the following to the c'tor:

public class CapXmlAttribute : CapPmlNetObject
{
    private XmlAttributeBackend _xmlAttribute;

    public CapXmlAttribute()
    {
        var generator = new ProxyGenerator();
        _xmlAttribute = (XmlAttributeBackend) generator.CreateClassProxy(
            typeof (XmlAttributeBackend), new AttribInterceptor());
    }
}

The last step was marking all methods in the backend that are exposed to the façade as virtual. This was no problem for me, but might be a dealbreaker for others.

DynamicProxy really isn't that well documented, but I learned a lot from Krzysztof Koźmic's tutorial and Hamilton Verissimo's CodeProject article.

I would take a look at using something like Castle Dynamic Proxy. This will allow your class method calls to be intercepted in a generic way, which would give you a central place to put a "catch-all" exception handler. (That said, it's unclear to me how your classes are actually instantiated, which might make this approach problematic)

Is there a library of default unit tests for .NET interface implementations?

14 votes

For example, I have a type which implements IDictionary, and I need test coverage to ensure that it does it properly. I just wondered if there is a repository of standard, reusable tests for this kind of thing anywhere; if there isn't, I might create one.

I couldn't find anything which specifically met this need, so I made it. I've created a GitHub repo and added my IDictionary<TKey,TValue> test to it; hopefully people will fork and contribute more tests.

https://github.com/markrendle/InterfaceTests.Net

Is string interning really useful?

11 votes

I was having a conversation about strings and various languages a while back, and the topic of string interning came up. Apparently Java and the .NET framework do this automatically with all strings, as well as several scripting languages. Theoretically, it saves memory because you don't end up with multiple copies of the same string, and it saves time because string equality comparisons are a simple pointer comparison instead of an O(N) run through each character of the string.

But the more I think about it, the more skeptical I grow of the concept's benefits. It seems to me that the advantages are mostly theoretical:

  • First off, to use automatic string interning, all strings must be immutable, which makes a lot of string processing tasks harder than they need to be. (And yes, I've heard all the arguments for immutability in general. That's not the point.)
  • Every time a new string is created, it has to be checked against the string interning table, which is at least an O(N) operation. (EDIT: Where N is the size of the string, not the size of the table, since this was confusing people.) So unless the ratio of string equality comparisons to new string creation is pretty high, it's unlikely that the net time saved is a positive value.
  • If the string equality table uses strong references, the strings will never get garbage collected when they're no longer needed, thus wasting memory. On the other hand, if the table uses weak references, then the string class requires some sort of finalizer to remove the string from the table, thus slowing down the GC process. (Which could be pretty significant, depending on how the string intern table is implemented. Worst case, deleting an item from a hash table can require an O(N) rebuild of the entire table under certain circumstances.)

This is just the result of me thinking about implementation details. Is there something I've missed? Does string interning actually provide any significant benefits in the general case?

EDIT 2: All right, apparently I was operating from a mistaken premise. The person I was talking to never pointed out that string interning was optional for newly-created strings, and in fact gave the strong impression that the opposite was true. Thanks to Jon for setting the matter straight. Another accepted answer for him.

No, Java and .NET don't do it "automatically with all strings". They (well, Java and C#) do it with constant string expressions expressed in bytecode/IL, and on demand via the String.intern (Java) and String.Intern (.NET) methods. The exact situation in .NET is interesting, but basically the C# compiler will guarantee that every reference to an equal string constant within an assembly ends up referring to the same string object. That can be done efficiently at type initialization time, and can save a bunch of memory.

It doesn't happen every time a new string is created.
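A small sketch of that difference: strings built at run time with equal contents are separate objects until you explicitly ask for the interned instance. InternDemo and Build are illustrative names only; String.Intern and Object.ReferenceEquals are the real framework methods.

```csharp
using System;

public static class InternDemo
{
    // Builds a fresh string object at run time (not a compile-time constant).
    public static string Build()
    {
        return new string(new[] { 'n', 'a', 'm', 'e' });
    }

    public static void Main()
    {
        string first = Build();
        string second = Build();

        // Same contents, so value equality holds.
        Console.WriteLine(first == second);                        // True

        // But nothing interned them automatically: two distinct objects.
        Console.WriteLine(object.ReferenceEquals(first, second));  // False

        // Interning on demand maps both to one canonical instance.
        Console.WriteLine(object.ReferenceEquals(
            string.Intern(first), string.Intern(second)));         // True
    }
}
```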

(On the string immutability front, I for one am extremely glad that strings are immutable. I don't want to have to take a copy every time I receive a parameter etc, thank you very much. I haven't seen it make string processing tasks harder, either...)

And as others have pointed out, looking up a string in a hash table isn't generally an O(n) operation, unless you're incredibly unlucky with hash collisions...

Personally I don't use string interning in user-land code; if I want some sort of cache of strings I'll create a HashSet<string> or something similar. That can be useful in various situations where you expect to come across the same strings several times (e.g. XML element names) but with a simple collection you don't pollute a system-wide cache.
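As a rough sketch of such a local cache (not anyone's production code): the StringPool class below is a hypothetical helper, and it uses a Dictionary<string, string> rather than a HashSet<string> only because getting the stored instance back out of a HashSet needs TryGetValue, which older framework versions lack.

```csharp
using System;
using System.Collections.Generic;

// A local string pool: hands back one shared instance per distinct value.
// Unlike String.Intern, the pool can be dropped (and collected) when no longer needed.
public sealed class StringPool
{
    private readonly Dictionary<string, string> _pool =
        new Dictionary<string, string>(StringComparer.Ordinal);

    public int Count { get { return _pool.Count; } }

    public string GetOrAdd(string value)
    {
        string existing;
        if (_pool.TryGetValue(value, out existing))
            return existing;      // reuse the instance seen earlier
        _pool.Add(value, value);  // first occurrence becomes the canonical one
        return value;
    }
}

public static class StringPoolDemo
{
    public static void Main()
    {
        var pool = new StringPool();

        // Two equal strings built at run time are distinct objects.
        string a = new string(new[] { 'i', 't', 'e', 'm' });
        string b = new string(new[] { 'i', 't', 'e', 'm' });

        string pooledA = pool.GetOrAdd(a);
        string pooledB = pool.GetOrAdd(b);

        Console.WriteLine(object.ReferenceEquals(a, b));             // False
        Console.WriteLine(object.ReferenceEquals(pooledA, pooledB)); // True
        Console.WriteLine(pool.Count);                               // 1
    }
}
```

Something like this would sit next to an XML reader, with every element name passed through GetOrAdd before being stored.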

Is there a CSS object model or CSS querying api for .net?

9 votes

Is there a library out there that will allow me to write the following kind of code, which parses CSS and returns a queryable object model?

string input = "p, span { font-family: arial; }";
var cssRules = new Parser().Parse(input);
var rule = cssRules.Find(new Selector("p")).First();

Assert.That(rule.Attribute("font-family").Value, Is.EqualTo("arial"));

I've taken a look at dotless http://www.dotlesscss.org/, downloaded their code and examined some of the relevant unit tests and fixtures. It looks promising but I can't quite work out how to use it to parse and query plain CSS.

The closest I know is CssParser from jsonfx.net:

http://css-parser.googlecode.com/svn/trunk/CssParser/

You can parse any CSS and browse through the selectors afterwards using the StyleSheet property of CssParser.

Is it possible to create routes dynamically in .NET 4?

9 votes

In our application we use the new .NET 4 routing system to route certain requests to other parts of the site. We are only allowed to publish our site code in late evenings which means we have to stay late at work to publish any code changes. We frequently have the need to create custom routes to support legacy links to old content and route them to the new content. These are often needed right away and as our routes are defined in compiled global.asax we reach an impasse when we need these live immediately but cannot do a code push.

Is there a way that we could define routes in some sort of configuration file and have the site read them in programmatically without restarting the application?

Phil Haack gives a solution for dynamically registering routes here: http://haacked.com/archive/2010/01/17/editable-routes.aspx

I think that covers what you want, including not having the whole Web App restart.

Adding custom ValueProviderFactories to ASP.NET MVC3?

8 votes

I was looking to try and add a Protobuf ValueProviderFactory to MVC3 so that I could pick out the MIME type and deserialize the raw data into objects for action parameters. I could also use this to change the default Json serializer.

Looking at JsonValueProviderFactory.cs this shouldn't be too difficult, but the factories all appear to be hard-coded.

For Protobuf I may be able to do something with an IValueProvider but I haven't even checked yet what MVC3 does when it receives a MIME type of application/x-protobuf.

Am I going about this the right way?

UPDATE

I found this blog post that talks about creating an IValueProvider. It then mentions at the bottom that this changed around MVC2. He changed it to a ValueProviderFactory and calls:

ValueProviderFactories.Factories.Add(new HttpCookieValueProviderFactory());

But in MVC3 this property is read only.

It turns out that it is not read only and you can add providers as follows:

ValueProviderFactories.Factories.Add(new MyValueProviderFactory());

I would have known this had I checked myself!

I've done some more searching today, and this blog post seems to suggest that the DependencyResolver will find any classes that inherit ValueProviderFactory. I'm using MEF for dependency resolution so I can just add an Export attribute and it'll get picked up automatically.

I now have a further issue writing a custom ValueProviderFactory for protobuf-net.

New Windows Application - What language?

7 votes

We are currently in the pre-development phase of a desktop application for Windows. But with all the latest discussions on Windows 8, Silverlight, WPF and Jupiter, I don't know what to believe anymore. Is it wrong to start a new project with WPF now? Should I switch to Silverlight? Or should I wait until more details of Windows 8 come out?

Interesting topic. The main question is whether you build it Windows-based (WinForms or WPF) or web-based (ASP.NET MVC or Silverlight).

For a new business application we are just starting now, the business wants Windows-based because they want more features, while IT wants web-based to avoid deployment issues and to get easier support on a centralized server instead of trying to figure out what the client machines have...

In fact I believe WPF is ready for LOB (even if there are still fewer third-party controls compared to WinForms).

I would not invest in SL because it still requires a plugin, and with MVC / Ajax and HTML5 you can do the same and more with no plugins required, with the same UI running on all browsers and platforms (I focus very much on having my web app also run on iPad and Android tablets with no changes)...

The main point is the architecture: how you distribute it across servers and tiers so as to have well-distributed workloads and good reliability... Whether the UI is then Windows-based or web-based, as long as these UIs consume the same server components (exposed as WCF endpoints, for example), is more of a detail...

OutputCache serving long-stale data

6 votes

I'm flummoxed... re this and this "meta" questions...

A very basic http request:

GET http://stackoverflow.com/feeds/tag?tagnames=c%23&sort=newest HTTP/1.1
Host: stackoverflow.com
Accept-Encoding: gzip,deflate

which hits a route decorated with:

[OutputCache(Duration = 300, VaryByParam = "tagnames;sort",
    VaryByContentEncoding = "gzip;deflate", VaryByCustom = "site")]

is repeatedly and incorrectly serving either a 304 (no change) if you include if-modified-since, or the old data for a 200, i.e.

HTTP/1.1 200 OK
Cache-Control: public, max-age=0
Content-Type: application/atom+xml; charset=utf-8
Content-Encoding: gzip
Expires: Fri, 01 Jul 2011 09:17:08 GMT
Last-Modified: Fri, 01 Jul 2011 09:12:08 GMT
Vary: *
Date: Fri, 01 Jul 2011 09:42:46 GMT
Content-Length: 14714
(payload, when decoded = some long-stale data)

As you can see, it is serving this nearly half an hour past the 5 minute slot; it looks like the internals of OutputCache simply didn't notice the time ;p It will expire eventually (in fact, it has just done so - my Fri, 01 Jul 2011 09:56:20 GMT request finally got fresh data), but nowhere near punctually.

UPDATE:

I believed that it was working if we took away the accept-encoding header, but no; this fails too - it just fails on a different cycle (which is what we should expect since the keys are different, courtesy of VaryByContentEncoding):

GET http://stackoverflow.com/feeds/tag?tagnames=c%23&sort=newest HTTP/1.1
Host: stackoverflow.com

gives:

HTTP/1.1 200 OK
Cache-Control: public, max-age=0
Content-Type: application/atom+xml; charset=utf-8
Expires: Fri, 01 Jul 2011 10:09:58 GMT
Last-Modified: Fri, 01 Jul 2011 10:04:58 GMT
Vary: *
Date: Fri, 01 Jul 2011 10:17:20 GMT
Content-Length: 66815
(payload = some stale data)

Once again, you'll notice it is being served after Expires.

So: what could be wrong here?

Additionally: while we are using a custom option, our GetVaryByCustomString() correctly calls base.GetVaryByCustomString(ctx, custom) for options it doesn't recognise, as per MSDN (indeed this works fine for the second example above).

Is there any chance you're using a custom output cache provider? Hypothetically, if there was a custom provider using say a sliding expiration instead of an absolute one, you'd see symptoms like this.
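To make that hypothesis concrete, here is a toy simulation of the two policies. It is not ASP.NET OutputCache code; CacheEntry, ExpirationDemo and LastServed are invented for illustration, and the 2-minute request interval is an arbitrary assumption. With a 5-minute duration, an absolute entry stops being served after its slot, while a sliding entry that keeps getting hit never expires:

```csharp
using System;

// Toy cache entry with either absolute or sliding expiration.
public sealed class CacheEntry
{
    private readonly TimeSpan _duration;
    private readonly bool _sliding;
    private DateTime _expiresAt;

    public CacheEntry(DateTime created, TimeSpan duration, bool sliding)
    {
        _duration = duration;
        _sliding = sliding;
        _expiresAt = created + duration;
    }

    // Returns true if the entry is still served at 'now'.
    public bool TryHit(DateTime now)
    {
        if (now >= _expiresAt)
            return false;
        if (_sliding)
            _expiresAt = now + _duration; // every hit pushes the deadline out
        return true;
    }
}

public static class ExpirationDemo
{
    // Requests the entry every 'interval' and returns the offset of the last
    // request that was still served from cache, giving up after 'limit'.
    public static TimeSpan LastServed(bool sliding, TimeSpan duration,
                                      TimeSpan interval, TimeSpan limit)
    {
        DateTime start = new DateTime(2011, 7, 1, 9, 12, 8, DateTimeKind.Utc);
        var entry = new CacheEntry(start, duration, sliding);
        TimeSpan elapsed = TimeSpan.Zero;
        while (elapsed + interval <= limit && entry.TryHit(start + elapsed + interval))
            elapsed += interval;
        return elapsed;
    }

    public static void Main()
    {
        TimeSpan five = TimeSpan.FromMinutes(5);
        TimeSpan two = TimeSpan.FromMinutes(2);
        TimeSpan hour = TimeSpan.FromHours(1);

        Console.WriteLine(LastServed(false, five, two, hour)); // 00:04:00 (absolute)
        Console.WriteLine(LastServed(true, five, two, hour));  // 01:00:00 (sliding, still alive at the limit)
    }
}
```

On a busy feed URL the sliding case would look exactly like "long-stale data": the entry only dies once traffic pauses for longer than the duration.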