Best .net questions in October 2011

What is the purpose of a stack? Why do we need it? (MSIL)

86 votes

So I am learning MSIL right now to learn to debug my C# .net applications.

I've always wondered: what is the purpose of the stack?

Just to put my question in context:
Why is there a transfer from memory to stack or "loading?" On the other hand, why is there a transfer from stack to memory or "storing"? Why not just have them all placed in the memory? (reflective question)

  • is it because it's faster?
  • is it because it's RAM based?
  • for efficiency?

I'm trying to grasp this to help me understand IL codes much more deeply.

Thanks for the help.

I've always wondered: what is the purpose of the stack?

I assume you mean the evaluation stack of the MSIL language, and not the actual per-thread stack at runtime.

Why is there a transfer from memory to stack or "loading?" On the other hand, why is there a transfer from stack to memory or "storing"? Why not just have them all placed in the memory?

MSIL is a "virtual machine" language. Compilers like the C# compiler generate IL, and then at runtime another compiler called the JIT (Just In Time) compiler turns the IL into actual machine code that can execute.

So first let's answer the question "why have MSIL at all?" Why not just have the C# compiler write out machine code?

Because it is cheaper to do it this way. Suppose we didn't do it that way; suppose each language has to have its own machine code generator. You have twenty different languages: C#, JScript .NET, Visual Basic, Iron Python, F#... And suppose you have ten different processors. How many code generators do you have to write? 20 x 10 = 200 code generators. That's a lot of work. Now suppose you want to add a new processor. You have to write the code generator for it twenty times, one for each language.

Furthermore, it is difficult and dangerous work. Writing efficient code generators for chips that you are not an expert on is a hard job! Compiler designers are experts on the semantic analysis of their language, not on efficient register allocation of new chip sets.

Now suppose we do it the IL way. How many IL generators do you have to write? One per language. How many jit compilers do you have to write? One per processor. Total: 20 + 10 = 30 code generators. Moreover, the language-to-IL generator is easy to write because IL is a simple language, and the IL-to-machine-code generator is also easy to write because IL is a simple language. We get rid of all of the intricacies of C# and VB and whatnot and "lower" everything to a simple language that is easy to write a jitter for.

Having an intermediate language lowers the cost of producing a new language compiler dramatically. It also lowers the cost of supporting a new chip dramatically. You want to support a new chip, you find some experts on that chip and have them write an IL jitter and you're done; you then support all those languages on your chip.

OK, so we've established why we have MSIL; because having an intermediate language lowers costs. Why then is the language a "stack machine"?

Because stack machines are conceptually very simple for language compiler writers to deal with. Stacks are a simple, easily understood mechanism for describing computations. Stack machines are also conceptually very easy for jit compiler writers to deal with. Using a stack is a simplifying abstraction, and therefore again, it lowers our costs.

You ask "why have a stack at all?" Why not just do everything directly out of memory? Well, let's think about that. Suppose you want to generate IL code for:

int x = A() + B() + C() + 10;

Suppose we have the convention that "add", "call", "store" and so on always take their arguments off the stack and put their result (if there is one) on the stack. To generate IL code for this C# we just say something like:

load the address of x // stack now contains address of x
call A()              // stack contains address of x and result of A()
call B()              // addr of x, result of A(), result of B()
add                   // addr of x, result of A() + B()
call C()              // addr of x, result of A() + B(), result of C()
add                   // addr of x, result of A() + B() + C()
load 10               // addr of x, result of A() + B() + C(), 10
add                   // addr of x, result of A() + B() + C() + 10
store in address      // result is now stored in x, stack is empty.

Now suppose we did it without a stack. We'll do it your way, where every opcode takes the addresses of its operands and the address to which it stores its result:

allocate temporary store T1 for result of A()
call A() with address of T1
allocate temporary store T2 for result of B()
call B() with address of T2
allocate temporary store T3 for result of first addition
add contents of T1 to T2, then store the result into address of T3
allocate temporary store T4 for result of C()
call C() with address of T4
allocate temporary store T5 for result of second addition
...

You see how this goes? Our code is getting huge because we have to explicitly allocate all the temporary storage that would, by convention, normally just go on the stack. Worse, our opcodes themselves are all getting enormous because they all now have to take as arguments the address that they're going to write their result into and the address of each operand. An "add" instruction that knows that it is going to take two things off the stack and put one thing on can be a single byte. An add instruction that takes two operand addresses and a result address is going to be enormous.

We use stack-based opcodes because stacks solve the common problem. Namely: I want to allocate some temporary storage, use it very soon and then get rid of it quickly when I'm done. By making the assumption that we have a stack at our disposal we can make the opcodes very small and the code very terse.
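As a rough illustration of the stack convention described above, here is a minimal sketch of a toy evaluator in C#. It is not real MSIL: the class name ToyStackMachine and the stand-in bodies of A(), B() and C() are assumptions made up for the example, and the "load the address of x" step is simplified to popping the final result into a local.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical toy evaluator (not real MSIL). Every "instruction" takes its
// operands off the evaluation stack and pushes its result back on.
public static class ToyStackMachine
{
    // Stand-ins for the A(), B() and C() of the example; the values are arbitrary.
    public static int A() { return 1; }
    public static int B() { return 2; }
    public static int C() { return 3; }

    public static int Run()
    {
        var stack = new Stack<int>();

        stack.Push(A());                        // call A()  -> A
        stack.Push(B());                        // call B()  -> A, B
        stack.Push(stack.Pop() + stack.Pop());  // add       -> A+B
        stack.Push(C());                        // call C()  -> A+B, C
        stack.Push(stack.Pop() + stack.Pop());  // add       -> A+B+C
        stack.Push(10);                         // load 10   -> A+B+C, 10
        stack.Push(stack.Pop() + stack.Pop());  // add       -> A+B+C+10

        int x = stack.Pop();                    // store     -> stack is empty
        return x;
    }
}
```

Note that no instruction names a temporary: the intermediate sums live only on the stack and disappear as soon as they are consumed.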

UPDATE: Some additional thoughts

Incidentally, this idea of drastically lowering costs by (1) specifying a virtual machine, (2) writing compilers that target the VM language, and (3) writing implementations of the VM on a variety of hardware, is not a new idea at all. It did not originate with MSIL, LLVM, Java bytecode, or any other modern infrastructure. The earliest implementation of this strategy I'm aware of is the pcode machine from 1966.

The first I personally heard of this concept was when I learned how the Infocom implementors managed to get Zork running on so many different machines so well. They specified a virtual machine called the Z-machine and then made Z-machine emulators for all the hardware they wanted to run their games on. This had the added enormous benefit that they could implement virtual memory management on primitive 8-bit systems; a game could be larger than would fit into memory because they could just page the code in from disk when they needed it and discard it when they needed to load new code.

What is a "mostly complete" (im)mutability approach for C#?

27 votes

Since immutability is not fully baked into C# to the degree it is for F#, or fully into the framework (BCL) despite some support in the CLR, what's a fairly complete solution for (im)mutability for C#?

My order of preference is a solution consisting of general patterns/principles compatible with

  • a single open-source library with few dependencies
  • a small number of complementary/compatible open-source libraries
  • something commercial

that

  • covers Lippert's kinds of immutability
  • offers decent performance (that's vague I know)
  • supports serialization
  • supports cloning/copying (deep/shallow/partial?)
  • feels natural in scenarios such as DDD, builder patterns, configuration, and threading

I'd also like to include patterns that you, as the community, might come up with that don't exactly fit in a framework, such as expressing mutability intent through interfaces, where both clients that shouldn't change something and clients that may want to change something can only do so through interfaces, and not through the backing class (yes, I know this isn't true immutability, but it is sufficient):

public interface IX
{
    int Y{ get; }
    ReadOnlyCollection<string> Z { get; }
    IMutableX Clone();
}

public interface IMutableX: IX
{
    new int Y{ get; set; }
    new ICollection<string> Z{ get; } // or IList<string>
}

// generally no one should get ahold of an X directly
internal class X: IMutableX
{
    public int Y{ get; set; }

    ICollection<string> IMutableX.Z { get { return z; } }

    public ReadOnlyCollection<string> Z
    {
        get { return new ReadOnlyCollection<string>(z); }
    }

    public IMutableX Clone()
    {
        // MemberwiseClone() returns object, so cast before touching the private field
        var c = (X)MemberwiseClone();
        c.z = new List<string>(z);
        return c;
    }

    private IList<string> z = new List<string>();       
}

// ...

public void ContriveExample(IX x)
{
    if (x.Y != 3 || x.Z.Count < 10) return;
    var c= x.Clone();
    c.Y++;
    c.Z.Clear();
    c.Z.Add("Bye, off to another thread");
    // ...
}

I upvoted both Chris's and James's answers because, combined, they suggest the best combination of framework (reusing F# collections' namespace, but from C#) and a means to create immutable types easily via templating. These can be complemented with the "interface-based" notion I described in my question.

So we have our collections via a library and our pattern via code-gen and interfaces; you can code-gen the cloning (or at least the partial method for it: cloning via serialization, field-by-field copy, MemberwiseClone() and so on) and, if desired, create a generic ICloneable<T> interface to support this.
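A minimal sketch of what such a generic ICloneable<T> might look like follows. The interface shape and the Settings class are assumptions for illustration only; they do not come from any library mentioned above. The clone uses MemberwiseClone() plus a fresh copy of the one mutable collection.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical generic cloning contract; not part of the BCL.
public interface ICloneable<out T>
{
    T Clone();
}

// Hypothetical example type with one value field and one collection field.
public sealed class Settings : ICloneable<Settings>
{
    private List<string> tags = new List<string>();

    public int Retries { get; set; }

    public IList<string> Tags
    {
        get { return tags; }
    }

    public Settings Clone()
    {
        // Shallow copy first, then replace the list so the two
        // instances do not share mutable state.
        var copy = (Settings)MemberwiseClone();
        copy.tags = new List<string>(tags);
        return copy;
    }
}
```

Because Clone() returns Settings rather than object, callers need no cast, which is the main thing the generic interface buys over System.ICloneable.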

Why won't this seemingly correct .NET code compile?

23 votes

I'm asking in case I'm missing something obvious, but I think I may have stumbled upon a bug in .NET's compiler.

I have two projects in a .NET solution, one visual basic, one C#.

C# code, consisting of three overloaded static methods with default values:

public static class Class1
{

    public static void TestFunc(int val1, int val2 = 0)
    {
    }

    public static void TestFunc(int val1 = 0)
    {
    }

    public static void TestFunc(string val1, int val2 = 0)
    { 
    }
}

Visual basic code, calling one of the overloaded methods:

Option Explicit On
Option Strict On
Imports ClassLibrary1

Module Module1
    Sub Main()
        Dim x As Integer
        Class1.TestFunc(x, 0)
    End Sub
End Module

Compiling this code will fail, saying:

'TestFunc' is ambiguous because multiple kinds of members with this name exist in class 'ClassLibrary1.Class1'.

Why would it see this method as ambiguous? There is only one Class1.TestFunc with an (int, int) signature. Is this a bug, or am I missing something?

If you try to compile the equivalent overloads in VB.NET:

Sub TestFunc(ByVal val1 As Integer, Optional ByVal val2 As Integer = 0)

End Sub

Sub TestFunc(Optional ByVal val1 As Integer = 0)

End Sub

you'll get: 'Public Sub TestFunc(val1 As Integer, [val2 As Integer = 0])' and 'Public Sub TestFunc([val1 As Integer = 0])' cannot overload each other because they differ only by optional parameters.

So I'd say that VB.NET is more restrictive than C# when it comes to overloads that differ only by optional parameters.

Why is memory access in the lowest address space (non-null though) reported as NullReferenceException by .NET?

19 votes

This causes an AccessViolationException to be thrown:

using System;

namespace TestApplication
{
    internal static class Program
    {
        private static unsafe void Main()
        {
            ulong* addr = (ulong*)Int64.MaxValue;
            ulong val = *addr;
        }
    }
}

This causes a NullReferenceException to be thrown:

using System;

namespace TestApplication
{
    internal static class Program
    {
        private static unsafe void Main()
        {
            ulong* addr = (ulong*)0x000000000000FF;
            ulong val = *addr;
        }
    }
}

They're both invalid pointers and both violate memory access rules. Why the NullReferenceException?

This is caused by a Windows design decision made many years ago. The bottom 64 kilobytes of the address space is reserved. An access to any address in that range is reported as a null reference exception instead of the underlying access violation. This was a wise choice, because a null pointer dereference often produces reads or writes at addresses that are not actually zero. Reading a field of a C++ class object, for example, uses an offset from the start of the object, so with a null object pointer the code will bomb from reading at address 4 (or up).

C# doesn't have quite the same problem: the language guarantees that a null reference is caught before you can, say, call an instance method of a class. This is language specific; you can write managed code in C++/CLI and generate non-zero null pointer dereferences. That method will merrily execute and might not even bomb, until it tries to dereference an instance member, which requires taking an offset from this; then, kaboom. The C# guarantee is very nice: it makes diagnosing null reference problems much easier since they are raised at the call site and don't bomb somewhere inside the method. But it doesn't come for free, as explained in this blog post.

Why are delegates reference types?

18 votes

Quick note on the accepted answer: I disagree with a small part of Jeffrey's answer, namely the point that since Delegate had to be a reference type, it follows that all delegates are reference types. (It simply isn't true that a multi-level inheritance chain rules out value types; all enum types, for example, inherit from System.Enum, which in turn inherits from System.ValueType, which inherits from System.Object, all reference types.) However I think the fact that, fundamentally, all delegates in fact inherit not just from Delegate but from MulticastDelegate is the critical realization here. As Raymond points out in a comment to his answer (by the way: Raymond Chen on StackOverflow!), once you've committed to supporting multiple subscribers, there's really no point in not using a reference type for the delegate itself, given the need for an array somewhere.


See update at bottom.

It has always seemed strange to me that if I do this:

Action foo = obj.Foo;

I am creating a new Action object, every time. I'm sure the cost is minimal, but it involves allocation of memory to later be garbage collected.
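The allocation can be observed with a small sketch like the one below. The Widget type and the helper names are hypothetical, and the behaviour shown is what current C# compilers do for method groups bound to an instance, so treat it as an observation rather than a language guarantee.

```csharp
using System;

// Hypothetical type with an instance method to bind a delegate to.
public sealed class Widget
{
    public void Foo() { }
}

public static class DelegateAllocationDemo
{
    // Each method-group conversion creates its own delegate object,
    // so the two references are expected to differ.
    public static bool SameInstance(Widget obj)
    {
        Action first = obj.Foo;
        Action second = obj.Foo;
        return ReferenceEquals(first, second);
    }

    // Delegate.Equals compares target and method, so the two
    // separately allocated delegates still compare equal.
    public static bool EqualByValue(Widget obj)
    {
        Action first = obj.Foo;
        Action second = obj.Foo;
        return first.Equals(second);
    }
}
```

In other words, the two delegates behave like equal values even though each one is a separate heap object.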

Given that delegates are inherently themselves immutable, I wonder why they couldn't be value types? Then a line of code like the one above would incur nothing more than a simple assignment to a memory address on the stack*.

Even considering anonymous functions, it seems (to me) this would work. Consider the following simple example.

Action foo = () => { obj.Foo(); };

In this case foo does constitute a closure, yes. And in many cases, I imagine this does require an actual reference type (such as when local variables are closed over and are modified within the closure). But in some cases, it shouldn't. For instance in the above case, it seems that a type to support the closure could look like this: I take back my original point about this. The below really does need to be a reference type (or: it doesn't need to be, but if it's a struct it's just going to get boxed anyway). So, disregard the below code example. I leave it only to provide context for answers that specifically mention it.

struct CompilerGenerated
{
    Obj obj;

    public CompilerGenerated(Obj obj)
    {
        this.obj = obj;
    }

    public void CallFoo()
    {
        obj.Foo();
    }
}

// ...elsewhere...

// This would not require any long-term memory allocation
// if Action were a value type, since CompilerGenerated
// is also a value type.
Action foo = new CompilerGenerated(obj).CallFoo;

Does this question make sense? As I see it, there are two possible explanations:

  • Implementing delegates properly as value types would have required additional work/complexity, since support for things like closures that do modify values of local variables would have required compiler-generated reference types anyway.
  • There are some other reasons why, under the hood, delegates simply can't be implemented as value types.

In the end, I'm not losing any sleep over this; it's just something I've been curious about for a little while.


Update: In response to Ani's comment, I see why the CompilerGenerated type in my above example might as well be a reference type, since if a delegate is going to comprise a function pointer and an object pointer it'll need a reference type anyway (at least for anonymous functions using closures, since even if you introduced an additional generic type parameter—e.g., Action<TCaller>—this wouldn't cover types that can't be named!). However, all this does is kind of make me regret bringing the question of compiler-generated types for closures into the discussion at all! My main question is about delegates, i.e., the thing with the function pointer and the object pointer. It still seems to me that could be a value type.

In other words, even if this...

Action foo = () => { obj.Foo(); };

...requires the creation of one reference type object (to support the closure, and give the delegate something to reference), why does it require the creation of two (the closure-supporting object plus the Action delegate)?

*Yes, yes, implementation detail, I know! All I really mean is short-term memory storage.

The question boils down to this: the CLI (Common Language Infrastructure) specification says that delegates are reference types. Why is this so?

One reason is clearly visible in the .NET Framework today. In the original design, there were two kinds of delegates: normal delegates and "multicast" delegates, which could have more than one target in their invocation list. The MulticastDelegate class inherits from Delegate. Since you can't inherit from a value type, Delegate had to be a reference type.

In the end, all actual delegates ended up being multicast delegates, but at that stage in the process, it was too late to merge the two classes. See this blog post about this exact topic:

We abandoned the distinction between Delegate and MulticastDelegate towards the end of V1. At that time, it would have been a massive change to merge the two classes so we didn’t do so. You should pretend that they are merged and that only MulticastDelegate exists.

In addition, delegates currently have 4-6 fields, all pointers. 16 bytes is usually considered the upper bound where saving memory still wins out over extra copying. A 64-bit MulticastDelegate takes up 48 bytes. This, together with the fact that they were using inheritance, suggests that a class was the natural choice.

TypeDelegator equality inconsistency?

15 votes

Consider the following code:

    class MyType : TypeDelegator
    {
       public MyType(Type parent)
          : base(parent)
       {
       }
    }

    class Program
    {
       static void Main(string[] args)
       {
          Type t1 = typeof(string);
          Type t2 = new MyType(typeof(string));

          Console.WriteLine(EqualityComparer<Type>.Default.Equals(t1, t2)); // <-- false
          Console.WriteLine(EqualityComparer<Type>.Default.Equals(t2, t1)); // <-- true

          Console.WriteLine(t1.Equals(t2)); // <-- true
          Console.WriteLine(t2.Equals(t1)); // <-- true

          Console.WriteLine(Object.Equals(t1, t2)); // <-- false
          Console.WriteLine(Object.Equals(t2, t1)); // <-- true
       }
   }

How come the various versions of Equals return different results? The EqualityComparer.Default probably calls Object.Equals, so these results match, although they are inconsistent in themselves. And both calls to the normal instance version of Equals return true.

This obviously creates problems when a method returns a Type that actually inherits from TypeDelegator. Imagine, for example, placing these types as keys in a dictionary, which by default uses EqualityComparer.Default for comparisons.

Is there any way to resolve this problem? I would like all the methods in the code above to return true.

The following code returns a System.RuntimeType

Type t1 = typeof(string);

If you look at the code for Type there is:

public override bool Equals(Object o)
{
    if (o == null) 
        return false;

    return Equals(o as Type); 
}

BUT, System.RuntimeType has:

public override bool Equals(object obj) 
{
    // ComObjects are identified by the instance of the Type object and not the TypeHandle.
    return obj == (object)this;
} 

And if you view the assembly it executes a: cmp rdx, rcx, so just a direct memory compare.

You can reproduce it using the following:

bool a = t1.Equals((object)t2); // False
bool b = t1.Equals(t2); // True

So it looks like RuntimeType is overriding the Type Equals method to do a direct comparison... It would appear there is no easy way around the issue (without supplying a comparer).
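For the "supplying a comparer" route, one possible sketch is below. UnderlyingTypeComparer and WrappedType are hypothetical names invented for this example; the idea is simply to compare the UnderlyingSystemType of both sides, assuming the delegator forwards that property to the wrapped type (which TypeDelegator does by default).

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical TypeDelegator subclass, standing in for MyType in the question.
public sealed class WrappedType : TypeDelegator
{
    public WrappedType(Type inner) : base(inner) { }
}

// Hypothetical comparer: two Type objects are equal when they resolve
// to the same underlying runtime type, in either argument order.
public sealed class UnderlyingTypeComparer : IEqualityComparer<Type>
{
    public bool Equals(Type x, Type y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (ReferenceEquals(x, null) || ReferenceEquals(y, null)) return false;
        return ReferenceEquals(x.UnderlyingSystemType, y.UnderlyingSystemType);
    }

    public int GetHashCode(Type type)
    {
        // Hash the underlying type so a wrapper and the wrapped type collide.
        return type.UnderlyingSystemType.GetHashCode();
    }
}
```

Passing an instance of this comparer to the Dictionary<Type, TValue> constructor should make a wrapper and the type it wraps behave as the same key.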

EDITED TO ADD: Out of curiosity, I had a look at the .NET 1.0 & 1.1 implementation of RuntimeType. They don't have the override of Equals in RuntimeType, so the issue was introduced in .NET 2.0.

Why does the CLR re-use empty strings, but not empty arrays?

13 votes

I notice that

Console.WriteLine((object) new string(' ', 0) == (object) new string(' ', 0));

prints true, which indicates that the CLR keeps the empty string around and re-uses the same instance. (It prints false for any other number than 0.)

However, the same is not true for arrays:

Console.WriteLine(new int[0] == new int[0]);   // False

Now, if we look at the implementation of Enumerable.Empty<T>(), we find that it caches and re-uses empty arrays:

public static IEnumerable<TResult> Empty<TResult>()
{
    return EmptyEnumerable<TResult>.Instance;
}

[...]

public static IEnumerable<TElement> Instance
{
    get
    {
        if (EmptyEnumerable<TElement>.instance == null)
            EmptyEnumerable<TElement>.instance = new TElement[0];
        return EmptyEnumerable<TElement>.instance;
    }
}

So the framework team felt that keeping an empty array around for every type is worth it. The CLR could, if it wanted to, go a small step further and do this natively so it applies not only to calls to Enumerable.Empty<T>() but also new T[0]. If the optimisation in Enumerable.Empty<T>() is worth it, surely this would be even more worth it?

Why does the CLR not do this? Is there something I’m missing?

Strings may use interning, which makes them a different story (from all other kinds of objects).

Arrays are essentially just objects. Re-using instances where that is not clear from the syntax or context isn't without side effects or risks.

static int[] empty = new int[0];
...
   lock (empty) { ... }

If some other code locked on another (they thought) empty int[] you might have a deadlock that is very hard to find.

Other scenarios include using arrays as the key in a Dictionary, or anywhere else their identity matters. The framework can't just go around changing the rules.
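If sharing an empty array is wanted, it can be done explicitly where the sharing is visible in the code. The sketch below uses a hypothetical EmptyArray<T> holder, similar in spirit to the EmptyEnumerable<TElement> cache quoted in the question; later framework versions also added Array.Empty<T>() for the same purpose.

```csharp
using System;

// Hypothetical opt-in cache: one shared zero-length array per element type.
public static class EmptyArray<T>
{
    public static readonly T[] Value = new T[0];
}

public static class EmptyArrayDemo
{
    // The cached instance is shared, so identity comparison succeeds.
    public static bool CachedIsShared()
    {
        return ReferenceEquals(EmptyArray<int>.Value, EmptyArray<int>.Value);
    }

    // Each "new int[0]" still produces a distinct object.
    public static bool FreshIsShared()
    {
        return ReferenceEquals(new int[0], new int[0]);
    }
}
```

Because callers ask for the shared instance by name, nobody is surprised to find that it is shared, which avoids the locking hazard described above.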

Combining arrays of strings together

12 votes

I'm looking to combine the contents of two string arrays, into a new list that has the contents of both joined together.

string[] days = { "Mon", "Tue", "Wed" };
string[] months = { "Jan", "Feb", "Mar" };

// I want the output to be a list with the contents
// "Mon Jan", "Mon Feb", "Mon Mar", "Tue Jan", "Tue Feb" etc...

How can I do it? When there are only two arrays, the following works and is easy enough:

List<string> CombineWords(string[] wordsOne, string[] wordsTwo)
{
    var combinedWords = new List<string>();
    foreach (var wordOne in wordsOne)
    {
        foreach (string wordTwo in wordsTwo)
        {
            combinedWords.Add(wordOne + " " + wordTwo);
        }
    }
    return combinedWords;
}

But I'd like to be able to pass varying numbers of arrays in (i.e. to have a method with the signature below) and have it still work.

List<string> CombineWords(params string[][] arraysOfWords)
{
    // what needs to go here ?
}

Or some other solution would be great. If it's possible to do this simply with Linq, even better!

Code below works for any number of arrays (and uses linq to some degree):

List<string> CombineWords(params string[][] wordsToCombine)
{
    if (wordsToCombine.Length == 0)
        return new List<string>();

    IEnumerable<string> combinedWords = wordsToCombine[0].ToList();
    for (int i = 1; i < wordsToCombine.Length; ++i)
    {
        var temp = i;
        combinedWords = (from x in combinedWords from y in wordsToCombine[temp]
                         select x + " " + y);
    }
    return combinedWords.ToList();
}
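The same loop can also be written as a single LINQ fold. This is only a sketch of an alternative formulation, with the hypothetical class name WordCombiner; it uses Aggregate to carry the running combinations and SelectMany to pair each one with every word of the next array.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordCombiner
{
    public static List<string> CombineWords(params string[][] arraysOfWords)
    {
        if (arraysOfWords.Length == 0)
            return new List<string>();

        // Seed with the first array, then fold in each remaining array.
        return arraysOfWords
            .Skip(1)
            .Aggregate(
                arraysOfWords[0].AsEnumerable(),
                (combined, next) => combined.SelectMany(
                    x => next,
                    (x, y) => x + " " + y))
            .ToList();
    }
}
```

Since next is a lambda parameter rather than a loop variable, this version does not need the temp copy that the loop uses to avoid capturing i.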

The "Enum as immutable rich-object": is this an anti-pattern?

12 votes

I've often seen and used enums with attached attributes to do some basic things such as providing a display name or description:

public enum Movement {
    [DisplayName("Turned Right")]
    TurnedRight,
    [DisplayName("Turned Left")]
    [Description("Execute 90 degree turn to the left")]
    TurnedLeft,
    // ...
}

And have had a set of extension methods to support the attributes:

public static string GetDisplayName(this Movement movement) { ... }
public static Movement GetNextTurn(this Movement movement, ...)  { ... }

Following this pattern, additional existing or custom attributes could be applied to the fields to do other things. It is almost as if the enum can work as the simple enumerated value type it is and as a more rich immutable value object with a number of fields:

public class Movement
{
    public int Value { get; set; } // i.e. the type backing the enum
    public string DisplayName { get; set; }
    public string Description { get; set; }
    public Movement GetNextTurn(...) { ... }
    // ...
}

In this way, it can "travel" as a simple field during serialization, be quickly compared, etc. yet behavior can be "internalized" (ala OOP).

That said, I recognize this may be considered an anti-pattern. At the same time part of me considers this useful enough that the anti might be too strict.

I would consider this to be a poor pattern in C# simply because the language support for declaring and accessing attributes is so crippled; they aren't meant to be stores of very much data. It's a pain to declare attributes with non-trivial values, and it's a pain to get the value of an attribute. As soon as you want something remotely interesting associated with your enum (like a method that computes something on the enum, or an attribute that contains a non-primitive data type) you either need to refactor it to a class or put the other thing in some out-of-band place.

It's not really any more difficult to make an immutable class with some static instances holding the same information, and in my opinion, it's more idiomatic.
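A minimal sketch of that alternative, reusing the Movement example from the question, might look like this. The numeric values, the empty description for TurnedRight, and the Opposite() method are assumptions added for illustration; the question elides what GetNextTurn(...) actually takes.

```csharp
using System;

// Immutable class with a fixed set of static instances instead of an enum.
public sealed class Movement
{
    public static readonly Movement TurnedRight =
        new Movement(0, "Turned Right", "");
    public static readonly Movement TurnedLeft =
        new Movement(1, "Turned Left", "Execute 90 degree turn to the left");

    // Private constructor: no instances beyond the ones declared above.
    private Movement(int value, string displayName, string description)
    {
        Value = value;
        DisplayName = displayName;
        Description = description;
    }

    public int Value { get; }          // plays the role of the enum's backing value
    public string DisplayName { get; }
    public string Description { get; }

    // Hypothetical behaviour living on the type itself rather than in an extension method.
    public Movement Opposite()
    {
        return ReferenceEquals(this, TurnedRight) ? TurnedLeft : TurnedRight;
    }
}
```

Since only the static instances exist, reference equality is enough to compare two movements, and Value can still be used when a compact representation is needed for serialization.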

How do I protect OAuth keys from a user decompiling my project?

11 votes

I am writing my first application to use OAuth. This is for a desktop application, not a website or a mobile device where it would be more difficult to access the binary, so I am concerned about how to protect my application key and secret. I feel it would be trivial to look at the compiled file and find the string that stores the key.

Am I overreacting or is this a genuine problem (with a known solution) for desktop apps?

This project is being coded in Java but I am also a C# developer so any solutions for .NET would be appreciated too.

EDIT: I know there is no perfect solution, I am just looking for mitigating solutions.

EDIT2: I know that pretty much the only solution is to use some form of obfuscation. Are there any free providers for .NET and Java that will do string obfuscation?

There is no good or even half good way to protect keys embedded in a binary that untrusted users can access.

There are reasons to at least put a minimum amount of effort to protect yourself.

The minimum amount of effort won't be effective. Even the maximum amount of effort won't be effective against a skilled reverse engineer / hacker with just a few hours of spare time.

If you don't want your OAuth keys to be hacked, don't put them in code that you distribute to untrusted users. Period.

Am I over reacting or is this a genuine problem (with a known solution) for desktop apps?

It is a genuine problem with no known (effective) solution. Not in Java, not in C#, not in Perl, not in C, not in anything. Think of it as if it was a Law of Physics.


Your alternatives are:

  • Force your users to use a trusted platform that will only execute crypto signed code. (Hint: this is most likely not practical for your application because current generation PC's don't work this way. And even TPS can be hacked given the right equipment.)

  • Turn your application into a service and run it on a machine / machines that you control access to. (Hint: it sounds like OAuth 2.0 might remove this requirement.)

  • Use some authentication mechanism that doesn't require permanent secret keys to be distributed.

  • Get your users to sign a legally binding contract to not reverse engineer your code, and sue them if they violate the contract. Figuring out which of your users has hacked your keys is left to your imagination ... (Hint: this won't stop hacking, but may allow you to recover damages, if the hacker has assets.)


By the way, argument by analogy is a clever rhetorical trick, but it is not logically sound. The observation that physical locks on front doors stop people stealing your stuff (to some degree) says nothing whatsoever about the technical feasibility of safely embedding private information in executables.

And ignoring the fact that argument by analogy is unsound, this particular analogy breaks down for the following reason. Physical locks are not impenetrable. The lock on your front door "works" because someone has to stand in front of your house visible from the road fiddling with your lock for a minute or so ... or banging it with a big hammer. Someone doing that is taking the risk that he / she will be observed, and the police will be called. Bank vaults "work" because the time required to penetrate them is a number of hours, and there are other alarms, security guards, etc. And so on. By contrast, a hacker can spend minutes, hours, even days trying to break your technical protection measures with effectively zero risk of being observed / detected doing it.

Microsoft Roslyn vs. CodeDom

10 votes

From a press release yesterday on InfoWorld regarding the new Microsoft Roslyn:

The most obvious advantage of this kind of "deconstructed" compiler is that it allows the entire compile-execute process to be invoked from within .Net applications. Hejlsberg demonstrated a C# program that passed a few code snippets to the C# compiler as strings; the compiler returned the resulting IL assembly code as an object, which was then passed to the Common Language Runtime (CLR) for execution. Voilà! With Roslyn, C# gains a dynamic language's ability to generate and invoke code at runtime.

I've been able to do this since the release of .NET 4 with CSharpCodeProvider.CompileAssemblyFromSource, which I in fact use in an ASP.Net project written a while ago that does exactly that: it allows a user to type code into a textbox, choose assemblies/namespaces to reference, and then execute and display the output from that code on-the-fly for live environment code testing on Windows Azure.

Is CodeDom part of / a precursor to Roslyn? What's the special benefit of Roslyn over CodeDom?

Disclaimer: I work for Microsoft on the Roslyn team.

CodeDom is a precursor to Roslyn, but is only marginally related. Essentially, CodeDom is a simple and (somewhat) language-agnostic way to generate code that was added in .NET 1.0 to support designers (a la WinForms). Because CodeDom was an attempt at providing a unified model that can generate code in C#, VB, and other languages, it lacks high fidelity with any of the languages that it supports (that's why you can't create a switch statement with CodeDom). CSharpCodeProvider.CompileAssemblyFromSource is simply a wrapper around executing csc.exe.

Roslyn is a completely different animal. It is a rewrite of both the C# and VB compilers from the ground up using managed code -- C# in C# and VB in VB (the versions of csc.exe and vbc.exe that ship today are written in native code). The advantage of building them in managed code is that users can reference the real compilers as libraries from .NET applications (no wrappers needed).

While building each component of the compiler pipeline, we've exposed public APIs on top:

  • Parser -> Syntax Tree API
  • Symbol Table/Metadata Import -> Symbol API
  • Binder -> Binding and Flow Analysis APIs
  • IL Emitter -> Emit API

Roslyn can be used as a sophisticated C# and VB source code generator, but that's where the similarity to CodeDom ends. The Roslyn Compiler APIs can be used to parse code, perform semantic analysis, compile and evaluate code dynamically, etc.

In addition to the compilers, the Roslyn team is also rebuilding the Visual Studio C# and VB IDE features on top of the public compiler APIs. So, the compiler APIs are rich enough to build the Visual Studio design-time tools, like IntelliSense and the Extract Method refactoring. Also, at layers above the compiler, Roslyn offers services for higher-level analysis or data transformation. For example, there are services for formatting code using the C# and VB formatting rules, or finding all references to a particular symbol within a solution.

Really, there isn't just one special benefit of Roslyn over CodeDom. Where CodeDom filled a very specific code generation need, Roslyn is tackling the entire language tooling space by providing a framework to allow you to build just about any sort of C# or VB language tool you can think of.

How to convert an expression tree to a partial SQL query?

9 votes

When EF or LINQ to SQL runs a query, it:

  1. Builds an expression tree from the code,
  2. Converts the expression tree into an SQL query,
  3. Executes the query, gets the raw results from the database and converts them to the result to be used by the application.

Looking at the stack trace, I can't figure out where the second part happens.

In general, is it possible to use an existing part of EF or (preferably) LINQ to SQL to convert an Expression object to a partial SQL query (using Transact-SQL syntax), or do I have to reinvent the wheel?


Update: a comment asks to provide an example of what I'm trying to do.

Actually, the answer by Ryan Wright below illustrates perfectly what I want to achieve, except that my question is specifically about how to do it using the existing mechanisms of the .NET Framework actually used by EF and LINQ to SQL, instead of having to reinvent the wheel and write thousands of lines of not-so-tested code myself to do a similar thing.

Here is also an example. Again, note that there is no ORM-generated code.

private class Product
{
    [DatabaseMapping("ProductId")]
    public int Id { get; set; }

    [DatabaseMapping("Price")]
    public int PriceInCents { get; set; }
}

private string Convert(Expression expression)
{
    // Some magic calls to .NET Framework code happen here.
    // [...]
}

private void TestConvert()
{
    Expression<Func<Product, int, int, bool>> inPriceRange =
        (Product product, int from, int to) =>
            product.PriceInCents >= from && product.PriceInCents <= to;

    string actualQueryPart = this.Convert(inPriceRange);

    Assert.AreEqual("[Price] between @from and @to", actualQueryPart);
}

Where does the name Price come from in the expected query?

The name can be obtained through reflection by querying the custom DatabaseMapping attribute of the PriceInCents property of the Product class.

Where do names @from and @to come from in the expected query?

Those names are the actual names of the parameters of the expression.

Where does between … and come from in the expected query?

This is a possible result of a binary expression. Maybe EF or LINQ to SQL would stick with [Price] >= @from and [Price] <= @to instead of a between … and statement. That's OK too; it doesn't really matter, since the result is logically the same (I'm not mentioning performance).

Why is there no where in the expected query?

Because nothing indicates in the Expression that there must be a where keyword. Maybe the actual expression is just one of the expressions which would be combined later with binary operators to build a larger query to prepend with a where.

The short answer seems to be that you cannot use a part of EF or LINQ to SQL as a shortcut to translation. You need at least a subclass of ObjectContext to get at the internal protected QueryProvider property, and that means all the overhead of creating the context, including all the metadata and so on.

Assuming you are OK with that, to get a partial SQL query (for example, just the WHERE clause) you're basically going to need the query provider, and you'll have to call IQueryProvider.CreateQuery() just as LINQ does in its implementation of Queryable.Where. To get a more complete query you can use ObjectQuery.ToTraceString().

As to where this happens, LINQ provider basics states generally that

IQueryProvider returns a reference to IQueryable with the constructed expression-tree passed by the LINQ framework, which is used for further calls. In general terms, each query block is converted to a bunch of method calls. For each method call, there are some expressions involved. While creating our provider - in the method IQueryProvider.CreateQuery - we run through the expressions and fill up a filter object, which is used in the IQueryProvider.Execute method to run a query against the data store

and that

the query can be executed in two ways, either by implementing the GetEnumerator method (defined in the IEnumerable interface) in the Query class, (which inherits from IQueryable); or it can be executed by the LINQ runtime directly

Checking EF under the debugger, it's the former.

If you don't want to completely re-invent the wheel and neither EF nor LINQ to SQL are options, perhaps this series of articles would help:

Here are some sources for creating a query provider that probably involve much more heavy lifting on your part to implement what you want:
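To give a sense of what the hand-rolled route involves, here is a minimal sketch of an ExpressionVisitor that turns the question's inPriceRange expression into a SQL fragment. It is not how EF or LINQ to SQL do it internally; SqlFragmentVisitor is a hypothetical name, and DatabaseMappingAttribute is a stand-in for the custom attribute in the question, whose definition is not shown. It only handles comparisons, and/or, mapped properties and lambda parameters, and it parenthesizes every binary node rather than emitting between.

```csharp
using System;
using System.Linq.Expressions;
using System.Text;

// Stand-in for the question's custom attribute (assumed shape).
[AttributeUsage(AttributeTargets.Property)]
public sealed class DatabaseMappingAttribute : Attribute
{
    public DatabaseMappingAttribute(string column) { Column = column; }
    public string Column { get; private set; }
}

public class Product
{
    [DatabaseMapping("ProductId")]
    public int Id { get; set; }

    [DatabaseMapping("Price")]
    public int PriceInCents { get; set; }
}

// Hypothetical translator: walks the lambda body and appends SQL text.
public sealed class SqlFragmentVisitor : ExpressionVisitor
{
    private readonly StringBuilder sql = new StringBuilder();

    public static string Translate(LambdaExpression lambda)
    {
        var visitor = new SqlFragmentVisitor();
        visitor.Visit(lambda.Body);
        return visitor.sql.ToString();
    }

    protected override Expression VisitBinary(BinaryExpression node)
    {
        sql.Append("(");
        Visit(node.Left);
        sql.Append(" ").Append(OperatorFor(node.NodeType)).Append(" ");
        Visit(node.Right);
        sql.Append(")");
        return node;
    }

    // A mapped property becomes a bracketed column name.
    protected override Expression VisitMember(MemberExpression node)
    {
        var mapping = (DatabaseMappingAttribute)Attribute.GetCustomAttribute(
            node.Member, typeof(DatabaseMappingAttribute));
        if (mapping == null)
            throw new NotSupportedException("Unmapped member: " + node.Member.Name);
        sql.Append("[").Append(mapping.Column).Append("]");
        return node;
    }

    // A lambda parameter becomes a SQL parameter of the same name.
    protected override Expression VisitParameter(ParameterExpression node)
    {
        sql.Append("@").Append(node.Name);
        return node;
    }

    // Constants are deliberately rejected in this sketch.
    protected override Expression VisitConstant(ConstantExpression node)
    {
        throw new NotSupportedException("Constants are not handled in this sketch.");
    }

    private static string OperatorFor(ExpressionType type)
    {
        switch (type)
        {
            case ExpressionType.AndAlso: return "and";
            case ExpressionType.OrElse: return "or";
            case ExpressionType.Equal: return "=";
            case ExpressionType.NotEqual: return "<>";
            case ExpressionType.GreaterThan: return ">";
            case ExpressionType.GreaterThanOrEqual: return ">=";
            case ExpressionType.LessThan: return "<";
            case ExpressionType.LessThanOrEqual: return "<=";
            default: throw new NotSupportedException(type.ToString());
        }
    }
}
```

Passing the question's inPriceRange expression to SqlFragmentVisitor.Translate yields (([Price] >= @from) and ([Price] <= @to)). Method calls, null handling, type conversions and value escaping are all missing, which is where the "thousands of lines" come from.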

Are C# weak references in fact soft?

9 votes

The basic difference is that weak references are supposed to be claimed on each run of the GC (keep memory footprint low) while soft references ought to be kept in memory until the GC actually requires memory (they try to expand lifetime but may fail anytime, which is useful for e.g. caches especially of rather expensive objects).

To my knowledge, there is no clear statement as to how weak references influence the lifetime of an object in .NET. If they are true weak refs they should not influence it at all, but that would also render them pretty useless for their, I believe, main purpose of caching (am I wrong there?). On the other hand, if they act like soft refs, their name is a little misleading.

Personally, I imagine them to behave like soft references, but that is just an impression and not founded.

Implementation details apply, of course. I'm asking about the mentality associated with .NET's weak references - are they able to expand lifetime, or do they behave like true weak refs?

(Despite a number of related questions I could not find an answer to this specific issue yet.)

I have seen no information that indicates that they would increase the lifetime of the object they point to. And the articles I read about the algorithm the GC uses to determine reachability do not mention them in this way either. So I expect them to have no influence on the lifetime of the object.

Weak
This handle type is used to track an object, but allow it to be collected. When an object is collected, the contents of the GCHandle are zeroed. Weak references are zeroed before the finalizer runs, so even if the finalizer resurrects the object, the Weak reference is still zeroed.

WeakTrackResurrection
This handle type is similar to Weak, but the handle is not zeroed if the object is resurrected during finalization.

http://msdn.microsoft.com/en-us/library/83y4ak54.aspx
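A small experiment along these lines, as a sketch rather than a proof: WeakReferenceDemo and its method names are made up for illustration. The object with no strong reference is allocated in a separate, non-inlined method so that no local in the caller keeps it alive. What the second method returns depends on the runtime and build settings, so no result is claimed here.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class WeakReferenceDemo
{
    // Allocate in a non-inlined helper so the caller never holds a strong reference.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static WeakReference CreateWeak()
    {
        return new WeakReference(new object());
    }

    // The target is still strongly referenced, so the weak reference stays alive.
    public static bool SurvivesWithStrongReference()
    {
        object strong = new object();
        var weak = new WeakReference(strong);
        GC.Collect();
        GC.WaitForPendingFinalizers();
        bool alive = weak.IsAlive;
        GC.KeepAlive(strong); // keeps the strong reference live up to this point
        return alive;
    }

    // Only the weak reference points at the target when the GC runs.
    public static bool SurvivesWithoutStrongReference()
    {
        WeakReference weak = CreateWeak();
        GC.Collect();
        GC.WaitForPendingFinalizers();
        return weak.IsAlive;
    }
}
```

If weak references behaved like soft references, SurvivesWithoutStrongReference would be expected to return true while memory is plentiful; the quoted documentation suggests it should not.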


There are a few mechanisms by which an object that's unreachable can survive a garbage collection.

  • The generation of the object is larger than the generation of the GC that happened. This is particularly interesting for large objects, which are allocated on the large-object-heap and are always considered Gen2 for this purpose.
  • Objects with a finalizer and all objects reachable from them survive the GC.
  • There might be a mechanism where former references from old objects can keep young objects alive, but I'm not sure about that.
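The first point can be observed with GC.GetGeneration. This is a minimal sketch (GenerationDemo is a made-up name): an array large enough to land on the large object heap reports the highest generation immediately, while a small, freshly allocated object normally reports generation 0.

```csharp
using System;

public static class GenerationDemo
{
    // Generation reported for a freshly allocated byte array of the given size.
    public static int GenerationOfNewArray(int sizeInBytes)
    {
        byte[] data = new byte[sizeInBytes];
        return GC.GetGeneration(data);
    }
}
```

GenerationOfNewArray(100000) reports GC.MaxGeneration, so a gen 0 or gen 1 collection will not reclaim that array even when nothing references it.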

Large Object Heap and String Objects coming from a queue

9 votes

I have a Windows console app that is supposed to run without restarts for days and months. The app retrieves "work" from an MSMQ and processes it. There are 30 threads that process work chunks simultaneously.

Each work chunk coming from the MSMQ is approximately 200 KB, most of which is allocated in a single String object.

I have noticed that after processing about 3-4 thousand of these work chunks, the memory consumption of the application is ridiculously high: 1 - 1.5 GB of memory.

I ran the app through a profiler and noticed that most of this memory (maybe a gig or so) is unused space in the large object heap, but the heap is fragmented.

I have found that 90% of these unused (garbage collected) bytes were previously allocated Strings. I then started suspecting that the strings coming in from the MSMQ were allocated, used and then deallocated, and are therefore the cause of the fragmentation.

I understand that things like GC.Collect(2 or GC.Max...) won't help, since they collect the large object heap but don't compact it (which is the problem here). So I think that what I need is to cache these Strings and re-use them somehow, but since Strings are immutable I would have to use StringBuilders.

My question is: is there any way to keep the underlying structure (i.e. using the MSMQ, as this is something I can't change) and still avoid initializing a new String every time, to avoid fragmenting the LOH?

Thanks, Yannis

UPDATE: About how these "work" chunks are currently retrieved

Currently these are stored as WorkChunk objects in the MSMQ. Each of these objects contains a String called Contents and another String called Headers. These are actual textual data. I can change the storage structure if needed, and potentially replace the underlying storage mechanism with something other than an MSMQ.

On the worker nodes side currently we do

WorkChunk chunk = _Queue.Receive();

So there is little we can cache at this stage. If we changed the structure(s) somehow then I suppose we could make a bit of progress. In any case, we will have to sort out this problem, so we will do whatever is needed to avoid throwing out months of work.

UPDATE: I went on to try some of the suggestions below and noticed that this issue cannot be reproduced on my local machine (running Windows 7 x64 and a 64-bit app). This makes things so much more difficult. If anyone knows why, it would really help with reproducing this issue locally.

Your problem appears to be due to memory allocation on the large object heap - the large object heap is not compacted and so can be a source of fragmentation. There is a good article here that goes into more detail including some debugging steps that you can follow to confirm that fragmentation of the large object heap is happening:

Large Object Heap Uncovered

You appear to have three options:

  1. Alter your application to perform processing on chunks / shorter strings, where each chunk is smaller than 85,000 bytes - this avoids the allocation of large objects.
  2. Alter your application to allocate a few large chunks of memory up-front and re-use those chunks by copying new messages into the allocated memory instead. See Heap fragmentation when using byte arrays.
  3. Leave things as they are - As long as you don't experience out of memory exceptions and the application isn't interfering with other applications running on the system you should probably leave things as they are.

It's important here to understand the distinction between virtual memory and physical memory: even though the process is using a large amount of virtual memory, if the number of objects allocated is relatively low then it can be that the physical memory use of that process is low (the unused memory is paged to disk), meaning little impact on other processes on the system. You may also find that the "VM Hoarding" option helps; read the "Large Object Heap Uncovered" article for more information.

Either change involves changing your application to perform either some or all of its processing using byte arrays and short substrings instead of a single large string - how difficult this is going to be for you will depend on what sort of processing it is that you are doing.
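As a rough illustration of option 2, the sketch below keeps a fixed set of large byte arrays and hands them out for reuse, so the large object heap sees a handful of long-lived allocations instead of thousands of short-lived ones. LargeBufferPool, Rent, Return and Fill are hypothetical names, and how the message body is obtained as a Stream from the queue is left out as an assumption about your setup.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;

public sealed class LargeBufferPool
{
    private readonly ConcurrentBag<byte[]> buffers = new ConcurrentBag<byte[]>();
    private readonly int bufferSize;

    // Allocate the large buffers once, up front.
    public LargeBufferPool(int bufferSize, int initialCount)
    {
        this.bufferSize = bufferSize;
        for (int i = 0; i < initialCount; i++)
            buffers.Add(new byte[bufferSize]);
    }

    // Hand out a pooled buffer, allocating a new one only when the pool is empty.
    public byte[] Rent()
    {
        byte[] buffer;
        return buffers.TryTake(out buffer) ? buffer : new byte[bufferSize];
    }

    // Put a buffer back; buffers of a different size are simply dropped.
    public void Return(byte[] buffer)
    {
        if (buffer != null && buffer.Length == bufferSize)
            buffers.Add(buffer);
    }

    // Copy a message body into a rented buffer; returns the number of bytes copied.
    public static int Fill(Stream source, byte[] buffer)
    {
        int total = 0;
        int read;
        while (total < buffer.Length &&
               (read = source.Read(buffer, total, buffer.Length - total)) > 0)
        {
            total += read;
        }
        return total;
    }
}
```

Each of the 30 worker threads would rent a buffer, fill it from the message body, process the bytes in place (decoding only short substrings as needed), and return the buffer in a finally block. The buffer size has to cover the largest expected message; that figure is something to measure rather than assume.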

Is it possible in .Net to catch all unhandled exceptions from any method in a class before its passed up the call stack?

8 votes

Problem:

I would like to catch any exceptions from any method in a class so that I may record class-specific data to the exception for logging before it is passed up the stack. I know that I can put a try-catch in every method of the class, but there are many methods and it seems there should be a more efficient way.

Example of what I am currently doing:

public class ClassA
{
    private int x;
    private int y;

    public void Method1()
    {
        try
        {
            //Some code
        }
        catch(Exception ex)
        {
            ex.Data.Add("x", x);
            ex.Data.Add("y", y);
            throw;
        }
    }

    public void Method2()
    {
        try
        {
            //Some code
        }
        catch (Exception ex)
        {
            ex.Data.Add("x", x);
            ex.Data.Add("y", y);
            throw;
        }
    }
}

Example of what I would like to do:

public class ClassB : IUnhandledErrorHandler
{
    private int x;
    private int y;

    public void Method1()
    {
        //Some code
    }

    public void Method2()
    {
        //Some code
    }

    void IUnhandledErrorHandler.OnError(Exception ex)
    {
        ex.Data.Add("x", x);
        ex.Data.Add("y", y);
        throw;
    }
}

public interface IUnhandledErrorHandler
{
    void OnError(Exception ex);
}

Note: This class is a service in a WCF project and implements a ServiceContract. I have tried adding an ErrorHandler to the service's ChannelDispatcher. However, when the error reaches the ErrorHandler it is already beyond the scope of the class where the error occurred, so I cannot access the class details.

Solution:

public class ClassC
{
    public ClassC()
    {
        AppDomain.CurrentDomain.FirstChanceException += OnError;
    }

    private int x;
    private int y;

    public void Method1()
    {
        //Some code
    }

    public void Method2()
    {
        //Some code
    }

    private void OnError(object sender, System.Runtime.ExceptionServices.FirstChanceExceptionEventArgs e)
    {
        e.Exception.Data["x"] = x;
        e.Exception.Data["y"] = y;
    }
}

If you run on .NET 4, you might use the FirstChanceException event from the AppDomain.
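Note that FirstChanceException fires for every exception thrown anywhere in the AppDomain, not just those raised inside this class, and the handler subscribed in the constructor keeps the instance alive until it is unsubscribed. A different technique that stays scoped to the class is to funnel method bodies through one private helper. This is only a sketch; ClassD and Guard are made-up names, and it still requires touching each method once.

```csharp
using System;

public class ClassD
{
    private int x = 1;
    private int y = 2;

    public void Method1()
    {
        // The real method body goes inside the lambda.
        Guard(() => { throw new InvalidOperationException("boom"); });
    }

    public void Method2()
    {
        Guard(() => { /* Some code */ });
    }

    // Single place where class state is attached to any escaping exception.
    private void Guard(Action body)
    {
        try
        {
            body();
        }
        catch (Exception ex)
        {
            ex.Data["x"] = x;
            ex.Data["y"] = y;
            throw; // rethrow without resetting the stack trace
        }
    }
}
```

The caller still sees the original exception type and stack trace, with x and y available in ex.Data for logging.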

Why is .Net best practice to design custom attributes as sealed?

8 votes

I'm reading Pro C# 2010 and the .Net 4 Platform by Andrew Troelsen.

In Chapter 15, about attributes, there is a note:

Note: For security reasons, it is considered a .Net best practice to design all custom attributes as sealed.

The author doesn't explain why. Can someone explain?

CA1813: Avoid unsealed attributes: The .NET Framework class library provides methods for retrieving custom attributes. By default, these methods search the attribute inheritance hierarchy; for example Attribute.GetCustomAttribute searches for the specified attribute type, or any attribute type that extends the specified attribute type. Sealing the attribute eliminates the search through the inheritance hierarchy, and can improve performance.

Ref: http://msdn.microsoft.com/en-us/library/ms182267(v=VS.100).aspx

Attributes are simply metadata discovered at runtime. As quoted above, if someone else derives from your custom attribute class, by default .NET will find the derived attributes too, which may imply a security risk if the derived attribute class modifies the behavior of your original attribute in a way you never intended.

Even though performance is the prime reason to seal attribute classes, here is a formidable article dealing with its security side: http://alabaxblog.info/?p=44
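The lookup behaviour described in the quote can be seen in a few lines. In this sketch all type names (AuditAttribute, QuietAuditAttribute, Order, AttributeLookupDemo) are invented for illustration: code that asks for the base attribute receives the derived one, along with its overridden behaviour.

```csharp
using System;

// Unsealed attribute: anyone can derive from it.
[AttributeUsage(AttributeTargets.Class)]
public class AuditAttribute : Attribute
{
    public virtual bool Enabled { get { return true; } }
}

// A derived attribute that changes the behaviour the original author intended.
public sealed class QuietAuditAttribute : AuditAttribute
{
    public override bool Enabled { get { return false; } }
}

[QuietAudit]
public class Order
{
}

public static class AttributeLookupDemo
{
    // Asks for AuditAttribute, but the search also matches derived types.
    public static AuditAttribute FindAudit(Type type)
    {
        return (AuditAttribute)Attribute.GetCustomAttribute(type, typeof(AuditAttribute));
    }
}
```

AttributeLookupDemo.FindAudit(typeof(Order)) returns the QuietAuditAttribute instance, so Enabled is false even though the consumer only knows about AuditAttribute. Declaring AuditAttribute as sealed makes the derived class a compile error.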

7 votes

I have seen a couple threads about this, but am not getting very straight answers in my searching. I have a web application that needs to take in doc, docx, xls, xlsx files and convert them into PDF. Right now we have a process that uses the Microsoft.Office.Interop.Word library which opens up the document, prints it to a PS file, then GPL GhostScript converts the PS file into a PDF.

This process works OK-ish, but there are several steps involved, and it was originally developed years ago, when it was even harder to find a PDF print driver and interface with it. In the spirit of updating, I am trying to find a better way to handle this. The main reason is that our application uses a web service call to perform the elevated operation of the conversion process; on newer Windows Server versions, and in particular on Windows 7 for development, opening the file, even with impersonation, is causing some issues with the Interop library.

All of this I'm sure can be figured out and ironed out, but I was wondering if there is a newer and better way to go about this. I have looked into PDF995, but am not finding a great way to programmatically go in and print a file directly to a PDF. The code they provide is in C++ and I am not finding how to mimic the calls in C#.

If you're looking for a "free" solution, I think you might have the only viable option out there, but like John said, server-side interop is typically not a good idea. We've used the .NET Aspose components with a great deal of success. This is a pure managed solution with no interop or office required.

What is the best way to read an attribute from an xml string in c#

7 votes

I have the following xml as string:

<cfdi:Comprobante version="3.0"
                  xsi:schemaLocation="http://www.sat.gob.mx/cfd/3 http://www.sat.gob.mx/sitio_internet/cfd/3/cfdv3.xsd"
                  serie="A"
                  folio="6"
                  fecha="2011-07-22T13:51:42"
                  formaDePago="Pago en una sola exhibición"
                  sello="XlSJYAxauwYbI"
                  noCertificado="00001000000101242210"
                  certificado="YtEQOHw02OGx6E="
                  condicionesDePago="Paguese a mas tardar el 21/08/2011."
                  subTotal="123"
                  Moneda="MXN"
                  total="123"
                  tipoDeComprobante="ingreso">
  <cfdi:Complemento>
    <tfd:TimbreFiscalDigital FechaTimbrado="2011-07-22T13:51:47"
                             UUID="41C8A54F-4956-1BAD-F2CB-48E8343918FD"
                             noCertificadoSAT="00001000000102616613"
                             selloCFD="wrwerewe"
                             version="1.0"
                             xsi:schemaLocation="http://www.sat.gob.mx/TimbreFiscalDigital http://www.sat.gob.mx/sitio_internet/timbrefiscaldigital/TimbreFiscalDigital.xsd"/>
  </cfdi:Complemento>
</cfdi:Comprobante>

I want to read the attribute UUID inside the node tfd:TimbreFiscalDigital so I was wondering how to do this using c#, this might be silly but please understand I'm new in c#.

Note: This xml is inside a string, not in a file (our provider's webservice returns the xml as string, is not our fault)

Note2: I can use Linq, or any other library, that's not a prob

Thanks!!

Because you have namespace prefixes, you'll have to use XNamespace instances to help you reference the elements.

// We use these to establish our namespace prefixes
XNamespace cfdi = @"http://www.sat.gob.mx/cfd/3";
XNamespace tfd = @"http://www.sat.gob.mx/TimbreFiscalDigital";

var xdoc = XDocument.Parse(xml);

// Walk down the XML tree to tfd:TimbreFiscalDigital
var elt = xdoc.Element(cfdi + "Comprobante")
              .Element(cfdi + "Complemento")
              .Element(tfd + "TimbreFiscalDigital");
// Alternately
// var elt = xdoc.Descendants(tfd + "TimbreFiscalDigital")
//               .First();

var uuid = (string)elt.Attribute("UUID");

// You can convert attributes and element values to lots of built-in types
// See the Explicit Conversions for XAttribute and XElement on MSDN
var date = (DateTime)elt.Attribute("FechaTimbrado");
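Note that XDocument.Parse needs the cfdi and tfd prefixes to be declared with xmlns attributes, which the snippet in the question omits. The following self-contained sketch includes them in a trimmed sample and returns null instead of throwing when the element or attribute is missing. TimbreReader, ReadUuid and Sample are made-up names.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class TimbreReader
{
    private static readonly XNamespace Tfd = "http://www.sat.gob.mx/TimbreFiscalDigital";

    // Trimmed sample with the namespace declarations the parser requires.
    public const string Sample =
        "<cfdi:Comprobante xmlns:cfdi='http://www.sat.gob.mx/cfd/3' " +
        "xmlns:tfd='http://www.sat.gob.mx/TimbreFiscalDigital' version='3.0'>" +
        "<cfdi:Complemento>" +
        "<tfd:TimbreFiscalDigital UUID='41C8A54F-4956-1BAD-F2CB-48E8343918FD' version='1.0'/>" +
        "</cfdi:Complemento>" +
        "</cfdi:Comprobante>";

    // Returns the UUID attribute, or null when the element or attribute is absent.
    public static string ReadUuid(string xml)
    {
        XElement timbre = XDocument.Parse(xml)
                                   .Descendants(Tfd + "TimbreFiscalDigital")
                                   .FirstOrDefault();
        return timbre == null ? null : (string)timbre.Attribute("UUID");
    }
}
```

TimbreReader.ReadUuid(TimbreReader.Sample) returns 41C8A54F-4956-1BAD-F2CB-48E8343918FD. The (string) cast on a null XAttribute yields null, so a missing attribute does not throw.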


Is there a way to convert a System.IO.Stream to a Windows.Storage.Streams.IRandomAccessStream?

7 votes

In Windows 8, I would like to pass the contents of a MemoryStream to a class that accepts a parameter of type Windows.Storage.Streams.IRandomAccessStream. Is there any way to convert this MemoryStream to an IRandomAccessStream?

Found a more elegant solution:


    public static class MicrosoftStreamExtensions
    {
        public static IRandomAccessStream AsRandomAccessStream(this Stream stream)
        {
            return new RandomStream(stream);
        }
    }
    class RandomStream : IRandomAccessStream
    {
        Stream internstream;
        public RandomStream(Stream underlyingstream)
        {
            internstream = underlyingstream;
        }
        public IInputStream GetInputStreamAt(ulong position)
        {
            //THANKS Microsoft! This is GREATLY appreciated!
            internstream.Position = (long)position;
            return internstream.AsInputStream();
        }

        public IOutputStream GetOutputStreamAt(ulong position)
        {
            internstream.Position = (long)position;
            return internstream.AsOutputStream();
        }

        public ulong Size
        {
            get
            {
                return (ulong)internstream.Length;
            }
            set
            {
                internstream.SetLength((long)value);
            }
        }
    }

reading windows font

7 votes

What is the best way to read all Windows fonts into a ComboBox? Basically, I'm trying to get a font list like the one in Microsoft Word.

I can do:

 string[] fonts = Directory.GetFiles(@"C:\windows\fonts");

and show each file in the ComboBox, but is this correct? Isn't there a component that does this work?

Thanks in advance.

Try this:

using System.Drawing;
using System.Drawing.Text;

// Enumerate the font families installed on the system
InstalledFontCollection myFonts = new InstalledFontCollection();
foreach (FontFamily ff in myFonts.Families)
{
    comboBox1.Items.Add(ff.Name);
}