I am wondering about the benefits of having the string type immutable from the programmer's point of view.
Technical benefits (on the compiler/language side) can mostly be summarized as: optimisations are easier to do if the type is immutable. Read here for a related question.
Also, in a mutable string type, either you have thread-safety already built-in (then again, optimisations are harder to do) or you have to do it yourself. You will have in any case the choice to use a mutable string type with built-in thread safety, so that is not really an advantage of immutable string-types. (Again, it will be easier to do the handling and optimisations to ensure thread-safety on the immutable type but that is not the point here.)
But what are the benefits of immutable string-types in the usage? What is the point of having some types immutable and others not? That seems very inconsistent to me.
In C++, if I want a string to be immutable, I pass it to a function as a const reference (const std::string&). If I want a changeable copy of the original string, I pass it as std::string. Only if I want it to be mutable do I pass it as a reference (std::string&). So I simply get to choose what I want to do, and I can do this with every possible type.
In Python or in Java, some types are immutable (mostly all primitive types and strings), others are not.
In pure functional languages like Haskell, everything is immutable.
Is there a good reason why it makes sense to have this inconsistency? Or is it purely for technical, lower-level reasons?
What is the point of having some types immutable and others not?
Without some mutable types, you'd have to go the whole hog to pure functional programming -- a completely different paradigm than the OOP and procedural approaches which are currently most popular, and, while extremely powerful, apparently very challenging to a lot of programmers (what happens when you do need side effects in a language where nothing is mutable, and in real-world programming of course you inevitably do, is part of the challenge -- Haskell's Monads are a very elegant approach, for example, but how many programmers do you know that fully and confidently understand them and can use them as well as typical OOP constructs?-).
If you don't understand the enormous value of having multiple paradigms available (both the FP one and those crucially relying on mutable data), I recommend studying Haridi's and Van Roy's masterpiece, Concepts, Techniques, and Models of Computer Programming -- "a SICP for the 21st Century", as I once described it;-).
Most programmers, whether familiar with Haridi and Van Roy or not, will readily admit that having at least some mutable data types is important to them. Despite the sentence I've quoted above from your Q, which takes a completely different viewpoint, I believe that may also be the root of your perplexity: not "why some of each", but rather "why some immutables at all".
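The "thoroughly mutable" approach was once (accidentally) obtained in a Fortran implementation. If you had, say, a subroutine taking an argument I and doing
I = 0
then a program snippet that, e.g., printed 23, called that subroutine with the constant 23 as its argument, and printed 23 again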
would print 23, then 0 -- the number 23 had been mutated, so all references to 23 in the rest of the program would in fact refer to 0. Not a bug in the compiler, technically: Fortran had subtle rules about what your program is and is not allowed to do in passing constants vs variables to procedures that assign to their arguments, and this snippet violates those little-known, non-compiler-enforceable rules, so it's a bug in the program, not in the compiler. In practice, of course, the number of bugs caused this way was unacceptably high, so typical compilers soon switched to less destructive behavior in such situations (putting constants in read-only segments to get a runtime error, if the OS supported that; or, passing a fresh copy of the constant rather than the constant itself, despite the overhead; and so forth) even though technically they were program bugs allowing the compiler to display undefined behavior quite "correctly";-).
The alternative enforced in some other languages is to add the complication of multiple ways of parameter passing -- most notably perhaps in C++, what with by-value, by-reference, by constant reference, by pointer, by constant pointer, ... and then of course you see programmers baffled by declarations such as const foo* const bar (where the rightmost const is basically irrelevant if bar is an argument to some function... but crucial instead if bar is a local variable...!-).
Actually Algol-68 probably went farther along this direction (if you can have a value and a reference, why not a reference to a reference? or reference to reference to reference? &c -- Algol 68 put no limitations on this, and the rules to define what was going on are perhaps the subtlest, hardest mix ever found in an "intended for real use" programming language). Early C (which only had by-value and by-explicit-pointer -- no const, no references, no complications) was no doubt in part a reaction to it, as was the original Pascal. But const soon crept in, and complications started mounting again.
Java and Python (among other languages) cut through this thicket with a powerful machete of simplicity: all argument passing, and all assignment, is "by object reference" (never reference to a variable or other reference, never semantically implicit copies, &c). Defining (at least) numbers as semantically immutable preserves programmers' sanity (as well as this precious aspect of language simplicity) by avoiding "oopses" such as that exhibited by the Fortran code above.
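As a rough illustration of "by object reference" in Python (a minimal sketch; the function and variable names are made up for this example and do not come from any library): a callee can mutate a mutable object it receives, but with an immutable int or str it can only rebind its own local name, so the caller's 23 stays 23.

```python
def append_item(items):
    # The parameter refers to the caller's list object; append mutates it in place.
    items.append(99)


def try_to_zero(n):
    # int objects cannot be changed in place; this only rebinds the local name n.
    n = 0


def shout(s):
    # str.upper() builds a new string; the original object is left untouched.
    return s.upper()


numbers = [1, 2]
append_item(numbers)      # the caller sees the mutation

count = 23
try_to_zero(count)        # the caller's object is unaffected

greeting = "hello"
shouted = shout(greeting)

print(numbers)
print(count)
print(greeting, shouted)
```

Because int and str offer no in-place mutation, nothing a callee does can reproduce the Fortran surprise described earlier; only the list, which is mutable by design, changes under the caller.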
Treating strings as primitives just like numbers is quite consistent with the languages' intended high semantic level, because in real life we do need strings that are just as simple to use as numbers; alternatives such as defining strings as lists of characters (Haskell) or as arrays of characters (C) pose challenges to both the compiler (keeping efficient performance under such semantics) and the programmer (effectively ignoring this arbitrary structuring to enable use of strings as simple primitives, as real life programming often requires).
Python went a bit further by adding a simple immutable container (tuple) and tying hashing to "effective immutability" (which avoids certain surprises to the programmer that are found, e.g., in Perl, with its hashes allowing mutable strings as keys) -- and why not? Once you have immutability (a precious concept that saves the programmer from having to learn about N different semantics for assignment and argument passing, with N tending to increase with time;-), you might as well get full mileage out of it;-).
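A small sketch of how hashing follows "effective immutability" in Python (the variable names are illustrative only): a tuple of immutable items works as a dict key, a list does not, and a tuple that contains a list is not hashable either, even though the tuple's own slots can never be reassigned.

```python
point = (1, 2)
lookup = {point: "a tuple key works"}

# A list is mutable, so it defines no hash and cannot be a dict key.
try:
    lookup[[1, 2]] = "a list key"
except TypeError as exc:
    list_key_error = str(exc)
else:
    list_key_error = None

# A tuple is only hashable if everything inside it is hashable too.
try:
    hash((1, [2, 3]))
except TypeError:
    nested_hashable = False
else:
    nested_hashable = True

print(lookup[(1, 2)])
print(list_key_error)
print(nested_hashable)
```

An equal tuple built later finds the same entry, which is the property that keys mutated after insertion would break.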