## Strange "half to even" rounding in different languages

GNU bash, version 4.2.24:

$> printf "%.0f, %.0f\n" 48.5 49.5
48, 50

Ruby 1.8.7:

> printf( "%.0f, %.0f\n", 48.5, 49.5 )
48, 50

Perl 5.12.4:

$> perl -e 'printf( "%.0f, %.0f\n", 48.5, 49.5 )'
48, 50


gcc 4.5.3:

> printf( "%.0f, %.0f\n", 48.5, 49.5 );
48, 50


GHC, version 7.0.4:

> printf "%.0f, %.0f\n" 48.5 49.5
49, 50


Wikipedia says that this kind of rounding is called round half to even:

This is the default rounding mode used in IEEE 754 computing functions and operators.

Why is this rounding used by default in C, Perl, Ruby and bash, but not in Haskell?

Is it some sort of tradition or standard? And if it is a standard, why is it used by those languages and not by Haskell? What is the point of rounding half to even?

GHCi> round 48.5
48
GHCi> round 49.5
50


The only difference is that printf isn't using round, presumably because it has to be able to round to more than just whole numbers. I don't think IEEE 754 specifies anything about how to implement printf-style formatting functions, just rounding, which Haskell does correctly.

It would probably be best if printf were consistent with round and with other languages' implementations, but I don't think it's really a big deal.
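On the question of what half-to-even is for, a small Ruby sketch helps. It compares the two common tie-breaking rules on exact .5 values and assumes Ruby 2.4 or later, for `Float#round`'s `half:` keyword and for `Array#sum`. Rounding every tie away from zero biases a sum of rounded values upward. Ties to even round up for some values and down for others, which is one commonly cited reason IEEE 754 uses it as the default. The half-to-even results for 48.5 and 49.5 agree with the GHCi `round` output above.

```ruby
# Exact ties (x.5 values), rounded two ways. Needs Ruby 2.4+ for
# Float#round(half:) and Array#sum.
ties = [0.5, 1.5, 2.5, 3.5, 48.5, 49.5]

away = ties.map { |x| x.round }              # default: half away from zero
even = ties.map { |x| x.round(half: :even) } # round half to even

puts "half away from zero: #{away.inspect}"
puts "half to even:        #{even.inspect}"

# Every tie rounded away from zero moves the total up by 0.5; ties to even
# go up for half of these values and down for the other half.
puts "exact sum: #{ties.sum}, away: #{away.sum}, even: #{even.sum}"
```

For this particular list, the half-to-even sum equals the exact sum (106). The half-away-from-zero sum overshoots it by 3.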

## Searching and marking paired patterns on a line

I need to search for and mark patterns which are split somewhere on a line. Here is a shortened list of sample patterns which are placed in a separate file, e.g.:

CAT,TREE
LION,FOREST
OWL,WATERFALL


A match appears if the item from column 2 ever appears after and on the same line as the item from column 1. E.g.:

THEREISACATINTHETREE. (matches)


No match appears if the item from column 2 appears first on the line, e.g.:

THETREEHASACAT. (does not match)


Furthermore, no match appears if the item from column 1 and 2 touch, e.g.:

THECATTREEHASMANYBIRDS. (does not match)


Once any match is found, I need to mark it with \start{n} (appearing after the column 1 item) and \end{n} (appearing before the column 2 item), where n is a simple counter which increases anytime any match is found. E.g.:

THEREISACAT\start{1}INTHE\end{1}TREE.


Here is a more complex example:

THECATANDLIONLEFTTHEFORESTANDMETANDOWLINTREENEARTHEWATERFALL.


This becomes:

THECAT\start{1}ANDLION\start{2}LEFTTHE\end{2}FORESTANDMETANDOWL\start{3}IN\end{1}TREENEARTHE\end{3}WATERFALL.


Sometimes there are multiple matches in the same place:

THECATDOESNOTLIKETALLTREES,BUTINSTEADLIKESSHORTTREES.


This becomes:

THECAT\start{1}\start{2}DOESNOTLIKETALL\end{1}TREES,BUTINSTEADLIKESSHORT\end{2}TREES.

• There are no spaces in the file.
• Many non-Latin characters appear in the file.
• Pattern matches need only be found on the same line (e.g. "CAT" on line 1 does not ever match with a "TREE" found on line 2, as those are on different lines).

How can I find these matches and mark them in this way?

Check this out (Ruby):

#!/usr/bin/env ruby
patterns = [
  ['CAT', 'TREE'],
  ['LION', 'FOREST'],
  ['OWL', 'WATERFALL']
]

lines = [
  'THEREISACATINTHETREE.',
  'THETREEHASACAT.',
  'THECATTREEHASMANYBIRDS.',
  'THECATANDLIONLEFTTHEFORESTANDMETANDOWLINTREENEARTHEWATERFALL.',
  'CAT...TREE...CAT...TREE'
]

lines.each do |line|
  puts line
  matches = Hash.new { |h, e| h[e] = [] }
  match_indices = []
  patterns.each do |first, second|
    offset = 0
    while (new_offset = line.index(first, offset))
      # map second element of the pattern to minimal position it might be matched
      matches[second] << new_offset + first.size + 1
      offset = new_offset + 1
    end
  end
  global_counter = 1
  matches.each do |second, offsets|
    offsets.each do |offset|
      second_offset = offset
      while (new_offset = line.index(second, second_offset))
        # register the end index of the first pattern and
        # the start index of the second pattern with the global match count
        match_indices << [offset - 1, new_offset, global_counter]
        second_offset = new_offset + 1
        global_counter += 1
      end
    end
  end
  indices = Hash.new { |h, e| h[e] = "" }
  match_indices.each do |first, second, global_counter|
    # build the insertion string for the string positions the
    # start and end tags should be placed in
    indices[first] << "\\start{#{global_counter}}"
    indices[second] << "\\end{#{global_counter}}"
  end
  inserted_length = 0
  indices.sort_by { |k, v| k }.each do |position, insert|
    # insert the tags at their positions
    line.insert(position + inserted_length, insert)
    inserted_length += insert.size
  end
  puts line
end


Result

THEREISACATINTHETREE.
THEREISACAT\start{1}INTHE\end{1}TREE.
THETREEHASACAT.
THETREEHASACAT.
THECATTREEHASMANYBIRDS.
THECATTREEHASMANYBIRDS.
THECATANDLIONLEFTTHEFORESTANDMETANDOWLINTREENEARTHEWATERFALL.
THECAT\start{1}ANDLION\start{2}LEFTTHE\end{2}FORESTANDMETANDOWL\start{3}IN\end{1}TREENEARTHE\end{3}WATERFALL.
CAT...TREE...CAT...TREE
CAT\start{1}\start{2}...\end{1}TREE...CAT\start{3}...\end{2}\end{3}TREE


EDIT

I inserted some comments and clarified some of the variables.

## Need an explanation of Linux bash builtin exec command behavior

From this link I got the following about the bash builtin exec command:

If command is supplied, it replaces the shell without creating a new process.

Now I have the following bash script:


#!/bin/bash
exec ls;
echo 123;
exit 0


When I executed this, I got:

cleanup.sh ex1.bash file.bash file.bash~ output.log (files from the current directory)

Now, if I have this script:


#!/bin/bash
exec ls | cat
echo 123

exit 0


I get the following output:


cleanup.sh
ex1.bash
file.bash
file.bash~
output.log
123


My question is:

If exec replaces the shell without creating a new process, why is echo 123 printed when I add | cat, but not without it? I would be happy if someone could explain the logic behind this behavior.

Thanks.

EDIT: After @torek's response, I found an even harder-to-explain behavior:

1. exec ls>out creates the file out and puts the output of ls in it;

2. exec ls>out1 ls>out2 creates both files but puts nothing in either. If the command works as suggested, I think command 2 should have the same result as command 1 (moreover, I think it should not have created the out2 file at all).

@Pavan Manjunath is not entirely wrong: exec can be used to redirect file descriptors. However, that's not what is happening in this case.

In this particular case, you have the exec in a pipeline. In order to execute a series of pipeline commands, the shell must initially fork, making a sub-shell. (Specifically it has to create the pipe, then fork, so that everything run "on the left" of the pipe can have its output sent to whatever is "on the right" of the pipe.)

To see that this is in fact what is happening, compare:

{ ls; echo this too; } | cat


with:

{ exec ls; echo this too; } | cat


The former runs ls without leaving the sub-shell, so that this sub-shell is therefore still around to run the echo. The latter runs ls by leaving the sub-shell, which is therefore no longer there to do the echo, and this too is not printed.

(The use of curly-braces { cmd1; cmd2; } normally suppresses the sub-shell fork action that you get with parentheses (cmd1; cmd2), but in the case of a pipe, the fork is "forced", as it were.)
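The same pipe-then-fork-then-exec sequence can be modelled outside the shell. The Ruby sketch below assumes a POSIX system where `Kernel#fork` is available and an `echo` binary is on PATH. `echo` stands in for `ls` so that the output is predictable. The sketch creates a pipe, forks a child for the left-hand side and execs in that child. The exec replaces only the child, so the line after it never runs. The parent plays the part of both the outer shell and `cat`, and it carries on.

```ruby
# Rough model of `{ exec ls; echo this too; } | cat` (POSIX only).
reader, writer = IO.pipe

# Left-hand side of the pipe: the shell has to fork a sub-shell first.
left = fork do
  reader.close
  $stdout.reopen(writer)   # like dup2(writer, 1): stdout now feeds the pipe
  writer.close
  exec("echo", "output of the exec'd command")  # stands in for `ls`
  puts "this too"          # never reached: exec replaced this child process
end

# Right-hand side: the parent reads the pipe, as `cat` would.
writer.close
piped = reader.read
reader.close
Process.wait(left)

print piped
puts "the parent (outer shell) is still running"
```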

Redirection of the current shell happens only if there is "nothing to run", as it were, after the word exec. Thus, e.g., exec >stdout 4<input 5>>append modifies the current shell, but exec foo >stdout 4<input 5>>append tries to exec command foo. [Note: this is not strictly accurate; see addendum.]

Interestingly, in an interactive shell, after exec foo >output fails because there is no command foo, the shell sticks around, but stdout remains redirected to file output. (You can recover with exec >/dev/tty. In a script, the failure to exec foo terminates the script.)

With a tip of the hat to @Pumbaa80, here's something even more illustrative:

#! /bin/bash
shopt -s execfail
exec ls | cat -E
echo this goes to stdout
echo this goes to stderr 1>&2


(Note: cat -E is simplified down from my usual cat -vET, which is my handy go-to for "let me see non-printing characters in a recognizable way".) When this script is run, the output from ls has cat -E applied (on Linux this makes end-of-line visible as a $ sign), but the output sent to stdout and stderr (on the remaining two lines) is not redirected. Change the | cat -E to > out and, after the script runs, observe the contents of file out: the final two echoes are not in there. Now change the ls to foo (or some other command that will not be found) and run the script again. This time the output is:

$ ./demo.sh
this goes to stderr


and the file out now has the contents produced by the first echo line.

This makes what exec "really does" as obvious as possible (but no more obvious, as Albert Einstein did not put it :-) ).

Normally, when the shell goes to execute a "simple command" (see the manual page for the precise definition, but this specifically excludes the commands in a "pipeline"), it prepares any I/O redirection operations specified with <, >, and so on by opening the files needed. Then the shell invokes fork (or some equivalent but more efficient variant like vfork or clone, depending on the underlying OS, configuration, etc.), and, in the child process, rearranges the open file descriptors (using dup2 calls or equivalent) to achieve the desired final arrangements: > out moves the open descriptor to fd 1 (stdout), while 6> out moves the open descriptor to fd 6.
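The ordinary sequence just described (open, fork, rearrange descriptors in the child, then execve) can be sketched in Ruby as a model of echo hello > out. The sketch assumes a POSIX system. `IO#reopen` on `$stdout` stands in for the dup2 onto fd 1, and the temporary directory is only scaffolding for the example.

```ruby
require "tmpdir"

# Rough model of the simple command `echo hello > out` (POSIX only).
content = Dir.mktmpdir do |dir|
  path = File.join(dir, "out")
  out = File.open(path, "w")   # 1. the shell opens the redirection target

  pid = fork do                # 2. fork (bash may use vfork/clone instead)
    $stdout.reopen(out)        # 3. in the child only: like dup2(out, 1)
    out.close
    exec("echo", "hello")      # 4. execve() the command
  end

  out.close                    # the parent's own stdout was never touched
  Process.wait(pid)
  File.read(path)
end

puts "file contents: #{content.inspect}"
```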

If you specify the exec keyword, though, the shell suppresses the fork step. It does all the file opening and file-descriptor-rearranging as usual, but this time, it affects any and all subsequent commands. Finally, having done all the redirections, the shell attempts to execve() (in the system-call sense) the command, if there is one. If there is no command, or if the execve() call fails and the shell is supposed to continue running (is interactive or you have set execfail), the shell soldiers on. If the execve() succeeds, the shell no longer exists, having been replaced by the new command. If execfail is unset and the shell is not interactive, the shell exits.
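With the exec keyword there is no fork, so the redirection is applied to the shell itself and survives a failed execve(). The Ruby sketch below models the demo script from earlier, with foo missing and execfail set. The "script" runs in a forked child only so that the demo's own stdout stays intact. `hypothetical-missing-command` is a deliberately nonexistent name, and a POSIX system is assumed. Rescuing the exception is this sketch's stand-in for execfail; it is not how bash implements the option.

```ruby
require "tmpdir"

# Rough model of this script, where `foo` does not exist (POSIX only):
#   shopt -s execfail
#   exec foo > out
#   echo this goes to stdout
#   echo this goes to stderr 1>&2
captured = Dir.mktmpdir do |dir|
  path = File.join(dir, "out")

  script = fork do                 # stands in for the process running the script
    # `exec` does not fork: the redirection is applied to this process itself.
    $stdout.reopen(path, "w")
    begin
      exec("hypothetical-missing-command")   # execve() fails with ENOENT
    rescue SystemCallError
      # Like execfail: keep running; the redirection stays in effect.
    end
    puts "this goes to stdout"     # ends up in the file
    warn "this goes to stderr"     # stderr was never redirected
  end

  Process.wait(script)
  File.read(path)
end

puts "out contains: #{captured.inspect}"
```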

(There's also the added complication of the command_not_found_handle shell function: bash's exec seems to suppress running it, based on test results. The exec keyword in general makes the shell not look at its own functions, i.e., if you have a shell function f, running f as a simple command runs the shell function, as does (f) which runs it in a sub-shell, but running (exec f) skips over it.)

As for why ls>out1 ls>out2 creates two files (with or without an exec), this is simple enough: the shell opens each redirection, and then uses dup2 to move the file descriptors. If you have two ordinary > redirects, the shell opens both, moves the first one to fd 1 (stdout), then moves the second one to fd 1 (stdout again), closing the first in the process. Finally, it runs ls ls, because that's what's left after removing the >out1 >out2. As long as there is no file named ls, the ls command complains to stderr, and writes nothing to stdout.
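The double-redirect case can be modelled the same way. The Ruby sketch below assumes a POSIX system with `ls` on PATH. It applies >out1 and then >out2 to fd 1 in turn, with each `IO#reopen` acting like an open plus dup2 onto the same descriptor. It then execs ls ls, which is what is left once the redirections are removed. Both files get created, but only out2 is still attached to stdout. ls writes nothing there, because no file named ls exists (its complaint goes to stderr).

```ruby
require "tmpdir"

# Rough model of `ls >out1 ls >out2` (POSIX only).
sizes = Dir.mktmpdir do |dir|
  out1 = File.join(dir, "out1")
  out2 = File.join(dir, "out2")

  pid = fork do
    Dir.chdir(dir)               # so that "ls" refers to a file in dir
    $stdout.reopen(out1, "w")    # `>out1`: creates out1, fd 1 -> out1
    $stdout.reopen(out2, "w")    # `>out2`: creates out2, fd 1 -> out2, out1 released
    exec("ls", "ls")             # remaining words: list a file named "ls"
  end
  Process.wait(pid)

  [File.size(out1), File.size(out2)]   # both files exist; File.size raises otherwise
end

p sizes
```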