Today, just before I thought I'd finish my day and go home, I got a urgent problem. One of our component kept coring, making services unavailable to clients.
Investigation started right away and we found the cause after a while. The cause is quite simple but unexpected.
The reason is we have a function specified with a exception specification. For those who are not familiar with the exception specification, it's the famous throw(A) that you can put after a function declaration or definition. It's used to tell compiler what exceptions should be allow to be thrown in the function, particularly, if nothing is in the (), that means the function is not allowed to throw anything.
Well we have one such function that with a throw() however, it DO throw an exception. So what happens? The exception is caught and treated as unexpected, then terminate() is called, then terminate() calls abort(), program aborts with a core (SIGABRT 6).
It's NOT hard to understand all of this, but we had a painful time to find out that it's actually this exception specification had caused the problem, it's in the same line as the function definition, far away from when the line where actually throws something.
It's a shame that after years of debate on this topic, we still fail on that. Here are some links:
http://www.gotw.ca/publications/mill22.htm
http://www.gotw.ca/gotw/082.htm
http://www.boost.org/development/requirements.html#Exception-specification
All those gurus suggest NOT USING EXCEPTION SPECIFICATION !
After beaten senselessly, now I remember, never write a exception specification.
Also I have to put another point on that, a function with exception specification is just a trap for other programmers who is working on it, it may not notice the very existence of the throw() and throw something in the function body, or even harder to prevent, he may just call a function that throws inside. I'm such a victim, modified a function with throw() but throw something.
Wednesday, September 5, 2012
Saturday, June 30, 2012
Fedora 17 Installed!
Another step towards a better OS, Fedora 17 has been released quite some time ago but only I have a chance to upgrade to it for my PC. So now I'm posting on my new Fedora 17!
It has a new kernel version and most recent development tools. The only little thing is the reboot panic but solutions are out there.
It has a new kernel version and most recent development tools. The only little thing is the reboot panic but solutions are out there.
Friday, April 27, 2012
A Real-life Story about Floating Point Value Comparison
I got a problem today in our software. In general, when we have two input double values like v1= 800 and v2= 750, it's working correctly while if v1=1000 and v2 = 900, it's not working as expected.
The logic is based on the comparison result of the two values. While for both two cases above we have v1 > v2, the results are not the same, why?
After tracking down to where the comparison is made, I found out that it's essentially written like this:
if (FormatToString(v1) > FormatToString(v2)) DoSomething();
I'm pretty sure, at the first look to this line, it's not doing what whoever wrote it wanted to do. The above line is actually doing a lexical comparison of the two strings converted from two double values.
Take a look at how the operator > is overloaded for string type in the basic_string.h :
/**
* @brief Test if string follows string.
* @param lhs First string.
* @param rhs Second string.
* @return True if @a lhs follows @a rhs. False otherwise.
*/
template
inline bool
operator>(const basic_string<_CharT, _Traits, _Alloc>& __lhs,
const basic_string<_CharT, _Traits, _Alloc>& __rhs)
{ return __lhs.compare(__rhs) > 0; }
Notice that in the comments, it's stated as "Test if string follows string" and the implementation is using the basic_string::compare() method which by default will do a lexical comparison like this : it returns 0 if all the characters in the compared contents compare equal, a negative value if the first character that does not match compares to less (the character's ASCII value) in the object than in the comparing string, and a positive value in the opposite case.
So come to our example:
"800" > "750" : Ture, but
"1000" > "900" : False because '1' < '9'.
Fix it is simple, but the question is why the guy wrote this ? A simple mistake (and has been there since several years without being detected and corrected) ? I don't know exactly but tracking up in the code I found some clue that might explain.
Some lines before this buggy line, I found a similar comparison but it's only testing the equality of two double values and there, the double values are formatted into strings then compared for equality:
if (FormatToString(v1) != FormatToString(v2)) DoSomething();
This works fine and hides all the double value maths complexity behind the FormatToString method who knows exactly how many digits should be considered.
I guess later on, there comes another need to test not only for equality but also to know which value is larger then the guy took a simple copy-paste-modifiy way which results in this bug.
The the question comes back to how to compare two double values ? This is a hard question to answer, read this paper and you will know why. However, in our case, a really simple solution does exist. Guess what?
Just patch the shorter one of the two strings with leading '0's, then the lexical comparison will be an equivalent to the value comparison!
"1000" > "900" : False, but
"1000" > "0900": True !
The logic is based on the comparison result of the two values. While for both two cases above we have v1 > v2, the results are not the same, why?
After tracking down to where the comparison is made, I found out that it's essentially written like this:
if (FormatToString(v1) > FormatToString(v2)) DoSomething();
I'm pretty sure, at the first look to this line, it's not doing what whoever wrote it wanted to do. The above line is actually doing a lexical comparison of the two strings converted from two double values.
Take a look at how the operator > is overloaded for string type in the basic_string.h :
/**
* @brief Test if string follows string.
* @param lhs First string.
* @param rhs Second string.
* @return True if @a lhs follows @a rhs. False otherwise.
*/
template
inline bool
operator>(const basic_string<_CharT, _Traits, _Alloc>& __lhs,
const basic_string<_CharT, _Traits, _Alloc>& __rhs)
{ return __lhs.compare(__rhs) > 0; }
Notice that in the comments, it's stated as "Test if string follows string" and the implementation is using the basic_string::compare() method which by default will do a lexical comparison like this : it returns 0 if all the characters in the compared contents compare equal, a negative value if the first character that does not match compares to less (the character's ASCII value) in the object than in the comparing string, and a positive value in the opposite case.
So come to our example:
"800" > "750" : Ture, but
"1000" > "900" : False because '1' < '9'.
Fix it is simple, but the question is why the guy wrote this ? A simple mistake (and has been there since several years without being detected and corrected) ? I don't know exactly but tracking up in the code I found some clue that might explain.
Some lines before this buggy line, I found a similar comparison but it's only testing the equality of two double values and there, the double values are formatted into strings then compared for equality:
if (FormatToString(v1) != FormatToString(v2)) DoSomething();
This works fine and hides all the double value maths complexity behind the FormatToString method who knows exactly how many digits should be considered.
I guess later on, there comes another need to test not only for equality but also to know which value is larger then the guy took a simple copy-paste-modifiy way which results in this bug.
The the question comes back to how to compare two double values ? This is a hard question to answer, read this paper and you will know why. However, in our case, a really simple solution does exist. Guess what?
Just patch the shorter one of the two strings with leading '0's, then the lexical comparison will be an equivalent to the value comparison!
"1000" > "900" : False, but
"1000" > "0900": True !
Subscribe to:
Posts (Atom)