I've been doing a bit of work on the usetup code and came to realize that we have a bit of a widespread problem in a lot of code. There is a great deal of code that looks basically like this pseudo code:
NSTATUS DoSomething() { NSTATUS Status; ...
Status = NtXXX(); if( !NT_SUCCESS( Status ) ) return Status; Status = NtYYY(); if( !NT_SUCCESS( Status ) ) { cleanupXXX(); return Status; } Status = NtZZZ(); if( !NT_SUCCESS( Status ) ) { cleanupYYY(); cleanupXXX(); return Status; } // yay, everything worked cleanupZZZ(); cleanupYYY(); cleanupXXX(); return STATUS_SUCCESS; }
This type of error handling is absolutely horrible. Just look at all of the duplicate code! There are 3 different return points and each one duplicates all of the code from the previous, and adds one more line. Maintaining code like this becomes very hard, and it produces a larger, slower executing binary.
Now ideally all of those NtXXX functions would be throwing a SEH exception on failure instead of returning the failure code, and we could just write that code like so:
void do_something() { __try { NtXXX(); NtYYY(); NtZZZ(); // manipulate the stuff you allocated } __finally { cleanupZZZ(); cleanupYYY(); cleanupXXX(); } }
But sadly, I don't think that is ever going to happen. Instead, I propose that we use goto to eliminate the duplicate points of return, and wrap the goto in a nice macro. So far I've come up with this macro:
#define NTCALL(target, function) if( Status = function ) goto target;
And you use it like this:
NTSTATUS do_something() { NTSTATUS Status;
NTCALL( done, NtXXX(...) ); NTCALL( cleanXXX, NtYYY(...) ); NTCALL( cleanYYY, NtZZZ(...) );
// manipulate x, y, and z here
cleanZZZ: cleanupZZZ(); cleanYYY: cleanupYYY(); cleanXXX: cleanupXXX(); done: return Status; }
Now that code is more compact and maintainable than the original, and will produce a leaner binary. I'm still trying to improve on it a bit but I like this basic idea, so I'm throwing it out for comments. Last night I came up with this macro and in about 10 minutes I rewrote the file copy routine in usetup to use memory mapping instead of allocating a huge buffer, reading in the source file, then writing it out to the destination. The new code involves making several more system calls than the old code, but since the error handling has been greatly cleaned up, the new code is more compact and easier to maintain.
Phillip Susi wrote:
Now that code is more compact and maintainable than the original, and will produce a leaner binary.
No offense, but to me that code is barely readable. Using macros for such things imo is inappropriate, I rather don't want to debug such generated code.
Best Regards,
Thomas
How is it not readable? I think it is MUCH better than the original code. You don't have a half dozen lines of (mostly duplicated) cleanup code after every call, but what you're calling and what you're passing to it are still quite clear. Like I said though, it still isn't as ideal as SEH, but it's not the terrible mess the original code was. If you have any suggestions on how to make it even more clean, I'm all ears.
Thomas Weidenmueller wrote:
Phillip Susi wrote:
Now that code is more compact and maintainable than the original, and will produce a leaner binary.
No offense, but to me that code is barely readable. Using macros for such things imo is inappropriate, I rather don't want to debug such generated code.
Best Regards,
Thomas
Ros-dev mailing list Ros-dev@reactos.com http://reactos.com:8080/mailman/listinfo/ros-dev
-----Original Message----- From: ros-dev-bounces@reactos.com [mailto:ros-dev-bounces@reactos.com] On Behalf Of Phillip Susi Sent: 11. april 2005 20:53 To: ReactOS Development List Subject: [ros-dev] Error handling
But sadly, I don't think that is ever going to happen. Instead, I propose that we use goto to eliminate the duplicate points of return, and wrap the goto in a nice macro. So far I've come up with this macro:
#define NTCALL(target, function) if( Status = function ) goto target;
And you use it like this:
NTSTATUS do_something() { NTSTATUS Status;
NTCALL( done, NtXXX(...) ); NTCALL( cleanXXX, NtYYY(...) ); NTCALL( cleanYYY, NtZZZ(...) ); // manipulate x, y, and z herecleanZZZ: cleanupZZZ(); cleanYYY: cleanupYYY(); cleanXXX: cleanupXXX(); done: return Status; }
Now that code is more compact and maintainable than the original, and will produce a leaner binary. I'm still trying to improve on it a bit but I like this basic idea, so I'm throwing it out for comments. Last night I came up with this macro and in about 10 minutes I rewrote the file copy routine in usetup to use memory mapping instead of allocating a huge buffer, reading in the source file, then writing it out to the destination. The new code involves making several more system calls than the old code, but since the error handling has been greatly cleaned up, the new code is more compact and easier to maintain.
The control flow isn't obvious since NTCALL hides much of the logic. You should rather split the function into smaller functions with sensible names. This will, in many cases, reduce the duplicated error handling code.
Casper
Casper Hornstrup wrote:
The control flow isn't obvious since NTCALL hides much of the logic. You should rather split the function into smaller functions with sensible names. This will, in many cases, reduce the duplicated error handling code.
Indeed. This was my immediate thought too. Maybe something like this, or maybe not, but you get the idea.
DoNtXXX() { NSTATUS Status;
Status = NtXXX();
cleanupXXX();
return !NT_SUCCESS(Status); }
DoNtZZZ() { NSTATUS Status;
Status = NtZZZ();
cleanupZZZ();
return !NT_SUCCESS(Status); }
DoNtYYY() { NSTATUS Status;
Status = NtYYY();
cleanupYYY();
return !NT_SUCCESS(Status); }
NTSTATUS DoSomething() { return DoNtXXX() || DoNtYYY() || DoNtZZZ(); }
That doesn't really work. You don't want to do XXX and then clean it up right away, you want to allocate XXX, then allocate YYY, then allocate ZZZ, then do something with them, then clean up and return ( or maybe not clean up on success ), but if there is an error allocating any of the 3, then you need to clean up the ones that succeeded already.
Jakob Eriksson wrote:
Indeed. This was my immediate thought too. Maybe something like this, or maybe not, but you get the idea.
DoNtXXX() { NSTATUS Status;
Status = NtXXX();
cleanupXXX();
return !NT_SUCCESS(Status); }
DoNtZZZ() { NSTATUS Status;
Status = NtZZZ();
cleanupZZZ();
return !NT_SUCCESS(Status); }
DoNtYYY() { NSTATUS Status;
Status = NtYYY();
cleanupYYY();
return !NT_SUCCESS(Status); }
NTSTATUS DoSomething() { return DoNtXXX() || DoNtYYY() || DoNtZZZ(); }
Ros-dev mailing list Ros-dev@reactos.com http://reactos.com:8080/mailman/listinfo/ros-dev
Hello Philp!
Great to see you're back!
I've been doing a bit of work on the usetup code and came to realize that we have a bit of a widespread problem in a lot of code. There is a great deal of code that looks basically like this pseudo code:
NSTATUS DoSomething() { NSTATUS Status; ...
Status = NtXXX(); if( !NT_SUCCESS( Status ) ) return Status; Status = NtYYY(); if( !NT_SUCCESS( Status ) ) { cleanupXXX(); return Status; } Status = NtZZZ(); if( !NT_SUCCESS( Status ) ) { cleanupYYY(); cleanupXXX(); return Status; } // yay, everything worked cleanupZZZ(); cleanupYYY(); cleanupXXX(); return STATUS_SUCCESS; }
This type of error handling is absolutely horrible. Just look at all of the duplicate code! There are 3 different return points and each one duplicates all of the code from the previous, and adds one more line. Maintaining code like this becomes very hard, and it produces a larger, slower executing binary.
At the time I wrote this code it was more important to make it work reliably than to make it look good. File I/O has been pretty instable at this time because of several bugs in the io manager and the cache manager. That's why I wrote it this way; it worked in most cases! ;-)
And you use it like this:
NTSTATUS do_something() { NTSTATUS Status;
NTCALL( done, NtXXX(...) ); NTCALL( cleanXXX, NtYYY(...) ); NTCALL( cleanYYY, NtZZZ(...) );
// manipulate x, y, and z here
cleanZZZ: cleanupZZZ(); cleanYYY: cleanupYYY(); cleanXXX: cleanupXXX(); done: return Status; }
Now that code is more compact and maintainable than the original, and will produce a leaner binary. I'm still trying to improve on it a bit but I like this basic idea, so I'm throwing it out for comments. Last night I came up with this macro and in about 10 minutes I rewrote the file copy routine in usetup to use memory mapping instead of allocating a huge buffer, reading in the source file, then writing it out to the destination. The new code involves making several more system calls than the old code, but since the error handling has been greatly cleaned up, the new code is more compact and easier to maintain.
I still prefer using the bare gotos because in my opinion it is a lot easier to read. But otherwise I agree with your improvements.
NTSTATUS DoSomething() { NSTATUS Status; ...
Status = NtXXX(); if (!NT_SUCCESS(Status)) goto done;
Status = NtYYY(); if (!NT_SUCCESS(Status)) goto cleanXXX;
Status = NtZZZ(); if (!NT_SUCCESS(Status)) goto cleanYYY;
cleanZZZ: cleanupZZZ(); cleanYYY: cleanupYYY(); cleanXXX: cleanupXXX(); done: return Status; }
Regards, Eric
Eric Kohl wrote:
Hello Philp!
Great to see you're back!
Thanks! It is good to be back.
At the time I wrote this code it was more important to make it work reliably than to make it look good. File I/O has been pretty instable at this time because of several bugs in the io manager and the cache manager. That's why I wrote it this way; it worked in most cases! ;-)
Right, and handling errors that way is a VERY common practice in C, but still is bad and should be improved somehow, which is why I brought this up. Whatever better practice we decide on, we need something to replace this methodology with in the future.
I still prefer using the bare gotos because in my opinion it is a lot easier to read. But otherwise I agree with your improvements.
Yea, I liked the bare gotos at first too, but got to thinking that there is still a good bit of duplicate code in there that could be cleaned up. Plus using the gotos still gave me an icky feeling. Specifically every time you assign the return to Status and call NT_SUCCESS to decide if you should goto, so I came up with the macro to cut down on all the retyping of duplicate code. I'm still trying to think of a way to do it that gives the best of both worlds; looks good, yet avoids duplicate code. Wrack your brain and see if you can come up with something.
Steven Edwards wrote:
We have PSEH which can be used.
Thanks Steven
PSEH?
Thomas Larson wrote:
i think thats the most simple and easy to read but in the end is´ent it not a littel waste of time discusse old functions their need rewrite anyway..?
How is it a waste of time? What good is rewriting the functions if they aren't written better the next time around? If that is going to happen then we need to discuss how to do it better.
Hi Phillip, welcome back!
Does this baroque solution based on the "Think positive!" motto satisfy you? :)
Emanuele
#include <all_inclusive.h>
#define CLEANUP_DEFINE DWORD __FUNCTION__##__cleanup_mask = 0 #define CLEANUP_SET(f) __FUNCTION__##__cleanup_mask != (m) #define CLEANUP_DO(m,f) if (__FUNCTION__##__cleanup_mask & (m)) (f) #define CLEANUP_BIT(b) (1<<(b))
NSTATUS DoSomething() { NSTATUS Status = STATUS_SUCCESS; CLEANUP_DEFINE; #define MASK_XXX CLEANUP_BIT(1) #define MASK_YYY CLEANUP_BIT(2) #define MASK_ZZZ CLEANUP_BIT(3) ...
Status = NtXXX(); if (NT_SUCCESS(Status)) { CLEANUP_SET(MASK_XXX); Status = NtYYY(); if (NT_SUCCESS(Status)) { CLEANUP_SET(MASK_YYY); Status = NtZZZ(); if (NT_SUCCESS(Status)) { CLEANUP_SET(MASK_ZZZ); // yay, everything worked } } } CLEANUP_DO(MASK_ZZZ,cleanupZZZ); CLEANUP_DO(MASK_YYY,cleanupYYY); CLEANUP_DO(MASK_XXX,cleanupXXX); return Status; } /* EOF */
ea wrote:
Does this baroque solution based on the "Think positive!" motto satisfy you? :)
Emanuele
#include <all_inclusive.h>
#define CLEANUP_DEFINE DWORD __FUNCTION__##__cleanup_mask = 0 #define CLEANUP_SET(f) __FUNCTION__##__cleanup_mask != (m) #define CLEANUP_DO(m,f) if (__FUNCTION__##__cleanup_mask & (m)) (f) #define CLEANUP_BIT(b) (1<<(b))
<snip>
I don't think that works either. The main reason is what is cleanupXXX? An external function? It can't be. Also it is not very readable at all, and you end up with a half dozen macros you have to call for each NtXXX() you call, and you get another indent level for each NtXXX() so you very quickly will be indented off the screen.
Steven Edwards wrote:
We have a macro based SEH implementation we are using in ReactOS called PSEH for Portable SEH. Hyperion wrote it and it seems to work with almost ever compiler thrown at it. It is not syntax compatible with MS-SEH but some people have been working to develop p(retty)pseh which should be. Look in reactos/lib/pseh and grep the source tree. Its used in ntoskrnl, win32k, afd?, and others.
Thanks Steven
The problem with that is that none of the NtXXX() functions use SEH or PSEH, so you would have to write a wrapper library around them to convert the return codes into exceptions before you could take advantage of (P)SEH in the calling code. That seems like a lot of work. I was toying with the idea though of implementing a set of parallel apis next to the Zw and Nt functions. They could be called EhXXX for instance and would be called via a new EH aware system call interface. The EH system call interface could wrap the underlying non EH functions and make them EH enabled, or directly call any kernel functions that are EH aware. The normal system call interface would directly call non EH system calls, and wrap and translate the EH system calls. It's something neat to think about, but I don't think it will ever happen. I just don't see why MS invented SEH then didn't bother to use it. Would be real nice if they had just made all the kernel APIs SEH enabled in the first place.
Phillip Susi wrote:
I've been doing a bit of work on the usetup code and came to realize that we have a bit of a widespread problem in a lot of code. There is a great deal of code that looks basically like this pseudo code:
To everyone suggesting horrendous macro ideas (Filip, wake up!!! You haven't commented on this yet!), may I recommend the following read:
http://blogs.msdn.com/oldnewthing/archive/2005/01/06/347666.aspx
As well as Larry Osterman's "Hiding Complexity" and "Every programmer should know what assembly their code generates".
Best regards, Alex Ionescu
Ok, you have convinced me. Macros are bad. Let's just stick with the gotos, i.e.:
Status = NtXXX(); if( !NT_SUCCESS(Status) ) goto somecleanup;
Now if anyone can help me to customize emacs to auto paste the boilerplate if( !NT_SUCCSS line, maybe even with a DPRINT1 built into it, I'd appreciate it ;)
Anyhow, from now on I think everyone should try to use the goto method instead of duplicate cleanup blocks all over the place.
Alex Ionescu wrote:
Phillip Susi wrote:
I've been doing a bit of work on the usetup code and came to realize that we have a bit of a widespread problem in a lot of code. There is a great deal of code that looks basically like this pseudo code:
To everyone suggesting horrendous macro ideas (Filip, wake up!!! You haven't commented on this yet!), may I recommend the following read:
http://blogs.msdn.com/oldnewthing/archive/2005/01/06/347666.aspx
As well as Larry Osterman's "Hiding Complexity" and "Every programmer should know what assembly their code generates".
Best regards, Alex Ionescu
In general gotos are *REALLY* bad, there must be some better way than gotos to get this working, maybe it would be a good idea to put some work into PSEH so that it is suitable for this purpose and save yourself the trouble of doing it later.
Phillip Susi wrote:
Ok, you have convinced me. Macros are bad. Let's just stick with the gotos, i.e.:
Status = NtXXX(); if( !NT_SUCCESS(Status) ) goto somecleanup;
Now if anyone can help me to customize emacs to auto paste the boilerplate if( !NT_SUCCSS line, maybe even with a DPRINT1 built into it, I'd appreciate it ;)
Anyhow, from now on I think everyone should try to use the goto method instead of duplicate cleanup blocks all over the place.
Alex Ionescu wrote:
Phillip Susi wrote:
I've been doing a bit of work on the usetup code and came to realize that we have a bit of a widespread problem in a lot of code. There is a great deal of code that looks basically like this pseudo code:
To everyone suggesting horrendous macro ideas (Filip, wake up!!! You haven't commented on this yet!), may I recommend the following read:
http://blogs.msdn.com/oldnewthing/archive/2005/01/06/347666.aspx
As well as Larry Osterman's "Hiding Complexity" and "Every programmer should know what assembly their code generates".
Best regards, Alex Ionescu
Ros-dev mailing list Ros-dev@reactos.com http://reactos.com:8080/mailman/listinfo/ros-dev
Nate DeSimone wrote:
In general gotos are *REALLY* bad
True, gotos have been 'out-of-style' for the last 10-15 years, and indeed, overly-liberal use of gotos is *REALLY* bad; however, I would claim that blind avoidance of goto's is just as bad.
I've seen people use all kinds of things to avoid typing 'goto' because they are '*REALLY* bad', including:
do { status = NtXXX(); if ( ! NT_SUCCESS(Status) ) break;
...
} while(0)
Or similar, but equivilant for or while loops; writing break when you mean goto doesn't make the code any better.
Similarly, the code could be written as:
try { status = NtXXX(); if ( ! NT_SUCCESS(Status) ) ExRaiseStatus(Status); } except(...) { // Do cleanup }
But this is just an expensive way of writing goto.
Sometimes, if you mean goto, it is just better to write goto.
On alternative to goto's for the the code in question might be:
status = NtXXX(); // Work A if ( NT_SUCCESS(status) ) { ... do some work B... ... if work fails, set status ... }
if ( NT_SUCCESS(status) ) { ... do some more work (C) that depends on A and B ... if work fails, make sure status is set ... }
if ( NT_SUCCESS(status) ) { ... do some more work (D) that depends on A, B and C ... if work fails, make sure status is set ... }
// If A was done, cleanup A // If B was done, cleanup B
if ( ! NT_SUCCESS(status) ) { ... Cleanup C and D in the failure case return status; }
// In success case, C and D are outputs in some way, so don't // clean them up return STATUS_SUCCESS.
Note that indentation from error checking never exceeds one level.
You do have to be able to figure out what work was done and what work wasn't, but the same would be true if all the NtXXX functions threw exceptions instead, unless you caught the exception from each function seperately, in which case, you'd get some pretty horrendous code. (Exceptions are not a panacea.)
Probably this isn't much, if any, better than the goto solution-- but it doesn't have the for letter word in it.
It might be possible to design a clean system using C++ object encapsulation of work, so that work is undone automatically during stack unwind... but that doesn't really apply here.
Thanks,
Joseph
I can just ACK your post.
Not using goto is like a no-no. It was for the last 15y and will be another. (?) The schools teach it. But one should allways think about the "Trueth" wether it is ture or just "dontcare".
Using Exceptions and the resulting stack unwind etc. is not nice concerning performance. However it leads to more secure and elegant code. I for my part would rather insert some spare gotos. AND use auto variables.
Nate DeSimone wrote:
In general gotos are *REALLY* bad, there must be some better way than gotos to get this working, maybe it would be a good idea to put some work into PSEH so that it is suitable for this purpose and save yourself the trouble of doing it later.
There is nothing inherently wrong with goto, it's only bad when it makes the code less readable. When used carefully, like in this case, it creates code that is _more_ readable, not less, and that's clearly a good thing. Also as I said before, the problem is not a lack of SEH, as I believe that is now supported by gcc so we could use it, but rather the problem is that virtually none of the NTAPI uses SEH. In order for SEH to be beneficial, it's use has to be widespread.
Robert Köpferl wrote:
I can just ACK your post.
Not using goto is like a no-no. It was for the last 15y and will be another. (?) The schools teach it. But one should allways think about the "Trueth" wether it is ture or just "dontcare".
Using Exceptions and the resulting stack unwind etc. is not nice concerning performance. However it leads to more secure and elegant code. I for my part would rather insert some spare gotos. AND use auto variables. _______________________________________________
Just because schools teach it does not make it true. Half of what they teach in school is wrong. As I said above, there is nothing inherently bad about goto, it's just that it can easily be abused to create spaghetti code, and new programmers often do just that. When used carefully goto can be the lesser of two evils, as in this case. I think it is clear that the code using goto is much cleaner and more maintainable than the original code. If using goto improved the code, then it has got to be a good thing.
C++ exception handling is the ideal way to handle this kind of thing, but we're mostly using C and all the APIs are in C, so it is with C that we must make our peace. Also blanket statements like C++ exception handling adds too much overhead are very often quite wrong. Often C++ exceptions actually IMPROVE performance because they eliminate multiple return value checks, duplicate code, and so on in favor of paying a very small fee when entering and exiting non trivial functions. In the case where an exception is not thrown, which is to say, the vast majority of the time, the small overhead for pushing the exception handler onto the exception stack is more than made up for by the overhead saved from not having to constantly check a dozen different return parameters. In fact if you have say, a loop, which calls 3 functions each iteration and iterates 100 times, you are saving a ton of overhead by trading 300 return value checks for one exception handler push/pop.
Phillip Susi wrote:
There is nothing inherently wrong with goto, it's only bad when it makes the code less readable. When used carefully, like in this case, it creates code that is _more_ readable, not less, and that's clearly a good thing. Also as I said before, the problem is not a lack of SEH, as I believe that is now supported by gcc so we could use it, but rather the problem is that virtually none of the NTAPI uses SEH. In order for SEH to be beneficial, it's use has to be widespread.
Ok, yes I agree that lacking SEH, gotos definately are more elegant solution than a bunch of nested if statements. However thinking forward a little bit, I recall that the use of gotos in your code can introduce security vunerabilities and that you have to be very careful whenever you use a goto, since we want ReactOS to be a mainstream OS we better do a better job with security than MS. So may I recommend that the final error handling boilerplate be double checked so that you can't get it to jump to anouther address in memory than the intended address.
Nate DeSimone wrote:
Phillip Susi wrote:
There is nothing inherently wrong with goto, it's only bad when it makes the code less readable. When used carefully, like in this case, it creates code that is _more_ readable, not less, and that's clearly a good thing. Also as I said before, the problem is not a lack of SEH, as I believe that is now supported by gcc so we could use it, but rather the problem is that virtually none of the NTAPI uses SEH. In order for SEH to be beneficial, it's use has to be widespread.
Ok, yes I agree that lacking SEH, gotos definately are more elegant solution than a bunch of nested if statements. However thinking forward a little bit, I recall that the use of gotos in your code can introduce security vunerabilities and that you have to be very careful whenever you use a goto, since we want ReactOS to be a mainstream OS we better do a better job with security than MS. So may I recommend that the final error handling boilerplate be double checked so that you can't get it to jump to anouther address in memory than the intended address.
Hmmm... I'm not sure I follow how gotos lead to security vulnerabilities?
Remember, gotos are just a jmp opcode... if goto's lead to security vulnerabilities, so do if statements. The code generated by the compiler is littered with hidden 'goto' statements. If an attacker can overwrite the gotos that are spelled 'goto', he can overwrite the ones spelled 'while', 'if', 'break', 'do', 'for', function calls, function returns, etc., etc., etc.
I don't believe the use of 'goto' in and of itself poses any increased security risk.
I would claim that unexercised code is one source of bugs (including security vulnerabilities.)
Therefor if statements _do_ lead to security vulnerabilities.
In particular, error handling (which is classicly unexercised code) leads to security vulnerabilities. But not handling errors leads to even more security vulnerabilities, so what is a guy to do?
Thanks,
Joseph
Joseph Galbraith wrote:
In particular, error handling (which is classicly unexercised code) leads to security vulnerabilities. But not handling errors leads to even more security vulnerabilities, so what is a guy to do?
Unit-test a lot. (Casper?)
Also, how do the LISP coders do it? (Just curious. I bet they don't usually create a nested "if".)
//Jakob
I'm not sure what relation between goto and security vulnerabilities you are thinking of. As for the labels you are going to, yes, when you write the code you will have to make sure you are jumping to the right place to do the proper cleanup that is required at that point in the code. Figuring out which label you need to jump to though, is MUCH easier than figuring out all of the ( possibly dozens ) of lines of cleanup code you need to duplicate when you add a new operation and error check/recovery.
Nate DeSimone wrote:
Ok, yes I agree that lacking SEH, gotos definately are more elegant solution than a bunch of nested if statements. However thinking forward a little bit, I recall that the use of gotos in your code can introduce security vunerabilities and that you have to be very careful whenever you use a goto, since we want ReactOS to be a mainstream OS we better do a better job with security than MS. So may I recommend that the final error handling boilerplate be double checked so that you can't get it to jump to anouther address in memory than the intended address.