[Pharo-project] My solution to handling errors in system-critical processes

Stéphane Ducasse stephane.ducasse at inria.fr
Fri May 20 19:06:25 CEST 2011


Lovely :)
I want more - I want some tests :)
For example, a test that the running process is still running.

Stef

On May 19, 2011, at 12:56 PM, Igor Stasenko wrote:

> Hello,
> 
> if you remember, there was a discussion about how to efficiently
> solve the following problem:
> 
> We have critical processes running in the system, which usually run
> an infinite loop and provide some service(s) that are triggered
> periodically.
> The most common case is weak finalization. We need to ensure that
> finalization works, and if finalizing some object causes an error,
> it should not affect the finalization of the remaining objects, nor
> affect the finalization process itself (for example by suspending or
> terminating it).
> Also, in another discussion about Announcements, we found that the
> same requirement should actually be fulfilled by the Announcement
> framework: delivery of a single announcement to some subscriber may
> fail, but regardless of such a failure, the other subscribers should
> still receive the announcement, no matter what happens.
> 
> Different solutions were proposed, like finalizing each object in a
> separate, forked process, so that even if it fails with an error,
> the rest of the finalizers are not affected and the finalization
> process continues to run normally.
> However, this is not very efficient, because you pay the cost of
> forking a process each time you need to finalize a new object. There
> was an optimized solution, but it doesn't change the idea: perform
> each item's finalization in a separate process.
> 
> Another (a bit lame) approach is to simply swallow any errors. While
> this guarantees that your critical process won't be terminated due
> to errors, it also makes it impossible to detect and fix the
> error(s), which of course should be taken care of to prevent them
> from appearing in the future.
> 
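For comparison, the two earlier approaches can be sketched roughly as follows. This is only an illustration, not code from the thread: `jobs` and `results` are hypothetical names, and plain blocks stand in for real finalizers.

```smalltalk
| jobs results |
results := OrderedCollection new.
"Three stand-in finalizers (hypothetical); the second one fails with ZeroDivide."
jobs := { [ results add: 1 ]. [ 1 / 0 ]. [ results add: 3 ] }.

"Approach 1 (fork per item) would be:
     jobs do: [:job | job fork ].
 Every item pays for a new process, whether it fails or not."

"Approach 2: swallow errors. The loop survives, but the failure is invisible."
jobs do: [:job | job on: Error do: [:ex | nil ] ].
```

After the loop, `results` holds the values from the first and third job, and nothing records that the second one failed.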
> So, my idea is to fork only if an error happens:
> add a new protocol to BlockClosure, which allows us to handle
> errors in a special manner:
> 
> [ self doSomething ] on: Error fork: [:ex | "handle error here" ].
> 
> It is similar to #on:do:, except that in case of an error, the error
> handler is invoked in a separate, forked process, while the original
> process simply returns from the closure activation with a nil return
> value.
> But don't think that it is implemented as simply as:
> 
> on: error fork: handleAction
>  ^ self on: error do: [:ex | [handleAction cull: ex ] fork ].
> 
> That would be too easy and therefore useless. :)
> Because when you do it like that, the debugger window which opens on
> the error will not show you the stack contents you want to see.
> You then have to manually inspect the exception, then inspect the
> exception's context and so on, in order to determine what caused the
> error.
> 
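A usage sketch of the behaviour described so far, assuming the attached #on:fork: has been loaded into the image. The names `answers` and `original` are illustrative only, and the final check is one possible form of the "running process is still running" test.

```smalltalk
| answers original |
original := Processor activeProcess.
answers := OrderedCollection new.
#(1 0 2) do: [:each |
    "10 / 0 signals ZeroDivide; the handler block runs in a forked process."
    answers add: ([ 10 / each ] on: ZeroDivide fork: [:ex |
        Transcript show: ex description; cr ]) ].
"Per the description above, the failing iteration should answer nil, and the
 loop should still be running in the original process."
self assert: Processor activeProcess == original.
```

If the method behaves as described, all three iterations complete in the original process and only the error report happens elsewhere.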
> What my implementation does is actually split the stack of the current
> process: all contexts above the #on:fork: method go to the forked
> process, while the original process simply returns to the sender of
> #on:fork: with a nil return value.
> 
> So, consider the original stack of a single process:
> 
> a. <bottom>
> a. ...
> a. ... sender of #on:fork:
> b. <#on:fork:>
> b. #on:do:
> b. ...
> b. ...
> b. ...
> b. SomeError signal.
> b. ...
> b. ...
> b. error handler.
> 
> Then, in case of an exception, all contexts labeled (a) are left in
> the original process, which continues execution from the sender of
> #on:fork:, while the contexts labeled (b) are transferred to a newly
> forked process.
> 
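The stack pictured above is a chain of context objects linked through their senders, which is what makes cutting it in two possible. The following small sketch only inspects the current stack. It is not the attached implementation, and `ctx` and `frames` are illustrative names.

```smalltalk
| ctx frames |
frames := OrderedCollection new.
ctx := thisContext.
"Follow the sender links from the current context down to the bottom of the stack."
[ ctx notNil ] whileTrue: [
    frames add: ctx printString.
    ctx := ctx sender ].
"Print one line per context, top of the stack first."
frames do: [:line | Transcript show: line; cr ].
```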
> This way, if the exception is unhandled (if you put 'ex pass' there),
> the debugger will show the usual stack trace, as you normally see when
> an error occurs, except that you don't see the stack below the #on:do:
> method (which is the context next to #on:fork:).
> But that is already enough information to determine what is causing
> the error, and even to fix it and complete the action!
> 
> So, we can have our cake (fault-tolerant critical services) and eat it
> too (conveniently debug errors which could happen there), without
> extra overhead.
> 
> Please review my implementation. I tried it with a couple of different
> exceptions and it works fine.
> However, there could be some caveats. I tried it on Cog and it works fine.
> 
> If everything is OK, then we can start using this method in
> finalization and in announcements, which will improve our system's
> stability and make it much easier to deal with errors there :)
> 
> -- 
> Best regards,
> Igor Stasenko AKA sig.
> <BlockClosure-onfork.st>



