[Simgrid-user] Fwd: exception handling when host is down

ashish lucknow luck.ashish at gmail.com
Fri Oct 2 10:08:11 CEST 2009


> The process should clean up and then return from its main function ASAP. It
> shouldn't retry running something after this exception.

How do I do this cleanup in my code? Please give me some more hints.
Thanks.
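
A minimal sketch of the "clean up, then return" pattern described in the quoted advice, written in plain C so it compiles without SimGrid. Every name here (sim_status_t, sim_execute, server_process) is hypothetical and only stands in for the real simulator calls. In MSG code the failure would arrive in the CATCH block rather than as a return code, but the control flow would presumably be the same: stop working, free what the process allocated, and return from the process function without retrying.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical status codes: they only stand in for whatever error
 * signal the simulator raises once the host is down. */
typedef enum { SIM_OK = 0, SIM_HOST_FAILURE = 1 } sim_status_t;

/* Hypothetical stand-in for a simulator call such as executing a task.
 * It starts failing at the step where the simulated host goes down. */
static sim_status_t sim_execute(int step, int host_fails_at_step)
{
    return (step >= host_fails_at_step) ? SIM_HOST_FAILURE : SIM_OK;
}

/* Process body. Returns the number of steps completed before the host
 * failed (or before the work ran out). */
static int server_process(int max_steps, int host_fails_at_step)
{
    assert(max_steps >= 0);

    int done = 0;
    char *buffer = malloc(64);   /* memory owned by this process */
    if (buffer == NULL)
        return 0;

    for (int step = 0; step < max_steps; step++) {
        if (sim_execute(step, host_fails_at_step) != SIM_OK) {
            /* The host is dead: do not retry. Leave the loop and
             * fall through to the cleanup code below. */
            break;
        }
        snprintf(buffer, 64, "chunk_%d", step);
        done++;
    }

    /* Single cleanup path, reached on success and on host failure. */
    free(buffer);
    return done;
}
```

For example, server_process(5, 2) completes two steps, sees the failure on the third, frees its buffer and returns. The point of the single exit path is that the failure branch never calls back into the simulator.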


On Fri, Oct 2, 2009 at 1:21 PM, Martin Quinson <martin.quinson at loria.fr> wrote:

> Like I said, you shouldn't catch and swallow this exception. The point
> is that the process is running on a dead host. The process should
> clean up and then return from its main function ASAP. It shouldn't retry
> running something after this exception. There are only two reasons why we
> don't kill the process from the library itself:
> - so that you can free all the memory you malloced in this process
> - because the autorestart mechanism, supposed to reboot your process
> when the host comes back up, is not working atm. This should be better
> in the svn, and in the upcoming 3.3.4 (that I hope to release today).
>
> Bye, Mt.
>
> On Friday, 2 October 2009 at 12:52 +0530, ashish lucknow wrote:
> >
> >
> > ---------- Forwarded message ----------
> > From: ashish lucknow <luck.ashish at gmail.com>
> > Date: Fri, Oct 2, 2009 at 11:01 AM
> > Subject: Re: [Simgrid-user] exception handling when host is down
> > To: Ghislain Charrier <ghislain.charrier at ens-lyon.fr>
> >
> >
> > I am still unable to run my program. I have a ckpclint() that sends
> > dummy tasks to ckpserver(), and Host3, to which ckpserver is attached,
> > will be DOWN at time 1.
> >
> >
> > The program looks like this:
> > int ckpclint(int argc, char *argv[])
> > {
> > - - - --  -
> >  -- - -- -
> >   while (1) {
> >     for (i = 0; i < number_of_chunck; i++) {
> >       TRY {
> >         ret = MSG_task_put(MSG_task_create("test", 0, 0, NULL),
> >                            server[i % localservercount], PORT_22);
> >         INFO1("ret %d", ret);
> >       }
> >       CATCH(e) {
> >         ret = MSG_task_cancel(); // last task cancel with para..
> >         INFO0("exception in sending task");
> >         xbt_ex_free(e);
> >       }
> >     }
> >   }
> > }
> >
> >
> > int ckpserver(int argc, char *argv[])
> > {
> >   INFO0("IN SERVER");
> >   ....................
> >   TRY {
> >     MSG_task_get(&(task), PORT_22);
> >     INFO0("after");
> >   }
> >   CATCH(e) {
> >     //xbt_ex_display(&e);
> >     printf("execpti on taskget\n");
> >     printf("%s\n", e.msg);
> >     RETHROW;
> >   }
> >
> >   xbt_assert0(res == MSG_OK, "MSG_task_get failed");
> >   INFO1("Received \"%s\"", MSG_task_get_name(task));
> >   if (!strcmp(MSG_task_get_name(task), "FINALIZE1")) {
> >     MSG_task_destroy(task);
> >     break;
> >   }
> >
> >   INFO1("Processing \"%s\"", MSG_task_get_name(task));
> >   TRY {
> >     MSG_task_execute(task);
> >   }
> >   CATCH(ee) {
> >     MSG_task_destroy(task);
> >     RETHROW;
> >   }
> >   -..........
> > }
> >
> > Output ......
> >
> >
> > [Host0:ckpclint:(1) 1.000000] [msg_test/INFO] Sent
> > [Host0:ckpclint:(1) 1.000000] [msg_test/INFO] Sending "chunck_2" to "Host3"
> > [Host0:ckpclint:(1) 1.000000] [msg_test/INFO] count 51
> > [Host3:ckpserver:(4) 1.000000] [msg_test/INFO] after
> > [Host3:ckpserver:(4) 1.000000] [msg_test/INFO] Received "chunck_1"
> > [Host3:ckpserver:(4) 1.000000] [msg_test/INFO] Processing "chunck_1"
> > task execute execption
> >
> > ** SimGrid: UNCAUGHT EXCEPTION received on Host3(4): category: unknown_err; value: 0
> > ** Host failed, you cannot call this function. (state=0)
> > ** Thrown by ckpserver() in this process
> > [Host3:ckpserver:(4) 1.000000] xbt/ex.c:117: [xbt_ex/CRITICAL] Host failed, you cannot call this function. (state=0)
> >
> > **   In MSG_task_execute() at /home/luck/temp/simgrid-3.3.3/src/msg/gos.c:47
> > **   In ckpserver() at ??:0
> > **   In smx_ctx_sysv_stop() at /home/luck/temp/simgrid-3.3.3/src/simix/smx_context_sysv.c:145
> > **   In makecontext() at ??:0
> > **   In double_update() at /home/luck/temp/simgrid-3.3.3/src/../src/include/surf/maxmin.h:18
> > **   In surf_solve() at /home/luck/temp/simgrid-3.3.3/src/surf/surf.c:414
> > **   In SIMIX_solve() at /home/luck/temp/simgrid-3.3.3/src/simix/smx_global.c:305
> > **   In MSG_main() at /home/luck/temp/simgrid-3.3.3/src/msg/global.c:161
> > **   In test_all() at ??:0
> > Aborted
> >
> >
> >
> > How can I stop the program from aborting?
> >
> >
> > Ashsih Kumar
> --
> Strengths: What are the major reasons to accept the paper? I did
> not find strengths; I stopped reading the paper on page 12, so I may
> have missed something. -- Bastard Reviewer From Hell
>
>


-- 
Ashsih Kumar