[Simgrid-user] Excessive Memory Use

Martin Quinson martin.quinson at loria.fr
Sat Jul 23 01:52:13 CEST 2011


On Fri, Jul 22, 2011 at 05:00:06PM -0300, Wagner Kolberg wrote:
> Hi guys. We are going to run new tests without the optimization flags
> and so forth, as suggested. But first a few questions...
> 
> On Fri, Jul 22, 2011 at 4:31 AM, Martin Quinson <martin.quinson at loria.fr> wrote:
> >
> > All these callstacks are related to the creation of the processes. The
> > second one is where the user process stack is allocated (user
> > processes are mapped onto something similar to threads but lighter,
> > each of them is allocated a C stack, [1]). The third one is strange,
> > but I don't have enough info to help here.
> >
> 
> This information was quite helpful... We have something similar to
> p2p, with an all-to-all connection. That is, our worker nodes can
> exchange data between them. To simulate this behavior, every time a
> worker receives some data request, it spawns a new simgrid process to
> send the data.
> 
> We ran some tests and observed that after these "data-sending"
> processes were done, the memory use did not decrease. To confirm, we
> finished our 32-node simulation and kept the master node receiving
> heartbeat signals from the workers (which weren't exchanging any more
> data) for a while. The memory use stayed at its maximum, and
> only decreased at the end of the simulation.
> 
> So finally, the question: does simgrid free these process stacks as
> soon as the process function ends (don't know if that's possible), or
> only at the end of the simulation? Can/must we destroy these stacks in
> our code? Is there a function to do that?

It seems that you triggered a bug in the recent rewrite of SIMIX. We
no longer clean up terminated processes until the very end of the
simulation. Try calling 

  SIMIX_process_empty_trash();
  
somewhere in your code to see whether it helps (just add a fake
prototype in your code if you need to, that's an internal function
which does not live in any public header). This function is in charge
of freeing the context of dead processes.

I'd say that in previous versions it was called in each scheduling
round (i.e. each iteration of the simulator's main loop), and now the
only remaining call is in SIMIX_process_killall(), itself only called
from SIMIX_clean().
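
To make the difference concrete, here is a toy model in plain C. It is
NOT SimGrid code: every toy_* name and the stack size are made up for
illustration. It only mimics the behaviour described above, where
terminated processes sit in a trash list and their stacks stay
allocated until something empties that list.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_STACK_SIZE (128 * 1024) /* hypothetical per-process stack size */

/* A terminated process whose stack has not been freed yet. */
typedef struct toy_process {
    void *stack;
    struct toy_process *next;
} toy_process_t;

typedef struct {
    toy_process_t *trash; /* dead processes awaiting cleanup */
    size_t live_stacks;   /* stacks currently allocated */
    size_t peak_stacks;   /* high-water mark of live_stacks */
} toy_sim_t;

/* Spawn a short-lived process that ends right away: it lands in the trash. */
void toy_spawn_and_finish(toy_sim_t *sim) {
    toy_process_t *p = malloc(sizeof *p);
    assert(p != NULL);
    p->stack = malloc(TOY_STACK_SIZE);
    assert(p->stack != NULL);
    p->next = sim->trash;
    sim->trash = p;
    sim->live_stacks++;
    if (sim->live_stacks > sim->peak_stacks)
        sim->peak_stacks = sim->live_stacks;
}

/* Free the stack of every dead process (the role played by the trash cleanup). */
void toy_empty_trash(toy_sim_t *sim) {
    while (sim->trash != NULL) {
        toy_process_t *p = sim->trash;
        sim->trash = p->next;
        free(p->stack);
        free(p);
        sim->live_stacks--;
    }
}

/* Run `rounds` scheduling rounds with `per_round` short-lived processes each.
 * Returns the peak number of stacks allocated at the same time. */
size_t toy_run(size_t rounds, size_t per_round, int clean_each_round) {
    toy_sim_t sim = { NULL, 0, 0 };
    for (size_t r = 0; r < rounds; r++) {
        for (size_t i = 0; i < per_round; i++)
            toy_spawn_and_finish(&sim);
        if (clean_each_round)
            toy_empty_trash(&sim); /* cleanup in every round */
    }
    toy_empty_trash(&sim); /* cleanup at the very end, in both variants */
    assert(sim.live_stacks == 0);
    return sim.peak_stacks;
}
```

With 10 rounds of 4 processes each, toy_run(10, 4, 1) peaks at 4
stacks, while toy_run(10, 4, 0) peaks at 40: memory grows with every
process ever spawned and only drops at the end, which matches what
you observed.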


I think we found your bug (in our code, sorry). Now. Why the heck are
you spawning a process per communication? I fear that it's to achieve
non-blocking communication. If so, you may want to switch to
MSG_task_isend(), MSG_task_irecv(), MSG_comm_test() and
MSG_comm_wait(). Before you ask, MSG_task_dsend() is a detached send
where you send the data on a best-effort basis and don't care whether
it gets received or raises an error (see the chord example). And
MSG_task_isend_with_matching() lets you mimic the MPI tag mechanism.
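
The general pattern, sketched as a toy model in plain C: start each
transfer, keep the handle, and wait on the handles later, all from a
single process. This is NOT the MSG API; the toy_* names and the
simulated clock are assumptions made up for illustration, and real
code would use the MSG calls named above instead.

```c
#include <stddef.h>

/* Handle for a transfer in flight (hypothetical, not an MSG type). */
typedef struct {
    double finish_time; /* simulated time at which the transfer completes */
    int done;           /* set once completion has been observed */
} toy_comm_t;

/* Start a transfer and return at once with a handle. */
toy_comm_t toy_isend(double now, double duration) {
    toy_comm_t c = { now + duration, 0 };
    return c;
}

/* Non-blocking poll: returns 1 once the transfer has completed, else 0. */
int toy_test(toy_comm_t *c, double now) {
    if (now >= c->finish_time)
        c->done = 1;
    return c->done;
}

/* Blocking wait: returns the simulated time at which the caller resumes. */
double toy_wait(toy_comm_t *c, double now) {
    c->done = 1;
    return now > c->finish_time ? now : c->finish_time;
}

/* One worker serves n requests at once, with no helper process per request.
 * `comms` must have room for n handles. Returns the completion time. */
double toy_serve_all(const double *durations, size_t n, toy_comm_t *comms) {
    double now = 0.0;
    for (size_t i = 0; i < n; i++)
        comms[i] = toy_isend(now, durations[i]); /* all transfers start at t=0 */
    for (size_t i = 0; i < n; i++)
        now = toy_wait(&comms[i], now);          /* transfers overlap in time */
    return now;
}
```

In this model, three transfers lasting 3, 1 and 2 time units all
complete by t=3 rather than t=6, because they overlap, and no extra
process (hence no extra stack) is created for any of them.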

Thanks for your time,
Mt.

-- 
Those are my principles, and if you don't like them... well, I have others.
  -- Groucho Marx


