[SimGrid-user] Getting started with SimDAG

Kiril Dichev K.Dichev at qub.ac.uk
Fri May 11 13:29:58 CEST 2018


Excellent! Thanks for attaching the source — it may be of use to others as well!

Regards,
Kiril
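
For anyone finding this thread later: the point made in the quoted messages below (scheduling a task does not run it; only the simulate call moves tasks forward) can be illustrated with a small self-contained toy model. This is only a sketch under stated assumptions: `ToyTask`, `toy_schedule`, `toy_simulate`, `all_done` and `run` are hypothetical names invented for illustration and are not part of the SimDAG API. In real code the corresponding calls are SD_task_schedulel and SD_simulate_with_update.

```cpp
#include <cassert>
#include <vector>

// Toy stand-ins for task states (hypothetical, NOT the real SimDAG enum).
enum class ToyState { Schedulable, Runnable, Done };

struct ToyTask {
    ToyState state = ToyState::Schedulable;
};

// Plays the role of a schedule call: it only marks the task as runnable.
void toy_schedule(ToyTask& t) {
    if (t.state == ToyState::Schedulable) t.state = ToyState::Runnable;
}

// Plays the role of a simulate call: the only place where runnable tasks
// make progress. Returns how many tasks changed state.
int toy_simulate(std::vector<ToyTask>& tasks) {
    int changed = 0;
    for (auto& t : tasks) {
        if (t.state == ToyState::Runnable) {
            t.state = ToyState::Done;
            ++changed;
        }
    }
    return changed;
}

// True only when every task has reached Done.
bool all_done(const std::vector<ToyTask>& tasks) {
    for (const auto& t : tasks)
        if (t.state != ToyState::Done) return false;
    return true;
}

// Schedules at most one task per iteration and, if requested, simulates
// inside the loop. Returns the number of iterations used (capped at
// max_iters so that the "stuck" variant terminates).
int run(std::vector<ToyTask>& tasks, bool simulate_in_loop, int max_iters) {
    int iters = 0;
    while (!all_done(tasks) && iters < max_iters) {
        for (auto& t : tasks) {
            if (t.state == ToyState::Schedulable) {
                toy_schedule(t);
                break;
            }
        }
        if (simulate_in_loop) toy_simulate(tasks);
        ++iters;
    }
    return iters;
}
```

With three tasks, `run(tasks, false, 10)` stops only because of the iteration cap and leaves every task short of Done, which mirrors the hang described in the thread. `run(tasks, true, 10)` finishes after three iterations.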

> On 10 May 2018, at 21:21, Thomas Mcsweeney <thomas.mcsweeney at postgrad.manchester.ac.uk> wrote:
> 
> Hi Kiril,
> 
> Thank you! Your last comment was exactly where the problem was and after modifying my code it now seems to work as intended.
> 
> I have attached my source code (including the dot and xml files I am using to define the DAG and the environment) in case you are interested.
> 
> Thanks again,
> Tom
> From: Kiril Dichev <K.Dichev at qub.ac.uk>
> Sent: Thursday, May 10, 2018 8:51:26 PM
> To: Thomas Mcsweeney
> Cc: simgrid-user at lists.gforge.inria.fr; Mawussi Zounon
> Subject: Re: [SimGrid-user] Getting started with SimDAG
>  
> By the way — only the SD_simulate / SD_simulate_with_update calls actually process the tasks you have scheduled. Unless you call SD_simulate_with_update in your loop after SD_schedule, your simulation will never make progress, and all your tasks will be in SD_RUNNABLE forever (which seems to be what happens). SD_schedule only schedules, but does not process tasks.
> 
> Regards,
> Kiril
> 
> 
> 
>> On 10 May 2018, at 20:23, Kiril Dichev <K.Dichev at qub.ac.uk> wrote:
>> 
>> The snippet is missing includes and has some pseudo code lines.
>> 
>> Can you actually attach a source file that compiles? Source code is often the most self-explanatory piece of information you can provide.
>> 
>> Regards,
>> Kiril
>> 
>>> On 10 May 2018, at 16:39, Thomas Mcsweeney <thomas.mcsweeney at postgrad.manchester.ac.uk> wrote:
>>> 
>>> Hi Kiril,
>>> 
>>> Thanks for the reply!
>>> 
>>> I have taken your advice but still seem to be having difficulty.  I think the problem is that my tasks never seem to get beyond the state SD_RUNNABLE and so other tasks that depend on them never get scheduled.  
>>> 
>>> I have included a very simple version of my code that I am using to debug below, if anybody is willing to have a look and suggest where I might be going wrong. Is there something simple I am missing? Any advice would be greatly appreciated.
>>> 
>>> All the best,
>>> Tom
>>> 
>>> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>> 
>>> #define number_episodes 1              
>>> // Number of times to load the DAG afresh and work through it - never gets past the first iteration so set it to 1 for convenience.
>>> 
>>> SD_task_t get_root(xbt_dynar_t dot){
>>>   // Returns the root of the DAG.
>>>   SD_task_t task;  
>>>   xbt_dynar_get_cpy(dot, 0, &task);
>>>   return task;
>>> }
>>> 
>>> xbt_dynar_t get_ready_tasks(xbt_dynar_t dag) {
>>>   // Returns an array of the tasks ready to be scheduled.
>>>   unsigned int i;
>>>   xbt_dynar_t ready_tasks = xbt_dynar_new(sizeof(SD_task_t), NULL);
>>>   SD_task_t task;  
>>>   xbt_dynar_foreach(dag, i, task)
>>>     if (SD_task_get_state(task) == SD_SCHEDULABLE) {      
>>>       xbt_dynar_push(ready_tasks, &task);
>>>     }
>>>   return ready_tasks;
>>> }
>>> 
>>> bool simulation_complete(xbt_dynar_t dag) {  
>>>   // Check if all the tasks in the DAG are done.
>>>   SD_task_t task;
>>>   int task_index;
>>>   xbt_dynar_foreach(dag, task_index, task) {
>>>     if (SD_task_get_state(task) != SD_DONE) 
>>>       return false;
>>>   }
>>>   return true;  
>>> }
>>> 
>>> int main(int argc, char **argv) {  
>>>   
>>>   SD_init(&argc, argv); // Initialize SimDAG.   
>>>   SD_create_environment("./platform.xml");   // Define the environment.
>>> 
>>>   int task_index, i;
>>>   SD_task_t task; 
>>>   
>>>   const auto total_nworkstations = SD_workstation_get_number();
>>>   const auto workstations = SD_workstation_get_list();  
>>> 
>>>   // Run for number_episodes number of episodes.
>>>   for (i = 0; i < number_episodes; ++i) {     
>>>     
>>>     // Load the DAG from a dot file.
>>>     // Very basic DAG - three nodes c1, c2 and c3 with dependencies c1->c3 and c2->c3, all sequential computations of small amounts.
>>>     auto dag = SD_dotload("./task_graph.dot");    
>>> 
>>>     // Schedule the root task on a random workstation.
>>>     auto root = get_root(dag);
>>>     ... *find a random workstation* ...
>>>     SD_task_schedulel(root, 1, random_workstation);
>>> 
>>>     // Simulate an episode.
>>>     xbt_dynar_t changed_tasks = xbt_dynar_new(sizeof(SD_task_t), NULL);
>>>     SD_simulate_with_update(-1.0, changed_tasks);   
>>>     while(!(xbt_dynar_is_empty(changed_tasks))) {         
>>> 
>>>       // Get the ready task queue.
>>>       auto ready_tasks = get_ready_tasks(dag);
>>>       if (xbt_dynar_is_empty(ready_tasks)) {        
>>>           continue;
>>>       }                            
>>> 
>>>       // Choose some task randomly from the ready tasks.
>>>       r = ...*random index*...        
>>>       xbt_dynar_get_cpy(ready_tasks, r, &task);      
>>> 
>>>       // Choose some workstation randomly.
>>>       workstation = ... *find a random workstation* ...           
>>> 
>>>       // Schedule the chosen task on the chosen workstation.    
>>>       SD_task_schedulel(task, 1, workstation);        
>>>       
>>>       // Check if all tasks in the DAG have been scheduled, and exit if that is the case.      
>>>       if (simulation_complete(dag))
>>>           break;               
>>>     }   
>>> 
>>>     // Tidy up at the end of each episode.
>>>     xbt_dynar_free_container(&changed_tasks);    
>>>   }
>>> 
>>>   // Exit SimDAG.
>>>   SD_exit();
>>> 
>>>   return 0;
>>> }
>>> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>> 
>>> From: Kiril Dichev <K.Dichev at qub.ac.uk>
>>> Sent: Monday, May 7, 2018 1:37:21 PM
>>> To: Thomas Mcsweeney
>>> Cc: simgrid-user at lists.gforge.inria.fr; Mawussi Zounon
>>> Subject: Re: [SimGrid-user] Getting started with SimDAG
>>>  
>>> Hey Thomas,
>>> 
>>> I’ve been using SimDAG quite a lot recently, and one thing I notice is this:
>>> 
>>> auto done = true;
>>> xbt_dynar_foreach(dag, task_index, task) {
>>>     if (SD_task_get_state(task) == SD_NOT_SCHEDULED) {
>>>         done = false;
>>>         break;
>>>     }
>>> }
>>> if (done) 
>>>     break;
>>> 
>>> Your tasks go through a lot more states than SD_NOT_SCHEDULED before they are done. They go through SD_NOT_SCHEDULED -> SD_SCHEDULABLE -> SD_SCHEDULED -> SD_RUNNABLE -> SD_RUNNING -> SD_DONE (or SD_FAILED).
>>> One possible issue is that a task may be in any of the states between SD_NOT_SCHEDULED and SD_DONE, so the code above can incorrectly report "done" while tasks are still lingering in states before SD_DONE. I think you should instead check whether any task at all is in a state other than SD_DONE.
>>> 
>>> I use something like this for my simulation:
>>> 
>>> // Returns true while at least one of the count tasks has not reached SD_DONE.
>>> bool sim_not_done(SD_task_t* kernel_tasks, int count) {
>>>     for (int j = 0; j < count; j++)
>>>         if (SD_task_get_state(kernel_tasks[j]) != SD_DONE)
>>>             return true;
>>>     return false;
>>> }
>>> 
>>> Regards,
>>> Kiril
>>> 
>>>> On 2 May 2018, at 16:26, Thomas Mcsweeney <thomas.mcsweeney at postgrad.manchester.ac.uk> wrote:
>>>> 
>>>> Hello all,
>>>> 
>>>> I am a first-year PhD student at the University of Manchester, studying how we can apply techniques from reinforcement learning to design novel scheduling algorithms for applications on HPC systems, with a focus on linear algebra applications. (I have already corresponded with Frédéric and Arnaud; thank you again for your emails!)
>>>> 
>>>> I have been having some difficulty with some (in principle) simple SimDAG code that I have been working on, and I wonder whether anyone would be able to offer any help.
>>>> 
>>>> Having only recently begun to work with SimDAG, I first worked my way through this tutorial: 
>>>> 
>>>> http://simgrid.gforge.inria.fr/tutorials/simdag-101.pdf
>>>> ("Simulating DAG Scheduling Algorithms with SimDAG", by Frédéric Suter, Martin Quinson and Arnaud Legrand)
>>>> 
>>>> and then moved on to attempt to write a scheduler of my own. However, I have encountered a few problems.
>>>> 
>>>> Basically, I want to initialize a DAG (loaded from a DOT file) on every iteration of a loop (with a small number of iterations), then launch a simulation and work my way through the DAG in the body of the loop. After each iteration, I make use of data gathered to do some reinforcement learning things. The problem is that although my code compiles without a problem, when run it hangs forever after the first iteration and I am not sure why.
>>>> 
>>>> For debugging purposes, I created a simplified version of my code, with all the extraneous reinforcement learning bits removed, to illustrate where my problems seem to be. I am also using the most basic DAG (with just three nodes, from the tutorial mentioned above) and cluster (again, the one from the tutorial) I possibly can, to simplify things further - but I am still having the same problem (this simplified code is what I have quoted from below).
>>>> 
>>>> In each iteration of the loop, after loading the DAG with SD_dotload, I add watchpoints to all the tasks in it (as in the tutorial). I then schedule the root task on a random workstation:
>>>> 
>>>> auto root = get_root(dag);
>>>> int r = r_workstations(mt);       // r = random number chosen from indices of the workstations.
>>>> auto random_workstation = workstations[r];
>>>> SD_task_schedulel(root, 1, random_workstation); 
>>>> 
>>>> Then I begin to simulate:
>>>> 
>>>> xbt_dynar_t changed_tasks = xbt_dynar_new(sizeof(SD_task_t), NULL);
>>>> SD_simulate_with_update(-1.0, changed_tasks); 
>>>> while(!(xbt_dynar_is_empty(changed_tasks)))
>>>> 
>>>> (I am not sure at all of the correct syntax for doing this so it wouldn’t surprise me if it is incorrect - the example in the tutorial doesn’t seem to work for me, so the syntax I used above I got from another example I found somewhere, although I can’t remember where.)
>>>> 
>>>> Then in the body of the while loop, I find the tasks ready to be scheduled, choose one at random, then choose a workstation at random and schedule the chosen task on the chosen workstation.
>>>> 
>>>> // Get the ready task queue.
>>>> auto ready_tasks = get_ready_tasks(dag);
>>>> if (xbt_dynar_is_empty(ready_tasks))
>>>>      continue;
>>>> auto n_ready_tasks = xbt_dynar_length(ready_tasks); 
>>>> 
>>>> // Choose some task from the ready tasks.
>>>> std::uniform_int_distribution<int> r_tasks(0, n_ready_tasks - 1);
>>>> r = r_tasks(mt);
>>>> xbt_dynar_get_cpy(ready_tasks, r, &task);
>>>> 
>>>> // Choose some workstation randomly.
>>>> auto r = r_workstations(mt);
>>>> auto workstation = workstations[r];
>>>> 
>>>> // Schedule the chosen task on the chosen workstation. 
>>>> SD_task_schedulel(task, 1, workstation);
>>>> 
>>>> Here, get_ready_tasks is a function taken from one of the examples in the tutorial:
>>>> 
>>>> xbt_dynar_t get_ready_tasks(xbt_dynar_t dag) {
>>>>     unsigned int i;
>>>>     xbt_dynar_t ready_tasks = xbt_dynar_new(sizeof(SD_task_t), NULL);
>>>>     SD_task_t task;
>>>>     xbt_dynar_foreach(dag, i, task)
>>>>          if (SD_task_get_kind(task) == SD_TASK_COMP_SEQ && SD_task_get_state(task) == SD_SCHEDULABLE)
>>>>              xbt_dynar_push(ready_tasks, &task);
>>>>     return ready_tasks;
>>>> }
>>>> 
>>>> (At the moment, I check if the tasks are of kind SD_TASK_COMP_SEQ so I can use SD_task_schedulel, but this shouldn’t be a problem since all the tasks in my DAG are of this type.)
>>>> 
>>>> I then finish each iteration of the while loop by checking if we have finished scheduling all the tasks in the DAG:
>>>> 
>>>> auto done = true;
>>>> xbt_dynar_foreach(dag, task_index, task) {
>>>>     if (SD_task_get_state(task) == SD_NOT_SCHEDULED) {
>>>>         done = false;
>>>>         break;
>>>>     }
>>>> }
>>>> if (done) 
>>>>     break;
>>>> 
>>>> (NB: is there a better way to check if we have scheduled all the tasks in the DAG?)
>>>> 
>>>> As I said, the code compiles just fine but when I run it, it never seems to get past the first iteration of the loop and I eventually have to kill it, and I haven't been able to locate precisely what the problem is.  Again, I am very much a beginner so I am sure that it is just a silly, basic error on my part, but any guidance or suggestions that you may have would be greatly appreciated.
>>>> 
>>>> (Note that I can of course also provide a fuller code and more detail if anybody wishes.)
>>>> 
>>>> All the best,
>>>> Tom 
>>>> 
>>>> 
>>>> _______________________________________________
>>>> Simgrid-user mailing list
>>>> Simgrid-user at lists.gforge.inria.fr
>>>> https://lists.gforge.inria.fr/mailman/listinfo/simgrid-user
> 
> <basic_example.cpp><platform.xml><task_graph.dot>
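
The completion check discussed in the quoted messages above can be sketched in a self-contained way. This is a toy model only: `TaskState`, `none_unscheduled` and `every_task_done` are hypothetical names made up for illustration, mirroring the state sequence listed in the thread rather than using the real SimDAG types.

```cpp
#include <cassert>
#include <vector>

// Hypothetical mirror of the state sequence listed in the thread
// (NOT the real SimDAG enum).
enum class TaskState {
    NotScheduled, Schedulable, Scheduled, Runnable, Running, Done, Failed
};

// Flawed check: reports "finished" as soon as no task is NotScheduled,
// even if tasks are still in one of the in-between states.
bool none_unscheduled(const std::vector<TaskState>& states) {
    for (TaskState s : states)
        if (s == TaskState::NotScheduled) return false;
    return true;
}

// Check suggested in the thread: finished only when every task is Done.
bool every_task_done(const std::vector<TaskState>& states) {
    for (TaskState s : states)
        if (s != TaskState::Done) return false;
    return true;
}
```

For the states {Done, Runnable, Schedulable}, `none_unscheduled` already returns true while `every_task_done` returns false, which is the premature "done" described above. Note that in this toy model a task ending in Failed never satisfies `every_task_done`, so a real loop would need to handle that case separately.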
