[SimGrid-user] Getting started with SimDAG

Thomas Mcsweeney thomas.mcsweeney at postgrad.manchester.ac.uk
Wed May 2 17:26:51 CEST 2018


Hello all,


I am a first-year PhD student at the University of Manchester, studying how we can apply techniques from reinforcement learning to design novel scheduling algorithms for applications on HPC systems, with a focus on linear algebra applications. (I have already corresponded with Frédéric and Arnaud; thank you again for your emails!).


I have been having some difficulty with some SimDAG code that should, in principle, be simple, and I wonder whether anyone could offer some help.

Having only recently begun to work with SimDAG, I first worked my way through this tutorial:


http://simgrid.gforge.inria.fr/tutorials/simdag-101.pdf ("Simulating DAG Scheduling Algorithms with SimDAG", by Frédéric Suter (CNRS, IN2P3 Computing Center), Martin Quinson (Nancy University) and Arnaud Legrand (CNRS, Grenoble University)),


and then moved on to writing a scheduler of my own. However, I have run into a few problems.


Basically, on every iteration of a loop (with a small number of iterations), I want to load a DAG from a DOT file, launch a simulation, and work my way through the DAG in the body of the loop. After each iteration, I use the data gathered to perform the reinforcement learning steps. The problem is that, although my code compiles without errors, it hangs forever after the first iteration when run, and I am not sure why.


For debugging purposes, I created a simplified version of my code, with all the extraneous reinforcement learning parts removed, to show where the problem seems to be. To simplify things further, I am also using the most basic DAG (just three nodes, from the tutorial mentioned above) and the most basic cluster (again, the one from the tutorial) that I can, but the problem persists. The code quoted below comes from this simplified version.


In each iteration of the loop, after loading the DAG with SD_dotload, I add watchpoints to all the tasks in it (as in the tutorial). I then schedule the root task on a random workstation:


auto root = get_root(dag);
int r = r_workstations(mt);  // r = random number chosen from indices of the workstations.
auto random_workstation = workstations[r];
SD_task_schedulel(root, 1, random_workstation);
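The snippet above uses `mt` and `r_workstations` without showing their declarations. Below is a minimal sketch of how they might be set up with the standard `<random>` header. `Workstations` and `random_workstation_index` are hypothetical names, and a plain vector of host names stands in for the SimDAG workstation array. Because `std::uniform_int_distribution` includes both of its bounds, the upper bound has to be `size() - 1`:

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Hypothetical stand-in for the array of workstations (host names only).
using Workstations = std::vector<std::string>;

// Returns a uniformly random valid index into `workstations`.
// uniform_int_distribution's bounds are inclusive, hence size() - 1.
std::size_t random_workstation_index(const Workstations& workstations,
                                     std::mt19937& mt) {
  assert(!workstations.empty());  // no valid index exists for an empty list
  std::uniform_int_distribution<std::size_t> r_workstations(
      0, workstations.size() - 1);
  return r_workstations(mt);
}
```

Construct the engine once, outside the outer loop (e.g. `std::mt19937 mt(seed);`). With a fixed seed, each run produces the same sequence of random choices, which can make the learning loop easier to debug.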


Then I begin to simulate:


xbt_dynar_t changed_tasks = xbt_dynar_new(sizeof(SD_task_t), NULL);
SD_simulate_with_update(-1.0, changed_tasks);
while (!xbt_dynar_is_empty(changed_tasks)) {
    // ... loop body, described below ...
}


(I am not at all sure that this is the correct syntax, so it wouldn't surprise me if it is wrong. The example in the tutorial doesn't seem to work for me, so I took the syntax above from another example I found somewhere, although I can't remember where.)


Then in the body of the while loop, I find the tasks ready to be scheduled, choose one at random, then choose a workstation at random and schedule the chosen task on the chosen workstation.


// Get the ready task queue.
auto ready_tasks = get_ready_tasks(dag);
if (xbt_dynar_is_empty(ready_tasks))
    continue;
auto n_ready_tasks = xbt_dynar_length(ready_tasks);

// Choose some task from the ready tasks at random.
std::uniform_int_distribution<int> r_tasks(0, static_cast<int>(n_ready_tasks) - 1);
int task_choice = r_tasks(mt);
SD_task_t task;
xbt_dynar_get_cpy(ready_tasks, task_choice, &task);

// Choose some workstation at random.
int ws_choice = r_workstations(mt);
auto workstation = workstations[ws_choice];

// Schedule the chosen task on the chosen workstation.
SD_task_schedulel(task, 1, workstation);


Here, get_ready_tasks is a function taken from one of the examples in the tutorial:


xbt_dynar_t get_ready_tasks(xbt_dynar_t dag) {
    unsigned int i;
    xbt_dynar_t ready_tasks = xbt_dynar_new(sizeof(SD_task_t), NULL);
    SD_task_t task;
    xbt_dynar_foreach(dag, i, task) {
        if (SD_task_get_kind(task) == SD_TASK_COMP_SEQ && SD_task_get_state(task) == SD_SCHEDULABLE)
            xbt_dynar_push(ready_tasks, &task);
    }
    return ready_tasks;
}


(For now, I check that the tasks are of kind SD_TASK_COMP_SEQ so that I can use SD_task_schedulel. This shouldn't be a problem, since all the tasks in my DAG are of this type.)


I then finish each iteration of the while loop by checking whether all the tasks in the DAG have been scheduled:


auto done = true;
xbt_dynar_foreach(dag, task_index, task) {
    if (SD_task_get_state(task) == SD_NOT_SCHEDULED) {
        done = false;
        break;
    }
}
if (done)
    break;


(NB: is there a better way to check whether all the tasks in the DAG have been scheduled?)
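SimDAG's documented state semantics appear to be as follows: a task moves from SD_NOT_SCHEDULED to SD_SCHEDULABLE once its dependencies are satisfied, and it reaches SD_SCHEDULED only when a schedule call such as SD_task_schedulel is made on it. If that is right, the check above can stop too early. A task that is ready but has not yet been given a host is SD_SCHEDULABLE, not SD_NOT_SCHEDULED, so the loop can break while it is still unscheduled.

The sketch below shows a stricter test. It is written against a hypothetical `TaskState` enum rather than the real SimGrid types. It treats both states as "not yet scheduled" and replaces the manual flag loop with `std::none_of`:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical enum whose names roughly mirror the SimDAG task states;
// it is NOT the SimGrid type, only a stand-in for this sketch.
enum class TaskState {
  NotScheduled, Schedulable, Scheduled, Runnable, Running, Done, Failed
};

// A task still needs a scheduling decision if it has no host yet, whether or
// not its dependencies are already satisfied (NotScheduled vs Schedulable).
bool needs_scheduling(TaskState s) {
  return s == TaskState::NotScheduled || s == TaskState::Schedulable;
}

// True once no task in the DAG is waiting for a scheduling decision.
bool all_scheduled(const std::vector<TaskState>& states) {
  return std::none_of(states.begin(), states.end(), needs_scheduling);
}
```

In the real loop, the equivalent change is to compare SD_task_get_state(task) against both SD_NOT_SCHEDULED and SD_SCHEDULABLE inside the xbt_dynar_foreach.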


As I said, the code compiles fine, but when I run it, it never seems to get past the first iteration of the loop and I eventually have to kill it. I haven't been able to pinpoint the problem. Again, I am very much a beginner, so I am sure it is just a silly, basic error on my part, but any guidance or suggestions would be greatly appreciated.
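One possible cause, offered as an untested hypothesis: as quoted, the loop body never calls SD_simulate_with_update again. That has three consequences:

- `changed_tasks` keeps its first contents, so the `while` condition stays true.
- No scheduled task ever completes.
- Once no task is ready, the `continue` branch spins forever.

The toy C++ model below involves no SimGrid at all. `MockState`, `MockTask`, `mock_simulate_with_update` and `mock_schedule_all` are hypothetical stand-ins, and one "simulate" call simply completes every scheduled task. It shows the control flow with the simulation advanced on every pass through the loop:

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical task states; names loosely mirror SimDAG's, but this is a toy.
enum class MockState { NotScheduled, Schedulable, Scheduled, Done };

struct MockTask {
  std::vector<std::size_t> parents;  // indices of predecessor tasks
  MockState state = MockState::NotScheduled;
};

// Promote not-yet-scheduled tasks whose parents are all Done to Schedulable.
void mock_update_schedulable(std::vector<MockTask>& dag) {
  for (auto& t : dag) {
    if (t.state != MockState::NotScheduled) continue;
    bool ready = true;
    for (std::size_t p : t.parents)
      if (dag[p].state != MockState::Done) ready = false;
    if (ready) t.state = MockState::Schedulable;
  }
}

// Toy "simulate": completes every scheduled task and records it in `changed`.
void mock_simulate_with_update(std::vector<MockTask>& dag,
                               std::vector<std::size_t>& changed) {
  for (std::size_t i = 0; i < dag.size(); ++i) {
    if (dag[i].state == MockState::Scheduled) {
      dag[i].state = MockState::Done;
      changed.push_back(i);
    }
  }
  mock_update_schedulable(dag);
}

// Random list scheduling; returns the number of passes through the loop body.
int mock_schedule_all(std::vector<MockTask>& dag, std::mt19937& mt) {
  assert(!dag.empty());
  mock_update_schedulable(dag);
  // As in the post, schedule the root (assumed to be task 0) before simulating.
  assert(dag[0].state == MockState::Schedulable);
  dag[0].state = MockState::Scheduled;

  std::vector<std::size_t> changed;
  mock_simulate_with_update(dag, changed);
  int passes = 0;
  while (!changed.empty()) {
    ++passes;
    std::vector<std::size_t> ready;
    for (std::size_t i = 0; i < dag.size(); ++i)
      if (dag[i].state == MockState::Schedulable) ready.push_back(i);
    if (!ready.empty()) {
      std::uniform_int_distribution<std::size_t> pick(0, ready.size() - 1);
      dag[ready[pick(mt)]].state = MockState::Scheduled;
    }
    // Key step: refresh `changed` by advancing the simulation on every pass,
    // including passes with no ready task; without it the loop never ends.
    changed.clear();
    mock_simulate_with_update(dag, changed);
  }
  return passes;
}
```

In the SimDAG code itself, the analogous change would presumably be to call `xbt_dynar_reset(changed_tasks);` and then `SD_simulate_with_update(-1.0, changed_tasks);` in two places: at the end of the loop body and just before the `continue`. The dynar returned by get_ready_tasks would also need releasing (e.g. with `xbt_dynar_free_container(&ready_tasks);`) on each pass.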


(Of course, I can also provide the full code and more detail if anyone wishes.)


All the best,

Tom



