[SimGrid-user] Getting started with SimDAG

Thomas Mcsweeney thomas.mcsweeney at postgrad.manchester.ac.uk
Thu May 10 22:21:12 CEST 2018


Hi Kiril,


Thank you! Your last comment was exactly where the problem was and after modifying my code it now seems to work as intended.


I have attached my source code (including the dot and xml files I use to define the DAG and the environment) in case you are interested.


Thanks again,

Tom

________________________________
From: Kiril Dichev <K.Dichev at qub.ac.uk>
Sent: Thursday, May 10, 2018 8:51:26 PM
To: Thomas Mcsweeney
Cc: simgrid-user at lists.gforge.inria.fr; Mawussi Zounon
Subject: Re: [SimGrid-user] Getting started with SimDAG

By the way, only the SD_simulate / SD_simulate_with_update calls actually process the tasks you have scheduled. Unless you call SD_simulate_with_update in your loop after SD_task_schedulel, your simulation will never make progress, and all your scheduled tasks will stay in SD_RUNNABLE forever (which seems to be what happens). Scheduling a task only assigns it to a workstation; it does not process the task.
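
To illustrate the point above, here is a minimal sketch of a toy model in plain C++. It is not SimGrid code: ToyState, ToyTask, toy_schedule, toy_simulate and run_episode are hypothetical names invented for this illustration, and the state machine is a simplified assumption about how scheduling and simulation interact. It only shows why a loop that schedules without ever simulating cannot make progress.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-ins for the SimDAG task states (hypothetical, simplified).
enum class ToyState { NotScheduled, Schedulable, Runnable, Done };

struct ToyTask {
    ToyState state;
    std::vector<std::size_t> parents;  // indices of the tasks this one depends on
};

// Three tasks c1, c2, c3 with dependencies c1->c3 and c2->c3.
// The two roots have no parents, so they start out schedulable.
std::vector<ToyTask> make_toy_dag() {
    return {
        {ToyState::Schedulable, {}},
        {ToyState::Schedulable, {}},
        {ToyState::NotScheduled, {0, 1}},
    };
}

// Scheduling only marks a schedulable task as ready to run; nothing executes.
void toy_schedule(ToyTask& task) {
    assert(task.state == ToyState::Schedulable);
    task.state = ToyState::Runnable;
}

// Simulating is the only step that completes runnable tasks and then
// unlocks children whose parents are all done. Returns how many tasks finished.
std::size_t toy_simulate(std::vector<ToyTask>& dag) {
    std::size_t finished = 0;
    for (auto& task : dag) {
        if (task.state == ToyState::Runnable) {
            task.state = ToyState::Done;
            ++finished;
        }
    }
    for (auto& task : dag) {
        if (task.state != ToyState::NotScheduled) continue;
        const bool parents_done = std::all_of(
            task.parents.begin(), task.parents.end(),
            [&dag](std::size_t p) { return dag[p].state == ToyState::Done; });
        if (parents_done) task.state = ToyState::Schedulable;
    }
    return finished;
}

bool all_done(const std::vector<ToyTask>& dag) {
    return std::all_of(dag.begin(), dag.end(),
                       [](const ToyTask& t) { return t.state == ToyState::Done; });
}

// Schedules every schedulable task each round and, optionally, simulates.
// Stops when all tasks are done or after max_rounds; returns the rounds used.
std::size_t run_episode(std::vector<ToyTask>& dag, bool call_simulate,
                        std::size_t max_rounds) {
    std::size_t rounds = 0;
    while (!all_done(dag) && rounds < max_rounds) {
        for (auto& task : dag) {
            if (task.state == ToyState::Schedulable) toy_schedule(task);
        }
        if (call_simulate) toy_simulate(dag);
        ++rounds;
    }
    return rounds;
}
```

In this toy model, run_episode(dag, true, 10) finishes the three-task DAG in two rounds, while run_episode(dag, false, 10) uses up all ten rounds with the two roots still Runnable and c3 never unlocked. That mirrors the hang described in this thread.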

Regards,
Kiril



On 10 May 2018, at 20:23, Kiril Dichev <K.Dichev at qub.ac.uk> wrote:

The snippet is missing includes and has some pseudocode lines.

Can you actually attach a source file that compiles? Source code is often the most self-explanatory piece of information you can possibly provide.

Regards,
Kiril

On 10 May 2018, at 16:39, Thomas Mcsweeney <thomas.mcsweeney at postgrad.manchester.ac.uk> wrote:

Hi Kiril,

Thanks for the reply!

I have taken your advice but still seem to be having difficulty.  I think the problem is that my tasks never seem to get beyond the state SD_RUNNABLE and so other tasks that depend on them never get scheduled.

I have included a very simple version of my code that I am using to debug below, if anybody is willing to have a look and suggest where I might be going wrong. Is there something simple I am missing? Any advice would be greatly appreciated.

All the best,
Tom

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------


#define number_episodes 1
// Number of times to load the DAG afresh and work through it - never gets past the first iteration so set it to 1 for convenience.

SD_task_t get_root(xbt_dynar_t dot){
  // Returns the root of the DAG.
  SD_task_t task;
  xbt_dynar_get_cpy(dot, 0, &task);
  return task;
}

xbt_dynar_t get_ready_tasks(xbt_dynar_t dag) {
  // Returns an array of the tasks ready to be scheduled.
  unsigned int i;
  xbt_dynar_t ready_tasks = xbt_dynar_new(sizeof(SD_task_t), NULL);
  SD_task_t task;
  xbt_dynar_foreach(dag, i, task)
    if (SD_task_get_state(task) == SD_SCHEDULABLE) {
      xbt_dynar_push(ready_tasks, &task);
    }
  return ready_tasks;
}

bool simulation_complete(xbt_dynar_t dag) {
  // Check if all the tasks in the DAG are done.
  SD_task_t task;
  int task_index;
  xbt_dynar_foreach(dag, task_index, task) {
    if (SD_task_get_state(task) != SD_DONE)
      return false;
  }
  return true;
}

int main(int argc, char **argv) {

  SD_init(&argc, argv); // Initialize SimDAG.
  SD_create_environment("./platform.xml");   // Define the environment.

  int task_index, i;
  SD_task_t task;

  const auto total_nworkstations = SD_workstation_get_number();
  const auto workstations = SD_workstation_get_list();

  // Run for number_episodes number of episodes.
  for (i = 0; i < number_episodes; ++i) {

    // Load the DAG from a dot file.
    // Very basic DAG - three nodes c1, c2 and c3 with dependencies c1->c3 and c2->c3, all sequential computations of small amounts.
    auto dag = SD_dotload("./task_graph.dot");

    // Schedule the root task on a random workstation.
    auto root = get_root(dag);
    ... *find a random workstation* ...
    SD_task_schedulel(root, 1, random_workstation);

    // Simulate an episode.
    xbt_dynar_t changed_tasks = xbt_dynar_new(sizeof(SD_task_t), NULL);
    SD_simulate_with_update(-1.0, changed_tasks);
    while(!(xbt_dynar_is_empty(changed_tasks))) {

      // Get the ready task queue.
      auto ready_tasks = get_ready_tasks(dag);
      if (xbt_dynar_is_empty(ready_tasks)) {
          continue;
      }

      // Choose some task randomly from the ready tasks.
      r = ...*random index*...
      xbt_dynar_get_cpy(ready_tasks, r, &task);

      // Choose some workstation randomly.
      workstation = ... *find a random workstation* ...

      // Schedule the chosen task on the chosen workstation.
      SD_task_schedulel(task, 1, workstation);

      // Check if all tasks in the DAG have been scheduled, and exit if that is the case.
      if (simulation_complete(dag))
          break;
    }

    // Tidy up at the end of each episode.
    xbt_dynar_free_container(&changed_tasks);
  }

  // Exit SimDAG.
  SD_exit();

  return 0;
}
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------

________________________________
From: Kiril Dichev <K.Dichev at qub.ac.uk>
Sent: Monday, May 7, 2018 1:37:21 PM
To: Thomas Mcsweeney
Cc: simgrid-user at lists.gforge.inria.fr; Mawussi Zounon
Subject: Re: [SimGrid-user] Getting started with SimDAG

Hey Thomas,

I’ve been using SimDAG quite a lot recently, and one thing I notice is this:

auto done = true;
xbt_dynar_foreach(dag, task_index, task) {
    if (SD_task_get_state(task) == SD_NOT_SCHEDULED) {
        done = false;
        break;
    }
}
if (done)
    break;

Your tasks go through a lot more states than SD_NOT_SCHEDULED before they are done. They go through SD_NOT_SCHEDULED -> SD_SCHEDULABLE -> SD_SCHEDULED -> SD_RUNNABLE -> SD_RUNNING -> SD_DONE (or SD_FAILED).
One possible issue is that a task may be in any of the in-between states between SD_NOT_SCHEDULED and SD_DONE. The code above would then incorrectly report "done" while tasks are still lingering in states before SD_DONE. I think you should instead check whether any task at all is in a state other than SD_DONE.

I use something like this for my simulation:

bool sim_not_done(SD_task_t* kernel_tasks) {
    for (int j = 0; j < count; j++)
        if (SD_task_get_state(kernel_tasks[j]) != SD_DONE)
            return true;
    return false;
}
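
The difference between the two checks can be sketched in plain C++, independent of SimGrid. The TaskState enum below is a hypothetical stand-in that only mirrors the state names listed above, and none_unscheduled and all_done are illustrative names, not library functions.

```cpp
#include <algorithm>
#include <cassert>  // only needed if you add quick checks like those described below
#include <vector>

// Hypothetical stand-in for the SimDAG task states named in this thread.
enum class TaskState {
    NotScheduled, Schedulable, Scheduled, Runnable, Running, Done, Failed
};

// The original check: reports "done" as soon as no task is NotScheduled,
// even if some tasks are still waiting or running.
bool none_unscheduled(const std::vector<TaskState>& states) {
    return std::none_of(states.begin(), states.end(), [](TaskState s) {
        return s == TaskState::NotScheduled;
    });
}

// The stricter check: the episode is finished only when every task is Done.
bool all_done(const std::vector<TaskState>& states) {
    return std::all_of(states.begin(), states.end(), [](TaskState s) {
        return s == TaskState::Done;
    });
}
```

For states {Done, Runnable, Schedulable}, none_unscheduled returns true although two tasks have not finished, whereas all_done returns false. Both return true only once every task is Done.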

Regards,
Kiril

On 2 May 2018, at 16:26, Thomas Mcsweeney <thomas.mcsweeney at postgrad.manchester.ac.uk> wrote:


Hello all,

I am a first-year PhD student at the University of Manchester, studying how we can apply techniques from reinforcement learning to design novel scheduling algorithms for applications on HPC systems, with a focus on linear algebra applications. (I have already corresponded with Frédéric and Arnaud; thank you again for your emails!)

I have been having some difficulty with some (in principle) simple SimDAG code that I have been working on and wonder whether anyone would be able to offer any help?

Having only recently begun to work with SimDAG, I first worked my way through this tutorial:

http://simgrid.gforge.inria.fr/tutorials/simdag-101.pdf,


and then moved on to attempt to write a scheduler of my own. However, I have encountered a few problems.

Basically, I want to initialize a DAG (loaded from a DOT file) on every iteration of a loop (with a small number of iterations), then launch a simulation and work my way through the DAG in the body of the loop. After each iteration, I make use of data gathered to do some reinforcement learning things. The problem is that although my code compiles without a problem, when run it hangs forever after the first iteration and I am not sure why.

For debugging purposes, I created a simplified version of my code, with all the extraneous reinforcement learning bits removed, to illustrate where my problems seem to be. I am also using the most basic DAG (with just three nodes, from the tutorial mentioned above) and cluster (again, the one from the tutorial) I possibly can, to simplify things further - but I am still having the same problem (this simplified code is what I have quoted from below).

In each iteration of the loop, after loading the DAG with SD_dotload, I add watchpoints to all the tasks in it (as in the tutorial). I then schedule the root task on a random workstation:

auto root = get_root(dag);
int r = r_workstations(mt);       // r = random number chosen from indices of the workstations.
auto random_workstation = workstations[r];
SD_task_schedulel(root, 1, random_workstation);

Then I begin to simulate:

xbt_dynar_t changed_tasks = xbt_dynar_new(sizeof(SD_task_t), NULL);
SD_simulate_with_update(-1.0, changed_tasks);
while(!(xbt_dynar_is_empty(changed_tasks)))

(I am not sure at all of the correct syntax for doing this so it wouldn’t surprise me if it is incorrect - the example in the tutorial doesn’t seem to work for me, so the syntax I used above I got from another example I found somewhere, although I can’t remember where.)

Then in the body of the while loop, I find the tasks ready to be scheduled, choose one at random, then choose a workstation at random and schedule the chosen task on the chosen workstation.

// Get the ready task queue.
auto ready_tasks = get_ready_tasks(dag);
if (xbt_dynar_is_empty(ready_tasks))
     continue;
auto n_ready_tasks = xbt_dynar_length(ready_tasks);

// Choose some task from the ready tasks.
std::uniform_int_distribution<int> r_tasks(0, n_ready_tasks - 1);
r = r_tasks(mt);
xbt_dynar_get_cpy(ready_tasks, r, &task);

// Choose some workstation randomly.
auto r = r_workstations(mt);
auto workstation = workstations[r];

// Schedule the chosen task on the chosen workstation.
SD_task_schedulel(task, 1, workstation);

Here, get_ready_tasks is a function taken from one of the examples in the tutorial:

xbt_dynar_t get_ready_tasks(xbt_dynar_t dag) {

    unsigned int i;
    xbt_dynar_t ready_tasks = xbt_dynar_new(sizeof(SD_task_t), NULL);
    SD_task_t task;

    xbt_dynar_foreach(dag, i, task)
         if (SD_task_get_kind(task) == SD_TASK_COMP_SEQ && SD_task_get_state(task) == SD_SCHEDULABLE)
             xbt_dynar_push(ready_tasks, &task);
    return ready_tasks;
}

(At the moment, I check if the tasks are of kind SD_TASK_COMP_SEQ so I can use SD_task_schedulel, but this shouldn’t be a problem since all the tasks in my DAG are of this type.)

I then finish each iteration of the while loop by checking if we have finished scheduling all the tasks in the DAG:

auto done = true;
xbt_dynar_foreach(dag, task_index, task) {
    if (SD_task_get_state(task) == SD_NOT_SCHEDULED) {
        done = false;
        break;
    }
}
if (done)
    break;

(NB: is there a better way to check if we have scheduled all the tasks in the DAG?)

As I said, the code compiles just fine but when I run it, it never seems to get past the first iteration of the loop and I eventually have to kill it, and I haven't been able to locate precisely what the problem is.  Again, I am very much a beginner so I am sure that it is just a silly, basic error on my part, but any guidance or suggestions that you may have would be greatly appreciated.

(Note that I can of course also provide fuller code and more detail if anybody wishes.)

All the best,
Tom



_______________________________________________
Simgrid-user mailing list
Simgrid-user at lists.gforge.inria.fr
https://lists.gforge.inria.fr/mailman/listinfo/simgrid-user

-------------- next part --------------
A non-text attachment was scrubbed...
Name: basic_example.cpp
Type: text/x-c++src
Size: 4210 bytes
Desc: basic_example.cpp
URL: <http://lists.gforge.inria.fr/pipermail/simgrid-user/attachments/20180510/0d6c959d/attachment-0001.cpp>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: platform.xml
Type: text/xml
Size: 652 bytes
Desc: platform.xml
URL: <http://lists.gforge.inria.fr/pipermail/simgrid-user/attachments/20180510/0d6c959d/attachment-0001.xml>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: task_graph.dot
Type: application/msword-template
Size: 200 bytes
Desc: task_graph.dot
URL: <http://lists.gforge.inria.fr/pipermail/simgrid-user/attachments/20180510/0d6c959d/attachment-0001.bin>

