[Cado-nfs-discuss] Statistic Visualization

Emmanuel Thomé Emmanuel.Thome at inria.fr
Tue Jan 3 09:58:06 CET 2017



From a cado-nfs standpoint, we are more interested in doing the following:
 1 - think about which data needs to be extracted, 
 2 - how to actually extract it,
 3 - and present it nicely.

Item 3 above is less important than the previous two. And eye candy is
probably the least important thing.

In my experience when doing large computations, I had some interest in
doing the following things:

 - monitor, over time, the number of resubmitted WUs (some jobs get
   killed without notice; this happens when we use idle time on
   otherwise busy HPC resources).
 - monitor, over time, the number of WUs processed, and relations
   obtained, by each "group" of machines. I've run computations spanning
   several clusters which come on and off. The aim is to get an idea of
   the average yield of a given "group", say over a week or so.
 - order nodes (or groups of nodes) by last completed WUs. This is
   potentially very useful to detect when the connection to some given
   set of nodes breaks.
 - accumulate some definition of CPU time spent -- which is a bit tricky
   when nodes are oversubscribed. Somehow the WU point of view does not
   have enough info to compute that.
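
As an illustration of the third item above, here is a minimal Python
sketch that orders groups by their last completed WU and flags the ones
that have gone silent. It is not part of cado-nfs: the function names, the
client_to_group mapping and the sample data are all hypothetical, and the
completion timestamps are assumed to have been extracted beforehand.

```python
from datetime import datetime, timedelta

def last_completion_by_group(completions, client_to_group):
    """Return (group, last_completion) pairs, least recently active first.

    completions: iterable of (clientid, datetime) pairs, one per finished WU.
    client_to_group: dict mapping clientid to a group name (hypothetical).
    """
    last = {}
    for clientid, when in completions:
        group = client_to_group.get(clientid, "unknown")
        if group not in last or when > last[group]:
            last[group] = when
    return sorted(last.items(), key=lambda item: item[1])

def silent_groups(completions, client_to_group, now, max_silence):
    """List groups whose last completed WU is older than max_silence."""
    ordered = last_completion_by_group(completions, client_to_group)
    return [group for group, when in ordered if now - when > max_silence]

# Made-up sample data, for illustration only.
now = datetime(2017, 1, 3, 10, 0, 0)
client_to_group = {"nodeA+1": "groupA", "nodeA+2": "groupA", "nodeB+1": "groupB"}
completions = [
    ("nodeA+1", now - timedelta(minutes=5)),
    ("nodeA+2", now - timedelta(minutes=50)),
    ("nodeB+1", now - timedelta(hours=7)),
]
ordered = last_completion_by_group(completions, client_to_group)
silent = silent_groups(completions, client_to_group, now, timedelta(hours=4))
print(ordered)
print(silent)
```

A group that shows up in the second list has produced nothing for longer
than the chosen threshold, which is the symptom one would expect when the
connection to a set of nodes breaks.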

It is also important that this data can be extracted regardless of which
database back-end is used (sqlite or mysql).

The notion of "groups" of machines used above calls for some external data.
In our latest hsnfs1024 computation, I defined a few auxiliary tables to
provide this data, e.g.

mysql> select * from clusters limit 3;
| cluster   | site                | cores | vcores | ram  | njobs | jobthreads |
| cluster1  | siteA               |    16 |     32 |   64 |    10 |          4 |
| cluster2  | siteB               |     8 |      8 |   16 |     2 |          4 |
| cluster3  | siteB               |    16 |     16 |  128 |    16 |          1 |
3 rows in set (0.00 sec)
mysql> select * from clients limit 3;
| clientid                | cluster   |
| cluster1-00.loria.fr+1  | cluster1  |
| cluster1-00.loria.fr+10 | cluster1  |
| cluster1-00.loria.fr+2  | cluster1  |
3 rows in set (0.00 sec)
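
For readers who want to reproduce such auxiliary tables, the following
Python sketch creates them in an in-memory sqlite database. The column
names are taken from the output above; the column types, the keys and the
final query are assumptions made for illustration, not the definitions
that were actually used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Column names follow the tables shown above; types and keys are guesses.
conn.executescript("""
CREATE TABLE clusters (
    cluster    TEXT PRIMARY KEY,
    site       TEXT NOT NULL,
    cores      INTEGER,
    vcores     INTEGER,
    ram        INTEGER,
    njobs      INTEGER,
    jobthreads INTEGER
);
CREATE TABLE clients (
    clientid TEXT PRIMARY KEY,
    cluster  TEXT NOT NULL REFERENCES clusters(cluster)
);
""")

# The sample rows printed above.
conn.executemany("INSERT INTO clusters VALUES (?, ?, ?, ?, ?, ?, ?)", [
    ("cluster1", "siteA", 16, 32, 64, 10, 4),
    ("cluster2", "siteB", 8, 8, 16, 2, 4),
    ("cluster3", "siteB", 16, 16, 128, 16, 1),
])
conn.executemany("INSERT INTO clients VALUES (?, ?)", [
    ("cluster1-00.loria.fr+1", "cluster1"),
    ("cluster1-00.loria.fr+10", "cluster1"),
    ("cluster1-00.loria.fr+2", "cluster1"),
])

# Sanity check: number of registered clients per site.
clients_per_site = conn.execute("""
    SELECT site, COUNT(clientid) AS nclients
    FROM clusters LEFT JOIN clients USING (cluster)
    GROUP BY site
    ORDER BY site
""").fetchall()
print(clients_per_site)  # [('siteA', 3), ('siteB', 0)]
```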

Then I had some expensive SQL queries which were used to walk the
complete workunits table and come up with statistics. Here is one, which
ranks sites by total number of WUs completed:

mysql> select site,sum(wus) as score from (select cluster,count(*) as wus from (select * from workunits where status=5) x left join clients on x.assignedclient=clients.clientid group by cluster) a left join clusters using (cluster) group by site order by score;
| site                | score  |
| siteA               |  77823 |
| siteB               |  98326 |
| siteC               | 153184 |
| siteD               | 167058 |
| siteE               | 387611 |
| siteF               | 523427 |
6 rows in set (1 min 14.28 sec)

Also, queries such as "what happened recently?" look like this:

mysql> select cluster,count(*),wurowid from (select * from workunits where status=1 AND unix_timestamp(timeassigned)>=unix_timestamp(now())-14400) x left join clients on x.assignedclient=clients.clientid group by cluster ;

(here, status 1 selects ASSIGNED WUs).
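
Note that unix_timestamp() and now() are MySQL functions, so the query
above does not run as is on sqlite. One possible way to keep such a query
back-end independent is to compute the cutoff on the client side and pass
it as a parameter, as in the Python sketch below. The function name, the
minimal stand-in schema and the sample rows are hypothetical. The sketch
also assumes that timeassigned holds "YYYY-MM-DD HH:MM:SS" values in the
same time zone as the "now" given by the caller, and it drops the wurowid
column, which is not well defined once rows are grouped by cluster.

```python
import sqlite3
from datetime import datetime, timedelta

RECENT_SQL = """
    SELECT clients.cluster, COUNT(*) AS nwus
    FROM workunits
    LEFT JOIN clients ON workunits.assignedclient = clients.clientid
    WHERE workunits.status = {ph} AND workunits.timeassigned >= {ph}
    GROUP BY clients.cluster
    ORDER BY clients.cluster
"""

def recently_assigned(conn, now, window=timedelta(seconds=14400), placeholder="?"):
    """Count ASSIGNED WUs (status 1) per cluster within the last `window`.

    placeholder is "?" for sqlite3; common MySQL drivers expect "%s".
    """
    cutoff = (now - window).strftime("%Y-%m-%d %H:%M:%S")
    cur = conn.cursor()
    cur.execute(RECENT_SQL.format(ph=placeholder), (1, cutoff))
    return cur.fetchall()

# Minimal stand-in schema and made-up rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE clients (clientid TEXT PRIMARY KEY, cluster TEXT);
CREATE TABLE workunits (wurowid INTEGER PRIMARY KEY, status INTEGER,
                        timeassigned TEXT, assignedclient TEXT);
""")
conn.executemany("INSERT INTO clients VALUES (?, ?)", [
    ("c1+1", "cluster1"), ("c1+2", "cluster1"), ("c2+1", "cluster2"),
])
conn.executemany("INSERT INTO workunits VALUES (?, ?, ?, ?)", [
    (1, 1, "2017-01-03 09:30:00", "c1+1"),  # recent, assigned
    (2, 1, "2017-01-03 07:00:00", "c1+2"),  # recent, assigned
    (3, 1, "2017-01-03 05:00:00", "c2+1"),  # assigned, but too old
    (4, 5, "2017-01-03 09:45:00", "c2+1"),  # recent, but not status 1
    (5, 1, "2017-01-03 09:50:00", "c2+1"),  # recent, assigned
])

recent = recently_assigned(conn, datetime(2017, 1, 3, 10, 0, 0))
print(recent)  # [('cluster1', 2), ('cluster2', 1)]
```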

Overall, this leads to a cookbook of SQL queries that I ran quite frequently.

A web-accessible frontend could make such queries for the user, maybe.
We should make sure the user doesn't inadvertently (or maliciously) DOS
the system by running expensive queries, though.
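
One simple guard against that, sketched below in Python, is to cache the
result of each expensive query for a fixed time, so that repeated page
loads cannot trigger the query more than once per interval. The class
name, the time-to-live value and the fake clock are hypothetical, and the
sketch is not thread-safe; it only illustrates the idea.

```python
import time

class CachedQuery:
    """Re-run an expensive query at most once per ttl_seconds."""

    def __init__(self, run_query, ttl_seconds, clock=time.monotonic):
        self._run_query = run_query
        self._ttl = ttl_seconds
        self._clock = clock
        self._value = None
        self._expires = None  # None means nothing has been cached yet

    def get(self):
        now = self._clock()
        if self._expires is None or now >= self._expires:
            self._value = self._run_query()
            self._expires = now + self._ttl
        return self._value

# Demonstration with a fake clock and a stand-in for the real query.
calls = []

def expensive_query():
    calls.append(1)
    return "result #%d" % len(calls)

ticks = [0.0]
cached = CachedQuery(expensive_query, ttl_seconds=300, clock=lambda: ticks[0])

first = cached.get()
second = cached.get()   # served from the cache, the query is not re-run
ticks[0] = 301.0        # pretend five minutes have passed
third = cached.get()    # the cache has expired, the query runs again
print(first, second, third, len(calls))
```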

Maybe some design choices in the server databases layout are to be blamed
for the fact that some queries take long; and improvements could be

How this could all be presented is a matter of taste (I'm fine with text
tables, and unfancy plots for data which is to be plotted over a time

I'm somewhat undecided as to what would be best for how to do such
queries. A full-blown apache with php enabled is not really to my taste
for achieving this. A homemade server doing this heavy lifting would bring
more flexibility (and avoid having one extra language), but would perhaps
not be ideal either: we should make sure we don't expose it to the
outside world. I think that embedding this functionality in the main
server as it exists now should be out of the question. It should be two
separate processes.


On Sun, Jan 01, 2017 at 10:26:56PM +0000, Grüninger, Micha wrote:
> Hey guys,
> I have written some scripts to visualize progress and participation. Of course, the scripts are not perfect, but they work for me so far, and I think they are a good start. If you want to see it in action visit: . Since you shared your cool program, I think it is only fair to share my work. It consists of these three files:
>   *   data_eta.php: Reads the log, extracts and processes information about the ETA, and saves it to a file that needs to be served by a webserver.
>   *   data_list_wus.php: Extracts information from a SQLite3 DB, processes it, and saves it to a file that needs to be served by a webserver.
>   *   index.html: Takes the output files of the two previous scripts and visualizes them. Needs to be served by a webserver. This is the file the user sees.
> Some words about the example page:
> We are a group of students at FH-Aachen, and we are factoring a 200-digit number. We computed the polynomial selection with the GPU version of Msieve, and are now using cado-nfs for the second step. Because our server crashed several times, we have slumps in the ETA on 23 and 29 December.
> I see that it is not optimal that I used PHP to code the data generation scripts, because users must install another interpreter to use them. Feel free to convert them to another script language, if that bothers you.
> I attached the files.
> Greetings
> Micha Grüninger
> Team Rocket


