[Cado-nfs-discuss] Monitoring Tool/Website

Seth Troisi braintwo at gmail.com
Sun Sep 8 01:13:43 CEST 2019


I added simple mysql support and have removed all but one constant (which
I'm working on getting rid of).
I added trivial banner support and made a number of things configurable
(like host name joining).

I'm sure there are bugs in the mysql database format handling, and I'm
wondering if you have suggestions on how to test this, or whether you would
be willing to test on your end; I could also try to get volunteers from
mersenneforum.

Other comments inline


On Mon, Aug 5, 2019 at 5:58 AM Emmanuel Thomé <Emmanuel.Thome at inria.fr>
wrote:

> Hi,
>
> Looks promising.
>
> I think this can end up in the distribution, provided that you agree with
> it being licensed LGPL as the rest of cado-nfs.
>
Happy to do this.

>
> There are quite a few things that would need to be tweaked.
>
>  - a README would be great.
>
Done, https://github.com/sethtroisi/factoring-ui/blob/master/README.md

>  - some hard coded stuff in log_processor.py ought to go. Also, this file
>    should be executable, I guess.
>
Done

>  - the dependencies should be clearly spelled out, and checked
>    (python3-flask, python3-matplotlib, python3-seaborn, maybe among others
>    -- these are the ones I had to install to perform a quick check).
>
Done,
https://github.com/sethtroisi/factoring-ui/blob/master/requirements.txt

>  - How one should invoke log_processor.py is not entirely clear. It seems
>    it wants stuff in the current working directory, while app.py expects
>    the json .status file in its source dir (I had to do a symlink).
>
See https://github.com/sethtroisi/factoring-ui#log-processing

>  - Some uses of random.sample should be guarded, e.g.
>     -    random_coll = [l for i,l in sorted(random.sample(list(enumerate(eta_lines)), count - 2))]
>     +    n = min(len(eta_lines), count - 2)
>     +    random_coll = [l for i,l in sorted(random.sample(list(enumerate(eta_lines)), n))]
>    Or:
>     -    total_last_24 = sorted(random.sample(total_last_24, 5000)) + [rels_last_24]
>     +    total_last_24 = sorted(random.sample(total_last_24, min(5000, len(total_last_24)))) + [rels_last_24]
>
Done
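For reference, the guarded pattern suggested above can be written as a small standalone helper (`sample_sorted` is a hypothetical name; the actual code in factoring-ui inlines this):

```python
import random

def sample_sorted(lines, count):
    # Guard against count exceeding len(lines), which would make
    # random.sample raise ValueError; sorting on the enumerated index
    # keeps the sampled items in their original order.
    n = min(len(lines), count)
    return [l for i, l in sorted(random.sample(list(enumerate(lines)), n))]
```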

>  - My simple check aborted on the assert host_stats.keys() ==
>    client_work.keys() ; I had an <EMPTY> in client_work.keys(), not in
>    host_stats.keys(). Maybe you're over-relying on the fact that client
>    names contain a dot ? (Mine were "localhost" and "localhost+2").
>
Should be fixed now

> I think it's mandatory that the means to connect to the database is
> provided to the log_processor script. We really want it to work also in
> the case where a mysql/mariadb database is used instead of sqlite3. The
> way we do it currently is to rely on an existing server on the current
> machine, and host the database there. (IDK if we can run the
> mysql/mariadb server in userland, after all. Maybe we should do that in
> the near future.)


>
> That means that a URI syntax should probably be passed to log_processor,
> e.g. in the form:
>     db:mysql://USERNAME:PASSWORD@localhost:3306/c200
>  or db:sqlite3:///tmp/foo.db
>
> This is the least tested part.
I use urllib.parse.urlparse to pull out the username, password, host, port,
and database for mysql databases.
I assume everything after "db:sqlite3:" is a path.
Happy to expand this with configs from real users.
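A minimal sketch of that parsing (the function name and return shape are mine, not the actual factoring-ui code):

```python
from urllib.parse import urlparse

def parse_db_uri(uri):
    # Split a "db:..." URI into connection parameters. Everything after
    # "db:sqlite3:" is treated as a filesystem path; mysql URIs are
    # delegated to urlparse, defaulting the port to 3306.
    assert uri.startswith("db:")
    rest = uri[len("db:"):]
    if rest.startswith("sqlite3:"):
        return {"backend": "sqlite3", "path": rest[len("sqlite3:"):]}
    p = urlparse(rest)
    assert p.scheme == "mysql"
    return {
        "backend": "mysql",
        "username": p.username,
        "password": p.password,
        "host": p.hostname,
        "port": p.port or 3306,
        "database": p.path.lstrip("/"),
    }
```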

> (note that this also provides the wdir information as a by-product). It
> is acceptable if the path to the db file works as a shortcut to the long
> db:sqlite3:... uri.
>
>
> As a general rule, the less you rely on log output parsing, the better.
> Output strings tend to be a bit fragile. It's more robust if the info
> you're looking for is present in the database. Alas, I agree that it is
> not _always_ there. (some can be found by a join with the files table).
>
I'm only using a few things:
* ETA lines (I added a TODO to handle these not being present)
* Stats lines ("Lattice Sieving: Newly arrived stats"); maybe these could be
moved into the db at some later date.
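As a rough illustration, the log scraping amounts to something like this (the exact substrings matched are assumptions on my part; real cado-nfs output may differ):

```python
def scan_log(path):
    # Collect the two kinds of lines the UI currently scrapes from the
    # server log: ETA estimates and per-upload sieving stats. Both
    # marker strings are assumed, not taken from actual cado-nfs logs.
    eta_lines, stats_lines = [], []
    with open(path) as f:
        for line in f:
            if "ETA" in line:
                eta_lines.append(line.rstrip())
            elif "Lattice Sieving: Newly arrived stats" in line:
                stats_lines.append(line.rstrip())
    return eta_lines, stats_lines
```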

> > Missing features at this time:
> > * Everything related to polyselect. I missed this stage of the 2,2330L
> > project but my understanding is it was done with other tools (msieve-gpu),
> > it's possible clients would still show WU and it wouldn't be hard to add
> > some other counting (for the equivalent of relations)
>
> yes, that would be useful, but I don't think that would be my #1 next
> feature.
>
> > * List of outstanding WUs (these are in the db so it would be easy to add)
> > * sorting in the table (pretty simple with bootstrap but not done)
> > * Inspection of failed / bad stuff: number of bad WU, client "badness", ...
> >   * Our client has failed a number of times, it might be worth exposing the
> >     last X lines of the log if it was easy to remove PII (IP addresses,
> >     client names, ...)
> > * Inspection of successful workunits
> > * rate of timedout tasks.
>
> All of these would be really cool. The inspection of failed stuff and the
> rate of timedout tasks are the ones for which I feel that the need is the
> most compelling.
>
> E.
>
> On Fri, Jul 26, 2019 at 02:28:53AM -0700, Seth Troisi wrote:
> > Background:
> >
> > Thomé visited mersenneforum[1] to comment[2] on our effort to factor a
> > 207 digit composite of 2^2330+1. As part of that effort I wrote a small
> > monitoring website[3] (see attachment for screenshot); github[4].
> > Thomé pointed me towards the cado-nfs tracker[5] for this task.
> >
> > Currently my code (factoring-ui) has quite a lot of the details requested
> > in the tracker:
> > * number of active hosts (groups of clients with the same base name) and
> >   clients
> > * last submitted workunit by client; total number of workunits, relations,
> >   and cpu-seconds by client
> >
> > It has several other fun features (graphs, a partial log viewer, and many
> > more).
> > It works by occasionally processing[6] X.log (for ETA estimates) and X.db
> > (for client stats) to create a json status file (as well as a couple of
> > png images). This is surprisingly similar to the work done by Micha
> > Grüninger in 2017 in [7] and by Emmanuel Thomé in 2016 in [8].
> >
> > This json status file is then served with Flask ("Flask is a micro web
> > framework written in Python"). On my machine I serve this with Apache
> > via mod_wsgi (but this is not technically needed; Flask can easily be
> > run from the command line).
> >
> > Potential ideas:
> > * I've adapted the code to support multiple simultaneous efforts.
> >   I/you/someone could easily run a central server with progress on all
> >   major efforts. This would require some sort of read-only access to the
> >   log and db files (I find I'm rsyncing ~500kb every 15 minutes).
> > * I'm happy to move this to a <monitoring> subdirectory and add the
> >   relevant server setting configuration so that it's as simple as
> >   `FLASK_APP=app.py flask run --port 5003` to bring up a monitoring
> >   server. This would require figuring out how to run an occasional hook
> >   for log_processor.py, or moving the relatively simple processing into
> >   the upload processing and sql database.
> >
> > Missing features at this time:
> > * Everything related to polyselect. I missed this stage of the 2,2330L
> >   project but my understanding is it was done with other tools (msieve-gpu),
> >   it's possible clients would still show WU and it wouldn't be hard to add
> >   some other counting (for the equivalent of relations)
> > * List of outstanding WUs (these are in the db so it would be easy to add)
> > * sorting in the table (pretty simple with bootstrap but not done)
> > * Inspection of failed / bad stuff: number of bad WU, client "badness", ...
> >   * Our client has failed a number of times, it might be worth exposing the
> >     last X lines of the log if it was easy to remove PII (IP addresses,
> >     client names, ...)
> > * Inspection of successful workunits
> > * rate of timedout tasks.
> >
> > [1] https://www.mersenneforum.org/showthread.php?t=24292
> > [2] https://www.mersenneforum.org/showthread.php?t=24292&page=30
> > [3] http://factoring.cloudygo.com
> > [4] https://github.com/sethtroisi/factoring-ui
> > [5] https://gforge.inria.fr/tracker/index.php?func=detail&aid=16699&group_id=2065&atid=7445
> > [6] https://github.com/sethtroisi/factoring-ui/blob/master/log_processor.py#L67
> > [7] https://lists.gforge.inria.fr/pipermail/cado-nfs-discuss/2017-January/000713.html
> > [8] https://lists.gforge.inria.fr/pipermail/cado-nfs-discuss/2016-July/000643.html
>
>
> > _______________________________________________
> > Cado-nfs-discuss mailing list
> > Cado-nfs-discuss at lists.gforge.inria.fr
> > https://lists.gforge.inria.fr/mailman/listinfo/cado-nfs-discuss
>
>