[Cado-nfs-discuss] Help starting out using two machines

David Willmore davidwillmore at gmail.com
Sat Mar 7 21:27:53 CET 2020


I never specified the version of cado-nfs I'm using.  I am using the
2.3.0 version.  I can try the current git version if you think it
would help.  I wanted to use a stable release to minimize the chance
of hitting instabilities or bugs in unreleased code.

On Sat, Mar 7, 2020 at 1:15 PM Emmanuel Thomé <Emmanuel.Thome at inria.fr> wrote:
> > > [[ --override -t auto ]]
> >
> > It is at this point that I'm running into problems.  It seems that if
> > I ever have any override of the threads, I get errors.  In tracking
> > down what was going wrong, I ran into cado-nfs-client.py not
> > respecting the workdir parameter for 'downloads', which affects
> > downloaded binaries, polynomials, and roots files.
>
> I'm not sure which workdir you're speaking of. There's an optional
> WORKDIR in the workunits, but I don't think we ever use it, and anyway
> it's a path that is relative to the cado-nfs-client.py's "basepath".
>
> At any rate, if you want to adjust where the client puts its stuff,
> you want to adjust its "basepath": it
> defaults to $CWD and can be adjusted with the --basepath option to
> cado-nfs-client.py.
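A minimal sketch of the resolution rule described above. It assumes that the client simply joins each path it receives from the server onto its basepath, and falls back to $CWD when --basepath is not given. `resolve_client_path` is a hypothetical helper written for illustration. It is not a function from cado-nfs-client.py.

```python
import os
from typing import Optional


def resolve_client_path(server_relpath: str, basepath: Optional[str] = None) -> str:
    """Hypothetical helper: resolve a server-supplied path on the client side.

    The base is the client's own --basepath if one was given, otherwise the
    current working directory. The server-level parameter file plays no role,
    because the client must pick its base before it ever talks to the server.
    """
    base = basepath if basepath is not None else os.getcwd()
    # Caveat of os.path.join: an absolute second argument discards `base`.
    return os.path.normpath(os.path.join(base, server_relpath))


print(resolve_client_path("download/polyselect",
                          "/home/willmore/factoring/cado-nfs-2.3.0/2"))
# /home/willmore/factoring/cado-nfs-2.3.0/2/download/polyselect
```

With this kind of rule, only a relative path from the server ends up under basepath. An absolute one, such as a server-side --workdir, would be used verbatim on the client.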

Ahh, okay.  I had set both CWD and the work dir to
/home/willmore/factoring/cado-nfs-2.3.0/2; that will be important
later on to make sense of the error message.

> If you wonder why the client seems to ignore paths that are specified in
> the server-level parameter file, here's a simple explanation: if you go
> the route where you're running the clients by yourself (as opposed to
> having the server run them, which we both agree is a potential source of
> problems), then there's a chicken-and-egg problem: the client needs to be
> somewhere before it contacts the server, so there's no way it can obey
> paths that are set in the server-level parameter file. (other parameters,
> of more algorithmic nature, appear in the work units, naturally).

I'm not sure I understand what you're stating here, so I'll state my
guess and you can correct any misunderstanding.  When the client is
run, it only has $CWD for sure (it can't be POSIX and not have that), but
it may also have --workdir, etc. specified on the command line; there
is no guarantee of that.  If it only has $CWD, then it has to
assume that is where all server-provided paths are relative to.  But,
if it has a --workdir or --basepath, can't it use that instead?  Maybe
--workdir (which I specified) is not meant for that and I was
misunderstanding things.  I will use --basepath instead.  If I do so
(or if I just specify CWD to be where I wish everything to be relative
to), will the client use the paths given by the server relative to
there?  That's what I would assume.

> > That still leaves the main problem.  I get strange errors like:
> > NFO:root:Running  ' d o w n l o a d / p o l y s e l e c t '  - P  1 0
> > 0 0 0  - N  3 5 3 4 9 3 7 4 9 7 3 1 2 3 6 2 7 3 0 1 4 6 7 8 0 7 1 2 6
> > 0 9 2 0 5 9 0 6 0 2 8 3 6 4 7 1 8 5 4 3 5 9 7 0 5 3 5 6 6 1 0 4 2 7 2
> > 1 4 8 0 6 5 6 4 1 1 0 7 1 6 8 0 1 8 6 6 8 0 3 4 0 9  - d e g r e e  4
> > - v  - t  2  - a d m i n  1 4 5 0 0 0  - a d m a x  1 5 0 0 0 0  - i n
> > c r  6 0  - n q  2 5 6  - s o p t e f f o r t  0  >  ' 2 / c 9 0 . p o
> > l y s e l e c t 1 . 1 4 5 0 0 0 - 1 5 0 0 0 0 '
> > ERROR:root:Command resulted in exit code 1
> > ERROR:root:Stderr: b'/bin/sh:  2 / c 9 0 . p o l y s e l e c t 1 . 1 4
> > 5 0 0 0 - 1 5 0 0 0 0 : No such file or directory'
>
> Oh, that seems to be a hilarious one !

I agree it's funny, but also frustrating. :)
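The character-by-character spacing in the log has a familiar shape in Python. Iterating over a str yields single characters, so `" ".join(some_string)` puts a space between every character. The sketch below is only a hypothesis about how such output can arise, not a diagnosis of the 2.3.0 client. It reconstructs the command line from the log (with the program name and the output file single-quoted, as shown there) and then lets `shlex.split` play the role of /bin/sh to see what the redirect target becomes.

```python
import os
import shlex

# Command line as it appears in the log, before the characters got spaced out.
cmd_string = "'download/polyselect' -P 10000 > '2/c90.polyselect1.145000-150000'"

# Joining a *string* (rather than a list of words) inserts a space between
# every character, including the quote characters themselves.
garbled = " ".join(cmd_string)
print(garbled)

# Parse the garbled line the way a POSIX shell would split it into words.
garbled_words = shlex.split(garbled)
# The quoted redirect target keeps its inner spaces...
target = garbled_words[garbled_words.index(">") + 1]
# ...so it names a file inside a directory literally called " 2 ".
print(repr(os.path.dirname(target)))  # ' 2 '
```

If that directory does not exist, the shell cannot open the redirect target and reports "No such file or directory", which matches the shape of the stderr line above.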

> The " >  ' 2 / c 9 0" part is most weird. Which commit did you try ? I
> changed some stuff this week, and got rid of one /bin/sh middle man (see
> https://gitlab.inria.fr/cado-nfs/cado-nfs/issues/21718 and commits
> referenced at the end of the page). I'd be interested in the WU file that
> got downloaded by your client.
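On getting rid of the /bin/sh middle man: as a generic illustration (not the actual change in the linked commits), a Python program can start a binary from an argument list and handle the output redirection itself, so no shell ever parses a command line. `run_to_file` is a hypothetical helper, and the child command is a stand-in for a real binary such as download/polyselect.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import List


def run_to_file(argv: List[str], outfile: Path) -> int:
    """Hypothetical helper: run argv directly (no shell), stdout -> outfile.

    argv is passed as a list, so each element reaches the program as exactly
    one argument: nothing is re-split on spaces and no quoting is involved.
    """
    # Unlike a shell redirect, we create the output directory if it is missing.
    outfile.parent.mkdir(parents=True, exist_ok=True)
    with outfile.open("wb") as out:
        result = subprocess.run(argv, stdout=out, stderr=subprocess.PIPE)
    return result.returncode


# Stand-in child process; a real client would run the downloaded binary.
workdir = Path(tempfile.mkdtemp())
outpath = workdir / "2" / "c90.polyselect1.145000-150000"
rc = run_to_file([sys.executable, "-c", "print('hello from the child')"], outpath)
print(rc, outpath.read_text().strip())  # 0 hello from the child
```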

Could that have come from the server, since it had the parameter:
--workdir /home/scratch/factoring/2/

> If you find something, do tell me. It seems puzzling enough. I'll try and
> see if I can reproduce.

I will try it again with completely different paths for server and
client so the origin of paths should be more clear.

> Longer runs on the clients. This might become awkward if you have to deal
> with scheduling systems and so on. Yes, it's also a potential problem
> when you have machines of vastly different power (and this is an
> acknowledged issue I want to work on someday). Memory usage does not
> depend on workunit size.

I'm a long-time GIMPS contributor, so I appreciate these issues.
It's not a trivial problem.

> One thing I forgot to add is that at some point, if you hammer the server
> too much, the sqlite3 backend is going to get in the way, as it has
> several shortcomings. We've had much better success using the mysql
> backend for large projects (I think there's some doc in
> scripts/cadofactor/README.md). But probably the first thing to do before
> that is to strive to be gentle on the database load on the server.

I doubt I will be doing anything that large with the resources I have
at my disposal.  Currently I have two dual Xeon 5600X machines (24GB
and 32GB) and one desktop (running windows) with a Ryzen 7 3700X
(64GB).  I don't see any reason to use small workunits, so the server
should never be loaded too badly.  One of the dual Xeon machines has
an NVMe SSD in it, so I could host the database there; that should
provide a reasonable amount of performance, at least for my modest
needs.
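A back-of-the-envelope count shows why larger workunits keep the server's database load down. Every workunit is tracked in that database, so fewer, larger workunits mean fewer updates. The sketch assumes that polynomial selection cuts [admin, admax) into chunks of width `adrange`, which is consistent with the 5000-wide `-admin 145000 -admax 150000` slice in the log. The overall ADMAX below is a made-up value for illustration, and `workunit_count` is a hypothetical helper.

```python
def workunit_count(admin: int, admax: int, adrange: int) -> int:
    """Number of adrange-wide chunks needed to cover [admin, admax)."""
    if adrange <= 0:
        raise ValueError("adrange must be positive")
    span = max(0, admax - admin)
    return -(-span // adrange)  # ceiling division on integers


# Hypothetical overall range; only the 5000-wide chunk comes from the log.
ADMIN, ADMAX = 0, 500_000
for adrange in (1_000, 5_000, 25_000):
    print(f"adrange={adrange}: {workunit_count(ADMIN, ADMAX, adrange)} workunits")
# adrange=1000: 500 workunits
# adrange=5000: 100 workunits
# adrange=25000: 20 workunits
```

Making each workunit five times wider cuts the number of workunits the server must hand out and record by the same factor. The trade-off, as noted above, is longer runs on each client.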

> > Indeed.  I think I'll be okay as it is for the small numbers I plan to
> > run--512 bits or so.
>
> Depending on your interconnect, a distributed block Wiedemann can be a
> useful strategy even for 512 bits. But that's clearly a "next-level"
> exercise.

It's only GigE.  If I get to the point that the linear algebra is an
issue, I'll look into running it on the Windows machine.  Maybe I'll
boot it into a Linux instance for that purpose.  I would think that a
single Ryzen 7 3700X w/64GB would do better than two dual-Xeon 5660X
machines, each with at most half the memory.  If I had an InfiniBand link
between the two Xeons, then they might be worth messing with, but
that's not the case at this point.  Even if I did, at best I could get
8x PCI-E 2.0 worth of BW, or about 4GB/s each way (5GB/s raw, before
8b/10b encoding overhead).  I'm not sure that's worth even trying.
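The interconnect comparison can be checked with a little arithmetic. PCIe 2.0 signals at 5 GT/s per lane in each direction and uses 8b/10b encoding, so 8 of every 10 bits carry data. GigE is 1 Gbit/s. The sketch ignores packet and framing overheads on both links, so real throughput would be somewhat lower.

```python
from typing import Tuple

PCIE2_GT_PER_LANE = 5.0  # gigatransfers/s per lane per direction (1 bit each)
PCIE2_ENCODING = 8 / 10  # 8b/10b line code: 8 data bits per 10 line bits


def pcie2_bandwidth_gbytes(lanes: int) -> Tuple[float, float]:
    """(raw, usable) GB/s per direction for a PCIe 2.0 link of `lanes` lanes."""
    raw = PCIE2_GT_PER_LANE * lanes / 8  # bits -> bytes
    return raw, raw * PCIE2_ENCODING


def gige_bandwidth_gbytes() -> float:
    """GigE line rate in GB/s (1 Gbit/s / 8), ignoring Ethernet framing."""
    return 1.0 / 8


raw, usable = pcie2_bandwidth_gbytes(8)
print(f"PCIe 2.0 x8: {raw:.1f} GB/s raw, {usable:.1f} GB/s usable per direction")
print(f"GigE:        {gige_bandwidth_gbytes():.3f} GB/s")
print(f"ratio:       {usable / gige_bandwidth_gbytes():.0f}x")
# PCIe 2.0 x8: 5.0 GB/s raw, 4.0 GB/s usable per direction
# GigE:        0.125 GB/s
# ratio:       32x
```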

Thank you again for your assistance.

Cheers,
David

