[Cado-nfs-discuss] Help starting out using two machines

Emmanuel Thomé Emmanuel.Thome at inria.fr
Sat Mar 7 22:13:00 CET 2020


On Sat, Mar 07, 2020 at 03:27:53PM -0500, David Willmore wrote:
> I never specified the version of cado-nfs I'm using.  I am using the
> 2.3.0 version.  I can try the current git version if you think it
> would help.  I wanted to use a stable release to minimize the chance
> of hitting instabilities or bugs in unreleased code.

ah ok. Then some of what I said, especially regarding -t auto, would be
irrelevant. I'm not even sure that --override works at all in 2.3.0, and
if it does, then I'm almost certain that it does do funny things. I don't
recall regexp woes of the kind you report, but that seems possible.

Overall, yes, 2.3.0 is getting old, and a 3.0 release is overdue. I
recommend using the git version. Yes, it comes with the inherent danger
of hitting instabilities, but this mailing list is a good place to report
odd stuff. (In my opinion there are a few blockers to making a new
release from the current git: a pending merge request, as well as
problems with negative timings reported in the lattice siever, and
maybe some other minor things.)

> On Sat, Mar 7, 2020 at 1:15 PM Emmanuel Thomé <Emmanuel.Thome at inria.fr> wrote:
> > you want to adjust its "basepath": it
> > defaults to $CWD and can be adjusted with the --basepath option to
> > cado-nfs-client.py.
> 
> Ahh, okay, I set CWD and the work dir to
> /home/willmore/factoring/cado-nfs-2.3.0/2  That will be important
> later on to make sense of the error message.

ah, right. indeed.

> > If you wonder why the client seems to ignore paths that are specified in
> > the server-level parameter file, here's a simple explanation: if you go
> > the route where you're running the clients by yourself (as opposed to
> > having the server run them, which we both agree is a potential source of
> > problems), then there's a chicken-and-egg problem: the client needs to be
> > somewhere before it contacts the server, so there's no way it can obey
> > paths that are set in the server-level parameter file. (other parameters,
> > of more algorithmic nature, appear in the work units, naturally).
> 
> I'm not sure I understand what you're stating here, so I'll state my
> guess and you can correct any misunderstanding.  When the client is
> run, it only has $CWD for sure (can't be Posix and not have that) but
> it may also have --workdir, etc. specified on the command line, but
> there is no guarantee of that.  If it only has $CWD, then it has to
> assume that is where all server provided paths are relative to.  But,
> if it has a --workdir or --basepath, can't it use that instead?

It obeys --basepath and --workdir, yes. I think the code is the best
source of explanation here:

    if SETTINGS["WORKDIR"] is None:
        SETTINGS["WORKDIR"] = SETTINGS["CLIENTID"] + '.work/'
    if not SETTINGS["BASEPATH"] is None:
        SETTINGS["WORKDIR"] = os.path.join(SETTINGS["BASEPATH"],
                                           SETTINGS["WORKDIR"])
        SETTINGS["DLDIR"] = os.path.join(SETTINGS["BASEPATH"],
                                         SETTINGS["DLDIR"])

=> basepath is the main container; dldir and workdir, which are both
relative paths, are created underneath it.
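
To make that concrete, here is a minimal standalone sketch (not the
actual cado-nfs code; the client id "client1", the path /tmp/cado and
the "download/" default are just made up for illustration):

    import os

    # Hypothetical values, for illustration only.
    settings = {"CLIENTID": "client1", "BASEPATH": "/tmp/cado",
                "WORKDIR": None, "DLDIR": "download/"}

    # Same logic as above: derive a default workdir from the client id,
    # then anchor both directories under basepath.
    if settings["WORKDIR"] is None:
        settings["WORKDIR"] = settings["CLIENTID"] + '.work/'
    if settings["BASEPATH"] is not None:
        settings["WORKDIR"] = os.path.join(settings["BASEPATH"],
                                           settings["WORKDIR"])
        settings["DLDIR"] = os.path.join(settings["BASEPATH"],
                                         settings["DLDIR"])

    print(settings["WORKDIR"])   # /tmp/cado/client1.work/
    print(settings["DLDIR"])     # /tmp/cado/download/

So with --basepath alone, everything the client reads or writes ends up
under that one directory.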

I don't recall whether this worked like this in 2.3.0; a quick
inspection with git show 2.3.0:cado-nfs.py suggests that it did.

> Maybe --workdir (which I specified) is not meant for that and I was
> misunderstanding things.  I will use --basepath instead.  If I do so
> (or if I just specify CWD to be where I wish everything to be relative
> to), will the client use the paths given by the server relative to
> there?  That's what I would assume.

The server doesn't ship paths to the clients in the workunits, period.
The clients get their paths from their command line (which _may_ be
crafted by the server in the "server-spawns-the-clients" setting). If
there's no --basepath, --workdir, or --dldir, then the client falls back
to the defaults $CWD/[[client_id]].work and $CWD/download.
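
For example (the server address below is made up; use whatever command
line the server printed at startup, plus any certificate option it asks
for), starting a client by hand could look like:

    ./cado-nfs-client.py --server=https://myserver:41953 --basepath=/tmp/cado

and then both the download and the work directories end up under
/tmp/cado, as in the snippet above.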

> > > That still leaves the main problem.  I get strange errors like:
> > > NFO:root:Running  ' d o w n l o a d / p o l y s e l e c t '  - P  1 0
> > > 0 0 0  - N  3 5 3 4 9 3 7 4 9 7 3 1 2 3 6 2 7 3 0 1 4 6 7 8 0 7 1 2 6
> > > 0 9 2 0 5 9 0 6 0 2 8 3 6 4 7 1 8 5 4 3 5 9 7 0 5 3 5 6 6 1 0 4 2 7 2
> > > 1 4 8 0 6 5 6 4 1 1 0 7 1 6 8 0 1 8 6 6 8 0 3 4 0 9  - d e g r e e  4
> > > - v  - t  2  - a d m i n  1 4 5 0 0 0  - a d m a x  1 5 0 0 0 0  - i n
> > > c r  6 0  - n q  2 5 6  - s o p t e f f o r t  0  >  ' 2 / c 9 0 . p o
> > > l y s e l e c t 1 . 1 4 5 0 0 0 - 1 5 0 0 0 0 '
> > > ERROR:root:Command resulted in exit code 1
> > > ERROR:root:Stderr: b'/bin/sh:  2 / c 9 0 . p o l y s e l e c t 1 . 1 4
> > > 5 0 0 0 - 1 5 0 0 0 0 : No such file or directory'
> >
> > Oh, that seems to be a hilarious one!
> 
> I agree it's funny, but also frustrating. :)

please try the git version first.

> [...]
> > > Indeed.  I think I'll be okay as it is for the small numbers I plan to
> > > run--512 bits or so.
>
> > Depending on your interconnect, a distributed block Wiedemann can be a
> > useful strategy even for 512 bits. But that's clearly a "next-level"
> > exercise.
> 
> It's only GigE.  If I get to the point that the linear algebra is an
> issue, I'll look into running it on the Windows machine.

You'd be looking into a world of pain, really. We completely abandoned
Windows support years ago. Sorry for that. The code did compile and work
on Windows (with mingw), maybe until 2013-2014 or so, but it's completely
obvious that the code has since evolved in ways that would clash with
Windows idiosyncrasies. (And I really do mean idiosyncrasies: try "git
log -p -r 6f0aadc83 portability.h" just for fun. I'm not talking about a
lack of portability of our code. We strive to adhere to ISO C99, C++11,
and POSIX-2001 (maybe inadvertently 2008 at times), and we have a range
of test environments that pester us often enough that we can claim to
know a bit about portability.) Windows is just an unimportant and
uninteresting platform for me. That being said, while I won't work on
making cado-nfs run on Windows, I'm not averse to contributions...

> Maybe I'll boot it into a Linux instance for that purpose.

yeah, that seems to be a much better use of human time :-)

> I would think that a single Ryzen 7 3700X w/64GB would do better than
> two dual-Xeon 5660X machines with less than half the memory.  If I had
> an infiniband link between the two Xeons, then they might be worth
> messing with, but that's not the case at this point.

GigE is often really disappointing for linear algebra, so I would bet on
the single host, yes.

> Even if I did, at best I could get 8x PCI-E 2.0 worth of BW or 5GB/s
> each way.  I'm not sure that's worth even trying.

Me neither. Also, I'm not sure about your throughput figures: I don't
think you can pump 5 GB/s out of an 8x PCI-E 2.0 link (8 lanes at about
500 MB/s each is roughly 4 GB/s of theoretical bandwidth per direction,
before protocol overhead).

E.

