[Cado-nfs-discuss] Help starting out using two machines

David Willmore davidwillmore at gmail.com
Mon Mar 9 01:46:11 CET 2020


On Sat, Mar 7, 2020 at 4:13 PM Emmanuel Thomé <Emmanuel.Thome at inria.fr> wrote:
> On Sat, Mar 07, 2020 at 03:27:53PM -0500, David Willmore wrote:
> Overall, yes, 2.3.0 is getting old, and a 3.0 release is overdue. I
> recommend using the git version. Yes, it comes with the inherent danger
> of hitting instabilities, but this mailing list is a good place to report
> odd stuff. (There are a few blockers to making a new release from the
> current git in my opinion, and that includes a pending merge request, as
> well as problems with negative timings reported in the lattice siever.
> Maybe some other minor things)

Understood.  Pulling current git code.

> please try the git version, first.

Built and running.

> > It's only GigE.  If I get to the point that the linear algebra is an
> > issue, I'll look into running it on the Windows machine.
>
> You'd be looking into a world of pain, really. We completely abandoned
> windows support years ago. Sorry for that. The code did compile and work
> on windows (with mingw), maybe until 2013-2014 or so, but it's completely
> obvious that the code has evolved in ways that would clash with windows
> idiosyncrasies. (and I'm really talking idiosyncrasies. Try "git log -p
> -r 6f0aadc83 portability.h" just for fun. I'm not talking lack of
> portability of our code. We strive to adhere to ISO C99, C++11,
> POSIX-2001 (maybe inadvertently 2008 at times), and we have a range of
> test environments that pester us often enough that we can claim that we
> know a bit about portability). Windows is just an unimportant and
> uninteresting platform for me. This being said, while I won't work on
> having cado-nfs work on windows, I'm not averse to contributions...

I will take your word that porting to Windows is difficult.  I know
the pain that Prime95 goes through to support processor topology
detection and core/task assignment.  I don't wish that job on anyone.

> > Maybe I'll boot it into a Linux instance for that purpose.
>
> yeah, that seems to be a much better use of human time :-)

Strongly agreed.  It's a minor inconvenience, but nothing I cannot live with.

> > I would think that a single Ryzen 7 3700X w/64GB would do better than
> > two dual-Xeon 5660X machines with less than half the memory.  If I had
> > an infiniband link between the two Xeons, then they might be worth
> > messing with, but that's not the case at this point.
>
> GigE is often really disappointing for linear algebra, so I would bet on
> the single host, yes.

Then I will plan on using an alternate Linux boot environment for that
machine and use it for the linear algebra.  That will put it out of
action for longer, but will shorten the whole task time a good deal
and allow me to handle larger projects.

> > Even if I did, at best I could get 8x PCI-E 2.0 worth of BW or 5GB/s
> > each way.  I'm not sure that's worth even trying.
>
> Me neither. Also, I'm not sure about your throughput counts. I don't
> think that you can pump 5GB/s out of a 8x PCI-E.

I did the math, I think.  PCI-E v2.0 is 5GT/s x 8 lanes should be
5GB/s *each way*.  That's only 5x what ethernet would do (minus
protocol overhead).  If I had a 16X PCI-E 3.0 slot in both machines, I
could get a reasonable amount of bandwidth, but it doesn't seem worth
pursuing at this speed.  Fortunately, the Ryzen box is faster and has
more memory.
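
For reference, the bandwidth arithmetic can be checked with a short
sketch.  It assumes the standard line encodings (8b/10b for PCI-E 2.0,
128b/130b for PCI-E 3.0) and ignores packet and protocol overhead, so
the results are upper bounds rather than measured throughput.  The
function name is purely illustrative.

```python
def pcie_payload_gbytes_per_s(gt_per_s, lanes, payload_bits, line_bits):
    """Upper bound on one-direction payload bandwidth, in GB/s.

    gt_per_s     -- raw transfer rate per lane (one bit per transfer)
    lanes        -- link width
    payload_bits -- data bits per encoded symbol group (8 for 8b/10b)
    line_bits    -- bits on the wire per symbol group (10 for 8b/10b)
    """
    payload_gbits = gt_per_s * lanes * payload_bits / line_bits
    return payload_gbits / 8  # bits -> bytes


# PCI-E 2.0 x8: 5 GT/s per lane, 8b/10b encoding.
pcie2_x8 = pcie_payload_gbytes_per_s(5.0, 8, 8, 10)

# PCI-E 3.0 x16: 8 GT/s per lane, 128b/130b encoding.
pcie3_x16 = pcie_payload_gbytes_per_s(8.0, 16, 128, 130)

# Gigabit Ethernet: 1 Gbit/s raw line rate, before protocol overhead.
gige = 1.0 / 8

print(f"PCI-E 2.0 x8 : {pcie2_x8:.2f} GB/s each way")
print(f"PCI-E 3.0 x16: {pcie3_x16:.2f} GB/s each way")
print(f"GigE         : {gige:.3f} GB/s")
print(f"PCI-E 2.0 x8 / GigE ratio: {pcie2_x8 / gige:.0f}x")
```

Under these assumptions the 5 GB/s figure only holds if every
transferred bit is payload; with 8b/10b encoding the x8 link tops out
at 4 GB/s each way, which is still well above what GigE can carry.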

When built from git and run on the example from the Makefile, I run
into a problem:
[willmore@blade1 cado-nfs]$ ./cado-nfs.py 90377629292003121684002147101760858109247336549001090677693
Info:root: Using default parameter file ./parameters/factor/params.c60
Info:root: No database exists yet
Info:root: Created temporary directory /tmp/cado.9bqyjpbs
Info:Database: Opened connection to database /tmp/cado.9bqyjpbs/c60.db
Info:root: Set tasks.linalg.bwc.threads=12 based on detected physical cores
Info:root: Set tasks.threads=24 based on detected logical cpus
Info:root: tasks.threads = 24 [via tasks.threads]
Info:root: tasks.polyselect.threads = 2 [via tasks.polyselect.threads]
Info:root: tasks.sieve.las.threads = 2 [via tasks.sieve.las.threads]
Info:root: tasks.linalg.bwc.threads = 12 [via tasks.linalg.bwc.threads]
Info:root: tasks.sqrt.threads = 8 [via tasks.sqrt.threads]
Info:root: slaves.scriptpath is /home/willmore/factoring/cado-nfs
Info:root: Command line parameters: ./cado-nfs.py
90377629292003121684002147101760858109247336549001090677693
Info:root: If this computation gets interrupted, it can be resumed
with ./cado-nfs.py /tmp/cado.9bqyjpbs/c60.parameters_snapshot.0
Info:Server Launcher: Adding blade1 to whitelist to allow clients on
localhost to connect
Info:HTTP server: Using non-threaded HTTPS server
Info:HTTP server: Using whitelist: localhost,blade1
Info:Lattice Sieving: param rels_wanted is 0
Info:Complete Factorization / Discrete logarithm: Factoring
90377629292003121684002147101760858109247336549001090677693
Info:HTTP server: serving at https://blade1:38715 (0.0.0.0)
Info:HTTP server: For debugging purposes, the URL above can be
accessed if the server.only_registered=False parameter is added
Info:HTTP server: You can start additional cado-nfs-client.py scripts
with parameters: --server=https://blade1:38715
--certsha1=47056cef0eb8628681aff9c1794b99fc75d0d89e
Info:HTTP server: If you want to start additional clients, remember to
add their hosts to server.whitelist
Info:Client Launcher: Starting client id localhost on host localhost
Info:Client Launcher: Starting client id localhost+2 on host localhost
Info:Client Launcher: Starting client id localhost+3 on host localhost
Info:Client Launcher: Starting client id localhost+4 on host localhost
Info:Client Launcher: Starting client id localhost+5 on host localhost
Info:Client Launcher: Starting client id localhost+6 on host localhost
Info:Client Launcher: Starting client id localhost+7 on host localhost
Info:Client Launcher: Starting client id localhost+8 on host localhost
Info:Client Launcher: Starting client id localhost+9 on host localhost
Info:Client Launcher: Starting client id localhost+10 on host localhost
Info:Client Launcher: Starting client id localhost+11 on host localhost
Info:Client Launcher: Starting client id localhost+12 on host localhost
Info:Client Launcher: Running clients: localhost (Host localhost, PID
47676), localhost+2 (Host localhost, PID 47678), localhost+3 (Host
localhost, PID 47680), localhost+4 (Host localhost, PID 47682),
localhost+5 (Host localhost, PID 47684), localhost+6 (Host localhost,
PID 47686), localhost+7 (Host localhost, PID 47688), localhost+8 (Host
localhost, PID 47690), localhost+9 (Host localhost, PID 47692),
localhost+10 (Host localhost, PID 47694), localhost+11 (Host
localhost, PID 47696), localhost+12 (Host localhost, PID 47698)
Info:Polynomial Selection (size optimized): Starting
Info:Polynomial Selection (size optimized): 0 polynomials in queue
from previous run
Info:Polynomial Selection (size optimized): Adding workunit
c60_polyselect1_0-5000 to database
Info:Polynomial Selection (size optimized): Adding workunit
c60_polyselect1_5000-10000 to database
Info:HTTP server: 127.0.0.1 Sending workunit c60_polyselect1_0-5000 to
client localhost
Info:HTTP server: 127.0.0.1 Sending workunit
c60_polyselect1_5000-10000 to client localhost+2

And there it sits.  Nothing is running.  There are ten instances of
cado-nfs-client.py on the machine in a sleep state--clearly waiting
for something on their network sockets.  Hmm, okay, it's not using port
8001 for some reason.  I wonder if the firewall is that strict...
Yes, yes it is.  It's blocking the seemingly random port that the
script now picks.  Updating the firewall after the script runs isn't
enough.  Is there a way to tell the script to stop picking random
ports?
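
A quick way to confirm this kind of diagnosis is to test whether the
advertised port accepts TCP connections at all.  The sketch below uses
only the Python standard library; the helper name can_connect is
illustrative, and the demo listens on 127.0.0.1 so that it is
self-contained.  It also shows where a "random" port comes from: a
server that binds to port 0 is handed a free port by the operating
system.  Whether cado-nfs.py accepts a server.port parameter to pin the
port (alongside the server.whitelist and server.only_registered
parameters named in the log above) is an assumption to check against
the documentation of the version in use.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (e.g. packets dropped
        # by a firewall) and name-resolution failures.
        return False


# Demo: bind to port 0 so the OS picks a free port, as a server that
# does not pin its port would.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
chosen_port = listener.getsockname()[1]

reachable_while_open = can_connect("127.0.0.1", chosen_port)
listener.close()

print(f"OS-assigned port: {chosen_port}")
print(f"reachable while listening: {reachable_while_open}")
```

Against the run above, the equivalent check would be
can_connect("blade1", 38715) from the machine the clients run on; a
False result while the server is still up points at the firewall
rather than at the clients.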

Cheers,
David

