[Cado-nfs-discuss] return code -6 , exit code -6

eric.jeancolas at free.fr
Wed Feb 20 21:20:38 CET 2019


Hi,

I use cado-nfs-2.3.0 to factor c120 to c160 numbers. It has worked fine on about 30 numbers so far; this is the first time I have hit an error.
I'm running a 12-core machine with 16 GB of RAM under Ubuntu 18.04.
I wanted to submit the remaining c141 cofactor of the (4*10^191+17)/3 factorization.
Command line: nohup ./cado-nfs.py 114092790024322036665919359655432984163277313962134547254404440195422458391584434451220692010515202678454628424122139839393927744958571074477
(nohup keeps the process running even after the session ends.)
At the end of the lattice sieving step, after about 65 hours of computation, I get the following (end of stdout):

Info:Lattice Sieving: Marking workunit c140_sieving_26000000-26010000 as ok (99.9% => ETA Fri Feb 15 20:21:27 2019)
Info:HTTP server: 127.0.0.1 Sending workunit c140_sieving_26070000-26080000 to client localhost+4
Info:Lattice Sieving: Adding workunit c140_sieving_26170000-26180000 to database
Info:Lattice Sieving: Found 13303 relations in '/tmp/cado.qnpox55b/c140.upload/c140.26010000-26020000.75hgum4v.gz', total is now 25427334/25433459
Info:Lattice Sieving: Marking workunit c140_sieving_26010000-26020000 as ok (100.0% => ETA Fri Feb 15 20:21:52 2019)
Info:HTTP server: 127.0.0.1 Sending workunit c140_sieving_26080000-26090000 to client localhost
Info:Lattice Sieving: Adding workunit c140_sieving_26180000-26190000 to database
Info:Lattice Sieving: Found 13261 relations in '/tmp/cado.qnpox55b/c140.upload/c140.26030000-26040000.bwi6c9rn.gz', total is now 25440595/25433459
Info:Lattice Sieving: Marking workunit c140_sieving_26030000-26040000 as ok (100.0% => ETA Fri Feb 15 20:22:17 2019)
Info:HTTP server: 127.0.0.1 Sending workunit c140_sieving_26090000-26100000 to client localhost+3
Info:Lattice Sieving: Adding workunit c140_sieving_26190000-26200000 to database
Info:Lattice Sieving: Reached target of 25433459 relations, now have 25440595
Info:Filtering - Duplicate Removal, splitting pass: Starting
Info:Filtering - Duplicate Removal, splitting pass: Splitting 19 new files
Info:Filtering - Duplicate Removal, splitting pass: Relations per slice: 0: 12723950, 1: 12716645
Info:Filtering - Duplicate Removal, removal pass: Starting
Info:HTTP server: 127.0.0.1 Sending workunit c140_sieving_26100000-26110000 to client localhost+6
Info:Filtering - Duplicate Removal, removal pass: 10587187 unique relations remain on slice 0
Warning:Command: Process with PID 3489 finished with return code -6
Error:Filtering - Duplicate Removal, removal pass: Program run on server failed with exit code -6
Error:Filtering - Duplicate Removal, removal pass: Command line was: /home/ng/cado-nfs-2.3.0/build/ng-All-Series/filter/dup2 -nrels 12716645 -renumber /tmp/cado.qnpox55b/c140.renumber.gz /tmp/cado.qnpox55b/c140.dup1//1/dup1.0.0000.gz /tmp/cado.qnpox55b/c140.dup1//1/dup1.1.0000.gz > /tmp/cado.qnpox55b/c140.dup2.slice1.stdout.2 2> /tmp/cado.qnpox55b/c140.dup2.slice1.stderr.2
Error:Filtering - Duplicate Removal, removal pass: Stderr output follows (stored in file /tmp/cado.qnpox55b/c140.dup2.slice1.stderr.2):
[stderr output elided here; it is identical to the contents of c140.dup2.slice1.stderr.2, reproduced in full below]
Traceback (most recent call last):
  File "./cado-nfs.py", line 122, in <module>
    factors = factorjob.run()
  File "./scripts/cadofactor/cadotask.py", line 5429, in run
    last_status, last_task = self.run_next_task()
  File "./scripts/cadofactor/cadotask.py", line 5504, in run_next_task
    return [task.run(), task.title]
  File "./scripts/cadofactor/cadotask.py", line 3314, in run
    raise Exception("Program failed")
Exception: Program failed
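
Incidentally, about the "-6" in the subject line: if I understand correctly, cadofactor launches dup2 through Python's subprocess module, and on POSIX a negative return code there means the child was killed by a signal, so -6 is signal 6, SIGABRT. A minimal sketch, independent of cado-nfs, that reproduces the convention:

import signal, subprocess

# The child aborts itself; subprocess reports the signal as a negative
# return code, exactly like the -6 in the log above.
p = subprocess.run(['python3', '-c',
                    'import os, signal; os.kill(os.getpid(), signal.SIGABRT)'])
print(p.returncode)                        # -6
print(signal.Signals(-p.returncode).name)  # SIGABRT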

Since the error refers to c140.dup2.slice1.stdout.2 and c140.dup2.slice1.stderr.2, here are the contents of those two files.

c140.dup2.slice1.stdout.2
# (2acb184f4) /home/ng/cado-nfs-2.3.0/build/ng-All-Series/filter/dup2 -nrels 12716645 -renumber /tmp/cado.qnpox55b/c140.renumber.gz /tmp/cado.qnpox55b/c140.dup1//1/dup1.0.0000.gz /tmp/cado.qnpox55b/c140.dup1//1/dup1.1.0000.gz
# List of modified files in working directory and their SHA1 sum:
# (tarball extracted)
# Compiled with gcc 4.8.4
# Compilation flags -std=c99 -g -W -Wall -O2  -msse3 -mssse3 -msse4.1 -mavx -mavx2 -mpclmul
# Opening /tmp/cado.qnpox55b/c140.renumber.gz to read the renumbering table
# Read 1024 elements in 0.0s -- inf MB/s -- inf elts/s
# Read 2048 elements in 0.0s -- inf MB/s -- inf elts/s
# Read 4096 elements in 0.0s -- inf MB/s -- inf elts/s
# Read 8192 elements in 0.0s -- inf MB/s -- inf elts/s
# Read 16384 elements in 0.0s -- inf MB/s -- inf elts/s
# Read 32768 elements in 0.0s -- inf MB/s -- inf elts/s
# Read 65536 elements in 0.0s -- inf MB/s -- inf elts/s
# Read 131072 elements in 0.0s -- 63.0 MB/s -- 10921939.3 elts/s
# Read 262144 elements in 0.0s -- 83.4 MB/s -- 13671266.7 elts/s
# Read 524288 elements in 0.0s -- 98.8 MB/s -- 15335531.9 elts/s
# Read 1048576 elements in 0.1s -- 107.8 MB/s -- 16156814.6 elts/s
# Read 2097152 elements in 0.1s -- 112.5 MB/s -- 16523418.3 elts/s
# Read 4194304 elements in 0.3s -- 117.6 MB/s -- 16302676.9 elts/s
# Read 8388608 elements in 0.5s -- 119.7 MB/s -- 15928905.6 elts/s
# Read 16777216 elements in 1.1s -- 120.5 MB/s -- 15621346.2 elts/s
# Done: Read 29258219 elements in 1.9s -- 120.7 MB/s -- 15441535.3 elts/s
# Information on renumber struct:
# INFO: sizeof(p_r_values_t) = 4
# INFO: nb_bits = 32
# INFO: number of polynomials = 2
# INFO: Polynomial on side 0 is rational
# INFO: #badideals = 0 [max_p = 0]
# INFO: #additional columns = 0
# INFO: Non monic polynomial on side: 0 1
# INFO: lpb0 = 28
# INFO: lpb1 = 28
# INFO: size = 29258219
# INFO: smallest prime not cached = 0x100007 at index 0x2802c
# INFO: biggest prime below lbp0 is 0xfffffc7 at index 0x1be71ea
# INFO: biggest prime below lbp1 is 0xfffffc7 at index 0x1be71ea
# Read 1024 relations in 0.0s -- 3.4 MB/s -- 33522.0 rels/s
# Read 2048 relations in 0.0s -- 6.7 MB/s -- 66372.5 rels/s
# Read 4096 relations in 0.0s -- 13.2 MB/s -- 130279.8 rels/s
# Read 8192 relations in 0.0s -- 20.6 MB/s -- 202891.9 rels/s
# Read 16384 relations in 0.0s -- 33.5 MB/s -- 329160.4 rels/s
# Read 32768 relations in 0.1s -- 49.2 MB/s -- 484095.1 rels/s
# Read 65536 relations in 0.1s -- 69.2 MB/s -- 680751.5 rels/s
# Read 131072 relations in 0.2s -- 87.2 MB/s -- 857189.3 rels/s
# Read 262144 relations in 0.3s -- 97.8 MB/s -- 960913.3 rels/s
# Read 524288 relations in 0.5s -- 105.4 MB/s -- 1035962.1 rels/s
# Read 1048576 relations in 1.0s -- 106.9 MB/s -- 1050508.8 rels/s
# Read 2097152 relations in 2.0s -- 109.4 MB/s -- 1074023.3 rels/s
# Read 4194304 relations in 3.9s -- 109.4 MB/s -- 1071952.6 rels/s
# Read 8388608 relations in 8.9s -- 96.8 MB/s -- 946319.6 rels/s

c140.dup2.slice1.stderr.2
antebuffer set to /home/ng/cado-nfs-2.3.0/build/ng-All-Series/utils/antebuffer
[checking true duplicates on sample of 152601 cells]
Allocated hash table of 15260083 entries (58Mb)
Constructing the two filelists...
2 files (1 new and 1 already renumbered)
Reading files already renumbered:
Warning, insertion cost 106 for a=-358592460 b=10217 h=4834835072344515773 i=15132920 j=1125697761
Warning, insertion cost 111 for a=2012527949 b=7029 h=6930206488097786108 i=11592921 j=1613564437
Warning, hash table is 67% full (avg cost 1.00)
Warning, insertion cost 107 for a=2954498525 b=18124 h=15347151127726195691 i=9218043 j=3573287075
Warning, insertion cost 127 for a=1513806417 b=96379 h=15754223042399372350 i=2372113 j=3668065891
Warning, insertion cost 113 for a=7533867483 b=40844 h=10934835052273365397 i=12254485 j=2545964683
Warning, insertion cost 126 for a=654115036 b=18675 h=14910921127195703663 i=4198215 j=3471719363
Warning, insertion cost 111 for a=-329315800 b=9993 h=5412028549560374873 i=10076038 j=1260086090
Warning, insertion cost 107 for a=1423341155 b=24959 h=15740826660245930600 i=5291557 j=3664946802
Warning, insertion cost 118 for a=2511637215 b=12961 h=17404843824755500942 i=11592922 j=4052380990

gzip: stdin: invalid compressed data--crc error
filter_rels_producer_thread: load error (Bad file descriptor) from
/home/ng/cado-nfs-2.3.0/build/ng-All-Series/utils/antebuffer 24 /tmp/cado.qnpox55b/c140.dup1//1/dup1.0.0000.gz | gzip -dc -

When I run gzip -t dup1.0.0000.gz, I get the following:
gzip: dup1.0.0000.gz: invalid compressed data--crc error
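
To sanity-check the trailer itself, here is a sketch, assuming the file is a single-member gzip stream (per the gzip format, the last 8 bytes store the CRC32 and the uncompressed size mod 2^32, both little-endian):

import struct

# Read the 8-byte gzip trailer: CRC32, then ISIZE (uncompressed size mod 2^32).
with open('dup1.0.0000.gz', 'rb') as f:
    f.seek(-8, 2)
    crc, isize = struct.unpack('<II', f.read(8))
print('stored CRC32 = %08x, uncompressed size mod 2^32 = %d' % (crc, isize))

A nonsensical ISIZE would suggest the trailer region itself was damaged rather than the compressed data upstream of it.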

ls -l dup1.0.0000.gz returns
-rw-rw-r-- 1 ng ng 604014628 févr. 15 19:47 dup1.0.0000.gz

It's not a disk-space problem: I have 1.6 TB free.

Could it be a RAM problem when gzip decompresses the file? If so, could anybody test this number on a 32 GB machine before I buy more RAM?
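
One way to separate the two hypotheses (a sketch, using the file path from the dup2 error message above): decompress the file several times in a row and hash whatever comes out. If every run fails at exactly the same point with the same partial digest, the bytes on disk are simply corrupt; if the results vary between runs, flaky RAM or CPU becomes the prime suspect.

import gzip, hashlib, zlib

# Path taken from the dup2 error message above.
PATH = '/tmp/cado.qnpox55b/c140.dup1/1/dup1.0.0000.gz'

def one_pass(path):
    """Decompress the whole file; return (status, sha1 of the output so far)."""
    h = hashlib.sha1()
    try:
        with gzip.open(path, 'rb') as f:
            while True:
                chunk = f.read(1 << 20)
                if not chunk:
                    break
                h.update(chunk)
        return 'ok', h.hexdigest()
    except (OSError, EOFError, zlib.error) as e:
        return 'error: %s' % e, h.hexdigest()

for i in range(5):
    status, digest = one_pass(PATH)
    print(i, status, digest)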

Any other ideas about what's going on?

