[Cado-nfs-discuss] Block-Wiedemann with MPI
Emmanuel.Thome at gmail.com
Tue Oct 4 11:55:19 CEST 2011
On Tue, Oct 04, 2011 at 09:05:37AM +0000, Schmid Johannes wrote:
> I'm trying to use cado-nfs-1.0 with MPI enabled,
Don't. Use the svn version, please. Some changes made since 1.0 impacted
the bwc code significantly, so guidance on working around the quirks of
that version wouldn't be very useful.
> but I have a basic question: is the working directory (wdir) supposed
> to be accessible/shared for all nodes? For all other steps and for
> mpi=0 this isn't the case, but with mpi=1 the bwc commands are
> accessing directories (projectname.bwc) and *.bin files from the
> working directory. I tried sharing the working directory between the
> machines, but this caused the computation to slow down significantly.
> Could this be due to insufficient network speed?
There are surely some infelicities here; the scripts do have their rough
edges in this regard. I'm almost sure that cadofct.pm doesn't handle this
smoothly. The ``integrated'' interface ``bwc.pl :complete'' should, but it
could have some bugs. So the core programs don't care about this, but the
driver scripts may be somewhat limited at the moment.
At least see how it goes with the svn version; then we can work together
to see what needs to be improved.
If you can read shell, you might want to have a look at
scripts/cluster/bwc-cluster-oar.sh. It's special-purpose, written for one
computation I've run, but at least it's a working example. It only
assumes a shared directory for the binaries, not for the data. The
difficult part in that case is that it's written for a cluster where
jobs start with no direct access to the data, so everything has to be
imported at job start. It's perhaps overly complicated, though.
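For readers who don't read shell, here is a hypothetical Python sketch of the general stage-in idea (it is not a translation of bwc-cluster-oar.sh): at job start, copy the data from a shared location into node-local scratch, so that the computation does not read the working directory over the network. The function name `stage_in`, the paths, and the file patterns (`*.bin` files and a `<project>.bwc` directory, following the names mentioned in the question) are all assumptions for illustration. Requires Python 3.8+ for `dirs_exist_ok`.

```python
import shutil
from pathlib import Path


def stage_in(shared_dir, local_dir, patterns=("*.bin", "*.bwc")):
    """Copy entries matching `patterns` from shared_dir into local_dir.

    Hypothetical helper: the patterns are assumptions, not cado-nfs names.
    Returns the sorted list of entry names that were copied.
    """
    src = Path(shared_dir)
    dst = Path(local_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for pattern in patterns:
        for entry in src.glob(pattern):
            target = dst / entry.name
            if entry.is_dir():
                # Directories such as <project>.bwc are copied recursively.
                shutil.copytree(entry, target, dirs_exist_ok=True)
            else:
                # copy2 also preserves timestamps, handy for staleness checks.
                shutil.copy2(entry, target)
            copied.append(entry.name)
    return sorted(copied)
```

A matching stage-out step would copy results back to the shared location when the job ends; whether that is needed depends on which files later steps read.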
> On a different note, I've had problems restarting the factoring after
> the linear algebra stage aborted due to the problems mentioned above.
> The symptom is an error message
> Error:parameter mpi from <wdir>/<project>.param clashes with value defined earlier from other config files (0 vs. 2) at <bindir>/cadofct.pm line 281.
> I think the problem is that before <project>.param is read, the
> parameters have already been established (either from the default
> values or by parsing the other param files) and line 281 disallows
> certain parameters to be redefined. I get by by removing the
> <project>.param file, but I think that this may be a bug.
Yeah. Again, use the svn version instead. I recall fixing a couple of
things related to untimely parsing of parameters. I acknowledge at least
that there were issues of this kind at some point, and I expect them to
be fixed now.
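To make the reported mechanism concrete, here is a hypothetical Python sketch of a parameter merge with clash detection, plus one possible way to avoid spurious clashes: keep built-in defaults separate from explicitly set values, so that only two explicit sources disagreeing on a protected key is an error. This is not the code of cadofct.pm (which is Perl), and it does not claim to identify the actual cause of the bug; the function name, arguments, and error wording (loosely modeled on the reported message) are illustrative assumptions.

```python
def merge_params(defaults, sources, strict_keys):
    """Merge parameter dicts read in order, on top of defaults.

    defaults:    fallback values that any source may silently override.
    sources:     list of (origin_name, dict) pairs, applied in order.
    strict_keys: keys that two explicit sources may not set differently.
    Returns the merged dict; raises ValueError on a genuine clash.
    """
    merged = dict(defaults)
    set_by = {}  # key -> name of the source that explicitly set it
    for name, params in sources:
        for key, value in params.items():
            if key in strict_keys and key in set_by and merged[key] != value:
                raise ValueError(
                    f"parameter {key} from {name} clashes with value defined "
                    f"earlier from {set_by[key]} ({merged[key]} vs. {value})"
                )
            merged[key] = value
            set_by[key] = name
    return merged
```

With this design, `merge_params({"mpi": 0}, [("project.param", {"mpi": 2})], {"mpi"})` returns `{"mpi": 2}`, because overriding a default is allowed, whereas two explicit files setting `mpi` to 0 and 2 raise an error.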
> Thanks in advance for any help on these issues!