[knem-devel] "unexisting region cookie" error

Brice Goglin Brice.Goglin at inria.fr
Mon Jul 11 22:26:34 CEST 2011


On 11/07/2011 22:20, bin wang wrote:
> hello Brice, 
>
> Thanks for your timely reply.
>
> I checked the return value of the ioctl, and it is -1 whenever the
> process crashes.
>
> In my code:
> 1. I didn't set the KNEM_FLAG_SINGLEUSE flag, and the sender process
> was still running, so I assume that the memory region had not been
> destroyed yet.
> 2. I'm always checking the cookies on both the writer and reader sides,
> and they are exactly the same.
> 3. The module info suggests that the requests were rejected because of
> an unexisting region cookie.
>
>
>
> Do you have any other suggestion?

One thing you could do is:
* Load the knem kernel module with the module parameter statsverbose=1
* In your MPI code, when you get -1, insert a debug printf with the
cookie value and then a "while (1);" so that the process stops
progressing and waits.
* When you see the above printf, run (as root) "cat /dev/knem" to get
even more module information, including the existing cookie values in
the driver (only printed if statsverbose=1).
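The second step above could look roughly like the sketch below. It is only an illustration under stated assumptions: debug_copy_ioctl() and format_copy_failure() are hypothetical helper names, not part of the KNEM API, and the caller is assumed to pass its own open /dev/knem descriptor, the copy ioctl request number, the filled-in command structure, and the cookie it placed in that structure. The sketch sleeps in a loop rather than busy-waiting with "while (1);", which has the same effect of parking the process while you inspect the driver.

```c
#include <assert.h>
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical helper: build one diagnostic line describing a failed copy.
 * The cookie is printed in hex so it can be compared with the values that
 * "cat /dev/knem" lists when the module is loaded with statsverbose=1. */
static int format_copy_failure(char *buf, size_t len, int rank,
                               uint64_t cookie, int err)
{
    assert(buf != NULL);
    return snprintf(buf, len,
                    "[rank %d pid %ld] knem copy failed: "
                    "cookie=0x%016" PRIx64 " errno=%d (%s)",
                    rank, (long)getpid(), cookie, err, strerror(err));
}

/* Hypothetical wrapper around the copy ioctl. knem_fd, request and arg are
 * whatever the caller already uses; cookie is the value the caller stored
 * in the command structure. When the ioctl returns -1, the cookie is
 * printed and, if spin_on_error is non-zero, the process waits forever so
 * that the driver state can be inspected from another terminal. */
static int debug_copy_ioctl(int knem_fd, unsigned long request, void *arg,
                            uint64_t cookie, int rank, int spin_on_error)
{
    int ret = ioctl(knem_fd, request, arg);
    if (ret == -1) {
        int err = errno;            /* save before any other libc call */
        char msg[256];

        format_copy_failure(msg, sizeof msg, rank, cookie, err);
        fprintf(stderr, "%s\n", msg);
        fflush(stderr);

        while (spin_on_error)       /* park here; never returns if set */
            sleep(1);

        errno = err;                /* restore for the caller */
    }
    return ret;
}
```

Calling the wrapper in place of the plain copy ioctl, with spin_on_error set to 1, leaves the failing rank stopped with its cookie printed on stderr.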

Brice



>
>
>
> I'm looking forward to hearing from you.
>
> On Mon, Jul 11, 2011 at 4:10 PM, Brice Goglin
> <Brice.Goglin at inria.fr> wrote:
>
>     Hello,
>
>     From what I see in your knem module information, it complains that
>     many knem copy requests were invalid because the submitted cookie
>     is invalid. This usually suggests that the cookie value has been
>     altered between when it was created with the region create ioctl
>     and when it is used in a copy ioctl. Or it could be a cookie that
>     has already been destroyed (either explicitly or through the
>     single-use flag).
>
>     Are you developing your own MPI port over KNEM? If so, I suggest
>     that you check the return value of the copy ioctl. When it returns
>     -1 with errno=EINVAL, you should print the value of the cookie and
>     check that it matches the cookie value that was previously returned
>     by a region creation.
>
>
>  
>
>     I am thinking of printing the available cookie values in the kernel
>     logs when such an invalid cookie is requested. It would be very
>     verbose when multiple processes are involved, but it may help you
>     debug this kind of problem.
>
>     Brice
>
>
>
>     On 11/07/2011 21:47, bin Wang wrote:
>     > hello All,
>     >
>     > I'm trying to use knem in MPI.
>     > With only two processes, knem works properly.
>     > With 3 processes, the code does not always work properly.
>     > With more than 3 processes, at least one process crashes
>     > without calling finalize.
>     >
>     > I don't know why it's not working properly for me.
>     > Below is the information of knem module.
>     >
>     > $ cat /dev/knem
>     > knem 0.9.6
>     >  Driver ABI=0xd
>     >  Flags: forcing 0x0, ignoring 0x0
>     >  DMAEngine: KernelSupported Enabled NoChannelAvailable
>     >  Debug: NotBuilt
>     >  Requests submitted                           : 68
>     >  Requests processed (total)                   : 50
>     >           processed (using DMA)               : 0
>     >           processed (offloaded to thread)     : 0
>     >           processed (with pinned local pages) : 0
>     >  Requests rejected (invalid flags)            : 0
>     >           rejected (not enough memory)        : 0
>     >           rejected (invalid ioctl argument)   : 0
>     >           rejected (unexisting region cookie) : 19
>     >           rejected (failed to pin local pages): 0
>     >  Requests failed during memcpy from/to user   : 0
>     >           failed during DMA copy              : 0
>     >  DMA copy cleanup timeout                     : 0
>     >
>     >
>     > Can anyone help me out?
>     >
>
>
>
>
> -- 
> Bin WANG
>


