[knem-devel] The unexpected performance of DMA memcopy with knem
Brice.Goglin at inria.fr
Mon Jul 29 17:11:36 CEST 2013
On 08/07/2013 18:37, Brice Goglin wrote:
>> Another problem I found is that per-CPU channel table doesn't always
>> contain the DMA channels in the local processor. For example, in my
>> machine (16 DMA devices and 1 channel on each device), a half of
>> channels in the per-CPU channel table in processor 0 are in a remote
>> processor. It seems the performance is much worse with a channel in
>> the remote processor. I don't know if it is caused by some wrong
>> configurations on my machine. Does anyone encounter the similar
> This is caused by the way the Linux kernel distributes the channels
> among the processors. KNEM just sits on top of it and cannot do much
> about it with the current stack.
Coming back to this thread now that I have taken a look at this.
On my machine (dual 8-core Xeon E5), I have 16 DMA channels: 8 are in
the first (8-core) socket, 8 in the second. Unfortunately, when the
kernel distributes these channels to processors, it doesn't look at
affinity and just assigns them by processor ID: we end up with
even-numbered channels (instead of 0-7) for first-socket processes and
odd-numbered channels (instead of 8-15) for second-socket processes. So
we indeed get an immediate locality problem.
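
To make the mismatch concrete, here is a small Python model of the
assignment. It is only a sketch: it assumes that CPU IDs alternate
between the two sockets (even IDs on the first socket, odd IDs on the
second) and that CPU n simply gets channel n. That is consistent with the
even/odd pattern described above, but it is not taken from the kernel
code, and all names are made up for illustration.

```python
# Toy model, not kernel code. Assumptions (hypothetical, for illustration):
#  - 16 CPUs whose IDs alternate between sockets (even -> 0, odd -> 1)
#  - 16 DMA channels: channels 0-7 on socket 0, channels 8-15 on socket 1
#  - the allocator hands channel n to CPU n and ignores locality
NUM_CPUS = 16
NUM_CHANNELS = 16
CHANNELS_PER_SOCKET = 8


def cpu_socket(cpu):
    """Socket that a CPU belongs to under the assumed numbering."""
    return cpu % 2


def channel_socket(chan):
    """Socket that a DMA channel is attached to."""
    return chan // CHANNELS_PER_SOCKET


def assign_by_cpu_id(cpu):
    """Locality-blind assignment: channel index follows the CPU ID."""
    return cpu % NUM_CHANNELS


def channels_seen_by_socket(socket):
    """Channels handed to the CPUs of one socket."""
    return [assign_by_cpu_id(c) for c in range(NUM_CPUS)
            if cpu_socket(c) == socket]


def local_fraction(socket):
    """Fraction of a socket's assigned channels that are actually local."""
    chans = channels_seen_by_socket(socket)
    local = [ch for ch in chans if channel_socket(ch) == socket]
    return len(local) / len(chans)


if __name__ == "__main__":
    for s in (0, 1):
        print(s, channels_seen_by_socket(s), local_fraction(s))
```

Under these assumptions, each socket ends up with only half of its
assigned channels being local, which also matches the "half of the
channels are remote" observation in the original report.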
> The way to fix this is to stop using the netdma channels and directly
> acquire physical channels from KNEM.
KNEM actually currently uses the "general purpose" channels, which are
improperly distributed per CPU as explained above. One ugly way to solve
the issue would be to manually migrate to another CPU so that we get a
channel with correct locality from the "general purpose" allocator.
There's another low-level acquire interface, but it doesn't work for the
channels we are considering here. So I don't think there's any clean
solution for now. I'll ask the kernel guys and report the locality
issue. We'll see what they say.
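
For illustration, the "migrate to another CPU" workaround could look
roughly like this in a toy model. This is a sketch under assumed
numbering (CPU IDs alternating between sockets, CPU n holding channel n,
channels 0-7 on the first socket), not KNEM or kernel code; the helper
names are hypothetical.

```python
# Toy model of the migration workaround. Assumptions (hypothetical):
#  - CPU IDs alternate between sockets (even -> 0, odd -> 1)
#  - CPU n holds channel n in its per-CPU table
#  - channels 0-7 are on socket 0, channels 8-15 on socket 1
NUM_CPUS = 16
CHANNELS_PER_SOCKET = 8


def cpu_socket(cpu):
    return cpu % 2


def channel_of_cpu(cpu):
    return cpu


def channel_socket(chan):
    return chan // CHANNELS_PER_SOCKET


def cpus_with_local_channel(socket):
    """CPUs of `socket` whose per-CPU channel is also on `socket`."""
    return [c for c in range(NUM_CPUS)
            if cpu_socket(c) == socket
            and channel_socket(channel_of_cpu(c)) == socket]


def pick_cpu_for_copy(current_cpu):
    """Choose the CPU to run on when requesting a channel.

    Stay on the current CPU if its channel is already local; otherwise
    return another CPU of the same socket that holds a local channel.
    """
    socket = cpu_socket(current_cpu)
    candidates = cpus_with_local_channel(socket)
    if current_cpu in candidates or not candidates:
        return current_cpu
    return candidates[0]
```

In this model a thread on CPU 8 (first socket, remote channel 8) would
have to hop to CPU 0 just to obtain a local channel, which shows why the
approach is ugly.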
> That will require a large rework of
> the KNEM driver code, but it will enable other optimizations such as
> getting interrupts on copy completion.
This one still holds true as far as I understand.