Hi, I am running some experiments to evaluate the performance of peer-to-peer DMA.
I am using SPDK to control the NVMe drives, with the fio plugin compiled against
SPDK. I am seeing a strange result: when I run 4K IOs at an IO depth of 1,
peer-to-peer DMA from an NVMe drive to a PCI device (which exposes memory via
BAR1) in a different NUMA node has a 50th percentile latency of 17 usecs. The
same experiment, but with the NVMe drive and the PCIe device in the same NUMA
node, has a latency of 38 usecs. In both cases fio was running on a node 0 CPU
core, and the PCI device exposing BAR1 is attached to node 1. DMA from the NVMe
drive to host memory also takes 38 usecs.
To summarize the cases:
1. NVMe (NUMA node 0) -> PCI device (NUMA node 1): 18 usecs
2. NVMe (NUMA node 1) -> PCI device (NUMA node 1): 38 usecs
3. NVMe (NUMA node 0) -> host memory: 38 usecs
fio was running on a NUMA node 0 CPU core in all cases.
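For reference, the QD-1 4K read run described above corresponds to a job file
along these lines (a sketch only, assuming the SPDK NVMe fio plugin loaded via
LD_PRELOAD; the PCI address 0000.04.00.0 and the core number are placeholders,
and steering the DMA target at the peer device's BAR1 is assumed to be handled
outside the job file by the modified plugin):

    [global]
    ; 'spdk' is the external ioengine exported by the SPDK NVMe fio plugin
    ioengine=spdk
    thread=1
    direct=1
    ; randread is assumed here; adjust if the reads were sequential
    rw=randread
    bs=4k
    iodepth=1
    time_based=1
    runtime=60
    ; pin fio to a NUMA node 0 core, matching all three cases above
    cpus_allowed=0

    [qd1-4k-read]
    ; colons in the PCI address are written as dots in the filename
    filename=trtype=PCIe traddr=0000.04.00.0 ns=1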
At higher IO depth values, the cross-NUMA case (case 1 above) sees latency
increase steeply and performs worse than cases 2 and 3.
Any pointers on why this could be happening?
The NVMe devices used are both identical 400 GB Intel Data Center SSDs.
Thanks
Hello
Can you provide some additional information?
1) Have you pre-conditioned the NVMe SSDs? (A typical preconditioning job is
sketched after this list.)
2) Which Intel Data Center NVMe SSD model are you using? I would like to look at
the device spec and check the expected QD-1 latencies.
3) Are you doing random or sequential 4K reads from the device?
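For context on question 1: preconditioning is usually done by writing the whole
drive sequentially once or twice before measuring (plus a random-write pass if
steady-state random-write numbers are needed). A minimal sketch of such a fill
job, assuming the kernel NVMe driver with libaio and a placeholder device name
/dev/nvme0n1:

    [global]
    ; kernel block-device path; adjust the device name for your system
    ioengine=libaio
    direct=1
    thread=1
    filename=/dev/nvme0n1

    [seq-fill]
    ; sequentially write the full device capacity twice
    rw=write
    bs=128k
    iodepth=32
    loops=2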
Thanks.