Let me try your version and check dmesg.

 

Thanks,

Gang

 

From: Victor Banh [mailto:victorb@mellanox.com]
Sent: Thursday, October 19, 2017 9:43 AM
To: Cao, Gang <gang.cao@intel.com>; Storage Performance Development Kit <spdk@lists.01.org>; Harris, James R <james.r.harris@intel.com>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

 

Hi Gang

Any update?

Do you see any error messages from “dmesg” when running fio with a 512k block size?
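For example, something like this on the client should surface them (device name taken from your earlier log; adjust it if the device enumerates differently):

dmesg -T | grep -i -e nvme2n1 -e 'Buffer I/O'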

Thanks

Victor

 

From: Victor Banh
Sent: Tuesday, October 17, 2017 7:37 PM
To: 'Cao, Gang' <gang.cao@intel.com>; Storage Performance Development Kit <spdk@lists.01.org>; Harris, James R <james.r.harris@intel.com>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

 

Hi Gang

spdk-17.07.1 and dpdk-17.08

Thanks

Victor

 

From: Cao, Gang [mailto:gang.cao@intel.com]
Sent: Monday, October 16, 2017 8:51 PM
To: Victor Banh <victorb@mellanox.com>; Storage Performance Development Kit <spdk@lists.01.org>; Harris, James R <james.r.harris@intel.com>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

 

Hi Victor,

 

Could you share which version of SPDK you are using when you see this error? Or could you try the latest SPDK code?
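If it helps, here is a rough sketch for pulling and building the latest code (the DPDK path below is just a placeholder for wherever your dpdk-17.08 build lives):

git clone https://github.com/spdk/spdk
cd spdk
./configure --with-rdma --with-dpdk=/path/to/dpdk-17.08
make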

 

Thanks,

Gang

 

From: Victor Banh [mailto:victorb@mellanox.com]
Sent: Tuesday, October 17, 2017 5:30 AM
To: Cao, Gang <gang.cao@intel.com>; Storage Performance Development Kit <spdk@lists.01.org>; Harris, James R <james.r.harris@intel.com>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

 

Hi Cao

Do you see any messages from dmesg?

 

I tried this fio version and still saw these error messages in dmesg.

 

fio-3.1

 

[869053.218235] Buffer I/O error on dev nvme2n1, logical block 0, async page read

[869053.218250] Buffer I/O error on dev nvme2n1, logical block 0, async page read

[869053.218259] Buffer I/O error on dev nvme2n1, logical block 0, async page read

[869053.218263] ldm_validate_partition_table(): Disk read failed.

[869053.218269] Buffer I/O error on dev nvme2n1, logical block 0, async page read

[869053.218277] Buffer I/O error on dev nvme2n1, logical block 0, async page read

[869053.218285] Buffer I/O error on dev nvme2n1, logical block 0, async page read

[869053.218292] Buffer I/O error on dev nvme2n1, logical block 0, async page read

[869053.218296] Dev nvme2n1: unable to read RDB block 0

[869053.218303] Buffer I/O error on dev nvme2n1, logical block 0, async page read

[869053.218311] Buffer I/O error on dev nvme2n1, logical block 0, async page read

[869053.218323] Buffer I/O error on dev nvme2n1, logical block 3, async page read

[869053.218338]  nvme2n1: unable to read partition table

[869053.246126] nvme2n1: detected capacity change from -62111005559226368 to -62042256479092736

[869053.246195] ldm_validate_partition_table(): Disk read failed.

[869053.246217] Dev nvme2n1: unable to read RDB block 0

 

From: Cao, Gang [mailto:gang.cao@intel.com]
Sent: Monday, October 09, 2017 10:59 AM
To: Storage Performance Development Kit <spdk@lists.01.org>; Harris, James R <james.r.harris@intel.com>
Cc: Victor Banh <victorb@mellanox.com>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

 

Hi Victor,

 

Thanks for your detailed information on the testing.

 

I’ve tried the latest SPDK code with the latest fio (fio-3.1-20-g132b) and with fio-2.19, and I don’t see this kind of error.

 

Could you share which version of SPDK you are using when you see this error? Or could you try the latest SPDK code?

 

fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme0n1  --name=read-phase --rw=randwrite

read-phase: (g=0): rw=randwrite, bs=(R) 512KiB-512KiB, (W) 512KiB-512KiB, (T) 512KiB-512KiB, ioengine=libaio, iodepth=16

...

fio-3.1-20-g132b

Starting 4 processes

Jobs: 4 (f=4): [w(4)][100.0%][r=0KiB/s,w=1592MiB/s][r=0,w=3183 IOPS][eta 00m:00s]

read-phase: (groupid=0, jobs=1): err= 0: pid=46378: Tue Oct 10 01:23:39 2017
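For reference, the same workload expressed as a fio job file (equivalent to the command line above):

[read-phase]
filename=/dev/nvme0n1
rw=randwrite
bs=512k
numjobs=4
iodepth=16
loops=1
ioengine=libaio
direct=1
invalidate=1
fsync_on_close=1
randrepeat=1
norandommap
time_based
runtime=60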

 

My NIC information:

[root@node4 nvme-cli-gerrit]# lsmod | grep -i mlx

mlx5_ib               172032  0

ib_core               200704  15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm

mlx5_core             380928  1 mlx5_ib

ptp                    20480  3 ixgbe,igb,mlx5_core

[root@node4 nvme-cli-gerrit]# lspci | grep -i mell

81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]

81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]

 

From: SPDK [mailto:spdk-bounces@lists.01.org] On Behalf Of Victor Banh
Sent: Friday, October 6, 2017 2:41 PM
To: Harris, James R <james.r.harris@intel.com>; Storage Performance Development Kit <spdk@lists.01.org>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio

 

 

 

From: Harris, James R [mailto:james.r.harris@intel.com]
Sent: Friday, October 06, 2017 2:32 PM
To: Storage Performance Development Kit <spdk@lists.01.org>
Cc: Victor Banh <victorb@mellanox.com>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio

 

(cc Victor)

 

From: James Harris <james.r.harris@intel.com>
Date: Thursday, October 5, 2017 at 1:59 PM
To: Storage Performance Development Kit <spdk@lists.01.org>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio

 

Hi Victor,

 

Could you provide a few more details? This will help the list offer some ideas.

 

1) On the client, are you using the SPDK NVMe-oF initiator or the kernel initiator?

 

Kernel initiator; I run these commands on the client server:

 

modprobe mlx5_ib

modprobe nvme-rdma

nvme discover -t rdma -a 192.168.10.11 -s 4420

nvme connect -t rdma -n nqn.2016-06.io.spdk:nvme-subsystem-1  -a 192.168.10.11 -s 4420
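To verify the connection and to tear it down between runs (same NQN as above; both are standard nvme-cli commands):

nvme list
nvme disconnect -n nqn.2016-06.io.spdk:nvme-subsystem-1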

 

 

2) Can you provide the fio configuration file or command line? Just so we can have more specifics on “bigger block size”.

 

fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme1n1  --name=read-phase --rw=randwrite

 

3) Any details on the HW setup, specifically the RDMA NIC (or whether you’re using SW RoCE)?

 

nvmf.conf on the target server:

 

[Global]

  Comment "Global section"

  ReactorMask 0xff00

 

[Rpc]

  Enable No

  Listen 127.0.0.1

 

[Nvmf]

  MaxQueuesPerSession 8

  MaxQueueDepth 128

 

[Subsystem1]

  NQN nqn.2016-06.io.spdk:nvme-subsystem-1

  Core 9

  Mode Direct

  Listen RDMA 192.168.10.11:4420

  NVMe 0000:82:00.0

  SN S2PMNAAH400039
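For reference, with a config like this the target is started roughly as below (paths per the SPDK 17.07 tree; setup.sh unbinds the NVMe device from the kernel driver first):

sudo scripts/setup.sh
sudo ./app/nvmf_tgt/nvmf_tgt -c nvmf.conf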

 

 

It is an RDMA NIC, ConnectX-5. CPU: Intel(R) Xeon(R) E5-2680 0 @ 2.70GHz.

NUMA node0 CPU(s):     0-7

NUMA node1 CPU(s):     8-15
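As a sanity check that ReactorMask 0xff00 (cores 8-15, i.e. NUMA node 1) matches the NIC’s socket, the NIC’s NUMA node can be read from sysfs (the PCI address below is a placeholder; substitute your ConnectX-5’s address from lspci):

cat /sys/bus/pci/devices/0000:81:00.0/numa_node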

 

 

 

 

Thanks,

 

-Jim

 

 

From: SPDK <spdk-bounces@lists.01.org> on behalf of Victor Banh <victorb@mellanox.com>
Reply-To: Storage Performance Development Kit <spdk@lists.01.org>
Date: Thursday, October 5, 2017 at 11:26 AM
To: "
spdk@lists.01.org" <spdk@lists.01.org>
Subject: [SPDK] Buffer I/O error on bigger block size running fio

 

Hi

I have an SPDK NVMe-oF setup and keep getting errors with bigger block sizes on fio randwrite tests.

I am using Ubuntu 16.04 with kernel version 4.12.0-041200-generic on target and client.

DPDK is 17.08 and SPDK is 17.07.1.

Thanks

Victor

 

 

[46905.233553] perf: interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 79750

[48285.159186] blk_update_request: I/O error, dev nvme1n1, sector 2507351968

[48285.159207] blk_update_request: I/O error, dev nvme1n1, sector 1301294496

[48285.159226] blk_update_request: I/O error, dev nvme1n1, sector 1947371168

[48285.159239] blk_update_request: I/O error, dev nvme1n1, sector 1891797568

[48285.159252] blk_update_request: I/O error, dev nvme1n1, sector 10833824

[48285.159265] blk_update_request: I/O error, dev nvme1n1, sector 614937152

[48285.159277] blk_update_request: I/O error, dev nvme1n1, sector 1872305088

[48285.159290] blk_update_request: I/O error, dev nvme1n1, sector 1504491040

[48285.159299] blk_update_request: I/O error, dev nvme1n1, sector 1182136128

[48285.159308] blk_update_request: I/O error, dev nvme1n1, sector 1662985792

[48285.191185] nvme nvme1: Reconnecting in 10 seconds...

[48285.191254] Buffer I/O error on dev nvme1n1, logical block 0, async page read

[48285.191291] Buffer I/O error on dev nvme1n1, logical block 0, async page read

[48285.191305] Buffer I/O error on dev nvme1n1, logical block 0, async page read

[48285.191314] ldm_validate_partition_table(): Disk read failed.

[48285.191320] Buffer I/O error on dev nvme1n1, logical block 0, async page read

[48285.191327] Buffer I/O error on dev nvme1n1, logical block 0, async page read

[48285.191335] Buffer I/O error on dev nvme1n1, logical block 0, async page read

[48285.191342] Buffer I/O error on dev nvme1n1, logical block 0, async page read

[48285.191347] Dev nvme1n1: unable to read RDB block 0

[48285.191353] Buffer I/O error on dev nvme1n1, logical block 0, async page read

[48285.191360] Buffer I/O error on dev nvme1n1, logical block 0, async page read

[48285.191375] Buffer I/O error on dev nvme1n1, logical block 3, async page read

[48285.191389]  nvme1n1: unable to read partition table

[48285.223197] nvme1n1: detected capacity change from 1600321314816 to 0

[48289.623192] nvme1n1: detected capacity change from 0 to -65647705833078784

[48289.623411] ldm_validate_partition_table(): Disk read failed.

[48289.623447] Dev nvme1n1: unable to read RDB block 0

[48289.623486]  nvme1n1: unable to read partition table

[48289.643305] ldm_validate_partition_table(): Disk read failed.

[48289.643328] Dev nvme1n1: unable to read RDB block 0

[48289.643373]  nvme1n1: unable to read partition table