Re: [SPDK] Buffer I/O error on bigger block size running fio
by Harris, James R
Hi Victor,
Could you provide a few more details? This will help the list to provide some ideas.
1) On the client, are you using the SPDK NVMe-oF initiator or the kernel initiator?
2) Can you provide the fio configuration file or command line? Just so we can have more specifics on “bigger block size”.
3) Any details on the HW setup – specifically details on the RDMA NIC (or if you’re using SW RoCE).
Thanks,
-Jim
From: SPDK <spdk-bounces(a)lists.01.org> on behalf of Victor Banh <victorb(a)mellanox.com>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org>
Date: Thursday, October 5, 2017 at 11:26 AM
To: "spdk(a)lists.01.org" <spdk(a)lists.01.org>
Subject: [SPDK] Buffer I/O error on bigger block size running fio
Hi
I have an SPDK NVMe-oF setup and keep getting errors with bigger block sizes when running fio randwrite tests.
I am using Ubuntu 16.04 with kernel version 4.12.0-041200-generic on target and client.
The DPDK is 17.08 and SPDK is 17.07.1.
Thanks
Victor
[46905.233553] perf: interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
[48285.159186] blk_update_request: I/O error, dev nvme1n1, sector 2507351968
[48285.159207] blk_update_request: I/O error, dev nvme1n1, sector 1301294496
[48285.159226] blk_update_request: I/O error, dev nvme1n1, sector 1947371168
[48285.159239] blk_update_request: I/O error, dev nvme1n1, sector 1891797568
[48285.159252] blk_update_request: I/O error, dev nvme1n1, sector 10833824
[48285.159265] blk_update_request: I/O error, dev nvme1n1, sector 614937152
[48285.159277] blk_update_request: I/O error, dev nvme1n1, sector 1872305088
[48285.159290] blk_update_request: I/O error, dev nvme1n1, sector 1504491040
[48285.159299] blk_update_request: I/O error, dev nvme1n1, sector 1182136128
[48285.159308] blk_update_request: I/O error, dev nvme1n1, sector 1662985792
[48285.191185] nvme nvme1: Reconnecting in 10 seconds...
[48285.191254] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191291] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191305] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191314] ldm_validate_partition_table(): Disk read failed.
[48285.191320] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191327] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191335] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191342] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191347] Dev nvme1n1: unable to read RDB block 0
[48285.191353] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191360] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191375] Buffer I/O error on dev nvme1n1, logical block 3, async page read
[48285.191389] nvme1n1: unable to read partition table
[48285.223197] nvme1n1: detected capacity change from 1600321314816 to 0
[48289.623192] nvme1n1: detected capacity change from 0 to -65647705833078784
[48289.623411] ldm_validate_partition_table(): Disk read failed.
[48289.623447] Dev nvme1n1: unable to read RDB block 0
[48289.623486] nvme1n1: unable to read partition table
[48289.643305] ldm_validate_partition_table(): Disk read failed.
[48289.643328] Dev nvme1n1: unable to read RDB block 0
[48289.643373] nvme1n1: unable to read partition table
SPDK Trello board
by Shah, Prital B
All,
I just want to highlight that we have an SPDK Trello board: https://trello.com/spdk for roadmap discussion and current feature design discussions.
1) To add items to the SPDK backlog, please use this board: https://trello.com/b/P5xBO7UR/things-to-do
2) To discuss feature designs via individual feature boards.
We are planning a discussion on JSON-based configuration for SPDK and brainstorming on how we can make it easier for developers to configure SPDK.
You're welcome to join our meeting at +1(916)356-2663. Choose bridge 5, Conference ID: 864687063
Meeting on 09/15/2017 at 02:30 PM MST (UTC-7), this Friday.
Thanks
Prital
Question about create a simple RAID1 in SPDK
by Annan Chang 張安男
Hello all,
I am planning to create a simple RAID1 in spdk.
My original idea is to implement a new spdk bdev device that contains two physical devices,
so I looked at the code of the AIO and nvme bdev devices.
But I found that writing a new spdk bdev device that contains two physical nvme devices is difficult for me
because I am not familiar with nvme's user space driver.
I just found on the mailing list that the "vbdev" architecture may be the right fit,
but I cannot find any example of a vbdev.
Can anybody give me advice on how to write a vbdev device that contains two nvme bdev devices?
Thank you very much.
--
AnNan, Chang
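For illustration, here is a minimal sketch of the write path such a RAID1 vbdev needs: fan one write out to two underlying bdevs and complete the parent I/O only when both copies finish. It uses the public bdev calls spdk_bdev_write() and spdk_bdev_free_io(), but the exact signatures and the vbdev registration/claiming boilerplate vary between SPDK releases, so treat this as a sketch rather than a drop-in module:

/*
 * Sketch only: mirror one write to two claimed base bdevs.  desc[i]/ch[i]
 * are the descriptors and I/O channels obtained when the RAID1 vbdev
 * claimed its two base bdevs.  Error handling is kept minimal on purpose.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <errno.h>
#include "spdk/bdev.h"

struct raid1_write_ctx {
    int outstanding;                            /* mirror legs still in flight */
    int status;                                 /* sticky error status */
    void (*parent_done)(void *arg, int status); /* completes the parent I/O */
    void *parent_arg;
};

static void
raid1_leg_done(struct spdk_bdev_io *bdev_io, bool success, void *cb_arg)
{
    struct raid1_write_ctx *ctx = cb_arg;

    if (!success) {
        ctx->status = -EIO;
    }
    spdk_bdev_free_io(bdev_io);

    if (--ctx->outstanding == 0) {
        ctx->parent_done(ctx->parent_arg, ctx->status);
        free(ctx);
    }
}

static int
raid1_write(struct spdk_bdev_desc *desc[2], struct spdk_io_channel *ch[2],
            void *buf, uint64_t offset, uint64_t nbytes,
            void (*done)(void *arg, int status), void *done_arg)
{
    struct raid1_write_ctx *ctx = calloc(1, sizeof(*ctx));
    int i, rc;

    if (ctx == NULL) {
        return -ENOMEM;
    }
    ctx->outstanding = 2;
    ctx->parent_done = done;
    ctx->parent_arg = done_arg;

    for (i = 0; i < 2; i++) {
        rc = spdk_bdev_write(desc[i], ch[i], buf, offset, nbytes,
                             raid1_leg_done, ctx);
        if (rc != 0) {
            /* real code must also unwind a leg that was already submitted */
            return rc;
        }
    }
    return 0;
}

Reads can go to either healthy leg; the rest of a real vbdev module is registration and claiming of the two base bdevs, which the existing AIO and nvme bdev modules already illustrate.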
SPDK errors
by Santhebachalli Ganesh
Folks,
My name is Ganesh, and I am working on NVMe-oF performance metrics using SPDK (and the kernel).
I would appreciate your expert insights.
I am observing errors most of the time when the queue depth (QD) in perf is increased to >=64, and sometimes even at <=16.
Errors are not consistent.
Attached are some details.
Please let me know if have any additional questions.
Thanks.
-Ganesh
VTOPHYS Mapping
by Kumaraparameshwaran Rathnavel
Hi All,
I was looking into the vtophys.c and I had the following query.
I understand that this creates a mapping of a 128TB virtual address space, where physical addresses are populated in the map.
The top-level map should be indexed by 17 bits (bits 30-46), which is 128K 1GB entries, but we have created entries for 18 bits. What is the reason behind having the extra entries? Likewise, the second-level page map should require only 9 bits (bits 21-29), i.e. 512 2MB-page entries for a single 1GB map. Why do we create 1024 entries?
Please correct if my understanding is wrong.
SHIFT_128TB 47
SHIFT_1GB 30
SHIFT_2MB 21
/* Translation of a single 2MB page. */
struct map_2mb {
uint64_t translation_2mb;
};
/* Second-level map table indexed by bits [21..29] of the virtual address.
* Each entry contains the address translation or SPDK_VTOPHYS_ERROR for entries that haven't
* been retrieved yet.
*/
struct map_1gb {
struct map_2mb map[1ULL << (SHIFT_1GB - SHIFT_2MB + 1)];
uint16_t ref_count[1ULL << (SHIFT_1GB - SHIFT_2MB + 1)];
};
/* Top-level map table indexed by bits [30..46] of the virtual address.
* Each entry points to a second-level map table or NULL.
*/
struct map_128tb {
struct map_1gb *map[1ULL << (SHIFT_128TB - SHIFT_1GB + 1)];
};
Thanking You,
Param.
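For reference, the arithmetic behind the question: bits [30..46] are 17 bits and bits [21..29] are 9 bits, yet both array sizes in the structs above use an extra +1 in the shift. A tiny standalone program (nothing SPDK-specific, just the constants quoted above) makes the mismatch concrete:

#include <stdio.h>

#define SHIFT_128TB 47
#define SHIFT_1GB   30
#define SHIFT_2MB   21

int main(void)
{
    /* bits [30..46] index the top-level map: 47 - 30 = 17 bits */
    unsigned long long top_needed = 1ULL << (SHIFT_128TB - SHIFT_1GB);     /* 131072 (128K) */
    unsigned long long top_alloc  = 1ULL << (SHIFT_128TB - SHIFT_1GB + 1); /* 262144 (256K) */

    /* bits [21..29] index a second-level map: 30 - 21 = 9 bits */
    unsigned long long l2_needed = 1ULL << (SHIFT_1GB - SHIFT_2MB);        /* 512 */
    unsigned long long l2_alloc  = 1ULL << (SHIFT_1GB - SHIFT_2MB + 1);    /* 1024 */

    printf("top level: %llu entries needed, %llu allocated\n", top_needed, top_alloc);
    printf("2nd level: %llu entries needed, %llu allocated\n", l2_needed, l2_alloc);
    return 0;
}

So the structs do allocate twice the number of entries that the stated bit ranges require, which is exactly the factor of two the question is about.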
FW: SPDK, Blobstore & Swift
by Luse, Paul E
Here's today's Swift community meeting IRC chat about SPDK driven by Wewe's proposal around blobstore, thanks Wewe!!
For those not wanting to read through the transcript, my takeaways are:
* SSDs are not in high use in Swift, mainly due to cost, which makes any SW effort to optimize for flash a relatively low priority for them
* Where SSDs are used (container storage) there are no significant performance issues
That said:
* The Swift project technical lead (notmyname) is always looking for new ways to differentiate and wants to keep an open dialogue, but is not willing to endorse activities (i.e. merge any related code into Swift) to support the effort
So at this point anyone is, of course, free to work on a proof of concept w/o a ton of help from the Swift folks and, provided there's some compelling data, they'd surely reconsider. I can't speak for the SPDK maintainers, but I think that w/o Swift community interest, using Swift as a vehicle to introduce SPDK into a more object-oriented system is probably not the best route to go.
Let me know if there are any questions, these guys are always willing to talk.
Thx
Paul
<notmyname> today peluse is back with us (yay) to talk about something he's been working on
<peluse> rock n roll
<notmyname> peluse: take it away
<peluse> I'm thinking I should have typed some shit up in advance to avoid all the typos I'm about to introduce :)
<peluse> anyways...
<peluse> http://spdk.io
<notmyname> #link http://spdk.io
<peluse> is the URL as I mentioned before. Quick high level overview then I'll bring up a proposal someone in our community has made
<peluse> that we haven't spent a whole lot of time thinking about TBH
<notmyname> ok
<peluse> Also, here's a SNIA talk I did last month about SPDK in general and one relevant component called blobstore https://www.snia.org/sites/default/files/SDC/2017/presentations/Solid_Sta...
<peluse> So SPDK is a set of user space components that is all BSD licensed
<peluse> its used in a whole bunch of ways but mainly by storage appliances to optimize SSD performance in what swift would call the storage node
<peluse> FYI its in Ceph already but not the default driver
<peluse> and when I say "it" I mean whatever component the system has chosen to take on, in Ceph its the user space polled mode NVMe driver
<peluse> there are some basic perf marketing type hypes slides in that deck I pated in for anyone interested
<peluse> pretty huge gains when you consider latency and CPU sensitive apps running with latest SSDs
<notmyname> so the basic idea is a fast/efficient way to talk to fast storage media that might potentially be useful in swift's object server?
<peluse> anyway, that's the real trick is that its all user space, direct access to HW, no INTs and no locking
<peluse> yup
<peluse> but there are a ton of compoennts, well not a ton, but a bunch that would not be relevant
<timburke> could it be useful for the account/container servers, too, or are we just looking at object servers (and diskfile in particular)?
<notmyname> what are the integration points. I doubt it's as simple as mmaping a file and your'e done
<peluse> and some are lirbaries and some are applications.
<peluse> I think since it's SSD only (well not technically, but it wouldn't make sense to use on spinning media) most likely container
<rledisez> so we are talking of objec servers on SSD. is it a real use case? (i would think it's the target of ceph, very low latency)
<peluse> if you used object servers there are probably some limitations wrt what we call blobstore
<peluse> I'l get to the integration question in a sec
<peluse> so, assuming a node takes on the user space NVMe driver and the driver talks directly to HW you can see there no kernel and no FS
<peluse> so... unless the storage application talks in blocks it doesn't make much sense
<notmyname> ok
<peluse> blobstore is SPDK's answer to this but its not a FS
<peluse> it's a super simple way for apps that don't talk blocks that can use a really simple file-ish object-ish like interface to take advantage of SPDK
<peluse> so for example, RocksDB
<peluse> in that slide deck I mention some work we did there to bolt blobstore up to RocksDB as a back end
<notmyname> so ... as you know swift likes to be HW and driver agnostic. what does this tie in too? is it possible to write stuff in a way that works if you have fast media or not?
<peluse> its that kind of idea that might makes sense for Swift
* jungleboyj looks in late
<notmyname> or is the idea that swift would engage spdk mode if it detects flash?
<peluse> so there are lots of things that can be done there
<peluse> but yeah I think anything more aggressive than NVMe only would not be worth it
<peluse> SPDK doesn't automateically do any of that kind of detection
<peluse> so that would have to be considered
<notmyname> that makes sense
<notmyname> I could imagine swift detecting that
<peluse> and blocstore itself is pretty immature, need to point that out. We just now added code to recover from a dirty shutdown if that gives you an idea
<notmyname> ok, so tell me (us) more about the blobstore. would that be a diskfile thing?
<peluse> so this whole thing would be a proof of concept type activity for sure
<notmyname> how does this make rledisez's LOSF work awesomer?
<peluse> so yeah, I think diskfile would make sense
<peluse> but I don't rememeber the details there of course. my brain is pretty small :)
<peluse> In that slide deck you can see a super simple example of the interface
<peluse> blobstore bascially takes over an entire disk, writes its own private metadata and then the app create "blobs" and does basic LBA sized reads and writes to them
<notmyname> ah, ok
<peluse> it can't handle sub-LBA access (by design)
<peluse> well, we call them pages in blobstore but they're 4K
<notmyname> that sounds like a haystack-in-a-library thing. or something similar to what you're working on rledisez
<rledisez> yes, blobstore would be what we call volume. and I guess it embed its own k/v indexation. so it looks similar in some ways
<peluse> yeah, I think the integration effort w/Swift for production would be a decent sized lift but for a POC may be worth it provided, maybe for container SSDs, the latency and CPU usage bebenfit made sense
<notmyname> peluse: is there any spdk component that could replace sqlite? eg some kv store that does transactions?
<notmyname> eg to replace the container layer
<peluse> rocksDB would be the closest match, using blobstore as a backing component
<peluse> but that's really what Wewe's proposal was - to add a k/v interface on blobstore
<notmyname> ah ok. so a 3rd part db that works with spdk
<peluse> yeah, maybe that's the best first step
<notmyname> any questions from anyone, so far?
<peluse> I can't remember what sqlite guts look like, can you easily replace the storage engine as its called in like MariaDB, anyone know?
<notmyname> no
<peluse> yeah, OK didn't think so
<notmyname> sqlite is "just" a DB library
<tdasilva> dumb question from me, but can you explain the difference from spdk and the intel cas tech?
<notmyname> ^ not a dumb question
<peluse> sure, good question
<peluse> they are totally different for one thing
<peluse> CAS is a caching project/product that works between an app and the FS.
<peluse> SPDK is a whole bunch of stuff, but not caching layers. It has to be integrated with an application unless you use one of the things like the compiled iSCSI target
<peluse> dunno if that's enough explanation - block cache vs library of stuff for integration, mainly polled mode device driver for NVMe
<peluse> so Q for you guys, is there any urgency with container SSDs and latency and/or using a bunch of CPU?
<tdasilva> peluse, so spdk provides performance improvements by substituting the FS and writing directly to block storage
<rledisez> do you handle caching in bdev or blobstore? or do you assume the underlaying device is fast enought
<peluse> tdasilva, yup
<peluse> rledisez, there's no data caching at all right now
<tdasilva> peluse: very similar to bluestore?
<peluse> bdev is a layer for abstracting different types of block devices. For example we can have an NVMe at the bottom of the stack or a RAM disk and for layers above bdev they don't care. its super light wieght
<peluse> tdasilva, yeah, bluestore and blobstore are a lot alike but bluestore was done of course just for Ceph and I think is more mature/feature rich right now
<peluse> but Sage mentioned in his keynote at SNIA SDC about looking at maybe using rocksdb w/blobstore at some point in the future (dont quote me though)
<peluse> that would be in addition to bluestore as backing FS though, not instead of
<notmyname> peluse: what questions do you have for us?
<tdasilva> peluse: ack, thanks
<peluse> jsut the one above about pain points wrt latency and or CPU utilization around SSDs
<peluse> well, and if anyone is interested enough to work with someone from the SPDK community to try and see if there's some sort of proof of concept worth messing with here
<notmyname> only pain points I've seen recently with the container layer is drive fullness and the contaienr replicator not having all the goodness we've added to the object replicator for when drives fill up
<notmyname> rledisez: how about you? any latency or cpu issues on containers or accounts?
<rledisez> peluse: from my experience, there is not really a pain point about storage speed on containers. having a lot of containers slo down some process (like replicator) as they need to scan all db. not sure yet if blobstore would help here
<peluse> wen I say CPU util, there's more in that deck I referenced, using SPDK (nvme + blobstore) greatly reduces CPU utillization while at the same time greatly improving perf
<peluse> so you get kinda a two fer one thing
<peluse> so for containers you'll get more CPU utillization for other things happening on the storage node, and the IOs will be faster and more repsonsive
<peluse> (or your money back)
<notmyname> heh
<rledisez> how can you measure that CPU usage related to kernel/fs. i don't think i see any, but i would like to check
<rledisez> most of the cpu usage comes from replicator or container-server
<peluse> There's a perf blog on spdk.io that may have some good info in it, honestly I haven't read it :(
<peluse> but we have some folks in our comm that live for that kinda stuff so I can ask there and get back to y'all
<peluse> rledisez, yeah unless used for object storage wouldn't help w/replicator
<rledisez> if you have a magic command to get the cpu usage i would be interested (i guess it would be something related to perf command)
<notmyname> honestly, spdk sounds really cool. it seems like something that would be great for an all-flash future. (but I'm not sure if anyone deloying swift is there yet)
<peluse> rledisez, yeah I dunno the details of the various measurements but the team has looked at every metric known to man using a variety of tools
<notmyname> peluse: do you have people in the spdk community who are interested in swift? if so, are they interested because they just want to integrate spdk everywhere or because they are using swift already?
<peluse> Wewe is the only person I know that's brought it up and he wasn't able to get connected today due to network issues
<peluse> right now there's more demand on features/integration than there is anything else so I don't think the former is driving anyone
<notmyname> ok
<peluse> which is one of the reasons I wanted to chat w/you guys about this - if it doesn't make a lot of sense to investigate from your perspective we certainly have enough work on our plate :)
<peluse> that's all I got for ya, other questions?
<notmyname> I think it makes sense when looking a few years into the future and preparing for that. it doesn't make sense from the sense that all of our current employers have a huge amount of stuff we need to do in swift way before we get to needing spdk
<peluse> yup yup
<notmyname> (my opinion)
<peluse> what is the current split of SSD usage, still mostly containers?
<notmyname> definitely something I want to keep an eye on
<notmyname> yeah
<peluse> cool
<notmyname> flash still too expensive for interesting-sized object server deployments
<peluse> makes sense
<notmyname> people these days are going for bigger nodes. 80 10TB in a single chassis
<notmyname> (and getting all the eww that implies)
<peluse> well, that's not to say nobody on this end will work on a proof of concept anyways and if so I'll encourage them to check in the Swift comm frequently of course...
<rledisez> i like the idea, and we can surely share some stuff between LOSF/blobstore but i think that people looking for really low latency object store will check ceph as by its design/implem, it looks more suited
<notmyname> let's move on so we can give m_kazuhiro appropriate time :-)
<notmyname> peluse: that's great!
<peluse> thanks for the time guys!!
<notmyname> and thanks for stopping by to give an update
<peluse> my pleasure... ping me later if anyone has followup questions. take care!
<notmyname> rledisez: I can get you in contact with peluse if you can't find him on IRC late
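For readers skimming the transcript, the blobstore usage peluse describes (blobstore owns the whole disk, the application creates blobs and does page-sized, fully asynchronous reads and writes to them) looks roughly like the following. The names follow the current public blobstore API (spdk_blob_io_write(), spdk_dma_zmalloc()); the 17.x releases discussed in this thread used slightly different names, so this is an illustration, not copy-paste code:

/*
 * Sketch: write one 4K page (io unit) to an already-opened blob.
 * There is no filesystem underneath - blob pages map to raw LBAs,
 * and every operation completes through a callback.
 */
#include "spdk/blob.h"
#include "spdk/env.h"

static void
write_done(void *cb_arg, int bserrno)
{
    /* the io unit has reached the blob, or bserrno says why it did not */
}

static void
blob_opened(void *cb_arg, struct spdk_blob *blob, int bserrno)
{
    /* channel allocated earlier with spdk_bs_alloc_io_channel() */
    struct spdk_io_channel *ch = cb_arg;

    /* buffers handed to blobstore must be DMA-able memory */
    void *buf = spdk_dma_zmalloc(4096, 4096, NULL);

    /* write 1 io unit at offset 0 of the blob */
    spdk_blob_io_write(blob, ch, buf, 0, 1, write_done, NULL);
}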
ppc64le support
by Jonas Pfefferle1
Hi @all,
I just pushed a set of patches for review to GerritHub which introduce
support for the ppc64le architecture:
https://review.gerrithub.io/#/c/383725/
https://review.gerrithub.io/#/c/383726/
https://review.gerrithub.io/#/c/383727/
https://review.gerrithub.io/#/c/383728/
This patchset fixes the build on ppc architectures and the page size alignment
of the cmd and cpl rings. Since ppc has less strict memory ordering rules, it
also introduces a memory barrier when polling for completions. Note that
this only runs against the latest DPDK master, which includes fixes to the
sPAPR iommu. Currently it also requires recompiling the kernel with a 4KB
page size to allow mapping around the MSI-X table in BAR0. We are working
on a patch to allow mapping the MSI-X table when interrupt remapping is
enabled (this will work with the 64KB default page size on ppc).
Let me know what you guys think.
Regards,
Jonas
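To illustrate the memory-barrier point for anyone who has not hit it: when polling an NVMe completion queue on a weakly ordered CPU such as ppc64le, the load of the phase bit has to be ordered before the loads of the other completion fields. A rough sketch, assuming the spdk_rmb() barrier from spdk/barrier.h and the completion layout from spdk/nvme_spec.h (the exact placement in the driver may differ):

/*
 * Sketch: poll one completion entry.  Without the read barrier a weakly
 * ordered CPU may load cid before it loads the phase bit, handing back a
 * stale field even though the phase bit looked new.
 */
#include <stdbool.h>
#include <stdint.h>
#include "spdk/barrier.h"
#include "spdk/nvme_spec.h"

static bool
poll_one_completion(volatile struct spdk_nvme_cpl *cpl, uint16_t expected_phase,
                    uint16_t *cid)
{
    if (cpl->status.p != expected_phase) {
        return false;               /* nothing new posted yet */
    }

    spdk_rmb();                     /* order phase-bit load before field loads */

    *cid = cpl->cid;
    return true;
}

On x86 such a read barrier is essentially free, which is why the missing barrier only shows up as a bug on architectures like ppc64le.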
compile error in old linux kernel version
by liupan1234
Hi All,
I hit an error when I tried to compile SPDK under Linux kernel 2.6.32; the gcc version is 4.9.2:
$make DPDK_DIR=../dpdk/build/
CC lib/bdev/bdev.o
bdev.c:1: error: bad value (native) for -march= switch
bdev.c:1: error: bad value (native) for -mtune= switch
make[2]: *** [bdev.o] Error 1
make[1]: *** [bdev] Error 2
make: *** [lib] Error 2
bdev.c:1: error: bad value (native) for -march= switch
Could you give me some help?
Thanks very much!
Pan
Re: [SPDK] Unusual interface and implementation of SPDK network function for iSCSI target
by Harris, James R
Hi Shuhei,
All of your suggestions make sense to me.
Refactoring net/sock.c makes sense. The primary goal of this abstraction was to support multiple network stacks. Currently it is just the kernel network stack, but in the future there could be a userspace TCP stack implementation as well.
Regarding removing * and [*], I would be curious to know if anyone reading the mailing list is depending on * and [*] and if changing it would cause them any difficulty. Maybe we could still support these for one release, but print a warning message that they will be deprecated in the future. Could you see how difficult this would be to implement?
Thanks,
-Jim
From: SPDK <spdk-bounces(a)lists.01.org> on behalf of 松本周平 / MATSUMOTO,SHUUHEI <shuhei.matsumoto.xt(a)hitachi.com>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org>
Date: Thursday, October 19, 2017 at 8:05 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: Re: [SPDK] Unusual interface and implementation of SPDK network function for iSCSI target
Sorry for the repeated self-reply.
2) Users can use “*” as 0.0.0.0 or INADDR_ANY for IPv4 and “[*]” as [::] or in6addr_any for IPv6.
I’m not so confident of my expertise in networking, but I’ve never heard of this odd interface.
I would like to propose deleting the code related to “*” and “[*]”.
This interface does not cause any apparent error or misunderstanding.
I did not understand why this interface is implemented; hence I asked, but if it is convenient, of course it should be maintained.
Thank you,
Shuhei Matsumoto
From: 松本周平 / MATSUMOTO,SHUUHEI
Sent: Friday, October 20, 2017 10:46 AM
To: Storage Performance Development Kit
Subject: RE: Unusual interface and implementation of SPDK network function for iSCSI target
I’m preparing a patch and change message for each one. I apologize for any inconvenience until then.
From: 松本周平 / MATSUMOTO,SHUUHEI
Sent: Friday, October 20, 2017 10:28 AM
To: Storage Performance Development Kit
Subject: Unusual interface and implementation of SPDK network function for iSCSI target
Hi,
I’m not so confident about networking, but as far as I have looked into the code, I have found that at least the following items make the iSCSI target erroneous or difficult to understand.
The customized socket interface of SPDK (lib/net/sock.c) is only used in the SPDK iSCSI target, hence I think now may be a good chance to refactor it.
Regarding my pushed changes, I would like to change my priority to the following:
1. the change https://review.gerrithub.io/#/c/381246/
2. the following items (I have not registered them in GerritHub except for a few)
3. the rest of my pushed changes.
I appreciate any feedback. I hope these make sense to you, and that a more standardized implementation will make connecting the SPDK iSCSI target to a userspace TCP/IP stack easier.
Best Regards,
Shuhei Matsumoto
1) spdk_sock_getaddr(sock, saddr, slen, caddr, clen) (in lib/net/sock.c) can only return IPv4 addresses correctly, because get_addr_str() does not take IPv6 into account. Hence the current code may not work correctly with IPv6 (a sketch of an address-family-aware helper follows after item 8 below).
static int get_addr_str(struct sockaddr_in *paddr, char *host, size_t hlen)
{
uint8_t *pa;
if (paddr == NULL || host == NULL)
return -1;
pa = (uint8_t *)&paddr->sin_addr.s_addr;
snprintf(host, hlen, "%u.%u.%u.%u", pa[0], pa[1], pa[2], pa[3]);
return 0;
}
2) Users can use “*” as 0.0.0.0 or INADDR_ANY for IPv4 and “[*]” as [::] or in6addr_any for IPv6.
I’m not so confident of my expertise in networking, but I’ve never heard of this odd interface.
I would like to propose deleting the code related to “*” and “[*]”.
3) A network portal (struct spdk_iscsi_portal) remembers the IP address-port pair not as a struct sockaddr but only as strings, host and port.
Hence the iSCSI target does not know whether each network portal is IPv4 or IPv6 and has to check “[“ and “]” manually.
If we strip “[“ and “]” from the user input and then pass it to getaddrinfo(), we can know and remember whether it is IPv4 or IPv6 (see the getaddrinfo() sketch after item 8 below).
That would be helpful, and we could delete the helper functions spdk_sock_is_ipv6/4() in lib/net/sock.c.
4) In spdk_iscsi_tgt_node_access(),
“ALL” means that the initiator group allows ANY IP address-port pair or iSCSI name of an initiator.
However, the iSCSI target does not know ALL initiators beforehand.
Hence ANY may be a better keyword than ALL.
5) spdk_sock_connect() is not used anywhere. Hence the abstraction by spdk_sock_create(LISTEN or CONNECT) is not necessary in lib/net/sock.c.
6) spdk_iscsi_portal_grp_is_visible() may not run correctly in the following case:
- an iSCSI target has PG1-IG1 and PG1-IG2
- an initiator has logged in to PG1 of the target.
- the initiator is allowed not by IG1 but by IG2.
-> However, spdk_iscsi_portal_grp_is_visible() only checks the first IG among the same PG, that is, only IG1.
Hence in this case spdk_iscsi_portal_grp_is_visible() should return true but would return false.
-> I think this is caused by the PG-IG map being an array; a PG-IG map tree would be better (https://review.gerrithub.io/#/c/379933/).
7) I found OK and NG in the comments in lib/iscsi/tgt_node.c.
These are not proper English but terms localized to Japan. Hence it may be better to change them to Allow and Deny, because they are used for the ACL.
8) An initiator group allows an empty netmask list as a normal configuration. However, we cannot create such a configuration.
(https://review.gerrithub.io/#/c/382920/)
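To make item 1) concrete, here is a minimal sketch of an address-family-aware replacement for get_addr_str(), using the standard getnameinfo(3) call so both IPv4 and IPv6 are formatted correctly (the surrounding SPDK plumbing is omitted and the function name is only illustrative):

#include <sys/socket.h>
#include <netdb.h>
#include <stddef.h>

static int
get_addr_str_any(const struct sockaddr *sa, socklen_t salen, char *host, socklen_t hlen)
{
    if (sa == NULL || host == NULL) {
        return -1;
    }
    /* NI_NUMERICHOST yields "10.0.0.1" for AF_INET and "fe80::1" for
     * AF_INET6 without a DNS lookup, so both families are covered. */
    if (getnameinfo(sa, salen, host, hlen, NULL, 0, NI_NUMERICHOST) != 0) {
        return -1;
    }
    return 0;
}

And for item 3), stripping the brackets once and resolving with getaddrinfo(3) lets a portal remember a real struct sockaddr plus its address family, so nothing downstream has to re-parse "[" and "]" (again, the names are illustrative, not the actual SPDK code):

#include <sys/socket.h>
#include <netdb.h>
#include <string.h>
#include <stdio.h>

static int
portal_resolve(const char *host_in, const char *port,
               struct sockaddr_storage *addr, socklen_t *addrlen, int *family)
{
    char host[256];
    struct addrinfo hints = { .ai_flags = AI_NUMERICHOST | AI_NUMERICSERV };
    struct addrinfo *res;
    size_t len = strlen(host_in);

    /* "[::1]" -> "::1"; plain IPv4 strings are copied unchanged */
    if (len > 1 && host_in[0] == '[' && host_in[len - 1] == ']') {
        snprintf(host, sizeof(host), "%.*s", (int)(len - 2), host_in + 1);
    } else {
        snprintf(host, sizeof(host), "%s", host_in);
    }

    if (getaddrinfo(host, port, &hints, &res) != 0) {
        return -1;
    }
    memcpy(addr, res->ai_addr, res->ai_addrlen);
    *addrlen = res->ai_addrlen;
    *family = res->ai_family;       /* AF_INET or AF_INET6 */
    freeaddrinfo(res);
    return 0;
}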
From: SPDK [mailto:spdk-bounces@lists.01.org] On Behalf Of Victor Banh
Sent: Thursday, October 19, 2017 4:17 PM
To: Storage Performance Development Kit; Harris, James R; Cao, Gang
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio
I am using Ubuntu 16.04 and kernel 4.12.X.
________________________________
From: Cao, Gang <gang.cao(a)intel.com>
Sent: Wednesday, October 18, 2017 11:51:39 PM
To: Victor Banh; Storage Performance Development Kit; Harris, James R
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
[root@node4 gangcao]# uname -a
Linux node4 4.10.1 #1 SMP Fri Mar 10 15:59:57 CST 2017 x86_64 x86_64 x86_64 GNU/Linux
[root@node4 gangcao]# lsb_release -a
LSB Version: :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
Distributor ID: CentOS
Description: CentOS Linux release 7.2.1511 (Core)
Release: 7.2.1511
Codename: Core
From: Victor Banh [mailto:victorb@mellanox.com]
Sent: Thursday, October 19, 2017 2:14 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org>; Harris, James R <james.r.harris(a)intel.com>; Cao, Gang <gang.cao(a)intel.com>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Can you give the OS version and kernel version again for target and client?
I couldn't compile DPDK without installing the latest 4.12.x kernel on Ubuntu 16.04.
Thanks
Victor
________________________________
From: Cao, Gang <gang.cao(a)intel.com>
Sent: Wednesday, October 18, 2017 10:59:19 PM
To: Victor Banh; Storage Performance Development Kit; Harris, James R
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
This server does not have OFED installed and it is in loopback mode.
I found another server, also in loopback mode, with a ConnectX-3 and OFED installed.
By the way, what SSD are you using? Maybe it is related to the SSD? I’ve just run with 2048k for a short duration and see no issue. I will run for more time to see whether I can hit this error.
[root@slave3 fio]# lspci | grep -i mell
08:00.0 Ethernet controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]
[root@slave3 fio]# lsmod | grep -i mlx
mlx4_ib 159744 0
ib_core 208896 15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,ib_srpt,ib_ucm,rdma_ucm,mlx4_ib
mlx4_en 114688 0
mlx4_core 307200 2 mlx4_en,mlx4_ib
ptp 20480 3 ixgbe,igb,mlx4_en
[root@slave3 fio]# ofed_info
MLNX_OFED_LINUX-3.2-2.0.0.0 (OFED-3.2-2.0.0):
ar_mgr:
osm_plugins/ar_mgr/ar_mgr-1.0-0.30.ga1ea4b7.tar.gz
cc_mgr:
osm_plugins/cc_mgr/cc_mgr-1.0-0.29.ga1ea4b7.tar.gz
dapl:
dapl.git mlnx_ofed_3_1
commit c30fb6ce2cbc29d8ed4bde51437f7abb93378c78
dump_pr:
osm_plugins/dump_pr//dump_pr-1.0-0.25.ga1ea4b7.tar.gz
fabric-collector:
fabric_collector//fabric-collector-1.1.0.MLNX20140410.51b267e.tar.gz
fca:
mlnx_ofed_fca/fca-2.5.2431-1.src.rpm
hcoll:
mlnx_ofed_hcol/hcoll-3.4.807-1.src.rpm
ibacm:
mlnx_ofed/ibacm.git mlnx_ofed_3_2
commit 15ad8c13bdebbe62edea0b7df030710b65c14f7f
ibacm_ssa:
mlnx_ofed_ssa/acm/ibacm_ssa-0.0.9.3.MLNX20151203.50eb579.tar.gz
ibdump:
sniffer/sniffer-4.0.0-2/ibdump/linux/ibdump-4.0.0-2.tgz
ibsim:
mlnx_ofed_ibsim/ibsim-0.6-0.8.g9d76581.tar.gz
ibssa:
mlnx_ofed_ssa/distrib/ibssa-0.0.9.3.MLNX20151203.50eb579.tar.gz
ibutils:
ofed-1.5.3-rpms/ibutils/ibutils-1.5.7.1-0.12.gdcaeae2.tar.gz
ibutils2:
ibutils2/ibutils2-2.1.1-0.76.MLNX20160222.gd366c7b.tar.gz
infiniband-diags:
mlnx_ofed_infiniband_diags/infiniband-diags-1.6.6.MLNX20151130.7f0213e.tar.gz
infinipath-psm:
mlnx_ofed_infinipath-psm/infinipath-psm-3.3-2_g6f42cdb_open.tar.gz
iser:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d
isert:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d
kernel-mft:
mlnx_ofed_mft/kernel-mft-4.3.0-25.src.rpm
knem:
knem.git mellanox-master
commit f143ee19a575cd42a334422fa8bd329d671238db
libibcm:
mlnx_ofed/libibcm.git mlnx_ofed_3_0
commit d7d485df305e6536711485bd7e477668e77d8320
libibmad:
mlnx_ofed_libibmad/libibmad-1.3.12.MLNX20151122.d140cb1.tar.gz
libibprof:
mlnx_ofed_libibprof/libibprof-1.1.22-1.src.rpm
libibumad:
mlnx_ofed_libibumad/libibumad-1.3.10.2.MLNX20150406.966500d.tar.gz
libibverbs:
mlnx_ofed/libibverbs.git mlnx_ofed_3_2_2
commit 217a77686f4861229f0e4b94485a13f024634caf
libmlx4:
mlnx_ofed/libmlx4.git mlnx_ofed_3_2_2
commit dda6d7ae1e6e3779a485ebdd0a882f4bcbd027a6
libmlx5:
mlnx_ofed/libmlx5.git mlnx_ofed_3_2_2
commit d0c8645359e0f0aba0408b2d344f3b418d27019b
libopensmssa:
mlnx_ofed_ssa/plugin/libopensmssa-0.0.9.3.MLNX20151203.50eb579.tar.gz
librdmacm:
mlnx_ofed/librdmacm.git mlnx_ofed_3_2_2
commit 6bd430fed9e7b3d57a1876c040431ce7295c7703
libsdp:
libsdp.git mlnx_ofed_3_0
commit fbd01dfff05f42d6b82506e7dbf4bc6b7e6a59a4
libvma:
vma/source_rpms//libvma-7.0.14-0.src.rpm
mlnx-ethtool:
upstream/ethtool.git for-upstream
commit ac0cf295abe0c0832f0711fed66ab9601c8b2513
mlnx-ofa_kernel:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d
mpi-selector:
ofed-1.5.3-rpms/mpi-selector/mpi-selector-1.0.3-1.src.rpm
mpitests:
mlnx_ofed_mpitest/mpitests-3.2.17-e1c7f2f.src.rpm
mstflint:
mlnx_ofed_mstflint/mstflint-4.3.0-1.49.g9b9af70.tar.gz
multiperf:
mlnx_ofed_multiperf/multiperf-3.0-0.10.gda89e8c.tar.gz
mvapich2:
mlnx_ofed_mvapich2/mvapich2-2.2a-1.src.rpm
mxm:
mlnx_ofed_mxm/mxm-3.4.3079-1.src.rpm
ofed-docs:
docs.git mlnx_ofed-3.2
commit ea3386416f9f7130edd2c70fc3424cb2cda50f7d
openmpi:
mlnx_ofed_ompi_1.8/openmpi-1.10.3a1-1.src.rpm
opensm:
mlnx_ofed_opensm/opensm-4.6.1.MLNX20160112.774e977.tar.gz
perftest:
mlnx_ofed_perftest/perftest-3.0-0.18.gb464d59.tar.gz
qperf:
mlnx_ofed_qperf/qperf-0.4.9.tar.gz
rds-tools:
rds-tools.git mlnx_ofed_2_4
commit 299420ca25cf9996bc0748e3bc4b08748996ba49
sdpnetstat:
sdpnetstat.git mlnx_ofed_3_0
commit 3cf409a7cc07e5c71f9640eddbb801ece21b4169
sockperf:
sockperf/sockperf-2.7-43.git3ee62bd8107a.src.rpm
srp:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d
srptools:
srptools/srptools-1.0.2-12.src.rpm
Installed Packages:
-------------------
infiniband-diags
librdmacm
libmlx4
libibverbs-utils
mpi-selector
libibmad-devel
sdpnetstat
knem
libibumad-devel
libsdp
mlnx-ethtool
libibverbs-debuginfo
mlnx-ofa_kernel-modules
srp
opensm
mstflint
cc_mgr
libibmad
libibverbs
kernel-mft
libibverbs-devel-static
libibumad
librdmacm-devel
mlnx-ofa_kernel
libsdp-devel
ibutils2
mlnxofed-docs
libibmad-static
iser
opensm-devel
dump_pr
libibumad-static
rds-tools
libmlx4-debuginfo
mlnx-ofa_kernel-devel
opensm-libs
opensm-static
ar_mgr
dapl-devel-static
infiniband-diags-compat
libibverbs-devel
ibsim
[root@slave3 fio]# ibstat
CA 'mlx4_0'
CA type: MT4103
Number of ports: 2
Firmware version: 2.35.5100
Hardware version: 0
Node GUID: 0x248a0703006090e0
System image GUID: 0x248a0703006090e0
Port 1:
State: Active
Physical state: LinkUp
Rate: 40
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x04010000
Port GUID: 0x268a07fffe6090e0
Link layer: Ethernet
Port 2:
State: Active
Physical state: LinkUp
Rate: 40
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x04010000
Port GUID: 0x268a07fffe6090e1
Link layer: Ethernet
[root@slave3 fio]# ibv_devinfo
hca_id: mlx4_0
transport: InfiniBand (0)
fw_ver: 2.35.5100
node_guid: 248a:0703:0060:90e0
sys_image_guid: 248a:0703:0060:90e0
vendor_id: 0x02c9
vendor_part_id: 4103
hw_ver: 0x0
board_id: MT_1090111023
phys_port_cnt: 2
port: 1
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 1024 (3)
sm_lid: 0
port_lid: 0
port_lmc: 0x00
link_layer: Ethernet
port: 2
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 1024 (3)
sm_lid: 0
port_lid: 0
port_lmc: 0x00
link_layer: Ethernet
From: Victor Banh [mailto:victorb@mellanox.com]
Sent: Thursday, October 19, 2017 12:34 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org>; Harris, James R <james.r.harris(a)intel.com>; Cao, Gang <gang.cao(a)intel.com>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Do you install Mellanox OFED on the Target and Client server?
Can you run ibstat on both servers?
Thanks
Victor
________________________________
From: Cao, Gang <gang.cao(a)intel.com>
Sent: Wednesday, October 18, 2017 8:59:28 PM
To: Victor Banh; Storage Performance Development Kit; Harris, James R
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
I’ve just tried the SPDK v17.07.1 and DPDK v17.08.
nvme version: 1.1.38.gfaab
fio version: 3.1
I tried the 512k and 1024k IO sizes and there is no error. The dmesg information is as follows.
So there may be some other difference here? It looks like you are using a ConnectX-5 while I am using a ConnectX-4?
Other related information:
[root@node4 fio]# lspci | grep -i mell
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
[root@node4 fio]# lsmod | grep -i mlx
mlx5_ib 172032 0
ib_core 200704 15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm
mlx5_core 380928 1 mlx5_ib
ptp 20480 3 ixgbe,igb,mlx5_core
[root@node4 fio]# uname -a
Linux node4 4.10.1 #1 SMP Fri Mar 10 15:59:57 CST 2017 x86_64 x86_64 x86_64 GNU/Linux
[577707.543326] nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 192.168.100.8:4420
[577730.854540] detected loopback device
[577730.893761] nvme nvme0: creating 7 I/O queues.
[577730.893797] detected loopback device
[577730.898611] detected loopback device
[577730.908917] detected loopback device
[577730.919073] detected loopback device
[577730.928922] detected loopback device
[577730.938679] detected loopback device
[577730.948365] detected loopback device
[577731.146290] nvme nvme0: new ctrl: NQN "nqn.2016-06.io.spdk:cnode2", addr 192.168.100.8:4420
Thanks,
Gang
From: Victor Banh [mailto:victorb@mellanox.com]
Sent: Thursday, October 19, 2017 9:43 AM
To: Cao, Gang <gang.cao(a)intel.com>; Storage Performance Development Kit <spdk(a)lists.01.org>; Harris, James R <james.r.harris(a)intel.com>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Gang
Any update?
Do you see any error message from “dmesg” with 512k block size running fio?
Thanks
Victor
From: Victor Banh
Sent: Tuesday, October 17, 2017 7:37 PM
To: 'Cao, Gang' <gang.cao(a)intel.com>; Storage Performance Development Kit <spdk(a)lists.01.org>; Harris, James R <james.r.harris(a)intel.com>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Gang
spdk-17.07.1 and dpdk-17.08
Thanks
Victor
From: Cao, Gang [mailto:gang.cao@intel.com]
Sent: Monday, October 16, 2017 8:51 PM
To: Victor Banh <victorb(a)mellanox.com>; Storage Performance Development Kit <spdk(a)lists.01.org>; Harris, James R <james.r.harris(a)intel.com>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
Could you share which version of SPDK you are using when seeing this error? Or maybe you can try the latest SPDK code?
Thanks,
Gang
From: Victor Banh [mailto:victorb@mellanox.com]
Sent: Tuesday, October 17, 2017 5:30 AM
To: Cao, Gang <gang.cao(a)intel.com>; Storage Performance Development Kit <spdk(a)lists.01.org>; Harris, James R <james.r.harris(a)intel.com>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Cao
Do you see any message from dmesg?
I tried this fio version and still saw these error messages from dmesg.
fio-3.1
[869053.218235] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218250] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218259] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218263] ldm_validate_partition_table(): Disk read failed.
[869053.218269] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218277] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218285] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218292] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218296] Dev nvme2n1: unable to read RDB block 0
[869053.218303] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218311] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218323] Buffer I/O error on dev nvme2n1, logical block 3, async page read
[869053.218338] nvme2n1: unable to read partition table
[869053.246126] nvme2n1: detected capacity change from -62111005559226368 to -62042256479092736
[869053.246195] ldm_validate_partition_table(): Disk read failed.
[869053.246217] Dev nvme2n1: unable to read RDB block 0
From: Cao, Gang [mailto:gang.cao@intel.com]
Sent: Monday, October 09, 2017 10:59 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org>; Harris, James R <james.r.harris(a)intel.com>
Cc: Victor Banh <victorb(a)mellanox.com>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
Thanks for your detailed information on the testing.
I’ve tried the latest SPDK code with the latest fio-3.1-20-g132b and with fio-2.19. It seems there is no such error.
Could you share which version of SPDK you are using when seeing this error? Or maybe you can try the latest SPDK code?
fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme0n1 --name=read-phase --rw=randwrite
read-phase: (g=0): rw=randwrite, bs=(R) 512KiB-512KiB, (W) 512KiB-512KiB, (T) 512KiB-512KiB, ioengine=libaio, iodepth=16
...
fio-3.1-20-g132b
Starting 4 processes
Jobs: 4 (f=4): [w(4)][100.0%][r=0KiB/s,w=1592MiB/s][r=0,w=3183 IOPS][eta 00m:00s]
read-phase: (groupid=0, jobs=1): err= 0: pid=46378: Tue Oct 10 01:23:39 2017
My NIC information:
[root@node4 nvme-cli-gerrit]# lsmod | grep -i mlx
mlx5_ib 172032 0
ib_core 200704 15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm
mlx5_core 380928 1 mlx5_ib
ptp 20480 3 ixgbe,igb,mlx5_core
[root@node4 nvme-cli-gerrit]# lspci | grep -i mell
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
From: SPDK [mailto:spdk-bounces@lists.01.org] On Behalf Of Victor Banh
Sent: Friday, October 6, 2017 2:41 PM
To: Harris, James R <james.r.harris(a)intel.com>; Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio
From: Harris, James R [mailto:james.r.harris@intel.com]
Sent: Friday, October 06, 2017 2:32 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Cc: Victor Banh <victorb(a)mellanox.com>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio
(cc Victor)
From: James Harris <james.r.harris(a)intel.com>
Date: Thursday, October 5, 2017 at 1:59 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
Could you provide a few more details? This will help the list to provide some ideas.
1) On the client, are you using the SPDK NVMe-oF initiator or the kernel initiator?
Kernel initiator; I run these commands on the client server.
modprobe mlx5_ib
modprobe nvme-rdma
nvme discover -t rdma -a 192.168.10.11 -s 4420
nvme connect -t rdma -n nqn.2016-06.io.spdk:nvme-subsystem-1 -a 192.168.10.11 -s 4420
2) Can you provide the fio configuration file or command line? Just so we can have more specifics on “bigger block size”.
fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme1n1 --name=read-phase --rw=randwrite
3) Any details on the HW setup – specifically details on the RDMA NIC (or if you’re using SW RoCE).
Nvmf.conf on target server
[Global]
Comment "Global section"
ReactorMask 0xff00
[Rpc]
Enable No
Listen 127.0.0.1
[Nvmf]
MaxQueuesPerSession 8
MaxQueueDepth 128
[Subsystem1]
NQN nqn.2016-06.io.spdk:nvme-subsystem-1
Core 9
Mode Direct
Listen RDMA 192.168.10.11:4420
NVMe 0000:82:00.0
SN S2PMNAAH400039
It is an RDMA NIC (ConnectX-5); the CPU is an Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
NUMA node0 CPU(s): 0-7
NUMA node1 CPU(s): 8-15
Thanks,
-Jim