Does the BlobFS Asynchronous API support multi-threaded writing?
by chen.zhenghua@zte.com.cn
Hi everyone,
I did a simple test of the BlobFS asynchronous API, using the SPDK event framework to run multiple tasks, each of which writes one file.
But it doesn't work: spdk_file_write_async() reported an error when resizing the file.
The call stack looks like this:
spdk_file_write_async() -> __readwrite() -> spdk_file_truncate_async() -> spdk_blob_resize()
The resize operation must be done on the metadata thread, i.e. the one that invoked spdk_fs_load(), so only the task dispatched to the metadata CPU core works.
That is to say, only one thread can be used to write files. That's hard to use, and it may cause performance issues.
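For illustration, here is a minimal sketch of the only arrangement that works for me so far: funnel every write through the core that called spdk_fs_load() via the event framework. g_metadata_core, struct write_task, and the helper functions are placeholders, and the I/O channel is assumed to have been allocated on that core.

#include "spdk/blobfs.h"
#include "spdk/event.h"

struct write_task {
        struct spdk_file *file;
        struct spdk_io_channel *channel; /* allocated on the metadata core */
        void *payload;
        uint64_t offset;
        uint64_t length;
};

static void
write_done(void *ctx, int fserrno)
{
        /* Completion also runs on the metadata core. */
}

/* Runs on the metadata core, so the implicit resize happens on the right thread. */
static void
write_on_md_core(void *arg1, void *arg2)
{
        struct write_task *t = arg1;

        spdk_file_write_async(t->file, t->channel, t->payload,
                              t->offset, t->length, write_done, t);
}

/* Callable from any reactor: hop to the metadata core first. */
static void
submit_write(uint32_t g_metadata_core, struct write_task *t)
{
        struct spdk_event *e;

        e = spdk_event_allocate(g_metadata_core, write_on_md_core, t, NULL);
        spdk_event_call(e);
}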
Does anyone know more about this?
thanks very much
2 months, 1 week
Best practices on driver binding for SPDK in production environments
by Lance Hartmann ORACLE
This email to the SPDK list is a follow-on to a brief discussion held during a recent SPDK community meeting (Tue Jun 26 UTC 15:00).
Lifted and edited from the Trello agenda item (https://trello.com/c/U291IBYx/91-best-practices-on-driver-binding-for-spd...):
During development, many (most?) people rely on running SPDK's scripts/setup.sh to perform a number of initializations, among them unbinding the Linux kernel nvme driver from NVMe controllers targeted for use by SPDK and then binding them to either uio_pci_generic or vfio-pci. This script is applicable to development environments, but it is not targeted for use in production systems employing SPDK.
I'd like to confer with my fellow SPDK community members on ideas, suggestions, and best practices for handling this driver unbinding/binding. I wrote some udev rules, along with updates to some other Linux system conf files, to automatically load either the uio_pci_generic or vfio-pci module. I also had to update my initramfs so that when the system comes all the way up, the desired NVMe controllers are already bound to the driver needed for SPDK operation. And, as a bonus, it should "just work" when a hotplug occurs as well. However, there may be additional considerations I might have overlooked, on which I'd appreciate input. Further, there's the matter of how, and whether, to semi-automate this configuration via some kind of script, how that might vary according to Linux distro, and, on top of that, the determination of whether to employ uio_pci_generic vs vfio-pci.
And, now some details:
1. I performed this on an Oracle Linux (OL) distro. I'm currently unaware of how and which configuration files might differ depending on the distro. Oracle Linux is RedHat-compatible, so I'm confident my implementation should run similarly on RedHat-based systems, but I've yet to delve into other distros like Debian, SuSE, etc.
2. In preparation for writing my own udev rules, I unbound a specific NVMe controller from the Linux nvme driver by hand. Then, in another window, I launched "udevadm monitor -k -p" so that I could observe the usual udev events when an NVMe controller is bound to the nvme driver. On my system, I observed four (4) udev kernel events (abbreviated/edited output to avoid this becoming excessively long):
(Event 1)
KERNEL[382128.187273] add /devices/pci0000:00/0000:00:02.2/0000:30:00.0/nvme/nvme0 (nvme)
ACTION=add
DEVNAME=/dev/nvme0
…
SUBSYSTEM=nvme
(Event 2)
KERNEL[382128.244658] bind /devices/pci0000:00/0000:00:02.2/0000:30:00.0 (pci)
ACTION=bind
DEVPATH=/devices/pci0000:00/0000:00:02.2/0000:30:00.0
DRIVER=nvme
…
SUBSYSTEM=pci
(Event 3)
KERNEL[382130.697832] add /devices/virtual/bdi/259:0 (bdi)
ACTION=add
DEVPATH=/devices/virtual/bdi/259:0
...
SUBSYSTEM=bdi
(Event 4)
KERNEL[382130.698192] add /devices/pci0000:00/0000:00:02.2/0000:30:00.0/nvme/nvme0/nvme0n1 (block)
ACTION=add
DEVNAME=/dev/nvme0n1
DEVPATH=/devices/pci0000:00/0000:00:02.2/0000:30:00.0/nvme/nvme0/nvme0n1
DEVTYPE=disk
...
SUBSYSTEM=block
3. My udev rule triggers on (Event 2) above: the bind action. Upon this action, my udev rule appends operations to the special udev RUN variable so that udev essentially mirrors what SPDK's scripts/setup.sh does to unbind from the nvme driver and bind to, in my case, the vfio-pci driver (a rough sketch of the underlying sysfs sequence appears after these details).
4. With my new udev rules in place, I was successful in getting specific NVMe controllers (based on bus-device-function) to unbind from the Linux nvme driver and bind to vfio-pci. However, I made a couple of observations in the kernel log (dmesg). In particular, I was drawn to the following for an NVMe controller at BDF 0000:40:00.0, for which I had a udev rule to unbind it from nvme and bind it to vfio-pci:
[ 35.534279] nvme nvme1: pci function 0000:40:00.0
[ 37.964945] nvme nvme1: failed to mark controller live
[ 37.964947] nvme nvme1: Removing after probe failure status: 0
One theory I have for the above is that my udev RUN rule was invoked while the nvme driver's probe() was still running on this controller, and perhaps the unbind request came in before the probe() completed, hence this "nvme1: failed to mark controller live". This has left lingering in my mind that maybe, instead of triggering on (Event 2) when the bind occurs, I should instead try to derive a trigger on the "last" udev event, an "add", where the NVMe namespaces are instantiated. Of course, I'd need to know ahead of time just how many namespaces exist on that controller if I were to do that, so that I'd trigger on the last one. I'm wondering if that may help to avoid what looks like a complaint during the middle of probe() of that particular controller. Then again, maybe I can just safely ignore that and not worry about it at all? Thoughts?
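For illustration only, here is the kind of sysfs sequence those RUN operations boil down to for a single controller, written as a tiny standalone C helper. The BDF, the use of the driver_override mechanism, and the helper itself are hypothetical; this is not my actual rule or script.

#include <stdio.h>

static void
write_sysfs(const char *path, const char *value)
{
        FILE *f = fopen(path, "w");

        if (f != NULL) {
                fputs(value, f);
                fclose(f);
        }
}

int
main(void)
{
        const char *bdf = "0000:40:00.0"; /* example BDF */
        char path[256];

        /* 1. Tell the PCI core which driver should claim this device. */
        snprintf(path, sizeof(path), "/sys/bus/pci/devices/%s/driver_override", bdf);
        write_sysfs(path, "vfio-pci");

        /* 2. Unbind the device from its currently bound driver (nvme). */
        snprintf(path, sizeof(path), "/sys/bus/pci/devices/%s/driver/unbind", bdf);
        write_sysfs(path, bdf);

        /* 3. Re-probe; driver_override steers the device to vfio-pci. */
        write_sysfs("/sys/bus/pci/drivers_probe", bdf);

        return 0;
}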
I discovered another issue during this experimentation that is somewhat tangential to this task, but I’ll write a separate email on that topic.
thanks for any feedback,
--
Lance Hartmann
lance.hartmann(a)oracle.com
2 years, 8 months
Topic from last week's community meeting
by Luse, Paul E
Hi Shuhei,
I was out of town last week and missed the meeting but saw on Trello you had the topic below:
"a few idea: log structured data store , data store with compression, and metadata replication of Blobstore"
I'd be pretty interested in working on this with you, or at least hearing more about it. When you get a chance (no hurry), can you please expand a little on how the conversation went and what you're looking at specifically?
Thanks!
Paul
2 years, 9 months
Add py-spdk client for SPDK
by We We
Hi, all
I have submitted the py-spdk code at https://review.gerrithub.io/#/c/379741/; please take some time to review it. I will be very grateful.
py-spdk is a client that helps upper-level applications communicate with SPDK-based applications (such as nvmf_tgt, vhost, iscsi_tgt, etc.). Should I submit it to a separate repo that I set up, rather than the SPDK repo? I think it is a relatively independent kit built on top of SPDK.
If you have some thoughts about py-spdk, please share them with me.
Regards,
Helloway
2 years, 9 months
The disk hot remove function of SPDK
by Vincent
Hello all,
Recently we have been trying the disk hot-remove feature of SPDK.
We have a counter that records the IOs sent out on each IO channel.
Roughly, the hot-remove procedure in my code is as follows (a sketch of this pattern appears after the list):
(1) When we receive the disk hot-remove callback from SPDK, we stop sending IO.
(2) Because we have the counter of IOs sent out per IO channel, we wait for all outstanding IOs to call back (complete).
(3) Close the IO channel.
(4) Close the bdev descriptor.
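For reference, a minimal sketch of this drain-then-close pattern, assuming the descriptor was opened with spdk_bdev_open() and a remove callback, and that the callbacks run on the thread that owns the channel. struct my_disk and its fields are hypothetical, not our real code.

#include "spdk/bdev.h"
#include "spdk/thread.h"

struct my_disk {
        struct spdk_bdev_desc *desc;
        struct spdk_io_channel *ch;
        uint64_t outstanding_io; /* IOs sent but not yet completed */
        bool removed;            /* hot-remove callback has fired */
};

static void
close_disk(struct my_disk *disk)
{
        spdk_put_io_channel(disk->ch); /* step (3) */
        spdk_bdev_close(disk->desc);   /* step (4) */
}

/* Registered as the remove_cb of spdk_bdev_open(): step (1). */
static void
hot_remove_cb(void *remove_ctx)
{
        struct my_disk *disk = remove_ctx;

        disk->removed = true; /* stop submitting new IO */
        if (disk->outstanding_io == 0) {
                close_disk(disk);
        }
}

/* Per-IO completion callback: step (2); the last completion closes everything. */
static void
io_done(struct spdk_bdev_io *bdev_io, bool success, void *cb_arg)
{
        struct my_disk *disk = cb_arg;

        spdk_bdev_free_io(bdev_io);
        if (--disk->outstanding_io == 0 && disk->removed) {
                close_disk(disk);
        }
}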
But sometimes we crash (the crash rate is about 1 in 10); the call stack is attached.
The crash is in the function nvme_free_request:
void
nvme_free_request(struct nvme_request *req)
{
        assert(req != NULL);
        assert(req->num_children == 0);
        assert(req->qpair != NULL);

        STAILQ_INSERT_HEAD(&req->qpair->free_req, req, stailq);   <------------- this line
}
Can anyone give me a hint?
Any suggestion is appreciated
Thank you in advance
--------------------------------------------------------------------------------------------------------------------------
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `./smistor_iscsi_tgt -c
/usr/smistor/config/smistor_iscsi_perf.conf'.
Program terminated with signal 11, Segmentation fault.
#0 0x0000000000414b87 in nvme_free_request (req=req@entry=0x7fe4edbf3100)
at nvme.c:227
227 nvme.c: No such file or directory.
Missing separate debuginfos, use: debuginfo-install
bzip2-libs-1.0.6-13.el7.x86_64 elfutils-libelf-0.170-4.el7.x86_64
elfutils-libs-0.170-4.el7.x86_64 glibc-2.17-222.el7.x86_64
libaio-0.3.109-13.el7.x86_64 libattr-2.4.46-13.el7.x86_64
libcap-2.22-9.el7.x86_64 libgcc-4.8.5-28.el7_5.1.x86_64
libuuid-2.23.2-52.el7.x86_64 lz4-1.7.5-2.el7.x86_64
numactl-libs-2.0.9-7.el7.x86_64 openssl-libs-1.0.2k-12.el7.x86_64
systemd-libs-219-57.el7.x86_64 xz-libs-5.2.2-1.el7.x86_64
zlib-1.2.7-17.el7.x86_64
(gdb) bt
#0 0x0000000000414b87 in nvme_free_request (req=req@entry=0x7fe4edbf3100)
at nvme.c:227
#1 0x0000000000412056 in nvme_pcie_qpair_complete_tracker
(qpair=qpair@entry=0x7fe4c5376ef8, tr=0x7fe4ca8ad000,
cpl=cpl@entry=0x7fe4c8e0a840, print_on_error=print_on_error@entry=true)
at nvme_pcie.c:1170
#2 0x0000000000413be0 in nvme_pcie_qpair_process_completions
(qpair=qpair@entry=0x7fe4c5376ef8, max_completions=64,
max_completions@entry=0) at nvme_pcie.c:2013
#3 0x0000000000415d7b in nvme_transport_qpair_process_completions
(qpair=qpair@entry=0x7fe4c5376ef8,
max_completions=max_completions@entry=0) at nvme_transport.c:201
#4 0x000000000041449d in spdk_nvme_qpair_process_completions
(qpair=0x7fe4c5376ef8, max_completions=max_completions@entry=0)
at nvme_qpair.c:368
#5 0x000000000040a289 in bdev_nvme_poll (arg=0x7fe08c0012a0) at
bdev_nvme.c:208
#6 0x0000000000499baa in _spdk_reactor_run (arg=0x6081dc0) at reactor.c:452
#7 0x00000000004a4284 in eal_thread_loop ()
#8 0x00007fe8fb276e25 in start_thread () from /lib64/libpthread.so.0
#9 0x00007fe8fafa0bad in clone () from /lib64/libc.so.6
3 years, 7 months
slide used at dev meetup wrt CI, etc.
by Luse, Paul E
Hi All,
There was a request to send out the slide(s) Seth and I were talking to at the dev meetup. I *think* this is the main one that was being asked for - if anyone remembers other specific pictures or concepts please let me know and I'll try to dig up whatever else we shared...
Thx
Paul
3 years, 7 months
spdk_blob_io_unmap() usage
by Niu, Yawei
Hi,
I tried to test spdk_blob_io_unmap() and didn't get the completion callback (not sure if that's because I didn't wait long enough). I checked the SPDK source and didn't see any test case for spdk_blob_io_unmap(), so I was wondering: is unmap supposed to execute as fast as blob read/write, or is it not well supported on certain SSD models? BTW, spdk_blob_io_read/write() works well for me.
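For reference, the call pattern I'm describing is roughly the following. This is a simplified sketch, not my actual test; g_blob and g_channel stand for an opened blob and a channel from spdk_bs_alloc_io_channel(), and the offset/length values are arbitrary examples.

#include <stdio.h>
#include "spdk/blob.h"

static void
unmap_complete(void *cb_arg, int bserrno)
{
        printf("unmap completed, bserrno=%d\n", bserrno);
}

static void
do_unmap(struct spdk_blob *g_blob, struct spdk_io_channel *g_channel)
{
        uint64_t offset = 0;  /* in pages */
        uint64_t length = 16; /* arbitrary example length, in pages */

        /* Completion is delivered by polling on the thread that owns g_channel. */
        spdk_blob_io_unmap(g_blob, g_channel, offset, length, unmap_complete, NULL);
}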
My SPDK commit:
051297114cb393d3eb1169520d474e81b4215bf0
My SSD model:
NVMe Controller at 0000:81:00.0 [8086:2701]
=====================================================
Controller Capabilities/Features
================================
Vendor ID: 8086
Subsystem Vendor ID: 8086
Serial Number: PHKS7335003H375AGN
Model Number: INTEL SSDPED1K375GA
Firmware Version: E2010324
...
Intel Marketing Information
==================
Marketing Product Information: Intel (R) Optane (TM) SSD P4800X Series
Namespace ID:1
Deallocate: Supported
Deallocated/Unwritten Error: Not Supported
Deallocated Read Value: Unknown
Deallocate in Write Zeroes: Not Supported
Deallocated Guard Field: 0xFFFF
Flush: Not Supported
Reservation: Not Supported
Size (in LBAs): 732585168 (698M)
Capacity (in LBAs): 732585168 (698M)
Utilization (in LBAs): 732585168 (698M)
EUI64: E4D25C73F0210100
Thin Provisioning: Not Supported
Per-NS Atomic Units: No
NGUID/EUI64 Never Reused: No
Number of LBA Formats: 7
Thanks
-Niu
3 years, 7 months
perf vs spdk_app_start()
by Szwed, Maciej
Hi,
I'm implementing an NVMf event handler for the initiator, and I noticed that perf (spdk/examples/nvme/perf) does not use spdk_app_start(). I'd like to know if this was done on purpose, or can I modify it so that it will use spdk_app_start()?
Regards,
Maciek
3 years, 7 months
[Release] 18.10: Dynamic memory allocation, Crypto vbdev, jsonrpc-client, SPDKCLI iSCSI and NVMe-oF support
by Zawadzki, Tomasz
On behalf of the SPDK community I'm pleased to announce the release of SPDK 18.10!
This release contains the following new features:
- Dynamic memory allocation: SPDK will now automatically utilize DPDK's dynamic memory management with DPDK versions >= 18.05.1.
- Crypto vbdev: A new vbdev module for performing inline data encryption and decryption has been added. It is based on the DPDK cryptodev framework. It supports software encryption as well as hardware assisted encryption via Intel QAT.
- jsonrpc-client: A C library for issuing RPC commands has been added.
- SPDKCLI: The interactive command-line tool for managing SPDK applications is no longer considered experimental. Support for the iSCSI and NVMe-oF targets has also been added.
- iSCSI initiator: The SPDK iSCSI initiator is no longer considered experimental.
- RAID: The RAID virtual bdev module is no longer considered experimental and is now enabled by default. RAID 0 is the only RAID level supported.
The full changelog for this release is available at:
https://github.com/spdk/spdk/releases/tag/v18.10
This quarterly release contains 670 commits from 55 different authors. We'd especially like to recognize all of our first-time contributors:
Avinash M N
Chen Zhenghua
Crane Chu
Enming Zhang
Li Feng
Mike Altman
Ni Xun
Potnuri Bharat Teja
Rami Rosen
Sun Zhenyuan
Takeshi Yoshimura
Vitaliy Mysak
Wael Halbawi
Wuzhouhui
Thanks to everyone for your contributions, participation, and effort!
Thanks,
Tomek
3 years, 7 months