Hi George,

 

While you wait for someone from the team to respond to your questions, I'd like to mention that our upcoming SPDK Summit on April 19th and 20th at the Hyatt in Santa Clara will have a one-hour deep dive dedicated to this topic. Over the two days we’ll cover just about every inch of SPDK and will also have discussions on the Intel Intelligent Storage Acceleration Library and Intel’s Cache Acceleration Software. Additionally, some storage companies will talk about their use of SPDK. If you’re interested in attending, follow the link below.

 

https://goo.gl/XkS7Xx

 

Thanks,

Nate

 

From: SPDK [mailto:spdk-bounces@lists.01.org] On Behalf Of George Kondiles
Sent: Wednesday, March 29, 2017 12:07 PM
To: spdk@lists.01.org
Subject: [SPDK] SPDK Blob Store Fundamentals

 

Hello,

 

I am attempting to use the SPDK blob store to implement a basic NVMe-based flat file store. I understand that this is a new addition to SPDK that is under active development, and that documentation and usage examples are sparse. Still, it is a great addition that I've been tracking, and I'm eager to begin using it.

 

That said, I've been scouring its usage in the bdev component, as well as the test cases, to glean how I might integrate it into my code base (I am already successfully using SPDK to interact with NVMe devices). I have a few high-level questions that I hope are easy to answer.

 

1) In the most basic usage, it seems IO channels should be 1-to-1 with threads. It looks like I must start a thread, call spdk_allocate_thread(), then spdk_get_io_channel() to get the spdk_io_channel instance created and associated with that thread.

 

Since spdk_bs_dev.create_channel is synchronous, it looks like create_channel() must block while the above happens in the new IO thread. Is this a reasonable approach, or am I misinterpreting how IO channels are intended to work?

 

2) I've already got a set of IO threads for executing asynchronous NVMe operations (e.g. spdk_nvme_ns_cmd_read(...)) against one or more devices. These IO threads each own a set of NVMe queue pairs, and have queuing mechanisms allowing for the submission of work to be performed against a specific device. Given this, I am interpreting an IO channel to essentially be an additional "outer" queue of pending blob-IO operations that are processed by an additional, dedicated thread. A call to spdk_bs_dev.read() or .write() would find the correct IO channel thread, enqueue an "outer" blob op, and the channel IO thread would then enqueue one or more lower-level NVMe IO operations on the "inner" queue. Does this interpretation match the intended usage? Am I missing something?

 

3) spdk_bs_dev.unmap() appears to correspond to dealloc/TRIM. Is this correct?

 

4) I've read through the docs at http://www.spdk.io/doc/blob.html and understand at a high level how things are being stored on disk, but there are references to the caching of metadata. My current workload will likely generate on the order of 100K to 1M blobs of sizes ranging from 512KB to 32MB, each with a couple of small attributes. Is there any way to estimate the total size (in memory) of the cache? Also, are any metadata modifications O(n) in the number of blobs?
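
To make the interpretation in question 2 concrete, here is a minimal single-threaded sketch of the "outer"/"inner" queue model as described there. It does not use any SPDK API and is not a statement of how SPDK IO channels actually work. Every name in it (blob_op, nvme_op, outer_enqueue, channel_poll, QUEUE_DEPTH, MAX_IO_BYTES) is hypothetical, and the 128 KiB split size is an arbitrary assumption chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define QUEUE_DEPTH  64            /* hypothetical ring size */
#define MAX_IO_BYTES (128 * 1024)  /* hypothetical per-command limit */

struct blob_op { uint64_t offset; uint64_t length; };  /* "outer" blob-level op */
struct nvme_op { uint64_t offset; uint64_t length; };  /* "inner" device-level op */

/* Simple rings: head and tail only ever increase; index with % QUEUE_DEPTH. */
struct outer_queue { struct blob_op ops[QUEUE_DEPTH]; size_t head, tail; };
struct inner_queue { struct nvme_op ops[QUEUE_DEPTH]; size_t head, tail; };

static size_t outer_count(const struct outer_queue *q) { return q->tail - q->head; }
static size_t inner_count(const struct inner_queue *q) { return q->tail - q->head; }

/* Submitter side: queue one blob-level op. Returns 0, or -1 if the ring is full. */
static int outer_enqueue(struct outer_queue *q, uint64_t offset, uint64_t length)
{
    if (outer_count(q) == QUEUE_DEPTH) {
        return -1;
    }
    q->ops[q->tail % QUEUE_DEPTH] = (struct blob_op){ offset, length };
    q->tail++;
    return 0;
}

/*
 * Channel-thread side: take one outer op and split it into inner ops of at
 * most MAX_IO_BYTES each. Returns the number of inner ops queued (0 if the
 * outer queue was empty), or -1 if the inner ring lacks room, in which case
 * the outer op stays queued.
 */
static int channel_poll(struct outer_queue *outer, struct inner_queue *inner)
{
    if (outer_count(outer) == 0) {
        return 0;
    }
    struct blob_op op = outer->ops[outer->head % QUEUE_DEPTH];
    size_t needed = (size_t)((op.length + MAX_IO_BYTES - 1) / MAX_IO_BYTES);
    if (needed > QUEUE_DEPTH - inner_count(inner)) {
        return -1;
    }
    outer->head++;
    for (uint64_t done = 0; done < op.length; done += MAX_IO_BYTES) {
        uint64_t chunk = op.length - done;
        if (chunk > MAX_IO_BYTES) {
            chunk = MAX_IO_BYTES;
        }
        inner->ops[inner->tail % QUEUE_DEPTH] =
            (struct nvme_op){ op.offset + done, chunk };
        inner->tail++;
    }
    return (int)needed;
}
```

In a real multi-threaded version the outer ring would need synchronization or a lock-free design; that is deliberately left out here.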

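For a sense of scale in question 4, the workload bounds can be worked out with simple arithmetic. The sketch below treats KB and MB as binary units and assumes, purely for illustration, that each blob consumes at least one 4 KiB metadata page on disk. That per-blob figure is an assumption, not a confirmed property of the blob store, and it says nothing about how much of the metadata is held in memory. The helper names are hypothetical.

```c
#include <stdint.h>

#define KIB 1024ULL
#define MIB (1024ULL * KIB)

/* ASSUMPTION: at least one 4 KiB metadata page per blob. */
#define ASSUMED_MD_PAGE_BYTES (4 * KIB)

/* Total payload bytes for blob_count blobs of blob_bytes each. */
static uint64_t data_bytes(uint64_t blob_count, uint64_t blob_bytes)
{
    return blob_count * blob_bytes;
}

/* Lower bound on metadata bytes under the one-page-per-blob assumption. */
static uint64_t min_metadata_bytes(uint64_t blob_count)
{
    return blob_count * ASSUMED_MD_PAGE_BYTES;
}
```

Under these assumptions, 100K blobs of 512 KiB hold about 52 GB of data and 1M blobs of 32 MiB hold about 33.5 TB, while the metadata lower bound ranges from roughly 0.4 GB to 4.1 GB.
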
 

Thanks in advance for any help or insight anyone can provide. Any assistance is greatly appreciated.

 

- George Kondiles