Reporting intermittent test failures
by Harris, James R
Hi all,
I’ve seen a lot of cases recently where -1 votes from the test pool have been removed from a patch due to a failure unrelated to the patch, but then nothing was filed in GitHub for that failure. The filing in GitHub could be a new issue, or a comment on an existing issue.
Please make those GitHub updates a priority. It’s the only way the project can understand the frequency of these intermittent failures and gather the data needed to get them fixed. If you’re not sure whether a failure has been seen before, search the GitHub issues with the “Intermittent Failure” label, or ask on Slack if anyone else has seen it. There is no harm in filing a new issue that may be a duplicate – we can always clean these up later during the next bug scrub meeting. The important thing is that the failure gets tracked.
Thanks,
-Jim
2 months, 1 week
NVMe hotplug for RDMA and TCP transports
by Andrey Kuzmin
Hi team,
Is NVMe hotplug functionality, as implemented, limited to the PCIe transport, or does it also work for other transports? If it's currently PCIe-only, are there any plans to extend the support to RDMA/TCP?
Thanks,
Andrey
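For context, my understanding of the PCIe side: hotplug is delivered through the probe/attach/remove callbacks passed to spdk_nvme_probe(), roughly as in this minimal sketch (error handling and detach logic omitted); what I am asking is whether the same remove path applies to fabrics controllers.

#include <stdbool.h>
#include <stdio.h>
#include "spdk/nvme.h"

static bool
probe_cb(void *cb_ctx, const struct spdk_nvme_transport_id *trid,
         struct spdk_nvme_ctrlr_opts *opts)
{
        return true; /* attach to any controller found */
}

static void
attach_cb(void *cb_ctx, const struct spdk_nvme_transport_id *trid,
          struct spdk_nvme_ctrlr *ctrlr,
          const struct spdk_nvme_ctrlr_opts *opts)
{
        printf("attached: %s\n", trid->traddr);
}

/* Invoked for controllers that have disappeared (PCIe hot remove). */
static void
remove_cb(void *cb_ctx, struct spdk_nvme_ctrlr *ctrlr)
{
        printf("controller removed\n");
        /* a real app would drain its qpairs and spdk_nvme_detach() here */
}

/* Re-probing periodically (trid == NULL scans the local PCIe bus)
 * picks up newly inserted devices and fires remove_cb for removed ones. */
void
poll_hotplug(void)
{
        spdk_nvme_probe(NULL, NULL, probe_cb, attach_cb, remove_cb);
}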
6 months
Bad Sectors / expected 'error' responses & timing
by alvarso@mit.edu
Hello SPDK team,
First, thank you for the really cool work you are doing! I am working on a small satellite mission at MIT, which will use a 6-channel software-defined radio (SDR). My task is to ensure that the 6-channel SDR data is saved reliably to an SSD. I am working on the processing system (PS) side; a colleague is working on the FPGA programmable logic (PL) side. The FPGA will provide DMA (still under development).
My general idea for achieving zero-copy performance is (a rough SPDK sketch follows the list):
ADC -> PL queue
PS determines SDRAM temporary storage
DMA from PL queue -> SDRAM (likely using libiio)
PS determines when a block (or other storage unit, TBD) is ready to go to the SSD
-> because our data is always the same size/format, I believe we can use an analytic/deterministic equation to determine the storage location
DMA from SDRAM -> SSD (likely using SPDK)
When it's time to 'process' the data (which has to happen later due to the power limits of the satellite):
PS determines (analytic equation) the data to be processed
DMA from SSD -> SDRAM (with SPDK)
PS informs PL that data is available
PL processes data via DMA
PL signals that the new 'processed data' queue is ready
PS prepares a location for the processed data
DMA from SDRAM -> SSD (with SPDK)
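Here is a minimal sketch of how I picture the SDRAM -> SSD step with SPDK's NVMe driver, assuming the capture buffer is allocated from SPDK's DMA-safe allocator (APP_IO_SIZE and the function names are placeholders of mine):

#include "spdk/env.h"
#include "spdk/nvme.h"

/* APP_IO_SIZE is a placeholder for one capture unit from the SDR. */
#define APP_IO_SIZE (128 * 1024)

static void
write_done(void *arg, const struct spdk_nvme_cpl *cpl)
{
        /* the completion status reports whether the write really landed;
         * see the error-handling sketch further down */
}

int
save_unit(struct spdk_nvme_ns *ns, struct spdk_nvme_qpair *qpair,
          void *dma_buf, uint64_t lba)
{
        uint32_t sector = spdk_nvme_ns_get_sector_size(ns);

        /* dma_buf must come from spdk_dma_zmalloc() so the controller
         * can DMA straight out of it, with no intermediate copy */
        return spdk_nvme_ns_cmd_write(ns, qpair, dma_buf, lba,
                                      APP_IO_SIZE / sector,
                                      write_done, NULL, 0);
}

Completions would then be reaped by polling spdk_nvme_qpair_process_completions() on the same thread.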
The mission PI (principal investigator) has one main worry about our approach: he is concerned that SSDs can end up with 'bad sectors' like older drives did, and that it's usually a big chunk of space that goes bad. We are not concerned about single-event upsets (where just one individual piece of data gets damaged), but rather about a large section going bad and causing us to lose too much data.
I understand that keeping track of 'bad sectors' on 'hard drives' is usually the task of a file system. However, for our purposes a file system appears to be too much overhead, and we have not found one that would help us with a 'zero copy' setup. But we need to be able to know when there are data errors (read data is garbage) and to not slow down on a bad write request (if a bad write slows down the system, then we lose 'new' data that should have been saved).
I read the documentation as much as possible and did a good amount of online searching for the 'expected' response from SPDK when the SSD has errors, but I could not find any information on that. I would greatly appreciate it if anyone on the team could point me in the right direction (maybe to some standard that SPDK adheres to [NVMe & PCIe], but even then I was not sure how SPDK returns such errors or what their expected timing is).
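From reading the headers, my best guess (please correct me) is that each I/O's completion callback receives the raw NVMe completion (struct spdk_nvme_cpl), and the status fields defined by the NVMe spec distinguish, e.g., media errors from generic errors:

#include <stdio.h>
#include "spdk/nvme.h"

static void
io_done(void *arg, const struct spdk_nvme_cpl *cpl)
{
        if (spdk_nvme_cpl_is_error(cpl)) {
                /* sct/sc are the NVMe status code type and status code;
                 * e.g. sct == SPDK_NVME_SCT_MEDIA_ERROR with
                 * sc == SPDK_NVME_SC_UNRECOVERED_READ_ERROR would be a
                 * bad read */
                fprintf(stderr, "I/O failed: sct=0x%x sc=0x%x\n",
                        cpl->status.sct, cpl->status.sc);
        }
}

On timing, I could not find a fixed bound; commands complete whenever the device completes them, and SPDK seems to let you register a per-controller timeout callback (spdk_nvme_ctrlr_register_timeout_callback()) to catch commands that exceed a deadline.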
Hopefully this is clear and in scope for this list; if not, please ask me to clarify, and I would greatly appreciate a pointer in the right direction.
Thank you!
Alvar
PS, Summary:
- Trying to do zero-copy saving of 6-channel data from an FPGA to an SSD (PCIe NVMe)
- If I don't want a full file system, how can I handle 'bad sector' type errors on the SSD?
- Is there any spec or expectation for the timing impact when an error occurs?
6 months, 3 weeks
Query about "Starting I/O failed" exception while using arbitration NVMe example
by Harshit Jain
Hello Team,
I was trying to set up SPDK on Linux kernel 5.6.4 and get started with the SPDK examples. I hope I have posted this query in the right forum thread.
While trying to execute the NVMe arbitration example, I am hitting a "Starting I/O failed" error. I added debug prints to get the error value; it was -22.
The attached NVMe SSD has a single controller and a single 64 MB namespace formatted with a 4K sector size. Since the basic script was failing, I tried reducing the I/O count to 1 and the I/O size to 4096 bytes to keep the I/O profile simple, but I still hit the same error.
Is there any configuration I am missing while setting up SPDK?
I have tried most of the other examples too; NVMe admin commands and queue creation work fine, but the examples are unable to submit NVM I/O commands to the controller due to the error mentioned above.
SPDK version
Starting SPDK v20.07-pre git sha1 e69375b / DPDK 19.11.0 initialization...
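In case it helps with triage: -22 looks like -EINVAL, i.e. the submission is rejected before the command ever reaches the drive. Below is the debug helper I am using to dump the namespace parameters that (I assume) could trigger that; dump_ns_limits is my own function:

#include <inttypes.h>
#include <stdio.h>
#include "spdk/nvme.h"

/* Dump the namespace parameters that (I assume) can make submission
 * fail with -EINVAL before the command reaches the drive. */
static void
dump_ns_limits(struct spdk_nvme_ns *ns, uint32_t io_size_bytes)
{
        uint32_t sector = spdk_nvme_ns_get_sector_size(ns);

        printf("active:         %d\n", spdk_nvme_ns_is_active(ns));
        printf("sector size:    %u\n", sector);
        printf("num sectors:    %" PRIu64 "\n",
               spdk_nvme_ns_get_num_sectors(ns));
        printf("max xfer size:  %u\n",
               spdk_nvme_ns_get_max_io_xfer_size(ns));
        printf("io %% sector:    %u (should be 0)\n",
               io_size_bytes % sector);
}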
Thanks in advance.
Regards,
Harshit
7 months
Pay attention to the recent binary location change in SPDK
by Yang, Ziye
Hi all,
Recently, some patches on the SPDK master branch moved the compiled binaries to different locations:
1. Application binaries in the "spdk/app" folder moved to the "spdk/build/bin" folder.
2. Related example binaries in the "spdk/examples" folder moved to the "spdk/build/examples" folder.
3. The two fio plugin binaries moved to the "spdk/build/fio" folder.
Please keep these changes in mind if you use the SPDK master branch for development or debugging work.
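For example, a target binary that was previously built as ./app/spdk_tgt/spdk_tgt is now found at ./build/bin/spdk_tgt (spdk_tgt is just one illustration; the same pattern applies to the other applications and examples).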
Best Regards
Ziye Yang
7 months