RFC: Implement fallocate() support for Lustre
by Swapnil Pimpale
Hello all,
I have attached a high level design document for implementing fallocate() support for Lustre.
The requirement, taken from the JIRA ticket (https://jira.hpdd.intel.com/browse/LU-3606), is as follows:
"The sys_fallocate() syscall was introduced to the Linux kernel in the 2.6.23 release. There is also an ext4_fallocate() method added in this same kernel release. This has been available in vendor kernels since RHEL 5.4.
We need to implement an fallocate() method for llite, and transport this to the OSTs to interface with the underlying OSD's fallocate() code (for ldiskfs, ZFS has no such method).
The fallocate() API has been added to newer versions of the Linux kernel to provide both persistent space reservation (block preallocation for a file, possibly beyond the file size, without having to write zeroes to the whole file), and for the reverse operation of hole punching (freeing allocated blocks in the middle or end of a file). However, Lustre predates these APIs and has not yet added support for them. Being able to preallocate space for a file is very useful for HPC applications that know the size of the output file in advance, and helps Lustre make better allocation decisions based on the file size."
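For readers unfamiliar with the call, here is a minimal sketch of what preallocation looks like from user space. It uses Python's os.posix_fallocate(), the portable wrapper, which only covers the space-reservation case and cannot punch holes. The helper name preallocate() and the file name output.dat are made up for illustration, and the behaviour on a Lustre mount without fallocate() support is not shown here.

```python
import os
import tempfile


def preallocate(path, length):
    """Reserve `length` bytes for `path` and return the resulting file size.

    Hypothetical helper for illustration only; it is not part of the
    design document attached to this mail.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Ask the filesystem to allocate blocks for [0, length) up front,
        # without the application writing zeroes itself.
        os.posix_fallocate(fd, 0, length)
        return os.fstat(fd).st_size
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as tmp:
    # An application that knows its output size in advance (1 MiB here).
    size = preallocate(os.path.join(tmp, "output.dat"), 1 << 20)
    print(size)
```

Hole punching needs the Linux-specific fallocate() call with the FALLOC_FL_PUNCH_HOLE flag, which the Python standard library does not expose directly.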
I would appreciate your feedback on the design.
Regards,
Swapnil
How to locate the OST object through the MDT object for Lustre 2.5.0?
by Frank Yang
Hi all,
It seems I have little chance of successfully mounting the Lustre filesystem
again. As a result, I'm trying to recover the data by inspecting the
MDT and OST objects on ldiskfs. However, the only example I can find is
"Identifying To Which Lustre File an OST Object Belongs"
(http://wiki.lustre.org/manual/LustreManual20_HTML/LustreOperations.html#5...).
That is exactly the opposite direction (OST->MDT) of what I want (MDT->OST).
Still, if it works, I could build a big lookup table to get the data back
SLOWLY. However, this example seems to be for older versions of Lustre.
Does anybody have a reference for this? I only use the default Lustre
2.5.0 configuration. It seems I need to check the oi.16 files to get the
correct mapping between MDT object and OST object.
I found a document,
http://users.nccs.gov/~fwang2/papers/lustre_report.pdf, describing the
internals of Lustre. However, I'm not sure if it's
up-to-date enough, and I can hardly find enough time to comprehend
it. Below is a file mapping between OST and MDT objects that I'm sure of. If
somebody can help, it may be used as an example. Thanks a lot.
######
###### .zshrc
######
[root@old_mds ~]# debugfs -c -R "stat /ROOT/space/users2/fsyang/.zshrc" /dev/mapper/VolGroup00-LogVol03
debugfs 1.42.7.wc2 (07-Nov-2013)
/dev/mapper/VolGroup00-LogVol03: catastrophic mode - not reading inode or group bitmaps
Inode: 19988768 Type: regular Mode: 0644 Flags: 0x0
Generation: 4060094755 Version: 0x00000003:01d726a2
User: 646 Group: 100 Size: 0
File ACL: 0 Directory ACL: 0
Links: 1 Blockcount: 0
Fragment: Address: 0 Number: 0 Size: 0
ctime: 0x52b4cef8:00000000 -- Sat Dec 21 07:12:56 2013
atime: 0x52b4cef8:00000000 -- Sat Dec 21 07:12:56 2013
mtime: 0x52677014:00000000 -- Wed Oct 23 14:43:32 2013
crtime: 0x52b4cef8:326ee6e8 -- Sat Dec 21 07:12:56 2013
Size of extra inode fields: 28
Extended attributes stored in inode body:
lma = "00 00 00 00 00 00 00 00 26 04 00 00 02 00 00 00 3f 24 01 00 00 00 00 00 " (24)
lma: fid=[0x200000426:0x1243f:0x0] compat=0 incompat=0
lov = "d0 0b d1 0b 01 00 00 00 3f 24 01 00 00 00 00 00 26 04 00 00 02 00 00 00 00 00 10 00 01 00 00 00 ef ad 41 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 " (56)
link = "df f1 ea 11 01 00 00 00 30 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 18 00 00 00 02 00 00 04 26 00 00 ef 80 00 00 00 00 2e 7a 73 68 72 63 " (48)
BLOCKS:
[root@myoss]# debugfs -c -R "stat /O/0/d15/4304367" /dev/sda5
debugfs 1.42.7.wc2 (07-Nov-2013)
/dev/sda5: catastrophic mode - not reading inode or group bitmaps
Inode: 5353030 Type: regular Mode: 0666 Flags: 0x80000
Generation: 3492092415 Version: 0x00000003:00c4f216
User: 646 Group: 100 Size: 658
File ACL: 0 Directory ACL: 0
Links: 1 Blockcount: 8
Fragment: Address: 0 Number: 0 Size: 0
ctime: 0x52b4cef8:00000000 -- Sat Dec 21 07:12:56 2013
atime: 0x52b4cef8:00000000 -- Sat Dec 21 07:12:56 2013
mtime: 0x52677014:00000000 -- Wed Oct 23 14:43:32 2013
crtime: 0x52b4cea7:6b754c20 -- Sat Dec 21 07:11:35 2013
Size of extra inode fields: 28
Extended attributes stored in inode body:
lma = "08 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 ef ad 41 00 00 00 00 00 " (24)
lma: fid=[0x100000000:0x41adef:0x0] compat=8 incompat=0
fid = "26 04 00 00 02 00 00 00 3f 24 01 00 00 00 00 00 " (16)
fid: parent=[0x200000426:0x1243f:0x0] stripe=0
EXTENTS:
(0):685147348
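In the two dumps above, the MDT inode's "lov" attribute appears to carry the MDT->OST link: its bytes contain ef ad 41, which is 0x41adef = 4304367, the OST object name. Below is a rough sketch of decoding that attribute, assuming it follows the LOV_MAGIC_V1 layout (struct lov_mds_md_v1 followed by one lov_ost_data_v1 per stripe, little-endian). The function name parse_lov_v1 is made up for illustration, and the O/<seq>/d<id % 32>/<id> path rule is an assumption taken from this one example with sequence 0; other layouts and sequences are not handled.

```python
import struct

LOV_MAGIC_V1 = 0x0BD10BD0  # appears on disk as bytes d0 0b d1 0b

# Assumed layout: magic, pattern, object id, object seq,
# stripe size, stripe count, layout generation (32 bytes in total).
HEADER = struct.Struct("<IIQQIHH")
# Assumed per-stripe entry: object id, object seq, generation, OST index.
STRIPE = struct.Struct("<QQII")


def parse_lov_v1(hex_dump):
    """Decode the hex bytes of a 'lov' xattr as printed by debugfs.

    Hypothetical helper: returns one dict per stripe with the OST index
    and the expected object path on that OST's ldiskfs.
    """
    raw = bytes.fromhex(hex_dump.replace(" ", ""))
    magic, _pattern, _oid, _oseq, stripe_size, count, _gen = HEADER.unpack_from(raw, 0)
    if magic != LOV_MAGIC_V1:
        raise ValueError("not a LOV_MAGIC_V1 layout: %#x" % magic)
    stripes = []
    for i in range(count):
        obj_id, obj_seq, _ost_gen, ost_idx = STRIPE.unpack_from(
            raw, HEADER.size + i * STRIPE.size)
        stripes.append({
            "ost_idx": ost_idx,
            "stripe_size": stripe_size,
            # Assumption: objects are hashed into 32 "d" subdirectories.
            "path": "O/%d/d%d/%d" % (obj_seq, obj_id % 32, obj_id),
        })
    return stripes


# The "lov" value from the .zshrc MDT inode shown above.
lov = ("d0 0b d1 0b 01 00 00 00 3f 24 01 00 00 00 00 00 "
       "26 04 00 00 02 00 00 00 00 00 10 00 01 00 00 00 "
       "ef ad 41 00 00 00 00 00 00 00 00 00 00 00 00 00 "
       "00 00 00 00 00 00 00 00")
result = parse_lov_v1(lov)
print(result)
```

For this file the sketch yields a single stripe on OST index 0 with path O/0/d15/4304367, which matches the object inspected on myoss. The "fid" attribute on that OST object (parent=[0x200000426:0x1243f:0x0]) points back at the MDT file and can serve as a cross-check.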
[root@old_mds MDT]# ls -l
total 485260
-rw-r--r-- 1 root root 32 Dec 20 10:59 CATALOGS
-rw-r--r-- 1 root root 0 Dec 20 10:47 changelog_catalog
-rw-r--r-- 1 root root 8256 Dec 20 10:47 changelog_users
drwxr-xr-x 2 root root 4096 Dec 20 10:47 CONFIGS
-rw-rw-rw- 1 root root 8192 Dec 20 10:47 fld
-rw-r--r-- 1 root root 0 Dec 20 10:47 hsm_actions
-rw-r--r-- 1 root root 8960 Dec 20 10:47 last_rcvd
-rw-r--r-- 1 root root 64 Dec 20 10:47 lfsck_bookmark
-rw-r--r-- 1 root root 8192 Dec 20 10:47 lfsck_namespace
drwx------ 2 root root 16384 Dec 20 10:47 lost+found
-rw-r--r-- 1 root root 8 Dec 20 10:59 lov_objid
-rw-r--r-- 1 root root 8 Dec 20 10:59 lov_objseq
drwxr-xr-x 2 root root 4096 Dec 20 10:47 NIDTBL_VERSIONS
drwxr-xr-x 5 root root 4096 Dec 20 10:47 O
-rw-r--r-- 1 root root 12759040 Dec 20 10:47 oi.16.0
-rw-r--r-- 1 root root 17424384 Dec 20 10:47 oi.16.1
-rw-r--r-- 1 root root 12759040 Dec 20 10:47 oi.16.10
-rw-r--r-- 1 root root 12759040 Dec 20 10:47 oi.16.11
-rw-r--r-- 1 root root 12759040 Dec 20 10:47 oi.16.12
-rw-r--r-- 1 root root 7991296 Dec 20 10:47 oi.16.13
-rw-r--r-- 1 root root 6381568 Dec 20 10:47 oi.16.14
-rw-r--r-- 1 root root 6541312 Dec 20 10:47 oi.16.15
-rw-r--r-- 1 root root 6381568 Dec 20 10:47 oi.16.16
-rw-r--r-- 1 root root 6381568 Dec 20 10:47 oi.16.17
-rw-r--r-- 1 root root 6381568 Dec 20 10:47 oi.16.18
-rw-r--r-- 1 root root 6381568 Dec 20 10:47 oi.16.19
-rw-r--r-- 1 root root 12759040 Dec 20 10:47 oi.16.2
-rw-r--r-- 1 root root 6381568 Dec 20 10:47 oi.16.20
-rw-r--r-- 1 root root 6381568 Dec 20 10:47 oi.16.21
-rw-r--r-- 1 root root 6381568 Dec 20 10:47 oi.16.22
-rw-r--r-- 1 root root 6381568 Dec 20 10:47 oi.16.23
-rw-r--r-- 1 root root 6381568 Dec 20 10:47 oi.16.24
-rw-r--r-- 1 root root 6381568 Dec 20 10:47 oi.16.25
-rw-r--r-- 1 root root 6381568 Dec 20 10:47 oi.16.26
-rw-r--r-- 1 root root 6381568 Dec 20 10:47 oi.16.27
-rw-r--r-- 1 root root 6381568 Dec 20 10:47 oi.16.28
-rw-r--r-- 1 root root 6381568 Dec 20 10:47 oi.16.29
-rw-r--r-- 1 root root 12759040 Dec 20 10:47 oi.16.3
-rw-r--r-- 1 root root 6381568 Dec 20 10:47 oi.16.30
-rw-r--r-- 1 root root 6381568 Dec 20 10:47 oi.16.31
-rw-r--r-- 1 root root 6381568 Dec 20 10:47 oi.16.32
-rw-r--r-- 1 root root 6381568 Dec 20 10:47 oi.16.33
-rw-r--r-- 1 root root 6381568 Dec 20 10:47 oi.16.34
-rw-r--r-- 1 root root 6381568 Dec 20 10:47 oi.16.35
-rw-r--r-- 1 root root 6381568 Dec 20 10:47 oi.16.36
-rw-r--r-- 1 root root 6381568 Dec 20 10:47 oi.16.37
-rw-r--r-- 1 root root 6381568 Dec 20 10:47 oi.16.38
-rw-r--r-- 1 root root 6381568 Dec 20 10:47 oi.16.39
-rw-r--r-- 1 root root 12759040 Dec 20 10:47 oi.16.4
-rw-r--r-- 1 root root 7053312 Dec 20 10:47 oi.16.40
-rw-r--r-- 1 root root 6381568 Dec 20 10:47 oi.16.41
-rw-r--r-- 1 root root 6381568 Dec 20 10:47 oi.16.42
-rw-r--r-- 1 root root 6381568 Dec 20 10:47 oi.16.43
-rw-r--r-- 1 root root 6381568 Dec 20 10:47 oi.16.44
-rw-r--r-- 1 root root 6381568 Dec 20 10:47 oi.16.45
-rw-r--r-- 1 root root 6381568 Dec 20 10:47 oi.16.46
-rw-r--r-- 1 root root 6381568 Dec 20 10:47 oi.16.47
-rw-r--r-- 1 root root 6381568 Dec 20 10:47 oi.16.48
-rw-r--r-- 1 root root 6545408 Dec 20 10:47 oi.16.49
-rw-r--r-- 1 root root 12759040 Dec 20 10:47 oi.16.5
-rw-r--r-- 1 root root 6381568 Dec 20 10:47 oi.16.50
-rw-r--r-- 1 root root 6381568 Dec 20 10:47 oi.16.51
-rw-r--r-- 1 root root 6381568 Dec 20 10:47 oi.16.52
-rw-r--r-- 1 root root 6381568 Dec 20 10:47 oi.16.53
-rw-r--r-- 1 root root 6381568 Dec 20 10:47 oi.16.54
-rw-r--r-- 1 root root 6381568 Dec 20 10:47 oi.16.55
-rw-r--r-- 1 root root 6381568 Dec 20 10:47 oi.16.56
-rw-r--r-- 1 root root 6381568 Dec 20 10:47 oi.16.57
-rw-r--r-- 1 root root 6381568 Dec 20 10:47 oi.16.58
-rw-r--r-- 1 root root 6381568 Dec 20 10:47 oi.16.59
-rw-r--r-- 1 root root 12759040 Dec 20 10:47 oi.16.6
-rw-r--r-- 1 root root 6381568 Dec 20 10:47 oi.16.60
-rw-r--r-- 1 root root 6381568 Dec 20 10:47 oi.16.61
-rw-r--r-- 1 root root 6381568 Dec 20 10:47 oi.16.62
-rw-r--r-- 1 root root 6381568 Dec 20 10:47 oi.16.63
-rw-r--r-- 1 root root 12759040 Dec 20 10:47 oi.16.7
-rw-r--r-- 1 root root 12759040 Dec 20 10:47 oi.16.8
-rw-r--r-- 1 root root 10059776 Dec 20 10:47 oi.16.9
-rw-r--r-- 1 root root 400 Dec 20 10:47 OI_scrub
drwxr-xr-x 2 root root 4096 Dec 20 10:47 PENDING
drwxr-xr-x 4 root root 4096 Dec 20 10:47 quota_master
drwxr-xr-x 2 root root 4096 Dec 20 10:47 quota_slave
drwxr-xr-x 2 root root 4096 Dec 20 10:47 REMOTE_PARENT_DIR
drwxr-xr-x 5 root root 4096 Dec 21 19:53 ROOT
-rw-rw-rw- 1 root root 24 Dec 20 10:47 seq_ctl
-rw-rw-rw- 1 root root 24 Dec 20 10:47 seq_srv
Regards,
Frank