On Feb 18, 2014, at 12:41, "John Bauer"
<bauerj@iodoctors.com> wrote:
How do I determine the file-logical order of the Lustre extents, given the device-logical
order? Can I assume the file-logical order of the extents will follow
the device order indicated by obdidx from the "lfs getstripe" output?
That is not correct. The OST index is not related to the order of stripes in the file. The
filefrag/FIEMAP output for Lustre shows the physical layout and not the logical layout.
The logical layout is given by "lfs getstripe". If there is a single-stripe
file then they happen to be the same.
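
To make the logical layout concrete, here is a minimal sketch of how a byte offset in the file maps onto a stripe, assuming a plain round-robin (RAID0) layout as reported by "lfs getstripe". The function name locate and its argument names are made up for illustration and are not part of any Lustre tool; the stripe size and OST list are taken from the example output in this thread.

```python
def locate(offset, stripe_size, ost_indices):
    """Map a file byte offset to (stripe index, OST index, offset within the OST object).

    Assumes a plain round-robin (RAID0) layout; ost_indices is the obdidx
    column of "lfs getstripe", in the order it is printed.
    """
    stripe_count = len(ost_indices)
    chunk = offset // stripe_size            # which stripe-sized chunk of the file
    stripe_index = chunk % stripe_count      # which stripe (object) holds that chunk
    object_offset = (chunk // stripe_count) * stripe_size + offset % stripe_size
    return stripe_index, ost_indices[stripe_index], object_offset


osts = [4, 12, 9, 1]        # obdidx order from the example output
stripe_size = 4194304       # lmm_stripe_size from the example output

print(locate(0, stripe_size, osts))                      # (0, 4, 0)
print(locate(stripe_size, stripe_size, osts))            # (1, 12, 0)
print(locate(5 * stripe_size + 100, stripe_size, osts))  # (1, 12, 4194404)
```

The position of an OST in the obdidx list, not its numeric index, decides which part of the file it holds.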
The reason this was done is to reduce the number of extents returned to userspace and make
it easier for users to see whether the file is fragmented on physical storage or not.
Otherwise, FIEMAP would return one extent per MB of the file data, in a round-robin order
for all the stripes. That would make it hard for the user to know how large the on-disk
extents are, which was the original reason for creating FIEMAP.
It would be possible to fix the Lustre FIEMAP implementation to return
"file-logical" extent ordering to callers, such as "cp" and
"tar", so that they can copy sparse striped Lustre files more efficiently, but
nobody has done this yet. It would mean many more extents returned, but if they are
logically contiguous most userspace applications will not care about this.
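
Until that exists, a caller can do the conversion in userspace. The following is only a sketch, assuming a plain RAID0 layout and that the per-object extent offsets have already been converted to bytes; the function name and parameters are hypothetical. It splits one per-object extent into the file-logical pieces it covers, which also shows why the extent count grows.

```python
def object_extent_to_file_extents(stripe_index, obj_start, length,
                                  stripe_size, stripe_count):
    """Split one per-object extent into (file_offset, length) pieces.

    stripe_index is the position of the object in the "lfs getstripe"
    list (0-based), not the OST index. All values are in bytes.
    """
    pieces = []
    pos = obj_start
    obj_end = obj_start + length
    while pos < obj_end:
        chunk = pos // stripe_size           # stripe-sized chunk within the object
        within = pos % stripe_size           # offset inside that chunk
        n = min(stripe_size - within, obj_end - pos)
        file_off = (chunk * stripe_count + stripe_index) * stripe_size + within
        pieces.append((file_off, n))
        pos += n
    return pieces


# First extent on OST 4 in the example: blocks 0..10239 of 1024 bytes,
# stripe index 0, stripe size 4194304, 4 stripes.
print(object_extent_to_file_extents(0, 0, 10240 * 1024, 4194304, 4))
# [(0, 4194304), (16777216, 4194304), (33554432, 2097152)]
```

One 10 MiB extent on the OST becomes three file-logical extents, separated by the data that lives on the other three stripes.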
Is it safe to assume that the order of the devices as discovered in the filefrag output
is always the same as the order of the devices in the "lfs getstripe" output?
Yes, FIEMAP does traverse the OSTs in the same order as the layout. I don't know if
this will always be true, but as yet there is no reason to do otherwise. I think this
doesn't really matter, since the original assumption is flawed. If the file-logical
FIEMAP were implemented, the question would be moot.
Cheers, Andreas
In the output below, both have the order 4, 12, 9, 1. Is this coincidental or is it a
given?
% lfs getstripe -v /lus/scratch/p01940/pwrite.dat
/lus/scratch/p01940/pwrite.dat
lmm_magic: 0x0BD10BD0
lmm_seq: 0x2002dab80
lmm_object_id: 0xd1e
lmm_stripe_count: 4
lmm_stripe_size: 4194304
lmm_stripe_pattern: 1
lmm_layout_gen: 0
lmm_stripe_offset: 4
obdidx objid objid group
4 845042 0xce4f2 0
12 845220 0xce5a4 0
9 844466 0xce2b2 0
1 844523 0xce2eb 0
% /usr/sbin/filefrag -v /lus/scratch/p01940/pwrite.dat
Filesystem type is: bd00bd0
File size of /lus/scratch/p01940/pwrite.dat is 4194304000 (4096000 blocks of 1024 bytes)
ext: device_logical: physical_offset: length: dev: flags:
0: 0.. 10239: 1850072064..1850082303: 10240: 0004: network
.
115: 928768.. 1023999: 1851001856..1851097087: 95232: 0004: network
116: 0.. 16383: 14315287552..14315303935: 16384: 000c: network
.
270: 1021952.. 1023999: 14316310528..14316312575: 2048: 000c: network
271: 0.. 10239: 4657097728..4657107967: 10240: 0009: network
.
381: 945152.. 946175: 4658040832..4658041855: 1024: 0009: network
431: 978944.. 1023999: 4333852672..4333897727: 45056: 0001: network
/lus/scratch/p01940/pwrite.dat: 421 extents found
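
For comparing the two listings mechanically, here is a rough sketch that pulls the device order out of filefrag output like the above. It assumes the row format shown in this thread and that the "dev" column is the OST index in hexadecimal (000c matching obdidx 12 suggests so); the regular expression and the function name are illustrative, not taken from any Lustre tool.

```python
import re

# One extent row: "ext: logical_start..logical_end: phys_start..phys_end: length: dev:"
ROW = re.compile(
    r"^\s*(\d+):\s*(\d+)\.\.\s*(\d+):\s*(\d+)\.\.\s*(\d+):\s*(\d+):\s*([0-9a-fA-F]+):"
)


def device_order(filefrag_text):
    """Return the OST indices in order of first appearance in the extent rows."""
    order = []
    for line in filefrag_text.splitlines():
        m = ROW.match(line)
        if not m:
            continue                    # header, "." filler and summary lines
        dev = int(m.group(7), 16)       # dev column, assumed hexadecimal
        if dev not in order:
            order.append(dev)
    return order


sample = """\
ext: device_logical: physical_offset: length: dev: flags:
0: 0.. 10239: 1850072064..1850082303: 10240: 0004: network
116: 0.. 16383: 14315287552..14315303935: 16384: 000c: network
271: 0.. 10239: 4657097728..4657107967: 10240: 0009: network
431: 978944.. 1023999: 4333852672..4333897727: 45056: 0001: network
"""
print(device_order(sample))  # [4, 12, 9, 1]
```

The result can then be compared against the obdidx column of "lfs getstripe", keeping in mind the caveat above that the matching order is not guaranteed.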