lfs_migrate failure
by Arman Khalatyan
Hello,
I wondered: is lfs_migrate only for striped files?
When I run it on a non-striped file on Lustre 2.4.1 I get the following error:
lfs_migrate /lustre/arm2arm/tileimg_vis.png
lfs_migrate is currently NOT SAFE for moving in-use files.
Use it only when you are sure migrated files are unused.
If emptying OST(s) that are not disabled on the MDS, new
files may use them. To prevent MDS allocating any files on
OSTNNNN run 'lctl --device %{fsname}-OSTNNNN-osc deactivate'
on the MDS.
Continue? (y/n) y
/lustre/arm2arm/tileimg_vis.png: cannot swap layouts between
/lustre/arm2arm/tileimg_vis.png and a volatile file (Operation not
permitted)
error: migrate: migrate stripe file '/lustre/arm2arm/tileimg_vis.png' failed
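In case it is relevant, this is roughly how I inspect the file before migrating (just the standard lfs getstripe calls, nothing exotic):
  # Current layout of the file: stripe count, stripe size, and the OST objects it uses
  lfs getstripe /lustre/arm2arm/tileimg_vis.png
  # Default layout of the parent directory, for comparison
  lfs getstripe -d /lustre/arm2arm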
Thanks,
Arman.
Is there a way to force a recheck of quotas on 2.4.1?
by Arman Khalatyan
Hello,
After an MDT failure, fsck fixed some quota errors.
Now lfs quota -u username /lustre does not show the correct number of files
per user.
Is it possible to recheck quotas for all users?
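For reference, the verbose report I am comparing against (plain lfs quota with the -v flag, which as far as I know breaks the usage down per MDT/OST):
  # Per-target breakdown of the user's usage and limits
  lfs quota -v -u username /lustre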
thanks,
Arman.
Suggestions for greenfield Lustre 2.5 site: dumb or RAID HBAs?
by Anthony Alba
Hello list,
I have the opportunity for a greenfield Lustre 2.5+ site.
We only have on-site experience with 2.1.x and hw RAID ldiskfs.
A major decision point is whether the OSS HBAs should be dumb or RAID, and
whether to use ldiskfs or ZFS. If you had the choice for 2.5+, would you go for:
1. RAID HBAs or dumb HBAs+RAID enclosure using hw RAID ldiskfs. Would you
even consider hw RAID ZFS?
E.g., some sample hardware (as we are mostly a Dell shop)
Dell PERC+MD1200/C8000XD encl,
Dell 6Gbps SAS (dumb) HBA + MD3200 encl
2. Dumb HBAs with ZFS raidz2 (roughly as sketched below)
E.g. Dell 6Gbps SAS (dumb) HBA + MD1200 or C8000XD
non-RAID enclosures
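To make option 2 concrete, this is the rough shape I have in mind (a sketch only; the pool name, disk names, fsname, index and MGS NID below are placeholders, not a tested configuration):
  # Build a raidz2 pool from the JBOD disks behind the dumb HBA
  zpool create -o ashift=12 ostpool raidz2 sdb sdc sdd sde sdf sdg sdh sdi
  # Format the dataset as a Lustre OST on the ZFS backend
  mkfs.lustre --ost --backfstype=zfs --fsname=testfs --index=0 \
      --mgsnode=mgs@o2ib ostpool/ost0
  # Mount the OST
  mkdir -p /mnt/ost0
  mount -t lustre ostpool/ost0 /mnt/ost0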
Thanks for your wisdom.
A. Alba
Tools to collect performance numbers on OSS / OSTs
by Singhal, Upanshu
Hello,
Can someone please suggest some tools to collect performance numbers like IOPS on OSTs while running load from Lustre Clients?
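For context, the counters I have found so far are the per-OST stats on the OSS (parameter names as I understand them from the manual, so please correct me if there are better tools):
  # Cumulative read/write operation counts and bytes per OST
  lctl get_param obdfilter.*.stats
  # Per-OST histograms of I/O sizes, discontiguous blocks, disk I/O times, etc.
  lctl get_param obdfilter.*.brw_stats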
Thanks,
-Upanshu
Upanshu Singhal
EMC Data Storage Systems, Bangalore, India.
Phone: 91-80-67375604
Re: [HPDD-discuss] [Lustre-discuss] Can't increase effective client read cache
by Dilger, Andreas
On 2013-09-24, at 14:15, "Carlson, Timothy S" <Timothy.Carlson(a)pnnl.gov> wrote:
> I've got an odd situation that I can't seem to fix.
>
> My setup is Lustre 1.8.8-wc1 clients on RHEL 6 talking to 1.8.6 servers on RHEL 5.
>
> My compute nodes have 64 GB of memory and I have a use case where an application has very low memory usage and needs to access a few thousand files in Lustre that range from 10 to 50 MB. The files are subject to some reuse and it would be advantageous to cache as much of the data as possible. The default cache for this configuration would be 48GB on the client as that is 75% of memory. However the client never caches more than about 40GB of data according to /proc/meminfo
>
> Even if I tune the cached memory to 64GB, the amount of cache in use never goes past 40GB. My current setting is as follows:
>
> # lctl get_param llite.*.max_cached_mb
> llite.olympus-ffff8804069da800.max_cached_mb=64000
>
> I've also played with some of the VM tunable settings, like turning vfs_cache_pressure down to 10:
>
> # vm.vfs_cache_pressure = 10
>
> In no case do I see more than about 35GB of cache being used. To do some more testing on this I created a bunch (40) of 2GB files in Lustre and then copied them to /dev/null on the client. While doing this I ran the fincore tool from http://code.google.com/p/linux-ftools/ to see if the files were still in cache. Once about 40GB of cache was used, the kernel started to drop files from the cache even though there was no memory pressure on the system.
>
> If I do the same test with files local to the system, I can fill all the cache to about 61GB before files start getting dropped.
>
> Is there some other Lustre tunable on the client that I can twiddle with to make more use of the local memory cache?
This might relate to the number of DLM locks cached on the client. If the locks get cancelled for some reason (e.g. memory pressure on the server, old age) then the pages covered by the locks will also be dropped.
You could try disabling the lock LRU and specifying some large static number of locks (for testing only; I wouldn't leave this set for production systems with large numbers of clients):
lctl set_param ldlm.namespaces.*.lru_size=10000
To reset it to dynamic DLM LRU size management set a value of "0".
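For reference, a quick way to check whether the lock count is actually the limiting factor (read-only queries of the same ldlm parameters):
  # Number of DLM locks currently cached in each namespace on the client
  lctl get_param ldlm.namespaces.*.lock_count
  # Current LRU size setting (0 means dynamic LRU sizing)
  lctl get_param ldlm.namespaces.*.lru_size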
Cheers, Andreas
Have you contributed to the Lustre Manual lately?
by Dilger, Andreas
The Lustre Filesystem Operations Manual is an important resource for the
Lustre Community for both new and experienced Lustre users to find out
more information about how to use and maintain a Lustre filesystem.
Unfortunately, while there have been ongoing efforts to improve the manual over
the past few years, it has not had the full-time care and attention that it
really needs to be kept up to date. Some sections of the manual were
written many years ago and need updates to their technical content (e.g.
updated examples and/or command output), while others need to be expanded
with more detail.
Unlike in years past, the Lustre 2.x manual XML sources have been open for
direct user contributions through the Intel Git/Gerrit repository (the
same one used for managing the Lustre 2.x code) at
git://git.whamcloud.com/doc/manual.
If you want to contribute to the Lustre Community, but are not interested
in writing Lustre code, this is an opportunity for you. If you have ever
used the Lustre manual, but found a problem with it or wanted to improve
it, this is relatively easy to do.
Known issues and improvements to the manual (both in progress, or looking
for contributors) can be found at:
https://jira.hpdd.intel.com/browse/LUDOC
Details on how to contribute to the manual are available at:
https://wiki.hpdd.intel.com/display/PUB/Making+changes+to+the+Lustre+Manual
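The basic workflow is roughly the following (a sketch; the exact Gerrit remote and commit conventions are described on the wiki page above):
  # Clone the manual sources (DocBook XML)
  git clone git://git.whamcloud.com/doc/manual lustre-manual
  cd lustre-manual
  # ... edit the relevant .xml file(s), then commit with a Signed-off-by line ...
  git commit -a -s
  # Push the change for review; <gerrit-remote> is whatever remote the wiki
  # tells you to configure, and refs/for/master is the usual Gerrit review target
  git push <gerrit-remote> HEAD:refs/for/master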
Thank you for your consideration and contributions.
Cheers, Andreas
--
Andreas Dilger
Lustre Software Architect
Intel High Performance Data Division
Registration Open - China LUG Conference
by OpenSFS Administration
Proud Gold Sponsor: Intel
APAC LUG 2013
Beijing, China
October 15, 2013
Mark your calendar for October 15th and make plans to spend the day with the
leading developers and users of the Lustre file system. The China Lustre
User Group (LUG) conference will be held at the
<http://www.thelakeviewhotel.com.cn/en/info.html> Lakeview Hotel, located in
the heart of Zhongguancun, regarded as the "Silicon Valley of China". Close to
leading universities, the Olympic Center, and an array of mass transit options,
the Lakeview Hotel is the ideal venue for LUG China 2013.
On behalf of the global Lustre community, OpenSFS has led the effort to
define the future of Lustre, the premier file system for high-performance
computing. Join OpenSFS and the worldwide Lustre community to learn how
Lustre is being used to solve today's most demanding and important storage
challenges.
Registration - Now Open
The LUG China event is free to attend, but you must pre-register.
Don't miss the opportunity to hear the latest Lustre release updates, an
overview of the Lustre Test Infrastructure, details on the Exascale I/O and
FastForward work, and related vendor updates.
<https://opensfs.wufoo.com/forms/z7x4m1/> Register now!
Technical Sessions - Call for abstracts is now open
LUG events are ideal opportunities to give a technical presentation about
how you're using the Lustre file system. We encourage you to submit a brief
abstract that describes the topic you'd like to present. Your presentation
should be 30 minutes long and showcase what you've learned or how you're
using Lustre. Presentation opportunities are limited, and nearly all of the
technical sessions will be presented in Mandarin Chinese.
The agenda for this event is being finalized based on the abstracts
received, but we invite you to take a look at the list of quality technical
sessions from <http://www.opensfs.org/past-events/> past LUG events and
consider submitting an abstract. Submit your abstract, topic title and your
contact information to <mailto:admin@opensfs.org> admin(a)opensfs.org.
Panel Discussions
Panel discussions are the ideal forum for hearing from leading developers,
vendors, and users of Lustre as they debate future requirements, explore
upcoming enhancements, and share real-world best practices. If you'd like to
suggest a topic for a panel discussion, or be part of a panel, please
contact <mailto:panelist@opensfs.org> admin(a)opensfs.org.
Sponsorship Opportunities
If your company or institution is interested in being a sponsor, OpenSFS has
a number of sponsorship packages available. More details about sponsorship
opportunities can be found on the <http://www.opensfs.org/apac-lug-2013/>
APAC LUG event web page.
Together with leading institutions and vendors, these Lustre User Group
events are the ideal opportunity for storage managers and IT directors to
gather and share their expertise and experiences. We look forward to having
you as our guest at the China Lustre User Group event.
Open Scalable File Systems, Inc. is a strong and growing nonprofit
organization dedicated to the success of the Lustre file system. OpenSFS was
founded in 2010 to advance Lustre, ensuring it remains vendor-neutral, open,
and <http://lustre.opensfs.org/download-lustre/> free. Since its inception,
OpenSFS has been responsible for advancing Lustre and delivering
<http://lustre.opensfs.org/community-lustre-roadmap/> new releases on behalf
of the open source community. Through working groups, events, and ongoing
funding initiatives, <http://www.opensfs.org/> OpenSFS harnesses the power
of collaborative development to fuel innovation and growth of Lustre
worldwide.
_________________________
OpenSFS Administration
3855 SW 153rd Drive Beaverton, OR 97006 USA
Phone: +1 503-619-0561 | Fax: +1 503-644-6708
Twitter: <https://twitter.com/opensfs> @OpenSFS
Email: <mailto:admin@opensfs.org> admin(a)opensfs.org | Website:
<http://www.opensfs.org> www.opensfs.org
Strange issue with MDS server (lustre 1.8.8)
by Kurt Strosahl
All,
I'm encountering some odd behavior with my Lustre file system. It looks like the MDT is briefly losing contact with OSTs (this does not seem confined to a specific OSS, and I've found some OSSs whose OSTs don't seem to be "flickering"). The server mounting the MDT (the MDS) is showing an exceptionally high load, yet the Lustre file system itself still appears responsive to clients (I can write to it, and lfs df / df seem to work properly). A further piece of the puzzle is that several OSTs that were active are now showing as inactive from the MDT (via the lctl dl command).
I'm running Lustre 1.8.8 on all of the Lustre file servers with CentOS 5.3 / CentOS 5.5 (the problem occurs on both 5.3 and 5.5 servers). We are also using an IB network to push the data around.
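In case it helps with suggestions, this is roughly how I am checking and re-activating the OSC devices from the MDS (the device number is a placeholder taken from my own lctl dl output):
  # On the MDS: list the OSC devices and their current status
  lctl dl | grep osc
  # Re-activate an OSC that the MDS has marked inactive
  lctl --device <devno> activate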
Thank you,
Kurt J. Strosahl
System Administrator
Scientific Computing Group, Thomas Jefferson National Accelerator Facility
Trouble migrating files from a deactivated OST
by Arman Khalatyan
Hello,
We are removing one OST from our Lustre 2.4.1 file system.
Only a few files are written on that OST, without any striping.
First we deactivate it on the MDS:
lctl --device 15 deactivate
Then on a client:
lfs_migrate -y
/lustre/arm2arm/Projects/EFRE-TESTS/RAID6TEST/arman-io-stresstest/stresstest/run_test.sh
/lustre/arm2arm/Projects/EFRE-TESTS/RAID6TEST/arman-io-stresstest/stresstest/run_test.sh:
cannot swap layouts between
/lustre/arm2arm/Projects/EFRE-TESTS/RAID6TEST/arman-io-stresstest/stresstest/run_test.sh
and a volatile file (Operation not permitted)
error: migrate: migrate stripe file
'/lustre/arm2arm/Projects/EFRE-TESTS/RAID6TEST/arman-io-stresstest/stresstest/run_test.sh'
failed
What does "a volatile file" mean?
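For completeness, this is how I am locating the remaining files on the deactivated OST before migrating them (the OST UUID below is a placeholder; ours comes from lctl dl / lfs df):
  # List all files that have at least one object on the given OST
  lfs find /lustre --obd lustre-OST000f_UUID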
Thanks,
Arman.