Agreed. In addition to a file-based backup strategy, capturing the MDTs with a device-level
backup can protect against catastrophic loss of the MDT. In this situation,
restoring the MDT from backup and running consistency checks on the file system will be
far quicker than recreating the file system and performing a full restore.
A lot depends on the criticality of the service being supported and the SLA for
operational availability of the platform. A strategy that is built on device-level backup
of the MDTs along with file-based backup of the whole file system should provide sound
coverage. In addition, replication, properly implemented, represents the fastest time to
recovery in the context of a DR plan and can be useful in quickly rectifying mistakes in
production systems as well. The cost is reduced overall capacity, plus the
additional processes required to fail over and fail back.
From: Dilger, Andreas
Sent: Wednesday, May 29, 2013 7:20 PM
To: Cowe, Malcolm J
Cc: gary.k.sweely(a)census.gov; hpdd-discuss(a)lists.01.org; Prasad Surampudi; Holtz, JohnX;
raymond.illian(a)census.gov; Chris Churchey; james.m.lessard(a)census.gov
Subject: Re: [HPDD-discuss] Anyone Backing up a Large LUSTRE file systems, any issues
I'd still also recommend a device-level backup (using "dd", preferably of a
snapshot) for the MDT filesystem. This is absolutely critical information, and
backup/restore using "dd" is much more efficient than file-level backups, and
not unreasonable given the relatively small size of the MDT compared to the total
filesystem size.
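As an illustration of the snapshot-then-"dd" approach, here is a minimal sketch. It assumes the MDT sits on an LVM logical volume, which the thread does not state; the volume group, volume name, snapshot name, snapshot size and image path are all hypothetical. The function only builds the command lines and runs nothing, so they can be reviewed before use.

```python
from typing import List


def mdt_backup_commands(vg: str, lv: str, image_path: str,
                        snap_size: str = "10G") -> List[List[str]]:
    """Return the commands for a snapshot-based dd backup of an MDT.

    Assumption: the MDT is the LVM logical volume /dev/<vg>/<lv>.
    Nothing is executed here; the caller decides how to run the commands.
    """
    snap = f"{lv}_backup_snap"          # hypothetical snapshot name
    snap_dev = f"/dev/{vg}/{snap}"
    return [
        # 1. Take a point-in-time snapshot so dd reads a stable image.
        ["lvcreate", "--snapshot", "--size", snap_size,
         "--name", snap, f"/dev/{vg}/{lv}"],
        # 2. Copy the snapshot block-for-block into an image file.
        ["dd", f"if={snap_dev}", f"of={image_path}", "bs=4M"],
        # 3. Drop the snapshot once the copy has finished.
        ["lvremove", "--force", snap_dev],
    ]


if __name__ == "__main__":
    for cmd in mdt_backup_commands("vg_mdt", "mdt0", "/backup/mdt0.img"):
        print(" ".join(cmd))
```

The snapshot must be large enough to absorb the changes made to the MDT while the copy runs, so the 10G default here is only a placeholder.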
On 2013-05-28, at 17:27, "Cowe, Malcolm J" wrote:
I would recommend a file-based backup strategy where the backup processes run on Lustre
clients that are connected to the backup infrastructure. In fact this is the only
realistic way to be able to provide targeted restores of files/directories. We quite often
see data management or mover nodes in HPC architectures - servers on the boundary of the
cluster that can interface with external data systems such as tape libraries, either over
a network or Fibre Channel. By managing the backups like this, there is no need to
interface directly with the OSTs or MDTs, and most, if not all, backup applications will work
perfectly well on the data management Lustre client.
One might also want to consider an online duplicate of the most critical data by syncing
to a separate Lustre file system, since restore time from a tape vault can be considerable
for a large volume of data. Several strategies exist, depending on requirements and the
applications in use.
Malcolm Cowe, Systems Engineer
Intel High Performance Data Division
+61 408 573 001
[mailto:firstname.lastname@example.org] On Behalf Of
Sent: Wednesday, May 29, 2013 1:16 AM
Cc: Prasad Surampudi; Chris Churchey; Holtz, JohnX;
Subject: [HPDD-discuss] Anyone Backing up a Large LUSTRE file systems, any issues
Has anyone identified issues with backing up and restoring a large LUSTRE file system?
We want to be able to back up the file system and restore both individual files and the
full file system.
Has anyone identified specific issues with backup and restore of the LUSTRE file system?
Backup needs to run while users are accessing and writing files to the file system.
1. How does it handle backup of data spread across multiple OSTs/OSSs yet maintain
consistency of the file segments?
2. Will the backup system require a backup media service pulling data over Ethernet, or can
the OSSs do direct backup and restore of EXT4 file systems for full system
backup/restores while maintaining consistency of the files spread across OSTs?
3. Is there a specific backup product used to solve some of the file consistency
issues?
We would be using a large tape drive library cluster that can stripe the backup across
multiple tape drives to improve backup media performance. This would most likely mean
having several systems running backup concurrently to multiple tape drive stripe sets. I
expect we would need to break the LUSTRE file systems into several backup segments running
concurrently, which would also mean several independent restores to restore the whole
system. But one major requirement is being able to restore a single file or directory when
needed.
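One possible way to break the file system into concurrent backup segments is sketched below. It is a simple greedy heuristic (largest directory first, assigned to the least-loaded stream), not something proposed in the thread, and the directory names and sizes are made up for illustration.

```python
import heapq
from typing import Dict, List, Tuple


def split_into_streams(dir_sizes_tb: Dict[str, float],
                       n_streams: int) -> List[List[str]]:
    """Greedy split: largest directory first, into the least-loaded stream."""
    if n_streams < 1:
        raise ValueError("n_streams must be at least 1")
    # Heap entries are (total TB assigned so far, stream index).
    heap: List[Tuple[float, int]] = [(0.0, i) for i in range(n_streams)]
    heapq.heapify(heap)
    streams: List[List[str]] = [[] for _ in range(n_streams)]
    by_size = sorted(dir_sizes_tb.items(), key=lambda kv: kv[1], reverse=True)
    for path, size in by_size:
        load, idx = heapq.heappop(heap)      # least-loaded stream so far
        streams[idx].append(path)
        heapq.heappush(heap, (load + size, idx))
    return streams


if __name__ == "__main__":
    # Hypothetical top-level directories and sizes in TB.
    sizes = {
        "/lustre/source": 50,
        "/lustre/projA": 120,
        "/lustre/projB": 90,
        "/lustre/projC": 60,
        "/lustre/projD": 30,
    }
    for i, stream in enumerate(split_into_streams(sizes, 2)):
        print(i, stream)
```

Keeping each segment aligned with a directory tree also keeps single-file restores simple, because the path alone tells you which segment holds the file.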
Backup windows would be 8-14 hours.
RTO of single file would need to be under 1 hour.
RTO of full file system would be 4 days.
RPO is one day's worth of project data, 1 week's worth of source data.
We are considering a LUSTRE environment as follows:
30TB-50TB source data, potentially will grow out to about 200TB.
100TB to 500TB Project workspace.
30TB of user Scratch space (does not need to be backed up).
Initial total capacity 170TB growing to max size of 1PB.
Most likely initially using 2TB OSTs across 11+ OSSs. May use larger
OSTs if no issues are found in services/supportability/throughput.
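To put the RTO and backup-window figures next to these capacities, the short calculation below estimates the sustained aggregate throughput they imply. It is only a back-of-the-envelope sketch: it assumes decimal units (1 TB = 1000 GB), treats every backup as a full backup, and counts source plus project data while excluding scratch. The function name is hypothetical.

```python
def required_throughput_gb_s(data_tb: float, window_hours: float) -> float:
    """Sustained rate in GB/s needed to move data_tb within window_hours.

    Uses decimal units (1 TB = 1000 GB).
    """
    if window_hours <= 0:
        raise ValueError("window_hours must be positive")
    return data_tb * 1000.0 / (window_hours * 3600.0)


if __name__ == "__main__":
    # Backed-up data = source + project workspace; scratch is excluded.
    initial_tb = 30 + 100      # low end of the initial estimate
    grown_tb = 200 + 500       # upper end after growth
    for label, tb in (("initial", initial_tb), ("grown", grown_tb)):
        restore = required_throughput_gb_s(tb, 4 * 24)   # 4-day full-restore RTO
        backup = required_throughput_gb_s(tb, 14)        # full backup in a 14 h window
        print(f"{label}: {tb} TB, restore {restore:.2f} GB/s, "
              f"full backup {backup:.2f} GB/s")
```

At the grown size of 700 TB, a 4-day full restore works out to roughly 2 GB/s sustained across all restore streams, which gives a sense of how many tape drives and data mover nodes would need to run in parallel. Incremental backups would need far less than the full-backup figure.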
We were thinking of breaking the total space into separate file systems to allow using
multiple MDS/MDTs to improve the performance of the MDSs, which would also
make full LUSTRE file system backups/restores easier. But this means losing the
flexibility of having one large file system.
OSTs using EXT4 or XFS file systems.
About 25 dedicated client servers with 20 to 40 CPU cores and 200GB-1TB RAM running
scheduled batch compute jobs. Grows as loads dictate.
Potentially add about 10-100 VMware virtual client compute servers running batch jobs (4
or 8 cores with 8 to 32GB RAM).
About 2-5 interactive user nodes, with nodes added as load dictates.
Truth 23. Your common sense is not always someone else's common sense. Don't
assume that just because it's obvious to you, it will be obvious to others.
Gary K Sweely, 301-763-5532, Cell 301-651-2481
SAN and Storage Systems Manager
US Census, Bowie Computer Center
Paper Mail to:
US Census, CSVD-BCC
Washington DC 20233
Office Physical or Delivery Address
17101 Melford Blvd
Bowie MD, 20715