The LUSTRE manual provides minimal High Availability guidance, and what it does cover focuses mostly on full failure of an MDS or OSS server.
Has anyone worked through a process to rapidly identify and recover from a LUSTRE service failure where the server itself doesn't fail, but the LUSTRE service process hangs?
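To make the question concrete, here is a rough watchdog sketch of the kind of thing we have in mind (Python, assuming the lctl utility is available on the MDS/OSS; the recovery action is only a placeholder for whatever the HA stack, e.g. Pacemaker, or a site failover script would actually do):

#!/usr/bin/env python3
# Rough sketch: poll Lustre's health_check parameter on an MDS/OSS and flag a hung
# service before the whole server fails. Assumes lctl is installed locally; the
# failover action below is a placeholder for the real HA stack or site script.
import subprocess
import sys
import time

CHECK_INTERVAL = 30    # seconds between health polls
FAIL_THRESHOLD = 3     # consecutive bad polls before acting

def lustre_healthy():
    try:
        out = subprocess.run(["lctl", "get_param", "-n", "health_check"],
                             capture_output=True, text=True, timeout=10)
    except subprocess.TimeoutExpired:
        # lctl itself hanging is a strong hint the target threads are stuck
        return False
    text = out.stdout.lower()
    return out.returncode == 0 and "healthy" in text and "not healthy" not in text

def main():
    bad = 0
    while True:
        if lustre_healthy():
            bad = 0
        else:
            bad += 1
            print("health_check failed (%d/%d)" % (bad, FAIL_THRESHOLD), file=sys.stderr)
            if bad >= FAIL_THRESHOLD:
                # Placeholder: hand off to Pacemaker/corosync here, or umount the
                # OST/MDT and mount it on its failover partner.
                subprocess.run(["logger", "lustre-watchdog: service appears hung"])
                break
        time.sleep(CHECK_INTERVAL)

if __name__ == "__main__":
    main()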
 
 
---------------------------------------------------------------
Truth 23. Your common sense is not always someone else's common sense. Don't assume that just because it's obvious to you, it will be obvious to others.
---------------------------------------------------------------
Gary K Sweely, 301-763-5532, Cell 301-651-2481
SAN and Storage Systems Manager
US Census, Bowie Computer Center

Paper Mail to:
US Census, CSVD-BCC
Washington DC 20233

Office Physical or Delivery Address
17101 Melford Blvd
Bowie MD, 20715


-----Gary K Sweely/CSVD/HQ/BOC wrote: -----
To: hpdd-discuss@lists.01.org
From: Gary K Sweely/CSVD/HQ/BOC
Date: 05/28/2013 11:15AM
Cc: James M Lessard/CSVD/HQ/BOC@BOC, Raymond Illian/CSVD/HQ/BOC@BOC, "Prasad Surampudi" <prasad.surampudi@theatsgroup.com>, "Chris Churchey" <churchey@theatsgroup.com>, "Holtz, JohnX" <johnx.holtz@intel.com>
Subject: Anyone Backing up a Large LUSTRE file system, any issues?

Has anyone identified issues with backing up and restoring a large LUSTRE file system?
 
We want to be able to back up the file system and restore both individual files and the full file system.
Has anyone identified specific issues with backup and restore of a LUSTRE file system?
Backups need to run while users are accessing and writing files to the file system.
 
Backup concerns: 
  1. How does a backup handle data spread across multiple OSTs/OSSs while maintaining consistency of the file segments? (A sketch of the stripe-layout issue follows this list.)
  2. Will the backup system require a backup media server pulling data over Ethernet, or can the OSSs do direct backup and restore of their EXT4 file systems for full-system backups/restores while maintaining consistency of the files spread across OSTs?
  3. Is there a specific backup product used to solve some of these file consistency issues?
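To illustrate concern 1 above: a client-side (file-level) backup sees one POSIX file regardless of how many OSTs it spans, so a restore only recreates the original striping if the layout was captured first. A rough sketch (Python, assuming the lfs client utility is installed; the /lustre/project path and the JSON file are illustrative only) of saving stripe layouts alongside a backup and re-applying them on restore:

#!/usr/bin/env python3
# Rough sketch: record each file's stripe count/size with lfs getstripe so a restore
# can recreate the layout with lfs setstripe before the data is copied back in.
# Assumes the lfs utility from the Lustre client is installed; paths are illustrative.
import json
import os
import subprocess

def save_stripe_layout(path):
    """Record stripe count and stripe size so the file can be recreated the same way."""
    count = subprocess.run(["lfs", "getstripe", "-c", path],
                           capture_output=True, text=True, check=True).stdout.strip()
    size = subprocess.run(["lfs", "getstripe", "-S", path],
                          capture_output=True, text=True, check=True).stdout.strip()
    return {"path": path, "stripe_count": int(count), "stripe_size": int(size)}

def restore_with_layout(layout, restore_path):
    """Recreate an empty file with its original layout; the backup tool then writes
    the file contents into restore_path."""
    subprocess.run(["lfs", "setstripe",
                    "-c", str(layout["stripe_count"]),
                    "-S", str(layout["stripe_size"]),
                    restore_path], check=True)

if __name__ == "__main__":
    # Illustrative use: dump layouts for one directory tree alongside the backup.
    layouts = [save_stripe_layout(os.path.join(root, name))
               for root, _, files in os.walk("/lustre/project")   # hypothetical mount
               for name in files]
    with open("stripe_layouts.json", "w") as fh:
        json.dump(layouts, fh, indent=2)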

We would be using a large tape library cluster that can stripe the backup across multiple tape drives to improve backup media performance.  This would most likely mean having several systems running backups concurrently to multiple tape-drive stripe sets.  I expect we would need to break the LUSTRE file systems into several backup segments running concurrently, which would also mean several independent restores to restore the whole system.  But one major requirement is being able to restore a single file or directory when needed.
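As a rough illustration of the segmenting idea (Python; the mount point and the tar command writing to /dev/null are placeholders for the real backup product), splitting the top-level directories round-robin across concurrent backup streams, one per tape-drive stripe set:

#!/usr/bin/env python3
# Rough sketch: split one LUSTRE namespace into several backup segments that run
# concurrently (one per tape-drive stripe set). The tar command below is only a
# stand-in for the real backup product's CLI.
import concurrent.futures
import os
import subprocess

MOUNT_POINT = "/lustre/project"   # hypothetical mount point
SEGMENTS = 4                      # number of concurrent backup streams

def backup_segment(seg_id, dirs):
    """Archive this segment's directory list to its stripe set (placeholder command)."""
    if not dirs:
        return seg_id, 0
    cmd = ["tar", "cf", "/dev/null"] + dirs
    return seg_id, subprocess.run(cmd, cwd=MOUNT_POINT).returncode

if __name__ == "__main__":
    top = sorted(d for d in os.listdir(MOUNT_POINT)
                 if os.path.isdir(os.path.join(MOUNT_POINT, d)))
    # Round-robin the top-level directories across the segments.
    buckets = [top[i::SEGMENTS] for i in range(SEGMENTS)]
    with concurrent.futures.ThreadPoolExecutor(max_workers=SEGMENTS) as pool:
        results = pool.map(lambda args: backup_segment(*args), enumerate(buckets))
        for seg_id, rc in results:
            print("segment %d: exit %d" % (seg_id, rc))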

Backup windows would be 8-14 hours.
RTO of single file would need to be under 1 hour.
RTO of full file system would be 4 days.
RPO is one day's worth of project data, 1 week's worth of source data.
 
 
We are considering a LUSTRE environment as follows:
 
30TB-50TB of source data, potentially growing to about 200TB.
100TB to 500TB of project workspace.
30TB of user Scratch space (does not need to be backed up).
 
Initial total capacity 170TB growing to max size of 1PB.
 
Most likely initially using 2TB OSTs across 11+ OSSs.  We may use larger OSTs if no issues are found in services/supportability/throughput.
 
We were thinking of breaking the total space into separate file systems to allow using multiple MDSs/MDTs to improve MDS performance, which would also facilitate easier full LUSTRE file system backups/restores.  But this means losing the flexibility of having one large file system.
 
OSTs using EXT4 or XFS file systems.
 
About 25 dedicated client servers with 20 to 40 CPU cores and 200GB-1TB of RAM running scheduled batch compute jobs.  This grows as loads dictate.
Potentially adding about 10-100 VMware virtual client compute servers running batch jobs (4 or 8 cores with 8 to 32GB of RAM).
About 2-5 interactive user nodes, with nodes added as load dictates.
 
 