Very Large Database (VLDB)
Backup & Recovery Best Practices
An Oracle White Paper
July 2008
Very Large Database Backup & Recovery Best Practices
Introduction ....................................................................................................... 4
Key Database Protection Requirements ........................................................ 4
Determine Recovery Requirements................................................................ 4
Analyze and Identify the Cost of Unavailable Data ................................ 4
Assess Tolerance for Data Loss ................................................................. 5
Assess Total Recovery Time ....................................................................... 5
Determine Recovery Window..................................................................... 5
Architect Backup Environment ...................................................................... 5
Determine Backup Destination: Disk, Tape, Virtual Tape Library....... 6
Evaluate Backup Solutions .......................................................................... 6
Oracle Recovery Manager ....................................................................... 7
Oracle Secure Backup.............................................................................. 7
Oracle Flashback Technologies ............................................................. 7
Oracle Data Guard................................................................................... 7
Third Party Snapshot, Split Mirror, Continuous Data Protection
Solutions .................................................................................................... 8
Plan Data Layout .......................................................................................... 9
Read-only Tablespaces ............................................................................ 9
Table Compression .................................................................................. 9
Develop Backup Strategy........................................................................... 10
NOLOGGING Considerations .......................................................... 12
Other Backup Strategy Considerations.................................................... 12
Multi-cycle Full Backup......................................................................... 12
Tablespace-level Backup ....................................................................... 13
Multi-section Backup............................................................................. 13
Backup Compression............................................................................. 14
Guaranteed Restore Points ................................................................... 14
Develop Recovery Strategies..................................................................... 15
Media Failure .......................................................................................... 15
Block Corruption ................................................................................... 15
User Error ............................................................................................... 16
Disaster Recovery................................................................................... 16
VLDB Standby Database Creation............................................................... 16
Starbucks VLDB Backup & Recovery Case Study..................................... 16
Background.................................................................................................. 17
Backup Solution .......................................................................................... 20
Conclusion........................................................................................................ 23
Very Large Database Backup & Recovery Best Practices Page 2
References ........................................................................................................ 24
Appendix A - Starbucks RMAN Scripts ...................................................... 25
Daily_incr_bkup_disk – Daily Backup to FRA...................... 25
Daily_full_bkup_tape – Daily Backup of Incremental and
Archived Log Backups to Tape ................................................................ 26
Weekly_full_bkup_tape – Weekly backup of FRA to Tape . 26
Monthly_full_bkup_disk – Monthly Full Backup to Disk ... 26
Very Large Database Backup & Recovery Best Practices Page 3
Very Large Database Backup & Recovery Best Practices
INTRODUCTION
Over the years, the definition of a Very Large Database (VLDB) has radically
changed – whereas 100 GB might have been considered very large 20 years ago, in
today’s environment, a new database deployment in a large company might start at 1
TB. In fact, it is not uncommon to find databases in the tens to hundreds of TB’s
and even PB’s, serving traditional data warehouse and increasingly, OLTP activities
[1]. The dramatic rise in database sizes, particularly over the last few years. imposes
unique challenges for database administrators, who are expected to protect large
volumes of data in a timeframe that is ever shrinking due to stringent service level
requirements. This paper outlines best practices for meeting the challenges
associated with backing up and recovering VLDBs to protect against media
corruptions, data failures, as well as human and application errors. Finally,
Starbucks shares its hands-on experiences in the design and implementation of a
VLDB backup and recovery strategy.
KEY DATABASE PROTECTION REQUIREMENTS
The key VLDB backup and recovery questions which this paper addresses are:
• How to backup/recover in a satisfactory timeframe?
• How to recover at an application or business object level?
• How to protect data in 24x7 environment?
• How to manage software, infrastructure, operational costs for data
protection/disaster recovery?
With these questions in mind, an outline for thinking through backup and recovery
design and deployment is now presented.
DETERMINE RECOVERY REQUIREMENTS
A backup environment and strategy serves no good unless it can fulfill the stated
recovery requirements. So, the first task is to assess these requirements.
Analyze and Identify the Cost of Unavailable Data
What is the business impact of lost revenue and productivity due to unavailable
databases or portions of the database? Quantifying these costs upfront paves the
Very Large Database Backup & Recovery Best Practices Page 4
way for justification of appropriate hardware, storage, and software expenditures to
meet business-mandated service levels.
Assess Tolerance for Data Loss
Recovery Point Objective (RPO) is the degree to which data loss can be tolerated,
e.g. 24 hours, 1 hour, zero data loss. Lower tolerance for data loss will require more
frequent backups and/or additional hardware and storage infrastructure. For
example, if zero data loss is required, relying on more frequent tape backups will
not be sufficient; instead, solutions such as Oracle Data Guard should be
considered.
Are some database objects (e.g. tables, tablespaces) less critical than others? If so,
these objects could potentially be backed up less frequently than others, thus
reducing overall backup time and storage.
Is point-in-time recovery required? If so, the database needs to be in archived log
mode, and additional space on production storage and backup media must be
provisioned for these logs and their backups.
The responses here play a key role in dictating requirements for the backup
environment and strategy.
Assess Total Recovery Time
Recovery Time Objective (RTO) is the acceptable recovery time for the whole
database and for subsets of data (if applicable). Note that total recovery time not
only includes restoring backups followed by media recovery, but also includes time
to identify the problem and plan appropriate recovery, in addition to any hardware,
storage, and Oracle software restoration time.
To achieve very low recovery time objectives for VLDBs, relying on traditional tape
backups will not suffice. Therefore, solutions such as disk backup and Oracle Data
Guard should be considered, along with their required hardware and storage
infrastructure. Again, the cost to the business, while data is unavailable, will drive
justification of additional infrastructure.
Determine Recovery Window
A recovery window ensures that a sufficient number of backups are maintained to
provide point-in-time recovery within the specified timeframe. The recovery
window is typically set with a backup retention policy. The longer that backups
must be retained with a regular backup schedule, the more disk and/or tape will
typically be needed.
ARCHITECT BACKUP ENVIRONMENT
At this point, the documented recovery requirements now drive the decision on a
backup infrastructure (hardware, storage) and backup strategy (tools, procedure). If
the optimal infrastructure and strategy are not accomodated by the available budget,
Very Large Database Backup & Recovery Best Practices Page 5
the solutions can be iteratively scaled back and reviewed, assuming that any risks to
recovery requirements are communicated to the business owners.
Determine Backup Destination: Disk, Tape, Virtual Tape Library
Backups are commonly made to disk, tape, and/or virtual tape library (VTL). VTL
allows backups to be written to a storage array via a standard tape interface; thus,
one can achieve the speed and reliability of disk backup, without having to change
the existing tape backup interface or procedure. Backups on VTL are then migrated
to tape for long-term archival by the media manager. A high-level comparison of
these options follows:
Disk Tape Library Virtual Tape Library (VTL)
Compatibility,
Provisioning
- OS, drivers,
storage must be
compatible
- Requires ongoing
capacity monitoring
- Well-known
interface across
heterogeneous
systems
- Tapes easily
added, as needed
- Emulates
standard tape
devices
- Reduced
administrative
complexity versus
disk
Performance - Fast, random I/O
access
- Slower, sequential
access
- Fast, random I/O
access
Disaster
Recovery
- Optional block-
level mirroring
- Offsite tapes for
long-term archival
and/or DR
- Optional file-
based
replication with
deduplication
Cost
- Price/capacity
starts at few
dollars/GB (ATA)
- Best
price/capacity, e.g.
LTO-3 ($60 list)
holds 800 GB
compressed
- VTL license +
disk cost
(appliance-based)
Figure 1. Comparison of Disk, Tape Library, and Virtual Tape Library
As can be seen, disk and VTL offer the best performance due to inherent random
I/O access, but at a higher cost. Whether the higher cost can be justified is again
dictated by the business-mandated service levels and available budget. For example,
rather than deploying VTL backups for all VLDBs, which might be cost
prohibitive, prioritize each database’s criticality and leverage VTL for the most
critical ones and tape for all others. Another method to reduce overall disk cost, is
to use both disk and tape backup for the most critical tablespaces and use only tape
backup for all others.
Evaluate Backup Solutions
In tandem with selecting the appropriate backup infrastructure, various backup
solutions should be evaluated, both in terms of functionality and cost. The
following details a few of the most popular solutions.
Very Large Database Backup & Recovery Best Practices Page 6
Oracle Recovery Manager
Recovery Manager (RMAN) is the Oracle-native solution for creating and managing
physical database backups and automating database restore and recovery [2].
RMAN offers multiplexed, parallel, block-validated full and incremental backups. It
supports data file, tablespace, and database backup and recovery, in addition to
block media recovery. With Oracle Database 10g, RMAN can leverage an Oracle-
aware disk backup strategy with the Flash Recovery Area, a disk location where all
recovery-related files (e.g. backups, archived logs) are stored and managed by
Oracle (e.g. automatic deletion of obsolete files and files backed up to tape, when
additional space is required). Backups can be taken seamlessly in a RAC
environment and/or Data Guard, where backup operations can be offloaded to
specific nodes or physical standby databases. RMAN is integrated with Oracle
Secure Backup for tape backup and supports all leading 3rd party tape backup
products.
Oracle Secure Backup
Oracle Secure Backup (OSB) provides low-cost, centralized tape backup
management for database & file systems [3]. Through native integration with
RMAN, OSB offers the fastest database backup to tape on the market via
elimination of unused blocks and committed undo during backup. OSB is
affordably priced at just $3,500/tape drive with no additional component or option
costs.
Oracle Flashback Technologies
Flashback Technologies is a broad suite of fast user error analysis and correction
tools at the row, transaction, table, and database level [4]. Flashback operates by
undoing changes to the specific data, relative to the desired point-in-time versus
performing traditional restore and recovery from disk or tape backup, which can
take several hours or even days for a VLDB. Flashback Database provides built-in,
optimized Continuous Data Protection (CDP) for the database, similar to a fast
‘rewind’ capability. Other features such as Flashback Table and Flashback
Transaction provide granular point-in-time recovery at the table level and
automated transaction backout.
Oracle Data Guard
Oracle Data Guard is software available with the database, that creates, maintains,
and monitors one or more standby databases to protect enterprise data from
failures, disasters, errors, and corruptions [5]. Data Guard maintains these standby
databases as synchronized copies of the production database. These standby
databases can be located at remote disaster recovery sites thousands of miles away
from the production data center, or they may be located in the same city, same
campus, or even in the same building. If the production database becomes
unavailable because of a planned or an unplanned outage, Data Guard can switch
Very Large Database Backup & Recovery Best Practices Page 7
any standby database to the production role, thus minimizing the downtime
associated with the outage, and preventing any data loss.
Third Party Snapshot, Split Mirror, Continuous Data Protection Solutions
Third party storage solutions are generally not aware of Oracle block formats and
structures, so cannot provide the granularity of recovery as the aforementioned
Oracle technologies (e.g. recovery of individual tables). However, being optimized
for storage-native operations, these solutions can provide fast ‘backup’ methods,
albeit with additional licensing and storage costs.
The most popular third party techniques are snapshot, split mirror, and continuous
data protection (CDP). Snapshot technology provides fast point-in-time copies of
the database using incremental storage block tracking; however, the availability of
the copy typically depends on the availability of the production array. For disaster
recovery purposes, the copy should be fully re-created on a separate array. Similarly,
disk mirroring technology allows a block-level mirrored volume to be quickly ‘split’
onto a separate array to create a point-in-time copy of the database. RMAN
backups can be configured on snapshot and split mirror copies, but requires
additional coordination with storage operations [6]. Supported backup, restore, and
recovery of third party snapshots is documented in Metalink Note 604683.1 [7].
Note that Oracle fully supports offloading backups to a physical standby database
and these backups can be used to restore any database in the Data Guard
configuration [8] [9].
CDP solutions extend point-in-time snapshot capabilities to a continuous snapshot,
for fast recovery to any point-in-time within a specified time period. Similar to
snapshots, CDP leverages copy-on-write or allocate-on-write technology to track
changed blocks.
Flashback Database offers distinct advantages over snapshot and CDP solutions
for the database. When a block changes occurs, snapshots typically require an
additional block read as a precursor to copying the storage block to the snapshot
volume. Flashback Database does not require this additional block read before it
copies a before-change block to the Flashback log. CDP solutions incur additional
I/O overhead to maintain a “change journal” of all database files (e.g. data file,
redo, control file), whereas Flashback Database only tracks data file block changes.
In addition, CDP solutions must track each and every storage block change,
whereas Flashback Database only logs an Oracle block change every 30 minutes,
regardless of the number of changes to the block – this optimization is only
possible with Oracle, where log-based recovery is leveraged during the Flashback
operation itself to reach the desired point-in-time within the 30 min interval. Again,
these space and I/O optimizations are only possible since Flashback Database is
fully Oracle-aware.
Very Large Database Backup & Recovery Best Practices Page 8
Plan Data Layout
Backup time and space usage can be optimized through efficient data layout, before
the first backup is ever taken. Two common techniques are read-only tablespaces
and table compression.
Read-only Tablespaces
The advantage of a read-only tablespace is that it only needs to be backed up once
within a defined tape recycling period, rather than with every backup job. If a data
warehouse contains five years of historical data and the first four years of data can
be made read-only, then a regular backup of the database will only backup 20% of
the data, until the read-only backup tapes are recycled. This can dramatically reduce
the amount of time required to back up the data warehouse.
Many data warehouses store their data in tables that have been range-partitioned by
time. For example, a retailer may accept returns up to 30 days beyond the date of
purchase, so sales records could change frequently during this period. However,
once data ages past a certain date after 30 days, it becomes relatively static.
By leveraging partitioning, users can make the static portions of their data read-
only. Consider the following processes to do so:
• Implement a regularly scheduled process to move partitions from a read-
write tablespace to a read-only tablespace when the data matures to the
point where it is entirely static.
• Or, create a series of tablespaces, each containing a small number of
partitions and regularly modify one tablespace from read-write to read-
only as the data in that tablespace ages.
Table Compression
Table compression is another method to reduce overall database size and
consequently, backup size. Table compression was first introduced in Oracle9i to
allow data to be compressed during bulk load operations such as direct path load,
CREATE TABLE AS SELECT operations, etc. This form of compression is
ideally suited for data warehousing environments where most data is loaded in the
database via batch processes. Additionally, there is no measurable degradation
when reading compressed data, as Oracle can read the compressed block directly
without requiring uncompression. With Oracle Database 11g, OLTP Table
Compression was introduced to further minimize the system impact due to updates
on compressed tables, through batch compression of the block following a series of
writes, versus havin