Data replication means that the same data is stored on multiple storage devices. In some situations, having duplicate databases is useful, such as in a high-availability environment where spreading the workload among identical databases on different hardware, or even in different data centers, can preserve functionality during peak usage times or disasters.
Replication can be active or passive:
- In active replication, every replica recreates and stores the same data, taking in changes from every other replica.
- In passive replication, data is recreated and stored on a single primary replica, and the resulting state is then transferred to the secondary replicas.
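The difference between the two modes can be sketched in a few lines of Python. This is a simplified, in-memory illustration rather than how any particular database implements replication; the `Replica` class and both function names are hypothetical, and a whole-state copy stands in for whatever state transfer a real system would use.

```python
class Replica:
    """A toy in-memory replica holding key-value state (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.state = {}

    def apply(self, key, value):
        # Recreate and store the data locally.
        self.state[key] = value


def active_replicate(replicas, key, value):
    """Active: every replica recreates and stores the update itself."""
    for replica in replicas:
        replica.apply(key, value)


def passive_replicate(primary, secondaries, key, value):
    """Passive: only the primary processes the update; its resulting
    state is then transferred to each secondary."""
    primary.apply(key, value)
    for secondary in secondaries:
        # Assumption: transfer is modeled as a full copy of the state.
        secondary.state = dict(primary.state)


replicas = [Replica("a"), Replica("b"), Replica("c")]
active_replicate(replicas, "balance", 100)

primary, *secondaries = [Replica("p"), Replica("s1"), Replica("s2")]
passive_replicate(primary, secondaries, "balance", 100)
```

In both cases all replicas end up with the same state; what differs is where the work of processing the update happens.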
Replication has two dimensions of scaling:
- Horizontal data scaling adds more data replicas.
- Vertical data scaling places data replicas farther apart geographically.
Multi-master replication, where updates can be submitted to any database node and then ripple through to other servers, is often desired, but increases complexity and cost.
Replication transparency occurs when data is replicated between database servers so that the information remains consistent throughout the database system and users cannot tell which database copy they are using.
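As a rough illustration of replication transparency, the sketch below hides several copies behind a single interface. The `ReplicatedStore` class is hypothetical, and plain dictionaries stand in for database servers; the point is only that the caller gets the same answer regardless of which copy serves the read.

```python
import random


class ReplicatedStore:
    """Hypothetical facade over several identical copies of the data."""

    def __init__(self, copies):
        self._copies = copies  # list of dicts standing in for servers

    def put(self, key, value):
        # Keep every copy consistent by writing to all of them.
        for copy in self._copies:
            copy[key] = value

    def get(self, key):
        # Any copy may serve the read; the caller never learns which.
        return random.choice(self._copies)[key]


store = ReplicatedStore([{}, {}, {}])
store.put("customer:42", "Ada")
name = store.get("customer:42")
```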
The two primary replication patterns are mirroring and log shipping (see this Figure).
- In mirroring, updates to the primary database are replicated immediately (relatively speaking) to the secondary database, as part of a two-phase commit process.
- In log shipping, a secondary server receives and applies copies of the primary database’s transaction logs at regular intervals.
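The contrast between the two patterns can be sketched as follows. This is a toy model under stated assumptions: all class names are hypothetical, the mirror is updated inside the same `write` call rather than through a real two-phase commit, and "shipping" the log is reduced to reading the primary's log list directly.

```python
class Database:
    """Toy database with a transaction log (hypothetical)."""

    def __init__(self):
        self.rows = {}
        self.log = []  # transaction log as a list of (key, value) entries

    def write(self, key, value):
        self.rows[key] = value
        self.log.append((key, value))


class MirroredPrimary(Database):
    """Mirroring: each write reaches the mirror as part of the write."""

    def __init__(self, mirror):
        super().__init__()
        self.mirror = mirror

    def write(self, key, value):
        super().write(key, value)
        # Assumption: a direct call stands in for a two-phase commit.
        self.mirror.write(key, value)


class LogShippedSecondary(Database):
    """Log shipping: the secondary catches up when a log batch arrives."""

    def __init__(self):
        super().__init__()
        self.applied = 0  # count of primary log entries already applied

    def apply_shipped_log(self, primary):
        # Apply only the entries not seen yet, in their original order.
        for key, value in primary.log[self.applied:]:
            self.write(key, value)
        self.applied = len(primary.log)


mirror = Database()
primary = MirroredPrimary(mirror)
standby = LogShippedSecondary()

primary.write("order:1", "paid")    # the mirror has this row already
primary.write("order:2", "open")    # the standby still has neither row
standby.apply_shipped_log(primary)  # the interval elapses; standby catches up
```

Between shipments the log-shipped secondary lags the primary, which is why the choice between the two depends on how immediate failover must be.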
The choice of replication method depends on how critical the data is and how important it is that failover to the secondary server be immediate. Mirroring is usually a more expensive option than log shipping. Mirroring is effective for a single secondary server; log shipping may be used to update additional secondary servers.