EMC RecoverPoint Architecture and Basic Concepts

This is my first blog post on RecoverPoint. In this initial post I will detail some of the basic concepts and terminology around RecoverPoint, along with the Gen5 hardware appliance specification.

•Overview
•Gen5 Hardware
•Terminology

Overview

RecoverPoint provides continuous data protection for storage arrays. It runs on dedicated appliances (RPAs) and protects data both locally and remotely. RecoverPoint provides bi-directional replication, allowing data to be recovered to any point in time while replicating over any distance: within the same site (CDP), to a distant site (CRR), or to both concurrently (CLR). Data transfer within a site uses Fibre Channel connectivity; for transfer between sites, both FC and IP (WAN) are supported.

Synchronous replication (supported over both FC and IP links, subject to round-trip latency limits) provides a zero RPO. In a synchronous configuration the lag between the production copy and the remote copy is always zero, because RecoverPoint does not acknowledge a write until it reaches the remote site. Asynchronous replication provides crash-consistent protection and recovery to specific points in time.

An example of a local Continuous Data Protection (CDP) solution:
[Figure: local CDP solution]

From the image above you can see that the splitter sends one copy of each write to the production LUN and another to the RPA. The write is acknowledged by both the LUN and the RPA. The RPA writes the data to the journal volume along with a time stamp and bookmark metadata. The data is then distributed to the local replica in a write-order-consistent manner: if your consistency group contains many LUNs, the writes are applied to all of them in the same order in which they were made on production.
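To make write-order consistency concrete, here is a minimal Python sketch of a replica journal for one consistency group. It assumes that every write in the group is tagged with a single group-wide sequence number; `JournalEntry`, `ReplicaJournal`, `distribute` and `find_bookmark` are hypothetical names for illustration and do not reflect RecoverPoint's actual journal format. The point it shows is that, because all LUNs share one ordering, stopping distribution at any sequence number leaves every LUN at the same point in time.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class JournalEntry:
    seq: int                        # hypothetical CG-wide write-order sequence number
    timestamp: float                # time stamp the RPA records with the write
    lun: str                        # replica LUN the write is destined for
    offset: int                     # block offset within that LUN
    data: bytes
    bookmark: Optional[str] = None  # optional bookmark metadata


class ReplicaJournal:
    """Toy model of a replica journal for one consistency group."""

    def __init__(self) -> None:
        self.entries: List[JournalEntry] = []
        self.distributed_seq = 0  # highest sequence number already applied to the replica

    def append(self, entry: JournalEntry) -> None:
        self.entries.append(entry)

    def distribute(self, replicas: Dict[str, Dict[int, bytes]], up_to_seq: int) -> None:
        """Apply pending writes to the replica LUNs strictly in sequence order."""
        for entry in sorted(self.entries, key=lambda e: e.seq):
            if entry.seq <= self.distributed_seq:
                continue  # already on the replica; kept in the journal for PIT recovery
            if entry.seq > up_to_seq:
                break
            replicas.setdefault(entry.lun, {})[entry.offset] = entry.data
            self.distributed_seq = entry.seq

    def find_bookmark(self, name: str) -> Optional[int]:
        """Return the sequence number tagged with a bookmark, if any."""
        for entry in self.entries:
            if entry.bookmark == name:
                return entry.seq
        return None


# Usage: a database log write on one LUN, then a data write on another.
journal = ReplicaJournal()
journal.append(JournalEntry(1, 100.0, "log_lun", 0, b"txn-begin"))
journal.append(JournalEntry(2, 100.1, "data_lun", 8, b"row"))
journal.append(JournalEntry(3, 100.2, "log_lun", 1, b"txn-commit", bookmark="after-commit"))
replica_luns: Dict[str, Dict[int, bytes]] = {}
journal.distribute(replica_luns, up_to_seq=2)
```

Entries are deliberately kept after distribution, which loosely mirrors the idea that a replica journal retains already-distributed snapshots for point-in-time recovery.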

An example of a Continuous Remote Replication (CRR) solution:
[Figure: CRR solution]

If we examine the I/O sequence of the CRR solution, we can see again that each write is split, with one copy sent to the production LUN and the other to the RPA. Replication to the remote site can then be either:

1. Asynchronous – in asynchronous replication, the write I/O from the host is sent to the RPA, which acknowledges it as soon as the data arrives in its memory.
2. Synchronous – in synchronous mode, the RPA does not acknowledge a write until it reaches either the memory of the DR site's RPA or the DR site's persistent storage, depending on whether the “measure lag to remote RPA” setting is enabled in the configuration. Synchronous replication can run over FC or IP, provided the round-trip latency does not exceed 4ms over FC or 10ms over IP.
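The two modes and the latency limits above can be summarised in a small planning helper. This is a hedged Python sketch, not a RecoverPoint API: `sync_allowed` and `rpa_ack_point` are hypothetical names, the 4ms/10ms round-trip limits are the figures quoted in this post, and the mapping of the “measure lag to remote RPA” setting (enabled means the write only has to reach the remote RPA's memory) is my reading of the description above.

```python
from enum import Enum


class Mode(Enum):
    ASYNC = "async"
    SYNC = "sync"


# Round-trip latency limits for synchronous replication, as stated in this post.
SYNC_RTT_LIMIT_MS = {"FC": 4.0, "IP": 10.0}


def sync_allowed(transport: str, rtt_ms: float) -> bool:
    """Return True if a link with this transport and measured RTT is within the sync limit."""
    limit = SYNC_RTT_LIMIT_MS.get(transport.upper())
    if limit is None:
        raise ValueError(f"unknown transport: {transport}")
    return rtt_ms <= limit


def rpa_ack_point(mode: Mode, measure_lag_to_remote_rpa: bool) -> str:
    """Describe where a write must land before the local RPA acknowledges it."""
    if mode is Mode.ASYNC:
        return "local RPA memory"  # async: ACK as soon as the data is in RPA memory
    # sync: ACK only after the data reaches the DR site (assumed flag mapping)
    return "remote RPA memory" if measure_lag_to_remote_rpa else "remote persistent storage"
```

A measured 6ms round trip, for example, would rule out synchronous replication over FC but still fall within the quoted IP limit.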

In a concurrent local and remote (CLR) solution, CDP and CRR run simultaneously, so the production data is protected by both a local and a remote replica.

The RecoverPoint family consists of three license offerings:
• RecoverPoint/CL (Classic) for replicating across EMC arrays and, with the use of VPLEX, non-EMC storage platforms. Supports all EMC array splitters. Note: capacity is ordered per RPA cluster, not per RP system.
• RecoverPoint/EX for VMAXe™, VPLEX™, VNX™ series, VNXe3200, CLARiiON® CX3 and CX4 series, XtremIO, ScaleIO and Celerra® unified storage environments.
• RecoverPoint/SE for VNX series, VNXe3200, CLARiiON CX3 and CX4 series, and Celerra unified storage environments.

Gen5 Hardware

The RecoverPoint appliance (RPA) is a 1U hardware-based server (Intel R1000). The specification of the Gen5 RPA is as follows:

• 2 x quad-core Sandy Bridge processors
• 2 x 300GB 10K RPM 2.5” SAS drives in a RAID1 configuration
• 6 x 1GbE ports (RJ-45): WAN, LAN and remote management, with the remaining 3 ports unused
• 16GB DDR3 memory
• PCIe slot 1: quad-port 8Gb FC QLogic 2564 card (PCIe slot 2 is empty)

From the image below you can see the port usage for WAN and LAN, and the HBA port sequence (left to right): 3-2-1-0. Each RPA uses two Ethernet cables: one connects the management (LAN) interface to eth1 and the other connects the WAN interface to eth0.

Gen5 RPA (rear view):

[Figure: Gen5 RPA rear panel, showing the WAN and LAN Ethernet ports and the FC HBA ports]

Note: a RecoverPoint cluster must have a minimum of 2 RPAs and a maximum of 8 RPAs, and cluster sizes must be the same at each site of an installation. A RecoverPoint environment can have up to 5 clusters, local or remote, although RecoverPoint/SE is limited to two clusters. Gen4 and Gen5 RPAs can co-exist in the same RP cluster.
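The sizing rules in this note lend themselves to a quick pre-deployment check. The Python sketch below is hypothetical (`validate_topology` is not a RecoverPoint tool) and encodes only the limits stated above; it assumes "same size at each site" means every cluster in the installation has the same RPA count, and it does not model Gen4/Gen5 mixing, which the note says is allowed.

```python
from typing import Dict, List

MIN_RPAS_PER_CLUSTER = 2
MAX_RPAS_PER_CLUSTER = 8
MAX_CLUSTERS = 5
MAX_CLUSTERS_RP_SE = 2


def validate_topology(rpas_per_cluster: Dict[str, int], license_type: str = "CL") -> List[str]:
    """Return a list of rule violations; an empty list means the layout passes."""
    errors: List[str] = []
    for name, count in rpas_per_cluster.items():
        if not MIN_RPAS_PER_CLUSTER <= count <= MAX_RPAS_PER_CLUSTER:
            errors.append(
                f"{name}: {count} RPAs (must be {MIN_RPAS_PER_CLUSTER}-{MAX_RPAS_PER_CLUSTER})"
            )
    # Assumption: every cluster in the installation must have the same RPA count.
    if len(set(rpas_per_cluster.values())) > 1:
        errors.append("cluster sizes differ between sites")
    limit = MAX_CLUSTERS_RP_SE if license_type.upper() == "SE" else MAX_CLUSTERS
    if len(rpas_per_cluster) > limit:
        errors.append(
            f"{len(rpas_per_cluster)} clusters exceeds the limit of {limit} for RP/{license_type.upper()}"
        )
    return errors
```

Returning a list of messages rather than raising on the first problem makes it easy to report every violation in a proposed design at once.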

Terminology

Splitter – the array-based splitter ensures that the RPA receives a copy of every write to a protected LUN. At the production site, the splitter splits each I/O so that both the RPA and the storage receive a copy of the write, while maintaining write-order fidelity. At the DR site, the splitter blocks unexpected writes from hosts and supports the various types of image access.
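The splitter's two roles can be sketched as a toy Python class. The production-side ordering (send to the RPA, RPA ACK, write to storage, storage ACK, then ACK the host) follows the generic write flow from the RecoverPoint 4.4 product guide quoted in the comments below, which notes that the flow varies per splitter. `Splitter`, `host_write` and the `image_access` flag are hypothetical names, and treating image access as a simple pass-through for host writes is a deliberate simplification; the real image-access modes are not modeled.

```python
from typing import Dict, List, Tuple


class Splitter:
    """Toy array-based splitter; `events` records the order of acknowledgements."""

    def __init__(self, role: str) -> None:
        if role not in ("production", "replica"):
            raise ValueError(f"unknown role: {role}")
        self.role = role
        self.image_access = False  # hypothetical flag: an image-access mode is enabled
        self.rpa_copy: List[Tuple[str, int, bytes]] = []
        self.storage: Dict[Tuple[str, int], bytes] = {}
        self.events: List[str] = []

    def host_write(self, lun: str, offset: int, data: bytes) -> None:
        if self.role == "replica" and not self.image_access:
            # DR side: unexpected host writes must not reach the replica
            raise PermissionError(f"write to replica {lun} blocked")
        if self.role == "production":
            # Steps 1-2: send the write to the local RPA, which ACKs on receipt
            self.rpa_copy.append((lun, offset, data))
            self.events.append("rpa_ack")
        # Steps 3-4: write to the array and receive the storage ACK
        self.storage[(lun, offset)] = data
        self.events.append("storage_ack")
        # Step 5: the splitter acknowledges the host
        self.events.append("host_ack")


prod = Splitter("production")
prod.host_write("lun1", 0, b"a")
```

In this model it is the splitter, not the RPA or the array, that sends the final acknowledgement to the host.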

RecoverPoint Repository Volumes – dedicated volumes on the SAN-attached storage at each site; one repository volume is required for each RPA cluster. The repository holds the configuration information about the RPAs and consistency groups. Repository volumes are exposed only to the RPAs. The minimum size for the repository is 2.86GB.

RecoverPoint Journal Volumes – SAN-attached storage volume(s) for each copy in a consistency group (the production copy, local replica copy and remote replica copy). Like repository volumes, journal volumes are exposed only to the EMC RPAs, not to the hosts. There are two types of journal volumes:
1. Replica journals – hold snapshots that are either waiting to be distributed or have already been distributed to the replica storage. They also hold the metadata for each image, and bookmarks. A replica journal holds as many snapshots as its capacity allows.
2. Production journals – used when there is a link failure between sites. In this situation, marking information is written to the production journal and used to synchronize the replica when the link comes back online. This process is known as delta marking (marking mode). The production journal does not contain snapshots used for PIT recovery.
Note: the minimum size of a journal volume is 10GB for a standard consistency group and 40GB for a distributed consistency group.
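Delta marking can be illustrated with a short Python sketch: while the link is down, only *which* regions changed is recorded, not the data itself, and those regions are resynchronized once the link returns. `DeltaMarker`, its methods and the fixed `region_size` granularity are hypothetical choices for illustration, not RecoverPoint's on-disk marking format. The size checks encode the minimums stated in this post (10GB/40GB journals, 2.86GB repository).

```python
from typing import Dict, Set

MIN_JOURNAL_GB = {"standard": 10, "distributed": 40}  # minimums stated in this post
MIN_REPOSITORY_GB = 2.86


def journal_size_ok(size_gb: float, cg_type: str = "standard") -> bool:
    return size_gb >= MIN_JOURNAL_GB[cg_type]


def repository_size_ok(size_gb: float) -> bool:
    return size_gb >= MIN_REPOSITORY_GB


class DeltaMarker:
    """Toy marking mode: track changed regions per LUN while the link is down."""

    def __init__(self, region_size: int = 64 * 1024) -> None:
        self.region_size = region_size  # hypothetical tracking granularity in bytes
        self.link_up = True
        self.dirty: Dict[str, Set[int]] = {}

    def link_failed(self) -> None:
        self.link_up = False

    def on_write(self, lun: str, offset: int, length: int) -> None:
        if self.link_up:
            return  # normal replication in progress; nothing to mark
        first = offset // self.region_size
        last = (offset + length - 1) // self.region_size
        self.dirty.setdefault(lun, set()).update(range(first, last + 1))

    def link_restored(self) -> Dict[str, Set[int]]:
        """Return the regions that must be resynchronized and clear the marks."""
        self.link_up = True
        to_sync, self.dirty = self.dirty, {}
        return to_sync
```

Marking regions rather than journaling every write is what keeps the production journal small during an outage: repeated writes to the same region collapse into a single mark.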

Replication Set – a protected SAN-attached storage volume at the production site, together with its replica (local or remote), is known as a replication set.

Consistency Group – consists of replication sets grouped together to ensure write-order consistency across all the replication sets’ primary volumes. A configuration change on a consistency group, such as changing compression or bandwidth limits, applies to all its replication sets. A RecoverPoint system supports a maximum of 128 CGs, with a maximum of 64 CGs per RPA. If an RPA in the cluster fails, the CGs running on that RPA fail over to another RPA in the cluster.

Distributed Consistency Group – to obtain higher throughput, a CG can be configured as a DCG, which can use up to 4 RPAs (a standard CG uses 1 RPA). You can configure a maximum of 8 DCGs, and the limit of 128 CGs per RP system covers standard CGs and DCGs combined.
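The CG and DCG limits, together with RPA failover, can be modelled with a toy placement class. This Python sketch is hypothetical: `CgPlacer` is not a RecoverPoint component, it treats a single cluster as the whole RP system, it counts a DCG against the 64-CG limit of every RPA it runs on, and its "least-loaded RPA" choice is an illustrative policy rather than RecoverPoint's real placement logic. Only the numeric limits come from this post.

```python
from typing import Dict, List, Sequence, Set

MAX_CGS_PER_SYSTEM = 128  # standard CGs and DCGs combined
MAX_CGS_PER_RPA = 64
MAX_DCGS = 8
MAX_RPAS_PER_DCG = 4


class CgPlacer:
    """Toy placement of consistency groups onto the RPAs of one cluster."""

    def __init__(self, rpa_names: List[str]) -> None:
        self.load: Dict[str, List[str]] = {name: [] for name in rpa_names}
        self.groups: Dict[str, List[str]] = {}  # CG name -> RPAs running it
        self.dcgs: Set[str] = set()

    def _least_loaded(self, exclude: Sequence[str] = ()) -> List[str]:
        candidates = [r for r in self.load
                      if r not in exclude and len(self.load[r]) < MAX_CGS_PER_RPA]
        return sorted(candidates, key=lambda r: len(self.load[r]))

    def add_group(self, name: str, rpas: int = 1) -> List[str]:
        if name in self.groups:
            raise ValueError(f"{name} already exists")
        if len(self.groups) >= MAX_CGS_PER_SYSTEM:
            raise ValueError("system limit of 128 CGs reached")
        if rpas > 1:  # a distributed consistency group
            if rpas > MAX_RPAS_PER_DCG:
                raise ValueError("a DCG can use at most 4 RPAs")
            if len(self.dcgs) >= MAX_DCGS:
                raise ValueError("limit of 8 DCGs reached")
        chosen = self._least_loaded()[:rpas]
        if len(chosen) < rpas:
            raise ValueError("not enough RPAs with spare capacity")
        for rpa in chosen:
            self.load[rpa].append(name)
        self.groups[name] = chosen
        if rpas > 1:
            self.dcgs.add(name)
        return chosen

    def fail_rpa(self, failed: str) -> None:
        """Move every CG running on the failed RPA to a surviving RPA."""
        for cg in self.load.pop(failed):
            hosts = self.groups[cg]
            hosts.remove(failed)
            targets = self._least_loaded(exclude=hosts)
            if not targets:
                # This sketch simply gives up; real behaviour here is not modelled.
                raise RuntimeError(f"no RPA can take over {cg}")
            self.load[targets[0]].append(cg)
            hosts.append(targets[0])


placer = CgPlacer(["rpa1", "rpa2", "rpa3"])
placer.add_group("cg_a")
placer.add_group("cg_b")
placer.add_group("dcg_x", rpas=2)
placer.fail_rpa("rpa1")
```

After `rpa1` fails, both the standard CG and the DCG it was serving are re-hosted on the surviving RPAs.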

Image Access – refers to providing host access to the replica volumes while still keeping track of changes at the source. Image access can be physical (also known as logged), which provides access to the actual physical volumes, or virtual, which provides rapid access to a virtual image of the same volumes.

In the next RecoverPoint blog I will detail sizing and performance characteristics for the Journal and Replica volumes.

23 thoughts on “EMC RecoverPoint Architecture and Basic Concepts”

  1. Dave, this is really clear information you have given in the blog. Can you provide more steps for creating an RPA, with screenshots?

  2. Hi Dave..
    It was really very nice to have such clear and concise info, but I have a query about RPA maintenance. For instance, at the DR site there is a maintenance activity scheduled for 5-6 hours. Which parameter needs to be verified at the primary site so that, when the link is back up, replication resumes without any data loss?
    We are replicating using a CDP solution.

    Regards,
    Shaibaz

  3. Thanks a lot Dave… It was very helpful. I was really confused about the RPA operation but you made everything clear.

    Best regards
    Srinath

  4. Very nice Dave… thanks. Can you please write a blog explaining the basic operations in RPA, like adding LUNs to a CG, setting up replication, etc.?

    thanks!

  5. It's very useful, David… In my environment, the upcoming plan is for XIO arrays to be the back-end storage connected to VPLEX. From VPLEX we have 4 RPA clusters for replication to remote sites… could you please help me with the steps involved? I mean, if we receive a request to allocate 4TB to a server, what are the steps we need to follow from XIO to VPLEX to RPA?

  6. Thanks Dave!
    A nice blog on RP basics, explained in a simple way. It helped me to understand RP and related terms.

  7. Which RecoverPoint/SE component is responsible for sending an acknowledgement to the production host during a write operation?
    A. RecoverPoint appliance
    B. Splitter
    C. Storage System
    D. Production Journal

    • Hello, as per the product guide for 4.4:

      The write phase is the RecoverPoint replication phase in which host writes are intercepted by the splitter and received by the local RPA, prior to the transfer phase (described on page 80 of the guide).
      Generally, the flow of data for write transactions is as follows:
      1. The production host writes data to the production volumes; the write is intercepted by the splitter, which sends the write data to the RPA.
      2. Immediately upon receipt of the write data, the local RPA returns an ACK to the splitter.
      3. The splitter then writes the data to the production storage volume.
      4. The storage system returns an ACK to the splitter upon successfully writing the data to storage.
      5. The splitter sends an ACK to the host that the write has been completed successfully.
      The sequence of events 1-5 can be repeated multiple times, and in parallel, for multiple writes.
      Note: the flow of data varies per splitter.

    • How to retrieve the RPA hardware serial number and ‘Gen’ version using the RecoverPoint CLI

      1. Establish an SSH session to the RPA (for example, using PuTTY).
      2. Log in as admin.
      3. Run the following command:

      get_box_states

  8. What is the difference between image access log capacity (IALC) and journal capacity (JC) during the test copy phase? I have a situation where IALC increases for some test CGs and JC increases for others.

    Another thing: once I've finished a test that consumed a lot of journal capacity, retrying another test copy on the same CG takes too much time for logged access to open.
