EMC RecoverPoint Architecture and Basic Concepts

This is my first blog post on RecoverPoint. In this initial post I will detail some of the basic concepts and terminology around RecoverPoint, along with the Gen5 hardware appliance specification.

• Overview
• Gen5 Hardware
• Terminology

Overview

RecoverPoint provides continuous data protection for storage arrays. It runs on a dedicated appliance (RPA) and protects data at both local and remote levels. RecoverPoint provides bi-directional replication, enabling the recovery of data to any point in time while replicating data over any distance: within the same site (CDP), to another distant site (CRR), or both concurrently (CLR). Data transfer within the same site uses Fibre Channel connectivity; for transfer between sites, both FC and IP (WAN) are supported. Synchronous replication provides a zero RPO: the lag between the production and remote copies is always zero, because RecoverPoint does not acknowledge a write until it has reached the remote site. Synchronous replication is supported when the remote sites are connected through FC, and also over IP within the latency limits given below. Asynchronous replication provides crash-consistent protection and recovery to specific points in time.

An example of a local Continuous Data Protection (CDP) solution:

From the above image you can see that the splitter sends a copy of each write to both the production LUN and the RPA. The write is acknowledged by both the LUN and the RPA. The RPA writes the data to the journal volume along with a timestamp and bookmark metadata. The data is then distributed to the local replica in a write-order-consistent manner: if your consistency group contains many LUNs, the writes are applied to the replica LUNs in the same order in which they were made across all of those LUNs.
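The journal mechanism above can be illustrated with a minimal Python sketch. This is a toy model, not RecoverPoint's actual journal format: the class names (`JournalEntry`, `CdpJournal`) and their methods are hypothetical, and the sketch assumes the RPA timestamps writes in non-decreasing arrival order. Because the writes for every LUN in the consistency group are kept in one ordered list, any point-in-time or bookmark image is simply a prefix of that list replayed oldest-first, which is what keeps the replica LUNs write-order consistent with one another.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Image = Dict[Tuple[str, int], bytes]  # (LUN, block offset) -> data


@dataclass
class JournalEntry:
    timestamp: float                 # when the RPA received the split write
    lun: str                         # production LUN the write targeted
    offset: int                      # block offset within that LUN
    data: bytes
    bookmark: Optional[str] = None   # optional bookmark metadata


@dataclass
class CdpJournal:
    # One list for ALL LUNs in the consistency group, in arrival order.
    entries: List[JournalEntry] = field(default_factory=list)

    def record(self, entry: JournalEntry) -> None:
        if self.entries and entry.timestamp < self.entries[-1].timestamp:
            raise ValueError("timestamps must be non-decreasing")
        self.entries.append(entry)

    def _replay(self, count: int) -> Image:
        # Apply the first `count` writes oldest-first; later writes win.
        image: Image = {}
        for e in self.entries[:count]:
            image[(e.lun, e.offset)] = e.data
        return image

    def image_at(self, pit: float) -> Image:
        """Point-in-time image: every write with timestamp <= pit."""
        count = 0
        while count < len(self.entries) and self.entries[count].timestamp <= pit:
            count += 1
        return self._replay(count)

    def image_at_bookmark(self, name: str) -> Image:
        """Image up to and including the write that carries the bookmark."""
        for i, e in enumerate(self.entries):
            if e.bookmark == name:
                return self._replay(i + 1)
        raise KeyError(name)


journal = CdpJournal()
journal.record(JournalEntry(1.0, "db_data", 0, b"A"))
journal.record(JournalEntry(2.0, "db_log", 0, b"L1", bookmark="pre-upgrade"))
journal.record(JournalEntry(3.0, "db_data", 0, b"B"))
print(journal.image_at(2.5))  # {('db_data', 0): b'A', ('db_log', 0): b'L1'}
```

Any image that contains the later `b"B"` write on `db_data` also contains the earlier `b"L1"` write on `db_log`; that is the property write-order consistency guarantees.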

An example of a Continuous Remote Replication (CRR) solution:

If we examine the IO sequence of the CRR solution, we can again see that the IO is split, with one copy sent to the production LUN and the other to the RPA. Replication to the remote site can then be:

1. Asynchronous – In asynchronous replication, the write IO from the host is sent to the RPA, which acknowledges it as soon as the data arrives in its memory.
2. Synchronous – In synchronous mode, the RPA does not acknowledge any data until it reaches either the memory of the DR site's RPA or persistent storage at the DR site, depending on whether the “measure lag to remote RPA” setting is enabled in the configuration. Synchronous replication can run over FC or IP, provided the full round-trip latency does not exceed 4ms when using FC or 10ms when using IP.
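The two acknowledgement modes and the synchronous latency limits can be summarised in a short Python sketch. The function names are hypothetical, and the mapping of the “measure lag to remote RPA” setting to the ack point (enabled: the remote RPA's memory; disabled: remote persistent storage) is an assumption based on the wording above, so check it against the product documentation for your release.

```python
from enum import Enum

# Full round-trip latency limits for synchronous replication, from the post.
SYNC_RTT_LIMIT_MS = {"FC": 4.0, "IP": 10.0}


class Mode(Enum):
    ASYNC = "async"
    SYNC = "sync"


def ack_point(mode: Mode, measure_lag_to_remote_rpa: bool) -> str:
    """Where a write must get to before the local RPA acknowledges it."""
    if mode is Mode.ASYNC:
        # Async: acked as soon as it lands in the local RPA's memory.
        return "local RPA memory"
    # Sync: acked only once it is at the DR site. ASSUMPTION: the flag
    # selects the remote RPA's memory when enabled, remote storage otherwise.
    if measure_lag_to_remote_rpa:
        return "remote RPA memory"
    return "remote persistent storage"


def sync_within_limits(link: str, measured_rtt_ms: float) -> bool:
    """True if a measured round trip fits the sync limit for this link type."""
    if link not in SYNC_RTT_LIMIT_MS:
        raise ValueError(f"unknown link type: {link!r}")
    return measured_rtt_ms <= SYNC_RTT_LIMIT_MS[link]


print(ack_point(Mode.SYNC, measure_lag_to_remote_rpa=True))  # remote RPA memory
print(sync_within_limits("FC", 5.2))                          # False
```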

For a concurrent local and remote (CLR) solution, CDP and CRR run simultaneously, so the production data is protected by both a local and a remote replica.

The RecoverPoint family consists of three license offerings:
• RecoverPoint/CL (Classic): for replicating across EMC arrays, and across non-EMC storage platforms with the use of VPLEX. Supports all EMC array splitters. Note: capacity is ordered per RPA cluster, not per RP system.
• RecoverPoint/EX: for VMAXe™, VPLEX™, VNX™ series, VNXe3200, CLARiiON® CX3 and CX4 series, XtremIO, ScaleIO and Celerra® unified storage environments.
• RecoverPoint/SE: for VNX series, VNXe3200, CLARiiON CX3 and CX4 series, and Celerra unified storage environments.

Gen5 Hardware

The RecoverPoint appliance (RPA) is a 1U hardware-based server (Intel R1000). The specification of the RPA is as follows:

• 2 x quad-core Sandy Bridge processors
• 2 x 300GB 10K RPM 2.5” SAS drives in a RAID1 configuration
• 6 x 1GbE ports (RJ-45) for WAN, LAN and remote management; the remaining 3 ports are unused
• 16GB DDR3 memory
• PCIe slot 1: quad-port 8Gb FC QLogic 2564 card (PCIe slot 2 is empty)

From the image below you can see the port usage for WAN and LAN, and the HBA port sequence (left to right): 3-2-1-0. Each RPA uses two Ethernet cables: one connects the management (LAN) interface to eth1 and the other connects the WAN interface to eth0.

GEN5 RPA:

Note: RecoverPoint clusters must have a minimum of 2 RPAs and a maximum of 8 RPAs, and the cluster size must be the same at each site of an installation. A RecoverPoint environment can have up to 5 clusters, local or remote, although RP/SE is limited to two clusters. Gen4 and Gen5 RPAs can co-exist in the same RP cluster.
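The cluster sizing rules in the note above lend themselves to a simple pre-deployment check. In this Python sketch the function names and the `edition` strings ("CL", "EX", "SE") are hypothetical, and applying the 5-cluster limit equally to RP/CL and RP/EX is this sketch's reading of the note.

```python
from typing import Dict, List

MIN_RPAS_PER_CLUSTER = 2
MAX_RPAS_PER_CLUSTER = 8


def max_clusters(edition: str) -> int:
    # RP/SE is limited to two clusters; otherwise the post allows up to 5
    # (ASSUMPTION: the same limit for RP/CL and RP/EX).
    return 2 if edition == "SE" else 5


def validate_topology(edition: str, rpas_per_cluster: Dict[str, int]) -> List[str]:
    """Return human-readable problems; an empty list means the layout fits the rules."""
    problems: List[str] = []
    limit = max_clusters(edition)
    if len(rpas_per_cluster) > limit:
        problems.append(f"RP/{edition} allows at most {limit} clusters, got {len(rpas_per_cluster)}")
    for cluster, count in rpas_per_cluster.items():
        if not MIN_RPAS_PER_CLUSTER <= count <= MAX_RPAS_PER_CLUSTER:
            problems.append(
                f"{cluster}: {count} RPAs, must be between "
                f"{MIN_RPAS_PER_CLUSTER} and {MAX_RPAS_PER_CLUSTER}"
            )
    if len(set(rpas_per_cluster.values())) > 1:
        problems.append("cluster sizes differ between sites; they must match")
    return problems


# Reports both the cluster-count problem and the size mismatch.
print(validate_topology("SE", {"Cork": 2, "Dublin": 3, "Galway": 2}))
```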

Terminology

Splitter – The function of the array-based splitter is to ensure that the RPA receives a copy of each write to the protected LUN. At the production site, the splitter splits the IOs so that both the RPA and the storage receive a copy of each write, while maintaining write-order fidelity. At the DR site, the splitter blocks unexpected writes from hosts and supports the various types of image access.
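The production-side ordering can be sketched in Python, following the five-step write flow quoted from the RecoverPoint 4.4 product guide in the comments (host write intercepted and sent to the RPA, RPA ACK, storage write, storage ACK, host ACK). The `Splitter` class and its `image_access` flag are hypothetical; the DR branch is deliberately simplified, since real handling of host writes depends on the image access mode, and, as the guide itself notes, the exact flow varies per splitter.

```python
from typing import List, Tuple


class Splitter:
    """Toy array-based splitter that records the order of ACKs it sees."""

    def __init__(self, role: str):
        if role not in ("production", "replica"):
            raise ValueError("role must be 'production' or 'replica'")
        self.role = role
        self.image_access = False  # hypothetical flag for the DR side
        self.events: List[Tuple[str, str, int]] = []

    def _send_to_rpa(self, lun: str, offset: int, data: bytes) -> None:
        # Steps 1-2: the write goes to the local RPA, which ACKs on receipt.
        self.events.append(("rpa_ack", lun, offset))

    def _write_storage(self, lun: str, offset: int, data: bytes) -> None:
        # Steps 3-4: the splitter writes to the array, which ACKs on completion.
        self.events.append(("storage_ack", lun, offset))

    def host_write(self, lun: str, offset: int, data: bytes) -> bool:
        if self.role == "replica" and not self.image_access:
            # DR side: unexpected host writes to replica volumes are refused.
            self.events.append(("blocked", lun, offset))
            return False
        if self.role == "production":
            self._send_to_rpa(lun, offset, data)
        # Simplified: with image access enabled the replica write just goes to storage.
        self._write_storage(lun, offset, data)
        # Step 5: only now does the host see its write as complete.
        self.events.append(("host_ack", lun, offset))
        return True


prod = Splitter("production")
prod.host_write("lun1", 0, b"x")
print([event for event, _, _ in prod.events])  # ['rpa_ack', 'storage_ack', 'host_ack']
```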

RecoverPoint Repository Volumes – dedicated volumes on the SAN-attached storage at each site; one repository volume is required for each RPA cluster. The repository holds the configuration information about the RPAs and consistency groups. Repository volumes are exposed only to the RPAs. The minimum size for the repository is 2.86GB.

RecoverPoint Journal Volumes – one or more SAN-attached storage volumes for each copy in a consistency group (the production copy, local replica copy, and remote replica copy). Like repository volumes, journal volumes are exposed only to the EMC RPAs, not to the hosts. There are two types of journal volumes:
1. Replica journals – hold snapshots that are either waiting to be distributed or have already been distributed to the replica storage. They also hold the metadata for each image, along with bookmarks. The replica journal holds as many snapshots as its capacity allows.
2. Production journals – used when there is a link failure between sites. In this situation, marking information is written to the production journal, and the marked data is synchronized to the replica when the link comes back online. This process is known as delta marking (marking mode). The production journal does not contain snapshots used for point-in-time (PIT) recovery. Note: the minimum size of a journal volume is 10GB for a standard consistency group and 40GB for a distributed consistency group.
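The different roles of the two journal types can be contrasted in a short Python sketch. Both classes are hypothetical simplifications: `ReplicaJournal` assumes snapshots are discarded oldest-first when the journal fills, and `ProductionJournal` records only which (LUN, region) locations changed while the link was down, not the data itself or the order of the writes, which is why it cannot provide point-in-time images.

```python
from collections import deque
from typing import Deque, List, Set, Tuple


class ReplicaJournal:
    """Holds as many snapshots as capacity allows, dropping the oldest first."""

    def __init__(self, capacity_bytes: int):
        self.capacity = capacity_bytes
        self.used = 0
        self.snapshots: Deque[Tuple[float, int]] = deque()  # (timestamp, size)

    def add(self, timestamp: float, size: int) -> None:
        if size > self.capacity:
            raise ValueError("snapshot is larger than the whole journal")
        while self.used + size > self.capacity:
            _, old_size = self.snapshots.popleft()  # oldest PIT leaves the window
            self.used -= old_size
        self.snapshots.append((timestamp, size))
        self.used += size

    def oldest_pit(self) -> float:
        return self.snapshots[0][0]


class ProductionJournal:
    """Marking mode: remembers which regions changed while the link is down."""

    def __init__(self) -> None:
        self.link_up = True
        self.marked: Set[Tuple[str, int]] = set()

    def note_write(self, lun: str, region: int) -> bool:
        """True if the write can be replicated now; otherwise mark its region."""
        if self.link_up:
            return True
        self.marked.add((lun, region))  # location only: no data, no ordering
        return False

    def link_restored(self) -> List[Tuple[str, int]]:
        """Return the marked regions to re-read and resend, then clear the marks."""
        self.link_up = True
        dirty = sorted(self.marked)
        self.marked.clear()
        return dirty


rj = ReplicaJournal(capacity_bytes=100)
for ts in (1.0, 2.0, 3.0):
    rj.add(ts, 40)
print(rj.oldest_pit())  # 2.0

pj = ProductionJournal()
pj.link_up = False
for lun, region in [("lun1", 7), ("lun1", 7), ("lun2", 3)]:
    pj.note_write(lun, region)
print(pj.link_restored())  # [('lun1', 7), ('lun2', 3)]
```

Marking the same region twice costs nothing extra, which keeps marking mode lightweight; because the marks carry no write ordering, this sketch resends the dirty regions in sorted order rather than in the order they were written.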

Replication Set – a protected SAN-attached storage volume at the production site, together with its replica (local or remote), is known as a replication set.

Consistency Group – consists of replication sets grouped together to ensure write-order consistency across all the replication sets’ primary volumes. A configuration change on a consistency group, such as changing compression or bandwidth limits, applies to all of its replication sets. A RecoverPoint system supports a maximum of 128 CGs, with a maximum of 64 CGs per RPA. If an RPA in the cluster fails, the CGs running on that RPA fail over to another RPA in the cluster.
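The per-RPA and per-system limits, and the failover behaviour described above, can be modelled with a small Python sketch. The least-loaded placement policy and the function names are assumptions made for illustration; the post does not say how RecoverPoint chooses the RPA a CG fails over to, and where the surviving RPAs are already at the 64-CG cap this sketch simply raises an error.

```python
from typing import Dict, List

MAX_CGS_PER_SYSTEM = 128
MAX_CGS_PER_RPA = 64

Placement = Dict[str, List[str]]  # RPA name -> CGs it is running


def _least_loaded(placement: Placement) -> str:
    # ASSUMPTION: pick the RPA running the fewest CGs (first one on ties).
    rpa = min(placement, key=lambda r: len(placement[r]))
    if len(placement[rpa]) >= MAX_CGS_PER_RPA:
        raise RuntimeError(f"every RPA already runs {MAX_CGS_PER_RPA} CGs")
    return rpa


def place(cgs: List[str], rpas: List[str]) -> Placement:
    if len(cgs) > MAX_CGS_PER_SYSTEM:
        raise ValueError(f"{len(cgs)} CGs exceeds the {MAX_CGS_PER_SYSTEM}-CG limit")
    placement: Placement = {rpa: [] for rpa in rpas}
    for cg in cgs:
        placement[_least_loaded(placement)].append(cg)
    return placement


def fail_rpa(placement: Placement, failed: str) -> Placement:
    """Move the failed RPA's CGs onto the surviving RPAs in the cluster."""
    orphans = placement.pop(failed)
    if not placement:
        raise RuntimeError("no surviving RPA in the cluster")
    for cg in orphans:
        placement[_least_loaded(placement)].append(cg)
    return placement


cluster = place([f"cg{i}" for i in range(6)], ["rpa1", "rpa2", "rpa3"])
fail_rpa(cluster, "rpa1")
print({rpa: len(cgs) for rpa, cgs in cluster.items()})  # {'rpa2': 3, 'rpa3': 3}
```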

Distributed Consistency Group – to obtain higher throughput rates, a CG can be configured as a DCG, which can use up to 4 RPAs (a standard CG uses 1 RPA). You can configure a maximum of 8 DCGs, and the limit of 128 CGs per RP system counts standard CGs and DCGs together.
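Pulling together the CG, DCG and journal limits from this section, a Python sketch of a plan check might look like the following. `CgPlan` and `validate_cg_plan` are hypothetical names, `journal_gb` is assumed to be the size of a single copy's journal, and the limits are the ones quoted in this post.

```python
from dataclasses import dataclass
from typing import List

MAX_CGS_PER_SYSTEM = 128  # standard CGs and DCGs counted together
MAX_DCGS = 8
MAX_RPAS_PER_DCG = 4
MIN_JOURNAL_GB_STANDARD = 10
MIN_JOURNAL_GB_DISTRIBUTED = 40


@dataclass
class CgPlan:
    name: str
    distributed: bool   # True for a DCG
    rpas: int           # RPAs the group is spread across
    journal_gb: float   # ASSUMPTION: size of one copy's journal


def validate_cg_plan(plans: List[CgPlan]) -> List[str]:
    problems: List[str] = []
    if len(plans) > MAX_CGS_PER_SYSTEM:
        problems.append(f"{len(plans)} CGs exceeds the {MAX_CGS_PER_SYSTEM}-CG system limit")
    dcg_count = sum(1 for p in plans if p.distributed)
    if dcg_count > MAX_DCGS:
        problems.append(f"{dcg_count} DCGs exceeds the limit of {MAX_DCGS}")
    for p in plans:
        max_rpas = MAX_RPAS_PER_DCG if p.distributed else 1
        if not 1 <= p.rpas <= max_rpas:
            problems.append(f"{p.name}: uses {p.rpas} RPAs, allowed 1-{max_rpas}")
        min_gb = MIN_JOURNAL_GB_DISTRIBUTED if p.distributed else MIN_JOURNAL_GB_STANDARD
        if p.journal_gb < min_gb:
            problems.append(f"{p.name}: journal {p.journal_gb}GB is below the {min_gb}GB minimum")
    return problems


# Flags only the undersized DCG journal.
print(validate_cg_plan([
    CgPlan("sql", distributed=False, rpas=1, journal_gb=10),
    CgPlan("oracle", distributed=True, rpas=4, journal_gb=20),
]))
```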

Image Access – refers to providing host access to the replica volumes while still keeping track of source changes. Image access can be physical (also known as logged), which provides access to the actual physical volumes, or virtual, which provides rapid access to a virtual image of the same volumes.

In the next RecoverPoint blog I will detail sizing and performance characteristics for the Journal and Replica volumes.

Posted on March 25, 2013 by David Ring. Posted in RecoverPoint, Uncategorized. Tagged #EMC, RecoverPoint.


30 Comments

raki
June 19, 2013 at 11:27 am

Dave, it’s really clear information you have given in the blog. Can you provide more steps for creating an RPA, with screenshots?
- Dave Ring
  June 20, 2013 at 12:03 pm
  
  Thank you for the feedback. I can certainly do a blog post with screenshots of the Deployment Manager setup.
Shaibaz Inamdar
March 11, 2014 at 10:38 am

Hi Dave,
It was really very nice to have such clear and concise info, but I have a query regarding RPA maintenance. For instance, if a maintenance activity is scheduled at the DR site for 5-6 hours, which parameter needs to be verified at the primary site so that, when the link is back up, replication resumes without any data loss?
We are replicating using a CDP solution.

Regards,
Shaibaz
Nisar
May 14, 2014 at 11:38 am

Excellent Article!!!
Need details of the deployment which would be helpful
srinath
August 20, 2014 at 5:12 pm

Thanks a lot Dave… It was very helpful. I was really confused about how the RPA operates, but you made everything clear.

Best regards
Srinath
shashidhar D
January 29, 2015 at 4:01 am

Thank you very much…
vijay
February 6, 2015 at 7:50 pm

Dave, it’s a really good document…
MK
February 10, 2015 at 6:56 am

thank you for making it so simple.
Scott E
March 3, 2015 at 3:44 pm

Useful post, thank you.
One key question I am trying to determine. What metric on the RPAs should we monitor, to know when we should add an additional RPA into a cluster?
Chittaranjan
March 31, 2015 at 5:49 pm

Very nice Dave… thanks. Can you please write a blog post explaining the basic operations in RPA, like adding LUNs to a CG, setting up replication, etc.?

thanks!
Sachin
November 26, 2015 at 1:19 pm

Nice..Doc.
Santosh Lohar
December 3, 2015 at 9:15 pm

Very useful information, thanks for sharing.
- David Ring
  December 3, 2015 at 10:16 pm
  
  Thanks Santosh
Kenji Fujita
January 27, 2016 at 2:12 pm

It’s very good information, thanks.

In my experience, this hardware draws 200W or less.
But our supplier says the RPA G5’s maximum power consumption is over 800W!!!!

The spec sheet also states 750W!!

Spec sheet: h2770-recoverpoint-ss.pdf

Are they true or not?
I don’t know which figure to believe now.
klk
April 20, 2016 at 1:30 pm

It’s very useful, David… In my environment, the upcoming plan is for XIO (XtremIO) arrays to be the back-end storage connected to VPLEX. From VPLEX we have 4 RPA clusters for replication to remote sites. Could you please help me with the steps to carry out? I mean, if we receive a request to allocate 4 TB to a server, what steps do we need to follow from XIO to VPLEX to RPA? Could you please help me out?
shia
May 17, 2016 at 10:07 am

Thanks Dave!
A nice blog on RP basics, explained in a simple way. It helped me understand RP and related terms.
- David Ring
  May 19, 2016 at 8:22 am
  
  Glad to Hear! Thank you
  - WomanVista (@womanvista)
    September 13, 2016 at 3:16 pm
    
    Dave,
    Would you know the maximum number of replication sets that a CG can have for Gen5?
  - David Ring
    September 14, 2016 at 9:39 am
    
    Hi, the maximum number of replication sets per consistency group in RP 4.4 is 8192 when using physical RPAs and 2048 when using virtual RPAs. Regards, David
Boyet
September 8, 2016 at 10:52 pm

Which RecoverPoint/SE component is responsible for sending an acknowledgement to the production host during a write operation?
A. RecoverPoint appliance
B. Splitter
C. Storage System
D. Production Journal
- David Ring
  September 21, 2016 at 11:05 pm
  
  Hello, as per the product guide for 4.4:
  
  The write phase is the RecoverPoint replication phase in which host writes are intercepted by the splitter and received by the local RPA, prior to The transfer phase on page 80.
  Generally, the flow of data for write transactions is as follows:
  1. The production host writes data to the production volumes which is intercepted by the splitter. The splitter sends the write data to the RPA.
  2. Immediately upon receipt of the write data, the local RPA returns an ACK to the splitter.
  3. The splitter then writes the data to the production storage volume.
  4. The storage system returns an ACK to the splitter upon successfully writing the data to storage.
  5. The splitter sends an ACK to the host that the write has been completed successfully.
  The sequence of events 1-5 can be repeated multiple times, and in parallel, for multiple writes.
  Note
  The flow of data varies per splitter.
  - YB
    August 2, 2021 at 9:55 am
    
    Given the write sequence above, does this mean the host may see a longer response time from the production volume, since the splitter has to wait for the local RPA’s ACK, for example when the local RPA is under high load?
Deb Nelson
December 6, 2016 at 7:46 pm

Where do you find a serial # on the GEN5 RPA?
- David Ring
  December 6, 2016 at 10:23 pm
  
  How to retrieve the RPA hardware serial number and ‘Gen’ version using the RecoverPoint CLI
  
  1. Using PuTTY, establish an SSH session to the RPA.
  2. Log in as admin.
  3. Run the following command:
  
  get_box_states
Deeo
January 12, 2017 at 8:57 am

What is the difference between image access log capacity (IALC) and journal capacity (JC) during the test copy phase? I have a situation where some test CGs grow in IALC and others grow in JC.

Another thing: once I’ve finished a test that consumed a lot of journal capacity, retrying another test copy on the same CG takes too much time for logged access to open.
Tony Hoover
June 22, 2017 at 6:15 pm

Hello,

Would you happen to know the server model this appliance was built on? I know it’s a Dell server, but I can’t seem to find the model number anywhere on the chassis.
Any help would be greatly appreciated.
- David Ring
  June 22, 2017 at 6:17 pm
  
  Hi Tony, the latest is the R610: https://davidring.ie/2016/09/19/recoverpoint-gen6-hardware-appliance/
Yasmin
October 25, 2018 at 10:30 am

Hello,

I have a quite basic question regarding recovery with RP. How far back in time can we go with the any point-in-time concept with RecoverPoint?

Does it have a limitation to how many days/weeks we can recover an image from?

Thank you,
- David Ring
  November 3, 2018 at 10:56 pm
  
  Hi Yasmin, the following post may help answer : https://davidring.ie/2013/04/04/recoverpoint-considerations-for-journal-sizing-protection-windows/
  
  thanks for reading
  David