Information security management system

Effortless Admin's security policies and procedures


​12​ Disaster Recovery Policy

The EA Contingency Plan establishes procedures to recover EA following a disruption resulting from a disaster. This Disaster Recovery Policy is maintained by the EA Security Officer and Privacy Officer.

The following objectives have been established for this plan:

  1. Maximize the effectiveness of contingency operations through an established plan that consists of the following phases:
    • Notification/Activation phase to detect and assess damage and to activate the plan;
    • Recovery phase to restore temporary IT operations and repair damage done to the original system;
    • Reconstitution phase to restore IT system processing capabilities to normal operations.
  2. Identify the activities, resources, and procedures needed to carry out EA processing requirements during prolonged interruptions to normal operations.
  3. Identify and define the impact of interruptions to EA systems.
  4. Assign responsibilities to designated personnel and provide guidance for recovering EA during prolonged periods of interruption to normal operations.
  5. Ensure coordination with other EA staff who will participate in the contingency planning strategies.
  6. Ensure coordination with external points of contact and vendors who will participate in the contingency planning strategies.

Examples of the types of disasters that would initiate this plan are natural disasters, political disturbances, man-made disasters, external human threats, and internal malicious activities.

EA defines two categories of systems from a disaster recovery perspective.

  1. Critical Systems. These systems host application servers and database servers, or are required for the functioning of systems that host application servers and database servers. If unavailable, these systems affect the integrity of data. They must be restored, or a process to restore them must be started, immediately upon becoming unavailable.
  2. Non-critical Systems. These are all systems not considered critical under the definition above. While they may affect the performance and overall security of critical systems, they do not prevent critical systems from functioning and being accessed appropriately. These systems are restored at a lower priority than critical systems.
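
The two categories above can be sketched as a small classification and ordering routine. This is a minimal illustration only, not EA tooling: the `System` fields, the function names, and the example system names are all hypothetical, and the only rule taken from the policy is that critical systems are restored before non-critical ones.

```python
from dataclasses import dataclass
from enum import IntEnum


class Category(IntEnum):
    """Restoration category; a lower value is restored earlier."""
    CRITICAL = 0
    NON_CRITICAL = 1


@dataclass(frozen=True)
class System:
    name: str                    # hypothetical name, for illustration only
    hosts_app_or_db: bool        # hosts application or database servers
    required_by_critical: bool   # needed for such hosts to function


def categorize(system: System) -> Category:
    # Mirrors the policy definition: either condition makes a system critical.
    if system.hosts_app_or_db or system.required_by_critical:
        return Category.CRITICAL
    return Category.NON_CRITICAL


def restore_order(systems: list[System]) -> list[System]:
    # sorted() is stable, so systems keep their listed order within a category.
    return sorted(systems, key=categorize)


# Hypothetical inventory used only to demonstrate the ordering.
inventory = [
    System("log-dashboard", hosts_app_or_db=False, required_by_critical=False),
    System("app-server", hosts_app_or_db=True, required_by_critical=False),
    System("internal-dns", hosts_app_or_db=False, required_by_critical=True),
]
print([s.name for s in restore_order(inventory)])
# ['app-server', 'internal-dns', 'log-dashboard']
```

Using an ordered enum keeps the priority rule in one place, so adding a further category later would not require changing the sort.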

​12.1​ Line of Succession

The following order of succession ensures that decision-making authority for the EA Contingency Plan is uninterrupted. The Chief Technology Officer (CTO) is responsible for ensuring the safety of personnel and the execution of procedures documented within this EA Contingency Plan. If the CTO is unable to function as the overall authority or chooses to delegate this responsibility to a successor, the CEO shall function as that authority. If the contingency plan needs to be initiated, use the contact list below to initiate contact.

​12.2​ Responsibilities

The following teams have been developed and trained to respond to a contingency event affecting the IT system.

  1. The Ops Team (listed as Dev Ops in the procedures in section 12.4) is responsible for recovery of the EA hosted environment, network devices, and all servers. Members of the team include personnel who are also responsible for the daily operations and maintenance of EA. The team leader is the CTO, who directs the Ops Team.
  2. The Web Services Team is responsible for ensuring all application servers, web services, and platform features are working. It is also responsible for testing deployments and assessing damage to the environment. The team leader is the CTO, who directs the Web Services Team.

Members of the Ops and Web Services teams must maintain local copies of contact information. Additionally, the CTO must maintain a local copy of this policy in the event Internet access is not available during a disaster scenario.

​12.3​ Testing and Maintenance

The CTO shall establish criteria for validation/testing of the Contingency Plan, set an annual test schedule, and ensure that the test is carried out. This process will also serve as training for personnel involved in the plan's execution. At a minimum, the Contingency Plan shall be tested annually (within 365 days). The types of validation/testing exercises include tabletop and technical testing. Contingency Plans for all application systems must be tested, at a minimum, using the tabletop testing process. However, if an application system's Contingency Plan is included in the technical testing of its respective support systems, that technical test will satisfy the annual requirement.
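
The annual requirement can be tracked with a simple date check. The sketch below is illustrative only: the function names are hypothetical, and it assumes that "within 365 days" means a test held exactly 365 days after the previous one still satisfies the requirement.

```python
from datetime import date, timedelta

# From the policy: the plan is tested annually (within 365 days).
MAX_TEST_INTERVAL = timedelta(days=365)


def next_test_due(last_test: date) -> date:
    """Latest date on which the next test still meets the requirement."""
    return last_test + MAX_TEST_INTERVAL


def is_test_overdue(last_test: date, today: date) -> bool:
    """True once more than 365 days have passed since the last test."""
    return today > next_test_due(last_test)


# Hypothetical dates chosen to show the effect of a leap year.
last = date(2023, 3, 1)
print(next_test_due(last))                      # 2024-02-29
print(is_test_overdue(last, date(2024, 3, 1)))  # True
```

Counting days rather than calendar years matters around leap years, as the example shows: a test repeated on the same calendar date one year later can already be a day late.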

​12.3.1​ Tabletop Testing

The primary objective of the tabletop test is to ensure designated personnel are knowledgeable and capable of performing the notification/activation requirements and procedures in a timely manner. The exercises include, but are not limited to:

​12.3.2​ Technical Testing

The primary objective of the technical test is to ensure the communication processes and data storage and recovery processes can function at an alternate site to perform the functions and capabilities of the system within the designated requirements. Technical testing shall include, but is not limited to:

​12.4​ Disaster Recovery Procedures

​12.4.1​ Notification and Activation Phase

This phase addresses the initial actions taken to detect and assess damage inflicted by a disruption to EA. Based on the assessment of the event, in some cases made in accordance with the EA Incident Response Policy, the Contingency Plan may be activated by the CTO.

The notification sequence is listed below:

​12.4.2​ Recovery Phase

This section provides procedures for recovering the application at an alternate site, while other efforts are directed at repairing damage to the original system and capabilities.

The following procedures are for recovering the EA infrastructure at the alternate site. Procedures are outlined per team. Each team should execute its procedures in the order in which they are presented to maintain efficient operations.

Recovery Goal: The goal is to rebuild EA infrastructure to a production state.

The tasks outlined below are not strictly sequential across teams, and some can be run in parallel.

  1. Contact Partners and Customers affected - Web Services
  2. Assess damage to the environment - Web Services
  3. Begin replication of new environment - Dev Ops
  4. Test new environment using pre-written tests - Web Services
  5. Test logging, security, and alerting functionality - Dev Ops
  6. Ensure systems are appropriately patched and up to date - Dev Ops
  7. Deploy environment to production - Web Services
  8. Update DNS to new environment - Dev Ops
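
The task list above can be kept as a per-team checklist so each team sees only its own items. This is a sketch for illustration, not an EA procedure: the function names are hypothetical and the task labels are shortened from the list above. No dependencies between tasks are encoded, because the policy does not specify any.

```python
from collections import defaultdict

# (task label, owning team) in the order given by the recovery task list.
RECOVERY_TASKS = [
    ("Contact affected partners and customers", "Web Services"),
    ("Assess damage to the environment", "Web Services"),
    ("Begin replication of new environment", "Dev Ops"),
    ("Test new environment with pre-written tests", "Web Services"),
    ("Test logging, security, and alerting", "Dev Ops"),
    ("Patch and update systems", "Dev Ops"),
    ("Deploy environment to production", "Web Services"),
    ("Update DNS to new environment", "Dev Ops"),
]


def checklist_by_team(tasks):
    """Group tasks by owning team, keeping the original task numbers."""
    grouped = defaultdict(list)
    for number, (label, team) in enumerate(tasks, start=1):
        grouped[team].append((number, label))
    return dict(grouped)


def remaining(tasks, done_numbers):
    """Return (number, label, team) for every task not yet marked done."""
    return [
        (number, label, team)
        for number, (label, team) in enumerate(tasks, start=1)
        if number not in done_numbers
    ]


by_team = checklist_by_team(RECOVERY_TASKS)
print([number for number, _ in by_team["Dev Ops"]])       # [3, 5, 6, 8]
print([number for number, _ in by_team["Web Services"]])  # [1, 2, 4, 7]
```

Keeping the original task numbers in each team's view lets both teams refer to the same item when coordinating work that runs in parallel.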

​12.4.3​ Reconstitution Phase

This section discusses activities necessary for restoring EA operations at the original or new site. The goal is to restore full operations within 24 hours of a disaster or outage. When the hosted data center at the original or new site has been restored, EA operations at the alternate site may be transitioned back. The goal is to provide a seamless transition of operations from the alternate site to the computer center.

  1. Original or New Site Restoration
    • Begin replication of new environment - Dev Ops
    • Test new environment using pre-written tests - Web Services
    • Test logging, security, and alerting functionality - Dev Ops
    • Deploy environment to production - Web Services
    • Ensure systems are appropriately patched and up to date - Dev Ops
    • Update DNS to new environment - Dev Ops
  2. Plan Deactivation
    • If the EA environment is moved back to the original site from the alternate site, all hardware used at the alternate site should be handled and disposed of according to the EA Media Disposal Policy.
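
The 24-hour restoration goal stated in this section can be tracked during an incident with a small deadline calculation. The sketch below is illustrative only: the function names and timestamps are hypothetical, and it assumes the 24 hours are counted from the moment the disaster or outage began.

```python
from datetime import datetime, timedelta

# From the policy: restore full operations within 24 hours.
RESTORE_GOAL = timedelta(hours=24)


def restore_deadline(outage_start: datetime) -> datetime:
    """Time by which full operations should be restored."""
    return outage_start + RESTORE_GOAL


def time_remaining(outage_start: datetime, now: datetime) -> timedelta:
    """Time left before the goal is missed; zero once it has passed."""
    return max(restore_deadline(outage_start) - now, timedelta(0))


# Hypothetical timestamps for illustration.
start = datetime(2024, 1, 10, 8, 30)
print(restore_deadline(start))                            # 2024-01-11 08:30:00
print(time_remaining(start, datetime(2024, 1, 10, 20)))   # 12:30:00
```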