5.1.1.1.2 Procedures in place to monitor and receive notifications

The repository shall have procedures in place to monitor and receive notifications when hardware technology changes are needed.

This is necessary to ensure expected, contracted, secure, and persistent levels of service.

Audits of capacity versus actual usage; audits of observed error rates; audits of performance bottlenecks that limit ability to meet user community access requirements; documentation of technology watch assessments; documentation of technology updates from vendors.

The repository should conduct or contract frequent environmental scans regarding hardware status, sources of failure, and interoperability among hardware components. The repository should also be in contact with its hardware vendors regarding technology updates, points of likely failure, and how new components may affect system integration and performance. The objective is to track when changes in service requirements by the designated communities require a corresponding change in the hardware technology, when changes in ingestion policies require expanded capabilities, and when changes in preservation policies require new preservation capabilities. This can be driven by changes in capacity requirements (the time needed to read all media is longer than the media lifetime), by changes in delivery mechanisms (new clients for displaying authentic records), and changes in the number and size of archived records.
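As a rough illustration of the capacity trigger mentioned above (the time needed to read all media exceeding the media lifetime), the check can be sketched in a few lines of Python. The function names, the safety factor, and every figure below are hypothetical and are not taken from APTrust's or any vendor's tooling.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def full_read_time_years(total_bytes: float, read_bytes_per_second: float) -> float:
    """Years needed to read every stored byte once at a sustained rate."""
    if read_bytes_per_second <= 0:
        raise ValueError("read rate must be positive")
    return total_bytes / read_bytes_per_second / SECONDS_PER_YEAR


def hardware_change_needed(total_bytes: float,
                           read_bytes_per_second: float,
                           media_lifetime_years: float,
                           safety_factor: float = 0.5) -> bool:
    """True when one full read pass would consume more than
    safety_factor of the media lifetime (an assumed policy, not a standard)."""
    budget_years = media_lifetime_years * safety_factor
    return full_read_time_years(total_bytes, read_bytes_per_second) > budget_years


# Hypothetical figures: 1 PB of holdings, 100 MB/s sustained read, 5-year media lifetime.
print(hardware_change_needed(1e15, 100e6, 5.0))
```

With these made-up figures a full read pass takes roughly a third of a year, well inside the budget; growing the holdings or shortening the media lifetime eventually flips the result and signals that a hardware change should be planned.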

As with monitoring, APTrust takes a multi-pronged approach to receiving notifications across its environment.

The first prong is the offloading of datacenter and physical hardware monitoring to AWS, along with monitoring of the logical hardware (IaaS), by migrating to a platform solution (PaaS). As defined in the AWS Shared Responsibility Model, AWS is now responsible for both:

“AWS is responsible for protecting the infrastructure that runs all of the services offered in the AWS Cloud. This infrastructure is composed of the hardware, software, networking, and facilities that run AWS Cloud services.”

“For abstracted services, … AWS operates the infrastructure layer, the operating system, and platforms, and customers access the endpoints to store and retrieve data.”

 

AWS provides alerts for the logical resources configured and managed by APTrust when the underlying physical infrastructure requires maintenance or patching. Currently, minor updates are applied automatically, while major updates are tested before being applied manually. An example notification follows:

 

“A new ElastiCache service update, elasticache-redis-6-2-6-update-20230109, is now available for your ElastiCache cluster(s). Service updates improve the security, reliability, and operational performance of your ElastiCache nodes and can be applied using ElastiCache console, API or AWS CLI.

 

For more information on such updates including risk assessment, cluster impact analysis and whether to apply them or not, see ElastiCache Service Updates FAQs https://aws.amazon.com/elasticache/elasticache-maintenance/.

 

Applicable ElastiCache cluster name(s) for this service update:

 

……..

 

Service Update Summary:

Service Update Name: elasticache-redis-6-2-6-update-20230109, Severity: medium, Update Type: engine-update, AWS Recommended Apply By Date: 2023-03-18 07:59:59 UTC, Auto-Update after Due Date: no, AWS Region: us-east-1”
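A notification like the one above can be turned into structured data so that the apply-by date and the auto-update flag are easy to track. The following is a minimal sketch, assuming the summary arrives as a single comma-separated line of "Key: value" pairs, as in the example; the helper names are hypothetical and this is not APTrust's actual tooling.

```python
from datetime import datetime, timezone

# Summary line copied from the example notification above.
SAMPLE_SUMMARY = (
    "Service Update Name: elasticache-redis-6-2-6-update-20230109, "
    "Severity: medium, Update Type: engine-update, "
    "AWS Recommended Apply By Date: 2023-03-18 07:59:59 UTC, "
    "Auto-Update after Due Date: no, AWS Region: us-east-1"
)


def parse_service_update_summary(line: str) -> dict:
    """Split 'Key: value, Key: value' text into a dict.

    Assumes no value contains a comma followed by a space.
    """
    fields = {}
    for part in line.split(", "):
        key, _, value = part.partition(": ")
        fields[key.strip()] = value.strip()
    return fields


def days_until_apply_by(fields: dict, now: datetime) -> int:
    """Days from `now` to the recommended apply-by date (rounded down)."""
    apply_by = datetime.strptime(
        fields["AWS Recommended Apply By Date"], "%Y-%m-%d %H:%M:%S UTC"
    ).replace(tzinfo=timezone.utc)
    return (apply_by - now).days


fields = parse_service_update_summary(SAMPLE_SUMMARY)
# "no" means the update will not be applied automatically after the due date.
needs_manual_action = fields["Auto-Update after Due Date"] == "no"
print(fields["Severity"], needs_manual_action)
```

Because this update is not applied automatically after its due date, a check of this kind could be used to remind staff to test and apply it before the recommended date.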

 

Additionally, for APTrust-managed resources such as containers, custom alerts are configured for health issues, and in many cases self-healing automated responses are in place to correct failures. An example CloudWatch alarm notification follows:

 

You are receiving this email because your Amazon CloudWatch Alarm “mid-tier-prod-RDSAlarmCPUHigh-UBN3S8J42LNZ” in the US East (N. Virginia) region has entered the OK state, because “Threshold Crossed: 2 out of the last 3 datapoints [32.34166666666667 (15/11/22 20:07:00), 41.041666666666664 (15/11/22 20:06:00)] were not greater than the threshold (90.0) (minimum 2 datapoints for ALARM -> OK transition).” at “Tuesday 15 November, 2022 20:09:01 UTC”.

 

View this alarm in the AWS Management Console:

https://us-east-1.console.aws.amazon.com/cloudwatch/deeplink.js?region=us-east-1#alarmsV2:alarm/mid-tier-prod-RDSAlarmCPUHigh-UBN3S8J42LNZ

 

Alarm Details:

– Name:                       mid-tier-prod-RDSAlarmCPUHigh-UBN3S8J42LNZ

– Description:                An alert for when the RDS CPU usage is very high for multiple periods.

– State Change:               ALARM -> OK

– Reason for State Change:    Threshold Crossed: 2 out of the last 3 datapoints [32.34166666666667 (15/11/22 20:07:00), 41.041666666666664 (15/11/22 20:06:00)] were not greater than the threshold (90.0) (minimum 2 datapoints for ALARM -> OK transition).

– Timestamp:                  Tuesday 15 November, 2022 20:09:01 UTC

– AWS Account:                997427182289

– Alarm Arn:                  arn:aws:cloudwatch:us-east-1:997427182289:alarm:mid-tier-prod-RDSAlarmCPUHigh-UBN3S8J42LNZ
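The state change reported above follows an "M out of N datapoints" rule. The sketch below is a simplified model of that rule, not CloudWatch's actual evaluation logic (which also handles missing data, among other things). The function name is hypothetical, and the value of 2 for datapoints_to_alarm is an assumption inferred from the wording of the notification.

```python
from typing import Sequence


def evaluate_alarm(datapoints: Sequence[float],
                   threshold: float,
                   datapoints_to_alarm: int) -> str:
    """Return "ALARM" when at least `datapoints_to_alarm` of the supplied
    datapoints are greater than `threshold`, otherwise "OK"."""
    breaching = sum(1 for value in datapoints if value > threshold)
    return "ALARM" if breaching >= datapoints_to_alarm else "OK"


# CPU readings and threshold taken from the notification above.
recent_cpu = [41.041666666666664, 32.34166666666667]
print(evaluate_alarm(recent_cpu, threshold=90.0, datapoints_to_alarm=2))
```

Requiring several breaching datapoints rather than one keeps a single short CPU spike from paging staff, at the cost of a slightly slower alert.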