The DOF Production Support and Application Monitoring Support Team will support the following:
Core Infrastructure
-Creation and configuration of non-prod environments for all in scope applications.
-Implementation of the ELF for non-production environments
-Triage and resolution of non-prod environment related issues
-Deployment of application baselines to non-production environments for all in scope applications
-Certificate renewals
-Configuration and maintenance of the CI/CD pipelines
-Performance environment (PT WYN, PT DOR) creation, configuration, deployment and support.
-DB activities for non-prod: Install/Maintain application schemas, DB issue resolution, DB configuration, DB maintenance scripts
-Git administration, Artifactory, SonarQube, ETL admin, UCD admin, UCD scripts, Jenkins configuration, Logstash, ELK, JMeter
-Onboarding and offboarding of users for access to the non-prod environments.
-Implementation of the ELF for production environments
-Deployment of application baselines to production for all in scope applications
-Certificate renewals for prod environments
-MOP updates and reviews for all production deployments
-Configuration and maintenance of the CI/CD pipelines
-Production patching activities for in scope applications
-Production monitoring (liveness probe, BM worker node, DataGrid, SOSS, pod restarts)
-DB activities for Production: Install/Maintain application schemas, DB issue resolution, DB configuration, DB maintenance scripts, Optimization
-Git administration, Artifactory, SonarQube, ETL admin, UCD admin, UCD scripts, Jenkins configuration, MDM server admin, Logstash, ELK, JMeter
-Onboarding and offboarding of users for access to the non-prod environments.
AMS Resources
-Manage traffic diversion during deployments
-Validation of code deployment success via backdoor sanity in OM
-Post deployment health monitoring
-Hourly post deployment reporting
-Production patching activities for in scope applications
-Production monitoring (liveness probe, BM worker node, DataGrid, SOSS, pod restarts)
-Report on System Health Metrics using Dynatrace
-Monitor and action alerts using Bell monitoring tools (Dynatrace, BAM, Grafana)
-Monitor the DB server through a daily sanity check
-Verify Table Space status and warn if it's reaching capacity
-Verify Disk Space status and warn if it's reaching capacity
-Verify Memory and Processor usage and warn if it's reaching capacity
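
The capacity checks above (table space, disk space, memory and processor usage) all follow the same pattern: compare current usage against a limit and warn before it is reached. The following is a minimal Python sketch of that pattern, not the team's actual tooling. The 80% and 90% thresholds and the names classify_usage and check_disk are assumptions made for illustration; the source does not specify any thresholds. Only the disk check is wired up here, using the standard library; table space and memory/processor figures would come from the database or from the monitoring tools and could be passed to the same function.

```python
import shutil

WARN_THRESHOLD = 0.80      # assumed warning level (80% used), not from the source
CRITICAL_THRESHOLD = 0.90  # assumed critical level (90% used), not from the source


def classify_usage(used: float, total: float,
                   warn: float = WARN_THRESHOLD,
                   critical: float = CRITICAL_THRESHOLD) -> str:
    """Return "OK", "WARN" or "CRITICAL" for a used/total capacity pair."""
    if total <= 0:
        raise ValueError("total capacity must be positive")
    ratio = used / total
    if ratio >= critical:
        return "CRITICAL"
    if ratio >= warn:
        return "WARN"
    return "OK"


def check_disk(path: str = ".") -> str:
    """Classify disk usage for the filesystem that contains `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return classify_usage(usage.used, usage.total)


if __name__ == "__main__":
    # Hypothetical daily sanity check for the current filesystem.
    print("disk usage status:", check_disk("."))
```

Keeping the threshold logic in one function means the same warning levels apply to every resource, and the levels can be adjusted in one place once the agreed values are known.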
Production Monitoring:
-Diagnosing and tracking Incidents and problems with Severity Critical (P1) and High (P2) through to Resolution
-Providing the required Production Logs or access to Production Logs to analyze the incidents.
-Providing the Root Cause Analysis for all Critical Incidents.
-Repairing data and associated work caused by invalid data where validation code does not exist or where a documented Incident caused by a transaction results in failures.
-Providing workarounds for Critical and High Incidents
-Updating relevant system, configuration or process documentation.
-Documenting and promptly notifying Bell of any emergency changes required.
-Participating in AMS Operations Governance meetings (assumed to be bi-weekly)
-Responding to Application-related questions, performing data extraction as required
-Handling ad-hoc requests from end users for information, queries, or reports.
-Providing holiday support coverage
-Performing peak period monitoring and reporting for specific critical applications
-Performing daily health checks for Critical applications.