The right notifications
Ideally, a deployment automation pipeline lets your project team focus on feature delivery while the pipeline reliably builds and deploys your solution. But what happens when something inevitably goes wrong? When eSimplicity implements automation, we look at two related decisions: what requires a notification, and who should receive it? We target notifications strategically so that each one (1) requires a specific action and (2) reaches the right team members. We do not want to blast all team members for all events, because then everyone gets in the habit of ignoring notifications. This is why we use tools like PagerDuty. Within these tools, we can subdivide the project team into smaller groups tied to specific team-level responsibilities. Each group has its own notification methods (e.g., text or email) and a specific response timeline before an escalation is required. We use tabletop exercises during the Innovation and Planning window to define end-to-end project Service Level Agreements (SLAs).
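To make the routing idea concrete, here is a minimal sketch of a notification policy written in plain Python. It does not call PagerDuty or any other tool's API. The event names, group names, and response windows are hypothetical placeholders, and a real policy would be configured inside the paging tool itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Route:
    """Who is notified for an event type, how, and how long they have to respond."""
    group: str
    method: str                       # e.g. "text" or "email"
    ack_minutes: int                  # response window before escalation
    escalate_to: Optional[str] = None


# Hypothetical routing table: event names and groups are illustrative only.
ROUTES = {
    "build_failed": Route("developers", "email", 60, "tech-lead"),
    "deploy_failed": Route("devops", "text", 15, "release-manager"),
}


def route_event(event_type: str) -> Optional[Route]:
    """Return the route for an event, or None when no action is required."""
    return ROUTES.get(event_type)


def current_target(route: Route, minutes_unacknowledged: int) -> str:
    """Return the group that should hold the notification right now."""
    if route.escalate_to and minutes_unacknowledged >= route.ack_minutes:
        return route.escalate_to
    return route.group
```

Events that are missing from the table, such as a successful build, return no route and notify nobody, which keeps the noise down for the rest of the team.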
Operational monitoring
When we implement operational monitoring, eSimplicity’s monitoring tools address infrastructure, cloud services, and application logging. We prefer Splunk for the majority of our projects. Most off-the-shelf components and cloud services integrate easily with Splunk or comparable systems. Custom applications, however, require a little more forethought. eSimplicity integrates our application logging with Splunk through a common set of open source logging tools (Fluentd, Logstash, Syslog-ng, etc.). We are huge proponents of logging frameworks, application logging patterns, and rules, and we use them to facilitate system event monitoring and error trapping. One last concern is Personally Identifiable Information (PII) and Protected Health Information (PHI) in your application events. At eSimplicity, we address this concern with our logging patterns. We obfuscate PHI and PII within system events and mark those events so they can be tracked end-to-end, whether they end in success or an error code.
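As an illustration of what such a logging pattern might look like, the sketch below uses Python's standard logging module to mask sensitive values and tag each event with an identifier. It is not eSimplicity's actual implementation. The two regular expressions (a U.S. Social Security number format and an email address) and the trace_id field name are assumptions chosen for the example, and a production pattern would cover many more data types.

```python
import logging
import re
import uuid

# Illustrative patterns only: real PII/PHI detection needs a much broader rule set.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
EMAIL_PATTERN = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")


def mask_sensitive(text: str) -> str:
    """Replace anything that looks like an SSN or an email address."""
    text = SSN_PATTERN.sub("[REDACTED-SSN]", text)
    return EMAIL_PATTERN.sub("[REDACTED-EMAIL]", text)


class RedactingFilter(logging.Filter):
    """Mask sensitive values and attach a trace ID to every record."""

    def __init__(self, trace_id: str):
        super().__init__()
        self.trace_id = trace_id  # hypothetical field used to follow an event end-to-end

    def filter(self, record: logging.LogRecord) -> bool:
        # Render the message first so values passed as arguments are masked too.
        record.msg = mask_sensitive(record.getMessage())
        record.args = ()
        record.trace_id = self.trace_id
        return True  # keep the record, now redacted


def build_logger(name: str) -> logging.Logger:
    """Create a logger whose output carries the trace ID and masked messages."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter("%(trace_id)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.addFilter(RedactingFilter(uuid.uuid4().hex))
    logger.setLevel(logging.INFO)
    return logger
```

Because the filter runs before any handler formats the record, the same masked message and trace ID reach every destination, whether that is a local file or a forwarder feeding Splunk.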
Data analysis
Why do you look at your logging data? eSimplicity sets up our logging systems to support three things: system triage, audits, and predictions. Delivering functionality to your end users is normally only the first stage. Inevitably, you will come across a situation where you must triage a system or user-based issue. Second, to maintain an authority to operate (ATO) in government agencies, eSimplicity uses the logging systems to fulfill system audits and trigger automated alerts. We use our logging frameworks with application logging patterns for both of these scenarios. Finally, our cloud-based projects often use container and server auto-scaling, but that may not cover all traffic patterns. We use our logging capabilities to optimize system resources, prevent bottlenecks, and even predict costs. One way to improve a project’s total cost of ownership (TCO) is to use Splunk reports to identify the appropriate amount of resources. While we always prefer auto-scaling of containers or servers, having the right baseline is usually even more cost-effective. Using Splunk reports along with cost management tools helps us pick a baseline number of resources for off-peak and peak windows.
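One possible way to turn report data into a baseline is sketched below. It assumes you have exported hourly samples of how many instances were needed (the export format, the 90th-percentile target, and the peak-hour range are all assumptions for illustration, not values from a Splunk report) and picks a baseline for each window that covers most observed demand, leaving auto-scaling to absorb the rest.

```python
import math
from statistics import quantiles


def baseline_instances(demand_samples, percentile=90):
    """Smallest whole instance count covering the given percentile of demand.

    Needs at least two samples; percentile must be between 1 and 99.
    """
    cut_points = quantiles(demand_samples, n=100, method="inclusive")
    return math.ceil(cut_points[percentile - 1])


def baselines_by_window(hourly_demand, peak_hours):
    """Split (hour, demand) samples into peak and off-peak and size each window."""
    peak = [demand for hour, demand in hourly_demand if hour in peak_hours]
    off_peak = [demand for hour, demand in hourly_demand if hour not in peak_hours]
    return {
        "peak": baseline_instances(peak),
        "off_peak": baseline_instances(off_peak),
    }


# Hypothetical samples: (hour of day, instances needed during that hour).
samples = [(9, 6), (10, 8), (11, 10), (2, 2), (3, 2), (4, 3)]
print(baselines_by_window(samples, peak_hours=range(9, 18)))
# prints {'peak': 10, 'off_peak': 3}
```

Sizing to a percentile rather than the maximum is a deliberate trade-off: the baseline stays cheap during normal traffic, and the occasional spike is handled by scaling out.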