Experienced, data-driven Site Reliability Engineering leader with a track record of building cross-functional, geographically distributed DevOps teams. Skilled at defining and implementing SRE best practices and driving innovation, with a strong focus on uptime and customer experience.
My strong mix of development and operations skills, deep AWS experience, and understanding of how software development and operations interact have enabled me to build and lead high-performing teams that drive business results.
Responsible for the availability, scalability, configuration, deployment, and monitoring of our online banking platform, serving millions of contracted users.
Built and enhanced CI, deployment, and maintenance workflows in Jenkins and TeamCity
Designed and implemented an ELK Stack cluster for centralized logging, capable of handling the 10,000-25,000 events per second emitted by the Dev, QA, Staging, and Production environments
Architected and wrote the PowerShell modules and fleet-wide deployment process used across application operations, deployment, and configuration management
Wrote custom web and Windows services for monitoring and operations, and to integrate Jira, New Relic, and DynDNS/Cloudflare data into our applications and HipChat/Slack
Created infrastructure automation using Bash, PowerShell, Terraform, and Packer, reducing new-environment spin-up time from weeks to hours
Created automated testing tools (custom HttpClient, NUnit, and Selenium) to ensure quality releases, enable effective monitoring, and drive rollback decisions
Worked directly with architecture and product engineering to shape and improve the microservices strategy and to codify SRE requirements and non-functional requirements (NFRs)