Experienced and data-driven Site Reliability Engineering leader with a track record of building cross-functional, geo-distributed DevOps teams. Skilled at defining and implementing SRE best practices and driving innovation, with a strong focus on uptime and customer experience.
My strong mix of development and operations skills, deep experience with AWS, and understanding of the interplay between software development and operations have enabled me to build and lead high-performing teams that drive business results.
Helped lead the SRE team through the path to IPO and oversaw the completion of governance processes relevant to my teams
Interviewed M&A targets and helped develop the due diligence processes for potential acquisitions with a focus on system architecture, system availability, technology stacks / depreciation, and operational readiness
Helped define divisional OKRs, and assisted in measuring the completion of R&D division goals on a quarterly basis
Developed cross-functional SRE mentoring programs to improve internal candidate pipelines, employee engagement, and morale, and to reduce attrition
Managed M&A integration from a deployment, system development, security and vulnerability management program standpoint
Presented to prospects and contributed to RFPs on the subjects of system availability, security defense in depth, vulnerability management, and cloud native technology adoption
Oversaw the execution and deliverables of the network, system, release, and container platform engineering teams
Mentored and grew managers, along with individual contributors from a large cross-section of the business
Democratized operations and change management. Moved from constant firefighting to standard, repeatable, prompt processes, with no downtime for patching, release, and administrative tasks. Developed and published OLAs to reduce friction and measure the success of process-driven Ops
Built out a Kubernetes-focused team and partnered with development to launch our first microservices in EKS. Worked with leaders in Product and Development to build a comprehensive roadmap to Kubernetes in order to minimize infrastructure spend and release complexity, while maximizing uptime
Participated in the company's security and compliance steering committee, focused on providing world-class protection for our systems and customer data
Developed the Release and System Engineering teams from the ground up and matured Operations and Network Engineering. Built teams responsible for core system operations, automated infrastructure provisioning, application deployment, and incident response
Successfully led the migration of Alkami's privately hosted customer and corporate environments to AWS
Oversaw the maintenance of remaining corporate hardware and the implementation of enterprise vulnerability management programs, and acted as the product owner for my teams
Ran the production certification for PCI/SOC 2 Type 2/SOX assessments, and led the technical response for gap item resolution
Created repeatable incident response and retrospective processes and automation used for all severity 1 and 2 incidents. Reduced MTTR and improved customer satisfaction and data quality through the development of a custom Slack chatbot focused on incident response and client communications.
Implemented a robust monitoring program using New Relic and Elasticsearch. Championed the use of Infrastructure as Code. Transformed the team from point-and-click system builders to developers who produced highly automated, repeatable, and thoroughly tested infrastructure
Modernized the release and deploy process by implementing identical infrastructure and deployment automation in all environments. Built self-service tooling to allow developers to deploy code and infrastructure easily, without direct access to systems. Scaled deployments to thousands of application releases per month
Championed best practices with the development and product organizations. Helped lead cross-functional incident reviews to drive action items and build strong data around areas of concern. Worked with product to use this data to drive focused technical debt paydown and rearchitecture, with quantifiable ROI
Responsible for the availability, scalability, configuration, deployment, and monitoring of our online banking platform, serving millions of contracted users.
Built and enhanced CI, deployment and maintenance workflows in Jenkins and TeamCity
Designed and implemented an ELK Stack cluster for centralized logging, capable of handling the 10-25k events per second emitted from Dev, QA, Staging, and Production
Architected and wrote PowerShell modules and the fleet-wide deployment process used for all areas of application operations, deployment, and configuration management
Wrote custom web and Windows services for monitoring and operations, and to integrate Jira, New Relic, and DynDNS/Cloudflare data into our applications and HipChat/Slack
Created infrastructure automation using Bash, PowerShell, Terraform, and Packer. Reduced new environment spin-up time from weeks to hours
Created automated testing tools (custom HttpClient, NUnit, and Selenium) to ensure quality releases and effective monitoring, and to power rollback decisions
Worked directly with architecture and product engineering to shape and improve microservices strategy, and codify SRE requirements and NFRs
As QA automation team lead, provided direction and support to the QA automation team
Authored and executed coded UI tests using handwritten C# against recorded controls
Owned the resources required for QA's virtual and physical environments through the planning/purchasing, implementation, and system administration phases
Maintained 40+ virtual machines for the QA/Dev teams and the entire virtual lab environment including maintenance of system images and automated VM rollout
Provided IT support for the QA teams
Responsible for researching, planning, and implementing advanced Test Manager / VSTS 2010 features such as code coverage and test impact analysis using TFS' symbol server capabilities
Worked with the TFS API to provide advanced work item management for the QA team, and customized TFS work items / artifacts / security when necessary to facilitate QA progress
Owned the automated build process in Team Build 2010 and the automatic deployment of the .NET web applications; primary owner of automated tests tied to the build
Backup TFS administrator for the QA/Engineering department
Developed various scripts (mostly PowerShell) to streamline QA processes and data collection
Recognized by clients on numerous occasions for 'above and beyond' level support.
Prior experience as lead processor allows for thorough resolution of problems and complete knowledge of DFS services and products
Provides high-level support for DFS customers, including custom site scripting, financial file reformatting, and XML and regular expression mappings.
Overview: Member of an outstanding team that provides the first level of direct support for ASP/hosted ReconNET and AssureNET clients and Dataflow Services financial information reporting customers.
Details:
Works directly in client application databases to provide in-depth support for common application errors, feature and usage implementations, and application training.
Oversees yearly activities within the team to ensure SAS 70 compliance.
Increased data security by assisting in the planning and implementation of central servers for data collection and updating outdated archive and backup policy.
Implemented key processes, including new automation, software, and hardware resources, improving customer response time and report accuracy and reducing operational staff requirements by 25%.
Designed internal metrics for team members' performance in order to accurately gauge output and productivity.
Overview: Supervisor of entry-level data processors responsible for maintaining high levels of client satisfaction and efficiency within the department.
Details:
Oversees the daily collection and formatting of data for over 18,000 corporate and retail bank accounts.
Overview: Entry-level call center position responsible for fielding basic HR inquiries from over 50,000 Xerox staff.
Details:
Worked as part of a 16-person team acting as the first line of direct HR inquiry for Xerox personnel.
Managed the employee database and updated personnel information as needed.
Responded to court subpoenas for information as well as directives to garnish wages.
Assisted in the implementation of E-Time, a new payroll tracking and punchcard system.
Special participation on the Reduction-In-Force processing team; completed the processing and delivery of voluntary and involuntary layoff paperwork and rights notifications to impacted employees.