Job Summary:
As a Lead Platform Engineer, you will oversee the design, implementation, and
optimization of our CI/CD pipelines and AWS cloud infrastructure. You will lead a team
of Platform engineers, collaborate with cross-functional stakeholders, and ensure our
systems are scalable, secure, and resilient. Your expertise in Kubernetes, Infrastructure
as Code, and Observability will be critical in modernizing our technology stack and
fostering a culture of automation and reliability.
This role includes strategic planning, technical leadership, and hands-on engineering.
Participation in an on-call rotation and occasional support outside of business hours is
expected.
Key Responsibilities:
- Lead and mentor a team of Platform engineers, fostering growth and technical excellence.
- Architect and manage cloud infrastructure using Infrastructure as Code tooling (Terraform).
- Lead and conduct research to solve complex business problems as they relate to infrastructure.
- Oversee CI/CD pipeline development and optimization using GitHub Actions and Argo.
- Drive automation initiatives using Golang, Helm, and Bash for infrastructure and monitoring.
- Enhance system observability through logging, monitoring, and alerting solutions.
- Collaborate across time zones with development, architecture, operations, and security teams to align infrastructure with business goals.
- Ensure high availability, performance, and security of production systems.
- Evaluate and integrate emerging Platform tools and practices to improve efficiency and reliability.
- Lead incident response and root cause analysis for infrastructure-related issues.
Qualifications:
- 8+ years of experience in Platform or related infrastructure engineering, with at least 2 years in a technical leadership role.
- Deep expertise with Infrastructure as Code tooling (Terraform) and CI/CD pipelines, plus strong knowledge of Argo, Helm, and GitHub Actions.
- Proficiency in scripting languages and knowledge of cloud platforms.
- Strong communication, collaboration, and analytical skills. Ability to work in a team and manage multiple tasks simultaneously.
- Expertise with observability tools and techniques (e.g., logging, metrics, monitoring, and alerting).
- Expertise with compliance and risk management requirements (e.g., security, PII, SOC, ISO).
- Excellent troubleshooting and debugging skills, with experience resolving complex infrastructure and application issues.
- Excellent communication and collaboration skills, with the ability to work with minimal supervision.
- Experience developing system requirements, documentation, architecture diagrams, and implementation plans.
Bonus Skills:
- AWS or DevOps-related certifications
- Expertise with cloud optimization strategies
- Experience working with multiple cloud providers