Senior Site Reliability Engineer

  • Remote
    • Philippines

Job description

Senior Site Reliability Engineer (SRE)

Brief

CC.Talent is seeking a Senior Site Reliability Engineer (SRE) to join our global infrastructure team. You will be a guardian of our production environment, responsible for its health, performance, and scalability. Your mission is to apply software engineering principles to solve operational problems, automate everything, and ensure our platform exceeds the reliability expectations of our customers. You'll work with a talented, distributed team of engineers across different time zones, making your mark on a platform that processes millions of transactions. This role requires a deep passion for eliminating toil, a proactive approach to system stability, and excellent communication skills to thrive in a remote-first environment.

Client Details

Our client is a rapidly growing, Series A-funded fintech company in the payments industry. They are expanding their operations by launching an engineering hub in Southeast Asia and have chosen us as their trusted partner to make it happen.

Responsibilities

Architect & Automate: Design, build, and maintain our core infrastructure using Infrastructure as Code (IaC) principles. You'll be instrumental in evolving our CI/CD pipelines to ensure safe, rapid, and reliable releases.

Enhance Reliability & Scalability: Proactively identify and address performance bottlenecks, single points of failure, and scalability limits. You'll define and monitor Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to maintain and improve platform health.

Champion Observability: Implement and manage comprehensive monitoring, logging, and alerting systems (e.g., Prometheus, Grafana, ELK Stack) to provide deep insights into system behavior and ensure rapid incident detection.

Lead Incident Management: Participate in our on-call rotation, acting as a key player in incident response and resolution. You'll lead blameless post-mortems to identify root causes and implement preventative measures.

Collaborate & Empower: Work closely with software engineering teams to foster a culture of reliability. You'll provide guidance on building resilient services, implementing best practices for observability, and improving the developer experience.

Secure the Foundation: Implement and maintain security best practices across our cloud infrastructure, ensuring our platform is robust and compliant.

Job requirements

Minimum Requirements

  • 5+ years of hands-on experience with a major cloud provider, preferably AWS (EC2, S3, RDS, VPC, IAM, etc.).

  • Deep proficiency with tools like Terraform or CloudFormation to manage infrastructure declaratively.

  • Strong experience with Docker and container orchestration systems like Kubernetes (EKS) or ECS.

  • Proven ability to build, optimize, and manage CI/CD pipelines using tools like GitLab CI, Jenkins, or CircleCI.

  • Hands-on experience with modern monitoring and logging tools (e.g., Prometheus, Grafana, Loki, Alertmanager, ELK Stack).

  • Proficiency in at least one programming language, such as Go, Python, or Bash, for automation and tooling.

  • Excellent written and verbal communication skills, with a proven ability to work effectively and asynchronously in a distributed team environment.

Preferred Qualifications

  • Experience in the payments or FinTech industry.

  • Familiarity with service mesh technologies like Istio or Linkerd.

  • Experience with database administration (e.g., PostgreSQL, MySQL).

  • Knowledge of networking, security principles, and compliance standards (e.g., PCI DSS).

Details

Remote
  • CC.Talent - Pampanga
