Site Reliability Engineer


Our client is looking for a Site Reliability Engineer to join its rapidly growing company in support of multiple SaaS applications. You will be responsible for the cloud infrastructure, availability, reliability, performance, and security of production applications and systems.


9:00 AM – 6:00 PM Eastern Standard Time (10:00 PM – 7:00 AM Philippine Standard Time), follows Philippine holidays

9:00 AM – 6:00 PM Pacific Standard Time (1:00 AM – 10:00 AM Philippine Standard Time), follows Philippine holidays




·       Create, deploy, and maintain production infrastructure within AWS accounts using Infrastructure as Code (IaC) with Terraform

·       Utilize various AWS services, including EC2, EKS, RDS, Redshift, S3, and IAM

·       Create, implement, and maintain automated application releases using Bitbucket Pipelines

·       Create, implement, and maintain application and infrastructure performance monitoring using Datadog or Prometheus/Loki/Grafana

·       Create, implement, and maintain application and infrastructure availability monitoring using Datadog or Prometheus/Loki/Grafana

·       Apply security practices and policies to identify and remediate security vulnerabilities

·       Oversee incident response procedures, including analysis and documentation of incidents to prevent future occurrences


·       A 4-year college degree (technical or quantitative science) is preferred; equivalent work experience with evidence of proficiency and achievement in virtual infrastructure management is also accepted

·       3+ years' experience in cloud computing and Infrastructure as Code (IaC), specifically Terraform

·       Strong experience using Kubernetes

·       Experience with cloud-native tooling (Helm Charts, ArgoCD, HashiCorp Vault, Harbor, Reloader, Grafana, Prometheus, and Loki) is a plus

·       Experience with cloud-native analytics tools (Elasticsearch, MongoDB, Redshift/Snowflake, and Looker)

·       Any AWS certification is a big plus

·       Proficient in Linux system administration and security

·       Proficient with code versioning tools (e.g., Git, Bitbucket)

·       Proficient with CI/CD tools (e.g., Bitbucket Pipelines)

·       Proficient in scripting languages such as Bash and Python

·       Exposure to OpenTelemetry and distributed tracing

·       Awareness of recent industry trends related to observability and monitoring

·       Strong troubleshooting and problem-solving skills, with the ability to quickly diagnose and resolve complex issues

·       Excellent oral and written communication skills