Site Reliability Engineer (SRE)
Thailand, Bangkok | Full Time | Technology
Strong knowledge of and experience in the following are required:
- Experience negotiating SLOs/SLIs with product owners
- Experience in building highly available & observable systems at scale
- Proven track record as a Site Reliability Engineer at a managerial level
- Implement and improve SRE principles by working with Infra/DevOps members and engineers across the wider organization to spread SRE knowledge and best practices
- Act as a multi-hat team member with a software and systems engineering mindset and a passion for system reliability and observability
- Build reliability as a feature into our core infrastructure and applications
- Knowledge of scalable production architectures (config management, monitoring, infrastructure as code, load balancing, CDNs, distributed systems)
- Experience with cloud infrastructure (e.g. AWS, Alibaba Cloud), Kubernetes, and most of the following technologies: Helm, Docker, Terraform, Graylog, Prometheus, Jaeger, Kafka/RabbitMQ
- Good understanding of SLI, SLO, and SLA concepts
- Experience in using data/metrics/logs to diagnose and troubleshoot complex systems
- Experience as a software developer, preferably polyglot (C#, Python, or Go)
- Ability to work anywhere in the stack
- Knowledge of operating system internals
- Familiarity with operations: metrics/statistics, incident management, postmortems, etc.
- Good understanding of MTTD, MTTR, and MTBF metrics
- Have "automate things, remove toil" in your DNA
- Strong passion for observability and knowledge sharing
We offer an attractive remuneration package, a fast-paced and exciting working environment, and challenging opportunities for lifelong learning and career development.
Interested candidates are asked to apply via this job post, providing a comprehensive resume that states their current and expected salary packages.
Please note that only shortlisted candidates will be notified.