
Site Reliability Engineer | Site Reliability Engineer SDET

SRE SDET Responsibilities:

Automation: Develop and maintain automation tools and frameworks to streamline deployment, configuration management, and routine operational tasks. This requires programming skills, the ability to read and understand source code, and experience designing and developing test automation frameworks.
Development Collaboration: Work closely with software development teams to drive adoption of QE practices.
Documentation: Maintain clear and up-to-date documentation of system configurations, procedures, and troubleshooting guides.
Continuous Improvement: Stay updated with industry trends, emerging technologies, and best practices in Site Reliability Engineering to suggest and implement improvements in processes and systems.
Test Case Creation and Risk Management: Define standards for test case development and risk management strategies to enhance the reliability and performance of our systems.
Performance Optimization: Identify performance bottlenecks, conduct capacity planning, and optimize system performance to deliver a highly responsive and reliable user experience.

Minimum Qualifications / Skills:

Bachelor's degree in Computer Science, Information Technology, or a related field (or equivalent practical experience).
Proficiency in programming (Java), preferably along with a scripting language (JavaScript), to automate tasks and develop tools.
Strong knowledge of cloud computing platforms (e.g., AWS, GCP, Azure) and containerization technologies (e.g., Docker, Kubernetes).
Experience with version control systems (e.g., Git).
Ability to work collaboratively in cross-functional teams and excellent communication skills.
Experience with Selenium or other automation tools, and willingness to learn new tools.

Preferred Qualifications / Skills:

Master's degree in Computer Science, Information Technology, or a related field.
Certifications in cloud platforms (e.g., AWS Certified DevOps Engineer, Google Cloud Professional DevOps Engineer).
Experience with microservices architecture and continuous integration/continuous delivery (CI/CD) pipelines.
Knowledge of database management systems (e.g., MySQL, PostgreSQL, MongoDB).
Understanding of cybersecurity principles and best practices.

Site Reliability Engineer Responsibilities:

Apply observability principles within our infrastructure and software.
Apply SLIs, SLOs, metrics, and error budgets.
Develop and implement comprehensive observability with monitoring, logging, and alerting technologies.
Establish reliable Release Engineering best practices, including releases, feature toggling, and rapid, safe rollbacks.
Collaborate with feature teams to ensure reliability practices are integrated into development.
Eliminate recurring incidents and implement continuous learning through blameless post-mortems and follow-up actions.
Participate in improving incident management through the application of best practices.
Develop reliability software patterns and practices (e.g., circuit breaker, retry).
Help achieve the goal of "push on green" for deployments.

Skills Required:

Strong experience in software engineering and reliable coding practices.
Strong experience with SLOs, metrics, logging, and tracing.
Proven record of accomplishment in automation tooling.
Excellent understanding of modern software development practices, tools, and technologies.
A passion for getting things done, balancing short- and long-term needs.
Ability to influence software engineers on reliability.
Influence and technical leadership capabilities to drive change in environments, manage stakeholders, and achieve cross-functional alignment.
Strong DevOps fundamentals, with a preference for Java, Golang, microservices, and cloud technologies (AWS, Azure, and GCP).
Excellent written and verbal communication, problem-solving, and teamwork skills.


Trinity Workforce Solutions, Inc.
Makati, Metro Manila
Full time

Published on 04/25/2024
