Engineering

Site Reliability Engineer

Hyderabad
Work Type: Full Time
Responsibilities:
  • Manage customer issues related to the installation, configuration, and implementation of products in a timely manner, communicating clearly and setting appropriate expectations with clients.
  • Automate repetitive tasks to improve operational efficiency and reduce manual intervention.
  • Provide primary operational support and engineering for large-scale distributed software applications.
  • Monitor and analyze system performance, ensuring optimal performance and scalability.
  • Respond to incidents, perform root cause analysis, and implement preventive measures.
  • Implement and maintain a comprehensive monitoring and alerting system to ensure early detection of anomalies and issues.
  • Design, build, and manage deployment pipelines to facilitate seamless and reliable application releases.
  • Conduct regular performance testing and capacity planning to identify and address bottlenecks in the infrastructure.
  • Participate in on-call rotation and handle production incidents as necessary.
  • Represent customers effectively to the Product Management and Engineering teams by writing actionable, detailed defect reports and enhancement requests in Jira.

Skills and Experience:
  • Proven experience as a Site Reliability Engineer or in a similar role in a large-scale production environment.
  • Strong expertise in scripting and automation using languages like Python, Bash, or similar.
  • Strong Linux skills, including command-line tools, shell scripting, and system diagnostics.
  • Proficiency with cloud platforms (e.g., AWS, Azure, GCP) and container technologies (Docker, Kubernetes).
  • Excellent customer service skills, empathy, and a sense of urgency.
  • Deep understanding of monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack).
  • Knowledge of networking, security, and system administration.
  • Certification in relevant technologies (e.g., AWS Certified DevOps Engineer, Certified Kubernetes Administrator).
  • Experience with Infrastructure as Code (IaC) tools (e.g., Terraform).
  • Knowledge of Continuous Integration/Continuous Deployment (CI/CD) pipelines.
  • Ability to read source code (especially Scala) is a plus.
  • Previous experience with databases (Postgres and MongoDB) and data management systems.
  • Excellent problem-solving and communication skills, with the ability to work effectively in a team-oriented environment.

Education:

  • Bachelor's or Master's degree from a premier institute preferred.
  • Experience: 3-10 years.
