Manage customer issues related to product installation, configuration, and implementation in a timely manner, communicating clearly and effectively and setting appropriate expectations with clients.
Automate repetitive tasks to improve operational efficiency and reduce manual intervention.
Provide primary operational support and engineering for large-scale distributed software applications.
Monitor and analyze system performance to keep applications performant and scalable.
Respond to incidents, perform root cause analysis, and implement preventive measures.
Implement and maintain a comprehensive monitoring and alerting system to ensure early detection of anomalies and issues.
Design, build, and manage deployment pipelines to facilitate seamless and reliable application releases.
Conduct regular performance testing and capacity planning to identify and address bottlenecks in the infrastructure.
Participate in on-call rotation and handle production incidents as necessary.
Ensure customers are effectively represented to the Product Management and Engineering teams by writing detailed, actionable defect reports and enhancement requests in Jira.

Skills and Experience:

Proven experience as a Site Reliability Engineer or a similar role in a large-scale production environment.
Strong expertise in scripting and automation using languages such as Python or Bash.
Strong Linux skills, including command-line tools, shell scripting, and system diagnostics.
Proficiency with cloud platforms (e.g., AWS, Azure, GCP) and container technologies (Docker, Kubernetes).
Excellent customer service skills, empathy, and a sense of urgency.
Deep understanding of monitoring and logging tools (e.g., Prometheus, Grafana, ELK Stack).
Knowledge of networking, security, and system administration.
Certification in relevant technologies (e.g., AWS Certified DevOps Engineer, Certified Kubernetes Administrator).
Experience with Infrastructure as Code (IaC) tools (e.g., Terraform).
Knowledge of Continuous Integration/Continuous Deployment (CI/CD) pipelines.
The ability to read source code (especially Scala) is a plus.
Previous experience with databases (Postgres and MongoDB) and data management systems.
Excellent problem-solving and communication skills, with the ability to work effectively in a team-oriented environment.

Education:


Site Reliability Engineer