Site Reliability Engineer

Mumbai

Created Date: 2026-06-19

End Date: 2026-08-17

Experience: 5-8 years

Salary: 3000000

Industry: Financial services / Banks / NBFC / BFSI

Openings: 1

Primary Responsibilities:

Site Reliability Engineering

  • Develop, automate, deploy, and maintain cloud infrastructure using Pulumi, Python, and Shell scripting.
  • Build and automate CI/CD pipelines for application and infrastructure deployments.
  • Deploy, manage, and upgrade Kubernetes environments (GKE) and tools such as Istio, Argo CD, and cert-manager.
  • Monitor, troubleshoot, and optimize cloud networking components including VPCs, subnets, DNS, firewall rules, peering, and Private Service Access.
  • Deploy, scale, monitor, and support microservices running on Kubernetes.
  • Manage cloud resources including virtual machines, cloud functions, and storage services.
  • Support and troubleshoot databases, caching, and messaging systems such as PostgreSQL, MongoDB, Redis, Spanner, and Confluent.
  • Configure and support authentication systems and API integrations.
  • Implement observability, monitoring, logging, and alerting solutions.
  • Participate in security audits and implement cloud security best practices.

Operational Excellence

  • Improve CI/CD processes, release management, rollback strategies, and production support.
  • Enhance monitoring, dashboards, logging, tracing, and alert quality.
  • Participate in incident management, troubleshooting, and root cause analysis.
  • Create and maintain technical documentation, runbooks, and architecture documents.
  • Track tasks, issues, and releases using project management tools.
  • Support continuous improvement initiatives and engineering best practices.

Leadership & Collaboration

  • Work closely with Software Engineering, QA, Security, and Infrastructure teams.
  • Provide technical guidance and support to junior SRE team members.
  • Mentor and assist Level 1 SREs.
  • Support cross-functional teams in delivering reliable and scalable solutions.

Experience Requirements:

  • 5+ years of experience in Site Reliability Engineering, DevOps, Cloud Engineering, or related roles.
  • Strong experience with cloud platforms, preferably Google Cloud Platform (GCP).
  • Hands-on experience with Kubernetes (GKE) and containerized environments.
  • Proficiency in Python, Shell Scripting, YAML, and JSON.
  • Experience building and maintaining CI/CD pipelines.
  • Strong Linux administration and troubleshooting skills.
  • Experience with infrastructure automation tools and APIs.
  • Knowledge of networking concepts including VPCs, DNS, firewalling, and cloud connectivity.
  • Experience with monitoring and observability platforms such as Prometheus, Datadog, Splunk, or Google Cloud Monitoring.
  • Strong understanding of Git, DevOps practices, and automation workflows.
  • Excellent English communication skills.

Preferred Qualifications

  • Experience with Istio Service Mesh, Argo CD, and cert-manager.
  • Knowledge of PostgreSQL, MongoDB, Redis, Spanner, and messaging platforms.
  • Familiarity with authentication and identity management systems.
  • Experience with security audits and cloud security best practices.
  • GCP certifications such as:
    • Professional Cloud DevOps Engineer
    • Professional Cloud Architect
    • Professional Cloud Security Engineer
  • Additional certifications in Python, DevOps, or observability platforms.

Location: Alliance Recruitment Agency UAE