Site Reliability Engineer
Job Description
Responsibilities:
• Perform deep dives into both systemic and latent reliability issues; partner with software and systems engineers across the organization to produce and roll out fixes.
• Troubleshoot issues across the entire stack. Solve problems relating to mission-critical services and build automation to prevent problem recurrence, with the goal of automating the response to all non-exceptional service conditions.
• Identify and drive opportunities to improve automation
• Engage in service capacity planning and demand forecasting, software performance analysis and system tuning.
• Participate in periodic on-call duties.
• Represent the SRE team in design reviews and operational readiness exercises for new and existing services.
Minimum qualifications:
• BS degree in Computer Science or related technical field, or equivalent practical experience.
• 5+ years of experience managing services in an internet-scale *nix environment.
• Practical knowledge of various aspects of service design, including messaging protocols & behavior, caching strategies and software design practices
• Experience in one or more of: Java, Tomcat, Elasticsearch, MySQL, or scripting in Shell and Python.
• Experience working with Unix/Linux systems from kernel to shell and beyond, with experience working with system libraries, file systems, and client-server protocols.
• Strong hands-on experience with configuration management tools like Ansible, Puppet, or Chef.
• Knowledge of networking fundamentals, e.g. TCP/IP, UDP, ICMP, MAC addresses, IP packets, DNS, OSI layers, and load balancing.
• Must work well with, and be able to influence, a wide range of personalities at all levels.
• Ability to prioritize tasks and work independently
• Must be adaptable and able to focus on the simplest, most efficient & reliable solutions
• Track record of successful practical problem solving, excellent written and interpersonal communication, and documentation skills
Desired qualifications:
• Expertise in designing, analyzing and troubleshooting large-scale distributed systems.
• In-depth knowledge of operating systems (processes, threads, concurrency issues, locks, mutexes, semaphores, monitors and how they work).
• Familiarity with algorithms, data structures and complexity analysis.
• Hands-on Java and Apache optimization, performance tuning, and configuration.
• Systematic problem-solving approach, coupled with a strong sense of ownership and drive.
Qualifications
Linux Administration, Tomcat, Puppet
Additional Information
Multiple Openings
Frequently Asked Questions
How do I apply for the Site Reliability Engineer position at jobsbridge1?
You'll be redirected to jobsbridge1's official application page on SmartRecruiters.