About this role

Company Description

Job Description

Responsibilities: 


• Perform deep dives into both systemic and latent reliability issues; partner with software and systems engineers across the organization to produce and roll out fixes. 

• Troubleshoot issues across the entire stack. Solve problems relating to mission-critical services and build automation to prevent recurrence, with the goal of automating the response to all non-exceptional service conditions.

• Identify and drive opportunities to improve automation 

• Engage in service capacity planning and demand forecasting, software performance analysis and system tuning. 

• Participate in periodic on-call duties.

• Represent the SRE team in design reviews and operational readiness exercises for new and existing services 


Minimum qualifications: 


• BS degree in Computer Science or related technical field, or equivalent practical experience. 

• At least 5 years of experience managing services in an internet-scale *nix environment.

• Practical knowledge of various aspects of service design, including messaging protocols & behavior, caching strategies and software design practices 

• Experience in one or more of: Java, Tomcat, Elasticsearch, MySQL, or scripting in Shell and Python.

• Experience working with Unix/Linux systems from kernel to shell and beyond, with experience working with system libraries, file systems, and client-server protocols. 

• Strong hands-on experience with configuration management tools like Ansible, Puppet, or Chef.

• Experience with network theory, e.g. TCP/IP, UDP, ICMP, MAC addresses, IP packets, DNS, OSI layers, and load balancing.

• Must work well with, and be able to influence, a wide range of personalities at all levels.

• Ability to prioritize tasks and work independently 

• Must be adaptable and able to focus on the simplest, most efficient & reliable solutions 

• Track record of successful practical problem solving, excellent written and interpersonal communication, and documentation skills 



Desired qualifications: 


• Expertise in designing, analyzing and troubleshooting large-scale distributed systems. 

• In-depth knowledge of operating systems (processes, threads, concurrency issues, locks, mutexes, semaphores, monitors and how they work). 

• Familiarity with algorithms, data structures and complexity analysis. 

• Hands-on Java and Apache optimization, performance tuning, and configuration.

• Systematic problem-solving approach, coupled with a strong sense of ownership and drive.

Qualifications

Linux Administration, Tomcat, Puppet

Additional Information

Multiple Openings 

Frequently Asked Questions

Is the salary disclosed for the Site Reliability Engineer position at jobsbridge1?
The salary for this Site Reliability Engineer role at jobsbridge1 is not publicly listed. Click "Apply Now" to learn more about the compensation package on their official careers page.
Where is the Site Reliability Engineer position at jobsbridge1 located?
This Site Reliability Engineer role at jobsbridge1 is based in San Francisco, CA, United States. The position is listed as on-site or hybrid. Check the full job description or apply directly to confirm the work arrangement.
Is the Site Reliability Engineer role at jobsbridge1 full-time or part-time?
This is listed as a full-time position. It is posted as a Site Reliability Engineer role at jobsbridge1.
How do I apply for the Site Reliability Engineer position at jobsbridge1?
Click the "Apply Now" button on this page. You will be redirected to jobsbridge1's official application portal, hosted on SmartRecruiters, where you can submit your application directly.
When was the Site Reliability Engineer job at jobsbridge1 posted?
This Site Reliability Engineer position at jobsbridge1 was posted on Dec 2, 2015. Apply as soon as possible; early applications are often reviewed first.