Systems Reliability Engineer

Houghton Mifflin Harcourt

Job Requisition ID: 7241

Software Engineering at HMH is focused on building fantastic software to meet the challenges facing teachers and learners, enabling and supporting a wide range of next-generation learning experiences. We design and build custom applications and services used by millions. We are creating teams full of innovative, eager software professionals to build the products that will transform our industry.

We are staffing new, small, self-contained teams of engineers with people who love solving problems and building high-quality products and services. We are looking for people with a wide range of skills who can contribute broadly, bringing the right tool to the right job. We use a wide range of technologies and are building a next-generation services platform that can make our learning tools and content available to all our products, and someday to the world. If you want to make a difference in the lives of students and teachers, and understand what it takes to deliver high-quality software, we would love to talk to you about this opportunity.

The Opportunity – Systems Reliability Engineer – Technical Services Team

Who We Are

The Bedrock Technical Services Team operate, develop, maintain and support the newest services delivery platform for Houghton Mifflin Harcourt. We focus on reliability, availability, performance and scalability of the newest efforts in the education space.

What You’ll Do

As a member of the Bedrock Technical Services Team, you will work with engineers and fellow team members worldwide to deliver reliable, scalable services running on Apache Mesos/Aurora. You will work closely with these engineers to develop the latest in cloud systems and software solutions in a rapidly developing space.

You will be involved in all levels of design, development and delivery of these applications and infrastructure, in addition to helping define the ever-evolving standards and processes that facilitate an efficient and healthy work environment.

Your responsibilities include, but are not limited to:

  • You will participate in team efforts to lower operating costs through improved design, effective implementation, elimination of break/fix behaviours and extensive automation.
  • You will identify, investigate and fix application performance, systemic inadequacies and latent reliability issues.
  • You will influence standardization and process refinement within the engineering organization.
  • Troubleshoot stack-wide engineering issues related to hardware, software, network, applications and cloud service providers.
  • You will mentor engineers and team members on methodology, standards, monitors and best practices.
  • Take part in peer code reviews, providing qualitative feedback and facilitating a learning environment through equitable exchange of ideas.
  • Represent the team's efforts and advances through internal presentations, blog articles and thorough documentation of these efforts.

Who You Are

  • Expert knowledge of Linux servers, specifically RHEL/CentOS/Amazon Linux.
  • Demonstrable knowledge of TCP/IP, security (application and systems) and experience supporting multi-tiered technologies.
  • Practical knowledge of a high-level scripting language (Python, etc.).
  • Practical knowledge of caching technologies, messaging protocols and software design and life cycle.
  • Practical knowledge of practices and use of source control, specifically git.
  • Practical understanding of application and systems design and an ability to communicate trade-offs, benefits and pitfalls of various decisions in those designs.
  • Ability to adapt and adjust to a rapidly changing landscape and shifting priorities.
  • Strong ability to work independently and prioritize tasks with little or no direction.
  • Demonstrable ability to troubleshoot effectively and efficiently, and to influence a disparate range of individuals through clear communication and decision making.
  • A critical thinker looking past the immediate concerns and distractions and considering the landscape of the future.
  • A passionate lifelong learner and innovator who recognizes that current technologies and problems are perishable and that change is inevitable.

Desired Skills

  • Practical experience in Java, Scala or Go.
  • Experience with, or a strong desire to learn, Apache Mesos/Aurora, ZooKeeper and Amazon Web Services.
  • Familiarity with open source software development culture, community and workflows.
  • Practical experience with monitoring and alerting workflows and technologies.

ABOUT US:
Houghton Mifflin Harcourt (NASDAQ:HMHC) is a global learning company dedicated to changing people’s lives by fostering passionate, curious learners. As a leading provider of pre-K–12 education content, services, and cutting-edge technology solutions across a variety of media, HMH enables learning in a changing landscape. HMH is uniquely positioned to create engaging and effective educational content and experiences from early childhood to beyond the classroom.  HMH serves more than 50 million students in over 150 countries worldwide, while its award-winning children’s books, novels, non-fiction, and reference titles are enjoyed by readers throughout the world.
For more information, visit http://careers.hmhco.com  
PLEASE NOTE:  
Houghton Mifflin Harcourt is an equal employment opportunity employer and participates in E-Verify. All qualified applicants will receive consideration for employment and will not be discriminated against on the basis of gender, race/ethnicity, gender identity, sexual orientation, protected veteran status, disability, or other protected group status.

To apply for this job, please visit the following URL: http://itjobpro.com/60514