At WorkFusion, we build software that is changing the world & transforming workplaces. Our technology automates repetitive, data-intensive work so people can be freed from the mundane to pursue the meaningful, companies can grow further, & customers can be served faster & better.
WorkFusion is increasingly recognized as the world leader in industry-specific process automation, offering AI-powered software with particular focus on the needs of banking, financial services, & insurance enterprises. Our Intelligent Automation Cloud combines RPA, machine learning & analytics in one unrivaled platform that can be deployed quickly & scale without limit. We compete in the world's fastest-growing software segment & we are growing at record pace with customers spanning the globe.
As the health & safety of our teams is always a primary concern, we are currently a largely remote workforce worldwide. Officially, our headquarters is in New York City (on Wall Street), with additional hubs in Canada, Europe & Asia.
The ideal candidate is self-driven, data-driven, & able to work in a distributed team. This professional has strong knowledge of Site Reliability Engineering & DevOps methodologies related to Delivery solutions & Platform Automation. In this role, you will lead the Site Reliability team, sharing your experience in the field with our Delivery, Support, Product Engineering, & Infrastructure teams. You will focus both on technical excellence & on ensuring the team quickly delivers value to customers who have deployed our software in production. The person who fills this role is a subject matter expert who excels in collaboration, open communication, & reaching across functional borders.
Responsibilities
Manage a team of site reliability engineers, providing guidance, mentorship, & technical leadership, & serve as the primary point of contact for coordination with the Support & Product organizations
Support Site Reliability/DevOps-driven solutions in cloud & on-premises environments, & troubleshoot issues with applications & middleware components
Own the process of carrying learnings, improvements, & fixes from the field back into the product & the WorkFusion documentation
Work on Ansible-based product installers & automation scripts
Build & support monitoring systems for the product, as well as highly available & scalable services
Troubleshoot MySQL & PostgreSQL databases, & tune the physical & virtual resources they run on for optimal performance
Architect & implement continuously improving high-availability (HA), disaster recovery (DR), & backup solutions
Create the vision for, & improve, the whole lifecycle of services, from inception & design through deployment, operation, & refinement. This includes researching gaps in automation & laying out a plan to close them.
Recommend & implement strategies, policies, & procedures by evaluating organizational outcomes, identifying problems, evaluating trends, & anticipating requirements.
Take ownership of customer success in managing the WorkFusion platform.
Partner with Delivery, Engineering, & Product to steer SRE alignment & strategy to ensure the reliability of WorkFusion platform deployments
Respond to client reliability concerns & drive agile problem resolution.
Lead resource management & efficiency strategy. Drive technical leadership development & recruitment within the organization.
Collaborate with our growing DevOps/Infra team to build & iterate on our infrastructure to improve reliability & performance
Identify parts of the system that do not scale, provide immediate palliative measures, & drive long-term resolution of these incidents.
Define Service Level Indicators (SLIs) that align the team around meeting availability & latency objectives.
Develop advanced knowledge of the platform's features & functionality
Provide expert L3 escalation support to minimize business outages.
Document solutions & techniques for resolving issues, ensuring information is available to the team through technical notes & the internal knowledge base
Requirements
Strong expertise (7-10 years) in administration & engineering of Linux & Windows operating systems (Amazon Linux, Red Hat, CentOS, Windows Server 2016)
Hands-on experience (>3 years) working with Tomcat or other Java servlet containers
Practical knowledge of administering & tuning web servers (Nginx), application servers, & databases (MySQL, PostgreSQL, MongoDB, MSSQL)
Proficiency in Bash (>3 years)
Familiarity with Windows systems & services (Microsoft SQL Server experience is a huge plus)
Strong knowledge of AWS, Azure, or GCP cloud services (EC2, ASG, LB, KMS, S3, Route 53, Azure LB)
Solid experience (>2 years) with Ansible or a similar configuration management tool
Deep understanding of CI/CD tools (Jenkins, Sonar, Nexus)
Experience with secrets management software (HashiCorp Vault), RabbitMQ, Marathon, & Mesos
Advanced storage knowledge
Scripting languages (Bash, PowerShell)
Monitoring & alerting experience (ELK)
Databases (MSSQL, MySQL, PostgreSQL)
Network administration, DNS, TCP/IP, security, & PKI certificate management
Would be a plus
Familiarity with the ELK stack, Grafana, or Splunk
Practical knowledge of HashiCorp Vault
Experience with Java development
Deep understanding of the Linux kernel & networking
What we offer
The smartest people in the industry & the most interesting product on the Belarusian market
Permanent (indefinite-term) employment contract
Opportunity to work remotely
Comprehensive social benefits package, including:
o Health insurance covering the best medical centers for you & your family
o Sports expense compensation
o Psychologist services compensation
o Professional & English language training
o Team activities