Jobs at The DarkStar Group, LLC

Site Reliability Engineer (TS/SCI + CI Poly)

Herndon, VA

Description

The DarkStar Group is seeking a Site Reliability Engineer with a TS/SCI + CI Poly clearance to join one of our top projects in Herndon, VA. Below is an overview of the project, as well as information on our company, our benefits, and our $25,000 referral program.

THE PROJECT

The DarkStar Group's team solves unique and challenging intelligence problems for a Special Operations customer. This work is as close to the mission as a technologist can get, so the environment is fast-paced: team members face rapidly changing requirements and priorities as mission needs evolve. If you hate monotony and want to use your skills to have a direct impact on real-world operational success, this is the project for you.

We are a multi-faceted software development and systems administration team working to build and maintain software applications backed by a self-managed cloud infrastructure (OpenStack) with a true big-data footprint (over 10 petabytes). Our diverse background of experience in mission support and software development serves as a catalyst to solve unique and challenging intelligence problems in support of special operations analysts and their ongoing activities. Prototyping and frequent, iterative feedback are core to our delivery approach, anchored by a need to work quickly in support of our missions.

The technical stack is quite robust and includes Java, Python, C#, C/C++, Geospatial tools, Big Data and Graph Products (Hadoop, MapReduce, Spark, Elasticsearch, Neo4j), Linux, OpenStack, AWS, Ansible, SQL/NoSQL, Text Processing, Cloud Services, Containerization, Infrastructure as Code (IaC), and more.

Work on this program takes place in the Herndon, VA area (we cannot support remote work) and requires a TS clearance and a willingness to obtain a CI Poly; a current TS/SCI + CI Poly is preferred.

THE ROLE

The DarkStar Group is seeking a Site Reliability Engineer (SRE) for our OpenShift PaaS organization. In this role, you will be responsible for ensuring the availability, performance, and scalability of our OpenShift environments. You will collaborate with development, operations, and product teams to automate processes, build robust monitoring systems, and enhance the overall reliability of our platforms.

Key Responsibilities:

System Reliability & Scalability: Design, implement, and maintain highly available OpenShift clusters to support mission-critical applications.
Automation & Infrastructure as Code (IaC): Develop and maintain automation scripts and tools to streamline deployment, scaling, and recovery processes using tools like Ansible, Terraform, and Helm.
Monitoring & Incident Management: Build and enhance monitoring and alerting systems (e.g., Prometheus, Grafana, ELK). Respond to and resolve incidents, conducting post-mortem analyses to identify root causes.
Performance Optimization: Analyze and optimize system performance, ensuring minimal latency and maximum throughput.
Collaboration: Work closely with development teams to implement DevOps best practices, CI/CD pipelines, and platform enhancements.
Security & Compliance: Ensure platforms meet security and compliance requirements by integrating tools for vulnerability scanning, policy enforcement, and logging.

Required Skills

Bachelor’s degree in Computer Science or Engineering, or equivalent experience.
Minimum of 5 years of experience as an SRE, DevOps Engineer, or in a related role.
Expertise in OpenShift or Kubernetes platform administration.
Strong knowledge of Linux systems, networking, and containerization technologies (Docker).
Proficiency in scripting languages such as Python, Bash, or Go.
Experience with CI/CD pipelines (e.g., Jenkins, GitLab CI/CD).
Familiarity with monitoring and logging tools like Prometheus, Grafana, ELK, or Splunk.

Desired Skills (Optional)

OpenShift certification (e.g., Red Hat Certified Specialist in OpenShift Administration).
Experience with cloud platforms (AWS, Azure, or GCP).
Knowledge of service mesh technologies (Istio, Linkerd).
Strong understanding of microservices and distributed systems architecture.

About The DarkStar Group

Our Company

The DarkStar Group is a small business that solves BIG problems. We're one of the Inc. 5000 fastest-growing private companies in the US, and our engineers and scientists support the most critical national security missions in Virginia, Maryland, and elsewhere. Data Science, Software Engineering, Cloud/AWS Infrastructure, and Cyber/CNO are our core areas of expertise. We offer interesting and important work, job security, some of the best and most flexible benefits you'll find in the IC, and salaries so strong that they'll likely surprise you.

Our Benefits

The DarkStar Group offers exceptional compensation and benefits:

very strong salaries;
100% company-paid medical, dental, and vision premiums for you and all dependents;
the ability to get increased salary if you don't need medical/dental/vision;
100% company-paid disability and life insurance benefits;
a generously-funded HSA;
an 8% 401(k) contribution;
31 days of PTO/holidays to start (more with tenure);
the ability to flex time across pay periods without using your PTO;
a generous training budget;
$25,000 employee referral bonuses;
business development / growth incentives; and
top-notch company swag.

** We have a huge growth opportunity, so we are offering up to a $25,000 reward for anyone new you refer whom we hire. **

All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or status as a protected veteran.
