DevSecOps Modernization at NASA’s Application and Platform Services
Application and Platform Services
NASA's Application and Platform Services (APS) manages and modernizes critical enterprise applications and IT systems across the agency. It oversees the adoption of innovative technologies, cloud infrastructure, and DevSecOps practices to enhance efficiency, reduce costs, and improve security in NASA’s IT operations.
The Problem
In 2015, APS supported over 100 applications and services. Its approaches to software development, delivery, security, and maintenance had grown outdated, creating considerable obstacles that limited its effectiveness and its ability to take on more workloads.
Manual Code Deployment: Code was deployed manually by human operators across various environments, making the process error-prone, slow, and labor-intensive.
Mutable Infrastructure: Software ran on virtual machines that were manually updated in place over time. Configuration drift accumulated, so environments diverged from one another, leading to runtime inconsistencies.
Siloed Teams: Development, security, and operations teams worked in silos, with limited and often contentious communication. Most communication was not direct but routed through Jira.
Limited Traceability: Auditing code deployments was challenging. In many cases it was impossible to know which Subversion commit had actually been deployed to each environment.
The Solution
TekFive was tasked with devising and implementing a plan to address these shortcomings. We began by creating a comprehensive modernization strategy focused on adopting DevSecOps principles and transitioning APS to cloud-based infrastructure. As part of this effort, we performed an Opportunity Assessment that detailed the potential benefits and risks and laid out a high-level migration roadmap, emphasizing business value, workforce training, and an enhanced security posture.
A key element of the approach was to transition APS first to a private cloud. At the time, NASA had stricter requirements about which workloads could run off premises, and there was broader concern about non-Government infrastructure. This simpler path let NASA realize the benefits of the technology, people, and process transformation sooner while paving the way for an eventual Agency hybrid cloud solution.
TekFive partnered with the Marshall and Agency Computing Services (MACS) team to select Red Hat OpenShift as the hosting platform, combining MACS’s infrastructure expertise with TekFive’s platform integration skills. TekFive then crafted a detailed, multi-year Project Execution Plan, approved by NASA leadership, that outlined technical milestones and workforce training and targeted three critical areas: source code control modernization, CI/CD pipeline implementation, and legacy application transformation.
Migrating APS projects from Subversion to Git was important because of Git’s broad community adoption and its native integration with many Kubernetes tools. TekFive evaluated the available Git providers and chose GitLab for its community support, its open source code base, and the additional capabilities it provides, such as CI/CD pipelines. TekFive team members deployed GitLab and then developed a process for safely moving APS Subversion projects to GitLab while preserving their full commit history. Git’s conceptual model (distributed repositories, local commits, lightweight branching) differs significantly from Subversion’s, and some of these new concepts were difficult for team members to grasp. TekFive served as the Git subject matter experts, delivering numerous team and one-on-one training sessions and producing APS-specific Git documentation to bring developers up to speed.
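The migration process itself is not described in detail here. One common way to convert a Subversion repository while keeping its history is `git svn clone` with an `--authors-file`, which maps each Subversion username to a Git identity using lines of the form `svnuser = Full Name <email>`; git svn aborts if it meets a committer missing from that file. The Python sketch below is an illustration only, not TekFive’s actual tooling: it collects usernames from `svn log --xml` output and renders an authors file. The helper names, the `directory` lookup, and the `example.com` placeholder identities are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def svn_authors(log_xml: str) -> List[str]:
    """Return the unique, sorted Subversion usernames in `svn log --xml` output."""
    root = ET.fromstring(log_xml)
    names = set()
    for entry in root.iter("logentry"):
        # Some revisions (e.g. repository creation) carry no <author>; skip them.
        author = (entry.findtext("author") or "").strip()
        if author:
            names.add(author)
    return sorted(names)


def authors_file(usernames: List[str], directory: Dict[str, str]) -> str:
    """Render git-svn authors-file lines of the form `svnuser = Full Name <email>`.

    `directory` is a hypothetical lookup from SVN username to "Full Name <email>".
    Users missing from it get a placeholder identity so that every committer
    appears in the file and git svn does not abort mid-import.
    """
    lines = []
    for user in usernames:
        identity = directory.get(user, f"{user} <{user}@example.com>")
        lines.append(f"{user} = {identity}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = """<?xml version="1.0"?>
<log>
  <logentry revision="2"><author>jdoe</author><msg>Fix form</msg></logentry>
  <logentry revision="1"><author>asmith</author><msg>Initial import</msg></logentry>
</log>"""
    users = svn_authors(sample)
    # Prints one mapping line per user; asmith falls back to a placeholder.
    print(authors_file(users, {"jdoe": "Jane Doe <jane.doe@example.com>"}), end="")
```

Assuming the conventional trunk/branches/tags layout, the resulting file could then be passed to `git svn clone --stdlayout --authors-file=authors.txt <repository URL>` before the converted repository is pushed to GitLab.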
Implementing a CI/CD pipeline was key because it would help eliminate many of APS’s issues, including manual code deployments, siloed teams, and limited traceability. We originally envisioned the pipeline enabling teams to practice Continuous Deployment, but this proved difficult to get approved within NASA. NASA leadership still wanted a full chain of manual control that explicitly approved each release. We also found that, even with automation and approval in place, many NASA application owners still wanted their applications deployed during the weekly outage window. Finding an off-the-shelf CI/CD platform that supported these requirements proved difficult, so TekFive instead implemented a custom pipeline platform that worked alongside GitLab’s CI capabilities to provide the following features:
- Automatic and continual container image scanning for known vulnerabilities.
- Dynamic application security testing (DAST) and accessibility scanning of web applications.
- Publishing of pipeline scan results to the security team’s system of record.
- A human-in-the-loop approval stage that allowed NASA officials to manually sign off on all staging and production releases.
- Scheduling to automate code deployment at a specified day and time.
- Full traceability of each deployment back to its Git commit.
- Automatic code rollbacks.
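The platform’s internals are not published. As a rough illustration of how the approval stage, scheduled deployment, commit traceability, and rollback features might fit together, here is a minimal Python sketch. Every class, field, and function name is hypothetical, and a real implementation would sit on top of GitLab CI and OpenShift rather than in-memory objects.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional, Tuple

# Hypothetical: environments whose releases need a named approver.
GATED_ENVIRONMENTS = {"staging", "production"}


@dataclass
class Release:
    """A hypothetical release request; all field names are illustrative."""
    app: str
    environment: str
    commit_sha: str                    # Git commit the container image was built from
    scans_passed: bool                 # outcome of image, DAST, and accessibility scans
    scheduled_at: datetime             # e.g. a slot in the weekly outage window
    approved_by: Optional[str] = None  # official who signed off, if anyone


def ready_to_deploy(release: Release, now: datetime) -> bool:
    """True only if scans passed, any required approval exists,
    and the scheduled deployment time has arrived."""
    if not release.scans_passed:
        return False
    if release.environment in GATED_ENVIRONMENTS and release.approved_by is None:
        return False
    return now >= release.scheduled_at


@dataclass
class DeploymentLedger:
    """Tracks which commit is live in each environment, oldest first."""
    history: Dict[Tuple[str, str], List[str]] = field(default_factory=dict)

    def record(self, release: Release) -> None:
        key = (release.app, release.environment)
        self.history.setdefault(key, []).append(release.commit_sha)

    def live_commit(self, app: str, environment: str) -> Optional[str]:
        commits = self.history.get((app, environment), [])
        return commits[-1] if commits else None

    def rollback(self, app: str, environment: str) -> Optional[str]:
        """Drop the live deployment and return the previous commit to redeploy,
        or None (leaving history unchanged) if there is nothing to roll back to."""
        commits = self.history.get((app, environment), [])
        if len(commits) < 2:
            return None
        commits.pop()
        return commits[-1]


if __name__ == "__main__":
    slot = datetime(2020, 1, 11, 22, 0)  # hypothetical outage-window slot
    ledger = DeploymentLedger()
    first = Release("timecard", "production", "a1b2c3d", True, slot, approved_by="J. Doe")
    second = Release("timecard", "production", "e4f5a6b", True, slot)
    print(ready_to_deploy(first, datetime(2020, 1, 11, 21, 0)))  # False: too early
    print(ready_to_deploy(first, slot))                          # True
    print(ready_to_deploy(second, slot))                         # False: unapproved
    ledger.record(first)
    second.approved_by = "J. Doe"
    ledger.record(second)
    print(ledger.rollback("timecard", "production"))             # a1b2c3d
```

In this sketch, keying the ledger by application and environment turns the traceability question from the problem statement (which commit is running where) into a simple lookup, and a rollback amounts to redeploying the previously recorded commit.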
At the time of this transformation, APS supported roughly 100 applications. TekFive led the effort to transition these legacy applications so they could be deployed through the pipeline, run on OpenShift, and take advantage of both. Early in the process, TekFive piloted the transformation of several applications. This gave us early and continual feedback on the operation and usability of the DevSecOps platform and served as the blueprint for the remaining application transformations. TekFive worked directly with each team to provide guidance and expertise on how OpenShift and the CI/CD platform work and how their applications should be modified. TekFive was also responsible for transitioning "orphaned" applications that were not associated with an active team. We took a Minimal Viable Transformation (MVT) approach, modifying only what was necessary for each application to be deployed and run on the new platforms. This shortened the transition period and made it more stable while still capturing much of the benefit of the new approach.
The Benefits
Overall, the transition to DevSecOps and private cloud infrastructure has been a major success for APS. Average time to release dropped from 3 months to 2 weeks, and annual savings exceed $2M in labor and infrastructure costs. Other key metrics include:
- 30% reduction in infrastructure costs.
- Retired 73 virtual machines.
- Allowed APS to increase supported workloads by 200% with no increase in staffing.
- 90% of security vulnerabilities are now caught before deployment; previously, all were caught after deployment.
- 80% average reduction in development lifecycle time, measured from first commit to production deployment.
Once APS had fully transitioned to a DevSecOps approach on private cloud infrastructure, TekFive began conducting pilots with AWS, Azure, and GCP to incorporate public cloud services into the existing infrastructure and supplement APS’s offerings. As a result, services from all three public clouds are now available to APS customers, and AWS and Azure also provide active/active, active/passive, and/or cloud-bursting capabilities for the Agency.