DevSecOps Infrastructure for CI/CD
DevSecOps Infrastructure and Pipeline Development: Software projects use processes and tools to create continuous integration and deployment (CI/CD) pipelines. Examples may include multiple repository integration, workflow implementation, security scanning, and pipeline automation. What DevSecOps tools does your team use?
Offering Secure CI and Development Resources for Small Teams: Providing resources to empower development, whether for your individual project or an entire team, presents challenges and added security concerns. What open-source solutions and potential strategies are available that can offer a lower barrier to getting started without introducing additional risk for your project?
Maintaining Secure Scientific Software: Security concerns challenge development, deployment, and maintenance efforts across the spectrum. What challenges does your project face? What solutions are working? We encourage talks about sharing the work for your project, team, or facility.
CI/CD in Practice at Government Labs: Describe current usage and adoption of CI/CD for major and smaller scientific software projects at your lab. What is a minimal set of CI/CD for responsible scientific software development? Should funding proposals require a Scientific Software Management Plan? Regression test data can be significant. How do you handle regression test data: in the same repo, in a separate repo, or in a database of successful runs?
Tools and Infrastructure for Integrated Software Development in HPC: Enabling industry-standard DevSecOps practices across provided HPC facilities is a difficult challenge. Often, these systems are provided by operational facilities to an external community. Additionally, these systems are so advanced and specialized that their environment is not easily replicated anywhere else. These are some factors that make it challenging to adopt standard CI/CD practices when targeting these HPC systems. What tools and infrastructure would help us bridge this gap?
Heterogeneous System Workflows: Complete workflows often operate over several disparate compute platforms. This results in extra difficulties in moving data, organizing computations, viewing results, and archiving. A moderately complex example of this is creating a simulation input on a workstation, running the simulation(s) on an HPC machine, viewing results through a web portal, and then archiving for future use. Making these heterogeneous system workflows reliable and robust requires significant care and vigilance. What are some of the barriers preventing development of such workflows? How do you manage the complexity, especially when many things are out of your control (e.g., HPC down-time)?
Containers: Container technologies are improving, and many teams are exploring containers for application deployments and as infrastructure. Discuss the work at your site regarding container research and/or applications. What compatibility, portability, and performance characteristics exist? Are containers benefitting your workflows and providing a return on investment? What are the drawbacks to the adoption of containers?
Scaling CI/CD to HPC Systems: Software intended for use on HPC systems is often developed on personal workstations. Similarly, most if not all automated testing of software is also done on workstations. What strategies can be employed to ensure that code changes that work fine on personal workstations and test machines also work well on HPC systems at scale?
Overcoming Challenges with CI/CD
Dependency Management: Modular software development requires substantial investment in building and maintaining dependencies. What tools are part of your infrastructure to support these processes? Are you able to share dependency builds across projects? What opportunities are there to simplify these processes?
DevSecOps of Organizations: With an increasing focus on modular software, many projects now contain multiple, separately deployed, sub-projects. Often the DevSecOps needs of the sub-projects overlap and there is a need to avoid duplicating DevSecOps infrastructure. The common infrastructure must then be maintained on an organization level. How are organizations maintaining generalized infrastructure? What tooling gaps exist? Have any best practices emerged?
Software Configuration Management: A defensible and formal software configuration management plan provides a framework for increased quality, compliance with departmental policies, and unassailable application requirements and code changes throughout the development lifecycle. How do you manage and track software changes? Can you attribute all software changes to a given requestor, developer, tester, and approver?
Funding and Open-Source Pathways for Software Sustainability: How do we grow a sustainable community around software, including hard money funding sources and open-source development? Please share your ideas/approaches on this theme.
Incorporating Applications into Larger Workflows
Outage Recovery: Handling incidents within deployments in a structured, controlled, and recorded fashion is core to having a stable service. How are you handling these incidents, if at all, at your site or on your team?
Secure and Sustainable Cross-Facility Workflows: Discuss operational integration between facilities and workflow systems. While individual user access to facility-provided HPC systems is easily addressed, how do we facilitate external workflow system integration with facility-provided HPC systems? This requires new modes of access and trust between facilities and their external integrations.
Scientific Software Development Challenges
Software Engineering for Sustainable Scientific Software: Developers across the complex are adapting and applying software engineering best practices to the development of scientific software. Examples include test-driven development, Agile methodologies, version control, requirements engineering, and a graded approach to verification & validation. What are some of your practices for engineering sustainable scientific software? What strategies are being employed to invert the testing pyramid?
Coding for Graphics Processing Units (GPUs): Creating production code that runs efficiently on GPUs is a challenge. How is your team or site overcoming these challenges, and what challenges still need addressing? Are there lessons learned that can be shared? With a variety of vendor-provided accelerators available, one must use a programming model to take advantage of GPUs. There are many programming models available: OpenMP, SYCL/DPC++, CUDA, HIP, and so on. What are your experiences with various programming models?
Integrating Open-Source Software (OSS): Supporting a vibrant community of developers on a public code repository often means accepting contributions from unknown sources or individuals whose public accounts cannot be directly linked to their facility permissions. Though public resources are designed with these cases in mind (e.g., a Pull Request and isolated CI resources), the same is not true for many existing HPC systems. If CI/CD is enabled for existing systems, developers will need to undertake additional time-consuming responsibilities, which often remove its “continuous” aspects, to integrate testing with their external repository. Are there tools or workflows that could be integrated that minimize a developer’s required actions while still ensuring an HPC system’s assurance requirements are met?
Funding and Open-Source Pathways for Software Sustainability: How do we grow a sustainable community around software, including hard money funding sources and open-source development? How have you secured stable funding sources to promote open-source development? Please share your ideas/approaches on this theme.
AI/ML in DevSecOps
The Role of AI in Scientific Software Development: Tools like ChatGPT, GitHub Copilot, Ask Codi, and others are rapidly improving and present the opportunity for a unique shift in software development practices. It is important to understand the capabilities and risks associated with these tools in the context of scientific software workflows. How should these technologies be best integrated into our existing processes?
Data and Model Curation to Support Machine Learning (ML) Workloads
HPC System Management and DevSecOps
System Management: HPC system management is evolving with new software, techniques, testing, and deployment at large scale. Discuss how your site has adapted, best practices, DevSecOps, configuration management, and tools. What does the future of HPC system management look like? How does cloud native software fit into HPC system management?
Optimizing Jobs for HPC Systems: Jobs submitted to HPC systems are sometimes not optimized to “fit” the system, which limits the throughput of all the jobs submitted via a scheduling system. What is your site doing to help users optimize the fit of their jobs to the HPC system? What are your team’s or site’s best practices/configurations for using Slurm, PBS, etc.?
DevSecOps Principle 2 of Continuous Feedback
Programming the Data Center: Workflows in scientific computing are becoming more complex and depend on an increasing number of data center components, including compute, network, and storage. They are also incorporating automation to manage their complexity and to reduce their overall time to solution. Data centers and computing facilities that provide resources for these workflows must become more dynamic and programmable to enable, support, and advance this evolution. What methodologies and tools are your data centers using to adapt to this new model of scientific computing? What are the challenges with creating a programmable or software-defined data center?
Data-Centric Computing/Data Discovery: A lot of data is generated across tiering structures loosely falling within hot, warm, and cold access patterns, from testing applications to logs of users running on the systems. Talk about the research/exploration/application work ongoing at your sites on this topic.
Building Successful Teams
Managing Technical Teams: Managing a technical project can be learned from a textbook or by earning a certification, but managing a technical team comes with nuances that cannot be learned from other industries. The success and popularity of cross-functional agile teams introduce challenges to the Project Manager that were not previously evident with siloed teams of software engineers, system administrators, project managers, business analysts, etc. What are your experiences with managing cross-functional teams? What are ways to facilitate collaboration across multiple national laboratories, universities, and other government agencies on development of scientific software?
Research Software Engineering (RSE) as a Career: Please share your RSE career experience. How do you manage the sometimes competing demands of quality assurance requirements and research in developing scientific software? How do you help create this category at your site? How do you distinguish this career from that of a general software engineer?
Role of Software Developers in DevSecOps: Many decisions go into creating a scientific software package, ranging from the choice of primary programming language to source formatting. While some of these choices may be dictated by use cases, others are configurable. Are there decisions which ease DevSecOps? Are there decisions which make DevSecOps more complicated? How do we engage with developers so that software is designed to be easier to maintain? What resources do teams need to better understand the DevSecOps consequences of their decisions?
Successful Strategies for Sourcing and Mentoring Talent
Software Engineering Research
Metadata Management
Bringing Human Centered Design Approaches to Support Scientific and Engineering Workflows
Incorporating Technical Advancements to Support the Software Engineering Research Community Activities (such as VR/AR, IoT, Robotics, etc.)