Themes

Security in DevOps

Security and DevOps (DevSecOps): Security concerns challenge development, deployment, and maintenance efforts across the spectrum. What challenges is your site facing? What solutions are working? We encourage talks that share the work at your site or on your team.

Understanding New Technologies for DevOps

Containers: Container technologies are improving, and many teams are exploring containers for application deployments and as infrastructure. Talk about the work at your site regarding container research and/or application.

The Role of AI in Scientific Software Development: Tools like ChatGPT, GitHub Copilot, Ask Codi, and others are rapidly improving and present the opportunity for a unique shift in software development practices. It is important to understand the capabilities and risks associated with these tools in the context of scientific software workflows. How can these technologies best be integrated into our existing processes?

Overcoming Challenges with Continuous Integration and Continuous Deployment

Dependency Management: Modular software development requires substantial investment in building and maintaining dependencies. What tools are part of your infrastructure to support these processes? Are you able to share dependency builds across projects? What opportunities are there to simplify these processes?

DevOps of organizations: With an increasing focus on modular software, many projects now contain multiple, separately deployed sub-projects. Often the DevOps needs of the sub-projects overlap, and there is a need to avoid duplicating DevOps infrastructure. The common infrastructure must then be maintained at the organization level. How are organizations maintaining generalized infrastructure? What tooling gaps exist? Have any best practices emerged?

Software Configuration Management: A defensible and formal software configuration management plan provides a framework for increased quality, compliance with departmental policies, and unassailable application requirements and code changes throughout the development lifecycle. How do you manage and track software changes? Can you attribute all software changes to a given requestor, developer, tester, and approver?

Funding and Open-Source Pathways for Software Sustainability: How do we grow a sustainable community around software, including hard money funding sources and open-source development? Please share your ideas and approaches on this theme.

DevOps Infrastructure for CI/CD

DevOps infrastructure and pipeline development: Software projects use processes and tools to create continuous integration and deployment (CI/CD) pipelines. Examples may include multiple repository integration, workflow implementation, security scanning, and pipeline automation. What DevOps tools does your team use?

CI/CD in Practice at Government Labs: Describe current usage and adoption of CI/CD for major and smaller scientific software projects at your lab. What is a minimal set of CI/CD practices for responsible scientific software development? Should funding proposals require a Scientific Software Management Plan? Regression test data can be significant. How do you handle it: in the same repo, in a separate repo, or in a database of successful runs?

Tools and Infrastructure for Integrated Software Development in HPC: Enabling industry-standard DevOps practices across HPC facilities is a difficult challenge. Often, these systems are provided by operational facilities to an external community. Additionally, these systems are so advanced and specialized that their environments are not easily replicated anywhere else. These factors make it challenging to adopt standard CI/CD practices when targeting these HPC systems. What tools and infrastructure would help us bridge this gap?

Heterogeneous System Workflows: Complete workflows often operate over several disparate compute platforms. This results in extra difficulties in moving data, organizing computations, viewing results, and archiving. A moderately complex example of this is creating a simulation input on a workstation, running the simulation(s) on an HPC machine, viewing results through a web portal, and then archiving for future use. Making these heterogeneous system workflows reliable and robust requires significant care and vigilance. What are some of the barriers preventing development of such workflows? How do you manage the complexity, especially when many things are out of your control (e.g., HPC downtime)?

Incorporating Applications After Deployment into Larger Workflows

Outage Recovery: Handling incidents within deployments in a structured, controlled, and recorded fashion is core to having a stable service. How are you handling these incidents, if at all, at your site or on your team?

Scientific Software Development Challenges

Research Software Engineering as a Career: In this session, veteran DevOps professionals and software engineers from national labs will share their perspectives on the unique career challenges and opportunities in national labs. How does one manage the sometimes-competing demands of quality assurance requirements and research in developing scientific software?

Software engineering for sustainable scientific software: Developers across the complex are adapting and applying software engineering best practices to the development of scientific software. Examples include test-driven development, Agile methodologies, version control, requirements engineering, and a graded approach to verification and validation. What are some of your practices for engineering sustainable scientific software? What strategies are being employed to invert the testing pyramid?

Coding for Graphics Processing Units (GPUs): Creating production code that runs efficiently on GPUs is a challenge. How is your team or site overcoming these challenges, and what challenges still need addressing? Are there lessons learned that can be shared? With a variety of vendor-provided accelerators, one must choose a programming model to take advantage of GPUs. There are many programming models available: OpenMP, SYCL/DPC++, CUDA, HIP, and so on. What are your experiences with various programming models?

Integrating Open-Source Software (OSS): Supporting a vibrant community of developers on a public code repository often means accepting contributions from unknown sources or individuals whose public accounts cannot be directly linked to their facility permissions. Though public resources are designed with these cases in mind (e.g., pull requests and isolated CI resources), the same is not true for many existing HPC systems. If CI/CD is enabled for existing systems, developers must take on additional, time-consuming responsibilities to integrate testing with their external repository, which often removes its “continuous” aspects. Are there tools or workflows that could be integrated to minimize a developer’s required actions while still ensuring an HPC system’s assurance requirements are met?

HPC System Management and DevOps

System Management: HPC system management is evolving with new software, techniques, testing, and deployment at large scale. Discuss how your site has adapted, best practices, DevOps, configuration management, and tools. What does the future of HPC system management look like? How does cloud native software fit into HPC system management?

Optimizing jobs for HPC systems: Jobs submitted to HPC systems are not always optimized for “fit,” which limits how well the scheduling system can increase the throughput of submitted jobs. What is your site doing to help users optimize the fit of their jobs to the HPC system?

DevOps Principle 2 of Continuous Feedback

Programming the data center: Workflows in scientific computing are becoming more complex and depend on an increasing number of data center components, including compute, network, and storage. They are also incorporating automation to manage their complexity and to reduce their overall time to solution. Data centers and computing facilities that provide resources for these workflows must become more dynamic and programmable to enable, support, and advance this evolution. What methodologies and tools are your data centers using to adapt to this new model of scientific computing? What are the challenges with creating a programmable or software-defined data center?

Data-Centric Computing/Data Discovery: A lot of data is generated across tiering structures that loosely fall within hot, warm, and cold access patterns, from testing applications to logs of users running on the systems. Talk about the research, exploration, and application work ongoing at your site on this topic.

Building Successful Teams

Human Resource Management for Technical Teams: Managing a technical project can be learned from a textbook or by earning a certification, but managing a technical team comes with nuances that cannot be learned from other industries. The success and popularity of cross-functional agile teams introduce challenges for project managers that were not previously evident with siloed teams of software engineers, system administrators, project managers, business analysts, etc. What are your experiences with managing cross-functional teams? What are ways to facilitate collaboration across multiple national laboratories on development of scientific software?