Case study

CI/CD and testing for Nutanix Xi Epoch - design and development

Our client, Nutanix, is a global leader in cloud software and hyperconverged infrastructure solutions. One of the company’s products, Nutanix Xi Epoch, is a real-time observability tool for multi-cloud applications. It provides an instant view into the system, monitors its health and makes visible the interactions between the components in a distributed environment.

Challenge

Nutanix Xi Epoch is the full-stack monitoring and alerting service for every enterprise application at any scale across public clouds and private infrastructure. It generates terabytes of metrics and integrates with over 200 common applications to holistically understand application performance. The system is operations-ready from first launch.

The challenge was to build a continuous integration system that would be easy to use and simple to extend for engineers working around the world.

CodiLime’s DevOps teams helped us reduce our development cycle time by 50%, and its software engineering helped us improve the quality of our product by increasing our test coverage by more than 80% in crucial parts of our system. This allows us to deliver new features and capabilities seamlessly without compromising on simplicity to our global customer base. We’re looking forward to further collaboration with CodiLime as the Epoch product evolves with time.
Harjot Gill
General Manager
Nutanix Xi Epoch

Results & benefits

The benefits of the solution include:

  • Decreased full-build time from 2 hours to around 1 hour.
  • Easier identification of critical issues, with deployment of unstable code blocked automatically.
  • Several test layers, including smoke tests covering collectors.
  • Improved CI/CD quality - overall architecture is constructed from separate blocks with minimal dependencies.
  • More efficient build system - thanks to separated components and clean CI workflows.
  • Improved product quality - the team discovers and fixes issues immediately.

Solution

The project was split into two phases - designing the new CI/CD system and building the test automation framework.
The requirements and constraints for each phase included:

CI/CD

  • Run periodic, scheduled builds
  • Run full system builds or selected components only
  • Run partial builds triggered by pull request
  • Ensure high scalability and reliability
  • Complex build system
  • Multiple cross-dependencies with multiple management systems

Test automation

  • Complex, cloud-based architecture
  • Multiplatform framework needed
  • Smoke tests
  • Single component tests
  • End-to-end (e2e) tests
  • UI tests
  • High scalability and reliability

The first stage of the project involved migrating CI workflows from Jenkins to CircleCI. This was achieved by methodically gathering requirements and creating reusable resources to improve build reproducibility and reliability. Changing the toolchain also changed the build system’s structure and flows: build stages were separated so that they could run in parallel, which sped up the process and brought it closer to Agile and DevOps practices. Faster feedback, in turn, lets engineers work more effectively. CircleCI also helped us make the builds highly reproducible.
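The idea of separating build stages so that independent work runs in parallel can be illustrated with a small sketch. It is not the project's actual pipeline definition: the component names and their dependencies are hypothetical, and the grouping uses Python's standard graphlib module rather than any CircleCI feature.

```python
from graphlib import TopologicalSorter

# Hypothetical components mapped to the components they depend on.
# These names are assumptions for illustration, not the real Epoch layout.
DEPENDENCIES = {
    "base-image": set(),
    "agent": {"base-image"},
    "collector": {"base-image"},
    "ui": set(),
    "e2e-tests": {"agent", "collector", "ui"},
}


def parallel_stages(dependencies):
    """Group components into stages; everything in one stage can build in parallel."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises CycleError if the dependencies are circular
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # components whose dependencies are all built
        stages.append(ready)
        sorter.done(*ready)
    return stages


if __name__ == "__main__":
    for number, stage in enumerate(parallel_stages(DEPENDENCIES), start=1):
        print(f"stage {number}: {', '.join(stage)}")
```

Each stage only waits for the stages before it, so the fewer cross-dependencies the components have, the more of the build can run concurrently.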

The second stage involved preparing a scalable test automation framework that would be easy to integrate with a continuous integration system and that developers could launch locally against any built part of the system. This was achieved by mocking every part of the system and using a containerized environment (Docker and docker-compose) for the data aggregators and test execution.
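As a rough illustration of testing one component against a mocked neighbour, the sketch below starts a stand-in data aggregator, sends it one metric the way a collector might, and checks what arrived. It makes several assumptions: the mock runs in-process with Python's standard library instead of in a Docker container, and the `/metrics` path and the payload fields are hypothetical, not the real Epoch protocol.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class MockAggregator(BaseHTTPRequestHandler):
    """Stand-in for a data aggregator: records every JSON payload it receives."""

    received = []  # shared across requests; acceptable for a single-test sketch

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        MockAggregator.received.append(json.loads(self.rfile.read(length)))
        self.send_response(204)
        self.end_headers()

    def log_message(self, *args):
        pass  # keep test output quiet


def run_smoke_test():
    """Start the mock, post one hypothetical metric, return (status, payloads)."""
    MockAggregator.received.clear()
    server = HTTPServer(("127.0.0.1", 0), MockAggregator)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        connection = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        body = json.dumps({"metric": "cpu.load", "value": 0.42})
        connection.request("POST", "/metrics", body=body,
                           headers={"Content-Type": "application/json"})
        status = connection.getresponse().status
        connection.close()
    finally:
        server.shutdown()
        server.server_close()
    return status, list(MockAggregator.received)


if __name__ == "__main__":
    print(run_smoke_test())
```

Because the mock binds to a free local port, the same test can run on a developer machine or inside a CI job without extra setup.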

Diagram 1 presents the CI test architecture for the Collector. All components in this architecture, except the Agent, run in separate Docker containers connected using docker-compose.

Diagram 2 presents the scheme for periodic builds, which CircleCI executes at fixed intervals (currently every 6 hours). This process performs a full build of every component in the system, runs the components on Kubernetes and then executes all the tests. The test results are delivered to the user via Slack.
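The reporting step of such a periodic build could look roughly like the sketch below. It only builds the message; how the project actually posted results to Slack is not described here, so the function name, build identifier and suite names are hypothetical. The payload shape (a JSON object with a `text` field) is the one Slack incoming webhooks accept, and the cron expression is a standard five-field way of writing "every 6 hours".

```python
import json

# Standard cron expression for a 6-hour interval: minute 0 of hours 0, 6, 12 and 18.
# Where this would live in the real CI configuration is an assumption.
PERIODIC_BUILD_CRON = "0 0,6,12,18 * * *"


def slack_summary(build_id, results):
    """Build a Slack webhook payload from a {suite name: passed?} mapping."""
    failed = sorted(name for name, passed in results.items() if not passed)
    total = len(results)
    if failed:
        text = f"Periodic build {build_id} FAILED: {', '.join(failed)}"
    else:
        text = f"Periodic build {build_id} passed ({total}/{total} suites)"
    return {"text": text}


if __name__ == "__main__":
    example = {"smoke": True, "e2e": False, "ui": True}  # hypothetical results
    print(json.dumps(slack_summary("example-build", example)))
```

Listing only the failed suites keeps the notification short enough to act on directly from the channel.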
