
Bixby Load Test Analysis

Background

Bixby is the core media server backbone for Airtime’s distributed real-time communications framework. It routes audio and video data to clients as needed and generates raw metrics that are used to create usage and billing reports for third-party applications.

A Bixby Load Test is started manually on Jenkins against a given Bixby version and does the following:

  1. Deploys the specified Bixby version to the automation host and runs tests that collect probe data on how CPU, network, and memory are utilized across different publisher-to-subscriber scenarios
  2. Parses and averages each publisher-to-subscriber test to give an intuitive reading of how the Bixby server handles a test over a set period of time
  3. Generates a CSV file that contains the Bixby Load Test result for the specific build
  4. Lastly, the CSV file is used by Jenkins to plot figures, similar to the following, for CPU, network, and memory usage.
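Steps 1–2 above amount to averaging each probed attribute over the duration of a test run. Below is a minimal sketch of that averaging; the sample values and attribute names are made up for illustration and do not match the real probe fields:

```python
from statistics import mean

# Hypothetical probe samples for one publisher-to-subscriber test:
# each sample holds CPU-user %, network receive rate, and available memory.
samples = [
    {"cpu_user": 12.0, "net_rx": 950_000, "mem_avail": 2048},
    {"cpu_user": 14.0, "net_rx": 1_050_000, "mem_avail": 2040},
    {"cpu_user": 13.0, "net_rx": 1_000_000, "mem_avail": 2044},
]

# Average each probed attribute over the test's duration,
# producing one summary row per test for the CSV.
averages = {key: mean(s[key] for s in samples) for key in samples[0]}
print(averages)
```

Each test type then contributes one such averaged row to the CSV that Jenkins plots.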

Problem

Image of the current Jenkins static CPU graph

The graphs generated by Jenkins, as shown above, are neither user-friendly nor interactive. Although the data is displayed, it is difficult to interpret how each Bixby version changed the probed attributes and to compare behavior between different Bixby branches. Additionally, Jenkins does not have an intuitive way to navigate between different test types. Lastly, there is currently no way of disregarding or marking a build result as invalid, so the ability to archive a build from the front end is desired.

Goals

Part One: store load test results in a persistent fashion.
Part Two: generate a data visualization that allows users to compare performance across Bixby versions, as well as against historical data.
Part Three: mathematically categorize whether recent builds conform to the trends of previous builds.

Part One: Using AWS RDS to store test results
To store test results persistently, I wrote a Python script, called at the end of a successful Jenkins build, that connects to the AWS Relational Database Service (AWS RDS) and appends the most recent build result to the Bixby Load Test table. MariaDB is the underlying database run by AWS RDS. Automated daily snapshots of the database are captured and stored for 30 days in case of an upload failure or data corruption.

The columns currently store the type of test performed, the timestamp of when the data was uploaded, the Bixby version built against, four separate CPU usage values (idle, nice, system, user), two network behavior values, one available-memory value, the Jenkins URL associated with the build, and a notes attribute that can hold any additional information about the build.
It was determined that the data persistence step would run at the end of a load test and use the outputted CSV file to update the database.
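As a rough sketch of that integration, the snippet below parses a one-row CSV and appends it to a table. SQLite is used here so the example is self-contained; the production script connects to MariaDB on RDS with a MariaDB driver instead, and the column names, CSV layout, and URL are all illustrative assumptions, not the actual schema:

```python
import csv
import io
import sqlite3

# Stand-in for the RDS MariaDB table; column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE load_test (
    test_type TEXT, uploaded_at TEXT, bixby_version TEXT,
    cpu_idle REAL, cpu_nice REAL, cpu_system REAL, cpu_user REAL,
    net_rx REAL, net_tx REAL, mem_avail REAL,
    jenkins_url TEXT, notes TEXT)""")

# A hypothetical one-row load_test.csv produced by a Jenkins build.
csv_text = """test_type,uploaded_at,bixby_version,cpu_idle,cpu_nice,cpu_system,cpu_user,net_rx,net_tx,mem_avail,jenkins_url,notes
1pub_4sub,2019-08-01T12:00:00,v1.2.3,70.5,0.1,8.4,21.0,950000,310000,2044,https://jenkins.example/job/42,
"""
row = next(csv.DictReader(io.StringIO(csv_text)))

# Append the most recent build result to the Bixby Load Test table.
columns = ("test_type", "uploaded_at", "bixby_version",
           "cpu_idle", "cpu_nice", "cpu_system", "cpu_user",
           "net_rx", "net_tx", "mem_avail", "jenkins_url", "notes")
conn.execute("INSERT INTO load_test VALUES (?,?,?,?,?,?,?,?,?,?,?,?)",
             [row[c] for c in columns])
conn.commit()
```

The real script differs only in the connection step and parameter style of the MariaDB driver; the parse-then-append shape is the same.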

Certain Bixby Load Test builds were identified as less meaningful during data analysis (Part 3); these are annotated with additional information explaining why they are disregarded (for now).

Automatic Backups and Database Recovery
Our AWS RDS instance is configured to create an automatic backup each day, which can easily be restored through the AWS RDS console. Furthermore, the decision was made to keep the script that outputs a CSV file as a fail-safe: in the event that both the database and its backups cannot be recovered, the data can still be found in the individual Jenkins jobs.

Part Two: Data Visualization
The main purpose of a data visualizer for the Bixby Load Test is to show the behavior of the Bixby server’s performance over time and to compare one specific branch with another.

To be intuitive, the Bixby load test visualizer should be interactive without overloading the user with controls.

The load test visualization must provide the following capabilities:

  • Each test must have its own set of three graphs (CPU, network, memory), for a total of twelve graphs to account for
  • Comparison is only needed between branches of the same test type, so at any point at least six graphs must be present to display the release branch versus the other branches (develop and feature)
  • Users should be able to see the average behavior of each Bixby version as well as the individual tests that were run
  • Users can identify a build as disregarded for analysis (Part 3) and append a note to the database

Bokeh is a free and open-source Python data visualization library. We chose it for this portion of the project because of its dedicated developers, who are responsive on the Bokeh forums. It also abstracts away the process of serving content: no HTML/CSS is required; instead, graphs are constructed from objects that build upon one another.

Pandas and NumPy were heavily utilized alongside Bokeh to read files and databases, and to group attributes of the database for analysis.
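For example, the per-version averages shown in the dashboard can be computed with a pandas groupby before handing the data to Bokeh for plotting. The column names below are hypothetical stand-ins for the real schema:

```python
import pandas as pd

# Hypothetical query results: one row per individual test run.
df = pd.DataFrame({
    "bixby_version": ["v1.2.2", "v1.2.2", "v1.2.3", "v1.2.3"],
    "cpu_user":      [20.0,     22.0,     21.0,     23.0],
    "mem_avail":     [2100,     2060,     2044,     2052],
})

# Average behavior of each Bixby version; the individual rows in `df`
# remain available for plotting the per-run scatter alongside the averages.
per_version = df.groupby("bixby_version", as_index=False).mean()
print(per_version)
```

A Bokeh figure can then draw the individual runs as scatter points and the per-version averages as a line on the same axes.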

Screenshot of a portion of the finalized version of the Bixby Load Test Dashboard internal webpage.

The resulting data visualization is hosted on an internal webpage, called the Bixby Load Test Dashboard, as shown above.

Detecting Failure and Restarting Host
Since the Bixby visualization dashboard is hosted on its own VM, sysctl is used to monitor the Bokeh server and will kill and restart the program if any anomaly is detected. Additionally, a crontab file checks whether new changes have been made in the data visualization repository and, if so, automatically pulls them to the host and restarts the dashboard.

An abridged visual of how the Bixby Load Test Dashboard is structured.

Part 3: Bixby Load Test Pass/Fail
There are currently no metrics to categorize whether a Bixby build is in line with the recent trend line; Bixby Load Test results have historically been assessed visually for validity and manually disregarded/archived when results were erroneous.

The process can be laborious and riddled with human subjectivity. To automate the comparison of a load test result with historical data, a Python script runs at the end of a successful Jenkins build. The script parses the load_test.csv that is automatically generated at the end of the build and contains the load test data. The historical median is then used to determine whether the most recent run conforms to the trend of previous data.
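A minimal sketch of that median check is below; the 15% tolerance and the sample values are illustrative assumptions, not the production threshold:

```python
from statistics import median

def conforms(history, latest, tolerance=0.15):
    """Return True if `latest` is within ±tolerance of the historical median.

    `history` is a list of the same metric (e.g. CPU-user %) from prior builds.
    The 15% tolerance is an illustrative default, not the production value.
    """
    m = median(history)
    return abs(latest - m) <= tolerance * m

# CPU-user % from previous builds (made-up values); median is 21.0.
cpu_user_history = [20.5, 21.0, 22.3, 21.7, 20.9]
print(conforms(cpu_user_history, 21.4))  # within the band
print(conforms(cpu_user_history, 35.0))  # far above the median
```

Running one such check per tracked metric yields an automated pass/fail for each new build, replacing the visual inspection step.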

An abridged visual of the new workflow for the Bixby Load Test.

List of some of the things I did and learned this summer:

  • Utilized a database that is constantly updated and queried to create the front end components of the data visualization.
  • Learned how Ansible playbooks describe the environment a host should include when it is created, and can automatically (and synchronously!) deploy hosts that follow the specified playbook “template”.
  • Learned how VPC security groups are created and defined to limit access to an IP.
  • Learned how Jenkins can be utilized to run tests by allocating a parameterized build host environment and checking out a specific GitHub branch to build with.
  • Practiced creating technical documentation.
  • Learned about AMIs and how hosts can be automated, using cron and sysctl, to check system health and redeploy if needed.
  • Gained a better understanding of how credentials needed in scripts can be protected using vaults and hidden config files.
  • Learned about crontab files and cron processes.
  • Utilized Pandas, NumPy, Bokeh, and more to create the load test pass/fail and the data visualization.
  • Created a team-wide wiki that lists the names of major active projects and how they fit into the Airtime media stack.