Big data analytics is revolutionizing the way we interact with data across all industries. However, to bring analysis, visualizations and interactions to end users, software products usually go through a process that is lengthy, arduous, and hard to maintain. Many choose to integrate a Business Intelligence product and find themselves with additional infrastructure complexity and points of failure. Others choose to build these visualizations and interactions themselves and find their team caught up in a complex project instead of concentrating on developing their core product and business.
To date, there is no well-established modular framework that simplifies the implementation of data products. These data products need to let their users visualize and interact with data while integrating with existing services and security infrastructure. To this end, we introduced ChartFactor, a component-based toolkit that speeds up the implementation of data applications that visualize and interact with data, whether it resides in your big data engine, behind a BI server, in the cloud, or on your homegrown server. BI toolkits, in contrast, only interact with their own server. ChartFactor builds upon and combines the latest advances in data visualization, data access, and simplified programming models.
In this article, I illustrate this flexibility with two examples that visualize data stored on Spark SQL and exposed as a REST API secured and managed by the Amazon API Gateway. ChartFactor’s modular architecture makes it simple to interact with the gateway while keeping the gateway's library dependencies separate. At the same time, its powerful programming model allows the developer to build full dashboards with a few lines of code.
Example 1: Building a data application with a few lines of code
This example shows how easy it is to create data applications that interact with custom backends using ChartFactor. In this case, I use Spark SQL fronted by the Amazon API Gateway to leverage its IAM security services.
The dataset is derived from Amazon’s sample database for Redshift. The data represents ticket sales activity for a fictional TICKIT website, where users buy and sell tickets online for different types of events.
Example 2: Putting it all together using ChartFactor Studio
This video shows how you can build complete and interactive data applications, in this case a gun violence analysis, using ChartFactor and ChartFactor Studio. The data comes from a custom backend: Spark SQL fronted by the Amazon API Gateway.
The gun violence dataset is derived from datasets provided by the United Nations Office on Drugs and Crime (UNODC), with per-capita GDP figures from the World Bank. The Wikipedia links for the source datasets are below.
Please note that the gun violence dataset is incomplete at best because important metrics such as Gun Homicides per 100k people are not provided by many countries.
Additional technologies used in the examples
ChartFactor customers usually have their data applications integrated with OAuth2 and Single Sign-On (SSO) via Security Assertion Markup Language (SAML). For this article, I use the Amazon API Gateway to leverage its IAM security services, since it takes only a few minutes to set up. I also use Spark SQL to power the queries on the datasets. Spark SQL extends Apache Spark for processing structured data using the familiar SQL syntax.
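For readers curious about what IAM security means for a client: the API Gateway expects each request to carry an AWS Signature Version 4 (SigV4) signature, computed for the service name `execute-api`. The following is a minimal sketch of that signing step using only the Python standard library. It is not how ChartFactor performs the call. The function names, the GET-only scope, and the empty request body are assumptions made for illustration, and temporary credentials (which also need an `x-amz-security-token` header) are left out.

```python
import hashlib
import hmac
from datetime import datetime, timezone
from urllib.parse import quote


def _hmac_sha256(key: bytes, message: str) -> bytes:
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key: str, date_stamp: str, region: str, service: str) -> bytes:
    """SigV4 key derivation: chained HMACs over date, region, service, and a fixed suffix."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def sign_get_request(host, path, query, access_key, secret_key, region, now,
                     service="execute-api"):
    """Return the headers to add to a body-less GET request (hypothetical helper).

    Assumes `now` is a timezone-aware datetime and that `path` contains only
    unreserved characters and slashes, so it needs no extra URI encoding.
    """
    now = now.astimezone(timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")

    # Canonical query string: URI-encode names and values, then sort.
    encoded = sorted((quote(k, safe="-_.~"), quote(v, safe="-_.~")) for k, v in query.items())
    canonical_query = "&".join(f"{k}={v}" for k, v in encoded)

    # Only the two mandatory headers are signed in this sketch.
    canonical_headers = f"host:{host}\nx-amz-date:{amz_date}\n"
    signed_headers = "host;x-amz-date"
    payload_hash = hashlib.sha256(b"").hexdigest()  # empty body for GET

    canonical_request = "\n".join(
        ["GET", path, canonical_query, canonical_headers, signed_headers, payload_hash]
    )

    scope = f"{date_stamp}/{region}/{service}/aws4_request"
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256",
        amz_date,
        scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])

    signing_key = derive_signing_key(secret_key, date_stamp, region, service)
    signature = hmac.new(signing_key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()

    authorization = (
        f"AWS4-HMAC-SHA256 Credential={access_key}/{scope}, "
        f"SignedHeaders={signed_headers}, Signature={signature}"
    )
    return {"x-amz-date": amz_date, "Authorization": authorization}
```

The returned headers would be merged into the outgoing request, with the HTTP client supplying the `host` header itself. In practice an AWS SDK or a generated gateway client usually does this signing for you, which is why it helps to keep those library dependencies isolated from the rest of the application.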
To wire things up, I use Spring Boot, which makes it easy to create a local Spark SQL context and expose SQL queries via REST. It is also possible to create a remote Spark SQL context in a YARN cluster environment if needed.
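The shape of such a SQL-over-REST endpoint can be sketched in a few lines. The example below is not the Spring Boot service used for this article: it is written in Python, and it swaps in the standard library's `sqlite3` as a stand-in for the Spark SQL context so that it runs without a cluster. The function name, the `{"sql": ...}` request body, and the `{"rows": [...]}` response shape are all assumptions chosen for illustration.

```python
import json
import sqlite3
from typing import Tuple


def handle_query_request(body: bytes, conn: sqlite3.Connection) -> Tuple[int, str]:
    """Map a JSON request body {"sql": "..."} to (HTTP status, JSON response).

    `conn` stands in for the query engine; with Spark SQL the equivalent step
    would be running the statement against the Spark session instead.
    """
    try:
        sql = json.loads(body)["sql"]
    except (ValueError, KeyError, TypeError):
        return 400, json.dumps({"error": "expected a JSON body with an 'sql' field"})

    # Naive guard for the sketch: accept read-only statements only.
    if not isinstance(sql, str) or not sql.lstrip().lower().startswith("select"):
        return 400, json.dumps({"error": "only SELECT statements are accepted"})

    try:
        cursor = conn.execute(sql)
    except sqlite3.Error as exc:
        return 400, json.dumps({"error": str(exc)})

    # Turn each result row into a {column: value} object for the JSON response.
    columns = [description[0] for description in cursor.description]
    rows = [dict(zip(columns, row)) for row in cursor.fetchall()]
    return 200, json.dumps({"rows": rows})
```

A web framework would call a function like this from its POST handler and write the status and payload back to the client. Keeping the engine behind a single function is what lets the local context be swapped for a remote one without touching the REST layer.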
The high-level architecture for these examples looks like this:
The picture below shows the Amazon API Gateway console after setting up the REST endpoints and enabling AWS IAM security.
Conclusion
ChartFactor is a modular and future-proof data visualization toolkit that can adapt and evolve to match changing market requirements, your needs, and technology advancements. In this article I showed how ChartFactor components let you visualize and interact with SQL data with just a few lines of code.