Polaris: A System for Query, Analysis, and Visualization of Multidimensional Relational Databases

Over the last couple of decades, large multi-dimensional databases have become ubiquitous in a vast array of application areas, from corporate data warehouses to scientific computing projects such as the Human Genome Project and the Digital Sky Survey. One of the major challenges in extracting meaningful information from such large-scale databases is to “discover structure, find patterns, and derive causal relationships” from the data. A popular approach is to treat these databases as $n$-dimensional data cubes, where each dimension of the cube corresponds to a dimension in the relational schema. One of the most popular interfaces for working with multi-dimensional databases is the Pivot Table, popularized largely by Microsoft Excel, which allows these data cubes to be rotated, or pivoted, so that their various dimensions are encoded as the rows or columns of the table. Previous work in this area falls broadly into 3 categories: (a) formalisms for graphical specification, including earlier work such as Bertin’s ‘Semiology of Graphics’ and more recent work such as Wilkinson’s ‘The Grammar of Graphics’; (b) table-based displays, including static ones such as scatterplot matrices and Trellis displays as well as interactive ones such as Pivot Tables; and (c) tools for the visual exploration of datasets, such as VQE, Visage, DEVise, Tioga-2, and VisDB. This paper presents Polaris, a multi-dimensional database exploration interface that extends the Pivot Table interface to directly generate a “rich, expressive set of” graphical displays. Using an algebraic formalism over the database fields, Polaris constructs tables consisting of layers and panes, with possibly a different graphic in each pane. Although the paper describes the Polaris system in detail, for brevity this summary discusses only its major components.
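To make the pivoting idea concrete, here is a minimal Python sketch (not taken from the paper) of what a Pivot Table computes: records are grouped by one dimension on the rows and another on the columns, and a measure is summed in each cell. The records and the `pivot`/`render` helpers are hypothetical; swapping `row_dim` and `col_dim` corresponds to rotating the cube.

```python
from collections import defaultdict

# Hypothetical toy records: each dict is one row of a relational table.
SALES = [
    {"region": "East", "product": "Espresso", "sales": 10.0},
    {"region": "East", "product": "Latte", "sales": 5.0},
    {"region": "West", "product": "Espresso", "sales": 7.0},
    {"region": "West", "product": "Espresso", "sales": 3.0},
]


def pivot(records, row_dim, col_dim, measure):
    """Sum `measure` into a grid keyed by (row value, column value)."""
    grid = defaultdict(float)
    for rec in records:
        grid[(rec[row_dim], rec[col_dim])] += rec[measure]
    rows = sorted({rec[row_dim] for rec in records})
    cols = sorted({rec[col_dim] for rec in records})
    return rows, cols, dict(grid)


def render(rows, cols, grid):
    """Format the grid as tab-separated text; empty cells are shown as '-'."""
    lines = ["\t" + "\t".join(cols)]
    for r in rows:
        cells = [str(grid[(r, c)]) if (r, c) in grid else "-" for c in cols]
        lines.append(r + "\t" + "\t".join(cells))
    return "\n".join(lines)


if __name__ == "__main__":
    # Regions as rows, products as columns ...
    print(render(*pivot(SALES, "region", "product", "sales")))
    # ... then pivot the cube: products as rows, regions as columns.
    print(render(*pivot(SALES, "product", "region", "sales")))
```

The same aggregated cells appear in both layouts; only the assignment of dimensions to rows and columns changes.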

An analysis tool for multi-dimensional databases must (a) support the creation of data-dense displays, since such databases can have a large number of records and dimensions, (b) be able to create multiple display types to cater to different analysis tasks, and (c) offer an exploratory interface that permits “unpredictable exploration” of the data by letting analysts rapidly change what is being viewed. The visual specification in Polaris consists of: (a) the table configuration, (b) the type of graphic in each pane, and (c) the details of the visual encoding. The authors propose a formal algebra for specifying table configurations. In the interface, the user defines the table configuration by placing fields on the axis shelves, and the content of each shelf is interpreted as an expression in this table algebra. Operands are interpreted as sets, and instead of 4 field types (nominal, ordinal, quantitative, and interval), Polaris uses only 2: interval fields are treated as quantitative (Q) and nominal fields as ordinal (O). There are 3 algebraic operators: concatenation ($+$; the ordered union of two sets), cross ($\times$; the Cartesian product of two sets), and nest ($/$; a variant of cross that only creates entries for “records with those domain values”). Next, to specify the type of graphic in each pane, the space of graphics is categorized into three families according to the types of the fields on the pane's two axes: O-O (e.g., tables), O-Q (e.g., bar charts), and Q-Q (e.g., maps). Each family contains several variants depending on how records map to marks (the marks in Polaris are rectangle, circle, glyph, text, Gantt bar, line, polygon, and image). Finally, the visual details are encoded using the retinal properties of the marks: shape, size, orientation, and color.
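The three table-algebra operators can be sketched in a few lines of Python. This is an illustrative model under simplifying assumptions, not the Polaris implementation: an ordinal operand is taken to be the ordered set of its field's domain values (approximated here by sorting), a quantitative operand is a one-element set holding just the field name, and each set entry is a tuple of `(field, value)` pairs so that `nest` can check which combinations actually occur in the records. The function names and toy records are hypothetical.

```python
from itertools import product

# Hypothetical toy records; each dict is one database row.
RECORDS = [
    {"quarter": "Q1", "month": 1, "profit": 4.0},
    {"quarter": "Q1", "month": 2, "profit": 2.0},
    {"quarter": "Q2", "month": 4, "profit": 5.0},
]


def ordinal(field, records):
    """Ordinal operand: the ordered set of the field's domain values.

    Assumption: the domain order is approximated by sorting the values.
    """
    return [((field, v),) for v in sorted({r[field] for r in records})]


def quantitative(field):
    """Quantitative operand: a one-element set holding the field name (value None)."""
    return [((field, None),)]


def concat(a, b):
    """A + B: ordered union, A's entries first, then B's entries not already in A."""
    return a + [e for e in b if e not in a]


def cross(a, b):
    """A x B: Cartesian product, joining each pair of entries into one tuple."""
    return [x + y for x, y in product(a, b)]


def nest(a, b, records):
    """A / B: like cross, but keep only combinations that occur in some record."""
    def occurs(entry):
        # Quantitative components (value None) match any record.
        return any(
            all(v is None or r[f] == v for f, v in entry) for r in records
        )
    return [e for e in cross(a, b) if occurs(e)]


if __name__ == "__main__":
    q = ordinal("quarter", RECORDS)
    m = ordinal("month", RECORDS)
    print(concat(q, quantitative("profit")))  # quarters, then the profit axis
    print(len(cross(q, m)))                   # every quarter/month pair
    print(nest(q, m, RECORDS))                # only pairs present in the data
```

Here `cross(q, m)` yields every quarter/month combination (6 entries), while `nest(q, m, RECORDS)` keeps only the 3 that appear in the data, which is why nesting suits hierarchical fields such as quarters and months.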
Polaris constructs visual queries by selecting records using user-defined filters, partitioning the selected records into panes, and then transforming the records within each pane using either aggregation or relational predicate filters. Finally, the authors present 2 scenarios that demonstrate the capabilities of Polaris. First, a commercial database from a nationwide coffee chain is used to identify the product(s) with high marketing costs but insufficient profit. Second, the performance of Argus, a parallel graphics library, is analyzed to isolate and identify its bottlenecks, and visualizations using Gantt charts reveal the source of the performance issues.
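The select, partition, and transform sequence can be mimicked in memory as follows. This is only an assumed simplification for illustration: Polaris itself issues queries against the database, whereas here the records are a hypothetical Python list, panes are keyed by the values of hypothetical row and column fields, and the transform step is limited to aggregating one measure per group (summing by default).

```python
from collections import defaultdict

# Hypothetical toy records standing in for a database table.
RECORDS = [
    {"market": "East", "product": "Espresso", "month": 1, "profit": 3.0},
    {"market": "East", "product": "Espresso", "month": 1, "profit": 1.0},
    {"market": "East", "product": "Latte", "month": 2, "profit": -2.0},
    {"market": "West", "product": "Espresso", "month": 1, "profit": 4.0},
    {"market": "West", "product": "Tea", "month": 2, "profit": 0.5},
]


def select(records, predicate):
    """Step 1: keep only the records that pass the user-defined filter."""
    return [r for r in records if predicate(r)]


def partition(records, row_fields, col_fields):
    """Step 2: assign each record to a pane keyed by its row and column values."""
    panes = defaultdict(list)
    for r in records:
        key = (tuple(r[f] for f in row_fields), tuple(r[f] for f in col_fields))
        panes[key].append(r)
    return dict(panes)


def transform(panes, group_by, measure, agg=sum):
    """Step 3: within each pane, aggregate `measure` for each `group_by` value."""
    result = {}
    for key, recs in panes.items():
        groups = defaultdict(list)
        for r in recs:
            groups[r[group_by]].append(r[measure])
        result[key] = {g: agg(vals) for g, vals in sorted(groups.items())}
    return result


if __name__ == "__main__":
    kept = select(RECORDS, lambda r: r["product"] != "Tea")
    panes = partition(kept, row_fields=["market"], col_fields=["product"])
    for pane, values in transform(panes, "month", "profit").items():
        print(pane, values)
```

Passing a different `agg` (for example `max` or `len`) changes only the per-pane transformation, leaving the selection and partitioning untouched.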

This paper presents Polaris, an interface for rapidly generating table-based graphical displays of data from multi-dimensional relational databases. The authors introduce a formalism for specifying table configurations and graphical visualizations, and show how these visual specifications are interpreted as data transformations and visual queries. The paper is well written and easy to follow for the most part. The content of Section 5 could have been made clearer with examples, either visual or text-based, which would give a better understanding of the data transformations. Another shortcoming, as noted by the authors, is the lack of a standard SQL statement that performs the requisite partitioning in a single query. Finally, as the authors suggest, Polaris could be improved by allowing it to generate database tables from a selected set of graphical marks, thus leveraging the correspondence between graphical marks and tuples.

The Polaris system went on to become the now massively popular interactive data visualization software Tableau.

This summary was written in Fall 2020 as a part of the CMPT 757 Frontiers of Visual Computing course.