Polaris: A System for Query, Analysis, and Visualization of Multidimensional Relational Databases

Over the last couple of decades, large multi-dimensional databases have become ubiquitous in a vast array of application areas, from corporate data warehouses to scientific computing projects such as the Human Genome Project and the Digital Sky Survey. One of the major challenges in extracting meaningful information from such large-scale databases is to “discover structure, find patterns, and derive causal relationships” from the data. A popular approach is to treat these databases as $n$-dimensional data cubes, where each axis of the cube corresponds to a dimension of the relational schema. One of the most popular interfaces for working with multi-dimensional databases is the pivot table, largely popularized by Microsoft Excel, which allows these data cubes to be rotated or pivoted so as to encode their various dimensions as rows or columns of the table. Previous work in this area can broadly be categorized into three main areas of focus: (a) formalisms for graphical specification, which include earlier works such as Bertin’s ‘Semiology of Graphics’ as well as more recent work such as Wilkinson’s ‘The Grammar of Graphics’; (b) table-based displays, which include static table displays such as scatterplot matrices and Trellis displays as well as interactive ones such as pivot tables; and (c) tools for visual exploration of datasets, such as VQE, Visage, DEVise, Tioga-2, and VisDB. This paper presents Polaris, a multi-dimensional database exploration interface that extends the pivot table interface and allows for the direct generation of a “rich, expressive set of” graphical displays. Using an algebraic formalism over the database fields, Polaris constructs tables consisting of layers and panes, with the possibility of a different graphic in each pane. Although the paper provides a detailed description of the Polaris system, for the sake of brevity we only discuss its major components here. ...
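As a rough illustration of the pivot-table idea (not the Polaris table algebra itself), here is a minimal sketch using pandas; the table, field names, and aggregation below are hypothetical:

```python
import pandas as pd

# Hypothetical relational table: each row is a sale with categorical
# dimensions (region, product, quarter) and a numeric measure (revenue).
sales = pd.DataFrame({
    "region":  ["East", "East", "West", "West", "East", "West"],
    "product": ["A", "B", "A", "B", "A", "A"],
    "quarter": ["Q1", "Q1", "Q1", "Q2", "Q2", "Q2"],
    "revenue": [100, 80, 120, 90, 110, 130],
})

# "Pivoting" encodes chosen dimensions as rows and columns of the table
# and aggregates the measure in each resulting cell, analogous to slicing
# an n-dimensional data cube along two of its dimensions.
cube_slice = pd.pivot_table(
    sales,
    values="revenue",
    index="region",      # dimension mapped to table rows
    columns="quarter",   # dimension mapped to table columns
    aggfunc="sum",
)
print(cube_slice)
```

Polaris generalizes this row/column mapping: its algebra composes database fields into row and column expressions, and each resulting pane can hold a full graphic rather than a single aggregated number.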

November 23, 2020 · 4 min · Kumar Abhishek

Deep Convolutional Priors for Indoor Scene Synthesis

Given the importance and ubiquity of indoor spaces in our everyday lives, computer models that can understand, model, and synthesize indoor scenes are of vital importance to many industries, including interior design, architecture, gaming, and virtual reality. Previous works towards this goal have relied on constrained synthesis of scenes with statistical priors on object pair relationships, “human-centric relationship priors”, or constraints based on “hand-crafted interior design principles”. Moreover, owing to the difficulty of unconstrained room-scale synthesis of indoor scenes, prior work has either focused on small regions within a room or required additional inputs (a fixed set of objects, manually specified relationships, a natural language description, a sketch, or a 3D scan of the room) as constraints, and deep generative models such as GANs and VAEs struggle to produce multi-modal outputs. Driven by the success of convolutional neural networks (CNNs) in scene synthesis tasks and the availability of large 3D scene datasets, this paper proposes the first CNN-based autoregressive model for designing interior spaces: given the wall structure and the type of a room, the model predicts the selection and placement of objects. ...
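To make the autoregressive formulation concrete, the sketch below shows the kind of insertion loop such a model implies. The module names, the top-down scene representation, and the stopping criterion are assumptions for illustration, not the paper's exact architecture:

```python
# Schematic autoregressive indoor-scene synthesis loop (illustrative only).

def synthesize_scene(room_image, room_type, modules, max_objects=20):
    """Iteratively add objects to a room until a 'continue' module says stop.

    modules is expected to provide four callables (all hypothetical here):
      should_continue(scene, room_type)        -> bool
      pick_category(scene, room_type)          -> str, e.g. "bed", "nightstand"
      place_object(scene, room_type, category) -> (x, y, orientation)
      render(room_image, objects)              -> top-down scene image
    """
    objects = []
    scene = modules["render"](room_image, objects)
    while len(objects) < max_objects and modules["should_continue"](scene, room_type):
        category = modules["pick_category"](scene, room_type)
        x, y, theta = modules["place_object"](scene, room_type, category)
        objects.append({"category": category, "x": x, "y": y, "orientation": theta})
        # Re-render so the next prediction is conditioned on everything placed so far.
        scene = modules["render"](room_image, objects)
    return objects
```

The key property this loop captures is that each object is chosen and placed conditioned on the room type and on all previously placed objects, which is what makes the model autoregressive rather than a one-shot generator.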

November 9, 2020 · 5 min · Kumar Abhishek

PolyGen: An Autoregressive Generative Model of 3D Meshes

Polygonal meshes are widely used in computer graphics, robotics, and game development to represent virtual objects and scenes. Existing learning-based methods for 3D object generation have relied on template models and parametric shape families. Progress with deep learning-based approaches has also been limited because meshes are challenging for deep networks to work with, so recent works have instead used alternative representations of object shape, such as voxels, point clouds, occupancy functions, and surfaces. These works, however, leave mesh reconstruction as a post-processing step, leading to inconsistent mesh quality. Drawing inspiration from the success of previous neural autoregressive models applied to sequential raw data (e.g., images, text, and raw audio waveforms) and building upon previously proposed components (e.g., Transformers and pointer networks), this paper presents PolyGen, a neural autoregressive generative model of 3D meshes. ...
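As a rough sketch of what modeling a mesh autoregressively can look like in practice, the snippet below serializes mesh vertices into a single token sequence that a sequence model (e.g., a Transformer) could be trained on; the quantization scheme and ordering are simplified assumptions, not PolyGen's exact pipeline:

```python
import numpy as np

# Simplified mesh-to-sequence sketch: quantize vertex coordinates to a small
# integer grid, sort vertices into a canonical order, and flatten to one
# token sequence. An autoregressive model would then predict each token given
# all previous ones; faces could afterwards be modeled as sequences of indices
# into the generated vertex list (the role a pointer network plays).

def vertices_to_tokens(vertices, n_bits=8):
    """vertices: (N, 3) float array in [0, 1]^3 -> 1-D integer token sequence."""
    levels = 2 ** n_bits
    quantized = np.clip((vertices * levels).astype(int), 0, levels - 1)
    # Canonical ordering: sort by z, then y, then x, so the sequence is unique.
    order = np.lexsort((quantized[:, 0], quantized[:, 1], quantized[:, 2]))
    return quantized[order].reshape(-1)

# Example: the four corners of a unit square become a short, fixed-order sequence.
verts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 1.0, 0.0]])
print(vertices_to_tokens(verts, n_bits=3))
```

Once shapes are flattened this way, generation reduces to next-token prediction, which is exactly the setting where Transformer-style autoregressive models are well understood.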

November 9, 2020 · 5 min · Kumar Abhishek

CARLA: An Open Urban Driving Simulator

The development and subsequent deployment of autonomous ground vehicles is a popular instantiation of the goal of achieving perfect sensorimotor control in 3D environments. It is a perception-driven control task, and one of its most difficult scenarios is navigating densely populated urban environments, primarily because of, but not limited to, ...

November 2, 2020 · 5 min · Kumar Abhishek

Vision-and-Language Navigation: Interpreting Visually-Grounded Navigation Instructions in Real Environments

The ability to give robots human language instructions for carrying out navigational tasks has been a longstanding goal of robotics and artificial intelligence. This task requires visual perception and natural language understanding in tandem, and while advancements in visual question answering and visual dialog have enabled models to combine visual and linguistic reasoning, they do not “allow an agent to move or control the camera”. Natural language-only commands abstract away the visual perception component and are not very linguistically rich. While hand-crafted rendering models and the environments and simulators built upon them try to address these problems, they possess a limited set of 3D assets and textures, reducing the robot’s challenging open-set problem in the real world to a much simpler closed-set problem, which in turn deteriorates performance on previously unseen environments. Finally, although reinforcement learning has been used to train navigational agents, these approaches either do not leverage language instructions or rely on very simple linguistic settings. This paper proposes the Matterport3D Simulator, “a large-scale reinforcement learning environment based on real imagery”, and an associated Room-to-Room (R2R) dataset, with the hope that these will help push forward advancements in vision-and-language navigation (VLN) tasks and improve generalizability to previously unseen environments. ...

November 2, 2020 · 4 min · Kumar Abhishek