Beyond the Basics of D3


Many people have heard of D3, the data visualization library. As a D3 enthusiast, I jumped at the chance to see a “beyond the basics” presentation by Matt Lane (@mmmaaatttttt) this summer, and wrote this article about it.

The D3.js library has many intricacies to deal with; however, those same complexities make for a very flexible diagramming experience when the situation calls for it. The key to harnessing D3 and creating the data transformations of your dreams lies in understanding the D3 toolset. For instance, learning how selecting and appending nodes really works can improve your D3 application by making your code, your DOM, and your in-memory objects more efficient. Sticking to certain principles, such as moving data manipulation to the server, will also take you far in your visualizations.

Calling d3.csv() is usually a good starting point for getting static or generated data into your application. It uses the first row of a text file as the property names, and then builds one object per subsequent line using those names. It is also nice because you can pass it a ‘row’ function that processes or parses each row in any custom way you want; returning null or undefined from that function (or not returning at all) skips the row. A lot of how D3 visualizes data comes down to how the data is structured and formatted once it is in browser memory. For example, D3 v4’s d3.map() is a nice way to remove non-unique values from the data by building a hash map, where an entry with an already existing key overwrites the earlier one.
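To make the row function and the hash-map trick concrete, here is a minimal sketch in plain JavaScript. It does not call D3: parseSimpleCsv is a hypothetical stand-in that mimics the header-plus-row-function behavior described above for simple, unquoted input, and the sample data is made up. With D3 loaded, d3.csvParse(text, row) plays the same role for a string that is already in memory.

```javascript
// Hypothetical helper (not part of D3): header-driven parsing with a row callback.
// It ignores quoting and escaped commas, which a real CSV parser handles.
function parseSimpleCsv(text, row = (d) => d) {
  const [headerLine, ...lines] = text.trim().split("\n");
  const columns = headerLine.split(",");
  const result = [];
  for (const line of lines) {
    const cells = line.split(",");
    // Build one object per line, keyed by the header names.
    const record = Object.fromEntries(columns.map((name, i) => [name, cells[i]]));
    const parsed = row(record);
    // Returning null or undefined from the row function skips the row.
    if (parsed != null) result.push(parsed);
  }
  return result;
}

// Made-up sample data; the second data line has no count.
const text = `name,count
apple,3
pear,
apple,7
plum,5`;

// Row function: coerce strings to numbers and skip rows without a count.
const rows = parseSimpleCsv(text, (d) => {
  if (d.count === "") return; // no return value, so this row is skipped
  return { name: d.name, count: Number(d.count) };
});

// Deduplicate by key: a later entry overwrites an earlier one with the same key.
const byName = new Map(rows.map((d) => [d.name, d]));
const uniqueRows = [...byName.values()];

console.log(rows.length);       // 3 (the "pear" row was skipped)
console.log(uniqueRows.length); // 2 (the second "apple" replaced the first)
```

Every value arrives as a string, which is why the row function is the natural place to convert types before anything is drawn.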

The d3.histogram() function groups your data into separate bins, which is useful for visualizing distributions. You can tell D3 exactly which value to bin each item by, by chaining a .value() accessor onto the generator (this also works for pies and stacks). D3 makes educated guesses about where to split the bins, and gives each bin its lower and upper bounds so that you can turn them into graph coordinates. You can also call .thresholds() to try to set the number of bins. I say ‘try’ because a count passed to .thresholds() is only a hint: D3 prefers round bin boundaries, so the number of bins you get back may differ from what you asked for. Another good way to shape your data is with d3.pie() to make pie charts, which can make for some nice custom wedge visualizations.
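What a histogram generator with a value accessor produces can be sketched without D3. The helper below is hypothetical and far simpler than d3.histogram(): it takes explicit thresholds rather than guessing them, and it returns bare arrays instead of bins annotated with their bounds. The sample records are made up.

```javascript
// Hypothetical helper (not part of D3): group items into bins split at thresholds.
// `value` is the accessor that says which number to bin each item by.
function binByThresholds(data, value, thresholds) {
  // One bin below the first threshold, plus one bin starting at each threshold.
  const bins = Array.from({ length: thresholds.length + 1 }, () => []);
  for (const d of data) {
    const v = value(d);
    // The bin index is the number of thresholds that are <= v.
    let i = 0;
    while (i < thresholds.length && thresholds[i] <= v) i++;
    bins[i].push(d);
  }
  return bins;
}

// Made-up sample data.
const people = [
  { name: "A", age: 18 },
  { name: "B", age: 22 },
  { name: "C", age: 25 },
  { name: "D", age: 31 },
  { name: "E", age: 47 },
  { name: "F", age: 30 },
];

const bins = binByThresholds(people, (d) => d.age, [20, 30, 40]);
const counts = bins.map((bin) => bin.length);

console.log(counts); // [ 1, 2, 2, 1 ]
```

Each bin keeps the original objects rather than just a count, so a bar drawn for a bin can still reach the underlying records.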

Under the hood, D3 is completely independent of jQuery, the popular DOM selection and manipulation library, and instead uses the native document.querySelector() and document.querySelectorAll() functions; however, many D3 functions such as .attr() will feel familiar if you know jQuery. After you use D3 to select DOM elements, you can also manipulate them with D3 (for example, you can call .style() to make CSS changes, or .transition() to animate them). You can also call .text() to set the text content of every selected element, or pass it a function that receives each element’s datum and index to set the text per element.

Unlike jQuery’s .data() function, D3 does not use data attributes; it keeps track of data objects and elements on its own. This process is initiated by calling D3’s own .data() function, which joins data to elements and returns the ‘update’ selection (elements that were successfully bound to data). Since mismatches can and do occur, calling .enter() gives you the ‘enter’ selection (placeholder nodes for each datum that didn’t match a DOM element), and calling .exit() gives you the ‘exit’ selection (existing DOM elements for which no datum was found). Whatever functions you chain onto any of these three selections run once for each of their nodes; for example, .exit().remove() removes the elements that could not be bound to any data. By default D3 matches data to elements by index, and you can pass a key function as the second argument to .data() to match them by identity instead, but either way enter nodes and exit elements can and will happen. You can also put breakpoints or log calls in the functions chained after selections to see how each node/element is processed.
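The three selections are easier to picture with a small model of the join. The sketch below is plain JavaScript and only imitates the bookkeeping: joinData is a hypothetical helper, plain objects stand in for DOM elements, and matching is done with a key function (in D3 the key function is the optional second argument to .data()).

```javascript
// Hypothetical helper (not part of D3): split a join into update, enter, and exit.
// Each item in `elements` stands in for a DOM node and carries its bound datum.
function joinData(elements, data, key) {
  const unmatched = new Map(elements.map((el) => [key(el.datum), el]));
  const update = [];
  const enter = [];
  for (const d of data) {
    const el = unmatched.get(key(d));
    if (el) {
      el.datum = d; // rebind the fresh datum to the existing element
      update.push(el);
      unmatched.delete(key(d));
    } else {
      enter.push({ datum: d }); // placeholder for an element still to be created
    }
  }
  // Whatever was never matched has no datum any more.
  const exit = [...unmatched.values()];
  return { update, enter, exit };
}

// Existing "elements" bound to a, b, c; the new data is b, c, d.
const elements = ["a", "b", "c"].map((id) => ({ datum: { id } }));
const data = [{ id: "b" }, { id: "c" }, { id: "d" }];

const { update, enter, exit } = joinData(elements, data, (d) => d.id);
const ids = (list) => list.map((node) => node.datum.id).join(",");

console.log(ids(update)); // b,c
console.log(ids(enter));  // d
console.log(ids(exit));   // a
```

In real D3 code the enter group is what you would chain .append() onto and the exit group is what you would chain .remove() onto.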

Understanding how D3 adds new elements is also important. Calling .append() creates a child for each selected node, and sets the ‘parent’ of each new node to the selected node. You have to be careful here: if you call .enter().append() on a selection that was made without first selecting a parent element, the new elements have no container to go into and are appended to the document’s root element, which puts them at the bottom of the page. You can also use .node() to get the underlying DOM element of the first element in a selection, or .nodes() to get all of them.

Of course, when using D3 together with React, joining data to elements is used less often, because React already iterates over data and creates HTML elements from it; however, React’s declarative nature works perfectly with a lot of other useful geometry, such as SVG paths. To understand more about the benefits of using D3 and React together, check out this article.

In general, SVG elements work amazingly well with D3. Their declarative nature (compared to something like Canvas) allows React to take control of as much geometry as it needs to in order to render something amazing. g elements work nicely for grouping other SVG elements and transforming them together, while path and line elements are perfect for plotting line charts. D3’s shape generators, such as d3.line(), let you pass data straight into SVG paths, which is easier than writing out the various letter commands of a path by hand. Showing different values as the mouse moves over a graph can also be achieved by rendering invisible SVG shapes along the path. You then join data to each of those shapes using D3, and add a listener to a parent element that watches for mouse movement along the path and displays whatever data was joined to the shape under the pointer. This is, in fact, much easier than trying to calculate mouse positions on a graph and then finding the right piece of data on your own.
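For a sense of what those letter commands look like, here is a small plain JavaScript sketch that turns data into the string for a path element’s d attribute. linePath is a hypothetical helper written for this article, not D3’s generator; a D3 line generator with x and y accessors produces this kind of string for you, and the sample data and pixel math are made up.

```javascript
// Hypothetical helper (not part of D3): build an SVG path string from data.
// "M" moves the pen to the first point; "L" draws a straight line to each next point.
function linePath(data, x, y) {
  return data
    .map((d, i) => `${i === 0 ? "M" : "L"}${x(d)},${y(d)}`)
    .join("");
}

// Made-up sample data: a value `v` measured at time `t`.
const samples = [
  { t: 0, v: 10 },
  { t: 1, v: 40 },
  { t: 2, v: 25 },
];

const path = linePath(
  samples,
  (d) => d.t * 50,  // 50 pixels per time step
  (d) => 100 - d.v  // SVG's y axis grows downward, so flip it
);

console.log(path); // M0,90L50,60L100,75
```

The resulting string goes into the d attribute of a path element, which is also what makes this approach fit well with React: the component simply renders the string.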

A scale is a mapping between two sets of values, and the most common kind is constructed via d3.scaleLinear(). Scale functions are usually set up in a chain with .domain() and .range(), and essentially define a transformation from the input domain to the output range. Called without arguments, .domain() and .range() also work as getters. Using d3.scaleThreshold() creates a simple mapping that cuts a continuous input into a fixed set of output values at the thresholds you give it. Calling d3.min() and d3.max() finds the lowest and highest values in your data, which is handy for setting a domain. D3 axes allow you to render values on a line with ticks and labels, but are a little harder to work with than other D3 geometry and are less compatible with React (you still need to select nodes with D3).
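The arithmetic behind the two scale types mentioned above is short enough to write out. The functions below are hypothetical plain JavaScript stand-ins, not D3’s scales (which add clamping, inversion, ticks, and more), and the numbers are made up.

```javascript
// Hypothetical helper (not part of D3): map a value from a domain to a range
// by linear interpolation.
function linearScale([d0, d1], [r0, r1]) {
  return (value) => r0 + ((value - d0) / (d1 - d0)) * (r1 - r0);
}

// Hypothetical helper (not part of D3): cut a continuous input into discrete
// outputs. `outputs` needs one more entry than `thresholds`.
function thresholdScale(thresholds, outputs) {
  return (value) => {
    let i = 0;
    while (i < thresholds.length && thresholds[i] <= value) i++;
    return outputs[i];
  };
}

// Made-up sample data.
const values = [10, 35, 60];

// Domain from the data's min and max; range in pixels, flipped so that
// larger values sit higher up on the screen.
const y = linearScale([Math.min(...values), Math.max(...values)], [200, 0]);
const level = thresholdScale([30, 60], ["low", "medium", "high"]);

console.log(values.map(y));     // [ 200, 100, 0 ]
console.log(values.map(level)); // [ 'low', 'medium', 'high' ]
```

A value equal to a threshold lands in the upper bucket here, so 60 is reported as "high".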

D3 is also a great tool for making maps, especially because you can use it to take GeoJSON data and transform it into path elements. TopoJSON, a popular extension of GeoJSON, stitches together geometries from shared line segments called arcs, making for smaller data files. The cool thing about TopoJSON is that you can get it for any zip code in the United States.

Thanks for reading and be sure to check out this presentation by Irene Ros (@ireneros) on what’s new in D3v4 here:

Jeff Poyzner