How to Transform Data in Plotrb

Acknowledgment: This tutorial is based on Vega's documentation.


Data transforms are one of the most important components of Plotrb. They perform operations on a data set before it is visualized.

A transform is specified by its type. In this article I will describe all transform types allowed by Vega.

To create a new Transform instance, you can either pass the type to the constructor:

t = Plotrb::Transform.new(:filter) # filter is one of the allowed types

or call the type name directly as a class method:

t = Plotrb::Transform.filter

Different types have different attributes that further specify the transform.

Data Manipulation Transforms

array

Maps a data object to an array of selected values referenced by fields.

t = Plotrb::Transform.array.fields('pop.weight', 'pop.height')

You can also use take to specify the field references.

t = Plotrb::Transform.array do
  take 'pop.weight', 'pop.height'
end
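To see what array computes, here is a plain-Ruby sketch that does not use Plotrb. The helper array_values is hypothetical, and treating a dotted reference such as 'pop.weight' as a nested-hash lookup is an assumption about how field paths are resolved.

```ruby
# Hypothetical helper (not part of Plotrb): resolve each dotted field
# reference against a nested hash and collect the values in order.
def array_values(obj, *fields)
  fields.map { |path| obj.dig(*path.split(".")) }
end

item = { "pop" => { "weight" => 70, "height" => 175 } }
p array_values(item, "pop.weight", "pop.height") # => [70, 175]
```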

copy

Copies values from a nested object (given by from) into the top level of each data item.

You can also use take in place of fields if it reads more naturally.

t = Plotrb::Transform.copy do
  from 'population'
  take 'weight', 'height'
  as 'w', 'h'
end
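The following plain-Ruby sketch illustrates the copy semantics on a single item. It is not Plotrb code: copy_fields is a hypothetical helper, and the assumption is that from names the nested object while as supplies the new top-level names pairwise with the taken fields.

```ruby
# Hypothetical helper (not part of Plotrb): copy selected properties of a
# nested object (from) to the top level of the item, renamed pairwise (as).
def copy_fields(item, from:, fields:, as: fields)
  source = item.dig(*from.split("."))
  copied = fields.zip(as).map { |field, name| [name, source[field]] }.to_h
  item.merge(copied)
end

item = { "population" => { "weight" => 70, "height" => 175 } }
result = copy_fields(item, from: "population", fields: %w[weight height], as: %w[w h])
p [result["w"], result["h"]] # => [70, 175]
```

When as is omitted, the sketch keeps the original property names.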

cross

Computes the cross-product of two data sets.

t = Plotrb::Transform.cross.with('another_data').include_diagonal

If you don't supply the secondary data set, the cross-product will be against the data set itself.

For example, if the data is [1, 2, 3], cross.include_diagonal will produce

[
  {"a":1, "b":1},
  {"a":1, "b":2},
  {"a":1, "b":3},
  {"a":2, "b":1},
  {"a":2, "b":2},
  {"a":2, "b":3},
  {"a":3, "b":1},
  {"a":3, "b":2},
  {"a":3, "b":3}
]
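The cross-product above can be reproduced with a short plain-Ruby sketch. It assumes, as the explicit include_diagonal call suggests, that pairing an item with itself is skipped for a self-cross unless the diagonal is requested, and that the option does not affect a cross against a second data set. The cross method here is illustrative, not Plotrb's implementation.

```ruby
# Sketch: pair every item of `data` with every item of `other`.
# When `other` is nil the data set is crossed with itself, and pairs of an
# item with itself (the "diagonal") are kept only if include_diagonal is true.
def cross(data, other = nil, include_diagonal: false)
  self_cross = other.nil?
  other ||= data
  pairs = []
  data.each_with_index do |a, i|
    other.each_with_index do |b, j|
      next if self_cross && i == j && !include_diagonal
      pairs << { "a" => a, "b" => b }
    end
  end
  pairs
end

p cross([1, 2, 3], include_diagonal: true).length # => 9
p cross([1, 2, 3]).length                         # => 6
```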

facet

Organizes data into groups.

This is similar to the GROUP BY operation in SQL, so you may use group_by in place of keys.

t = Plotrb::Transform.facet.group_by('category')
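A plain-Ruby sketch of the grouping idea follows. It is only an approximation of facet: the output shape (a "key" plus its "values") is a simplification chosen for illustration, and Vega's actual output may carry additional properties.

```ruby
# Sketch of group-by semantics: partition items by the value of one field,
# producing one group per distinct key in first-seen order.
def facet(items, key)
  items.group_by { |d| d[key] }
       .map { |k, group| { "key" => k, "values" => group } }
end

rows = [
  { "category" => "A", "v" => 1 },
  { "category" => "B", "v" => 2 },
  { "category" => "A", "v" => 3 }
]
facet(rows, "category").each { |g| puts "#{g["key"]} -> #{g["values"].map { |d| d["v"] }.inspect}" }
# prints "A -> [1, 3]" then "B -> [2]"
```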

For more details on how the output is organized, please refer to Vega's wiki page on data transforms.

filter

Filters the data set, removing items for which the test expression evaluates to false.

t = Plotrb::Transform.filter.test('d.data.x > 10')
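In Plotrb the test is a Vega expression string evaluated by Vega. The plain-Ruby sketch below only illustrates the semantics, with a lambda standing in for 'd.data.x > 10'; it is not how Plotrb evaluates expressions.

```ruby
# Sketch: keep only the items for which the predicate is true.
# The lambda stands in for the Vega expression 'd.data.x > 10'.
test = ->(d) { d["data"]["x"] > 10 }

items = [{ "data" => { "x" => 5 } }, { "data" => { "x" => 12 } }]
kept = items.select(&test)
p kept.map { |d| d["data"]["x"] } # => [12]
```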

flatten

Converts a faceted or hierarchical data set back into a flat, tabular structure.

t = Plotrb::Transform.flatten

fold

Collapses one or more data properties referenced by fields into two: a key (containing the original data property name) and a value (containing the data value).

You can also use into in place of fields.

t = Plotrb::Transform.fold.into('data.gold', 'data.silver')

For the following input

[
  {"data": {"country": "USA", "gold":10, "silver":20}}, 
  {"data": {"country": "Canada", "gold":7, "silver":26}}
]

The output will be

[
  {"index": 0, "key":"data.gold", "value":10, "data": {"country": "USA"}},
  {"index": 1, "key":"data.silver", "value":20, "data": {"country": "USA"}},
  {"index": 2, "key":"data.gold", "value":7, "data": {"country": "Canada"}},
  {"index": 3, "key":"data.silver", "value":26, "data": {"country": "Canada"}}
]

This is useful for converting wide, matrix-style data into a standardized key-value format.
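The input/output pair above can be reproduced with a plain-Ruby sketch. It assumes, as in the example, that every folded field lives under the item's "data" object and that a running index numbers the output records; fold here is an illustrative method, not Plotrb's implementation.

```ruby
# Sketch: for every item and every folded field, emit one record holding
# the field path (key), its value, and the item's remaining properties.
# Assumes all folded fields are dotted paths under "data", e.g. "data.gold".
def fold(items, fields)
  index = 0
  items.flat_map do |item|
    # Copy the nested "data" object without the folded properties.
    rest = item["data"].reject { |k, _| fields.include?("data.#{k}") }
    fields.map do |path|
      record = { "index" => index, "key" => path,
                 "value" => item.dig(*path.split(".")), "data" => rest }
      index += 1
      record
    end
  end
end

medals = [
  { "data" => { "country" => "USA", "gold" => 10, "silver" => 20 } },
  { "data" => { "country" => "Canada", "gold" => 7, "silver" => 26 } }
]
fold(medals, ["data.gold", "data.silver"]).each { |r| puts "#{r["key"]} = #{r["value"]}" }
```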

formula

Evaluates a formula for each item in the data set and stores the result in a new field.

t = Plotrb::Transform.formula.apply('abs(d.data.x * d.data.y)').into('xy')
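As with filter, the formula string is evaluated by Vega. This plain-Ruby sketch only mirrors the effect, with a lambda standing in for 'abs(d.data.x * d.data.y)' and the result stored under the new top-level field "xy".

```ruby
# Sketch: evaluate an expression per item and store the result under a new
# field. The lambda stands in for the Vega expression 'abs(d.data.x * d.data.y)'.
formula = ->(d) { (d["data"]["x"] * d["data"]["y"]).abs }

items = [{ "data" => { "x" => -3, "y" => 4 } }]
items = items.map { |d| d.merge("xy" => formula.call(d)) }
p items.first["xy"] # => 12
```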

slice

Generates a subset of the data array.

Assume the data is [5, 6, 7, 8, 9, 10, 11].

t = Plotrb::Transform.slice

t.by(3) # => [8, 9, 10, 11]
t.by(-2) # => [10, 11]
t.by([2, 5]) # => [7, 8, 9]
t.by(:min).field('min_value')
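The three index-based results above match ordinary Ruby array slicing, as this plain-Ruby sketch shows. It assumes a [start, end] pair is end-exclusive, which is what the [7, 8, 9] result implies. The :min form with field is not modeled, and slice_by is a hypothetical helper.

```ruby
# Sketch of the `by` forms using Ruby array slicing: an Integer is a start
# index (negative counts from the end); [start, end] is end-exclusive.
def slice_by(data, by)
  case by
  when Integer then data[by..]
  when Array   then data[by[0]...by[1]]
  else raise ArgumentError, "unsupported by: #{by.inspect}"
  end
end

data = [5, 6, 7, 8, 9, 10, 11]
p slice_by(data, 3)      # => [8, 9, 10, 11]
p slice_by(data, -2)     # => [10, 11]
p slice_by(data, [2, 5]) # => [7, 8, 9]
```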

sort

Sorts the items using the given fields as criteria. To sort in descending order, either call #reverse or prefix the field name with a "-" character.

t = Plotrb::Transform.sort.by('foo')
t = Plotrb::Transform.sort.by('bar').reverse
t = Plotrb::Transform.sort.by('-baz')
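Both ways of requesting descending order can be expressed in a small plain-Ruby sketch. It handles a single field only and treats reverse and the "-" prefix as alternatives; how Plotrb combines them (or multiple fields) is not covered. sort_items is a hypothetical helper.

```ruby
# Sketch: a leading "-" on the field name, or reverse: true, selects
# descending order; otherwise items are sorted ascending by that field.
def sort_items(items, spec, reverse: false)
  field = spec.delete_prefix("-")
  descending = reverse || spec.start_with?("-")
  sorted = items.sort_by { |d| d[field] }
  descending ? sorted.reverse : sorted
end

rows = [{ "foo" => 2 }, { "foo" => 1 }, { "foo" => 3 }]
p sort_items(rows, "foo").map { |d| d["foo"] }  # => [1, 2, 3]
p sort_items(rows, "-foo").map { |d| d["foo"] } # => [3, 2, 1]
```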

stats

Computes summary statistics for a field of the data set: count, minimum, maximum, sum, mean, sample variance, and sample standard deviation. Calling include_median also computes the median.

t = Plotrb::Transform.stats do
  from 'data.foo'
  include_median
  store_stats
end
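For reference, here is a plain-Ruby sketch of the listed statistics for one numeric field. The median: flag loosely mirrors include_median; store_stats is not modeled, and the method name and result keys are illustrative rather than Plotrb's output format.

```ruby
# Sketch: summary statistics for one numeric field. Variance uses the
# sample (n - 1) denominator, matching the list above; median is optional.
def stats(values, median: false)
  raise ArgumentError, "need at least two values" if values.size < 2
  n = values.size
  sum = values.sum
  mean = sum.to_f / n
  variance = values.sum { |v| (v - mean)**2 } / (n - 1)
  result = { count: n, min: values.min, max: values.max, sum: sum,
             mean: mean, variance: variance, stdev: Math.sqrt(variance) }
  if median
    s = values.sort
    result[:median] = n.odd? ? s[n / 2] : (s[n / 2 - 1] + s[n / 2]) / 2.0
  end
  result
end

summary = stats([2, 4, 4, 4, 5, 5, 7, 9], median: true)
p summary[:mean]   # => 5.0
p summary[:median] # => 4.5
```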

truncate

Truncates a string to a specified length.

t = Plotrb::Transform.truncate do
  from 'data.text'
  to 'truncated'
  max_length 20
  position :middle
  ellipsis '***'
  wordbreak
end
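The effect of position and ellipsis can be illustrated with a plain-Ruby sketch. It assumes the ellipsis counts toward max_length and that, for :middle, the front half receives any extra character; word-boundary handling (wordbreak) is not modeled. truncate here is a hypothetical helper, not Plotrb's or Vega's code.

```ruby
# Sketch: shorten text to at most max_length characters (ellipsis included),
# placing the ellipsis at the :front, :middle, or :back (default).
def truncate(text, max_length, position: :back, ellipsis: "...")
  return text if text.length <= max_length
  keep = [max_length - ellipsis.length, 0].max # characters of text to keep
  tail = ->(n) { text[text.length - n, n] }    # last n characters
  case position
  when :front then ellipsis + tail.(keep)
  when :middle
    head = (keep + 1) / 2                      # front half gets the extra char
    text[0, head] + ellipsis + tail.(keep - head)
  else text[0, keep] + ellipsis                # :back
  end
end

alphabet = ("a".."z").to_a.join
p truncate(alphabet, 10, position: :middle, ellipsis: "***") # => "abcd***xyz"
```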

unique

Constructs a new data set that contains the unique values of the specified field.

t = Plotrb::Transform.unique.from('data.foo').to('new_data')
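A plain-Ruby sketch of the idea follows. It assumes that to names the field that holds each unique value in the new data set, and that values keep their first-seen order; unique_values is a hypothetical helper.

```ruby
# Sketch: collect the distinct values of one field (first-seen order) and
# wrap each in a new item under the output field name.
def unique_values(items, from, to)
  items.map { |d| d.dig(*from.split(".")) }.uniq.map { |v| { to => v } }
end

items = [{ "data" => { "foo" => "a" } }, { "data" => { "foo" => "b" } },
         { "data" => { "foo" => "a" } }]
p unique_values(items, "data.foo", "new_data").size # => 2
```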

window

Performs a "sliding window" over a data array and outputs each window frame.

t = Plotrb::Transform.window.size(3).step(2)

The example above produces windows of three consecutive items. Because the step size is 2, each window starts two items after the previous one, so the last value of one triple is the first value of the next.
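The window/step arithmetic can be checked with a plain-Ruby sketch. It emits only complete windows; whether Vega also emits a trailing partial window is not covered here, and windows is a hypothetical helper.

```ruby
# Sketch: emit windows of `size` consecutive items, starting a new window
# every `step` items. Only complete windows are emitted.
def windows(data, size, step)
  (0..data.length - size).step(step).map { |i| data[i, size] }
end

p windows([1, 2, 3, 4, 5, 6, 7], 3, 2) # => [[1, 2, 3], [3, 4, 5], [5, 6, 7]]
```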

zip

Merges two data sets together according to a join key. If no key is provided, the data sets are merged by indices.

t = Plotrb::Transform.zip do
  with 'unemployment'
  match 'data.id'
  against 'data.key'
  as 'value'
end

This example matches records in the input data with records in the secondary data set named "unemployment", where the values of data.id (primary data) and data.key (secondary data) match. Matching values in the secondary data are added to the primary data in the field named "value".
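The join described above can be sketched in plain Ruby. This is an illustration only: zip_data is a hypothetical helper, it attaches the whole matching secondary record under the as field, and unmatched primary items get nil (Vega may offer a default value instead).

```ruby
# Sketch: attach to each primary item the secondary item whose key matches
# (match vs. against), storing it under `as`. Without keys, items are paired
# by index. Unmatched items get nil in this sketch.
def zip_data(primary, secondary, match: nil, against: nil, as: "value")
  if match.nil?
    return primary.each_with_index.map { |d, i| d.merge(as => secondary[i]) }
  end
  lookup = secondary.map { |s| [s.dig(*against.split(".")), s] }.to_h
  primary.map { |d| d.merge(as => lookup[d.dig(*match.split("."))]) }
end

counties = [{ "data" => { "id" => 1 } }, { "data" => { "id" => 2 } }]
unemployment = [{ "data" => { "key" => 2, "rate" => 0.05 } },
                { "data" => { "key" => 1, "rate" => 0.07 } }]
joined = zip_data(counties, unemployment, match: "data.id", against: "data.key")
p joined.map { |d| d["value"]["data"]["rate"] } # => [0.07, 0.05]
```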

Visual Encoding Transforms

force

Performs force-directed layout for network data.

The transform acts on two data sets: one containing nodes and one containing links. Apply the transform to the node data, and pass the name of the link data set as a transform parameter.

geo

Performs a cartographic projection. Given longitude and latitude values, sets corresponding x and y properties for a mark.

geopath

Creates paths for geographic regions such as countries, states and counties.

link

Computes a path definition for connecting nodes within a network or tree diagram.

pie

Computes a pie chart layout.

If the value property is not given, all pie slices will have equal spans.
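The underlying arithmetic of a pie layout can be sketched in plain Ruby: each slice's angular span is proportional to its value, and with no value every item weighs the same. The method pie_layout and the "start"/"end" names are hypothetical and do not reflect Plotrb's or Vega's output properties.

```ruby
# Sketch: partition a full circle (in radians) among the items. With a
# value field each slice's span is proportional to that value; without
# one, every slice gets the same span.
def pie_layout(items, value = nil)
  weights = items.map { |d| value ? d[value] : 1 }
  total = weights.sum.to_f
  start = 0.0
  weights.map do |w|
    span = 2 * Math::PI * w / total
    slice = { "start" => start, "end" => start + span }
    start += span
    slice
  end
end

slices = pie_layout([{ "v" => 1 }, { "v" => 3 }], "v")
p slices.map { |s| s["end"].round(4) } # => [1.5708, 6.2832]
```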

stack

Computes layout values for stacked graphs.

The :silhouette offset will center the stacks, while :wiggle will attempt to minimize changes in slope to make the graph easier to read. If :expand is chosen, the output values will be in the range [0,1].

You can also call #reverse or #inside_out directly to set the order.
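To make the offsets concrete, here is a plain-Ruby sketch that stacks the layer values at a single x position. Only the default zero baseline, :silhouette (stack centered on 0), and :expand (normalized to [0, 1]) are modeled; :wiggle and the ordering options are not. stack_column is a hypothetical helper, not Plotrb's implementation.

```ruby
# Sketch: stack the layer values at one x position, giving each layer a
# [y0, y1] interval. :expand normalizes the total to 1; :silhouette shifts
# the baseline down by half the total so the stack is centered on 0.
def stack_column(values, offset: :zero)
  total = values.sum.to_f
  raise ArgumentError, "cannot expand a zero total" if offset == :expand && total.zero?
  scale = offset == :expand ? 1.0 / total : 1.0
  base  = offset == :silhouette ? -total / 2 : 0.0
  y = base
  values.map do |v|
    y0 = y
    y += v * scale
    [y0, y]
  end
end

p stack_column([1, 3], offset: :expand) # => [[0.0, 0.25], [0.25, 1.0]]
```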

treemap

Computes a squarified treemap layout.

wordcloud

Computes a word cloud layout.

