Gist supports three data formats: CSV, TSV, and JSON. CSV and TSV are used for tabular data (data structured like a spreadsheet, in rows and columns), while JSON is used for hierarchical data (data structured as nested objects and attributes). Although Gist supports both, it is best suited to tabular data; it can handle hierarchical data via the API integration, but with certain limitations. This tutorial focuses on tabular data, and on CSV in particular.
CSV stands for "comma-separated values". TSV stands for "tab-separated values". These files can be exported from most spreadsheet software, such as Microsoft Excel or Google Sheets. We will take a look at how to use Excel to prepare your files.
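To make the difference between the two formats concrete, here is a minimal Python sketch showing that a CSV and a TSV file can carry exactly the same table and differ only in the delimiter. The sample text and the `read_rows` helper are illustrative assumptions, not part of Gist.

```python
import csv
import io

# Illustrative sample text; real files would be opened from disk instead.
csv_text = (
    "Number,Album,Artist\n"
    "1,Sgt. Pepper's Lonely Hearts Club Band,The Beatles\n"
)
tsv_text = (
    "Number\tAlbum\tArtist\n"
    "1\tSgt. Pepper's Lonely Hearts Club Band\tThe Beatles\n"
)


def read_rows(text, delimiter):
    """Parse delimited text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


csv_rows = read_rows(csv_text, ",")   # comma-separated values
tsv_rows = read_rows(tsv_text, "\t")  # tab-separated values

# Both formats yield the same rows once parsed.
print(csv_rows == tsv_rows)
```

TSV can be the more convenient choice when the text fields themselves contain many commas, since fewer values then need quoting.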
This is a dataset of the "500 Greatest Albums of All Time", according to Rolling Stone magazine.
This dataset is already formatted correctly for export to CSV or TSV, with the first row as the header and the remaining rows as the data.
In this dataset, the rows represent albums. Each album has a ranking ("Number"), a name ("Album"), a year ("Year"), an artist ("Artist"), a genre ("Genre"), a subgenre ("Subgenre"), an image ("ImageURL") and a description ("Description"). There is a mix of data types, including numbers ("Number"), dates ("Year"), and text fields ("Album"), which gives you a variety of ways to visualize the data. The more varied the field types in your data, the more possibilities you have for visualizing it.
This type of data is generally called "long data". Long data can be grouped by repeat values: in this example, the genre, subgenre, and artist fields all contain values that recur across rows. This is the type of data best suited to visualization in Gist.
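The idea of grouping by repeat values can be sketched in a few lines of Python. The rows below are a small hand-typed sample using the column names from this tutorial; the genre labels are illustrative assumptions and are not copied from the actual file.

```python
from collections import Counter

# A tiny hand-typed sample; genre values are illustrative assumptions.
albums = [
    {"Number": 1, "Album": "Sgt. Pepper's Lonely Hearts Club Band",
     "Year": 1967, "Artist": "The Beatles", "Genre": "Rock"},
    {"Number": 2, "Album": "Pet Sounds",
     "Year": 1966, "Artist": "The Beach Boys", "Genre": "Rock"},
    {"Number": 3, "Album": "Revolver",
     "Year": 1966, "Artist": "The Beatles", "Genre": "Rock"},
    {"Number": 4, "Album": "Highway 61 Revisited",
     "Year": 1965, "Artist": "Bob Dylan", "Genre": "Rock"},
    {"Number": 5, "Album": "Rubber Soul",
     "Year": 1965, "Artist": "The Beatles", "Genre": "Rock"},
    {"Number": 6, "Album": "What's Going On",
     "Year": 1971, "Artist": "Marvin Gaye", "Genre": "Funk / Soul"},
]

# Because artist and genre values repeat across rows, counting rows per
# value gives a meaningful breakdown for either field.
albums_per_artist = Counter(row["Artist"] for row in albums)
albums_per_genre = Counter(row["Genre"] for row in albums)

print(albums_per_artist.most_common(1))
print(dict(albums_per_genre))
```

Any field whose values repeat can serve as a grouping key in this way, which is why a long dataset with several such fields offers so many breakdowns.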
The focus of this dataset is albums, which makes an album the "primary object". When you upload this dataset to Gist, you can select "Album" as your object title in the Views tab of your visualization settings, and Gist will output your visualization with albums as the focus. Gist produces the best, most informative results when your focus is an object: a person, place or thing. With long data, you also have the ability to group by different fields.
Let's look at another dataset for comparison.
This is a dataset of alcohol consumption for all countries. The first column, headed "Alcohol Consumption", lists the measurements: Beer, Spirits, Wine, and Total Alcohol. Every country has its own column of values.
The rows in this dataset are based on unique measurements (alcohol servings) rather than on repeat values as in the earlier example. This makes it less flexible: it isn't possible to group by different fields the way it was for the albums dataset.
The dataset does include objects, however: the countries. To structure this dataset with countries as the primary object, we need to transpose it. In Excel, select the entire dataset and copy it. Then open a new sheet and choose "Paste Special" from the "Edit" menu. In the dialog, make sure the "Transpose" option is checked.
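If you prefer to script this step, the same transpose can be sketched in plain Python. The country names and numbers below are invented placeholders, not values from the real dataset; only the layout (measurements as rows, countries as columns) follows the description above.

```python
# Original layout: one row per measurement, one column per country.
# Country names and numbers are invented placeholders.
original = [
    ["Alcohol Consumption", "Country A", "Country B"],
    ["Beer", 10, 20],
    ["Spirits", 1, 2],
    ["Wine", 3, 4],
    ["Total Alcohol", 14, 26],
]

# zip(*rows) pairs up the n-th cell of every row, turning columns into rows.
transposed = [list(column) for column in zip(*original)]

# The first header cell now labels the country column, so rename it.
transposed[0][0] = "Country"

for row in transposed:
    print(row)
```

After the transpose, each country is a row and each measurement is a column, so the country becomes the primary object.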
The result will look like this (we changed the first cell from "Alcohol Consumption" to "Country"):
Now your data is structured for better results in Gist, with countries as the primary object. However, there is only a single text field, while the remaining fields are numbers.
This type of data is generally called "wide data". Wide data has fewer capabilities for grouping than long data, so in order to get the most out of Gist we recommend working with long data whenever you have the option.
In either case, your next step is to export the file as a CSV (or TSV) file and upload it to Gist. In Excel, you do so by selecting "Save as..." and then choosing "CSV UTF-8" as the file type.
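For data prepared outside Excel, a UTF-8 CSV file can also be written with Python's standard `csv` module. This is a minimal sketch: the file name and the numbers are placeholders, and the file is written to a temporary folder only so the example cleans up after itself.

```python
import csv
import os
import tempfile

# Placeholder numbers; the accented country name shows why UTF-8 matters.
rows = [
    ["Country", "Beer", "Spirits", "Wine", "Total Alcohol"],
    ["Côte d'Ivoire", 1, 2, 3, 6],
    ["Country B", 4, 5, 6, 15],
]

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "alcohol_by_country.csv")  # hypothetical name

    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(rows)

    # Read the file back to confirm the text survived the round trip.
    with open(path, newline="", encoding="utf-8") as handle:
        round_trip = list(csv.reader(handle))

print(round_trip[1][0])
```

Note that every value comes back as text when a CSV file is read, because the format itself stores no type information. For a TSV file, pass `delimiter="\t"` to `csv.writer` and `csv.reader`.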
Let's look at the results for both datasets.
The Rolling Stone dataset starts out with a gallery that you can filter and sort.
You can select from a range of filters to narrow down the dataset. You'll recognize the filters as the columns in our spreadsheet.
The pie chart lets you choose from a range of fields in the Chart By menu to see different breakdowns of the data. In addition, you can group by the number of albums or by ranking.
Since our data has a date field, we can view a Grouped Gallery.
The bar chart starts off with the total number of albums in each genre.
Because it is long data, we can also choose other fields to chart; for example, we can look at the number of albums by year, stacked by artist.
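The aggregation behind a stacked chart like this one can be sketched as a two-level count: first by year, then by artist within each year. This only illustrates the concept on a small hand-typed sample; it makes no claim about how Gist computes its charts.

```python
from collections import Counter, defaultdict

# Hand-typed sample rows with just the two fields this breakdown needs.
albums = [
    {"Year": 1967, "Artist": "The Beatles"},     # Sgt. Pepper's
    {"Year": 1966, "Artist": "The Beach Boys"},  # Pet Sounds
    {"Year": 1966, "Artist": "The Beatles"},     # Revolver
    {"Year": 1965, "Artist": "Bob Dylan"},       # Highway 61 Revisited
    {"Year": 1965, "Artist": "The Beatles"},     # Rubber Soul
]

# Outer key: the bar (year). Inner Counter: the segments (artists).
stacks = defaultdict(Counter)
for row in albums:
    stacks[row["Year"]][row["Artist"]] += 1

# The height of each bar is the sum of its segments.
bar_heights = {year: sum(segments.values()) for year, segments in stacks.items()}

for year in sorted(stacks):
    print(year, bar_heights[year], dict(stacks[year]))
```

Swapping either key for another field with repeat values (genre, subgenre) gives a different chart from the same rows.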
These are just a few of the possibilities. Given that it is long data with a variety of field types, we have a range of possibilities for changing each view to find stories in the data.
Now let's look at the dataset on alcohol consumption. The visualization includes two views—a pie chart and a table.
Unlike with the previous dataset, the Pie Chart View doesn't support changing the Chart By menu, since this dataset includes only a single text field ("Country"). However, you can change the aggregation, since there are several number fields to choose from (beer, wine, spirits, total).
The Table View shows the same data that we had in our spreadsheet, with the ability to filter on each column. The Table View works equally well for long and wide data.
As you can see, the options are much more limited for our second dataset.
Hopefully, this tutorial helped you understand the differences between long and wide data, and how to get the best results from Gist.
Download the source data we used here to experiment on your own:
Explore the visualizations:
More information on wide vs. long data: