What would a map of New York City look like if distances on the map represented time, instead of space? This is the question that came to me one day while staring at a subway map of the city.
I thought: New Yorkers care about one thing above all, time, and this map gives you no sense of how far apart things are in the one dimension that matters.
So I set out to answer it. And years after the original attempt, here we are with better data and a more sophisticated approach.
This project emphasizes many of the things I love about Gradient: developing creative approaches to interesting problems, utilizing advanced mathematical and statistical techniques, building beautiful visualizations, and of course — going out and getting a great dataset.
The basic steps to build a timespace map are the following:
- Get the data: we had to figure out how far apart (in time) things are in NYC (caveat: we stuck with Manhattan). This data would take the form of a distance matrix not unlike what you used to see inset on road maps.
- Transform the data into a new projection: A matrix of raw data is good but without a useful way to see it, this would not be a compelling project. Transforming distance matrices is a common problem in data analysis and we leveraged a classic technique.
- Develop visualizations: To develop compelling ways of visualizing how the city gets transformed from physical space to timespace, we had to implement methods from computational geometry, including Delaunay Triangulation and the old mainstay of data analysis: Principal Components Analysis.
- Develop a model to project new features: Having the general shape of Manhattan is all well and good, but adding reference features like subway lines and parks required us to develop a statistical model that projects arbitrary features using our original grid as landmark points.
- Project new features: A wealth of data exists on the physical features of New York, including subways, bike routes, waterways, and parks. The timespace map offers a sense of how these features affect the temporal layout of the city.
Getting the data
Getting the data was a three-step process. First, we had to make a grid of points that covered Manhattan:
We wanted the points to be evenly spaced so that we could capture all the idiosyncrasies of Manhattan’s temporal geography.
That’s a grid of 999 points. Next, we needed to fill out a matrix of travel times from every point to every other point. For those scoring at home, that’s 999 × 998 / 2 = 498,501 unique point-to-point travel times.
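As a rough sketch of this step (the bounding box, grid dimensions, and rectangular shape here are our illustrative assumptions — the real grid was clipped to Manhattan’s outline), an evenly spaced grid and the resulting pair count look like this:

```python
import numpy as np

# Hypothetical bounding box roughly covering Manhattan (lat, lon).
LAT_MIN, LAT_MAX = 40.70, 40.88
LON_MIN, LON_MAX = -74.02, -73.91

# An evenly spaced grid; 37 x 27 = 999 points, matching our grid size.
lats = np.linspace(LAT_MIN, LAT_MAX, 37)
lons = np.linspace(LON_MIN, LON_MAX, 27)
grid = [(lat, lon) for lat in lats for lon in lons]

n = len(grid)               # 999 points
n_pairs = n * (n - 1) // 2  # unique origin-destination pairs
print(n, n_pairs)           # 999 498501
```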
Clearly, we weren’t going to be able to fill that out by hand. The next step was to write some code to ping Google’s Distance Matrix API and gather the transit times between each pair of points. To make the map as consistent and realistic as possible, we asked Google for times at 9 a.m. on a Wednesday, so the results would not depend on when we actually pinged their servers.
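In the spirit of that code (not our exact script — the API key and timestamp below are placeholders), a request to the Distance Matrix API can be composed like this, with `departure_time` pinned so every query describes the same Wednesday morning:

```python
from urllib.parse import urlencode

API_KEY = "YOUR_KEY_HERE"  # placeholder
BASE = "https://maps.googleapis.com/maps/api/distancematrix/json"

def build_request(origins, destinations, departure_time):
    """origins/destinations: lists of (lat, lon); departure_time: Unix seconds."""
    params = {
        "origins": "|".join(f"{lat},{lon}" for lat, lon in origins),
        "destinations": "|".join(f"{lat},{lon}" for lat, lon in destinations),
        "mode": "transit",
        # Fixed departure time, so results don't depend on when we query.
        "departure_time": departure_time,
        "key": API_KEY,
    }
    return f"{BASE}?{urlencode(params)}"

# One origin-destination pair; fetching the URL (e.g. with urllib or requests)
# returns JSON with a duration for each pair.
url = build_request([(40.7580, -73.9855)], [(40.7128, -74.0060)],
                    departure_time=1700000000)  # placeholder timestamp
print(url.startswith(BASE))
```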
Fixing the data
Unfortunately, not all the data came back perfectly. We had some missing values that we had to correct for. Fortunately, distance matrices are ubiquitous (especially in the world of phylogenetics and evolution!), so many methods have been developed for working with them. We were able to leverage the ultrametric method for filling incomplete distance matrices from the ape package in R.
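The core idea behind the ultrametric estimate is that a missing distance d(i, j) is bounded by max(d(i, k), d(k, j)) for every third point k, so the tightest such bound makes a natural fill-in value. Here is a minimal Python sketch of that idea — a simplification for illustration, not the ape package’s actual implementation:

```python
import numpy as np

def fill_ultrametric(D):
    """Fill NaN entries of a symmetric distance matrix using the
    ultrametric estimate: d(i, j) ~ min over k of max(d(i, k), d(k, j))."""
    D = D.copy()
    n = D.shape[0]
    for i in range(n):
        for j in range(i + 1, n):
            if np.isnan(D[i, j]):
                bounds = [max(D[i, k], D[k, j]) for k in range(n)
                          if k not in (i, j)
                          and not np.isnan(D[i, k]) and not np.isnan(D[k, j])]
                D[i, j] = D[j, i] = min(bounds)
    return D

# Toy ultrametric distances: a and b split at depth 1, c joins at 2, d at 3.
D = np.array([[0, 2, 4, 6],
              [2, 0, 4, 6],
              [4, 4, 0, 6],
              [6, 6, 6, 0]], dtype=float)
D[0, 2] = D[2, 0] = np.nan        # knock out d(a, c)
filled = fill_ultrametric(D)
print(filled[0, 2])               # 4.0 -- the original value is recovered
```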
Embedding the data in a new projection
The problem of finding coordinates for points whose data is represented as a distance matrix is called multidimensional scaling (MDS): given the pairwise dissimilarities between a set of objects, MDS places each object in a low-dimensional space so that the distances between points approximate those dissimilarities as closely as possible.
This is exactly what we need! In fact, it’s such a good fit for our use case that we were able to use the classic version of the algorithm (sometimes also called Principal Coordinates Analysis). Plotting the results of the multidimensional scaling gives us:
Note: we’ve labeled the dimensions T1 and T2, as in “Time 1” and “Time 2”. The units are seconds.
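Classical MDS is a few lines of linear algebra: square the distances, double-center them, and take the top eigenvectors of the result. A self-contained sketch, assuming a complete symmetric matrix of travel times:

```python
import numpy as np

def classical_mds(D, k=2):
    """Classical multidimensional scaling (Principal Coordinates Analysis).
    D: (n, n) symmetric matrix of pairwise distances; returns (n, k) coords."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    B = -0.5 * J @ (D ** 2) @ J           # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)        # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:k]      # keep the top-k eigenpairs
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0))

# Sanity check: recover planar points (up to rotation/reflection) from
# their exact Euclidean distance matrix.
pts = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]])
D = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
coords = classical_mds(D)
D_hat = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
print(np.allclose(D, D_hat))  # True
```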
Now, this is interesting, but not incredibly useful. How can we develop a better understanding of what this data represents? We needed some kind of relationship between geographic space and timespace.
Visualizing the timespace
Although the only data we have (for now!) are points, Manhattan is really a 2D surface. So, our first step was to start visualizing in that domain. We populated our map with a Delaunay Triangulation of the points, so the reader can get a sense of the geographic space of our map:
Delaunay Triangulations give an aesthetically pleasing way of generating a wireframe from points. In addition, we’ve colored this map so that different “sectors” of the city have unique colors — this will come in handy in the timespace transformations.
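The key property we rely on is that a triangulation is stored as indices into the point array, so the same triangles can be redrawn with either set of coordinates. A sketch with stand-in points (the random data below is illustrative, not our grid):

```python
import numpy as np
from scipy.spatial import Delaunay

rng = np.random.default_rng(0)
geo = rng.uniform(size=(50, 2))   # stand-in for the geographic grid points

tri = Delaunay(geo)               # triangulate in geographic space

# tri.simplices holds point indices, so the SAME wireframe can be drawn
# with timespace coordinates substituted for geographic ones.
timespace = geo + rng.normal(scale=0.05, size=geo.shape)  # stand-in warp
for simplex in tri.simplices[:3]:
    print(simplex, timespace[simplex].shape)  # each triangle -> 3 (x, y) rows
```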
Now, if we hold the triangulation constant, but place the points in timespace, here is what Manhattan really looks like:
If you compare the axes here to the first multidimensional scaling plot, you’ll see how we’ve flipped and rotated the axes. We’re getting closer. We can see where Central Park is (and my, how it’s grown!) and we can see how the shape is a distortion of Manhattan.
But how can we link the two graphs? Here’s where it gets a bit tricky. The dimensions of the first graph are in latitude/longitude — or geographic distances. The dimensions of the second graph are in seconds. How can we marry the two?
The key is to first align the two maps by using Principal Component Analysis on both sets of coordinates. This will just result in a rotation, as we can see when we plot Manhattan along its principal components:
Now, all we have to do is make sure that the principal components for both the physical space map and the timespace map are on the same scale, and we can develop an animation showing one evolve into the other:
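Concretely, each coordinate set is centered, rotated onto its own principal axes, and rescaled to a common span; animation frames then interpolate between the two. A minimal sketch with stand-in data (the rotation and scale factor below are invented for illustration):

```python
import numpy as np

def pca_align(X):
    """Center X and rotate it onto its principal axes (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt.T

def to_unit_scale(X):
    """Shift to the origin and rescale so the first principal axis spans [0, 1]."""
    X = X - X.min(axis=0)
    return X / X[:, 0].max()

rng = np.random.default_rng(1)
geo = rng.normal(size=(100, 2)) * np.array([3.0, 1.0])   # stand-in geographic map
rot = np.array([[0.0, -1.0], [1.0, 0.0]])
time = geo @ rot * 5.0                                   # stand-in timespace map

A = to_unit_scale(pca_align(geo))
B = to_unit_scale(pca_align(time))

# With both maps on common axes, a frame of the animation is a simple blend.
t = 0.5
frame = (1 - t) * A + t * B
print(frame.shape)  # (100, 2)
```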
This is just the beginning. We want to populate this map with tons of interesting features, like subways:
And interesting analytics on the original map of Manhattan, like how connected each sector is to the rest of the city:
Or visualizations using different wireframing techniques (such as this Voronoi tessellation):
So stay tuned! Like what you see? Want to chat? Email us at firstname.lastname@example.org