Suggestions for improvements to 1-Introduction.ipynb

Hi there,

First of all, congrats on a great product. I was going through the Getting Started guide and noticed a few places where the text in the Introduction notebook is somewhat misleading in how it describes geographic data and what certain steps or arguments actually do. Below are the four issues I found, along with some suggested edits for your consideration. I’m extremely new to HoloViews, but I think these edits would help reduce confusion for newcomers.

Issue 1:

"As we can see, this dataset contains 24 arrays (one for each hour of the day) of taxi dropoff locations (by latitude and longitude) (My emphasis) aggregated over one month in 2015. The array shown above contains the accumulated dropoffs for the first hour of the day."

Comment: Correct me if I’m wrong, but the arrays themselves don’t contain any geographic information such as longitude and latitude. You just happen to display them in what is initially an arbitrary Cartesian space, which in this case is chosen to line up with the actual longitudes and latitudes of the stations dataset you overlay later on. You could have chosen any bounds tuple resulting in a square Cartesian space (e.g. (0, 0, 1, 1)) to display the arrays on their own, no? Being not very familiar with arrays and completely new to .npz files and HoloViews, I found the wording here very misleading. I would recommend removing the parenthetical reference to latitude and longitude and replacing it, and what follows, with: “where the value of each non-zero element in each of the 24 arrays is an intensity value representing the accumulated drop-offs at that location over the same one-hour time period, aggregated over one month in 2015”
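
To illustrate what I mean, here is a minimal pure-Python sketch (not HoloViews code) of how the bounds tuple, rather than the array itself, decides which coordinates a cell ends up at. The helper `cell_center` and the 4x4 shape are made up for illustration, and I’m assuming the usual image convention in which the first row of the array is drawn at the top.

```python
def cell_center(row, col, shape, bounds):
    """Return the (x, y) centre of one array cell under the given bounds.

    Hypothetical helper for illustration only; assumes row 0 is the top row.
    """
    n_rows, n_cols = shape
    x0, y0, x1, y1 = bounds
    cell_w = (x1 - x0) / n_cols  # width of one cell in plot units
    cell_h = (y1 - y0) / n_rows  # height of one cell in plot units
    x = x0 + (col + 0.5) * cell_w
    y = y1 - (row + 0.5) * cell_h  # rows count downwards from the top edge
    return x, y


shape = (4, 4)  # toy shape, not the real taxi arrays
unit_square = (0, 0, 1, 1)
nyc_bounds = (-74.05, 40.70, -73.90, 40.80)  # bounds used in the notebook

# The same cell of the same array lands at different coordinates,
# depending only on which bounds we choose to declare.
unit_xy = cell_center(0, 0, shape, unit_square)
nyc_xy = cell_center(0, 0, shape, nyc_bounds)
```

Nothing about the array changes between the two calls; only the declared bounds do, which is why I think the text should not describe the arrays as being "by latitude and longitude".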

Issue 2:

Once again, we can easily visualize this data with HoloViews by passing our array to hv.Image to create an object named image. This object has the spatial extent of the data declared as the bounds, in terms of the corresponding range of latitudes and longitudes (My emphasis)

Comment: Same issue as above. I eventually understood what you mean by “corresponding”, but on first reading this is misleading. I would suggest the following replacement for the last sentence: “…Arrays being agnostic with regard to geography, any tuple (x0, y0, x1, y1) resulting in a space with the same aspect ratio as the array, in this case a square such as (0, 0, 1, 1), could be used to define the Cartesian space on which to plot the image representation of each array (try it!). If you are planning to work with other data in real-world coordinates, however, choose bounds in the same coordinate reference system that cover your study area. In this case, let’s define the bounds up front in terms of the minimum and maximum longitude and latitude of the second dataset we will be overlaying in the next step.”

Issue 3:

bounds = (-74.05, 40.70, -73.90, 40.80)
image = hv.Image(taxi_dropoffs['0'], ['lon','lat'], bounds=bounds)

Comment: The second positional argument to hv.Image() here is also not well explained. Newcomers who have not yet read the API reference may think these are existing keys in the dataset itself, when in fact they are merely axis labels, if I understand correctly. I therefore suggest adding, before the next paragraph: “The second positional argument, ['lon', 'lat'], in this case merely serves to define axis labels. They are not features of the dataset itself.”

Issue 4:

On the left, we have the visual representation of the image object we declared. Using + we put it into a Layout together with a new compositional object created with the * operator called an Overlay. This particular overlay displays the station positions on top of our image, which works correctly because the data in both elements exists in the same space, namely New York City. (My emphasis)

Comment: This again is misleading. The arrays don’t ‘exist’ in any space at all. I would suggest replacing the highlighted portion with: “because we earlier defined the bounds for the image representation to correspond (roughly) to the range of longitudes and latitudes of the stations dataset.”
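
A quick way to see why the overlay lines up is to check whether the station coordinates fall inside the bounds declared for the image. The sketch below is plain Python, not HoloViews; the station coordinates are made up for illustration (the real values live in the stations dataset), and `inside_bounds` is a hypothetical helper.

```python
def inside_bounds(lon, lat, bounds):
    """True if the point lies within (x0, y0, x1, y1), edges included."""
    x0, y0, x1, y1 = bounds
    return x0 <= lon <= x1 and y0 <= lat <= y1


nyc_bounds = (-74.05, 40.70, -73.90, 40.80)  # bounds used in the notebook
unit_square = (0, 0, 1, 1)                   # an equally valid choice for the array alone

# Made-up (lon, lat) pairs, for illustration only
stations = [(-73.99, 40.75), (-73.95, 40.78), (-74.01, 40.71)]

# With the notebook's bounds, every point falls on the image ...
overlaps_nyc = all(inside_bounds(lon, lat, nyc_bounds) for lon, lat in stations)
# ... whereas with (0, 0, 1, 1) none of them would.
overlaps_unit = any(inside_bounds(lon, lat, unit_square) for lon, lat in stations)
```

In other words, the overlay works because of the bounds we picked, not because the arrays inherently live in New York City.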