Box Plot Truncates Attribute Names

maleko · May 22, 2020, 11:40am

For the DataFrame df shown below, I want to produce an interactive box plot showing the distribution of every numeric attribute, with each attribute in a separate subplot.

In [2]: import pandas as pd 
   ...: import numpy as np 
   ...: import holoviews as hv 
   ...: import hvplot.pandas # noqa 
   ...: import feather 
   ...: hv.extension("bokeh","plotly")

In [4]: df.head()
Out[4]: 
   area error  compactness error  concave points error  concavity error  fractal dimension error  mean area  mean compactness  ...  worst concavity  worst fractal dimension  worst perimeter  worst radius  worst smoothness  worst symmetry  worst texture
0      153.40            0.04904               0.01587          0.05373                 0.006193     1001.0           0.27760  ...           0.7119                  0.11890           184.60         25.38            0.1622          0.4601          17.33
1       74.08            0.01308               0.01340          0.01860                 0.003532     1326.0           0.07864  ...           0.2416                  0.08902           158.80         24.99            0.1238          0.2750          23.41
2       94.03            0.04006               0.02058          0.03832                 0.004571     1203.0           0.15990  ...           0.4504                  0.08758           152.50         23.57            0.1444          0.3613          25.53
3       27.23            0.07458               0.01867          0.05661                 0.009208      386.1           0.28390  ...           0.6869                  0.17300            98.87         14.91            0.2098          0.6638          26.50
4       94.44            0.02461               0.01885          0.05688                 0.005115     1297.0           0.13280  ...           0.4000                  0.07678           152.20         22.54            0.1374          0.2364          16.67

[5 rows x 32 columns]

In [8]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 569 entries, 0 to 568
Data columns (total 32 columns):
 #   Column                   Non-Null Count  Dtype   
---  ------                   --------------  -----   
 0   area error               569 non-null    float64 
 1   compactness error        569 non-null    float64 
 2   concave points error     569 non-null    float64 
 3   concavity error          569 non-null    float64 
 4   fractal dimension error  569 non-null    float64 
 5   mean area                569 non-null    float64 
 6   mean compactness         569 non-null    float64 
 7   mean concave points      569 non-null    float64 
 8   mean concavity           569 non-null    float64 
 9   mean fractal dimension   569 non-null    float64 
 10  mean perimeter           569 non-null    float64 
 11  mean radius              569 non-null    float64 
 12  mean smoothness          569 non-null    float64 
 13  mean symmetry            569 non-null    float64 
 14  mean texture             569 non-null    float64 
 15  perimeter error          569 non-null    float64 
 16  radius error             569 non-null    float64 
 17  smoothness error         569 non-null    float64 
 18  symmetry error           569 non-null    float64 
 19  target                   569 non-null    int64   
 20  target_name              569 non-null    category
 21  texture error            569 non-null    float64 
 22  worst area               569 non-null    float64 
 23  worst compactness        569 non-null    float64 
 24  worst concave points     569 non-null    float64 
 25  worst concavity          569 non-null    float64 
 26  worst fractal dimension  569 non-null    float64 
 27  worst perimeter          569 non-null    float64 
 28  worst radius             569 non-null    float64 
 29  worst smoothness         569 non-null    float64 
 30  worst symmetry           569 non-null    float64 
 31  worst texture            569 non-null    float64 
dtypes: category(1), float64(30), int64(1)
memory usage: 138.6 KB

If I use the Bokeh backend, the plot renders successfully, although it is not interactive:

In [5]: hv.extension("bokeh") 
   ...: df.hvplot.box(subplots=True, 
   ...:               shared_axes=False, 
   ...:               height=500,width=900)

Also, subplots=True does not appear to have any effect on the plot: all boxes are drawn on a single set of axes. I have seen the same behaviour with hvplot.kde(), whereas subplots=True works as expected with hvplot.hist(). (I have raised a separate issue about this.)

If I switch to the Plotly backend, the plot also renders, but the attribute names are truncated to a single letter. Many of these letters are not unique, so the box plots end up stacked on top of each other:

In [6]: hv.extension("plotly") 
   ...: df.hvplot.box(subplots=True, 
   ...:               shared_axes=False, 
   ...:               height=500,width=900)

I was able to relabel the ticks by index position. However, the non-unique single-letter names still seem to be used internally, so the boxes remain stacked:

In [7]: hv.extension("plotly") 
   ...: plot = df.hvplot.box(subplots=True, 
   ...:                      shared_axes=False, 
   ...:                      height=500,width=900) 
   ...:  
   ...: plot.opts(xticks=[(0,'area error'), (1,'compactness error'), (2,'concave points error'), 
   ...:        (3,'concavity error'), (4,'fractal dimension error'), (5,'mean area'), 
   ...:        (6,'mean compactness'), (7,'mean concave points'), (8,'mean concavity'), 
   ...:        (9,'mean fractal dimension'), (10,'mean perimeter'), (11,'mean radius'), 
   ...:        (12,'mean smoothness'), (13,'mean symmetry'), (14,'mean texture'), (15,'perimeter error'), 
   ...:        (16,'radius error'), (17,'smoothness error'), (18,'symmetry error'), (19,'target'), 
   ...:        (20,'target_name'), (21,'texture error'), (22,'worst area'), (23,'worst compactness'), 
   ...:        (24,'worst concave points'), (25,'worst concavity'), (26,'worst fractal dimension'), 
   ...:        (27,'worst perimeter'), (28,'worst radius'), (29,'worst smoothness'), (30,'worst symmetry'), 
   ...:        (31,'worst texture')])
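
For what it's worth, the tick list above does not have to be typed by hand. A minimal sketch, assuming the labels should simply follow column order: make_xticks is a hypothetical helper name of my own, and the three names in the usage lines are just the first columns from df.info(). In practice I would pass df.columns. This only shortens the workaround; it does not address the truncation itself.

```python
def make_xticks(columns):
    """Pair each column name with its integer position as (position, label)."""
    return [(position, str(name)) for position, name in enumerate(columns)]


# Usage with the first few column names from df.info(); with the real
# DataFrame this would be make_xticks(df.columns).
columns = ["area error", "compactness error", "concave points error"]
xticks = make_xticks(columns)
print(xticks)
```

The result can then be passed as plot.opts(xticks=xticks) in place of the literal list.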

My ultimate aim would be to produce a box plot for all numeric attributes and have each box plot in its own subplot.
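
To make the target concrete, here is a rough sketch of the layout I am after, built directly with HoloViews rather than through hvplot. The helper names numeric_columns and box_layout are my own, and the four-column grid is an arbitrary choice. I have not confirmed that this renders correctly under the Plotly backend, so treat it as an illustration of the intent rather than a working fix.

```python
import pandas as pd


def numeric_columns(df):
    """Return the names of the numeric columns, skipping categorical ones."""
    return df.select_dtypes(include="number").columns.tolist()


def box_layout(df, ncols=4):
    """Build one BoxWhisker element per numeric column, arranged in a grid."""
    # Imported lazily so numeric_columns can be used without HoloViews installed.
    import holoviews as hv

    boxes = [
        # No key dimension: each element holds a single box for one column.
        hv.BoxWhisker(df[name].to_numpy(), vdims=name)
        for name in numeric_columns(df)
    ]
    return hv.Layout(boxes).cols(ncols)
```

Note that select_dtypes(include="number") keeps the integer target column as well as the float columns; target_name is dropped because it is categorical.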

Any advice on how to solve this issue would be greatly appreciated.

Thanks.

Software Versions:

pandas                    1.0.3            py37h6c726b0_0 
numpy                     1.18.1           py37h7241aed_0  
holoviews                 1.13.1                     py_0    pyviz
hvplot                    0.5.2                      py_0    pyviz
feather-format            0.4.0                   py_1003    conda-forge
bokeh                     1.4.0                    py37_0
plotly                    4.5.4                      py_0    plotly