Panel/Holoviews/Bokeh app memory leaks, looking for general best practices

My last comments just made me realize that this would be so much easier if we got some kind of log of the elements being processed. Right now, even though I create a points = hv.Points(...) element or anything like that, it does not actually get created if nothing references it afterwards. Which is awesome, but in an app with multiple dependent functions, with multiple chunks commented out for debugging purposes, it’s pretty hard to track which elements actually got generated and which get regenerated when I change the value of a Panel widget. Also, I assume something similar could happen with the Panel widgets, which exist only in theory until something references them, so it would be nice to know when they get accessed; for example, my problem right now seems to be related to whether a specific widget is used or not.
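
A minimal sketch of the kind of logging described above, using only the standard library. The `log_calls` decorator is a hypothetical helper (not part of Panel or HoloViews) that records every time an element-building function actually runs; `make_points` is a stand-in for a function that would return an hv.Points element.

```python
import functools
import logging
from collections import Counter

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
logger = logging.getLogger("element_trace")  # hypothetical logger name

call_counts = Counter()  # how often each traced function has run


def log_calls(func):
    """Log every invocation of func and keep a running count."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        call_counts[func.__name__] += 1
        logger.info("%s called (#%d) args=%r kwargs=%r",
                    func.__name__, call_counts[func.__name__], args, kwargs)
        return func(*args, **kwargs)
    return wrapper


@log_calls
def make_points(n):
    # Stand-in for a function that would build and return hv.Points(...)
    return [(i, i * i) for i in range(n)]


make_points(3)
make_points(5)
print(call_counts["make_points"])
```

In a real app the decorator would presumably sit directly beneath a decorator such as @pn.depends(...), so that each widget-triggered regeneration leaves a log line; that combination is an assumption and has not been tested here.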

Certainly. My code architecture is very Bokeh-centric, with Panel as a higher-level wrapper. I do not currently use Param, the @pn.depends decorator, or any HoloViews functionality.

Philosophically, I leverage panel when I need capabilities not available in bokeh. It was nice that I started in bokeh a few years back without any exposure to panel, and could easily extend things via panel without having to refactor anything that was already developed.

It has been remarkably smooth; the only current issue I have is with the gauge indicator based on how I have to sequentially build up my app at startup.

Specific examples of Panel features leveraged in my apps include (1) Panel’s layouts (Column and Row), because they have list-like features not found in their Bokeh counterparts that let me append/remove/pop/clear items so the app can dynamically include or exclude different plots in a user’s interactive analysis; (2) Panel’s Tabs, again because of the list-like attributes and also the greater control over the tabs, such as where to place them (to a side, above, etc.); (3) widgets like the gauge indicator, available in Panel but not in Bokeh; (4) Panel’s busy-state indicator/spinner.


A thought based on dissecting a few observations in your original detailed summary, specifically this point about the linear dependence on the number of times that you’re refreshing/reloading

and this regarding the refresh frequency.

If I correctly interpret how you’re running things, the bokeh server is started once but each refresh might correspond to a new session / client connection.

The server has session expiration options which are relevant if that’s the case. They are listed in the Panel server user’s guide and documented in more detail in the Bokeh reference documentation; see the Session Expiration Options section there.

The bokeh reference indicates that the default lifetime of unused sessions is 15 seconds; see the --unused-session-lifetime option.
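
For experimenting with these settings, here is a small sketch of a launcher that assembles a panel serve command line. The helper name `build_serve_command` and the file name app.py are placeholders; the only flags used are --unused-session-lifetime and its companion --check-unused-sessions, both of which take milliseconds.

```python
import shlex


def build_serve_command(app_path, unused_session_lifetime_ms=15000,
                        check_unused_sessions_ms=None):
    """Assemble a `panel serve` command with session expiration options."""
    cmd = ["panel", "serve", app_path,
           "--unused-session-lifetime", str(unused_session_lifetime_ms)]
    if check_unused_sessions_ms is not None:
        # How often the server looks for unused sessions to discard.
        cmd += ["--check-unused-sessions", str(check_unused_sessions_ms)]
    return cmd


# "app.py" is a placeholder for the real application file.
command = build_serve_command("app.py",
                              unused_session_lifetime_ms=5000,
                              check_unused_sessions_ms=1000)
print(shlex.join(command))
# To actually launch it: subprocess.run(command, check=True)
```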

If your refreshes occur every ~10 seconds and a new session is created with each, your app can be accumulating sessions as time progresses; by way of a simplified example, in 30 seconds you have three new sessions but only the two oldest unused ones will have expired, for a net of one additional session hanging around.
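
A rough way to put numbers on this is a toy model, sketched below. It assumes that every refresh creates a new session, that the previous session becomes unused at that same moment, and that an unused session is discarded exactly when its lifetime runs out. It ignores how often the server actually checks for unused sessions, so treat the counts as illustrative only.

```python
def live_sessions(t, refresh_every, unused_lifetime):
    """Count sessions the server still holds at time t (all values in seconds).

    Toy model: a session created at time c is replaced (becomes unused) at
    c + refresh_every and is discarded at c + refresh_every + unused_lifetime.
    """
    alive = 0
    created = 0
    while created <= t:
        discarded_at = created + refresh_every + unused_lifetime
        if t < discarded_at:
            alive += 1
        created += refresh_every
    return alive


for t in (0, 10, 20, 30, 300):
    print(t, live_sessions(t, refresh_every=10, unused_lifetime=15))
```

Under these assumptions a 10-second refresh with a 15-second lifetime gives one active session plus two that are still waiting to expire, so a shorter lifetime mainly lowers how many old sessions are held at any one time.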

The server won’t clean up any memory associated with the sessions until it actually classifies them as unused so that could (hopefully) account for why you’re seeing such severe/pronounced memory leaks. (My memory issues appear to be much slower and much more subtle).

In any event, it should be straightforward to test the above hypothesis by configuring the option and/or adjusting the refresh rate of your tests. Apologies in advance if it’s a dead end, but it’s worth a try.


Dear all,

I have been reading this thread for many days.

I experienced the same memory leaks in my code, which uses (pandas to hvplot/holoviews) embedded in a Panel object.

I changed to (vaex -> vaex.groupby -> pandas to hvplot/holoviews) embedded in a Panel object,

but after every embedded Panel object I place a time.sleep(5) call so that any vectorisation, visualization, etc. in the Panel object has completed before new values are assigned to it.

e.g.
import time
import hvplot.pandas  # registers the .hvplot accessor on pandas objects
import panel as pn
from bokeh.models import HoverTool

#nur_cal_line=
wan_eth_traf_rx_Node_err_line=wan_eth_traf_rx_stat_data_region.hvplot.line(shared_axes=False,ylabel='Packets(in thousands)',title='Node Error Packet rate',x='Month',y=['Node_error_rate'],by='Region',height=500,width=600,rot=70,sort_date=True).opts(legend_position='top',toolbar='above')\
.opts(toolbar=None,xaxis='bottom',tools=[HoverTool(tooltips=[('Region', '@{Region}'),('Month', '@{Month}{%Y-%m}'),('Node_Error_Packet_rate', "@Node_error_rate{1}")], formatters={'@{Month}': 'datetime'})])
time.sleep(0.5)
wan_eth_traf_rx_link_err_line=wan_eth_traf_rx_stat_data_region.hvplot.line(shared_axes=False,ylabel='Packets(in thousands)',title='Link Packet Error rate',x='Month',y=['link_error_rate'],by='Region',height=500,width=600,rot=70,sort_date=True).opts(legend_position='top',toolbar='above')\
.opts(toolbar=None,xaxis='bottom',tools=[HoverTool(tooltips=[('Region', '@{Region}'),('Month', '@{Month}{%Y-%m}'),('Link_Error_Packet_rate', "@link_error_rate{1}")], formatters={'@{Month}': 'datetime'})])

time.sleep(0.5)

wan_eth_traf_rx_Node_discard_pack_line=wan_eth_traf_rx_stat_data_region.hvplot.line(shared_axes=False,ylabel='Packets(in thousands)',title='Node Discarded Packet rate',x='Month',y=['Node_Discarded_Packet_rate'],by='Region',height=500,width=600,rot=70,sort_date=True).opts(legend_position='top',toolbar='above')\
.opts(toolbar=None,xaxis='bottom',tools=[HoverTool(tooltips=[('MBU', '@{MBU}'),('Month', '@{Month}{%Y-%m}'),('Node_Discarded_Packet_rate', "@Node_Discarded_Packet_rate{1}")], formatters={'@{Month}': 'datetime'})])
time.sleep(0.5)  
wan_eth_traf_rx_link_discard_pack_line=wan_eth_traf_rx_stat_data_region.hvplot.line(shared_axes=False,ylabel='Packets(in thousands)',title='Link Discarded Packet rate',x='Month',y=['link_Discarded_Packet_rate'],by='Region',height=500,width=600,rot=70,sort_date=True).opts(legend_position='top',toolbar='above')\
.opts(toolbar=None,xaxis='bottom',tools=[HoverTool(tooltips=[('MBU', '@{MBU}'),('Month', '@{Month}{%Y-%m}'),('link_Discarded_Packet_rate', "@link_Discarded_Packet_rate{1}")], formatters={'@{Month}': 'datetime'})])
                
time.sleep(0.5)

line_graphs=(wan_eth_traf_rx_Node_err_line + wan_eth_traf_rx_link_err_line+wan_eth_traf_rx_Node_discard_pack_line+wan_eth_traf_rx_link_discard_pack_line).cols(2)
time.sleep(0.5)

tab_dash=pn.Tabs(
    ('Error_Discarded Packets',tab_wan_ethernet_traffic ),
    ('Congestion', tab_wan_ethernet_cong),
    ('Transport KPIs with Availability and Microwave Severe Errors', 'Comming'),
    ('Fiber/SPO Utilization with CRC Alignment errors', 'Comming'),
    dynamic=True
)
time.sleep(0.5)

@_jm Thanks a lot for the session lifetime idea! I’ve played around with the values, and now I get why the memory previously increased in a sawtooth manner (continuously increasing with regular minor drops): there were always some old sessions hanging around, and when they got cleaned up, that freed up some memory. So I experimented with decreasing the session parameters, and it made the memory increase smoother, as the old sessions are now cleaned up immediately, so no more dips - but unfortunately the memory leak remained. Which makes sense, as in the long run it does not really matter whether a session lives for 20 more seconds with nothing to do or not: it was cleaned up previously and is cleaned up now, just a bit sooner. So whatever remains is something that could not be cleaned up previously and cannot be cleaned up now either. But it’s great that at least I understand this mechanic a bit better.

Also interesting that there seemingly are not a lot of common elements in our code. It’s just Panel Columns and Rows (which are pretty basic, and it would be a big surprise if they caused this) and the fact that we are both using Panel widgets, although not even the same ones. Mighty curious. If you are okay with that, I would be happy to look at your code (even if you only want to share it in private, you can find me at attila.steve.kopias@gmail.com or at https://github.com/ka-steve) to see if a fresh eye can come up with something. Other than dealing with this issue to close this project, I’m currently between jobs/projects, so I have some free time to throw around. (But of course only if you’re comfortable with it.)

Hi @khannaum,
Thanks for the code example. It’s not entirely clear to me whether you still experience the memory leak with this code, or whether putting in time.sleep() solved it for you.
As far as I understand, sleep does not do much in this situation: the code only defines a recipe for the plot, which is actually created only when the server wants to display line_graphs for the first time, and to do that it has to create all of its elements too. But I could be gravely mistaken, as the under-the-hood operations of the server are still somewhat of a mystery to me.

@philippjfr If your time allows, could you share your experiences with the memory leaks in the example projects @jbednar mentioned? It would be a pretty important data point to know whether 1) you too are currently poking around in the dark; 2) you know what’s happening and it’s on your side of the code; or 3) the HoloViz tools in fact run perfectly for all the demos, so it’s 100% certain that something is wrong in our code.
Thanks!

Hi @SteveAKopias

Thanks for the offer to look over the code; it is sincerely appreciated. Unfortunately, the work is for a closed project that cannot be shared.

The app I am focusing on has session-to-session variability in its memory use. But things mostly get cleaned up after sessions are recognized as unused and then closed.

What I am facing is a slow leak – something on the order of tens of kilobytes in discrete increments every 30 minutes or so. This appears to be fairly periodic although not completely deterministic. And it happens when the session is just idling, i.e. it’s open and there’s a keep-alive mechanism between the server and client, but I’m not actively doing anything interactive.

Regarding the session lifetime properties, there’s also a check-unused-sessions parameter that works in tandem with the unused-session-lifetime property. It’s basically the resolution of how often the system performs such checks to characterize all the sessions it knows about. So it can also influence how long it takes an old session to actually get cleaned up, depending on when the checks are made relative to when the session stops being used.
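
To illustrate how the two settings interact, here is a small arithmetic sketch. It assumes the server checks at fixed multiples of the check interval and discards a session at the first check after its unused lifetime has elapsed. The function name is hypothetical, and the 17000 ms check interval is, as far as I recall, the default, so treat that figure as an assumption.

```python
def discard_time_ms(unused_at_ms, lifetime_ms, check_every_ms):
    """Time of the first periodic check that finds the session expired.

    Assumes checks run at t = check_every_ms, 2 * check_every_ms, ... and that
    a session is eligible once lifetime_ms has passed since it became unused.
    """
    eligible_at = unused_at_ms + lifetime_ms
    ticks = eligible_at // check_every_ms + 1  # first check strictly after eligibility
    return ticks * check_every_ms


# A session that became unused 1 s after startup is caught by the first check...
print(discard_time_ms(1000, lifetime_ms=15000, check_every_ms=17000))
# ...while one that became unused at 3 s just misses it and waits for the next.
print(discard_time_ms(3000, lifetime_ms=15000, check_every_ms=17000))
```

In this model the delay between a session going unused and being discarded can approach the lifetime plus one full check interval.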

One additional thought… have you enabled higher-level logging in the Bokeh server? If you set things to debug level, for example, you can get regular printouts of how many sessions it thinks are knocking around and their state.

2021-06-10 18:14:15,614 [pid 64036] 0 clients connected
2021-06-10 18:14:15,615 [pid 64036]   /server has 0 sessions with 0 unused

You can set this via an environment variable as follows (apologies if I am stating the obvious).

export BOKEH_PY_LOG_LEVEL=DEBUG
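
To follow these counts over time, the debug lines can be parsed from a captured log. This is a minimal sketch whose pattern is based only on the sample line quoted above; the function name is hypothetical, and other Bokeh versions may word the message differently.

```python
import re

# Matches lines such as "/server has 0 sessions with 0 unused".
SESSION_LINE = re.compile(
    r"(?P<route>/\S*) has (?P<sessions>\d+) sessions with (?P<unused>\d+) unused"
)


def parse_session_line(line):
    """Return route and session counts from a debug log line, or None."""
    match = SESSION_LINE.search(line)
    if match is None:
        return None
    return {
        "route": match["route"],
        "sessions": int(match["sessions"]),
        "unused": int(match["unused"]),
    }


sample = "2021-06-10 18:14:15,615 [pid 64036]   /server has 0 sessions with 0 unused"
print(parse_session_line(sample))
```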

Dear all,

I recently noticed the memory problem when deploying my apps. The memory utilization of my app keeps going up until it hangs my server.

My app uses the Panel Plotly pane and its hover events to get a point’s index in the dataframe, then passes it to other widgets via pn.depends in order to update their objects.

To debug it, I started using --mem-log-frequency to print out the memory usage of the app. Some things I found:

  • When a new session is opened, the memory increases a bit.
  • When I hover over the scatter plot, the memory keeps increasing.
  • When I close the browser tab, the numbers of Documents, uncollected Documents and uncollected Models go down and the number of uncollected Sessions becomes 0. However, the memory does not go back to the baseline.
  • Furthermore, this error is also shown when I close the tab: Module <module 'bokeh_app_2cc5b245240b4a44b4c714144b1685d1' from 'my_apps.py'> has extra unexpected referrers! This could indicate a serious memory leak. Extra referrers: [<cell at 0x7fd66ebf6310: module object at 0x7fd67015c2f0>]
  • If nothing happens, the memory usage stays the same.
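
One way to narrow down observations like these is to compare Python-level allocations before and after an interaction with the standard library’s tracemalloc module. The sketch below is not tied to Panel: the `retained` list is a stand-in for whatever the app keeps alive (for example in a hover callback), and `top_growth` is a hypothetical helper.

```python
import tracemalloc


def top_growth(before, after, limit=3):
    """Return the source lines whose allocated memory changed the most."""
    return after.compare_to(before, "lineno")[:limit]


tracemalloc.start()
before = tracemalloc.take_snapshot()

# Stand-in for app activity that retains objects between snapshots.
retained = [bytes(1024) for _ in range(1000)]

after = tracemalloc.take_snapshot()
tracemalloc.stop()

for stat in top_growth(before, after):
    print(stat)
```

This only sees memory allocated through Python’s allocator, so growth inside native extensions would not show up here.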

Is your scatter plot powered by Datashader by any chance? I noticed the same Bokeh app message when I use Datashader.

The Plotly pane was used in my case, since my data is on the order of 10^3 points or fewer and I need to represent each point on the plot.

I remember I made an app a while back that also used pd.read_pickle, and when I hosted it on Heroku it would, after some time, run into a memory error and crash. To remedy that, rather than reading the entire dataset with pd.read_pickle every time, I swapped it out for a sqlite3 database plus queries (see “Support wildcards · ahuang11/historname@6c27436” on GitHub). I don’t think I experienced memory problems after that, but I eventually stopped hosting the app.
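
A minimal sketch of that idea with the standard library’s sqlite3 module: instead of loading a whole pickled dataset on every request, query only the rows needed. The table and column names are made up for illustration and are not taken from the linked commit.

```python
import sqlite3


def find_names(conn, pattern):
    """Fetch only the rows matching a wildcard pattern (SQL LIKE syntax)."""
    cursor = conn.execute(
        "SELECT name, year FROM names WHERE name LIKE ? ORDER BY year",
        (pattern,),
    )
    return cursor.fetchall()


# In-memory database with a hypothetical schema, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (name TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO names VALUES (?, ?)",
    [("Anna", 1990), ("Anne", 1991), ("Bob", 1990)],
)
conn.commit()

print(find_names(conn, "Ann%"))
```

Only the matching rows are materialized in Python, so per-request memory scales with the result size rather than with the full dataset.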

Hi there, I’m facing the same “unexpected referrers!” error. I can start my app in Spyder with .show() without errors, but in the terminal using panel serve I get the mysterious message and the app does not run. Before updating Panel, GeoViews and Bokeh (new version 2.3.3) it ran in the terminal like a charm. But I had not deployed the app yet, so I probably didn’t notice the memory leak.

Are you all using a template by any chance? I was wondering if maybe it’s the template causing the memory leak or maybe some type of database?

Or I wonder if it’s a leak all the way upstream, in Tornado.

Hi @sunny and others

Without knowing the specifics (package versions, a minimal reproducible code example, how the code is run, etc.) it will be almost impossible for anyone to help. So please provide them if possible.

My experience from running lots of applications using different package versions and operating systems is that there are no problems. The only time I see problems is when using panel serve with the --autoreload flag. And that is a false positive as far as I know.

Please see Memory leak in panel · and [BUG] Increasing memory


Interesting, what made you suspect the template? I’m using a template, and my app did run into the same error.


Random guess because 2 out of 2 of my apps both use templates.

I have been facing this issue for nearly two years. I think it has something to do with reading in data within functions that create plots.
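
If that suspicion is right, one thing to try is loading the data once and reusing it across plot callbacks. The sketch below uses functools.lru_cache from the standard library; `load_data` and `make_plot` are hypothetical stand-ins for a real data loader and a plotting function, and whether this helps with the leak discussed here is untested.

```python
import functools


@functools.lru_cache(maxsize=1)
def load_data():
    """Stand-in for an expensive read (e.g. from disk); runs only once."""
    return tuple(range(1000))


def make_plot(scale):
    """Stand-in for a plotting function that needs the data on every call."""
    data = load_data()  # served from the cache after the first call
    return [value * scale for value in data[:3]]


make_plot(1)
make_plot(2)
print(load_data.cache_info())
```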