arviz_plots.plot_khat

arviz_plots.plot_khat(elpd_data, threshold=None, hover_format='{index}: {label}', legend=None, color=None, marker=None, hline_values=None, bin_format='{pct:.1f}%', plot_collection=None, backend=None, labeller=None, aes_by_visuals=None, visuals=None, **pc_kwargs)

Plot Pareto tail indices for diagnosing convergence in PSIS-LOO-CV.

The Generalized Pareto distribution (GPD) is fitted to the largest importance ratios to diagnose convergence rates. The shape parameter \(\hat{k}\) estimates the pre-asymptotic convergence rate based on the fractional number of finite moments. Values \(\hat{k} > 0.7\) indicate impractically low convergence rates and unreliable estimates. Details are presented in [1] and [2].

Parameters:

elpd_data : ELPDData

ELPD data object returned by arviz_stats.loo containing Pareto k diagnostics.

threshold : float, optional

Highlight khat values above this threshold with annotations. If None, no points are highlighted.
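
As a rough illustration of which observations the threshold option singles out, the sketch below selects values above a cutoff in plain Python. The khat_values list is made up for illustration, and the strict ">" comparison is an assumption about the boundary case, not confirmed library behaviour.

```python
# Hypothetical Pareto k values, one per observation (not real data).
khat_values = [0.12, 0.45, 0.71, 0.30, 1.05, 0.69]
threshold = 0.7

# Indices of observations whose khat exceeds the threshold.
# Assumption: strict comparison; the library may treat ties differently.
flagged = [i for i, k in enumerate(khat_values) if k > threshold]
print(flagged)
```

Passing threshold=0.7 to plot_khat would then be expected to annotate the points at these positions.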

hover_format : str, default "{index}: {label}"

Format string for hover annotations. Supports {index}, {label}, and {value}.
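
To see how the three placeholders combine, the sketch below fills a format string with Python's built-in str.format. That plot_khat fills the template this way is an assumption, and the index, label and value shown are hypothetical.

```python
# Default template from the signature above.
hover_format = "{index}: {label}"

# Hypothetical values for a single observation (not real data).
fields = {"index": 3, "label": "match 3", "value": 0.82}

# Placeholders that the template does not use are simply ignored.
default_text = hover_format.format(**fields)
print(default_text)

# A custom template that also shows the khat value with two decimals.
custom_text = "{label} (k={value:.2f})".format(**fields)
print(custom_text)
```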

legend : bool, optional

Whether to display a legend when color aesthetics are active. If None, a legend is shown when a color mapping is available.

color : color spec or str, optional

Color for scatter points when no aesthetic mapping supplies one. If the value matches a dimension name, that dimension is mapped to the color aesthetic.

marker : marker spec or str, optional

Marker style for scatter points when no aesthetic mapping supplies one. If the value matches a dimension name, that dimension is mapped to the marker aesthetic.

hline_values : sequence of float, optional

Custom horizontal line positions. Defaults to [0.0, 0.7, 1.0].

bin_format : str, default "{pct:.1f}%"

Format string for bin percentages. Supports {count} and {pct} placeholders.
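
The sketch below shows one way the {count} and {pct} placeholders could be filled for bins delimited by the default horizontal lines. The data are hypothetical, and the binning rule (a value equal to an edge falls in the lower bin) is an assumption made for illustration, not a description of the library's internals.

```python
from bisect import bisect_left

# Hypothetical Pareto k values (not real data).
khat_values = [0.12, 0.45, 0.71, 0.30, 1.05, 0.69, -0.05, 0.20]
edges = [0.0, 0.7, 1.0]      # default hline_values listed above
bin_format = "{pct:.1f}%"    # default bin_format

# Count values per bin: (-inf, 0.0], (0.0, 0.7], (0.7, 1.0], (1.0, inf).
# Assumption: a value equal to an edge is counted in the lower bin.
counts = [0] * (len(edges) + 1)
for k in khat_values:
    counts[bisect_left(edges, k)] += 1

# Fill the template; {count} is available even if the template omits it.
total = len(khat_values)
labels = [bin_format.format(count=c, pct=100 * c / total) for c in counts]
print(counts)
print(labels)
```

A template such as "{count} ({pct:.0f}%)" would show both the raw count and the share in each bin.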

plot_collection : PlotCollection, optional

backend : {"matplotlib", "bokeh", "plotly"}, optional

Plotting backend to use. Defaults to rcParams["plot.backend"].

labeller : labeller, optional

aes_by_visuals : mapping of {str : sequence of str or False}, optional

Mapping of visuals to aesthetics that should use their mapping in plot_collection when plotted. Valid keys are the same as for visuals.

By default:

khat -> uses all available aesthetic mappings
threshold_text -> uses no aesthetic mappings
hover -> uses no aesthetic mappings
title -> uses no aesthetic mappings
xlabel -> uses no aesthetic mappings
ylabel -> uses no aesthetic mappings
ticks -> uses no aesthetic mappings

visuals : mapping of {str : mapping or bool}, optional

Valid keys are:

khat -> passed to scatter_xy
hlines -> passed to hline, defaults to False
bin_text -> passed to annotate_xy, defaults to False
threshold_text -> passed to annotate_xy
hover -> enables interactive hover annotations, defaults to False
title -> passed to labelled_title, defaults to False
xlabel -> passed to labelled_x
ylabel -> passed to labelled_y
legend -> passed to arviz_plots.PlotCollection.add_legend
ticks -> passed to set_xticks, defaults to False
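
As a minimal sketch of how such a mapping might be assembled, the snippet below builds a visuals dictionary that switches on two elements that are off by default and switches one off, then checks the keys against the list above. Using plain booleans follows the "mapping or bool" type stated for this parameter; VALID_KEYS is a helper introduced here for illustration and is not part of arviz_plots.

```python
# Keys copied from the "Valid keys" list above (illustrative helper).
VALID_KEYS = {
    "khat", "hlines", "bin_text", "threshold_text", "hover",
    "title", "xlabel", "ylabel", "legend", "ticks",
}

visuals = {
    "hlines": True,    # off by default; True requests the reference lines
    "bin_text": True,  # off by default; True requests the bin percentage labels
    "xlabel": False,   # False disables the x-axis label
}

# Catch typos in the keys before the mapping reaches the plotting call.
unknown = set(visuals) - VALID_KEYS
assert not unknown, f"unknown visuals keys: {unknown}"

# With elpd_data computed by arviz_stats.loo, the call would then be:
# plot_khat(elpd_data, visuals=visuals)
```

A dictionary of keyword arguments can replace True for any entry when the underlying visual function needs customising; which keywords are accepted depends on that function and the backend.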

**pc_kwargs

Passed to arviz_plots.PlotCollection.wrap.

Returns:

PlotCollection

Warning

When using custom markers via the visuals dict, ensure the marker type is compatible with your chosen backend. Not all marker types support separate facecolor and edgecolor across different backends.

References

[1]

Vehtari et al. Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing. 27(5) (2017). https://doi.org/10.1007/s11222-016-9696-4. arXiv preprint https://arxiv.org/abs/1507.04544.

[2]

Vehtari et al. Pareto Smoothed Importance Sampling. Journal of Machine Learning Research. 25(72) (2024). https://jmlr.org/papers/v25/19-556.html. arXiv preprint https://arxiv.org/abs/1507.02646.

Examples

The most basic usage plots the Pareto k values from a LOO-CV computation. Each point represents one observation, with higher k values indicating less reliable importance sampling for that observation.

>>> from arviz_plots import plot_khat, style
>>> style.use("arviz-variat")
>>> from arviz_base import load_arviz_data
>>> from arviz_stats import loo
>>> dt = load_arviz_data("rugby")
>>> elpd_data = loo(dt, var_name="home_points", pointwise=True)
>>> plot_khat(elpd_data, figure_kwargs={"figsize": (10, 5)})

[Figure: output of the example above (arviz_plots-plot_khat-1.png)]

Related example figures:

Pareto k parameter diagnostics: default Pareto k diagnostic plot from PSIS-LOO-CV to assess importance sampling reliability
Pareto k diagnostics with aesthetic mapping: faceted Pareto k plot with row layout and color aesthetic mapping by team
Pareto k diagnostics with column faceting: faceted Pareto k plot using column layout to compare diagnostics across field dimensions
Pareto k diagnostics with grid faceting: faceted Pareto k plot using grid layout to separate data by field and year dimensions
