cluster

The functionality of this module is primarily exposed by the Clustering class. An instance of this class aggregates various types (defined in _types).

Clustering

class commonnn.cluster.Clustering(data=None, *, fitter=None, hierarchical_fitter=None, predictor=None, bundle_kwargs=None, recipe=None, **recipe_kwargs)

Organises a clustering

Aggregates all necessary types to carry out a clustering of input data points.

property children

Return a mapping of child cluster labels to commonnn._bundle.Bundle instances representing the children of this clustering (the root bundle).

evaluate(self, ax=None, clusters: Optional[Container[int]] = None, original: bool = False, plot_style: str = 'dots', parts: Optional[Tuple[Optional[int]]] = None, points: Optional[Tuple[Optional[int]]] = None, dim: Optional[Tuple[int, int]] = None, mask: Optional[Sequence[Union[bool, int]]] = None, ax_props: Optional[dict] = None, annotate: bool = True, annotate_pos: Union[str, dict] = 'mean', annotate_props: Optional[dict] = None, plot_props: Optional[dict] = None, plot_noise_props: Optional[dict] = None, hist_props: Optional[dict] = None, free_energy: bool = True, bundle=None)

Make 2D plot of an original data set or a cluster result

Parameters:
  • ax – The Axes instance to which to add the plot. If None, a new Figure with Axes will be created.

  • clusters – Cluster numbers to include in the plot. If None, consider all.

  • original – If True, plot the original data instead of a cluster result. Overrides clusters. Treated as True if no cluster result is present.

  • plot_style

    The kind of plotting method to use:
    • “dots”, ax.plot()

    • “scatter”, ax.scatter()

    • “contour”, ax.contour()

    • “contourf”, ax.contourf()

  • parts – Use a slice (start, stop, stride) on the data parts before plotting. Will be applied before a slice on points.

  • points – Use a slice (start, stop, stride) on the data points before plotting.

  • dim – Use these two dimensions for plotting. If None, uses (0, 1).

  • mask – Sequence of boolean or integer values used for optional fancy indexing on the point data array. Note that this is applied after regular slicing (e.g. via points) and requires a copy of the indexed data (may be slow and memory intensive for big data sets).

  • annotate – If there is a cluster result, plot the cluster numbers. Uses annotate_pos to determine the position of the annotations.

  • annotate_pos

    Where to put the cluster number annotation. Can be one of:
    • “mean”, Use the cluster mean

    • “random”, Use a random point of the cluster

    • dict {1: (x, y), ...}, Use a specific coordinate tuple for each cluster. Omitted labels will be placed randomly.

  • annotate_props – Dictionary of keyword arguments passed to ax.annotate().

  • ax_props – Dictionary of ax properties to apply after plotting via ax.set(**ax_props). If None, uses defaults that can also be defined in the configuration file (not yet implemented).

  • plot_props – Dictionary of keyword arguments passed to various functions (plot.plot_dots() etc.), where they have different meanings, to format cluster plotting. If None, uses defaults that can also be defined in the configuration file (not yet implemented).

  • plot_noise_props – Like plot_props but for formatting noise point plotting.

  • hist_props – Dictionary of keyword arguments passed to functions that involve the computing of a histogram via numpy.histogram2d.

  • free_energy – If True, converts computed histograms to pseudo free energy surfaces.

Returns:

Figure, Axes and a list of plotted elements

Note

Requires coordinate access on the input data via to_components_array(). Also requires by_parts() if option parts is used.
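
The order in which parts, points, mask and dim are applied can be sketched in plain Python. This is an illustration only, not the library's implementation: the helper select_points is hypothetical, the data is assumed to be a list of parts (each a list of coordinate tuples), and the points slice is assumed to act on the concatenated parts.

```python
from typing import Optional, Sequence, Tuple, Union

Point = Tuple[float, ...]


def select_points(
    parts: Sequence[Sequence[Point]],
    parts_slice: Optional[Tuple[Optional[int], ...]] = None,
    points_slice: Optional[Tuple[Optional[int], ...]] = None,
    mask: Optional[Sequence[Union[bool, int]]] = None,
    dim: Optional[Tuple[int, int]] = None,
):
    """Hypothetical helper mimicking the documented selection order."""
    if dim is None:
        dim = (0, 1)

    # 1. Slice the data parts first: (start, stop, stride).
    if parts_slice is not None:
        parts = parts[slice(*parts_slice)]

    # 2. Concatenate the remaining parts, then slice the points
    #    (assumption: the points slice acts on the concatenation).
    points = [p for part in parts for p in part]
    if points_slice is not None:
        points = points[slice(*points_slice)]

    # 3. Apply the mask last; this always builds a new list (a copy).
    if mask is not None:
        if all(isinstance(m, bool) for m in mask):
            points = [p for p, keep in zip(points, mask) if keep]
        else:
            points = [points[i] for i in mask]

    # 4. Keep only the two dimensions used for plotting.
    return [(p[dim[0]], p[dim[1]]) for p in points]


demo_parts = [
    [(0, 0, 9), (1, 1, 9)],
    [(2, 2, 9), (3, 3, 9)],
    [(4, 4, 9), (5, 5, 9)],
]
selected = select_points(
    demo_parts,
    parts_slice=(1, None, None),   # drop the first part
    points_slice=(None, None, 2),  # every second point
    mask=[True, False],            # boolean mask on what is left
)
```

Because the mask is applied to the already sliced points, its length (boolean case) or its indices (integer case) refer to the sliced data, not to the full input.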

fit(self, bundle=None, *, sort_by_size=True, member_cutoff=None, max_clusters=None, record=True, v=True, **kwargs) -> None

Execute clustering procedure

Keyword Arguments:
  • sort_by_size – Whether to sort (and trim) the created Labels instance. See also Labels.sort_by_size().

  • member_cutoff – Valid clusters need to have at least this many members. Passed on to Labels.sort_by_size() if sort_by_size is True. Has no effect otherwise and valid clusters have at least one member.

  • max_clusters – Keep only the largest max_clusters clusters. Passed on to Labels.sort_by_size() if sort_by_size is True. Has no effect otherwise.

  • record – Whether to create a Record instance for this clustering which is appended to the Summary of the clustered bundle.

  • v – Be chatty.

Note

Further keyword arguments are passed on to fit.
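
The combined effect of sort_by_size, member_cutoff and max_clusters can be sketched as follows. This is a minimal illustration, not the code of Labels.sort_by_size(): the function sort_labels_by_size is hypothetical, it assumes that noise carries the label 0 and that clusters are renumbered from 1 in order of decreasing size, and the tie-breaking rule is an arbitrary choice.

```python
from collections import Counter
from typing import List, Optional


def sort_labels_by_size(
    labels: List[int],
    member_cutoff: Optional[int] = None,
    max_clusters: Optional[int] = None,
) -> List[int]:
    """Hypothetical stand-in for the relabelling described above."""
    if member_cutoff is None:
        member_cutoff = 1  # valid clusters have at least one member

    # Count members per cluster; label 0 (noise) is never ranked.
    sizes = Counter(label for label in labels if label != 0)

    # Largest cluster first; ties are broken by the old label.
    ranked = sorted(sizes, key=lambda label: (-sizes[label], label))

    # Trim: drop clusters that are too small, keep only the largest ones.
    ranked = [label for label in ranked if sizes[label] >= member_cutoff]
    if max_clusters is not None:
        ranked = ranked[:max_clusters]

    # Surviving clusters become 1, 2, ...; everything else becomes noise.
    new_label = {old: new for new, old in enumerate(ranked, start=1)}
    return [new_label.get(label, 0) for label in labels]


raw = [1, 1, 2, 2, 2, 3, 0]
trimmed = sort_labels_by_size(raw, member_cutoff=2)
largest_only = sort_labels_by_size(raw, max_clusters=1)
```

In this sketch, points of clusters removed by either criterion are reassigned to noise rather than dropped, so the label sequence keeps its length.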

fit_hierarchical(self, *args, purge=True, bundle=None, **kwargs)

Execute hierarchical clustering procedure

Keyword Arguments:
  • purge – Reset children dictionary of root bundle

  • bundle – Root bundle

Note

Used arguments and further keyword arguments depend on the used hierarchical fitter.

property fitter
property input_data
isolate(self, purge: bool = True, isolate_input_data: bool = True, bundle=None) -> None

Create child clusterings from cluster labels

Parameters:
  • purge – If True, creates a new mapping for the children of this clustering.

  • isolate_input_data – If True, attaches a subset of the input data of this clustering to the child.

  • bundle – A bundle to operate on. If None uses the root bundle.
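
The idea behind isolating children from cluster labels can be sketched without the library. The function split_by_label and the dictionary layout are hypothetical, and it is an assumption of this sketch that every label, including noise, gets its own child.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def split_by_label(
    points: Sequence[Point],
    labels: Sequence[int],
    isolate_input_data: bool = True,
) -> Dict[int, Dict[str, List]]:
    """Group points by cluster label into one 'child' per label."""
    children: Dict[int, Dict[str, List]] = defaultdict(
        lambda: {"indices": [], "points": []}
    )
    for index, (point, label) in enumerate(zip(points, labels)):
        child = children[label]
        # Indices into the parent data are always recorded.
        child["indices"].append(index)
        # The data subset is only attached on request.
        if isolate_input_data:
            child["points"].append(point)
    return dict(sorted(children.items()))


demo_points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (9.0, 9.0)]
demo_labels = [1, 1, 2, 0]
demo_children = split_by_label(demo_points, demo_labels)
```

Each child could then be clustered again on its own subset, which is what makes the hierarchy possible.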

property labels

Direct access to labels holding cluster label assignments for points in InputData, stored on the root Bundle.

pie(self, ax=None, pie_props=None, bundle=None)

Make a pie plot of the cluster hierarchy based on assigned labels

predict(self, other, *, bundle=None, **kwargs)

Execute prediction procedure

Parameters:

other – commonnn._bundle.Bundle instance for which cluster labels should be predicted.

Keyword Arguments:

bundle – Bundle to predict from. If None, uses the root bundle.

reel(self, depth: Optional[int] = None, bundle=None) -> None
property root
summarize(self, ax=None, quantity: str = 'execution_time', treat_nan: Optional[Any] = None, convert: Optional[Any] = None, ax_props: Optional[dict] = None, contour_props: Optional[dict] = None, plot_style: str = 'contourf', bundle=None)

Generate a 2D plot of record values

Record values (“time”, “clusters”, “largest”, “noise”) are plotted against cluster parameters (radius cutoff r and cnn cutoff c).

Parameters:
  • ax – Matplotlib Axes to plot on. If None, a new Figure with Axes will be created.

  • quantity – Record value to visualise. One of “time”, “clusters”, “largest” or “noise”.

  • treat_nan – If not None, use this value to pad nan-values.

  • ax_props – Used to style ax.

  • contour_props – Passed on to contour.

property summary

Return an instance of commonnn.report.Summary collecting clustering result records for this clustering (the root bundle).

to_dtrajs(self, bundle=None)

Convert cluster label assignments to discrete state trajectory
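
As a rough sketch of what such a conversion involves, the following assumes that a discrete trajectory is simply the label sequence of one data part and that the part sizes are known. The function labels_to_dtrajs and its part_sizes argument are hypothetical and not part of the documented interface.

```python
from itertools import accumulate
from typing import List, Optional, Sequence


def labels_to_dtrajs(
    labels: Sequence[int],
    part_sizes: Optional[Sequence[int]] = None,
) -> List[List[int]]:
    """Split a flat label sequence into one trajectory per data part."""
    if part_sizes is None:
        # Without part information there is a single trajectory.
        return [list(labels)]

    # Cumulative sizes give the stop index of every part.
    stops = list(accumulate(part_sizes))
    starts = [0] + stops[:-1]
    return [list(labels[a:b]) for a, b in zip(starts, stops)]


dtrajs = labels_to_dtrajs([1, 1, 0, 2, 2], part_sizes=[2, 3])
```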

to_nx_DiGraph(self, ignore=None, bundle=None)

Convert cluster hierarchy to networkx DiGraph

Keyword Arguments:
  • ignore – A set of labels not to include in the graph. Use for example to exclude noise (label 0).

  • bundle – The bundle to start with. If None, uses the root bundle.
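
How a label hierarchy maps onto directed edges, and what ignore does to it, can be sketched with plain dictionaries. Everything here is an assumption made for illustration: the nested-dict hierarchy, the function hierarchy_edges, the dotted node names, and the choice to apply ignore on every level and to skip the whole subtree of an ignored label.

```python
from typing import Dict, List, Optional, Set, Tuple


def hierarchy_edges(
    children: Dict[int, dict],
    ignore: Optional[Set[int]] = None,
    parent: str = "root",
) -> List[Tuple[str, str]]:
    """Collect (parent, child) edges of a nested label hierarchy."""
    if ignore is None:
        ignore = set()

    edges = []
    for label, grandchildren in sorted(children.items()):
        if label in ignore:
            continue  # also skips everything below the ignored label
        node = f"{parent}.{label}"
        edges.append((parent, node))
        # Recurse into the next hierarchy level.
        edges.extend(hierarchy_edges(grandchildren, ignore, node))
    return edges


# Noise (0) and two clusters; cluster 1 was split into two children.
demo_hierarchy = {0: {}, 1: {1: {}, 2: {}}, 2: {}}
demo_edges = hierarchy_edges(demo_hierarchy, ignore={0})
```

An edge list of this form is one of the inputs accepted by the networkx.DiGraph constructor.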

tree(self, ax=None, ignore=None, pos_props=None, draw_props=None, bundle=None)

Make a layer plot of the cluster hierarchy

trim(self, bundle: Bundle = None, protocol='shrinking', **kwargs)