cluster

The functionality of this module is primarily exposed by the Clustering class. An instance of this class aggregates various types (defined in _types).

Go to:

Clustering

Clustering

class commonnn.cluster.Clustering(data=None, *, fitter=None, hierarchical_fitter=None, predictor=None, bundle_kwargs=None, recipe=None, **recipe_kwargs)

Organises a clustering

Aggregates all necessary types to carry out a clustering of input data points.

property children: Return a mapping of child cluster labels to commonnn._bundle.Bundle instances representing the children of this clustering (the root bundle).

Make 2D plot of an original data set or a cluster result

Parameters:

ax – The Axes instance to which to add the plot. If None, a new Figure with Axes will be created.
clusters – Cluster numbers to include in the plot. If None, consider all.
original – If True, plots the original data instead of a cluster result. Overrides clusters. Treated as True if no cluster result is present.
plot_style –
The kind of plotting method to use:
- “dots”, ax.plot()
- “scatter”, ax.scatter()
- “contour”, ax.contour()
- “contourf”, ax.contourf()
parts – Use a slice (start, stop, stride) on the data parts before plotting. Will be applied before a slice on points.
points – Use a slice (start, stop, stride) on the data points before plotting.
dim – Use these two dimensions for plotting. If None, uses (0, 1).
mask – Sequence of boolean or integer values used for optional fancy indexing on the point data array. Note that this is applied after regular slicing (e.g. via points) and requires a copy of the indexed data (may be slow and memory-intensive for big data sets).
annotate – If there is a cluster result, plot the cluster numbers. Uses annotate_pos to determine the position of the annotations.
annotate_pos –
Where to put the cluster number annotation. Can be one of:
- “mean”, Use the cluster mean
- “random”, Use a random point of the cluster
- dict {1: (x, y), ...}, Use a specific coordinate tuple for each cluster. Omitted labels will be placed randomly.
annotate_props – Dictionary of keyword arguments passed to ax.annotate().
ax_props – Dictionary of ax properties to apply after plotting via ax.set(**ax_props). If None, uses defaults that can also be defined in the configuration file (not yet implemented).
plot_props – Dictionary of keyword arguments passed to various functions (plot.plot_dots() etc.), where they have different meanings, to format cluster plotting. If None, uses defaults that can also be defined in the configuration file (not yet implemented).
plot_noise_props – Like plot_props but for formatting noise point plotting.
hist_props – Dictionary of keyword arguments passed to functions that compute a histogram via numpy.histogram2d.
free_energy – If True, converts computed histograms to pseudo free energy surfaces.
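The free_energy option can be illustrated with a minimal, standard-library sketch. It assumes the common convention F = -ln(p), in units of kT, shifted so that the most populated bin sits at zero; the exact formula used by the library is not stated here, and the function name pseudo_free_energy is hypothetical.

```python
import math


def pseudo_free_energy(histogram):
    """Convert 2D histogram counts to a pseudo free energy surface.

    Assumption (not taken from the library): F = -ln(p) with
    p = count / total, shifted so that the minimum is zero.
    Empty bins are mapped to infinity.
    """
    total = sum(sum(row) for row in histogram)
    if total == 0:
        raise ValueError("histogram contains no counts")
    surface = [
        [-math.log(count / total) if count > 0 else math.inf for count in row]
        for row in histogram
    ]
    lowest = min(value for row in surface for value in row)
    return [[value - lowest for value in row] for row in surface]


# Toy 2x2 histogram: the bin with 8 counts becomes the zero of the surface.
counts = [[8, 2], [0, 4]]
surface = pseudo_free_energy(counts)
```

With this convention, sparsely populated bins appear as high-energy regions and unvisited bins as infinitely high barriers.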

Returns:

Figure, Axes and a list of plotted elements

Note

Requires coordinate access on the input data via to_components_array(). Also requires by_parts() if option parts is used.
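The order in which parts, points, and mask are applied can be sketched with plain Python lists. This is an illustration only: it assumes that the selected parts are concatenated before the points slice is applied, and the helper select_points is hypothetical, not part of the library.

```python
from itertools import chain, compress


def select_points(parts_data, parts=None, points=None, mask=None):
    """Apply a parts slice, then a points slice, then an optional mask."""
    # 1. Slice on the data parts (start, stop, stride).
    selected = parts_data[slice(*parts)] if parts is not None else parts_data
    # Assumption: the remaining parts are concatenated into one point list.
    flat = list(chain.from_iterable(selected))
    # 2. Slice on the data points (start, stop, stride).
    if points is not None:
        flat = flat[slice(*points)]
    # 3. Mask: boolean selection or integer indexing, both create a copy.
    if mask is not None:
        if all(isinstance(m, bool) for m in mask):
            flat = list(compress(flat, mask))
        else:
            flat = [flat[i] for i in mask]
    return flat


parts_data = [
    [(0, 0), (1, 1), (2, 2)],
    [(3, 3), (4, 4)],
    [(5, 5), (6, 6), (7, 7)],
]
by_bool = select_points(
    parts_data, parts=(0, 2, None), points=(None, None, 2),
    mask=[True, False, True],
)
by_index = select_points(
    parts_data, parts=(0, 2, None), points=(None, None, 2), mask=[2, 0],
)
```

Here the first two parts are kept, every second point of the concatenated result is taken, and the mask finally picks from those three remaining points.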

fit(self, bundle=None, *, sort_by_size=True, member_cutoff=None, max_clusters=None, record=True, v=True, **kwargs) → None

Execute clustering procedure

Keyword Arguments:

sort_by_size – Whether to sort (and trim) the created Labels instance. See also Labels.sort_by_size().
member_cutoff – Valid clusters need to have at least this many members. Passed on to Labels.sort_by_size() if sort_by_size is True. Has no effect otherwise and valid clusters have at least one member.
max_clusters – Keep only the largest max_clusters clusters. Passed on to Labels.sort_by_size() if sort_by_size is True. Has no effect otherwise.
record – Whether to create a Record instance for this clustering which is appended to the Summary of the clustered bundle.
v – Be chatty.

Note

Further keyword arguments are passed on to fit.
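The combined effect of sort_by_size, member_cutoff, and max_clusters can be sketched as follows. This is a simplified stand-in for illustration, not the actual Labels.sort_by_size() implementation; it assumes that label 0 denotes noise and that clusters removed by trimming are relabelled as noise.

```python
from collections import Counter


def sort_labels_by_size(labels, member_cutoff=None, max_clusters=None):
    """Relabel clusters so that label 1 is the largest cluster.

    Assumptions: 0 is the noise label, and clusters that are too
    small or exceed max_clusters are turned into noise.
    """
    cutoff = 1 if member_cutoff is None else member_cutoff
    sizes = Counter(label for label in labels if label != 0)
    # Rank valid clusters by decreasing size; ties keep the old label order.
    ranked = sorted(
        (old for old, size in sizes.items() if size >= cutoff),
        key=lambda old: (-sizes[old], old),
    )
    if max_clusters is not None:
        ranked = ranked[:max_clusters]
    mapping = {old: new for new, old in enumerate(ranked, start=1)}
    return [mapping.get(label, 0) for label in labels]


labels = [3, 3, 3, 1, 1, 2, 0, 3, 1, 5]
sorted_only = sort_labels_by_size(labels)
with_cutoff = sort_labels_by_size(labels, member_cutoff=2)
largest_only = sort_labels_by_size(labels, max_clusters=1)
```

In the toy input, the old cluster 3 has four members and therefore becomes cluster 1, while the two single-member clusters only survive when no member_cutoff is set.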

fit_hierarchical(self, bundle=None, *, purge=True, **kwargs)

Execute hierarchical clustering procedure

Keyword Arguments:

purge – Reset children dictionary of root bundle
bundle – Root bundle

Note

Other keyword arguments depend on the used hierarchical fitter.

property fitter

property input_data

isolate(self, purge: bool = True, isolate_input_data: bool = True, bundle=None) → None

Create child clusterings from cluster labels

Parameters:

purge – If True, creates a new mapping for the children of this clustering.
isolate_input_data – If True, attaches a subset of the input data of this clustering to the child.
bundle – A bundle to operate on. If None uses the root bundle.
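Conceptually, isolating children means grouping points by their cluster label. The sketch below shows that idea with plain dictionaries; the helper isolate_children and the dictionary layout are hypothetical, and treating noise (label 0) like any other label is an assumption of this example.

```python
from collections import defaultdict


def isolate_children(labels, points=None):
    """Group point indices (and optionally the points) by cluster label."""
    children = defaultdict(lambda: {"indices": [], "points": []})
    for index, label in enumerate(labels):
        child = children[label]
        child["indices"].append(index)
        # Mirrors isolate_input_data: attach the matching data subset.
        if points is not None:
            child["points"].append(points[index])
    return dict(children)


labels = [1, 1, 2, 0, 2]
points = ["a", "b", "c", "d", "e"]
children = isolate_children(labels, points)
```

Each entry of the resulting mapping plays the role of a child that knows which part of the parent's input data it covers.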

property labels: Direct access to labels holding cluster label assignments for points in InputData, stored on the root Bundle.

pie(self, ax=None, pie_props=None, bundle=None): Make a pie plot of the cluster hierarchy based on assigned labels

predict(self, other, *, bundle=None, **kwargs)

Execute prediction procedure

Parameters:

other – commonnn._bundle.Bundle instance for which cluster labels should be predicted.

Keyword Arguments:

bundle – Bundle to predict from. If None, uses the root bundle.

reel(self, depth: int | None = None, bundle=None) → None

property root

summarize(self, ax=None, quantity: str = 'execution_time', treat_nan: Any | None = None, convert: Any | None = None, ax_props: dict | None = None, contour_props: dict | None = None, plot_style: str = 'contourf', bundle=None)

Generate a 2D plot of record values

Record values (“time”, “clusters”, “largest”, “noise”) are plotted against cluster parameters (radius cutoff r and cnn cutoff c).

Parameters:

ax – Matplotlib Axes to plot on. If None, a new Figure with Axes will be created.
quantity –
Record value to visualise:
- “time”
- “clusters”
- “largest”
- “noise”
treat_nan – If not None, use this value to pad nan-values.
ax_props – Used to style ax.
contour_props – Passed on to contour.

property summary: Return an instance of commonnn.report.Summary collecting clustering result records for this clustering (the root bundle).

to_dtrajs(self, bundle=None): Convert cluster label assignments to discrete state trajectory
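As a rough illustration of what a conversion to discrete state trajectories involves, the sketch below splits a flat list of cluster labels into one trajectory per data part. That the trajectory boundaries follow the parts of the input data is an assumption here, and the helper labels_to_dtrajs is hypothetical.

```python
def labels_to_dtrajs(labels, part_lengths):
    """Split flat cluster labels into one discrete trajectory per part."""
    if sum(part_lengths) != len(labels):
        raise ValueError("part lengths do not add up to the number of labels")
    dtrajs = []
    start = 0
    for length in part_lengths:
        dtrajs.append(labels[start:start + length])
        start += length
    return dtrajs


# Six labelled points that came from two parts with 2 and 4 points.
dtrajs = labels_to_dtrajs([1, 1, 0, 2, 2, 1], [2, 4])
```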

to_nx_DiGraph(self, ignore=None, bundle=None)

Convert cluster hierarchy to networkx DiGraph

Keyword Arguments:

ignore – A set of labels not to include into the graph. Use for example to exclude noise (label 0).
bundle – The bundle to start with. If None, uses the root bundle.
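The effect of ignore when walking a cluster hierarchy can be sketched without networkx by collecting directed parent-to-child edges from a nested mapping. The nested-dictionary representation, the dotted node names, and the helper hierarchy_edges are hypothetical and only serve the illustration.

```python
def hierarchy_edges(children, parent="root", ignore=None):
    """Collect (parent, child) edges from a nested {label: children} mapping."""
    ignore = set() if ignore is None else set(ignore)
    edges = []
    for label, grandchildren in sorted(children.items()):
        if label in ignore:
            # Ignored labels are skipped together with their whole subtree.
            continue
        node = f"{parent}.{label}"
        edges.append((parent, node))
        edges.extend(hierarchy_edges(grandchildren, parent=node, ignore=ignore))
    return edges


# Two levels of clusters; label 0 stands for noise on every level.
hierarchy = {0: {}, 1: {0: {}, 1: {}, 2: {}}, 2: {}}
edges = hierarchy_edges(hierarchy, ignore={0})
```

An edge list like this can be passed to networkx.DiGraph(edges) to obtain a directed graph.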

tree(self, ax=None, ignore=None, pos_props=None, draw_props=None, bundle=None, annotate=False, annotate_props=None, annotate_format='{alias}: (λ={lambda}, s={size})')

Make a layer plot of the cluster hierarchy

Keyword Arguments:

ax – The Matplotlib Axes instance to which to add the plot. If None, a new Figure with Axes will be created.
ignore – A set of labels not to include into the graph. Use for example to exclude noise (label 0).
pos_props – Dictionary of keyword arguments passed to networkx.spring_layout().
draw_props – Dictionary of keyword arguments passed to networkx.draw().
bundle – The bundle to start with. If None, uses the root bundle.
annotate – Whether to annotate the plotted nodes with aliases, size, and lambda values.
annotate_props – Dictionary of keyword arguments passed to ax.annotate().
annotate_format – Format string for the annotation. Can use {alias}, {lambda}, and {size} as placeholders.
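The placeholders in annotate_format follow standard Python str.format() syntax. Because lambda is a reserved word, the values have to be supplied through dictionary unpacking rather than as literal keyword arguments. The node values below are made up for illustration.

```python
# Default format string from the tree() signature.
annotate_format = "{alias}: (λ={lambda}, s={size})"

# Hypothetical values for a single node of the hierarchy.
node = {"alias": "1.2", "lambda": 0.75, "size": 120}

# "lambda" cannot be written as a keyword argument, so unpack a dict.
text = annotate_format.format(**node)

# A custom format may use only a subset of the placeholders.
short_text = "{alias} [{size}]".format(**node)
```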

trim(self, bundle: Bundle = None, protocol='shrinking', **kwargs)