cluster
The functionality of this module is primarily exposed by the Clustering class. An instance of this class aggregates various types (defined in the _types module).
Clustering
- class commonnn.cluster.Clustering(data=None, *, fitter=None, hierarchical_fitter=None, predictor=None, bundle_kwargs=None, recipe=None, **recipe_kwargs)
Organises a clustering
Aggregates all necessary types to carry out a clustering of input data points.
- property children
Return a mapping of child cluster labels to commonnn._bundle.Bundle instances representing the children of this clustering (the root bundle).
- evaluate(self, ax=None, clusters: Optional[Container[int]] = None, original: bool = False, plot_style: str = 'dots', parts: Optional[Tuple[Optional[int]]] = None, points: Optional[Tuple[Optional[int]]] = None, dim: Optional[Tuple[int, int]] = None, mask: Optional[Sequence[Union[bool, int]]] = None, ax_props: Optional[dict] = None, annotate: bool = True, annotate_pos: Union[str, dict] = 'mean', annotate_props: Optional[dict] = None, plot_props: Optional[dict] = None, plot_noise_props: Optional[dict] = None, hist_props: Optional[dict] = None, free_energy: bool = True, bundle=None)
Make a 2D plot of the original data set or of a cluster result
- Parameters:
ax – The Axes instance to which to add the plot. If None, a new Figure with Axes will be created.
clusters – Cluster numbers to include in the plot. If None, consider all.
original – Plot the original data instead of a cluster result. Overrides clusters. Will be considered True if no cluster result is present.
plot_style – The kind of plotting method to use:
  - "dots", ax.plot()
  - "scatter", ax.scatter()
  - "contour", ax.contour()
  - "contourf", ax.contourf()
parts – Use a slice (start, stop, stride) on the data parts before plotting. Will be applied before a slice on points.
points – Use a slice (start, stop, stride) on the data points before plotting.
dim – Use these two dimensions for plotting. If None, uses (0, 1).
mask – Sequence of boolean or integer values used for optional fancy indexing on the point data array. Note that this is applied after regular slicing (e.g. via points) and requires a copy of the indexed data (may be slow and memory intensive for big data sets).
annotate – If there is a cluster result, plot the cluster numbers. Uses annotate_pos to determine the position of the annotations.
annotate_pos – Where to put the cluster number annotation. Can be one of:
  - "mean", use the cluster mean
  - "random", use a random point of the cluster
  - dict {1: (x, y), ...}, use a specific coordinate tuple for each cluster. Omitted labels will be placed randomly.
annotate_props – Dictionary of keyword arguments passed to ax.annotate().
ax_props – Dictionary of ax properties to apply after plotting via ax.set(**ax_props). If None, uses defaults that can also be defined in the configuration file (not yet implemented).
plot_props – Dictionary of keyword arguments passed to various functions (plot.plot_dots() etc.) to format cluster plotting; their meaning differs between these functions. If None, uses defaults that can also be defined in the configuration file (not yet implemented).
plot_noise_props – Like plot_props but for formatting noise point plotting.
hist_props – Dictionary of keyword arguments passed to functions that compute a histogram via numpy.histogram2d.
free_energy – If True, converts computed histograms to pseudo free energy surfaces.
- Returns:
Figure, Axes and a list of plotted elements
Note
Requires coordinate access on the input data via to_components_array(). Also requires by_parts() if option parts is used.
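The order in which the data are narrowed down before plotting (parts slice, then points slice, then mask) and the histogram-to-free-energy conversion can be illustrated with plain Python. This is a minimal sketch, not the commonnn implementation: it assumes the input is a list of parts, each a list of (x, y) points, and that the points slice acts on the concatenation of the selected parts. For the pseudo free energy it assumes the common convention F = -ln(p) in units of kT applied to normalised histogram counts; whether evaluate uses exactly this normalisation is not stated above. The helpers select_points and pseudo_free_energy are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple, Union

Point = Tuple[float, float]


def select_points(
    data: List[List[Point]],
    parts: Optional[Tuple[Optional[int], ...]] = None,
    points: Optional[Tuple[Optional[int], ...]] = None,
    mask: Optional[Sequence[Union[bool, int]]] = None,
) -> List[Point]:
    """Hypothetical helper: apply the parts slice, then the points slice, then the mask."""
    # 1. Slice over parts first, (start, stop, stride)
    if parts is not None:
        data = data[slice(*parts)]
    # Concatenate the remaining parts into one flat list of points
    flat = [p for part in data for p in part]
    # 2. Slice over points (assumed to act on the concatenated points)
    if points is not None:
        flat = flat[slice(*points)]
    # 3. Fancy indexing with a boolean or integer mask builds a new list (a copy)
    if mask is not None:
        if all(isinstance(m, bool) for m in mask):
            flat = [p for p, keep in zip(flat, mask) if keep]
        else:
            flat = [flat[i] for i in mask]
    return flat


def pseudo_free_energy(counts: List[List[int]]) -> List[List[float]]:
    """Hypothetical helper: convert 2D histogram counts to F = -ln(p) in units of kT."""
    total = sum(sum(row) for row in counts)
    # Empty bins get an infinite free energy
    return [
        [-math.log(c / total) if c > 0 else math.inf for c in row]
        for row in counts
    ]
```

For example, select_points(data, parts=(0, 2), points=(None, None, 2)) keeps every second point of the first two parts, and only then would a mask be applied to that reduced list.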
- fit(self, bundle=None, *, sort_by_size=True, member_cutoff=None, max_clusters=None, record=True, v=True, **kwargs) -> None
Execute clustering procedure
- Keyword Arguments:
sort_by_size – Whether to sort (and trim) the created Labels instance. See also Labels.sort_by_size().
member_cutoff – Valid clusters need to have at least this many members. Passed on to Labels.sort_by_size() if sort_by_size is True. Has no effect otherwise, and valid clusters then have at least one member.
max_clusters – Keep only the largest max_clusters clusters. Passed on to Labels.sort_by_size() if sort_by_size is True. Has no effect otherwise.
record – Whether to create a Record instance for this clustering, which is appended to the Summary of the clustered bundle.
v – Be chatty.
Note
Further keyword arguments are passed on to fit.
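The combined effect of sort_by_size, member_cutoff and max_clusters on a label assignment can be sketched in plain Python. This illustrates the described behaviour under stated assumptions and is not the code of Labels.sort_by_size(): it assumes label 0 denotes noise, that clusters are relabelled 1, 2, ... in order of decreasing size (ties broken by the smaller old label), and that clusters failing the member_cutoff or max_clusters criteria become noise. The function name sort_labels_by_size and the default member_cutoff=1 are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional


def sort_labels_by_size(
    labels: List[int],
    member_cutoff: int = 1,
    max_clusters: Optional[int] = None,
) -> List[int]:
    """Hypothetical sketch: relabel clusters by decreasing size (0 = noise)."""
    # Count members per cluster, ignoring noise (label 0)
    sizes = Counter(label for label in labels if label != 0)
    # Keep clusters with enough members, largest first (ties: smaller old label first)
    valid = sorted(
        (label for label, n in sizes.items() if n >= member_cutoff),
        key=lambda label: (-sizes[label], label),
    )
    # Optionally keep only the largest max_clusters clusters
    if max_clusters is not None:
        valid = valid[:max_clusters]
    # Map old labels to consecutive new labels 1, 2, ...; everything else becomes noise
    mapping: Dict[int, int] = {old: new for new, old in enumerate(valid, start=1)}
    return [mapping.get(label, 0) for label in labels]
```

With labels [1, 1, 2, 2, 2, 3, 0], the three-member cluster 2 becomes cluster 1, and member_cutoff=2 turns the single-member cluster 3 into noise.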
- fit_hierarchical(self, *args, purge=True, bundle=None, **kwargs)
Execute hierarchical clustering procedure
- Keyword Arguments:
purge – Reset the children dictionary of the root bundle
bundle – Root bundle
Note
The accepted arguments and further keyword arguments depend on the hierarchical fitter in use.
- property fitter
- property input_data
- isolate(self, purge: bool = True, isolate_input_data: bool = True, bundle=None) -> None
Create child clusterings from cluster labels
- Parameters:
purge – If True, creates a new mapping for the children of this clustering.
isolate_input_data – If True, attaches a subset of the input data of this clustering to each child.
bundle – A bundle to operate on. If None, uses the root bundle.
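Conceptually, isolating groups the points by their cluster label, creates one child per label and, with isolate_input_data=True, attaches the matching subset of the input data to each child. The sketch below mimics this with a dictionary keyed by label. It is not the commonnn implementation: the Child dataclass and the isolate_by_label function are hypothetical stand-ins for commonnn._bundle.Bundle and Clustering.isolate(), and storing the parent indices of the members is an assumption made to keep the link back to the parent data explicit.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence


@dataclass
class Child:
    """Hypothetical stand-in for a child bundle."""
    label: int
    # Indices of the member points in the parent's data
    parent_indices: List[int] = field(default_factory=list)
    # Subset of the parent's input data (only filled if requested)
    input_data: Optional[list] = None


def isolate_by_label(
    labels: Sequence[int],
    data: Sequence,
    children: Optional[Dict[int, Child]] = None,
    purge: bool = True,
    isolate_input_data: bool = True,
) -> Dict[int, Child]:
    """Build one child per cluster label from a flat label assignment."""
    # purge=True discards any previous mapping of children
    if purge or children is None:
        children = {}
    # Group point indices by their cluster label
    groups: Dict[int, List[int]] = {}
    for index, label in enumerate(labels):
        groups.setdefault(label, []).append(index)
    # Create (or overwrite) one child per label
    for label, indices in groups.items():
        subset = [data[i] for i in indices] if isolate_input_data else None
        children[label] = Child(label, indices, subset)
    return children
```

With purge=False, children for labels that do not occur in the new assignment are kept, which is the design choice this sketch makes for a non-purged mapping.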
- property labels
Direct access to labels holding cluster label assignments for points in InputData, stored on the root Bundle.
- pie(self, ax=None, pie_props=None, bundle=None)
Make a pie plot of the cluster hierarchy based on assigned labels
- predict(self, other, *, bundle=None, **kwargs)
Execute prediction procedure
- Parameters:
other – commonnn._bundle.Bundle instance for which cluster labels should be predicted.
- Keyword Arguments:
bundle – Bundle to predict from. If None, uses the root bundle.
- reel(self, depth: Optional[int] = None, bundle=None) -> None
- property root
- summarize(self, ax=None, quantity: str = 'execution_time', treat_nan: Optional[Any] = None, convert: Optional[Any] = None, ax_props: Optional[dict] = None, contour_props: Optional[dict] = None, plot_style: str = 'contourf', bundle=None)
Generate a 2D plot of record values
Record values (“time”, “clusters”, “largest”, “noise”) are plotted against cluster parameters (radius cutoff r and cnn cutoff c).
- Parameters:
ax – Matplotlib Axes to plot on. If None, a new Figure with Axes will be created.
quantity – Record value to visualise:
  - "time"
  - "clusters"
  - "largest"
  - "noise"
treat_nan – If not None, use this value to pad NaN values.
ax_props – Used to style ax.
contour_props – Passed on to contour.
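Before plotting, summarize has to arrange one record value per parameter combination on a 2D grid spanned by the radius cutoff r and the cnn cutoff c, and treat_nan decides what fills cells without a usable value. A minimal sketch of that pivot step, assuming records are plain (r, c, value) tuples with math.nan marking missing values; the tuple layout, the row/column orientation and the records_to_grid name are hypothetical and do not reflect the structure of commonnn.report.Record.

```python
import math
from typing import Any, Iterable, List, Optional, Tuple


def records_to_grid(
    records: Iterable[Tuple[float, int, float]],
    treat_nan: Optional[Any] = None,
) -> Tuple[List[float], List[int], List[List[Any]]]:
    """Pivot (r, c, value) records into a grid: one row per r, one column per c."""
    entries = list(records)
    r_values = sorted({r for r, _, _ in entries})
    c_values = sorted({c for _, c, _ in entries})
    # Start with NaN everywhere: combinations without a record stay missing
    grid: List[List[Any]] = [[math.nan] * len(c_values) for _ in r_values]
    for r, c, value in entries:
        grid[r_values.index(r)][c_values.index(c)] = value
    # Optionally pad NaN values (missing or recorded as NaN) with treat_nan
    if treat_nan is not None:
        grid = [
            [treat_nan if isinstance(v, float) and math.isnan(v) else v for v in row]
            for row in grid
        ]
    return r_values, c_values, grid
```

The returned axes and grid have the shape that contour-style plotting functions typically expect, with NaN cells left as gaps unless treat_nan is given.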
- property summary
Return an instance of commonnn.report.Summary collecting clustering result records for this clustering (the root bundle).
- to_dtrajs(self, bundle=None)
Convert cluster label assignments to discrete state trajectory
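A discrete state trajectory is the sequence of cluster labels along one input part (for example one simulation trajectory), the per-trajectory integer sequences used by Markov state model software. A minimal sketch of the splitting step, assuming the labels are stored as one flat sequence and the number of points per part is known; the part_lengths argument and the labels_to_dtrajs name are hypothetical and not part of the documented API.

```python
from itertools import accumulate
from typing import List, Sequence


def labels_to_dtrajs(labels: Sequence[int], part_lengths: Sequence[int]) -> List[List[int]]:
    """Split a flat label sequence into one discrete trajectory per data part."""
    if sum(part_lengths) != len(labels):
        raise ValueError("part lengths do not add up to the number of labels")
    # Cumulative end offsets of the parts, e.g. [3, 5] for lengths [3, 2]
    ends = list(accumulate(part_lengths))
    starts = [0] + ends[:-1]
    return [list(labels[s:e]) for s, e in zip(starts, ends)]
```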
- to_nx_DiGraph(self, ignore=None, bundle=None)
Convert cluster hierarchy to networkx DiGraph
- Keyword Arguments:
ignore – A set of labels not to include in the graph. Use, for example, to exclude noise (label 0).
bundle – The bundle to start with. If None, uses the root bundle.
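The hierarchy can be thought of as nested mappings from labels to children, and the DiGraph as the parent-to-child edges between them. The sketch below flattens such a nested dict into an edge list while skipping every label in ignore (for example the noise label 0). The nested-dict representation, the node naming (the label path from the root joined by dots) and the choice to drop an ignored label's whole subtree are assumptions for illustration; hierarchy_edges is a hypothetical helper, not part of commonnn.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical hierarchy: each label maps to the hierarchy of its children
Hierarchy = Dict[int, "Hierarchy"]


def hierarchy_edges(
    hierarchy: Hierarchy,
    ignore: Optional[Iterable[int]] = None,
    root: str = "root",
) -> List[Tuple[str, str]]:
    """Collect (parent, child) edges; node names are dot-joined label paths."""
    ignored: Set[int] = set(ignore) if ignore is not None else set()
    edges: List[Tuple[str, str]] = []

    def walk(node: str, children: Hierarchy, prefix: str) -> None:
        for label, grandchildren in children.items():
            if label in ignored:
                continue  # skip this label and everything below it
            name = f"{prefix}{label}"
            edges.append((node, name))
            walk(name, grandchildren, name + ".")

    walk(root, hierarchy, "")
    return edges
```

With networkx installed, networkx.DiGraph(hierarchy_edges(h, ignore={0})) builds the corresponding directed graph, since the DiGraph constructor accepts an edge list.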
- tree(self, ax=None, ignore=None, pos_props=None, draw_props=None, bundle=None)
Make a layer plot of the cluster hierarchy
- trim(self, bundle: Bundle = None, protocol='shrinking', **kwargs)