Learning¶
Estimating the mean¶
Frechet mean.

class
geomstats.learning.frechet_mean.
FrechetMean
(metric, max_iter=32, epsilon=0.0001, point_type=None, method='default', lr=1.0, verbose=False)[source]¶ Empirical Frechet mean.
 Parameters
metric (RiemannianMetric) – Riemannian metric.
max_iter (int) – Maximum number of iterations for gradient descent. Optional, default: 32.
point_type (str, {'vector', 'matrix'}) – Point type. Optional, default: None.
method (str, {'default', 'adaptive', 'ball'}) – Gradient descent method. Optional, default: 'default'.
verbose (bool) – Verbose option. Optional, default: False.

fit
(X, y=None, weights=None)[source]¶ Compute the empirical Frechet mean.
 Parameters
X ({arraylike, sparse matrix}, shape=[…, n_features]) – Training input samples.
y (arraylike, shape=[…,] or […, n_outputs]) – Target values (class labels in classification, real numbers in regression). Ignored.
weights (arraylike, shape=[…,]) – Weights associated to the points. Optional, default: None.
 Returns
self (object) – Returns self.

geomstats.learning.frechet_mean.
linear_mean
(points, weights=None, point_type='vector')[source]¶ Compute the weighted linear mean.
The linear mean is the Frechet mean when points:  lie in a Euclidean space with Euclidean metric,  lie in a Minkowski space with Minkowski metric.
 Parameters
points (arraylike, shape=[…, dim]) – Points to be averaged.
weights (arraylike, shape=[…,]) – Weights associated to the points. Optional, default: None.
 Returns
mean (arraylike, shape=[dim,]) – Weighted linear mean of the points.

geomstats.learning.frechet_mean.
variance
(points, base_point, metric, weights=None, point_type='vector')[source]¶ Variance of (weighted) points wrt a base point.
 Parameters
points (arraylike, shape=[…, dim]) – Points.
weights (arraylike, shape=[…,]) – Weights associated to the points. Optional, default: None.
 Returns
var (float) – Weighted variance of the points.
Exponential barycenter.

class
geomstats.learning.exponential_barycenter.
ExponentialBarycenter
(group, max_iter=32, epsilon=1e06, step=1.0, point_type=None, verbose=False)[source]¶ Empirical Exponential Barycenter for Matrix groups.
 Parameters
group (LieGroup) – Lie group instance on which the data lie.
max_iter (int) – Maximum number of iterations to perform in the gradient descent. Optional, default: 32.
epsilon (float) – Tolerance to reach convergence. The exstrinsic norm of the gradient is used as criterion. Optional, default: 1e6.
step (float) – Learning rate in the gradient descent. Optional, default: 1.
verbose (bool) – Level of verbosity to inform about convergence. Optional, default: 1.

estimate_
¶  Type
arraylike, shape=[dim, dim]

fit
(X, y=None, weights=None)[source]¶ Compute the empirical Exponential Barycenter mean.
 Parameters
X ({arraylike, sparse matrix}, shape=[…, n_features]) – Training input samples.
y (arraylike, shape=[…,] or […, n_outputs]) – Target values (class labels in classification, real numbers in regression). Ignored.
weights (arraylike, shape=[…,]) – Weights associated to the points. Optional, default: None.
 Returns
self (object) – Returns self.
Clustering¶
Kmeans clustering.

class
geomstats.learning.kmeans.
RiemannianKMeans
(metric, n_clusters=8, init='random', lr=0.005, tol=0.01, mean_method='default', verbose=0, point_type='vector')[source]¶ Class for kmeans clustering on manifolds.
Kmeans algorithm using Riemannian manifolds.
 Parameters
n_clusters (int) – Number of clusters (k value of the kmeans). Optional, default: 8.
metric (object of class RiemannianMetric) – The geomstats Riemmanian metric associate to the space used.
init (str) – How to initialize centroids at the beginning of the algorithm. The choice ‘random’ will select training points as initial centroids uniformly at random. Optional, default: ‘random’.
tol (float) – Convergence factor. Convergence is achieved when the difference of mean distance between two steps is lower than tol. Optional, default: 1e2.
verbose (int) – If verbose > 0, information will be printed during learning. Optional, default: 0.
Example
Available example on the Poincaré Ball and Hypersphere manifolds
examples.plot_kmeans_manifolds

fit
(X, max_iter=100)[source]¶ Provide clusters centroids and data labels.
Alternate between computing the mean of each cluster and labelling data according to the new positions of the centroids.
 Parameters
X (arraylike, shape=[…, n_features]) – Training data, where n_samples is the number of samples and n_features is the number of features.
max_iter (int) – Maximum number of iterations. Optional, default: 100.
 Returns
self (arraylike, shape=[n_clusters,]) – Centroids.

predict
(X)[source]¶ Predict the labels for each data point.
Label each data point with the cluster having the nearest centroid using metric distance.
 Parameters
X (arraylike, shape=[…, n_features]) – Input data.
 Returns
self (arraylike, shape=[…,]) – Array of predicted cluster indices for each sample.
Online kmeans algorithm on Manifolds.

class
geomstats.learning.online_kmeans.
OnlineKMeans
(metric, n_clusters, n_repetitions=20, atol=1e05, max_iter=50000.0, point_type='vector')[source]¶ Online kmeans clustering.
Online kmeans clustering seeks to divide a set of data points into a specified number of classes, while minimizing intraclass variance. It is closely linked to discrete quantization, which computes the closest approximation of the empirical distribution of the dataset by a discrete distribution supported by a smaller number of points with respect to the Wasserstein distance. The algorithm used can either be seen as an online version of the kmeans algorithm or as Competitive Learning Riemannian Quantization (see [LBP2019]).
 Parameters
metric (object) – Metric of the space in which the data lives. At each iteration, one of the cluster centers is moved in the direction of the new datum, according the exponential map of the underlying space, which is a method of metric.
n_clusters (int) – Number of clusters of the kmeans clustering, or number of desired atoms of the quantized distribution.
n_repetitions (int, default=20) – The cluster centers are updated using decreasing step sizes, each of which stays constant for n_repetitions iterations to allow a better exploration of the data points.
max_iter (int, default=5e4) – Maximum number of iterations. If it is reached, the quantization may be inacurate.

cluster_centers_
¶ Coordinates of cluster centers.
 Type
array, [n_clusters, n_features]

labels_
¶ Labels of each point.
Example
>>> from geomstats.geometry.hypersphere import Hypersphere >>> from geomstats.learning.onlinekmeans import OnlineKmeans >>> sphere = Hypersphere(dim=2) >>> metric = sphere.metric >>> X = sphere.random_von_mises_fisher(kappa=10, n_samples=50) >>> clustering = OnlineKmeans(metric=metric,n_clusters=4).fit(X) >>> clustering.cluster_centers_ >>> clustering.labels_
References
 LBP2019
A. Le Brigant and S. Puechmorel, Optimal Riemannian quantization with an application to air traffic analysis. J. Multivar. Anal. 173 (2019), 685  703.

geomstats.learning.online_kmeans.
online_kmeans
(X, metric, n_clusters, n_repetitions=20, atol=1e05, max_iter=50000.0)[source]¶ Perform online Kmeans clustering.
Perform online version of kmeans algorithm on data contained in X. The data points are treated sequentially and the cluster centers are updated one at a time. This version of kmeans avoids computing the mean of each cluster at each iteration and is therefore less computationally intensive than the offline version.
In the setting of quantization of probability distributions, this algorithm is also known as Competitive Learning Riemannian Quantization. It computes the closest approximation of the empirical distribution of data by a discrete distribution supported by a smaller number of points with respect to the Wasserstein distance. This smaller number of points is n_clusters.
 Parameters
X (arraylike, shape=[…, n_features]) – Input data. It is treated sequentially by the algorithm, i.e. one datum is chosen randomly at each iteration.
metric (object) – Metric of the space in which the data lives. At each iteration, one of the cluster centers is moved in the direction of the new datum, according the exponential map of the underlying space, which is a method of metric.
n_clusters (int) – Number of clusters of the kmeans clustering, or number of desired atoms of the quantized distribution.
n_repetitions (int, default=20) – The cluster centers are updated using decreasing step sizes, each of which stays constant for n_repetitions iterations to allow a better exploration of the data points.
max_iter (int, default=5e4) – Maximum number of iterations. If it is reached, the quantization may be inacurate.
 Returns
cluster_centers (array, shape=[n_clusters, n_features]) – Coordinates of cluster centers.
labels (array, shape=[n_samples]) – Cluster labels for each point.
The Agglomerative Hierarchical Clustering (AHC) on manifolds.

class
geomstats.learning.agglomerative_hierarchical_clustering.
AgglomerativeHierarchicalClustering
(n_clusters=2, distance='euclidean', memory=None, connectivity=None, compute_full_tree='auto', linkage='average', distance_threshold=None)[source]¶ The Agglomerative Hierarchical Clustering on manifolds.
Recursively merges the pair of clusters that minimally increases a given linkage distance.
 Parameters
n_clusters (int or None, default=2) – The number of clusters to find. It must be
None
ifdistance_threshold
is notNone
.distance (str or callable, default=’euclidean’) – Distance used to compute the linkage. Can be ‘euclidean’, ‘l1’, ‘l2’, ‘manhattan’, ‘cosine’, or ‘precomputed’. If linkage is ‘ward’, only ‘euclidean’ is accepted. If ‘precomputed’, a distance matrix (instead of a similarity matrix) is needed as input for the fit method.
memory (str or object, default=None) – Used to cache the output of the computation of the tree. By default, no caching is done. If a string is given, it is the path to the caching directory.
connectivity (arraylike or callable, default=None) – Connectivity matrix. Defines for each sample the neighboring samples following a given structure of the data. This can be a connectivity matrix itself or a callable that transforms the data into a connectivity matrix. Default is None, i.e, the hierarchical clustering algorithm is unstructured.
compute_full_tree (‘auto’ or bool, default=’auto’) – Stop early the construction of the tree at n_clusters. This is useful to decrease computation time if the number of clusters is not small compared to the number of samples. This option is useful only when specifying a connectivity matrix. Note also that when varying the number of clusters and using caching, it may be advantageous to compute the full tree. It must be
True
ifdistance_threshold
is notNone
. By default compute_full_tree is ‘auto’, which is equivalent to True when distance_threshold is not None or that n_clusters is inferior to the maximum between 100 or 0.02 * n_samples. Otherwise, ‘auto’ is equivalent to False.linkage ({‘ward’, ‘complete’, ‘average’, ‘single’}, default=’average’) – Which linkage criterion to use. The linkage criterion determines which distance to use between sets of observation. The algorithm will merge the pairs of cluster that minimize this criterion.  average uses the average of the distances of each observation of
the two sets.
complete or maximum linkage uses the maximum distances between all observations of the two sets.
single uses the minimum of the distances between all observations of the two sets.
ward minimizes the variance of the clusters being merged. It works for the ‘euclidean’ distance only.
distance_threshold (float, default=None) – The linkage distance threshold above which, clusters will not be merged. If not
None
,n_clusters
must beNone
andcompute_full_tree
must beTrue
.

n_clusters_
¶ The number of clusters found by the algorithm. If
distance_threshold=None
, it will be equal to the givenn_clusters
. Type
int

labels_
¶ Cluster labels for each point.
 Type
ndarray, shape=[..,]

n_leaves_
¶ Number of leaves in the hierarchical tree.
 Type
int

n_connected_components_
¶ The estimated number of connected components in the graph.
 Type
int

children_
¶ The children of each nonleaf node. Values less than n_samples correspond to leaves of the tree which are the original samples. A node i greater than or equal to n_samples is a nonleaf node and has children children_[i  n_samples]. Alternatively at the ith iteration, children[i][0] and children[i][1] are merged to form node n_samples + i.
 Type
arraylike, shape=[n_samples1, 2]
References
This algorithm uses the scikitlearn library: https://github.com/scikitlearn/scikitlearn/blob/95d4f0841/sklearn/ cluster/_agglomerative.py#L656
The KNN classifier on manifolds.

class
geomstats.learning.knn.
KNearestNeighborsClassifier
(n_neighbors=5, weights='uniform', p=2, distance='minkowski', distance_params=None, n_jobs=None, **kwargs)[source]¶ Classifier implementing the knearest neighbors vote on manifolds.
 Parameters
n_neighbors (int, optional (default = 5)) – Number of neighbors to use by default.
weights (string or callable, optional (default = ‘uniform’)) – Weight function used in prediction. Possible values:  ‘uniform’ : uniform weights. All points in each neighborhood
are weighted equally.
‘distance’ : weight points by the inverse of their distance. in this case, closer neighbors of a query point will have a greater influence than neighbors which are further away.
[callable] : a userdefined function which accepts an array of distances, and returns an array of the same shape containing the weights.
p (integer, optional (default = 2)) – Power parameter for the ‘minkowski’ string distance. When p = 1, this is equivalent to using manhattan_distance (l1), and euclidean_distance (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used.
distance (string or callable, optional (default = ‘minkowski’)) – The distance metric to use. The default distance is minkowski, and with p=2 is equivalent to the standard Euclidean distance. See the documentation of the DistanceMetric class in the scikitlearn library for a list of available distances. If distance is “precomputed”, X is assumed to be a distance matrix and must be square during fit.
distance_params (dict, optional (default = None)) – Additional keyword arguments for the distance function.
n_jobs (int or None, optional (default = None)) – The number of parallel jobs to run for neighbors search.
None
means 1;1
means using all processors.

classes_
¶ Class labels known to the classifier
 Type
array, shape=[n_classes,]

effective_metric_
¶ The distance metric used. It will be same as the distance parameter or a synonym of it, e.g. ‘euclidean’ if the distance parameter set to ‘minkowski’ and p parameter set to 2.
 Type
string or callable

effective_metric_params_
¶ Additional keyword arguments for the distance function. For most distances will be same with distance_params parameter, but may also contain the p parameter value if the effective_metric_ attribute is set to ‘minkowski’.
 Type
dict

outputs_2d_
¶ False when y’s shape is (n_samples, ) or (n_samples, 1) during fit otherwise True.
 Type
bool
References
This algorithm uses the scikitlearn library: https://github.com/scikitlearn/scikitlearn/blob/95d4f0841/sklearn/ neighbors/_classification.py#L25
Dimension reduction¶
Principal Component Analysis on Manifolds.

class
geomstats.learning.pca.
TangentPCA
(metric, n_components=None, copy=True, whiten=False, tol=0.0, iterated_power='auto', random_state=None)[source]¶ Tangent Principal component analysis (tPCA).
Linear dimensionality reduction using Singular Value Decomposition of the Riemannian Log of the data at the tangent space of the Frechet mean.
 Parameters
metric (RiemannianMetric) – Riemannian metric.
n_components (int) – Number of principal components. Optional, default: None.

fit
(X, y=None, base_point=None)[source]¶ Fit the model with X.
 Parameters
X (arraylike, shape=[…, n_features]) – Training data, where n_samples is the number of samples and n_features is the number of features.
y (Ignored (Compliance with scikitlearn interface))
base_point (arraylike, shape=[…, n_features], optional) – Point at which to perform the tangent PCA Optional, default to Frechet mean if None.
 Returns
self (object) – Returns the instance itself.

fit_transform
(X, y=None, base_point=None)[source]¶ Fit the model with X and apply the dimensionality reduction on X.
 Parameters
X (arraylike, shape=[…, n_features]) – Training data, where n_samples is the number of samples and n_features is the number of features.
y (Ignored (Compliance with scikitlearn interface))
base_point (arraylike, shape=[…, n_features]) – Point at which to perform the tangent PCA Optional, default to Frechet mean if None.
 Returns
X_new (arraylike, shape=[…, n_components]) – Projected data.

inverse_transform
(X)[source]¶ Lowdimensional reconstruction of X.
The reconstruction will match X_original whose transform would be X if n_components=min(n_samples, n_features).
 Parameters
X (arraylike, shape=[…, n_components]) – New data, where n_samples is the number of samples and n_components is the number of components.
 Returns
X_original (arraylike, shape=[…, n_features]) – Original data.

transform
(X, y=None)[source]¶ Project X on the principal components.
 Parameters
X (arraylike, shape=[…, n_features]) – Data, where n_samples is the number of samples and n_features is the number of features.
y (Ignored (Compliance with scikitlearn interface))
 Returns
X_new (arraylike, shape=[…, n_components]) – Projected data.