| Title: | Functional Data Analysis using Variational Inference |
|---|---|
| Description: | Implements a variational Expectation-Maximization (VEM) algorithm for smoothing one or multiple functional observations via basis function selection. The algorithm estimates all model parameters simultaneously and automatically, while accounting for within-curve correlation. The approach provides a flexible and computationally efficient framework for smoothing correlated functional data. |
| Authors: | Camila de Souza [cre], Stephen Kinsey [aut], Ana Carolina da Cruz [aut], Pedro Henrique Toledo Oliveira Sousa [aut] |
| Maintainer: | Camila de Souza <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.0.0 |
| Built: | 2026-06-21 12:13:15 UTC |
| Source: | https://github.com/desouzalab/fda.vi |
Returns a matrix of estimated basis coefficients.
Each column corresponds to one curve; each row to one basis function.
Coefficients are set to zero when the posterior inclusion probability
threshold (inactive bases). When
is.composite = TRUE, the matrix has dimension ,
where is the highest selected by GCV across all
curves; coefficients for curves with smaller optimal are
zero-padded (structural padding).
## S3 method for class 'vem_fit' coef(object, threshold = 0.5, ...)## S3 method for class 'vem_fit' coef(object, threshold = 0.5, ...)
object |
A |
threshold |
Numeric in |
... |
Currently unused. |
A numeric matrix of dimension , with row
names B1, B2, ... and column names Curve_1, Curve_2, ....
da Cruz, A. C., de Souza, C. P. E., & Sousa, P. H. T. O. (2024). Fast Bayesian basis selection for functional data representation with correlated errors. arXiv:2405.20758. https://arxiv.org/abs/2405.20758
vem_fit, predict.vem_fit,
summary.vem_fit
data(toy_curves) fit <- vem_fit(y = toy_curves$y, Xt = toy_curves$Xt, K = 8) # K x m matrix of active coefficients coefs <- coef(fit) dim(coefs) # 8 x 3 # Compare estimated vs true coefficients for curve 1 cbind(estimated = coefs[, 1], true = toy_curves$true_coef) # Stricter threshold — only very confident inclusions coef(fit, threshold = 0.9)data(toy_curves) fit <- vem_fit(y = toy_curves$y, Xt = toy_curves$Xt, K = 8) # K x m matrix of active coefficients coefs <- coef(fit) dim(coefs) # 8 x 3 # Compare estimated vs true coefficients for curve 1 cbind(estimated = coefs[, 1], true = toy_curves$true_coef) # Stricter threshold — only very confident inclusions coef(fit, threshold = 0.9)
Computes the generalized cross-validation (GCV) score for each curve from a
vem_smooth model object. GCV approximates leave-one-out prediction
error and is used by tune_vem_by_gcv to select the optimal
number of basis functions .
The smoother matrix maps observed values to fitted values and is
constructed from the variational posteriors. Its trace provides the effective
degrees of freedom used in the GCV penalty.
gcv_vem(out, threshold = 0.5)gcv_vem(out, threshold = 0.5)
out |
A fitted object returned by |
threshold |
Numeric in |
A named numeric vector of length of per-curve GCV scores.
Lower scores indicate better fit relative to model complexity.
da Cruz, A. C., de Souza, C. P. E., & Sousa, P. H. T. O. (2024). Fast Bayesian basis selection for functional data representation with correlated errors. arXiv:2405.20758. https://arxiv.org/abs/2405.20758
Plots observed data, the posterior mean fitted curve, and an optional 95%
credible band for a single curve from a vem_fit object. The credible
band provides uncertainty quantification by sampling from the
variational posteriors: and
.
Predictions are automatically back-transformed if the model was fitted with
center = TRUE or scale = TRUE.
## S3 method for class 'vem_fit' plot( x, curve_idx = 1, type = c("polygon", "lines"), show_CI = TRUE, n_samples = 200, alpha_shade = 0.25, ylim = NULL, xlab = "t", ylab = "Value", show_basis = FALSE, ... )## S3 method for class 'vem_fit' plot( x, curve_idx = 1, type = c("polygon", "lines"), show_CI = TRUE, n_samples = 200, alpha_shade = 0.25, ylim = NULL, xlab = "t", ylab = "Value", show_basis = FALSE, ... )
x |
A |
curve_idx |
Integer. Index of the curve to plot. Default |
type |
Character. Credible band style: |
show_CI |
Logical. If |
n_samples |
Integer. Number of posterior draws used to construct the
credible band. Default |
alpha_shade |
Numeric in |
ylim |
Optional numeric vector of length 2. If |
xlab |
Character. Label for the horizontal axis. Default |
ylab |
Character. Label for the vertical axis. Default |
show_basis |
Logical. If |
... |
Additional graphical parameters passed to |
Invisibly returns NULL. Called for its side effect of
producing a plot.
da Cruz, A. C., de Souza, C. P. E., & Sousa, P. H. T. O. (2024). Fast Bayesian basis selection for functional data representation with correlated errors. arXiv:2405.20758. https://arxiv.org/abs/2405.20758
data(toy_curves) fit <- vem_fit(y = toy_curves$y, Xt = toy_curves$Xt, K = 8) # Default: shaded credible band for curve 1 plot(fit) # Dashed credible band for curve 2 plot(fit, curve_idx = 2, type = "lines") # With basis selection subplot plot(fit, curve_idx = 1, show_basis = TRUE) # Suppress credible band plot(fit, show_CI = FALSE, main = "Mean fit only")data(toy_curves) fit <- vem_fit(y = toy_curves$y, Xt = toy_curves$Xt, K = 8) # Default: shaded credible band for curve 1 plot(fit) # Dashed credible band for curve 2 plot(fit, curve_idx = 2, type = "lines") # With basis selection subplot plot(fit, curve_idx = 1, show_basis = TRUE) # Suppress credible band plot(fit, show_CI = FALSE, main = "Mean fit only")
Returns posterior mean curve estimates from a vem_fit object.
Active basis functions are selected by applying a 0.5 probability threshold
on the posterior inclusion probabilities. If newdata is supplied,
a new basis matrix is
constructed at those time points; otherwise the original fitted time points
are used. Predictions are automatically back-transformed if the model was
fitted with center = TRUE or scale = TRUE.
## S3 method for class 'vem_fit' predict(object, newdata = NULL, ...)## S3 method for class 'vem_fit' predict(object, newdata = NULL, ...)
object |
A |
newdata |
Optional numeric vector of new time points at which to
evaluate the fitted curves. Must lie within the original domain
|
... |
Currently unused. |
A list of length . Each element is a numeric vector of
predicted values on the original (back-transformed) scale.
da Cruz, A. C., de Souza, C. P. E., & Sousa, P. H. T. O. (2024). Fast Bayesian basis selection for functional data representation with correlated errors. arXiv:2405.20758. https://arxiv.org/abs/2405.20758
vem_fit, plot.vem_fit,
coef.vem_fit
data(toy_curves) fit <- vem_fit(y = toy_curves$y, Xt = toy_curves$Xt, K = 8) # Predictions at original time points preds <- predict(fit) length(preds) # 3 — one vector per curve # Predictions at a denser grid Xt_new <- seq(0, 1, length.out = 200) preds_dense <- predict(fit, newdata = Xt_new) # Plot observed vs predicted for curve 1 plot(toy_curves$Xt, toy_curves$y[[1]], pch = 16, cex = 0.6, col = "grey50", xlab = "t", ylab = "y") lines(Xt_new, preds_dense[[1]], col = "firebrick", lwd = 2)data(toy_curves) fit <- vem_fit(y = toy_curves$y, Xt = toy_curves$Xt, K = 8) # Predictions at original time points preds <- predict(fit) length(preds) # 3 — one vector per curve # Predictions at a denser grid Xt_new <- seq(0, 1, length.out = 200) preds_dense <- predict(fit, newdata = Xt_new) # Plot observed vs predicted for curve 1 plot(toy_curves$Xt, toy_curves$y[[1]], pch = 16, cex = 0.6, col = "grey50", xlab = "t", ylab = "y") lines(Xt_new, preds_dense[[1]], col = "firebrick", lwd = 2)
Provides a displayed summary of the results from vem_fit and
invisibly returns a list of summary statistics, including the basis type,
number of curves, selected , active basis counts per curve,
estimated model parameters, and GCV tuning results if applicable.
Reported variational posterior parameters for and
are the shape and scale of their respective
Inverse-Gamma variational distributions:
and
.
For composite fits (selection_metric = "per_curve"), parameters
from the first curve are shown as representative values.
## S3 method for class 'vem_fit' summary(object, ...)## S3 method for class 'vem_fit' summary(object, ...)
object |
A |
... |
Currently unused. |
Invisibly returns a list with element active_bases: an
integer vector of active basis counts per curve.
da Cruz, A. C., de Souza, C. P. E., & Sousa, P. H. T. O. (2024). Fast Bayesian basis selection for functional data representation with correlated errors. arXiv:2405.20758. https://arxiv.org/abs/2405.20758
data(toy_curves) fit <- vem_fit(y = toy_curves$y, Xt = toy_curves$Xt, K = 8) summary(fit) # Active basis counts are returned invisibly s <- summary(fit) s$active_basesdata(toy_curves) fit <- vem_fit(y = toy_curves$y, Xt = toy_curves$Xt, K = 8) summary(fit) # Active basis counts are returned invisibly s <- summary(fit) s$active_bases
A small simulated dataset of three functional curves used in package examples. Curves are generated from a known cubic B-spline expansion with correlated errors, making it suitable for demonstrating basis selection and recovery of true coefficients.
toy_curvestoy_curves
A list with the following elements:
yNamed list of 3 numeric vectors of length 50, one per curve.
XtNumeric vector of 50 equally spaced time points on .
true_coefNumeric vector of length 8. True basis coefficients:
c(1.5, 0, -1, 0.8, 0, -0.5, 1.2, -0.9).
KInteger. Number of basis functions used (8).
mInteger. Number of curves (3).
sigmaNumeric. True noise standard deviation (0.1).
wNumeric. True correlation decay parameter (6).
Each curve is generated as:
where
for all , and with
and (correlation function of an
Ornstein-Uhlenbeck (OU) process).
Basis functions 2 and 5 have zero coefficients, providing a ground truth
for evaluating basis selection.
Generated via data-raw/generate_toy_curves.R.
data(toy_curves) str(toy_curves) # Plot the three raw curves plot(toy_curves$Xt, toy_curves$y[[1]], type = "l", ylab = "y", xlab = "t", main = "Toy curves") lines(toy_curves$Xt, toy_curves$y[[2]], col = "blue") lines(toy_curves$Xt, toy_curves$y[[3]], col = "red")data(toy_curves) str(toy_curves) # Plot the three raw curves plot(toy_curves$Xt, toy_curves$y[[1]], type = "l", ylab = "y", xlab = "t", main = "Toy curves") lines(toy_curves$Xt, toy_curves$y[[2]], col = "blue") lines(toy_curves$Xt, toy_curves$y[[3]], col = "red")
Fits vem_smooth across a grid of candidate basis sizes
K_grid and selects the best using GCV scores from
gcv_vem. Called internally by vem_fit when
a vector of values is supplied; not typically called directly.
Two selection modes are supported: "mean" selects the single
minimizing the mean GCV across all curves; "per_curve" selects the
that minimizes the GCV criterion for each individual curve, producing a composite fit.
tune_vem_by_gcv( y, Xt, K_grid, build_B, initial_values_fn, threshold = 0.5, mode = c("mean", "per_curve"), ... )tune_vem_by_gcv( y, Xt, K_grid, build_B, initial_values_fn, threshold = 0.5, mode = c("mean", "per_curve"), ... )
y |
List of curves. |
Xt |
Numeric vector of time points. |
K_grid |
Integer vector of candidate |
build_B |
Function with signature |
initial_values_fn |
Function with signature |
threshold |
Posterior inclusion probability (PIP) threshold passed to |
mode |
Character. |
... |
Additional arguments passed to |
A list with:
fitsNamed list of fitted vem_smooth objects, one per
candidate .
gcv_matrixNumeric matrix ( length(K_grid))
of per-curve GCV scores.
best_K_meanInteger. Best by mean GCV.
best_K_per_curveInteger vector of length . Best
for each curve.
da Cruz, A. C., de Souza, C. P. E., & Sousa, P. H. T. O. (2024). Fast Bayesian basis selection for functional data representation with correlated errors. arXiv:2405.20758. https://arxiv.org/abs/2405.20758
Fits one or more functional curves using Bayesian basis function selection
via the Variational EM algorithm, with an Ornstein-Uhlenbeck within-curve
correlation structure. Internally calls vem_smooth to run the
VEM algorithm.
If a single value is provided for K, the model is fitted using that fixed
number of basis functions. If a vector of candidate values is supplied, the
function tune_vem_by_gcv is called to automatically select the
optimal K based on the GCV criterion. The resulting fitted object provides
methods for plot, predict, coef, and summary via the corresponding S3
methods plot.vem_fit, predict.vem_fit,
coef.vem_fit, and summary.vem_fit.
vem_fit( y, Xt, K = NULL, basis_type = c("cubic_bspline", "fourier"), selection_metric = c("mean", "per_curve"), threshold = 0.5, center = FALSE, scale = FALSE, period = NULL, initial_values_fn = NULL, lambda_1 = NULL, lambda_2 = NULL, delta_1 = NULL, delta_2 = NULL, ... )vem_fit( y, Xt, K = NULL, basis_type = c("cubic_bspline", "fourier"), selection_metric = c("mean", "per_curve"), threshold = 0.5, center = FALSE, scale = FALSE, period = NULL, initial_values_fn = NULL, lambda_1 = NULL, lambda_2 = NULL, delta_1 = NULL, delta_2 = NULL, ... )
y |
Named list of numeric vectors, one per curve. |
Xt |
Numeric vector of time points, common across all curves. |
K |
Integer or integer vector of candidate basis sizes. If a single
value, fits directly at that |
basis_type |
Character. One of |
selection_metric |
Character. |
threshold |
Numeric in |
center |
Logical. If |
scale |
Logical. If |
period |
Numeric. Period for Fourier bases. Defaults to the domain
range |
initial_values_fn |
Function with signature |
lambda_1, lambda_2
|
Positive scalars. Inverse-Gamma prior hyperparameters
for |
delta_1, delta_2
|
Positive scalars. Inverse-Gamma prior hyperparameters
for |
... |
Additional arguments passed to |
An object of class "vem_fit" containing:
modelThe fitted vem_smooth object (global fit), or a
named list of per-curve model objects (composite fit).
selected_KInteger vector of length m. The K
used for each curve.
best_KThe single selected K (global fit), or a vector
(composite fit).
tuningOutput of tune_vem_by_gcv, including
the full GCV matrix and all candidate fits. NULL if a single
K was supplied.
scaling_paramsList with means and sds used
for standardization. Used by predict.vem_fit and
plot.vem_fit to back-transform predictions.
data_origThe input curves in their original scales.
basis_type, is_composite, Xt, call
Metadata stored for use by S3 methods.
da Cruz, A. C., de Souza, C. P. E., & Sousa, P. H. T. O. (2024). Fast Bayesian basis selection for functional data representation with correlated errors. arXiv:2405.20758. https://arxiv.org/abs/2405.20758
vem_smooth, plot.vem_fit,
predict.vem_fit, coef.vem_fit,
summary.vem_fit
data(toy_curves) fit <- vem_fit( y = toy_curves$y, Xt = toy_curves$Xt, K = 8 ) summary(fit) plot(fit, curve_idx = 1) coef(fit) predict(fit) # GCV tuning over a grid of K values fit_gcv <- vem_fit( y = toy_curves$y, Xt = toy_curves$Xt, K = c(6, 8, 10) ) fit_gcv$best_Kdata(toy_curves) fit <- vem_fit( y = toy_curves$y, Xt = toy_curves$Xt, K = 8 ) summary(fit) plot(fit, curve_idx = 1) coef(fit) predict(fit) # GCV tuning over a grid of K values fit_gcv <- vem_fit( y = toy_curves$y, Xt = toy_curves$Xt, K = c(6, 8, 10) ) fit_gcv$best_K
Fits functional curves simultaneously via Bayesian basis function
selection with an Ornstein-Uhlenbeck within-curve correlation structure.
This function is called internally by vem_fit and only runs
the VEM algorithm itself, without performing basis construction,
standardization, or GCV tuning. Most users should call
vem_fit instead, which handles those steps automatically.
vem_smooth( y, B, Xt = Xt, m = length(y), K = K, mu_ki = 0.5, lambda_1 = 1e-10, lambda_2 = 1e-10, delta_1 = 1e-10, delta_2 = 1e-10, maxIter = 1000, initial_values, convergence_threshold = 0.01, lower_opt = 0.1 )vem_smooth( y, B, Xt = Xt, m = length(y), K = K, mu_ki = 0.5, lambda_1 = 1e-10, lambda_2 = 1e-10, delta_1 = 1e-10, delta_2 = 1e-10, maxIter = 1000, initial_values, convergence_threshold = 0.01, lower_opt = 0.1 )
y |
List of length |
B |
List of length |
Xt |
Numeric vector of |
m |
Integer. Number of curves. Defaults to |
K |
Integer. Number of basis functions. |
mu_ki |
Numeric scalar in |
lambda_1, lambda_2
|
Positive scalars. Inverse-Gamma prior hyperparameters
for |
delta_1, delta_2
|
Positive scalars. Inverse-Gamma prior hyperparameters
for |
maxIter |
Integer. Maximum VEM iterations. Default |
initial_values |
Named list with elements |
convergence_threshold |
Positive scalar. Absolute ELBO tolerance for
convergence. Default |
lower_opt |
Positive scalar. Lower bound for |
The algorithm alternates between an E-step — sequential coordinate ascent variational inference (CAVI) updates for
, , , ,
and — and an M-step that maximizes the ELBO with
respect to the correlation decay parameter via L-BFGS-B with an
analytic gradient. Convergence is declared when the absolute ELBO change
between iterations falls below convergence_threshold.
For hyperparameter initialization, set delta_1 and delta_2
such that delta_2 / (delta_1 - 1) is a rough estimate of the noise
variance, and initialize w consistent with the expected correlation
strength in the data.
A named list containing:
mu_betaPosterior means (length ).
Sigma_betaPosterior covariance array ().
probPosterior inclusion probabilities (length ).
Basis is active for curve when .
delta1, delta2
Final parameters.
lambda1, lambda2
Final parameters.
wEstimated correlation decay parameter (range-normalized scale).
cor_matThe Ornstein-Uhlenbeck correlation
matrix evaluated at the final estimated decay parameter
, as returned by computePsiMatrix.
elbo_valuesELBO trajectory across iterations.
convergedLogical. Whether the convergence criterion was met.
n_iterationsNumber of iterations run.
da Cruz, A. C., de Souza, C. P. E., & Sousa, P. H. T. O. (2024). Fast Bayesian basis selection for functional data representation with correlated errors. arXiv:2405.20758. https://arxiv.org/abs/2405.20758
vem_fit, plot.vem_fit,
predict.vem_fit, coef.vem_fit