Statistics in Matlab

Matlab has a large and growing library of programs for computation and statistical inference. One entry point is here:
help stats
Statistics and Machine Learning Toolbox Version 23.2 (R2023b) 19-May-2023 Distributions. Parameter estimation. betafit - Beta parameter estimation. binofit - Binomial parameter estimation. distributionFitter - Distribution fitting app. evfit - Extreme value parameter estimation. expfit - Exponential parameter estimation. fitdist - Distribution fitting. fitgmdist - Fit a Gaussian mixture model to data. gamfit - Gamma parameter estimation. gevfit - Generalized extreme value parameter estimation. gpfit - Generalized Pareto parameter estimation. lognfit - Lognormal parameter estimation. makedist - Make probability distribution. mle - Maximum likelihood estimation (MLE). mlecov - Asymptotic covariance matrix of MLE. nbinfit - Negative binomial parameter estimation. normfit - Normal parameter estimation. paretotails - Empirical cdf with generalized Pareto tails. poissfit - Poisson parameter estimation. raylfit - Rayleigh parameter estimation. unifit - Uniform parameter estimation. wblfit - Weibull parameter estimation. Probability density functions (pdf). betapdf - Beta density. binopdf - Binomial density. chi2pdf - Chi square density. evpdf - Extreme value density. exppdf - Exponential density. fpdf - F density. gampdf - Gamma density. geopdf - Geometric density. gevpdf - Generalized extreme value density. gppdf - Generalized Pareto density. hygepdf - Hypergeometric density. lognpdf - Lognormal density. mnpdf - Multinomial probability density function. mvnpdf - Multivariate normal density. mvtpdf - Multivariate t density. nbinpdf - Negative binomial density. ncfpdf - Noncentral F density. nctpdf - Noncentral t density. ncx2pdf - Noncentral Chi-square density. normpdf - Normal (Gaussian) density. pdf - Density function for a specified distribution. pearspdf - Pearson density. poisspdf - Poisson density. raylpdf - Rayleigh density. tpdf - T density. unidpdf - Discrete uniform density. unifpdf - Uniform density. wblpdf - Weibull density. 
Cumulative Distribution functions (cdf). betacdf - Beta cumulative distribution function. binocdf - Binomial cumulative distribution function. cdf - Specified cumulative distribution function. chi2cdf - Chi square cumulative distribution function. ecdf - Empirical cumulative distribution function (Kaplan-Meier estimate). evcdf - Extreme value cumulative distribution function. expcdf - Exponential cumulative distribution function. fcdf - F cumulative distribution function. gamcdf - Gamma cumulative distribution function. geocdf - Geometric cumulative distribution function. gevcdf - Generalized extreme value cumulative distribution function. gpcdf - Generalized Pareto cumulative distribution function. hygecdf - Hypergeometric cumulative distribution function. logncdf - Lognormal cumulative distribution function. mvncdf - Multivariate normal cumulative distribution function. mvtcdf - Multivariate t cumulative distribution function. nbincdf - Negative binomial cumulative distribution function. ncfcdf - Noncentral F cumulative distribution function. nctcdf - Noncentral t cumulative distribution function. ncx2cdf - Noncentral Chi-square cumulative distribution function. normcdf - Normal (Gaussian) cumulative distribution function. pearscdf - Pearson cumulative distribution function. poisscdf - Poisson cumulative distribution function. raylcdf - Rayleigh cumulative distribution function. tcdf - T cumulative distribution function. unidcdf - Discrete uniform cumulative distribution function. unifcdf - Uniform cumulative distribution function. wblcdf - Weibull cumulative distribution function. Critical Values of Distribution functions. betainv - Beta inverse cumulative distribution function. binoinv - Binomial inverse cumulative distribution function. chi2inv - Chi square inverse cumulative distribution function. evinv - Extreme value inverse cumulative distribution function. expinv - Exponential inverse cumulative distribution function. 
finv - F inverse cumulative distribution function. gaminv - Gamma inverse cumulative distribution function. geoinv - Geometric inverse cumulative distribution function. gevinv - Generalized extreme value inverse cumulative distribution function. gpinv - Generalized Pareto inverse cumulative distribution function. hygeinv - Hypergeometric inverse cumulative distribution function. icdf - Specified inverse cumulative distribution function. logninv - Lognormal inverse cumulative distribution function. nbininv - Negative binomial inverse distribution function. ncfinv - Noncentral F inverse cumulative distribution function. nctinv - Noncentral t inverse cumulative distribution function. ncx2inv - Noncentral Chi-square inverse distribution function. norminv - Normal (Gaussian) inverse cumulative distribution function. poissinv - Poisson inverse cumulative distribution function. raylinv - Rayleigh inverse cumulative distribution function. tinv - T inverse cumulative distribution function. unidinv - Discrete uniform inverse cumulative distribution function. unifinv - Uniform inverse cumulative distribution function. wblinv - Weibull inverse cumulative distribution function. Random Number Generators. betarnd - Beta random numbers. binornd - Binomial random numbers. chi2rnd - Chi square random numbers. datasample - Randomly sample from data, with or without replacement. evrnd - Extreme value random numbers. exprnd - Exponential random numbers. frnd - F random numbers. gamrnd - Gamma random numbers. geornd - Geometric random numbers. gevrnd - Generalized extreme value random numbers. gprnd - Generalized Pareto inverse random numbers. hmcSampler - Hamiltonian Monte Carlo sampler. hygernd - Hypergeometric random numbers. iwishrnd - Inverse Wishart random matrix. johnsrnd - Random numbers from the Johnson system of distributions. lognrnd - Lognormal random numbers. mhsample - Metropolis-Hastings algorithm. mnrnd - Multinomial random vectors. 
mvnrnd - Multivariate normal random vectors. mvtrnd - Multivariate t random vectors. nbinrnd - Negative binomial random numbers. ncfrnd - Noncentral F random numbers. nctrnd - Noncentral t random numbers. ncx2rnd - Noncentral Chi-square random numbers. normrnd - Normal (Gaussian) random numbers. pearsrnd - Random numbers from the Pearson system of distributions. poissrnd - Poisson random numbers. randg - Gamma random numbers (unit scale). random - Random numbers from specified distribution. randsample - Random sample from finite population. raylrnd - Rayleigh random numbers. slicesample - Slice sampling method. trnd - T random numbers. unidrnd - Discrete uniform random numbers. unifrnd - Uniform random numbers. wblrnd - Weibull random numbers. wishrnd - Wishart random matrix. Quasi-Random Number Generators. haltonset - Halton sequence point set. qrandstream - Quasi-random stream. sobolset - Sobol sequence point set. Statistics. betastat - Beta mean and variance. binostat - Binomial mean and variance. chi2stat - Chi square mean and variance. evstat - Extreme value mean and variance. expstat - Exponential mean and variance. fstat - F mean and variance. gamstat - Gamma mean and variance. geostat - Geometric mean and variance. gevstat - Generalized extreme value mean and variance. gpstat - Generalized Pareto inverse mean and variance. hygestat - Hypergeometric mean and variance. lognstat - Lognormal mean and variance. nbinstat - Negative binomial mean and variance. ncfstat - Noncentral F mean and variance. nctstat - Noncentral t mean and variance. ncx2stat - Noncentral Chi-square mean and variance. normstat - Normal (Gaussian) mean and variance. poisstat - Poisson mean and variance. raylstat - Rayleigh mean and variance. tstat - T mean and variance. unidstat - Discrete uniform mean and variance. unifstat - Uniform mean and variance. wblstat - Weibull mean and variance. Likelihood functions. betalike - Negative beta log-likelihood. 
evlike - Negative extreme value log-likelihood. explike - Negative exponential log-likelihood. gamlike - Negative gamma log-likelihood. gevlike - Generalized extreme value log-likelihood. gplike - Generalized Pareto inverse log-likelihood. lognlike - Negative lognormal log-likelihood. nbinlike - Negative binomial log-likelihood. normlike - Negative normal likelihood. wbllike - Negative Weibull log-likelihood. Descriptive Statistics. bootci - Bootstrap confidence intervals. bootstrp - Bootstrap statistics. corr - Linear or rank correlation coefficient. corrcoef - Linear correlation coefficient (in MATLAB toolbox). cov - Covariance (in MATLAB toolbox). crosstab - Cross tabulation. geomean - Geometric mean. grpstats - Summary statistics by group. harmmean - Harmonic mean. jackknife - Jackknife statistics. kurtosis - Kurtosis. mad - Median Absolute Deviation. mean - Sample average (in MATLAB toolbox). median - 50th percentile of a sample (in MATLAB toolbox). mode - Mode, or most frequent value in a sample (in MATLAB toolbox). moment - Moments of a sample. nearcorr - Nearest correlation matrix. partialcorr - Linear or rank partial correlation coefficient. partialcorri - Partial correlation coefficients with internal adjustments. range - Range. robustcov - Robust estimation of covariance and mean. skewness - Skewness. std - Standard deviation (in MATLAB toolbox). tabulate - Frequency table. trimmean - Trimmed mean. var - Variance (in MATLAB toolbox). Parametric Regression Analysis. Regression Model Building. fitcox - Fit Cox proportional hazards model. fitglm - Fit generalized linear model. fitlm - Fit linear model by least squares or robust fitting. fitlme - Fit linear mixed effects model. fitlmematrix - Fit linear mixed effects model to matrix data. fitglme - Fit generalized linear mixed effects model. fitmnr - Fit multinomial regression model. fitnlm - Fit nonlinear model by least squares or robust fitting. fitrm - Fit repeated measures model. 
stepwiseglm - Fit generalized linear model by stepwise regression. stepwiselm - Fit linear model by stepwise regression. Analysis of Variance. anova - Create an anova object to perform analysis of variance. anova1 - One-way analysis of variance. anova2 - Two-way analysis of variance. anovan - N-way analysis of variance. aoctool - Interactive tool for analysis of covariance. friedman - Friedman's test (nonparametric two-way anova). kruskalwallis - Kruskal-Wallis test (nonparametric one-way anova). manova - Multivariate analysis of variance. manova1 - One-way multivariate analysis of variance. manovacluster - Draw clusters of group means for manova1. multcompare - Multiple comparisons of means and other estimates. gardnerAltmanPlot - Gardner-Altman plot for two-sample effect size. meanEffectSize - One-sample or two-sample effect size computations. Linear Regression dummyvar - Dummy-variable coding. glmfit - Generalized linear model coefficient estimation. glmval - Evaluate fitted values for generalized linear model. invpred - Inverse prediction for simple linear regression. leverage - Regression diagnostic. lscov - Ordinary, weighted, or generalized least-squares (in MATLAB toolbox). lsqnonneg - Non-negative least-squares (in MATLAB toolbox). multcompare - Multiple comparisons of means and other estimates. mvregress - Multivariate regression with missing data. mvregresslike - Negative log-likelihood for multivariate regression. polyconf - Polynomial evaluation and confidence interval estimation. polyfit - Least-squares polynomial fitting (in MATLAB toolbox). polyval - Predicted values for polynomial functions (in MATLAB toolbox). regress - Multiple linear regression using least squares. regstats - Regression diagnostics. ridge - Ridge regression. robustfit - Robust regression model fitting. stepwise - Interactive tool for stepwise regression. stepwisefit - Non-interactive stepwise regression. x2fx - Factor settings matrix (x) to design matrix (fx). 
Nonlinear Regression. coxphfit - Cox proportional hazards regression. nlinfit - Nonlinear least-squares coefficient estimation. nlintool - Interactive graphical tool for prediction in nonlinear models. nlmefit - Nonlinear mixed-effects data fitting. nlmefitoutputfcn - Output function example for nlmefit and nlmefitsa. nlmefitsa - Fit nonlinear mixed effects model with stochastic EM algorithm. nlpredci - Confidence intervals for prediction. nlparci - Confidence intervals for parameters. Regression Plots. addedvarplot - Created added-variable plot for stepwise regression. nlintool - Interactive graphical tool for prediction in nonlinear models. polytool - Interactive graph for prediction of fitted polynomials. rcoplot - Residuals case order plot. robustdemo - Interactive tool to compare robust and least squares fits. rsmdemo - Reaction simulation (DOE, RSM, nonlinear curve fitting). rstool - Multidimensional response surface visualization (RSM). Design of Experiments (DOE). bbdesign - Box-Behnken design. candexch - D-optimal design (row exchange algorithm for candidate set). candgen - Candidates set for D-optimal design generation. ccdesign - Central composite design. cordexch - D-optimal design (coordinate exchange algorithm). daugment - Augment D-optimal design. dcovary - D-optimal design with fixed covariates. fracfactgen - Fractional factorial design generators. ff2n - Two-level full-factorial design. fracfact - Two-level fractional factorial design. fullfact - Mixed-level full-factorial design. hadamard - Hadamard matrices (orthogonal arrays) (in MATLAB toolbox). lhsdesign - Latin hypercube sampling design. lhsnorm - Latin hypercube multivariate normal sample. rowexch - D-optimal design (row exchange algorithm). Statistical Process Control (SPC). capability - Capability indices. capaplot - Capability plot. controlchart - Shewhart control chart. controlrules - Control rules (Western Electric or Nelson) for SPC data. 
gagerr - Gage repeatability and reproducibility (R&R) study. histfit - Histogram with superimposed normal density. normspec - Plot normal density between specification limits. runstest - Runs test for randomness. Multivariate Statistics. Cluster Analysis. cophenet - Cophenetic coefficient. cluster - Construct clusters from LINKAGE output. clusterdata - Construct clusters from data. dbscan - DBSCAN Clustering. dendrogram - Generate dendrogram plot. evalclusters - Evaluate clustering solutions to select number of clusters. fitgmdist - Fit a Gaussian mixture model to data. inconsistent - Inconsistent values of a cluster tree. kmeans - K-means clustering. kmedoids - K-medoids clustering. linkage - Hierarchical cluster information. pdist - Pairwise distance between observations. silhouette - Silhouette plot of clustered data. spectralcluster - Spectral Clustering. squareform - Square matrix formatted distance. optimalleaforder - optimal leaf ordering for hierarchical clustering. Parametric Classification. classify - Linear discriminant analysis. fitcdiscr - Fit linear discriminant analysis model with regularization. makecdiscr - Make a discriminant from class means and covariance matrix. Dimension Reduction Techniques. factoran - Factor analysis. nnmf - Non-negative matrix factorization. pca - Principal components analysis (PCA) from raw data. pcacov - Principal components analysis (PCA) from covariance matrix. pcares - Residuals from principal components analysis (PCA). ppca - Probabilistic PCA. rica - Reconstruction ICA (Independent Component Analysis). rotatefactors - Rotation of factor analysis or PCA loadings. sparsefilt - Sparse filtering. tsne - t-distributed stochastic neighbor embedding. Copulas. copulacdf - Cumulative probability function for a copula. copulafit - Fit a parametric copula to data. copulaparam - Copula parameters as a function of rank correlation. copulapdf - Probability density function for a copula. copularnd - Random vectors from a copula. 
copulastat - Rank correlation for a copula. Nearest Neighbor Methods. fitcknn - Fit K nearest neighbor classification model. createns - Create a NeighborSearcher object for nearest neighbor search. ExhaustiveSearcher - Nearest neighbor search object using exhaustive search. knnsearch - Find K nearest neighbors. KDTreeSearcher - Nearest neighbor search object using kd-tree. pdist2 - Pairwise distance between two sets of observations. rangesearch - Find neighbors within specified radius. Plotting. andrewsplot - Andrews plot for multivariate data. biplot - Biplot of variable/factor coefficients and scores. interactionplot - Interaction plot for factor effects. maineffectsplot - Main effects plot for factor effects. glyphplot - Plot stars or Chernoff faces for multivariate data. gplotmatrix - Matrix of scatter plots grouped by a common variable. multivarichart - Multi-vari chart of factor effects. parallelcoords - Parallel coordinates plot for multivariate data. Other Multivariate Methods. barttest - Bartlett's test for dimensionality. canoncorr - Canonical correlation analysis. cmdscale - Classical multidimensional scaling. mahal - Mahalanobis distance. manova1 - One-way multivariate analysis of variance. mdscale - Metric and non-metric multidimensional scaling. mvregress - Multivariate regression with missing data. plsregress - Partial least squares regression. procrustes - Procrustes analysis. Supervised Learning. classificationLearner - Classification machine learning app. CompactTreeBagger - Lightweight ensemble of bagged decision trees. designecoc - Coding matrix for reducing a multiclass problem to a set of binary problems. fitcauto - Optimize classifier across models and hyperparameters. fitcecoc - Fit a multiclass model for Support Vector Machine or other classifiers. fitcgam - Fit a generalized additive models classifier. fitckernel - Fit a kernel classification model by explicit feature expansion. 
fitclinear - Fit a linear classification model to high-dimensional data. fitcnb - Fit a Naive Bayes classifier. fitcnet - Fit a neural network classifier. fitcsvm - Fit a classification Support Vector Machine (SVM). fitctree - Fit decision tree for classification. fitrauto - Optimize regression across models and hyperparameters. fitrgam - Fit a generalized additive models regression. fitrgp - Fit a Gaussian Process (GP) regression model. fitrkernel - Fit a kernel regression model by explicit feature expansion. fitrlinear - Fit a linear regression model to high-dimensional data. fitrnet - Fit a regression neural network. fitrsvm - Fit a regression Support Vector Machine (SVM). fitrtree - Fit decision tree for regression. fitcensemble - Fit ensemble of classification learners. fitrensemble - Fit ensemble of regression learners. fitSVMPosterior - Fit posterior probabilities for a Support Vector Machine model. regressionLearner - Regression machine learning app. testcholdout - Compare accuracies of two sets of predicted class labels. testckfold - Compare accuracies of two classifiers by repeated cross-validation. TreeBagger - Ensemble of bagged decision trees. Semi-supervised Learning. fitsemigraph - Fit graph-based semi-supervised model. fitsemiself - Fit self-training semi-supervised model. Time series forecasting. directforecaster - Fit a direct forecasting model. Incremental Learning. incrementalClassificationECOC - Incremental learning for multi-class classification models. incrementalClassificationKernel - Incremental learning for binary classification kernel models. incrementalClassificationLinear - Incremental learning for linear classification models. incrementalClassificationNaiveBayes - Incremental learning for naive Bayes models. incrementalConceptDriftDetector - Incremental concept drift detector. incrementalDriftAwareLearner - Drift aware model for incremental learning. incrementalOneClassSVM - Incremental learning using one-class SVM. 
incrementalRobustRandomCutForest - Incremental learning using robust random cut forest. incrementalRegressionLinear - Incremental learning for linear regression models. incrementalRegressionKernel - Incremental learning for regression kernel model. detectdrift - Drift detection. Interpretable machine learning. lime - Local interpretable model-agnostic explanations. partialDependence - Compute partial dependence for 1 or 2 predictors. plotPartialDependence - Plot partial dependence for 1 or 2 predictors. shapley - Shapley values for model explanations. Fair machine learning. disparateImpactRemover - Remove disparate impact to create fairness. fairnessMetrics - Evaluate data or model fairness using bias and group metrics. fairnessThresholder - Optimize classification threshold to create fairness. fairnessWeights - Reweight observations for fairness in binary classification. Anomaly and outlier detection. iforest - Fit isolation forest. lof - Local Outlier Factor. ocsvm - Fit one-class Support Vector Machine (SVM). rrcforest - Fit Robust Random Cut Forest. Hypothesis Tests. ansaribradley - Ansari-Bradley two-sample test for equal dispersions. dwtest - Durbin-Watson test for autocorrelation in linear regression. linhyptest - Linear hypothesis test on parameter estimates. ranksum - Wilcoxon rank sum test (independent samples). runstest - Runs test for randomness. sampsizepwr - Sample size and power calculation for hypothesis test. signrank - Wilcoxon sign rank test (paired samples). signtest - Sign test (paired samples). ttest - One-sample and paired-sample t test. ttest2 - Two-sample t test. vartest - One-sample test of variance. vartest2 - Two-sample F test for equal variances. vartestn - Test for equal variances across multiple groups. ztest - Z test. Distribution Testing. adtest - Anderson-Darling goodness-of-fit test. chi2gof - Chi-square goodness-of-fit test. jbtest - Jarque-Bera test of normality. kstest - Kolmogorov-Smirnov test for one sample. 
kstest2 - Kolmogorov-Smirnov test for two samples. lillietest - Lilliefors test of normality. Nonparametric Functions. fishertest - Fisher's exact test friedman - Friedman's test (nonparametric two-way anova). kruskalwallis - Kruskal-Wallis test (nonparametric one-way anova). ksdensity - Kernel smoothing density estimation. mvksdensity - Multivariate kernel smoothing density estimation. ranksum - Wilcoxon rank sum test (independent samples). signrank - Wilcoxon sign rank test (paired samples). signtest - Sign test (paired samples). Hidden Markov Models. hmmdecode - Calculate HMM posterior state probabilities. hmmestimate - Estimate HMM parameters given state information. hmmgenerate - Generate random sequence for HMM. hmmtrain - Calculate maximum likelihood estimates for HMM parameters. hmmviterbi - Calculate most probable state path for HMM sequence. Model Assessment. confusionmat - Confusion matrix for classification algorithms. crossval - Loss estimate using cross-validation. cvpartition - Cross-validation partition. perfcurve - ROC and other performance measures for classification algorithms. rocmetrics - ROC and other metrics. tspartition - Cross-validation partition for time series data. Model Selection. fscmrmr - Importance of features for classification using MRMR algorithm. fscnca - Feature selection for classification using NCA. fscchi2 - Univariate feature selection for classification using chi-square test. fsrftest - Univariate feature selection for regression using F-test. fsrmrmr - Importance of features for regression using MRMR algorithm. fsrnca - Feature selection for regression using NCA. fsulaplacian - Importance of features for unsupervised learning using Laplacian scores. gencfeatures - Automated feature engineering for classification. genrfeatures - Automated feature engineering for regression. lasso - Lasso and elastic net linear regression. lassoglm - Lasso and elastic net generalized linear regression. 
lassoPlot - Lasso and elastic net plotting. sequentialfs - Sequential feature selection. stepwise - Interactive tool for stepwise regression. stepwisefit - Non-interactive stepwise regression. relieff - Importance of attributes (predictors) using ReliefF algorithm. Machine Learning Utilities. bayesopt - Find the global minimum of a function using Bayesian optimization. optimizableVariable - Define a variable to be optimized using bayesopt. hyperparameters - Determine hyperparameters that can be optimized for a fit function. copyFunctionHandleToWorkers - Copy objective function to parallel workers. Statistical Plotting. andrewsplot - Andrews plot for multivariate data. biplot - Biplot of variable/factor coefficients and scores. boxplot - Boxplots of a data matrix (one per column). cdfplot - Plot of empirical cumulative distribution function. confusionchart - Plot a confusion matrix. ecdf - Empirical cdf (Kaplan-Meier estimate). ecdfhist - Histogram calculated from empirical cdf. fsurfht - Interactive contour plot of a function. gline - Point, drag and click line drawing on figures. glyphplot - Plot stars or Chernoff faces for multivariate data. gname - Interactive point labeling in x-y plots. gplotmatrix - Matrix of scatter plots grouped by a common variable. gscatter - Scatter plot of two variables grouped by a third. hist - Histogram (in MATLAB toolbox). hist3 - Three-dimensional histogram of bivariate data. ksdensity - Kernel smoothing density estimation. lsline - Add least-square fit line to scatter plot. normplot - Normal probability plot. parallelcoords - Parallel coordinates plot for multivariate data. probplot - Probability plot. qqplot - Quantile-Quantile plot. refcurve - Reference polynomial curve. refline - Reference line. scatterhist - 2D scatter plot with marginal histograms. surfht - Interactive contour plot of a data grid. wblplot - Weibull probability plot. Data Objects dataset - Create datasets from workspace variables or files. 
dataset2table - Convert dataset array to table. cell2dataset - Convert cell array to dataset array. mat2dataset - Convert matrix to dataset array. struct2dataset - Convert structure array to dataset array. table2dataset - Convert table to dataset array. nominal - Create arrays of nominal data. ordinal - Create arrays of ordinal data. Statistics Demos. aoctool - Interactive tool for analysis of covariance. disttool - GUI tool for exploring probability distribution functions. polytool - Interactive graph for prediction of fitted polynomials. randtool - GUI tool for generating random numbers. rsmdemo - Reaction simulation (DOE, RSM, nonlinear curve fitting). robustdemo - Interactive tool to compare robust and least squares fits. File Based I/O. tblread - Read in data in tabular format. tblwrite - Write out data in tabular format to file. tdfread - Read in text and numeric data from tab-delimited file. caseread - Read in case names. casewrite - Write out case names to file. xptread - Create a dataset array from a SAS XPORT format file. Code Generation. generateLearnerDataTypeFcn - Create function for fixed-point data types. learnerCoderConfigurer - Create coder configurer for machine learning model. loadCompactModel - Construct compact model from a struct in a mat file. loadLearnerForCoder - Construct model from a struct in a mat file. saveCompactModel - Save compact fitted model to a struct in a mat file. saveLearnerForCoder - Save fitted model to a struct in a mat file. Utility Functions. cholcov - Cholesky-like decomposition for covariance matrix. combnk - Enumeration of all combinations of n objects k at a time. corrcov - Convert covariance matrix to correlation matrix. groupingvariable - Help information for grouping variables. grp2idx - Convert grouping variable to indices and array of names. hougen - Prediction function for Hougen model (nonlinear example). onehotdecode - One-hot decoding. onehotencode - One-hot encoding. 
parallelstats - Help information for parallel computing options. statget - Get stats options parameter value. statset - Set stats options parameter value. templateDiscriminant - Create a discriminant template. templateECOC - Create a template for ECOC learning. templateEnsemble - Create an ensemble template. templateGAM - Create a template for generalized additive models. templateGP - Create a template for Gaussian process regression. templateNeuralNetwork - Create a template for neural networks. templateLinear - Create a linear model template for high-dimensional data. templateKernel - Create a kernel model template. templateKNN - Create a classification KNN template. templateNaiveBayes - Create a naive Bayes template. templateSVM - Create a support vector machine template. templateTree - Create a decision tree template. tiedrank - Compute ranks of sample, adjusting for ties. zscore - Normalize matrix columns to mean 0, variance 1.
In this module, we'll take a brief tour of some of the available functions and give some pointers on how they work.
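Many of the distribution functions in the listing above follow a naming pattern: a short distribution prefix (norm, beta, gam, ...) plus a suffix such as pdf, cdf, inv, rnd, fit, or stat. As a minimal sketch of that pattern for the normal family (the parameter values, sample size, and variable names are illustrative choices, not part of the original module):

```matlab
% Illustrative sketch: the norm* family from the listing above.
% mu, sigma, and the sample size are arbitrary values chosen for demonstration.
mu = 0;
sigma = 1;

d = normpdf(0, mu, sigma);          % density at x = 0
p = normcdf(1.96, mu, sigma);       % cumulative probability P(X <= 1.96)
x = norminv(0.975, mu, sigma);      % critical value (inverse cdf)
[m, v] = normstat(mu, sigma);       % theoretical mean and variance

rng(1);                             % fix the seed so the draws are reproducible
r = normrnd(mu, sigma, [1000 1]);   % 1000 random draws
[muhat, sigmahat] = normfit(r);     % estimate the parameters back from the sample
```

Swapping the prefix (for example, gampdf, gamcdf, gaminv) should give the corresponding functions for another distribution, although the parameters each family expects differ, so it is worth checking the help for each one.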

Why Matlab?

First and foremost, this is the language and environment I know best. Python is more popular and has some substantial advantages, but here are some dimensions on which I think Matlab is better:
% for help:
doc
Matlab is not so good at being free, so access without your own license or a university site license is the biggest issue. It's also not so good at seamless deployment on servers to run processes in the cloud. MathWorks is working on these things within the "not free" constraints, but this is a major reason to use Python. Matlab is, however, actively working on some new things:

A brief list of selected statistics functions

Here are some sections in the Statistics and Machine Learning Toolbox, along with a selection of functions I think are likely to be particularly useful. This is one of many toolboxes (also not free) that interoperate with the core Matlab distribution:
There are also Graphical User Interface (GUI) apps that allow you to explore things (e.g., deep learning, curve fitting, distribution fitting).
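For a flavor of the command-line side of the toolbox, here is a minimal sketch using two functions from the help listing, ttest2 and fitlm. The data are simulated, and the group sizes, effect sizes, and variable names are illustrative assumptions rather than anything from the original module:

```matlab
% Illustrative sketch with simulated data (not from the original module).
rng(2);                               % reproducible simulated data
a = normrnd(0, 1, [50 1]);            % group A: mean 0
b = normrnd(1, 1, [50 1]);            % group B: mean 1

% Two-sample t test. h = 1 means the null hypothesis of equal means
% is rejected at the default 5% significance level.
[h, pval, ci, stats] = ttest2(a, b);

% Simple linear regression with fitlm, which returns a model object.
x = (1:50)';
y = 2 + 0.5 * x + normrnd(0, 1, [50 1]);
mdl = fitlm(x, y);
disp(mdl.Coefficients)                % table of estimates, SEs, t statistics, p-values
```

The older function-style tools (such as regress) return plain numeric outputs, while the newer fit* functions return objects that carry the fit, diagnostics, and plotting methods together.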

Discovering statistics objects

Matlab has been moving steadily towards the use of objects and object-oriented programming, which is a general design feature of modern languages and toolboxes. Objects are "collections" of variables with pre-defined properties and methods. They have a class, which means each object type is its own unique type of variable. Properties are data fields attached to the objects. Methods are functions that can be run on the object. Each object has a constructor method for the object type that builds and returns the object.
Python and R (and some other packages) work similarly, so what you do with Matlab will transfer to some degree.
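To make the vocabulary of class, properties, and methods concrete, here is a minimal sketch using a probability distribution object. It assumes the Statistics and Machine Learning Toolbox is installed; makedist (from the help listing) builds and returns the object, and the parameter values are arbitrary illustrative choices:

```matlab
% Illustrative sketch: inspecting a probability distribution object.
% The parameter values (mu = 10, sigma = 2) are arbitrary.
pd = makedist('Normal', 'mu', 10, 'sigma', 2);

class(pd)            % the object's class
properties(pd)       % list its properties (data fields)
methods(pd)          % list its methods (functions that act on the object)

pd.mu                % read a property with dot notation
m = mean(pd);        % call a method on the object
p = cdf(pd, 10);     % P(X <= 10), evaluated at the mean of a symmetric distribution
```

The same three calls (class, properties, methods) work on most Matlab objects, so they are a reasonable first step whenever a function hands back an unfamiliar object.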
For example, the code below gives you some help on a few of the common types of objects used in stats in Matlab, including their properties and methods:
% Table objects are the preferred way to store datasets
help table
table Table.
Tables are used to collect heterogeneous data and metadata into a single container. Tables are suitable for storing column-oriented or tabular data that are often stored as columns in a text file or in a spreadsheet. Tables can accommodate variables of different types, sizes, units, etc. They are often used to store experimental data, with rows representing different observations and columns representing different measured variables.

Use the table constructor to create a table from variables in the MATLAB workspace. Use the readtable function to create a table by reading data from a text or spreadsheet file. The table constructor can also be used to create tables without providing workspace variables, by providing the size and variable types.

Tables can be subscripted using parentheses much like ordinary numeric arrays, but in addition to numeric and logical indices, you can use a table's variable names, row names, and patterns matching variable or row names as indices. You can access individual variables in a table much like fields in a structure, using dot subscripting. You can access the contents of one or more variables using brace subscripting.

Tables can contain different kinds of variables, including numeric, logical, character, string, categorical, and cell. However, a table is a different class than the variables that it contains. For example, even a table that contains only variables that are double arrays cannot be operated on as if it were itself a double array. However, using dot subscripting, you can operate on a variable in a table as if it were a workspace variable. Using brace subscripting, you can operate on one or more variables in a table as if they were in a homogeneous array.

A table T has properties that store metadata such as its variable and row names. Access or assign to a property using P = T.Properties.PropName or T.Properties.PropName = P, where PropName is one of the following:

table metadata properties:
    Description - A character vector describing the table
    DimensionNames - A two-element cell array of character vectors containing names of the dimensions of the table
    VariableNames - A cell array containing names of the variables in the table
    VariableDescriptions - A cell array of character vectors containing descriptions of the variables in the table
    VariableUnits - A cell array of character vectors containing units for the variables in table
    VariableContinuity - An array containing a matlab.tabular.Continuity value for each table variable, specifying whether a variable represents continuous or discrete data values. You can assign 'unset', 'continuous', 'step', or 'event' to elements of VariableContinuity.
    RowNames - A cell array of nonempty, distinct character vectors containing names of the rows in the table
    UserData - A variable containing any additional information associated with the table. You can assign any value to this property.
    CustomProperties - A container for user-defined per-table or per-variable custom metadata fields.

table methods and functions:
  Construction and conversion:
    table - Create a table from workspace variables.
    array2table - Convert homogeneous array to table.
    cell2table - Convert cell array to table.
    struct2table - Convert structure array to table.
    table2array - Convert table to a homogeneous array.
    table2cell - Convert table to cell array.
    table2struct - Convert table to structure array.
  Import and export:
    readtable - Create a table by reading from a file.
    writetable - Write a table to a file.
    write - Write a table to a file.
  Size and shape:
    istable - True for tables.
    size - Size of a table.
    width - Number of variables in a table.
    height - Number of rows in a table.
    ndims - Number of dimensions of a table.
    numel - Number of elements in a table.
    horzcat - Horizontal concatenation for tables.
    vertcat - Vertical concatenation for tables.
  Set membership:
    intersect - Find rows common to two tables.
    ismember - Find rows in one table that occur in another table.
    setdiff - Find rows that occur in one table but not in another.
    setxor - Find rows that occur in one or the other of two tables, but not both.
    unique - Find unique rows in a table.
    union - Find rows that occur in either of two tables.
  Data manipulation and reorganization:
    summary - Print summary of a table.
    addvars - Insert new variables at a specified location in a table.
    movevars - Move table variables to a specified location.
    removevars - Delete the specified table variables.
    splitvars - Splits multi-column variables into separate variables.
    mergevars - Merges multiple variables into one multi-column variable or a nested table.
    convertvars - Converts table variables to a specified data type.
    renamevars - Rename variables in table.
    sortrows - Sort rows of a table.
    stack - Stack up data from multiple variables into a single variable.
    unstack - Unstack data from a single variable into multiple variables.
    join - Merge two tables by matching up rows using key variables.
    innerjoin - Inner join between two tables.
    outerjoin - Outer join between two tables.
    rows2vars - Reorient rows to be variables of output table.
    inner2outer - Invert a nested table-in-table hierarchy.
    ismissing - Find elements in a table that contain missing values.
    standardizeMissing - Insert missing data indicators into a table.
  Computations on tables:
    varfun - Apply a function to variables in a table.
    rowfun - Apply a function to rows of a table.
    List of math operations that support tables.
  Subscripting into tables:
    vartype - Table variable subscripting by variable type.
    rowfilter - Table row filtering by variable data.

Examples:
    % Create a table from individual workspace variables.
    load patients
    patients = table(LastName,Gender,Age,Height,Weight,Smoker,Systolic,Diastolic)

    % Select the rows for patients who smoke, and a subset of the variables.
    smokers = patients(patients.Smoker == true, {'LastName' 'Gender' 'Systolic' 'Diastolic'})

    % Convert the two blood pressure variables into a single variable.
    patients.BloodPressure = [patients.Systolic patients.Diastolic];
    patients(:,{'Systolic' 'Diastolic'}) = []

    % Pick out two specific patients by the LastName variable.
    patients(ismember(patients.LastName,{'Smith' 'Jones'}), :)

    % Convert the LastName variable into row names.
    patients.Properties.RowNames = patients.LastName;
    patients.LastName = []

    % Use the row names to pick out two specific patients.
    patients({'Smith' 'Jones'},:)

    % Add metadata to the table.
    patients.Properties.Description = 'Simulated patient data';
    patients.Properties.VariableUnits = {'' 'Yrs' 'In' 'Lbs' '' 'mm Hg'};
    patients.Properties.VariableDescriptions{6} = 'Systolic/Diastolic';
    summary(patients)

    % Create a new variable in the table from existing variables.
    patients.BMI = (patients.Weight * 0.453592) ./ (patients.Height * 0.0254).^2
    patients.Properties.VariableUnits{'BMI'} = 'kg/m^2';
    patients.Properties.VariableDescriptions{'BMI'} = 'Body Mass Index';

    % Sort the table based on the new variable.
    sortrows(patients,'BMI')

    % Make a scatter plot of two of the table's variables.
    plot(patients.Height,patients.Weight,'o')

    % Create tables from text and spreadsheet files
    patients2 = readtable('patients.dat','ReadRowNames',true)
    patients3 = readtable('patients.xls','ReadRowNames',true)

    % Create a table from a numeric matrix
    load tetmesh.mat
    t = array2table(X,'VariableNames',{'x' 'y' 'z'});
    plot3(t.x,t.y,t.z,'.')

See also table, categorical
Documentation for table
Other uses of table
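The help text above distinguishes three ways of subscripting a table. A minimal sketch of the difference, assuming the patients sample dataset that ships with MATLAB (the variable names T, sub, vals, and ages are only illustrative):

```matlab
% Build a small table from the patients sample data
load patients
T = table(Age,Height,Weight,Smoker);

sub  = T(1:3,{'Age','Weight'});   % parentheses return a smaller table
vals = T{1:3,{'Age','Weight'}};   % braces return the contents as a numeric array
ages = T.Age;                     % dot subscripting returns a single variable

class(sub)      % class of the parenthesis result
class(vals)     % class of the brace result
mean(ages)      % operate on a table variable like any workspace array
```

Brace subscripting only works when the selected variables can be concatenated into one array, so mixing numeric and text variables inside braces will raise an error.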
% RepeatedMeasuresModel is an object related to ANOVA/RMANOVA/MANOVA models
help RepeatedMeasuresModel
RepeatedMeasuresModel - Repeated measures model.
RM = FITRM(T,MODELSPEC) fits the model specified by MODELSPEC to data in the table T, and returns the RepeatedMeasuresModel RM. T is a table containing the values of the response variables and the between-subject factors to be used as predictors in the model. MODELSPEC specifies the response variable names and model terms as a string such as 'y1-y5 ~ x1 + x2 + x3*x4'.

RepeatedMeasuresModel methods:
    anova - Analysis of variance for between-subject effects.
    coeftest - Hypothesis test on coefficients.
    margmean - Estimated marginal means.
    epsilon - Epsilon adjustment for repeated measures anova.
    grpstats - Descriptive statistics by group.
    manova - Multivariate analysis of variance.
    mauchly - Mauchly's test of sphericity.
    multcompare - Multiple comparisons of marginal means.
    plot - Plot data with optional grouping.
    plotprofile - Plot expected marginal means with optional grouping.
    predict - Compute predicted values.
    random - Generate new random response values.
    ranova - Repeated measures analysis of variance.

RepeatedMeasuresModel properties:
    BetweenDesign - Design for between-subject factors.
    BetweenModel - Model for between-subject factors.
    BetweenFactorNames - Names of between-subject factors.
    ResponseNames - Names of responses.
    WithinDesign - Design for within-subject factors.
    WithinModel - Model for within-subject factors.
    WithinFactorNames - Names of within-subject factors.
    Coefficients - Table of estimated coefficients.
    Covariance - Table of estimated response covariances.
    DFE - Degrees of freedom for error.

See also fitrm.
Documentation for RepeatedMeasuresModel
You would use the function fitrm, for example, to fit a repeated measures model to some data and return a model object with estimated parameters (and other associated properties) as output.
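As a rough sketch of that workflow, the code below fits a repeated measures model to the fisheriris sample data, treating the four flower measurements as repeated measures and species as the between-subject factor. The choice of dataset and the variable names (t, Meas, rm) are assumptions made for illustration, not part of the original tutorial:

```matlab
% Put the four measurements and the grouping factor into one table
load fisheriris
t = table(species,meas(:,1),meas(:,2),meas(:,3),meas(:,4), ...
    'VariableNames',{'species','meas1','meas2','meas3','meas4'});

% Describe the within-subject design: one row per repeated measure
Meas = table([1 2 3 4]','VariableNames',{'Measurements'});

% Fit the model and return a RepeatedMeasuresModel object
rm = fitrm(t,'meas1-meas4 ~ species','WithinDesign',Meas);

rm.Coefficients           % a property: table of estimated coefficients
ranovatbl = ranova(rm)    % a method: repeated measures ANOVA table
mauchly(rm)               % a method: Mauchly's test of sphericity
```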
% LinearMixedModel is an object related to mixed effects models (akin to
% lmer in R software)
help LinearMixedModel
LinearMixedModel Fitted linear mixed effects model.
LME = FITLME(...) fits a linear mixed effects model to data. The fitted model LME is a LinearMixedModel modeling a response variable as a linear function of fixed effect predictors and random effect predictors.

LinearMixedModel methods:
    coefCI - Coefficient confidence intervals
    coefTest - Linear hypothesis test on coefficients
    predict - Compute predicted values given predictor values
    random - Generate random response values given predictor values
    partialDependence - Partial Dependence values.
    plotPartialDependence - Plot Partial Dependence.
    plotResiduals - Plot of residuals
    designMatrix - Fixed and random effects design matrices
    fixedEffects - Stats on fixed effects
    randomEffects - Stats on random effects
    covarianceParameters - Stats on covariance parameters
    fitted - Fitted response
    residuals - Various types of residuals
    response - Response used to fit the model
    compare - Compare fitted models
    anova - Marginal tests for fixed effect terms
    disp - Display a fitted model

LinearMixedModel properties:
    FitMethod - Method used for fitting (either 'ML' or 'REML')
    MSE - Mean squared error (estimate of residual variance)
    Formula - Representation of the model used in this fit
    LogLikelihood - Log of likelihood function at coefficient estimates
    DFE - Degrees of freedom for residuals
    SSE - Error sum of squares
    SST - Total sum of squares
    SSR - Regression sum of squares
    CoefficientCovariance - Covariance matrix for coefficient estimates
    CoefficientNames - Coefficient names
    NumCoefficients - Number of coefficients
    NumEstimatedCoefficients - Number of estimated coefficients
    Coefficients - Coefficients and related statistics
    Rsquared - R-squared and adjusted R-squared
    ModelCriterion - AIC and other model criteria
    VariableInfo - Information about variables used in the fit
    ObservationInfo - Information about observations used in the fit
    Variables - Table of variables used in fit
    NumVariables - Number of variables used in fit
    VariableNames - Names of variables used in fit
    NumPredictors - Number of predictors
    PredictorNames - Names of predictors
    ResponseName - Name of response
    NumObservations - Number of observations in the fit
    ObservationNames - Names of observations in the fit

See also fitlme, fitlmematrix, LinearModel, GeneralizedLinearModel, NonLinearModel.
Documentation for LinearMixedModel
You would use the function fitlme, for example, to fit a mixed effects model to data and return a model object with estimated parameters (and other associated properties) as output.
Related object types are: LinearModel, GeneralizedLinearModel, NonLinearModel
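These related classes are created the same way, by a fitting function that returns a model object. A minimal sketch for LinearModel, assuming the carsmall sample dataset and an illustrative formula (fitlm is the corresponding fitting function):

```matlab
% Fit an ordinary linear regression and inspect the returned object
load carsmall
tbl = table(MPG,Weight,Model_Year);

lm = fitlm(tbl,'MPG ~ Weight');   % returns a LinearModel object

class(lm)          % name of the object class
lm.Coefficients    % table of estimates, standard errors, t statistics, p-values
lm.Rsquared        % struct with Ordinary and Adjusted fields
```

Several property names, such as Coefficients and Rsquared, are shared with LinearMixedModel, so code that inspects one kind of model often carries over to the others.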
% ClassificationSVM is an object related to multivariate classification.
help ClassificationSVM
ClassificationSVM Support Vector Machine model for classification.
ClassificationSVM is an SVM model for classification with one or two classes. This model can predict response for new data. This model also stores data used for training and can compute resubstitution predictions.

An object of this class cannot be created by calling the constructor. Use FITCSVM to create a ClassificationSVM object by fitting an SVM model to training data.

This class is derived from CompactClassificationSVM.

ClassificationSVM properties:
    NumObservations - Number of observations.
    X - Matrix of predictors used to train this model.
    Y - True class labels used to train this model.
    W - Weights of observations used to train this model.
    ModelParameters - SVM parameters.
    PredictorNames - Names of predictors used for this model.
    ExpandedPredictorNames - Names of expanded predictors.
    ResponseName - Name of the response variable.
    ClassNames - Names of classes in Y.
    Cost - Misclassification costs.
    Prior - Prior class probabilities.
    ScoreTransform - Transformation applied to predicted classification scores.
    Alpha - Coefficients obtained by solving the dual problem.
    Beta - Coefficients for the primal linear problem.
    Bias - Bias term.
    KernelParameters - Kernel parameters.
    Mu - Predictor means.
    Sigma - Predictor standard deviations.
    SupportVectors - Support vectors.
    SupportVectorLabels - Support vector labels (+1 and -1).
    BoxConstraints - Box constraints.
    CacheInfo - Cache information.
    ConvergenceInfo - Convergence information.
    Gradient - Gradient values in the training data.
    IsSupportVector - Indices of support vectors in the training data.
    Nu - Nu parameter for one-class learning.
    NumIterations - Number of iterations taken by optimization.
    OutlierFraction - Expected fraction of outliers in the training data.
    ShrinkagePeriod - Number of iterations between reductions of the active set.
    Solver - Name of the used solver.
    RowsUsed - Logical index for rows used in fit.
    HyperparameterOptimizationResults - An object or table describing the results of hyperparameter optimization.

ClassificationSVM methods:
    compact - Compact this model.
    compareHoldout - Compare two models using test data.
    crossval - Cross-validate this model.
    discardSupportVectors - Discard support vectors for linear SVM.
    edge - Classification edge.
    fitPosterior - Find transformation from SVM scores to class posterior probabilities.
    incrementalLearner - Return an incremental binary classification linear model.
    loss - Classification loss.
    margin - Classification margins.
    partialDependence - Partial Dependence values.
    plotPartialDependence - Plot Partial Dependence.
    predict - Predicted response of this model.
    resubEdge - Resubstitution classification edge.
    resubLoss - Resubstitution classification loss.
    resubMargin - Resubstitution classification margins.
    resubPredict - Resubstitution predicted response.
    resume - Resume training.

Example: Train an SVM model on ionosphere data.
    load ionosphere
    svm = fitcsvm(X,Y,'KernelFunction','gaussian','standardize',true)

See also fitcsvm, classreg.learning.classif.CompactClassificationSVM.
Documentation for ClassificationSVM
Notice the list of related functions at the bottom, including fitcsvm, the fitting function that builds and returns the object. (As the help text states, the class constructor itself cannot be called directly.) You may also see related object types.
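To make the link between the fitting function and the object concrete, here is a short sketch that extends the help example: it trains the SVM on the ionosphere sample data and then calls a few of the methods listed above. The output variable names (trainErr, cvsvm, cvErr, labels) are illustrative:

```matlab
% ionosphere provides X (351-by-34 predictors) and Y (cell array of 'b'/'g' labels)
load ionosphere
svm = fitcsvm(X,Y,'KernelFunction','gaussian','Standardize',true);

class(svm)                       % the fitted object's class
trainErr = resubLoss(svm)        % misclassification rate on the training data

cvsvm = crossval(svm);           % cross-validated model (10 folds by default)
cvErr = kfoldLoss(cvsvm)         % cross-validated misclassification rate

labels = predict(svm,X(1:5,:))   % predicted labels for the first five rows
```

The resubstitution error is usually optimistic, which is why the cross-validated loss is the more useful number when comparing models.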
The code below lists types of objects related to classification and regression:
disp('Classification models')
Classification models
classreg.learning.classificationModels
ans = 1×22 cell
'Tree' 'ByBinaryRegr' 'Discriminant' 'KNN' 'SVM' 'ECOC' 'NaiveBayes' 'Linear' 'Bag' 'AdaBoostM1' 'AdaBoostM2' 'AdaBoostMH' 'RobustBoost' 'LogitBoost' 'GentleBoost' 'Subspace' 'RUSBoost' 'LPBoost' 'TotalBoost' 'Kernel' 'NeuralNetwork' 'GAM'
 
disp(' ')
disp('Regression models')
Regression models
classreg.learning.regressionModels
ans = 1×10 cell
'Tree' 'Bag' 'LSBoost' 'Subspace' 'SVM' 'GP' 'Linear' 'Kernel' 'GAM' 'NeuralNetwork'
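You can then get help on many of these object classes by prefixing the model type with "Classification" or "Regression". This works for types such as Tree and SVM; entries such as Bag and the boosting variants name ensemble methods rather than separate classes.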
help ClassificationTree
help RegressionTree
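As with the SVM, these tree objects are returned by fitting functions, here fitctree and fitrtree. A minimal sketch, assuming the fisheriris and carsmall sample datasets and illustrative variable names:

```matlab
% Classification tree: predict species from the four flower measurements
load fisheriris
ctree = fitctree(meas,species);
class(ctree)
label = predict(ctree,meas(1,:))      % predicted class for the first flower

% Regression tree: predict MPG from weight and horsepower
load carsmall
rtree = fitrtree([Weight Horsepower],MPG);
class(rtree)
mpgHat = predict(rtree,[3000 130])    % predicted MPG for a hypothetical car
```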

Methods and properties

To list the properties or methods associated with an object class, type "properties <objectClass>" or "methods <objectClass>":
properties LinearMixedModel
Properties for class LinearMixedModel: FitMethod MSE VariableInfo NumVariables VariableNames NumPredictors PredictorNames ResponseName NumObservations ObservationInfo Variables ObservationNames Formula LogLikelihood DFE CoefficientCovariance CoefficientNames NumCoefficients Coefficients Rsquared ModelCriterion SSE SST SSR NumEstimatedCoefficients
methods LinearMixedModel
Methods for class LinearMixedModel: anova coefCI coefTest compare covarianceParameters designMatrix disp fitted fixedEffects partialDependence plotPartialDependence plotResiduals predict random randomEffects residuals response
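Both commands can also return their lists as cell arrays, which is handy for checking programmatically whether a class supports something. A small sketch, assuming only the class name used above:

```matlab
% Capture the lists instead of printing them
p = properties('LinearMixedModel');   % cell array of property names
m = methods('LinearMixedModel');      % cell array of method names

ismember('Rsquared',p)    % is there an Rsquared property?
ismember('anova',m)       % is there an anova method?
numel(p)                  % how many public properties are listed
```

The same calls accept an object instead of a class name, for example properties(lme) for a fitted model stored in a variable named lme.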

Choices: Objects and classical functions

Trying out a model on a sample dataset

Many fitting functions include sample code in their help. For example, type "help fitlme" and you get the code below. Run it to load a dataset and fit a linear mixed effects model with a random intercept for each model year:
load carsmall
ds = dataset(MPG,Weight,Model_Year);
lme = fitlme(ds,'MPG ~ Weight + (1|Model_Year)')
lme =
Linear mixed-effects model fit by ML

Model information:
    Number of observations              94
    Fixed effects coefficients           2
    Random effects coefficients          3
    Covariance parameters                2

Formula:
    MPG ~ 1 + Weight + (1 | Model_Year)

Model fit statistics:
    AIC       BIC       LogLikelihood    Deviance
    486.09    496.26    -239.04          478.09

Fixed effects coefficients (95% CIs):
    Name               Estimate      SE           tStat      DF    pValue        Lower         Upper
    {'(Intercept)'}    43.575        2.3038       18.915     92    1.8371e-33    39            48.151
    {'Weight'     }    -0.0067097    0.0004242    -15.817    92    5.5373e-28    -0.0075522    -0.0058672

Random effects covariance parameters (95% CIs):
Group: Model_Year (3 Levels)
    Name1              Name2              Type       Estimate    Lower     Upper
    {'(Intercept)'}    {'(Intercept)'}    {'std'}    3.301       1.4448    7.5421

Group: Error
    Name           Estimate    Lower     Upper
    {'Res Std'}    2.8997      2.5075    3.3532
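The help example stores the data in a dataset array, an older container. Since tables are the preferred way to store datasets, the same model can be fit from a table instead. A minimal sketch (the variable name tbl is illustrative):

```matlab
% Same model as above, with the data held in a table
load carsmall
tbl = table(MPG,Weight,Model_Year);
lme = fitlme(tbl,'MPG ~ Weight + (1|Model_Year)')
```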
Then, you can explore the model's properties and methods.
Let's display a couple of the properties of the estimated model:
lme.Rsquared % Variance explained
ans = struct with fields:
Ordinary: 0.8714 Adjusted: 0.8700
lme.MSE % Mean squared error
ans = 8.4083
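Methods are called as functions with the model as the first input. The sketch below refits the same model so that it runs on its own, then calls a few of the LinearMixedModel methods listed earlier. The data are stored in a table here, and the output names (beta, ci, b, bnames) are illustrative:

```matlab
% Refit the random-intercept model
load carsmall
tbl = table(MPG,Weight,Model_Year);
lme = fitlme(tbl,'MPG ~ Weight + (1|Model_Year)');

beta = fixedEffects(lme)           % fixed effects estimates (intercept, Weight)
ci   = coefCI(lme)                 % 95% confidence intervals, one row per coefficient
[b,bnames] = randomEffects(lme)    % estimated random intercepts, one per model year
anova(lme)                         % marginal F-tests for the fixed effect terms
plotResiduals(lme,'fitted')        % residuals plotted against fitted values
```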

Help and doc

help <function name> brings up the help text for that function (or object class).
doc <function name> brings up an interactive browser with even more help and examples.
doc LinearMixedModel

Activity: Explore by fitting a statistics model to a dataset

Choose a model object or function.
Get the help on that function.
Use the help to fit the model.
What are the outputs? If an object, what are some of the properties and methods?
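One possible walk-through of these steps, offered only as a sketch: it picks fitglm as the function and fits a logistic regression to the patients sample data. The dataset, formula, and variable names are assumptions chosen for illustration, so substitute your own choices:

```matlab
% Steps 1-2: choose a function and read its help
help fitglm

% Step 3: fit the model (logistic regression of smoking status on age and weight)
load patients
T = table(Age,Weight,Smoker);
glm = fitglm(T,'Smoker ~ Age + Weight','Distribution','binomial');

% Step 4: inspect the output
class(glm)          % what kind of object came back?
properties(glm)     % what does it store?
methods(glm)        % what can you do with it?
glm.Coefficients    % estimates, standard errors, and p-values
```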