LatentDirichletAllocation

Latent Dirichlet Allocation with the online variational Bayes algorithm.

The implementation is based on [1] and [2].

Python Reference

Constructors

constructor()

Signature

new LatentDirichletAllocation(opts?: object): LatentDirichletAllocation;

Parameters

| Name | Type | Description |
| --- | --- | --- |
| opts? | object | - |
| opts.batch_size? | number | Number of documents to use in each EM iteration. Only used in online learning. Default value: 128. |
| opts.doc_topic_prior? | number | Prior of the document topic distribution theta. If undefined, defaults to 1 / n_components. In [1], this is called alpha. |
| opts.evaluate_every? | number | How often to evaluate perplexity. Only used in the fit method. Set it to 0 or a negative number to skip perplexity evaluation during training entirely. Evaluating perplexity can help you check convergence during training, but it also increases total training time: evaluating it in every iteration can up to double the training time. Default value: -1. |
| opts.learning_decay? | number | Parameter that controls the learning rate in the online learning method. The value should be in the interval (0.5, 1.0] to guarantee asymptotic convergence. When the value is 0.0 and batch_size is n_samples, the update method is the same as batch learning. In the literature, this is called kappa. Default value: 0.7. |
| opts.learning_method? | "batch" \| "online" | Method used to update components_. Only used in the fit method. In general, if the data size is large, the online update will be much faster than the batch update. Valid options: "batch" (batch variational Bayes: all training data is used in each EM update, and the old components_ are overwritten in each iteration) and "online" (online variational Bayes: each EM update uses a mini-batch of training data to update components_ incrementally, with the learning rate controlled by learning_decay and learning_offset). Default value: "batch". |
| opts.learning_offset? | number | A positive parameter that downweights early iterations in online learning. It should be greater than 1.0. In the literature, this is called tau_0. Default value: 10. |
| opts.max_doc_update_iter? | number | Maximum number of iterations for updating the document topic distribution in the E-step. Default value: 100. |
| opts.max_iter? | number | The maximum number of passes over the training data (aka epochs). It only affects the fit method, not the partial_fit method. Default value: 10. |
| opts.mean_change_tol? | number | Stopping tolerance for updating the document topic distribution in the E-step. Default value: 0.001. |
| opts.n_components? | number | Number of topics. Default value: 10. |
| opts.n_jobs? | number | The number of jobs to use in the E-step. undefined means 1 unless in a joblib.parallel_backend context; -1 means using all processors. See the Glossary for more details. |
| opts.perp_tol? | number | Perplexity tolerance in batch learning. Only used when evaluate_every is greater than 0. Default value: 0.1. |
| opts.random_state? | number | Pass an int for reproducible results across multiple function calls. See the Glossary. |
| opts.topic_word_prior? | number | Prior of the topic word distribution beta. If undefined, defaults to 1 / n_components. In [1], this is called eta. |
| opts.total_samples? | number | Total number of documents. Only used in the partial_fit method. Default value: 1000000. |
| opts.verbose? | number | Verbosity level. Default value: 0. |
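The learning_decay (kappa) and learning_offset (tau_0) options describe the step-size schedule of online variational Bayes. The t-th mini-batch update is weighted by rho_t = (tau_0 + t)^(-kappa). Keeping kappa in (0.5, 1.0] makes the sum of rho_t diverge while the sum of rho_t squared converges, which is the condition behind the asymptotic-convergence note above. The TypeScript below is a minimal illustrative sketch. `onlineStepWeight` is a hypothetical helper, not part of this class's API.

```typescript
// Step weight for the t-th online update: rho_t = (tau_0 + t) ** (-kappa).
// tau_0 corresponds to learning_offset and kappa to learning_decay.
// The helper name is hypothetical and only illustrates the schedule.
function onlineStepWeight(learningOffset: number, t: number, learningDecay: number): number {
  if (learningOffset + t <= 0) {
    throw new RangeError("learningOffset + t must be positive");
  }
  return Math.pow(learningOffset + t, -learningDecay);
}

// With the defaults (learning_offset = 10, learning_decay = 0.7), later
// mini-batches receive progressively smaller weights.
const weights = [1, 2, 3, 100].map((t) => onlineStepWeight(10, t, 0.7));
console.log(weights.map((w) => w.toFixed(4)).join(", "));
```

A larger learning_offset flattens the early part of the schedule, so the first mini-batches do not dominate the topic estimates.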

Returns

LatentDirichletAllocation

Defined in: generated/decomposition/LatentDirichletAllocation.ts:23

Properties

_isDisposed

boolean = false

Defined in: generated/decomposition/LatentDirichletAllocation.ts:21

_isInitialized

boolean = false

Defined in: generated/decomposition/LatentDirichletAllocation.ts:20

_py

PythonBridge

Defined in: generated/decomposition/LatentDirichletAllocation.ts:19

id

string

Defined in: generated/decomposition/LatentDirichletAllocation.ts:16

opts

any

Defined in: generated/decomposition/LatentDirichletAllocation.ts:17

Accessors

bound_

Final perplexity score on the training set.

Signature

bound_(): Promise<number>;

Returns

Promise<number>

Defined in: generated/decomposition/LatentDirichletAllocation.ts:738

components_

Variational parameters for the topic word distribution. Since the complete conditional for the topic word distribution is a Dirichlet, components_[i, j] can be viewed as a pseudocount that represents the number of times word j was assigned to topic i. It can also be viewed as a distribution over the words for each topic after normalization: model.components_ / model.components_.sum(axis=1)[:, np.newaxis].

Signature

components_(): Promise<ArrayLike[]>;

Returns

Promise<ArrayLike[]>

Defined in: generated/decomposition/LatentDirichletAllocation.ts:576
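The normalization described for components_ divides each topic row by its sum. The sketch below assumes the fitted matrix has already been copied into plain nested number arrays; the accessor is typed as ArrayLike[], so a conversion step may be needed first. Both helpers, `topicWordDistribution` and `topWordIndices`, are hypothetical and not part of this class.

```typescript
// Convert components_ pseudocounts (topics x words) into per-topic word
// distributions, mirroring components_ / components_.sum(axis=1)[:, np.newaxis].
function topicWordDistribution(components: number[][]): number[][] {
  return components.map((row) => {
    const total = row.reduce((sum, v) => sum + v, 0);
    return row.map((v) => v / total);
  });
}

// Indices of the n most probable words in one normalized topic row, highest first.
function topWordIndices(row: number[], n: number): number[] {
  return row
    .map((p, index) => ({ p, index }))
    .sort((a, b) => b.p - a.p)
    .slice(0, n)
    .map(({ index }) => index);
}

// Two toy topics over a three-word vocabulary.
const dist = topicWordDistribution([[1, 3, 0], [2, 2, 4]]);
console.log(dist, topWordIndices(dist[1], 1));
```

Pairing the returned indices with the vocabulary used to build X gives the usual "top words per topic" listing.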

doc_topic_prior_

Prior of document topic distribution theta. If the value is undefined, it is 1 / n_components.

Signature

doc_topic_prior_(): Promise<number>;

Returns

Promise<number>

Defined in: generated/decomposition/LatentDirichletAllocation.ts:765

exp_dirichlet_component_

Exponential of the expectation of the log topic word distribution. In the literature, this is exp(E[log(beta)]).

Signature

exp_dirichlet_component_(): Promise<ArrayLike[]>;

Returns

Promise<ArrayLike[]>

Defined in: generated/decomposition/LatentDirichletAllocation.ts:603
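For a Dirichlet with parameter row lambda_k, the expectation of log beta_kw is digamma(lambda_kw) - digamma(sum over w of lambda_kw). The sketch below applies this row by row to a components_-shaped matrix and exponentiates the result, which is one way to obtain exp(E[log(beta)]). JavaScript has no built-in digamma, so the sketch uses a hand-written one (recurrence plus asymptotic series). The function names are hypothetical, and this is an illustration rather than the library's implementation.

```typescript
// Digamma for x > 0: shift x to at least 6 with psi(x) = psi(x + 1) - 1/x,
// then apply the asymptotic expansion, which is accurate to ~1e-9 there.
function digamma(x: number): number {
  if (!(x > 0)) throw new RangeError("digamma is only implemented for x > 0");
  let result = 0;
  while (x < 6) {
    result -= 1 / x;
    x += 1;
  }
  const inv2 = 1 / (x * x);
  result +=
    Math.log(x) - 0.5 / x -
    inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 * (1 / 252 - inv2 * (1 / 240 - inv2 / 132))));
  return result;
}

// exp(E[log beta]) for each row of a Dirichlet parameter matrix (topics x words):
// E[log beta_kw] = psi(lambda_kw) - psi(sum_w lambda_kw).
function expDirichletExpectation(lambda: number[][]): number[][] {
  return lambda.map((row) => {
    const psiOfRowSum = digamma(row.reduce((sum, v) => sum + v, 0));
    return row.map((v) => Math.exp(digamma(v) - psiOfRowSum));
  });
}

console.log(expDirichletExpectation([[1, 1], [5, 0.5]]));
```

By Jensen's inequality, each row of the result sums to less than 1. It is a set of geometric-mean-style weights rather than a normalized distribution.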

feature_names_in_

Names of features seen during fit. Defined only when X has feature names that are all strings.

Signature

feature_names_in_(): Promise<ArrayLike>;

Returns

Promise<ArrayLike>

Defined in: generated/decomposition/LatentDirichletAllocation.ts:684

n_batch_iter_

Number of iterations of the EM step.

Signature

n_batch_iter_(): Promise<number>;

Returns

Promise<number>

Defined in: generated/decomposition/LatentDirichletAllocation.ts:630

n_features_in_

Number of features seen during fit.

Signature

n_features_in_(): Promise<number>;

Returns

Promise<number>

Defined in: generated/decomposition/LatentDirichletAllocation.ts:657

n_iter_

Number of passes over the dataset.

Signature

n_iter_(): Promise<number>;

Returns

Promise<number>

Defined in: generated/decomposition/LatentDirichletAllocation.ts:711
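How max_iter, evaluate_every and perp_tol bound the number of passes can be sketched as an outer loop. It runs at most max_iter passes and stops early once two consecutive perplexity evaluations differ by less than perp_tol. This is a simplified illustration, not the library's training code. `runPass` and `evalPerplexity` are hypothetical callbacks standing in for one EM pass and one perplexity evaluation, and the returned count only approximates what n_iter_ reports.

```typescript
// Hypothetical outer training loop; the callbacks are placeholders.
function trainLoop(
  runPass: () => void,          // one full pass over the data (batch or mini-batches)
  evalPerplexity: () => number, // perplexity on the training data
  maxIter = 10,                 // max_iter
  evaluateEvery = -1,           // evaluate_every (<= 0 disables evaluation)
  perpTol = 0.1,                // perp_tol
): number {
  let lastPerplexity: number | undefined;
  let passes = 0;
  for (let i = 0; i < maxIter; i++) {
    runPass();
    passes++;
    if (evaluateEvery > 0 && (i + 1) % evaluateEvery === 0) {
      const perplexity = evalPerplexity();
      // Stop once perplexity has effectively stopped changing.
      if (lastPerplexity !== undefined && Math.abs(lastPerplexity - perplexity) < perpTol) break;
      lastPerplexity = perplexity;
    }
  }
  return passes;
}

// Simulated perplexities that level off after the third pass.
const simulated = [100, 50, 49.95, 49.9];
let call = 0;
console.log(trainLoop(() => {}, () => simulated[call++], 10, 1, 0.1));
```

With the default evaluate_every of -1, no early stop is possible and the loop always runs max_iter passes.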

py

Signature

py(): PythonBridge;

Returns

PythonBridge

Defined in: generated/decomposition/LatentDirichletAllocation.ts:134

Signature

py(pythonBridge: PythonBridge): void;

Parameters

| Name | Type |
| --- | --- |
| pythonBridge | PythonBridge |

Returns

void

Defined in: generated/decomposition/LatentDirichletAllocation.ts:138

random_state_

RandomState instance that is generated either from a seed, the random number generator or by np.random.

Signature

random_state_(): Promise<any>;

Returns

Promise<any>

Defined in: generated/decomposition/LatentDirichletAllocation.ts:792

topic_word_prior_

Prior of topic word distribution beta. If the value is undefined, it is 1 / n_components.

Signature

topic_word_prior_(): Promise<number>;

Returns

Promise<number>

Defined in: generated/decomposition/LatentDirichletAllocation.ts:819
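Both prior accessors fall back to 1 / n_components when the corresponding constructor option is left undefined. The sketch below shows a minimal version of that defaulting rule. It assumes an options object shaped like the constructor's opts. The `resolvePriors` helper and the `PriorOptions` type are hypothetical.

```typescript
// Subset of the constructor options relevant to the priors (hypothetical type).
interface PriorOptions {
  n_components?: number;
  doc_topic_prior?: number;
  topic_word_prior?: number;
}

// Apply the documented defaults: n_components = 10, and each prior
// falls back to 1 / n_components when undefined.
function resolvePriors(opts: PriorOptions): { docTopicPrior: number; topicWordPrior: number } {
  const nComponents = opts.n_components ?? 10;
  return {
    docTopicPrior: opts.doc_topic_prior ?? 1 / nComponents,
    topicWordPrior: opts.topic_word_prior ?? 1 / nComponents,
  };
}

console.log(resolvePriors({ n_components: 20, topic_word_prior: 0.01 }));
```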

Methods

dispose()

Disposes of the underlying Python resources.

Once dispose() is called, the instance is no longer usable.

Signature

dispose(): Promise<void>;

Returns

Promise<void>

Defined in: generated/decomposition/LatentDirichletAllocation.ts:213

fit()

Learn the model for the data X with the variational Bayes method.

When learning_method is 'online', use mini-batch updates. Otherwise, use batch updates.

Signature

fit(opts: object): Promise<any>;

Parameters

| Name | Type | Description |
| --- | --- | --- |
| opts | object | - |
| opts.X? | ArrayLike | Document word matrix. |
| opts.y? | any | Not used, present here for API consistency by convention. |

Returns

Promise<any>

Defined in: generated/decomposition/LatentDirichletAllocation.ts:232
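With learning_method set to 'online', each pass of fit walks over the documents in mini-batches of batch_size rows. The slicing step can be sketched as follows. `genBatches` is a hypothetical helper modeled loosely on scikit-learn's gen_batches utility, and the final batch simply holds the remainder.

```typescript
// Split row indices [0, nSamples) into contiguous [start, end) slices of at
// most batchSize rows; the last slice holds whatever remains.
function genBatches(nSamples: number, batchSize: number): Array<[number, number]> {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches: Array<[number, number]> = [];
  for (let start = 0; start < nSamples; start += batchSize) {
    batches.push([start, Math.min(start + batchSize, nSamples)]);
  }
  return batches;
}

// 300 documents with the default batch_size of 128 give three mini-batches.
console.log(genBatches(300, 128));
```

In batch mode there is effectively a single "batch" covering all rows, which is why the batch update sees the whole corpus in every EM step.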

fit_transform()

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Signature

fit_transform(opts: object): Promise<any[]>;

Parameters

| Name | Type | Description |
| --- | --- | --- |
| opts | object | - |
| opts.X? | ArrayLike[] | Input samples. |
| opts.fit_params? | any | Additional fit parameters. |
| opts.y? | ArrayLike | Target values (undefined for unsupervised transformations). |

Returns

Promise<any[]>

Defined in: generated/decomposition/LatentDirichletAllocation.ts:274

get_feature_names_out()

Get output feature names for transformation.

The output feature names are prefixed by the lowercased class name. For example, if the transformer outputs 3 features, then the output feature names are: ["class_name0", "class_name1", "class_name2"].

Signature

get_feature_names_out(opts: object): Promise<any>;

Parameters

| Name | Type | Description |
| --- | --- | --- |
| opts | object | - |
| opts.input_features? | any | Only used to validate feature names against the names seen in fit. |

Returns

Promise<any>

Defined in: generated/decomposition/LatentDirichletAllocation.ts:328
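The naming rule can be reproduced in a few lines. The sketch assumes one output feature per topic, since transform returns one column per component. `featureNamesOut` is a hypothetical helper for illustration only.

```typescript
// Build output feature names: the lowercased class name followed by an index
// for each output feature (one per topic for this estimator).
function featureNamesOut(className: string, nFeaturesOut: number): string[] {
  const prefix = className.toLowerCase();
  return Array.from({ length: nFeaturesOut }, (_, i) => `${prefix}${i}`);
}

// Prints: latentdirichletallocation0, latentdirichletallocation1, latentdirichletallocation2
console.log(featureNamesOut("LatentDirichletAllocation", 3).join(", "));
```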

init()

Initializes the underlying Python resources.

This instance is not usable until the Promise returned by init() resolves.

Signature

init(py: PythonBridge): Promise<void>;

Parameters

| Name | Type |
| --- | --- |
| py | PythonBridge |

Returns

Promise<void>

Defined in: generated/decomposition/LatentDirichletAllocation.ts:147

partial_fit()

Online VB with mini-batch update.

Signature

partial_fit(opts: object): Promise<any>;

Parameters

| Name | Type | Description |
| --- | --- | --- |
| opts | object | - |
| opts.X? | ArrayLike | Document word matrix. |
| opts.y? | any | Not used, present here for API consistency by convention. |

Returns

Promise<any>

Defined in: generated/decomposition/LatentDirichletAllocation.ts:366
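Each partial_fit call performs one online variational Bayes update on its mini-batch. In the online VB scheme, the topic-word parameters are blended with an estimate computed from the mini-batch as if it were representative of the whole corpus:

lambda <- (1 - rho) * lambda + rho * (eta + (D / |batch|) * sstats)

Here D is the total number of documents (for partial_fit, the total_samples option) and rho is the step weight governed by learning_decay and learning_offset. The sketch below shows only this blending step under those assumptions. `onlineComponentsUpdate` is a hypothetical name, and the mini-batch sufficient statistics are taken as given rather than computed.

```typescript
// Blend current topic-word parameters with a mini-batch estimate:
// lambda <- (1 - rho) * lambda + rho * (eta + (D / |batch|) * sstats)
function onlineComponentsUpdate(
  components: number[][], // current components_ (topics x words)
  suffStats: number[][],  // expected topic-word counts from the mini-batch E-step
  topicWordPrior: number, // eta (topic_word_prior)
  totalSamples: number,   // D, e.g. total_samples
  batchSize: number,      // number of documents in this mini-batch
  rho: number,            // step weight for this update, in (0, 1]
): number[][] {
  const docRatio = totalSamples / batchSize;
  return components.map((row, k) =>
    row.map((value, w) => (1 - rho) * value + rho * (topicWordPrior + docRatio * suffStats[k][w])),
  );
}

console.log(onlineComponentsUpdate([[1, 1]], [[2, 0]], 0.1, 10, 5, 0.5));
```

With rho = 1 the old parameters are discarded entirely, which is why small, decaying step weights are what make the updates incremental.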

perplexity()

Calculate approximate perplexity for data X.

Perplexity is defined as exp(-1 * log-likelihood per word).

Signature

perplexity(opts: object): Promise<number>;

Parameters

| Name | Type | Description |
| --- | --- | --- |
| opts | object | - |
| opts.X? | ArrayLike | Document word matrix. |
| opts.sub_sampling? | boolean | Whether to do sub-sampling. |

Returns

Promise<number>

Defined in: generated/decomposition/LatentDirichletAllocation.ts:411
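Given an (approximate) log-likelihood for a document-word matrix and the total number of word tokens in it (the sum of all counts), the definition above reduces to a one-line computation. The sketch assumes the log-likelihood is the kind of approximate bound that score() reports. It ignores sub-sampling, and `perplexityFromLogLikelihood` is a hypothetical helper.

```typescript
// Perplexity = exp(-(log-likelihood per word)).
function perplexityFromLogLikelihood(logLikelihood: number, totalWordCount: number): number {
  if (!(totalWordCount > 0)) throw new RangeError("totalWordCount must be positive");
  return Math.exp(-logLikelihood / totalWordCount);
}

// A model assigning probability 1/8 to each of 50 observed words has
// log-likelihood 50 * ln(1/8), hence perplexity 8 (up to floating-point rounding).
console.log(perplexityFromLogLikelihood(50 * Math.log(1 / 8), 50));
```

Lower perplexity means the model is, on average, less "surprised" per word. A value of 8 behaves like choosing uniformly among 8 words.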

score()

Calculate approximate log-likelihood as score.

Signature

score(opts: object): Promise<number>;

Parameters

| Name | Type | Description |
| --- | --- | --- |
| opts | object | - |
| opts.X? | ArrayLike | Document word matrix. |
| opts.y? | any | Not used, present here for API consistency by convention. |

Returns

Promise<number>

Defined in: generated/decomposition/LatentDirichletAllocation.ts:456

set_output()

Set output container.

See Introducing the set_output API for an example of how to use the API.

Signature

set_output(opts: object): Promise<any>;

Parameters

| Name | Type | Description |
| --- | --- | --- |
| opts | object | - |
| opts.transform? | "default" \| "pandas" | Configure the output of transform and fit_transform. |

Returns

Promise<any>

Defined in: generated/decomposition/LatentDirichletAllocation.ts:500

transform()

Transform data X according to the fitted model.

Signature

transform(opts: object): Promise<ArrayLike[]>;

Parameters

| Name | Type | Description |
| --- | --- | --- |
| opts | object | - |
| opts.X? | ArrayLike | Document word matrix. |

Returns

Promise<ArrayLike[]>

Defined in: generated/decomposition/LatentDirichletAllocation.ts:538
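transform infers a topic distribution for each document from the fitted topic-word parameters. Below is a simplified single-document sketch of that inference, assuming the coordinate-ascent E-step of online LDA:

1. Start from a flat variational parameter gamma.
2. Repeatedly update gamma from exp(E[log theta]) and the exp(E[log beta]) matrix (what exp_dirichlet_component_ holds).
3. Stop when the mean absolute change falls below mean_change_tol or after max_doc_update_iter iterations.
4. Normalize gamma to sum to 1.

All names are hypothetical, and this is not the library's implementation.

```typescript
// Digamma for x > 0 via recurrence plus asymptotic expansion.
function digamma(x: number): number {
  if (!(x > 0)) throw new RangeError("digamma is only implemented for x > 0");
  let result = 0;
  while (x < 6) {
    result -= 1 / x;
    x += 1;
  }
  const inv2 = 1 / (x * x);
  return result + Math.log(x) - 0.5 / x -
    inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 * (1 / 252 - inv2 * (1 / 240 - inv2 / 132))));
}

// E[log theta_k] for one Dirichlet parameter vector.
function dirichletExpectation(alpha: number[]): number[] {
  const psiOfSum = digamma(alpha.reduce((sum, a) => sum + a, 0));
  return alpha.map((a) => digamma(a) - psiOfSum);
}

// Infer a normalized topic distribution for one document (hypothetical helper).
function inferDocTopic(
  expTopicWord: number[][], // exp(E[log beta]), topics x vocabulary
  wordIds: number[],        // vocabulary indices of the words in the document
  counts: number[],         // count of each word listed in wordIds
  docTopicPrior: number,    // alpha (doc_topic_prior)
  maxIter = 100,            // max_doc_update_iter
  tol = 1e-3,               // mean_change_tol
): number[] {
  const nTopics = expTopicWord.length;
  let gamma: number[] = new Array<number>(nTopics).fill(1);
  let expTheta = dirichletExpectation(gamma).map((v) => Math.exp(v));
  for (let iter = 0; iter < maxIter; iter++) {
    const lastGamma = gamma;
    // Per-word normalizer: sum over topics of expTheta[k] * expTopicWord[k][w].
    const normPhi = wordIds.map(
      (w) => expTheta.reduce((sum, t, k) => sum + t * expTopicWord[k][w], 0) + Number.EPSILON,
    );
    gamma = expTheta.map(
      (t, k) =>
        t * wordIds.reduce((sum, w, j) => sum + (counts[j] / normPhi[j]) * expTopicWord[k][w], 0) +
        docTopicPrior,
    );
    expTheta = dirichletExpectation(gamma).map((v) => Math.exp(v));
    const meanChange = gamma.reduce((sum, g, k) => sum + Math.abs(g - lastGamma[k]), 0) / nTopics;
    if (meanChange < tol) break;
  }
  const total = gamma.reduce((sum, g) => sum + g, 0);
  return gamma.map((g) => g / total);
}

// Two topics, two words: topic 0 favors word 0 and topic 1 favors word 1.
const theta = inferDocTopic([[0.9, 0.1], [0.1, 0.9]], [0, 1], [8, 2], 0.1);
console.log(theta);
```

Applying the helper to each row of a document-word matrix yields one row per document with one column per topic, matching the shape of transform's output.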