LatentDirichletAllocation

Latent Dirichlet Allocation with online variational Bayes algorithm.

The implementation is based on [1] and [2].

Constructors

constructor()

Signature

new LatentDirichletAllocation(opts?: object): LatentDirichletAllocation;

Parameters

Name	Type	Description
`opts?`	`object`	-
`opts.batch_size?`	`number`	Number of documents to use in each EM iteration. Only used in online learning. `Default Value` `128`
`opts.doc_topic_prior?`	`number`	Prior of document topic distribution `theta`. If the value is `undefined`, defaults to `1 / n_components`. In [1], this is called `alpha`.
`opts.evaluate_every?`	`number`	How often to evaluate perplexity. Only used in the `fit` method. Set it to 0 or a negative number to skip perplexity evaluation during training entirely. Evaluating perplexity can help you check convergence during training, but it also increases total training time; evaluating it in every iteration might increase training time up to two-fold. `Default Value` `-1`
`opts.learning_decay?`	`number`	Parameter that controls the learning rate in the online learning method. The value should be in the interval (0.5, 1.0] to guarantee asymptotic convergence. When the value is 0.0 and `batch_size` is `n_samples`, the update method is the same as batch learning. In the literature, this is called kappa. `Default Value` `0.7`
`opts.learning_method?`	`"batch"` \| `"online"`	Method used to update `_component`. Only used in the `fit` method. In general, if the data size is large, the online update will be much faster than the batch update. Valid options are `"batch"` and `"online"`. `Default Value` `'batch'`
`opts.learning_offset?`	`number`	A (positive) parameter that downweights early iterations in online learning. It should be greater than 1.0. In the literature, this is called tau_0. `Default Value` `10`
`opts.max_doc_update_iter?`	`number`	Max number of iterations for updating document topic distribution in the E-step. `Default Value` `100`
`opts.max_iter?`	`number`	The maximum number of passes over the training data (aka epochs). It only impacts the behavior in the `fit` method, and not the `partial_fit` method. `Default Value` `10`
`opts.mean_change_tol?`	`number`	Stopping tolerance for updating document topic distribution in E-step. `Default Value` `0.001`
`opts.n_components?`	`number`	Number of topics. `Default Value` `10`
`opts.n_jobs?`	`number`	The number of jobs to use in the E-step. `undefined` means 1 unless in a `joblib.parallel_backend` context. `-1` means using all processors. See Glossary for more details.
`opts.perp_tol?`	`number`	Perplexity tolerance in batch learning. Only used when `evaluate_every` is greater than 0. `Default Value` `0.1`
`opts.random_state?`	`number`	Pass an int for reproducible results across multiple function calls. See Glossary.
`opts.topic_word_prior?`	`number`	Prior of topic word distribution `beta`. If the value is `undefined`, defaults to `1 / n_components`. In [1], this is called `eta`.
`opts.total_samples?`	`number`	Total number of documents. Only used in the `partial_fit` method. `Default Value` `1000000`
`opts.verbose?`	`number`	Verbosity level. `Default Value` `0`
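
To illustrate how `learning_decay` (kappa) and `learning_offset` (tau_0) interact, the following sketch evaluates the step-size schedule commonly given in the online variational Bayes literature, rho_t = (tau_0 + t)^(-kappa). It is a standalone illustration rather than the library's code; `onlineStepSize` is a hypothetical helper name, and `t` is assumed to count mini-batch updates.

```typescript
// Hypothetical helper: step size for the t-th mini-batch update,
// rho_t = (tau_0 + t)^(-kappa).
function onlineStepSize(t: number, learningOffset = 10, learningDecay = 0.7): number {
  return Math.pow(learningOffset + t, -learningDecay);
}

// Step sizes shrink as more mini-batches are processed; a larger
// learningOffset makes the early steps smaller.
const steps = [0, 1, 10, 100].map((t) => onlineStepSize(t));
console.log(steps);
```

With `learningDecay` set to 0 the step size is always 1, which matches the remark above that a decay of 0.0 with a full-size batch behaves like batch learning.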

Returns

LatentDirichletAllocation

Defined in: generated/decomposition/LatentDirichletAllocation.ts:23

Properties

_isDisposed

boolean = false

Defined in: generated/decomposition/LatentDirichletAllocation.ts:21

_isInitialized

boolean = false

Defined in: generated/decomposition/LatentDirichletAllocation.ts:20

_py

PythonBridge

Defined in: generated/decomposition/LatentDirichletAllocation.ts:19

id

string

Defined in: generated/decomposition/LatentDirichletAllocation.ts:16

opts

any

Defined in: generated/decomposition/LatentDirichletAllocation.ts:17

Accessors

bound_

Final perplexity score on training set.

Signature

bound_(): Promise<number>;

Returns

Promise<number>

Defined in: generated/decomposition/LatentDirichletAllocation.ts:738

components_

Variational parameters for topic word distribution. Since the complete conditional for topic word distribution is a Dirichlet, components_[i, j] can be viewed as a pseudocount that represents the number of times word j was assigned to topic i. It can also be viewed as a distribution over the words for each topic after normalization: model.components_ / model.components_.sum(axis=1)[:, np.newaxis].

Signature

components_(): Promise<ArrayLike[]>;

Returns

Promise<ArrayLike[]>

Defined in: generated/decomposition/LatentDirichletAllocation.ts:576
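
The normalization described for `components_` is written in NumPy notation. A TypeScript equivalent might look like the sketch below, assuming the resolved value has been converted to a plain `number[][]`; `normalizeRows` is a hypothetical helper and the input values are made up.

```typescript
// Divide each row by its sum so that every topic's pseudocounts
// become a probability distribution over words.
function normalizeRows(components: number[][]): number[][] {
  return components.map((row) => {
    const total = row.reduce((acc, v) => acc + v, 0);
    return row.map((v) => v / total);
  });
}

// Toy pseudocounts for 2 topics over 3 words (made-up values).
const topicWord = normalizeRows([
  [2, 1, 1],
  [1, 3, 4],
]);
console.log(topicWord);
```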

doc_topic_prior_

Prior of document topic distribution theta. If the value is undefined, it is 1 / n_components.

Signature

doc_topic_prior_(): Promise<number>;

Returns

Promise<number>

Defined in: generated/decomposition/LatentDirichletAllocation.ts:765
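
The fallback rule for an unset prior can be written out as a one-line function. This is only an illustration of the documented default, not the library's code; `resolvePrior` is a hypothetical name.

```typescript
// Hypothetical helper mirroring the documented default: an undefined prior
// falls back to 1 / n_components.
function resolvePrior(prior: number | undefined, nComponents: number): number {
  return prior === undefined ? 1 / nComponents : prior;
}

console.log(resolvePrior(undefined, 10)); // 0.1
console.log(resolvePrior(0.5, 10)); // 0.5
```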

exp_dirichlet_component_

Exponential value of expectation of log topic word distribution. In the literature, this is exp(E[log(beta)]).

Signature

exp_dirichlet_component_(): Promise<ArrayLike[]>;

Returns

Promise<ArrayLike[]>

Defined in: generated/decomposition/LatentDirichletAllocation.ts:603

feature_names_in_

Names of features seen during fit. Defined only when X has feature names that are all strings.

Signature

feature_names_in_(): Promise<ArrayLike>;

Returns

Promise<ArrayLike>

Defined in: generated/decomposition/LatentDirichletAllocation.ts:684

n_batch_iter_

Number of iterations of the EM step.

Signature

n_batch_iter_(): Promise<number>;

Returns

Promise<number>

Defined in: generated/decomposition/LatentDirichletAllocation.ts:630

n_features_in_

Number of features seen during fit.

Signature

n_features_in_(): Promise<number>;

Returns

Promise<number>

Defined in: generated/decomposition/LatentDirichletAllocation.ts:657

n_iter_

Number of passes over the dataset.

Signature

n_iter_(): Promise<number>;

Returns

Promise<number>

Defined in: generated/decomposition/LatentDirichletAllocation.ts:711

py

Signature

py(): PythonBridge;

Returns

PythonBridge

Defined in: generated/decomposition/LatentDirichletAllocation.ts:134

Signature

py(pythonBridge: PythonBridge): void;

Parameters

Name	Type
`pythonBridge`	`PythonBridge`

Returns

void

Defined in: generated/decomposition/LatentDirichletAllocation.ts:138

random_state_

RandomState instance that is generated either from a seed, the random number generator or by np.random.

Signature

random_state_(): Promise<any>;

Returns

Promise<any>

Defined in: generated/decomposition/LatentDirichletAllocation.ts:792

topic_word_prior_

Prior of topic word distribution beta. If the value is undefined, it is 1 / n_components.

Signature

topic_word_prior_(): Promise<number>;

Returns

Promise<number>

Defined in: generated/decomposition/LatentDirichletAllocation.ts:819

Methods

dispose()

Disposes of the underlying Python resources.

Once dispose() is called, the instance is no longer usable.

Signature

dispose(): Promise<void>;

Returns

Promise<void>

Defined in: generated/decomposition/LatentDirichletAllocation.ts:213

fit()

Learn model for the data X with variational Bayes method.

When learning_method is ‘online’, use mini-batch update. Otherwise, use batch update.

Signature

fit(opts: object): Promise<any>;

Parameters

Name	Type	Description
`opts`	`object`	-
`opts.X?`	`ArrayLike`	Document word matrix.
`opts.y?`	`any`	Not used, present here for API consistency by convention.

Returns

Promise<any>

Defined in: generated/decomposition/LatentDirichletAllocation.ts:232

fit_transform()

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Signature

fit_transform(opts: object): Promise<any[]>;

Parameters

Name	Type	Description
`opts`	`object`	-
`opts.X?`	`ArrayLike`[]	Input samples.
`opts.fit_params?`	`any`	Additional fit parameters.
`opts.y?`	`ArrayLike`	Target values (`undefined` for unsupervised transformations).

Returns

Promise<any[]>

Defined in: generated/decomposition/LatentDirichletAllocation.ts:274

get_feature_names_out()

Get output feature names for transformation.

The output feature names will be prefixed by the lowercased class name. For example, if the transformer outputs 3 features, then the output feature names are: ["class_name0", "class_name1", "class_name2"].

Signature

get_feature_names_out(opts: object): Promise<any>;

Parameters

Name	Type	Description
`opts`	`object`	-
`opts.input_features?`	`any`	Only used to validate feature names with the names seen in `fit`.

Returns

Promise<any>

Defined in: generated/decomposition/LatentDirichletAllocation.ts:328
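
The naming rule described for `get_feature_names_out()` can be reproduced locally as a quick sanity check. The sketch below re-implements the rule as stated in the description; it does not call the library, and `featureNamesOut` is a hypothetical helper.

```typescript
// Hypothetical re-implementation of the documented naming rule:
// lowercased class name followed by the output index.
function featureNamesOut(className: string, nOutputs: number): string[] {
  const prefix = className.toLowerCase();
  return Array.from({ length: nOutputs }, (_, i) => `${prefix}${i}`);
}

console.log(featureNamesOut("LatentDirichletAllocation", 3));
```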

init()

Initializes the underlying Python resources.

This instance is not usable until the Promise returned by init() resolves.

Signature

init(py: PythonBridge): Promise<void>;

Parameters

Name	Type
`py`	`PythonBridge`

Returns

Promise<void>

Defined in: generated/decomposition/LatentDirichletAllocation.ts:147
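
Because an instance is unusable before `init()` resolves and after `dispose()` is called, it can help to wrap the lifecycle in one place. The sketch below shows one possible pattern. `BridgedModel` and `withModel` are hypothetical names, the interface covers only the two lifecycle methods, and `py` is typed as `unknown` because the `PythonBridge` type is not reproduced here.

```typescript
// Minimal structural type covering only the lifecycle methods used here.
interface BridgedModel {
  init(py: unknown): Promise<void>;
  dispose(): Promise<void>;
}

// Hypothetical helper: initialize the model, run `body`, and always
// dispose afterwards, even if `body` throws.
async function withModel<M extends BridgedModel, T>(
  model: M,
  py: unknown,
  body: (model: M) => Promise<T>,
): Promise<T> {
  await model.init(py);
  try {
    return await body(model);
  } finally {
    await model.dispose();
  }
}
```

With a real instance, `body` might call `model.fit_transform({ X })`; obtaining the bridge object itself is outside the scope of this sketch.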

partial_fit()

Online VB with Mini-Batch update.

Signature

partial_fit(opts: object): Promise<any>;

Parameters

Name	Type	Description
`opts`	`object`	-
`opts.X?`	`ArrayLike`	Document word matrix.
`opts.y?`	`any`	Not used, present here for API consistency by convention.

Returns

Promise<any>

Defined in: generated/decomposition/LatentDirichletAllocation.ts:366
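
Since `partial_fit()` performs one mini-batch update per call, the caller is responsible for splitting the data. A minimal sketch of that splitting step follows; `miniBatches` is a hypothetical helper and the counts are made up.

```typescript
// Hypothetical helper: split a document-word matrix into consecutive
// mini-batches of at most `batchSize` rows.
function miniBatches<T>(rows: T[], batchSize: number): T[][] {
  if (!Number.isInteger(batchSize) || batchSize <= 0) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches: T[][] = [];
  for (let start = 0; start < rows.length; start += batchSize) {
    batches.push(rows.slice(start, start + batchSize));
  }
  return batches;
}

// Toy counts for 5 documents over 3 words (made-up values).
const docs = [[1, 0, 2], [0, 3, 1], [2, 2, 0], [0, 0, 4], [1, 1, 1]];
const docBatches = miniBatches(docs, 2);
console.log(docBatches.map((b) => b.length)); // batch sizes: 2, 2, 1
```

Each batch could then be passed in turn as `X`, for example `await model.partial_fit({ X: batch })` on an initialized instance.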

perplexity()

Calculate approximate perplexity for data X.

Perplexity is defined as exp(-1. * log-likelihood per word)

Signature

perplexity(opts: object): Promise<number>;

Parameters

Name	Type	Description
`opts`	`object`	-
`opts.X?`	`ArrayLike`	Document word matrix.
`opts.sub_sampling?`	`boolean`	Do sub-sampling or not.

Returns

Promise<number>

Defined in: generated/decomposition/LatentDirichletAllocation.ts:411
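
The definition of perplexity given above can be worked through by hand. The sketch below assumes `logLikelihood` is a total over the whole matrix and that `X` holds word counts, so the per-word value is the total divided by the sum of all entries. `perplexityFromLogLikelihood` is a hypothetical helper, and the numbers are made up.

```typescript
// Perplexity from a total log-likelihood and a document-word count matrix:
// exp(-1 * logLikelihood / totalWordCount).
function perplexityFromLogLikelihood(logLikelihood: number, X: number[][]): number {
  const totalWords = X.reduce(
    (acc, row) => acc + row.reduce((rowAcc, count) => rowAcc + count, 0),
    0,
  );
  if (totalWords <= 0) {
    throw new RangeError("X must contain at least one word occurrence");
  }
  return Math.exp(-logLikelihood / totalWords);
}

// Toy example: 10 word occurrences and a made-up log-likelihood of -23.
const toyPerplexity = perplexityFromLogLikelihood(-23, [[3, 2], [1, 4]]);
console.log(toyPerplexity); // exp(2.3), roughly 9.97
```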

score()

Calculate approximate log-likelihood as score.

Signature

score(opts: object): Promise<number>;

Parameters

Name	Type	Description
`opts`	`object`	-
`opts.X?`	`ArrayLike`	Document word matrix.
`opts.y?`	`any`	Not used, present here for API consistency by convention.

Returns

Promise<number>

Defined in: generated/decomposition/LatentDirichletAllocation.ts:456

set_output()

Set output container.

See Introducing the set_output API for an example on how to use the API.

Signature

set_output(opts: object): Promise<any>;

Parameters

Name	Type	Description
`opts`	`object`	-
`opts.transform?`	`"default"` \| `"pandas"`	Configure output of `transform` and `fit_transform`.

Returns

Promise<any>

Defined in: generated/decomposition/LatentDirichletAllocation.ts:500

transform()

Transform data X according to the fitted model.

Signature

transform(opts: object): Promise<ArrayLike[]>;

Parameters

Name	Type	Description
`opts`	`object`	-
`opts.X?`	`ArrayLike`	Document word matrix.

Returns

Promise<ArrayLike[]>

Defined in: generated/decomposition/LatentDirichletAllocation.ts:538
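
A common follow-up to `transform()` is to pick the dominant topic for each document. The sketch below assumes the resolved value is a document-topic matrix with one row per document, as in scikit-learn, already converted to `number[][]`. `dominantTopics` is a hypothetical helper and the weights are made up.

```typescript
// Hypothetical helper: index of the largest entry in each row, i.e. the
// dominant topic of each document. An empty row yields -1.
function dominantTopics(docTopic: number[][]): number[] {
  return docTopic.map((row) => {
    let bestIndex = -1;
    let bestValue = -Infinity;
    row.forEach((value, index) => {
      if (value > bestValue) {
        bestValue = value;
        bestIndex = index;
      }
    });
    return bestIndex;
  });
}

// Toy document-topic weights for 2 documents and 3 topics (made-up values).
console.log(dominantTopics([[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]]));
```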
