LatentDirichletAllocation
Latent Dirichlet Allocation with online variational Bayes algorithm.
The implementation is based on [1] and [2].
Python Reference
Constructors
constructor()
Signature
new LatentDirichletAllocation(opts?: object): LatentDirichletAllocation;
Parameters
Name | Type | Description |
---|---|---|
opts? | object | - |
opts.batch_size? | number | Number of documents to use in each EM iteration. Only used in online learning. Default Value 128 |
opts.doc_topic_prior? | number | Prior of document topic distribution theta. If the value is undefined, defaults to 1 / n\_components. In [1], this is called alpha. |
opts.evaluate_every? | number | How often to evaluate perplexity. Only used in the fit method. Set it to 0 or a negative number to skip perplexity evaluation during training. Evaluating perplexity can help you check convergence during training, but it also increases total training time; evaluating it in every iteration might increase training time up to two-fold. Default Value -1 |
opts.learning_decay? | number | Parameter that controls the learning rate in the online learning method. The value should be in (0.5, 1.0] to guarantee asymptotic convergence. When the value is 0.0 and batch_size is n\_samples, the update method is the same as batch learning. In the literature, this is called kappa. Default Value 0.7 |
opts.learning_method? | "batch" \| "online" | Method used to update \_component. Only used in the fit method. In general, if the data size is large, the online update will be much faster than the batch update. Valid options: 'batch' (uses all training data in each EM update) and 'online' (uses a mini-batch of training data in each EM update). Default Value 'batch' |
opts.learning_offset? | number | A (positive) parameter that downweights early iterations in online learning. It should be greater than 1.0. In the literature, this is called tau_0. Default Value 10 |
opts.max_doc_update_iter? | number | Max number of iterations for updating document topic distribution in the E-step. Default Value 100 |
opts.max_iter? | number | The maximum number of passes over the training data (aka epochs). It only impacts the behavior in the fit method, and not the partial\_fit method. Default Value 10 |
opts.mean_change_tol? | number | Stopping tolerance for updating document topic distribution in E-step. Default Value 0.001 |
opts.n_components? | number | Number of topics. Default Value 10 |
opts.n_jobs? | number | The number of jobs to use in the E-step. undefined means 1 unless in a joblib.parallel\_backend context. -1 means using all processors. See Glossary for more details. |
opts.perp_tol? | number | Perplexity tolerance in batch learning. Only used when evaluate\_every is greater than 0. Default Value 0.1 |
opts.random_state? | number | Pass an int for reproducible results across multiple function calls. See Glossary. |
opts.topic_word_prior? | number | Prior of topic word distribution beta. If the value is undefined, defaults to 1 / n\_components. In [1], this is called eta. |
opts.total_samples? | number | Total number of documents. Only used in the partial\_fit method. Default Value 1000000 |
opts.verbose? | number | Verbosity level. Default Value 0 |
Returns
LatentDirichletAllocation
Defined in: generated/decomposition/LatentDirichletAllocation.ts:23
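To make the roles of learning\_decay (kappa) and learning\_offset (tau_0) more concrete, the following is a minimal sketch that assumes the step-size schedule rho_t = (tau_0 + t)^(-kappa) commonly given for online variational Bayes. The function name onlineLearningRate is hypothetical and is not part of this library.

```typescript
// Illustrative only: step size applied to the t-th mini-batch update,
// assuming rho_t = (tau_0 + t)^(-kappa).
// learningOffset plays the role of tau_0, learningDecay the role of kappa.
function onlineLearningRate(
  t: number,
  learningOffset: number = 10,
  learningDecay: number = 0.7
): number {
  return Math.pow(learningOffset + t, -learningDecay);
}

// Step sizes shrink as more mini-batches are processed.
const rates: number[] = [0, 1, 10, 100].map((t) => onlineLearningRate(t));
console.log(rates);

// With learningDecay = 0 the weight is always 1, which matches the
// "same as batch learning" remark for learning_decay = 0.0.
console.log(onlineLearningRate(5, 10, 0));
```

A larger learning\_offset makes the first few step sizes smaller, which is what "downweights early iterations" refers to.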
Properties
_isDisposed
boolean = false
Defined in: generated/decomposition/LatentDirichletAllocation.ts:21
_isInitialized
boolean = false
Defined in: generated/decomposition/LatentDirichletAllocation.ts:20
_py
PythonBridge
Defined in: generated/decomposition/LatentDirichletAllocation.ts:19
id
string
Defined in: generated/decomposition/LatentDirichletAllocation.ts:16
opts
any
Defined in: generated/decomposition/LatentDirichletAllocation.ts:17
Accessors
bound_
Final perplexity score on training set.
Signature
bound_(): Promise<number>;
Returns
Promise<number>
Defined in: generated/decomposition/LatentDirichletAllocation.ts:738
components_
Variational parameters for topic word distribution. Since the complete conditional for topic word distribution is a Dirichlet, components\_\[i, j\] can be viewed as a pseudocount that represents the number of times word j was assigned to topic i. It can also be viewed as a distribution over the words for each topic after normalization: model.components\_ / model.components\_.sum(axis=1)\[:, np.newaxis\].
Signature
components_(): Promise<ArrayLike[]>;
Returns
Promise<ArrayLike[]>
Defined in: generated/decomposition/LatentDirichletAllocation.ts:576
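The normalization above is written as a NumPy expression. A rough TypeScript equivalent is sketched below, assuming the resolved value of components\_ has already been converted to a plain number[][] with one row per topic. The helper name normalizeRows and the sample pseudocounts are made up for illustration.

```typescript
// Illustrative only: divide each row by its sum so that every topic row
// becomes a probability distribution over words.
function normalizeRows(components: number[][]): number[][] {
  return components.map((row) => {
    const total = row.reduce((sum, value) => sum + value, 0);
    return row.map((value) => value / total);
  });
}

// Hypothetical pseudocounts for 2 topics over a 3-word vocabulary.
const pseudoCounts: number[][] = [
  [2, 1, 1],
  [1, 4, 5],
];

const topicWord: number[][] = normalizeRows(pseudoCounts);
console.log(topicWord);
```

Each row of topicWord sums to 1, so entry \[i, j\] can be read as the probability of word j under topic i.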
doc_topic_prior_
Prior of document topic distribution theta. If the value is undefined, it is 1 / n\_components.
Signature
doc_topic_prior_(): Promise<number>;
Returns
Promise<number>
Defined in: generated/decomposition/LatentDirichletAllocation.ts:765
exp_dirichlet_component_
Exponential value of expectation of log topic word distribution. In the literature, this is exp(E\[log(beta)\]).
Signature
exp_dirichlet_component_(): Promise<ArrayLike[]>;
Returns
Promise<ArrayLike[]>
Defined in: generated/decomposition/LatentDirichletAllocation.ts:603
feature_names_in_
Names of features seen during fit. Defined only when X has feature names that are all strings.
Signature
feature_names_in_(): Promise<ArrayLike>;
Returns
Promise<ArrayLike>
Defined in: generated/decomposition/LatentDirichletAllocation.ts:684
n_batch_iter_
Number of iterations of the EM step.
Signature
n_batch_iter_(): Promise<number>;
Returns
Promise<number>
Defined in: generated/decomposition/LatentDirichletAllocation.ts:630
n_features_in_
Number of features seen during fit.
Signature
n_features_in_(): Promise<number>;
Returns
Promise<number>
Defined in: generated/decomposition/LatentDirichletAllocation.ts:657
n_iter_
Number of passes over the dataset.
Signature
n_iter_(): Promise<number>;
Returns
Promise<number>
Defined in: generated/decomposition/LatentDirichletAllocation.ts:711
py
Signature
py(): PythonBridge;
Returns
PythonBridge
Defined in: generated/decomposition/LatentDirichletAllocation.ts:134
Signature
py(pythonBridge: PythonBridge): void;
Parameters
Name | Type |
---|---|
pythonBridge | PythonBridge |
Returns
void
Defined in: generated/decomposition/LatentDirichletAllocation.ts:138
random_state_
RandomState instance that is generated either from a seed, the random number generator or by np.random.
Signature
random_state_(): Promise<any>;
Returns
Promise<any>
Defined in: generated/decomposition/LatentDirichletAllocation.ts:792
topic_word_prior_
Prior of topic word distribution beta. If the value is undefined, it is 1 / n\_components.
Signature
topic_word_prior_(): Promise<number>;
Returns
Promise<number>
Defined in: generated/decomposition/LatentDirichletAllocation.ts:819
Methods
dispose()
Disposes of the underlying Python resources.
Once dispose() is called, the instance is no longer usable.
Signature
dispose(): Promise<void>;
Returns
Promise<void>
Defined in: generated/decomposition/LatentDirichletAllocation.ts:213
fit()
Learn model for the data X with variational Bayes method.
When learning\_method is ‘online’, use mini-batch update. Otherwise, use batch update.
Signature
fit(opts: object): Promise<any>;
Parameters
Name | Type | Description |
---|---|---|
opts | object | - |
opts.X? | ArrayLike | Document word matrix. |
opts.y? | any | Not used, present here for API consistency by convention. |
Returns
Promise<any>
Defined in: generated/decomposition/LatentDirichletAllocation.ts:232
fit_transform()
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit\_params and returns a transformed version of X.
Signature
fit_transform(opts: object): Promise<any[]>;
Parameters
Name | Type | Description |
---|---|---|
opts | object | - |
opts.X? | ArrayLike[] | Input samples. |
opts.fit_params? | any | Additional fit parameters. |
opts.y? | ArrayLike | Target values (undefined for unsupervised transformations). |
Returns
Promise<any[]>
Defined in: generated/decomposition/LatentDirichletAllocation.ts:274
get_feature_names_out()
Get output feature names for transformation.
The output feature names will be prefixed by the lowercased class name. For example, if the transformer outputs 3 features, then the output feature names are: \["class\_name0", "class\_name1", "class\_name2"\].
Signature
get_feature_names_out(opts: object): Promise<any>;
Parameters
Name | Type | Description |
---|---|---|
opts | object | - |
opts.input_features? | any | Only used to validate feature names with the names seen in fit. |
Returns
Promise<any>
Defined in: generated/decomposition/LatentDirichletAllocation.ts:328
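The naming rule described above can be sketched as follows. This is an illustration of the rule, not the library's implementation; the helper name prefixedFeatureNames is hypothetical.

```typescript
// Illustrative only: build output feature names by appending the component
// index to the lowercased class name.
function prefixedFeatureNames(className: string, nComponents: number): string[] {
  const prefix = className.toLowerCase();
  const names: string[] = [];
  for (let i = 0; i < nComponents; i++) {
    names.push(`${prefix}${i}`);
  }
  return names;
}

const names: string[] = prefixedFeatureNames("LatentDirichletAllocation", 3);
console.log(names);
// [ 'latentdirichletallocation0', 'latentdirichletallocation1', 'latentdirichletallocation2' ]
```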
init()
Initializes the underlying Python resources.
This instance is not usable until the Promise returned by init() resolves.
Signature
init(py: PythonBridge): Promise<void>;
Parameters
Name | Type |
---|---|
py | PythonBridge |
Returns
Promise<void>
Defined in: generated/decomposition/LatentDirichletAllocation.ts:147
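Because the instance is unusable before init() resolves and after dispose() is called, a typical call sequence is init, fit, transform, dispose. The sketch below shows that ordering against a hypothetical structural interface, TopicModel, that mirrors the method signatures documented on this page. Whether the real class is assignable to this interface depends on the library's exact types, and how the PythonBridge is obtained is not covered here.

```typescript
// Hypothetical types for illustration; they are not exported by the library.
type Matrix = number[][];

interface TopicModel<Bridge> {
  init(py: Bridge): Promise<void>;
  fit(opts: { X?: Matrix }): Promise<unknown>;
  transform(opts: { X?: Matrix }): Promise<unknown>;
  dispose(): Promise<void>;
}

// Runs the full lifecycle and always releases the Python resources,
// even when fit() or transform() rejects.
async function fitAndTransform<Bridge>(
  model: TopicModel<Bridge>,
  py: Bridge,
  X: Matrix
): Promise<unknown> {
  await model.init(py); // the instance is not usable before this resolves
  try {
    await model.fit({ X });
    return await model.transform({ X });
  } finally {
    await model.dispose(); // the instance is not usable after this
  }
}
```

Putting dispose() in a finally block is a design choice of this sketch: it keeps the Python-side resources from leaking when training fails.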
partial_fit()
Online VB with Mini-Batch update.
Signature
partial_fit(opts: object): Promise<any>;
Parameters
Name | Type | Description |
---|---|---|
opts | object | - |
opts.X? | ArrayLike | Document word matrix. |
opts.y? | any | Not used, present here for API consistency by convention. |
Returns
Promise<any>
Defined in: generated/decomposition/LatentDirichletAllocation.ts:366
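Since partial\_fit() takes one mini-batch per call, the caller is responsible for splitting the document word matrix. The sketch below shows one way to do that; the helper name miniBatches and the sample matrix are hypothetical, and the default of 128 simply mirrors the documented batch\_size default.

```typescript
// Illustrative only: split the rows of a document word matrix into
// consecutive mini-batches of at most batchSize rows.
function miniBatches<T>(rows: T[], batchSize: number = 128): T[][] {
  if (batchSize <= 0 || Math.floor(batchSize) !== batchSize) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches: T[][] = [];
  for (let start = 0; start < rows.length; start += batchSize) {
    batches.push(rows.slice(start, start + batchSize));
  }
  return batches;
}

// Hypothetical 5-document, 2-word count matrix.
const docs: number[][] = [[1, 0], [0, 2], [3, 1], [0, 1], [2, 2]];
const docBatches: number[][][] = miniBatches(docs, 2);
console.log(docBatches.map((batch) => batch.length));
```

Each element of docBatches could then be passed in turn as opts.X to partial\_fit() on an initialized instance.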
perplexity()
Calculate approximate perplexity for data X.
Perplexity is defined as exp(-1. * log-likelihood per word)
Signature
perplexity(opts: object): Promise<number>;
Parameters
Name | Type | Description |
---|---|---|
opts | object | - |
opts.X? | ArrayLike | Document word matrix. |
opts.sub_sampling? | boolean | Do sub-sampling or not. |
Returns
Promise<number>
Defined in: generated/decomposition/LatentDirichletAllocation.ts:411
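The definition above can be worked through numerically. The sketch below assumes you already have a total (approximate) log-likelihood and the total word count of the corpus; the function name perplexityFromLogLikelihood is hypothetical and not part of this library.

```typescript
// Illustrative only: perplexity = exp(-1 * log-likelihood per word).
function perplexityFromLogLikelihood(logLikelihood: number, wordCount: number): number {
  if (wordCount <= 0) {
    throw new RangeError("wordCount must be positive");
  }
  return Math.exp(-logLikelihood / wordCount);
}

// A model that assigns probability 1/4 to each of 10 words has a total
// log-likelihood of 10 * log(1/4), which gives a perplexity of about 4.
const uniformOverFour: number = perplexityFromLogLikelihood(10 * Math.log(1 / 4), 10);
console.log(uniformOverFour);
```

Lower perplexity means the model assigns higher probability to the observed words.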
score()
Calculate approximate log-likelihood as score.
Signature
score(opts: object): Promise<number>;
Parameters
Name | Type | Description |
---|---|---|
opts | object | - |
opts.X? | ArrayLike | Document word matrix. |
opts.y? | any | Not used, present here for API consistency by convention. |
Returns
Promise<number>
Defined in: generated/decomposition/LatentDirichletAllocation.ts:456
set_output()
Set output container.
See Introducing the set_output API for an example on how to use the API.
Signature
set_output(opts: object): Promise<any>;
Parameters
Name | Type | Description |
---|---|---|
opts | object | - |
opts.transform? | "default" \| "pandas" | Configure output of transform and fit\_transform. |
Returns
Promise<any>
Defined in: generated/decomposition/LatentDirichletAllocation.ts:500
transform()
Transform data X according to the fitted model.
Signature
transform(opts: object): Promise<ArrayLike[]>;
Parameters
Name | Type | Description |
---|---|---|
opts | object | - |
opts.X? | ArrayLike | Document word matrix. |
Returns
Promise<ArrayLike[]>
Defined in: generated/decomposition/LatentDirichletAllocation.ts:538
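A common follow-up to transform() is picking the highest-weighted topic for each document. The sketch below assumes the resolved value has been converted to a number[][] with one row per document and one column per topic. The helper name dominantTopics and the sample weights are made up for illustration.

```typescript
// Illustrative only: index of the largest topic weight in each document row.
// Ties resolve to the lowest index; an empty row yields -1.
function dominantTopics(docTopic: number[][]): number[] {
  return docTopic.map((row) => {
    if (row.length === 0) {
      return -1;
    }
    let best = 0;
    for (let k = 1; k < row.length; k++) {
      if (row[k] > row[best]) {
        best = k;
      }
    }
    return best;
  });
}

// Hypothetical document-topic weights for 2 documents and 3 topics.
const docTopic: number[][] = [
  [0.7, 0.2, 0.1],
  [0.1, 0.3, 0.6],
];

const labels: number[] = dominantTopics(docTopic);
console.log(labels); // [ 0, 2 ]
```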