DictVectorizer

Transforms lists of feature-value mappings to vectors.

This transformer turns lists of mappings (dict-like objects) from feature names to feature values into NumPy arrays or scipy.sparse matrices for use with scikit-learn estimators.

When feature values are strings, this transformer will do a binary one-hot (aka one-of-K) coding: one boolean-valued feature is constructed for each of the possible string values that the feature can take on. For instance, a feature “f” that can take on the values “ham” and “spam” will become two features in the output, one signifying “f=ham”, the other “f=spam”.
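The one-of-K coding described above can be illustrated with a small standalone sketch. It does not call this library: `oneHotNames` and `oneHotRow` are hypothetical helpers written only for illustration, and the '=' separator is assumed from the documented default.

```typescript
type StringSample = Record<string, string>;

// Collect one constructed name ("feature=value") per distinct string value.
function oneHotNames(samples: StringSample[], separator: string = "="): string[] {
  const seen: Record<string, boolean> = {};
  for (const sample of samples) {
    for (const feature of Object.keys(sample)) {
      seen[feature + separator + sample[feature]] = true;
    }
  }
  return Object.keys(seen).sort();
}

// Encode one sample as a 0/1 row aligned with the constructed names.
function oneHotRow(sample: StringSample, names: string[], separator: string = "="): number[] {
  const active = Object.keys(sample).map((feature) => feature + separator + sample[feature]);
  return names.map((name) => (active.indexOf(name) >= 0 ? 1 : 0));
}

const hamSpamSamples: StringSample[] = [{ f: "ham" }, { f: "spam" }];
const hamSpamNames = oneHotNames(hamSpamSamples);
const hamSpamRows = hamSpamSamples.map((s) => oneHotRow(s, hamSpamNames));

console.log(JSON.stringify(hamSpamNames)); // ["f=ham","f=spam"]
console.log(JSON.stringify(hamSpamRows)); // [[1,0],[0,1]]
```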

If a feature value is a sequence or set of strings, this transformer will iterate over the values and will count the occurrences of each string value.
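A minimal sketch of that counting behavior, assuming the same "feature=value" naming as for single strings. `countValues` is a hypothetical helper, not part of this library.

```typescript
// Count how often each string occurs in an iterable-valued feature.
function countValues(
  feature: string,
  values: string[],
  separator: string = "="
): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const value of values) {
    const name = feature + separator + value;
    counts[name] = (counts[name] || 0) + 1;
  }
  return counts;
}

const tagCounts = countValues("tag", ["a", "b", "a"]);
console.log(JSON.stringify(tagCounts)); // {"tag=a":2,"tag=b":1}
```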

Note that binary one-hot encoding is applied only when feature values are strings. If categorical features are represented as numeric values (such as int) or as iterables of strings, the DictVectorizer can be followed by OneHotEncoder to complete the binary one-hot encoding.

Features that do not occur in a sample (mapping) will have a zero value in the resulting array/matrix.
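The zero-filling rule can be sketched for numeric features as follows. This is a standalone illustration with a hypothetical `toDenseRows` helper; it assumes sorted feature names and dense output.

```typescript
type NumericSample = Record<string, number>;

// Build sorted feature names, then emit one dense row per sample.
// A feature absent from a sample contributes 0 to that row.
function toDenseRows(samples: NumericSample[]): { names: string[]; rows: number[][] } {
  const seen: Record<string, boolean> = {};
  for (const sample of samples) {
    for (const name of Object.keys(sample)) {
      seen[name] = true;
    }
  }
  const names = Object.keys(seen).sort();
  const rows = samples.map((sample) =>
    names.map((name) =>
      Object.prototype.hasOwnProperty.call(sample, name) ? sample[name] : 0
    )
  );
  return { names, rows };
}

const denseResult = toDenseRows([{ foo: 1, bar: 2 }, { foo: 3, baz: 1 }]);
console.log(JSON.stringify(denseResult.names)); // ["bar","baz","foo"]
console.log(JSON.stringify(denseResult.rows)); // [[2,0,1],[0,1,3]]
```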

Read more in the User Guide.

Python Reference

Constructors

constructor()

Signature

new DictVectorizer(opts?: object): DictVectorizer;

Parameters

| Name | Type | Description |
| --- | --- | --- |
| opts? | object | - |
| opts.dtype? | any | The type of feature values. Passed to NumPy array/scipy.sparse matrix constructors as the dtype argument. |
| opts.separator? | string | Separator string used when constructing new features for one-hot coding. Default value: '=' |
| opts.sort? | boolean | Whether feature_names_ and vocabulary_ should be sorted when fitting. Default value: true |
| opts.sparse? | boolean | Whether transform should produce scipy.sparse matrices. Default value: true |
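To make the effect of separator and sort more concrete, here is a standalone sketch. `buildFeatureNames` and `NamingOptions` are hypothetical and do not call this library. The sketch assumes that, with sorting disabled, names keep the order in which they are first seen.

```typescript
interface NamingOptions {
  separator?: string; // assumed default '='
  sort?: boolean; // assumed default true
}

// Construct one-hot feature names from string-valued samples.
function buildFeatureNames(
  samples: Record<string, string>[],
  opts: NamingOptions = {}
): string[] {
  const separator = opts.separator !== undefined ? opts.separator : "=";
  const sort = opts.sort !== false;
  const names: string[] = [];
  for (const sample of samples) {
    for (const feature of Object.keys(sample)) {
      const name = feature + separator + sample[feature];
      if (names.indexOf(name) < 0) {
        names.push(name);
      }
    }
  }
  return sort ? names.slice().sort() : names;
}

const citySamples = [{ city: "Oslo" }, { city: "Bergen" }];
const sortedCityNames = buildFeatureNames(citySamples);
const unsortedCityNames = buildFeatureNames(citySamples, { separator: ":", sort: false });

console.log(JSON.stringify(sortedCityNames)); // ["city=Bergen","city=Oslo"]
console.log(JSON.stringify(unsortedCityNames)); // ["city:Oslo","city:Bergen"]
```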

Returns

DictVectorizer

Defined in: generated/feature_extraction/DictVectorizer.ts:33

Properties

_isDisposed

boolean = false

Defined in: generated/feature_extraction/DictVectorizer.ts:31

_isInitialized

boolean = false

Defined in: generated/feature_extraction/DictVectorizer.ts:30

_py

PythonBridge

Defined in: generated/feature_extraction/DictVectorizer.ts:29

id

string

Defined in: generated/feature_extraction/DictVectorizer.ts:26

opts

any

Defined in: generated/feature_extraction/DictVectorizer.ts:27

Accessors

feature_names_

A list of length n_features containing the feature names (e.g., “f=ham” and “f=spam”).

Signature

feature_names_(): Promise<any[]>;

Returns

Promise<any[]>

Defined in: generated/feature_extraction/DictVectorizer.ts:433

py

Signature

py(): PythonBridge;

Returns

PythonBridge

Defined in: generated/feature_extraction/DictVectorizer.ts:64

Signature

py(pythonBridge: PythonBridge): void;

Parameters

| Name | Type |
| --- | --- |
| pythonBridge | PythonBridge |

Returns

void

Defined in: generated/feature_extraction/DictVectorizer.ts:68

vocabulary_

A dictionary mapping feature names to feature indices.
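As a rough illustration of how such a mapping relates to the list of feature names, each name can be assigned the index of its column. `buildVocabulary` is a hypothetical helper written for this sketch only.

```typescript
// Map each feature name to the index of its column.
function buildVocabulary(featureNames: string[]): Record<string, number> {
  const vocabulary: Record<string, number> = {};
  featureNames.forEach((name, index) => {
    vocabulary[name] = index;
  });
  return vocabulary;
}

const hamSpamVocabulary = buildVocabulary(["f=ham", "f=spam"]);
console.log(JSON.stringify(hamSpamVocabulary)); // {"f=ham":0,"f=spam":1}
```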

Signature

vocabulary_(): Promise<any>;

Returns

Promise<any>

Defined in: generated/feature_extraction/DictVectorizer.ts:408

Methods

dispose()

Disposes of the underlying Python resources.

Once dispose() is called, the instance is no longer usable.

Signature

dispose(): Promise<void>;

Returns

Promise<void>

Defined in: generated/feature_extraction/DictVectorizer.ts:119

fit()

Learn a list of feature name -> indices mappings.

Signature

fit(opts: object): Promise<any>;

Parameters

| Name | Type | Description |
| --- | --- | --- |
| opts | object | - |
| opts.X? | any | Dict(s) or Mapping(s) from feature names (arbitrary Python objects) to feature values (strings or convertible to dtype). |
| opts.y? | any | Ignored parameter. |

Returns

Promise<any>

Defined in: generated/feature_extraction/DictVectorizer.ts:136

fit_transform()

Learn a list of feature name -> indices mappings and transform X.

Like fit(X) followed by transform(X), but does not require materializing X in memory.

Signature

fit_transform(opts: object): Promise<any>;

Parameters

| Name | Type | Description |
| --- | --- | --- |
| opts | object | - |
| opts.X? | any | Dict(s) or Mapping(s) from feature names (arbitrary Python objects) to feature values (strings or convertible to dtype). |
| opts.y? | any | Ignored parameter. |

Returns

Promise<any>

Defined in: generated/feature_extraction/DictVectorizer.ts:176

get_feature_names_out()

Get output feature names for transformation.

Signature

get_feature_names_out(opts: object): Promise<any>;

Parameters

| Name | Type | Description |
| --- | --- | --- |
| opts | object | - |
| opts.input_features? | any | Not used, present here for API consistency by convention. |

Returns

Promise<any>

Defined in: generated/feature_extraction/DictVectorizer.ts:214

init()

Initializes the underlying Python resources.

This instance is not usable until the Promise returned by init() resolves.

Signature

init(py: PythonBridge): Promise<void>;

Parameters

| Name | Type |
| --- | --- |
| py | PythonBridge |

Returns

Promise<void>

Defined in: generated/feature_extraction/DictVectorizer.ts:77

inverse_transform()

Transform array or sparse matrix X back to feature mappings.

X must have been produced by this DictVectorizer’s transform or fit_transform method; it may only have passed through transformers that preserve the number of features and their order.

In the case of one-hot/one-of-K coding, the constructed feature names and values are returned rather than the original ones.
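The following standalone sketch shows what "constructed feature names" means in practice: a one-hot column comes back as "f=spam" with value 1, not as the original pair f and "spam". `inverseRow` is a hypothetical helper, and the sketch assumes that zero entries are omitted from the returned mapping.

```typescript
// Turn one dense row back into a mapping keyed by constructed feature names.
// Zero entries are skipped (an assumption of this sketch).
function inverseRow(row: number[], names: string[]): Record<string, number> {
  const mapping: Record<string, number> = {};
  row.forEach((value, index) => {
    if (value !== 0) {
      mapping[names[index]] = value;
    }
  });
  return mapping;
}

const recoveredMapping = inverseRow([0, 1, 3], ["f=ham", "f=spam", "length"]);
console.log(JSON.stringify(recoveredMapping)); // {"f=spam":1,"length":3}
```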

Signature

inverse_transform(opts: object): Promise<any[]>;

Parameters

| Name | Type | Description |
| --- | --- | --- |
| opts | object | - |
| opts.X? | ArrayLike | Sample matrix. |
| opts.dict_type? | any | Constructor for feature mappings. Must conform to the collections.abc.Mapping API. |

Returns

Promise<any[]>

Defined in: generated/feature_extraction/DictVectorizer.ts:254

restrict()

Restrict the features to those in support using feature selection.

This function modifies the estimator in-place.
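A minimal sketch of the selection logic, covering both the boolean-mask form and the index-list form of support. `restrictNames` is a hypothetical helper that returns a new list for clarity, whereas the real method updates the estimator in place.

```typescript
// Keep only the feature names selected by a boolean mask or an index list.
function restrictNames(
  names: string[],
  support: boolean[] | number[],
  indices: boolean = false
): string[] {
  if (indices) {
    return (support as number[]).map((i) => names[i]);
  }
  return names.filter((_, i) => (support as boolean[])[i]);
}

const allNames = ["bar", "baz", "foo"];
const keptByMask = restrictNames(allNames, [true, false, true]);
const keptByIndex = restrictNames(allNames, [0, 2], true);

console.log(JSON.stringify(keptByMask)); // ["bar","foo"]
console.log(JSON.stringify(keptByIndex)); // ["bar","foo"]
```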

Signature

restrict(opts: object): Promise<any>;

Parameters

| Name | Type | Description |
| --- | --- | --- |
| opts | object | - |
| opts.indices? | boolean | Whether support is a list of indices. Default value: false |
| opts.support? | ArrayLike | Boolean mask or list of indices (as returned by the get_support member of feature selectors). |

Returns

Promise<any>

Defined in: generated/feature_extraction/DictVectorizer.ts:298

set_output()

Set output container.

See "Introducing the set_output API" for an example of how to use the API.

Signature

set_output(opts: object): Promise<any>;

Parameters

| Name | Type | Description |
| --- | --- | --- |
| opts | object | - |
| opts.transform? | "default" \| "pandas" | Configure output of transform and fit_transform. |

Returns

Promise<any>

Defined in: generated/feature_extraction/DictVectorizer.ts:340

transform()

Transform feature->value dicts to array or sparse matrix.

Named features not encountered during fit or fit_transform will be silently ignored.
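The "silently ignored" behavior can be sketched as a lookup against a fixed vocabulary. `transformRow` is a hypothetical helper, and the vocabulary shown is assumed to have been learned earlier from samples containing bar, baz, and foo.

```typescript
// Encode one numeric sample against a fixed vocabulary.
// Names missing from the vocabulary are skipped without raising an error.
function transformRow(
  sample: Record<string, number>,
  vocabulary: Record<string, number>
): number[] {
  const row: number[] = [];
  for (let i = 0; i < Object.keys(vocabulary).length; i++) {
    row.push(0);
  }
  for (const name of Object.keys(sample)) {
    if (Object.prototype.hasOwnProperty.call(vocabulary, name)) {
      row[vocabulary[name]] = sample[name];
    }
  }
  return row;
}

const fittedVocabulary = { bar: 0, baz: 1, foo: 2 };
const transformedRow = transformRow({ foo: 4, unseen: 4 }, fittedVocabulary);
console.log(JSON.stringify(transformedRow)); // [0,0,4]
```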

Signature

transform(opts: object): Promise<any>;

Parameters

| Name | Type | Description |
| --- | --- | --- |
| opts | object | - |
| opts.X? | any[] | Dict(s) or Mapping(s) from feature names (arbitrary Python objects) to feature values (strings or convertible to dtype). |

Returns

Promise<any>

Defined in: generated/feature_extraction/DictVectorizer.ts:375