DictVectorizer
Transforms lists of feature-value mappings to vectors.
This transformer turns lists of mappings (dict-like objects) of feature names to feature values into NumPy arrays or scipy.sparse matrices for use with scikit-learn estimators.
When feature values are strings, this transformer will do a binary one-hot (aka one-of-K) coding: one boolean-valued feature is constructed for each of the possible string values that the feature can take on. For instance, a feature “f” that can take on the values “ham” and “spam” will become two features in the output, one signifying “f=ham”, the other “f=spam”.
If a feature value is a sequence or set of strings, this transformer will iterate over the values and will count the occurrences of each string value.
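The expansion rules above can be modeled in a few lines. This is a hypothetical, simplified sketch, not the sklearn-ts or scikit-learn implementation: `expandSample`, `Sample`, and `Expanded` are illustrative names, numeric values are assumed to pass through unchanged under their own key, and the separator defaults to `"="` to match the constructor option.

```typescript
// Hypothetical model of how one sample is expanded into
// feature-name -> value pairs. Not the library's implementation.
type FeatureValue = string | number | string[];
type Sample = { [feature: string]: FeatureValue };
type Expanded = { [featureName: string]: number };

function expandSample(sample: Sample, separator = "="): Expanded {
  const out: Expanded = {};
  for (const key of Object.keys(sample)) {
    const value = sample[key];
    if (typeof value === "string") {
      // String value: one-hot coding, e.g. f: "ham" -> "f=ham": 1
      out[key + separator + value] = 1;
    } else if (Array.isArray(value)) {
      // Sequence of strings: count how often each string occurs
      for (const item of value) {
        const featureName = key + separator + item;
        out[featureName] = (out[featureName] || 0) + 1;
      }
    } else {
      // Numeric value (assumed): kept as-is under the original key
      out[key] = value;
    }
  }
  return out;
}

const expanded = expandSample({ f: "spam", tags: ["a", "b", "a"], x: 2.5 });
console.log(expanded); // f=spam -> 1, tags=a -> 2, tags=b -> 1, x -> 2.5
```

Keeping the expansion per sample means a single pass over the data is enough to discover every feature name.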
However, note that this transformer will only do a binary one-hot encoding when feature values are of type string. If categorical features are represented as numeric values such as int or iterables of strings, the DictVectorizer can be followed by OneHotEncoder to complete binary one-hot encoding.
Features that do not occur in a sample (mapping) will have a zero value in the resulting array/matrix.
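Zero-filling can be illustrated with a small dense-matrix sketch. It assumes each sample has already been expanded into feature-name -> value pairs (for example `"f=ham": 1`) and that the column order comes from a list shaped like feature_names_. `toDense` is a hypothetical helper. With the default sparse output, these zeros are simply not stored.

```typescript
// Hypothetical sketch: expanded samples -> dense rows whose columns follow
// a fixed feature-name list. Absent features stay 0. Not the library's code.
type Expanded = { [featureName: string]: number };

function toDense(samples: Expanded[], featureNames: string[]): number[][] {
  // Column index for each known feature name
  const column: { [featureName: string]: number } = {};
  featureNames.forEach((featureName, j) => {
    column[featureName] = j;
  });

  return samples.map((sample) => {
    // Start from an all-zero row so missing features read as 0
    const row: number[] = [];
    for (let j = 0; j < featureNames.length; j++) row.push(0);
    for (const featureName of Object.keys(sample)) {
      // Names outside featureNames have no column and are skipped
      if (Object.prototype.hasOwnProperty.call(column, featureName)) {
        row[column[featureName]] = sample[featureName];
      }
    }
    return row;
  });
}

const dense = toDense([{ "f=ham": 1, x: 3 }, { "f=spam": 1 }], ["f=ham", "f=spam", "x"]);
console.log(JSON.stringify(dense)); // [[1,0,3],[0,1,0]]
```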
Read more in the User Guide.
Python Reference
Constructors
constructor()
Signature
new DictVectorizer(opts?: object): DictVectorizer;
Parameters
Name | Type | Description |
---|---|---|
opts? | object | - |
opts.dtype? | any | The type of feature values. Passed to NumPy array/scipy.sparse matrix constructors as the dtype argument. |
opts.separator? | string | Separator string used when constructing new features for one-hot coding. Default Value '=' |
opts.sort? | boolean | Whether feature\_names\_ and vocabulary\_ should be sorted when fitting. Default Value true |
opts.sparse? | boolean | Whether transform should produce scipy.sparse matrices. Default Value true |
Returns
DictVectorizer
Defined in: generated/feature_extraction/DictVectorizer.ts:33
Properties
_isDisposed
boolean = false
Defined in: generated/feature_extraction/DictVectorizer.ts:31
_isInitialized
boolean = false
Defined in: generated/feature_extraction/DictVectorizer.ts:30
_py
PythonBridge
Defined in: generated/feature_extraction/DictVectorizer.ts:29
id
string
Defined in: generated/feature_extraction/DictVectorizer.ts:26
opts
any
Defined in: generated/feature_extraction/DictVectorizer.ts:27
Accessors
feature_names_
A list of length n_features containing the feature names (e.g., “f=ham” and “f=spam”).
Signature
feature_names_(): Promise<any[]>;
Returns
Promise<any[]>
Defined in: generated/feature_extraction/DictVectorizer.ts:433
py
Signature
py(): PythonBridge;
Returns
PythonBridge
Defined in: generated/feature_extraction/DictVectorizer.ts:64
Signature
py(pythonBridge: PythonBridge): void;
Parameters
Name | Type |
---|---|
pythonBridge | PythonBridge |
Returns
void
Defined in: generated/feature_extraction/DictVectorizer.ts:68
vocabulary_
A dictionary mapping feature names to feature indices.
Signature
vocabulary_(): Promise<any>;
Returns
Promise<any>
Defined in: generated/feature_extraction/DictVectorizer.ts:408
Methods
dispose()
Disposes of the underlying Python resources.
Once dispose() is called, the instance is no longer usable.
Signature
dispose(): Promise<void>;
Returns
Promise<void>
Defined in: generated/feature_extraction/DictVectorizer.ts:119
fit()
Learn a list of feature name -> indices mappings.
Signature
fit(opts: object): Promise<any>;
Parameters
Name | Type | Description |
---|---|---|
opts | object | - |
opts.X? | any | Dict(s) or Mapping(s) from feature names (arbitrary Python objects) to feature values (strings or convertible to dtype). |
opts.y? | any | Ignored parameter. |
Returns
Promise<any>
Defined in: generated/feature_extraction/DictVectorizer.ts:136
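What fit learns can be sketched under simplifying assumptions. `fitSketch` is a hypothetical function; names are built as key + separator + value for strings and string sequences and as the bare key otherwise, and the `sort` option is modeled with JavaScript's default string sort, which agrees with Python's ordering for ASCII names. The returned `featureNames` and `vocabulary` play the roles of feature_names_ and vocabulary_, so `vocabulary[featureNames[i]] === i` for every i.

```typescript
// Hypothetical model of fit(): collect feature names, optionally sort them,
// and build a name -> column index mapping. Not the library's code.
type FeatureValue = string | number | string[];
type Sample = { [feature: string]: FeatureValue };

interface FitResult {
  featureNames: string[];                 // plays the role of feature_names_
  vocabulary: { [name: string]: number }; // plays the role of vocabulary_
}

function fitSketch(X: Sample[], separator = "=", sort = true): FitResult {
  const featureNames: string[] = [];
  const seen: { [name: string]: true } = {};
  const add = (featureName: string): void => {
    if (!Object.prototype.hasOwnProperty.call(seen, featureName)) {
      seen[featureName] = true;
      featureNames.push(featureName); // remembers first-seen order
    }
  };

  for (const sample of X) {
    for (const key of Object.keys(sample)) {
      const value = sample[key];
      if (typeof value === "string") add(key + separator + value);
      else if (Array.isArray(value)) value.forEach((v) => add(key + separator + v));
      else add(key); // numeric features keep their own name
    }
  }

  if (sort) featureNames.sort(); // plain string order, modeling sort: true

  const vocabulary: { [name: string]: number } = {};
  featureNames.forEach((featureName, i) => {
    vocabulary[featureName] = i;
  });
  return { featureNames, vocabulary };
}

const X: Sample[] = [{ f: "spam", x: 1 }, { f: "ham" }];
console.log(fitSketch(X).featureNames);             // f=ham, f=spam, x
console.log(fitSketch(X, "=", false).featureNames); // f=spam, x, f=ham
```

With sorting disabled, columns follow the order in which names are first encountered.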
fit_transform()
Learn a list of feature name -> indices mappings and transform X.
Like fit(X) followed by transform(X), but does not require materializing X in memory.
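One way to read the remark about not materializing X is as a single pass over the samples. The sketch below assumes exactly that: `fitTransformSinglePass` is hypothetical, only string and numeric values are handled, and the result is kept as plain (row, column, value) triplet arrays, a simplification of a sparse matrix. Columns are assigned in first-seen order during the pass and remapped at the end when `sort` is true.

```typescript
// Hypothetical single-pass fit_transform: learn columns while encoding.
// Not the library's implementation.
type Sample = { [feature: string]: string | number };

interface Triplets {
  rows: number[];
  cols: number[];
  vals: number[];
  featureNames: string[];
}

function fitTransformSinglePass(X: Sample[], separator = "=", sort = true): Triplets {
  const vocabulary: { [featureName: string]: number } = {};
  const featureNames: string[] = [];
  const rows: number[] = [];
  const cols: number[] = [];
  const vals: number[] = [];

  // One pass: each sample is visited exactly once
  X.forEach((sample, i) => {
    for (const key of Object.keys(sample)) {
      const value = sample[key];
      const featureName = typeof value === "string" ? key + separator + value : key;
      if (!Object.prototype.hasOwnProperty.call(vocabulary, featureName)) {
        vocabulary[featureName] = featureNames.length; // next free column
        featureNames.push(featureName);
      }
      rows.push(i);
      cols.push(vocabulary[featureName]);
      vals.push(typeof value === "string" ? 1 : value);
    }
  });

  if (!sort) return { rows, cols, vals, featureNames };

  // Sort the names, then translate old column indices into new ones
  const sorted = featureNames.slice().sort();
  const newIndex: number[] = [];
  sorted.forEach((featureName, j) => {
    newIndex[vocabulary[featureName]] = j;
  });
  return { rows, cols: cols.map((c) => newIndex[c]), vals, featureNames: sorted };
}

const out = fitTransformSinglePass([{ x: 2, f: "spam" }, { f: "ham" }]);
console.log(out.featureNames); // f=ham, f=spam, x
console.log(out.cols);         // 2, 1, 0
```

Remapping after the pass avoids a second look at the data just to learn the sorted column order.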
Signature
fit_transform(opts: object): Promise<any>;
Parameters
Name | Type | Description |
---|---|---|
opts | object | - |
opts.X? | any | Dict(s) or Mapping(s) from feature names (arbitrary Python objects) to feature values (strings or convertible to dtype). |
opts.y? | any | Ignored parameter. |
Returns
Promise<any>
Defined in: generated/feature_extraction/DictVectorizer.ts:176
get_feature_names_out()
Get output feature names for transformation.
Signature
get_feature_names_out(opts: object): Promise<any>;
Parameters
Name | Type | Description |
---|---|---|
opts | object | - |
opts.input_features? | any | Not used, present here for API consistency by convention. |
Returns
Promise<any>
Defined in: generated/feature_extraction/DictVectorizer.ts:214
init()
Initializes the underlying Python resources.
This instance is not usable until the Promise returned by init() resolves.
Signature
init(py: PythonBridge): Promise<void>;
Parameters
Name | Type |
---|---|
py | PythonBridge |
Returns
Promise<void>
Defined in: generated/feature_extraction/DictVectorizer.ts:77
inverse_transform()
Transform array or sparse matrix X back to feature mappings.
X must have been produced by this DictVectorizer’s transform or fit_transform method; it may only have passed through transformers that preserve the number of features and their order.
In the case of one-hot/one-of-K coding, the constructed feature names and values are returned rather than the original ones.
Signature
inverse_transform(opts: object): Promise<any[]>;
Parameters
Name | Type | Description |
---|---|---|
opts | object | - |
opts.X? | ArrayLike | Sample matrix. |
opts.dict_type? | any | Constructor for feature mappings. Must conform to the collections.abc.Mapping API. Default Value dict |
Returns
Promise<any[]>
Defined in: generated/feature_extraction/DictVectorizer.ts:254
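The inverse mapping can be sketched for a dense matrix. Assumptions: `inverseTransformSketch` and `FeatureMapping` are hypothetical names, plain objects stand in for dict_type, and cells equal to zero are treated as absent features. One-hot columns come back under their constructed names, as described above.

```typescript
// Hypothetical model of inverse_transform() for a dense matrix: each non-zero
// cell becomes an entry keyed by its column's feature name.
// Not the library's implementation.
type FeatureMapping = { [featureName: string]: number };

function inverseTransformSketch(X: number[][], featureNames: string[]): FeatureMapping[] {
  return X.map((row) => {
    const mapping: FeatureMapping = {};
    row.forEach((value, j) => {
      // Zero cells are dropped; one-hot columns come back under their
      // constructed name such as "f=ham", not as f: "ham"
      if (value !== 0) mapping[featureNames[j]] = value;
    });
    return mapping;
  });
}

const restored = inverseTransformSketch([[1, 0, 3], [0, 1, 0]], ["f=ham", "f=spam", "x"]);
console.log(restored); // first: f=ham -> 1, x -> 3; second: f=spam -> 1
```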
restrict()
Restrict the features to those in support using feature selection.
This function modifies the estimator in-place.
Signature
restrict(opts: object): Promise<any>;
Parameters
Name | Type | Description |
---|---|---|
opts | object | - |
opts.indices? | boolean | Whether support is a list of indices. Default Value false |
opts.support? | ArrayLike | Boolean mask or list of indices (as returned by the get_support member of feature selectors). |
Returns
Promise<any>
Defined in: generated/feature_extraction/DictVectorizer.ts:298
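The effect of restrict on the fitted names can be sketched as follows. `restrictSketch` and `Restricted` are hypothetical; the sketch assumes a boolean mask keeps columns in their original order while an index list (indices = true) keeps them in the listed order, and it renumbers the survivors so the vocabulary stays consistent with the name list. It returns new values instead of modifying anything in place.

```typescript
// Hypothetical model of restrict(): keep the selected features and renumber
// them. Not the library's implementation.
interface Restricted {
  featureNames: string[];
  vocabulary: { [featureName: string]: number };
}

function restrictSketch(
  featureNames: string[],
  support: boolean[] | number[],
  indices = false
): Restricted {
  // Boolean mask: keep column j when support[j] is true.
  // Index list (indices = true): keep the listed columns, in the listed order.
  const kept: string[] = indices
    ? (support as number[]).map((j) => featureNames[j])
    : featureNames.filter((_, j) => support[j] === true);

  // Renumber surviving features 0..kept.length - 1
  const vocabulary: { [featureName: string]: number } = {};
  kept.forEach((featureName, j) => {
    vocabulary[featureName] = j;
  });
  return { featureNames: kept, vocabulary };
}

const allNames = ["f=ham", "f=spam", "x"];
const byMask = restrictSketch(allNames, [true, false, true]);
console.log(byMask.featureNames);  // f=ham, x
const byIndex = restrictSketch(allNames, [2, 0], true);
console.log(byIndex.featureNames); // x, f=ham
```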
set_output()
Set output container.
See Introducing the set_output API for an example of how to use the API.
Signature
set_output(opts: object): Promise<any>;
Parameters
Name | Type | Description |
---|---|---|
opts | object | - |
opts.transform? | "default" \| "pandas" | Configure output of transform and fit_transform. |
Returns
Promise<any>
Defined in: generated/feature_extraction/DictVectorizer.ts:340
transform()
Transform feature->value dicts to array or sparse matrix.
Named features not encountered during fit or fit_transform will be silently ignored.
Signature
transform(opts: object): Promise<any>;
Parameters
Name | Type | Description |
---|---|---|
opts | object | - |
opts.X? | any[] | Dict(s) or Mapping(s) from feature names (arbitrary Python objects) to feature values (strings or convertible to dtype). |
Returns
Promise<any>
Defined in: generated/feature_extraction/DictVectorizer.ts:375
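The silent-ignore rule can be sketched with a transform that emits one sparse row per sample as [column, value] pairs, in line with the sparse default. Assumptions: `transformSketch` and `SparseRow` are hypothetical names, the vocabulary is a plain object shaped like vocabulary_, and only string and numeric values are handled.

```typescript
// Hypothetical model of transform(): names missing from the fitted vocabulary
// are skipped, and absent features are not stored (implicit zeros).
// Not the library's implementation.
type Sample = { [feature: string]: string | number };
type SparseRow = Array<[number, number]>;

function transformSketch(
  X: Sample[],
  vocabulary: { [featureName: string]: number },
  separator = "="
): SparseRow[] {
  return X.map((sample) => {
    const row: SparseRow = [];
    for (const key of Object.keys(sample)) {
      const value = sample[key];
      const featureName = typeof value === "string" ? key + separator + value : key;
      // Not seen during fit: silently ignored
      if (!Object.prototype.hasOwnProperty.call(vocabulary, featureName)) continue;
      row.push([vocabulary[featureName], typeof value === "string" ? 1 : value]);
    }
    return row.sort((a, b) => a[0] - b[0]); // order entries by column
  });
}

const vocabulary: { [featureName: string]: number } = { "f=ham": 0, "f=spam": 1, x: 2 };
const rows = transformSketch([{ f: "eggs", x: 4 }, { f: "spam", y: 7 }], vocabulary);
console.log(JSON.stringify(rows)); // [[[2,4]],[[1,1]]]
```

Here "f=eggs" and "y" never reach the output because fitting did not assign them a column.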