lstmProjectedLayer
Long short-term memory (LSTM) projected layer for recurrent neural network (RNN)
Since R2022b
Description
An LSTM projected layer is an RNN layer that learns long-term dependencies between time steps in time series and sequence data using projected learnable weights.
To compress a deep learning network, you can use projected layers. A projected layer is a type of deep learning layer that enables compression by reducing the number of stored learnable parameters. The layer introduces learnable projector matrices Q, replaces multiplications of the form Wx, where W is a learnable matrix, with the multiplication WQQ⊤x, and stores Q and W′ = WQ instead of storing W. Projecting x into a lower-dimensional space using Q typically requires less memory to store the learnable parameters and can have similarly strong prediction accuracy.
Reducing the number of learnable parameters by projecting an LSTM layer rather than reducing the number of hidden units of the LSTM layer maintains the output size of the layer and, in turn, the sizes of the downstream layers, which can result in better prediction accuracy.
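The storage saving is easy to quantify. This sketch (with hypothetical sizes, not taken from this page) compares the number of stored values for a single weight matrix with and without projection:
numOut = 400;    % for example, 4*NumHiddenUnits in an LSTM layer
numIn = 12;      % for example, the input size
projSize = 8;    % projector size (number of columns of Q)
paramsFull = numOut*numIn                            % store W: 4800 values
paramsProjected = numOut*projSize + numIn*projSize   % store W*Q and Q: 3296 values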
Creation
Syntax
layer = lstmProjectedLayer(numHiddenUnits,outputProjectorSize,inputProjectorSize)
layer = lstmProjectedLayer(___,Name=Value)
Description
layer = lstmProjectedLayer(numHiddenUnits,outputProjectorSize,inputProjectorSize) creates an LSTM projected layer and sets the NumHiddenUnits, OutputProjectorSize, and InputProjectorSize properties.
layer = lstmProjectedLayer(___,Name=Value) sets the OutputMode, HasStateInputs, HasStateOutputs, Activations, State, Parameters and Initialization, Learning Rate and Regularization, and Name properties using one or more name-value arguments.
Properties
Projected LSTM
NumHiddenUnits
—Number of hidden units
positive integer
This property is read-only.
Number of hidden units (also known as the hidden size), specified as a positive integer.
The number of hidden units corresponds to the amount of information that the layer remembers between time steps (the hidden state). The hidden state can contain information from all the previous time steps, regardless of the sequence length. If the number of hidden units is too large, then the layer might overfit to the training data.
The hidden state does not limit the number of time steps that the layer processes in an iteration. To split your sequences into smaller sequences when you use the trainNetwork function, use the SequenceLength training option.
The layer outputs data with NumHiddenUnits channels.
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
OutputProjectorSize
—Output projector size
positive integer
This property is read-only.
Output projector size, specified as a positive integer.
The LSTM layer operation uses four matrix multiplications of the form Rht−1, where R denotes the recurrent weights and ht denotes the hidden state (or, equivalently, the layer output) at time step t.
The LSTM projected layer operation instead uses multiplications of the form RQoQo⊤ht−1, where Qo is a NumHiddenUnits-by-OutputProjectorSize matrix known as the output projector. The layer uses the same projector Qo for each of the four multiplications.
To perform the four multiplications of the form Rht−1, an LSTM layer stores four recurrent weights R, which necessitates storing 4*NumHiddenUnits^2 learnable parameters. By instead storing the 4*NumHiddenUnits-by-OutputProjectorSize matrix R′ = RQo and Qo, an LSTM projected layer can perform the multiplication R′Qo⊤ht−1 and store only 5*NumHiddenUnits*OutputProjectorSize learnable parameters.
Tip
To ensure that RQoQo⊤ht−1 requires fewer learnable parameters, set the OutputProjectorSize property to a value less than (4/5)*NumHiddenUnits.
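For example, with illustrative sizes, an output projector size of 40 satisfies this bound and halves the stored recurrent parameters:
numHiddenUnits = 100;
outputProjectorSize = 40;                                % less than (4/5)*100 = 80
paramsLSTM = 4*numHiddenUnits^2                          % 40000
paramsProjected = 5*numHiddenUnits*outputProjectorSize   % 20000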
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
InputProjectorSize
—Input projector size
positive integer
This property is read-only.
Input projector size, specified as a positive integer.
The LSTM layer operation uses four matrix multiplications of the form Wxt, where W denotes the input weights and xt denotes the layer input at time step t.
The LSTM projected layer operation instead uses multiplications of the form WQiQi⊤xt, where Qi is an InputSize-by-InputProjectorSize matrix known as the input projector. The layer uses the same projector Qi for each of the four multiplications.
To perform the four multiplications of the form Wxt, an LSTM layer stores four weight matrices W, which necessitates storing 4*NumHiddenUnits*InputSize learnable parameters. By instead storing the 4*NumHiddenUnits-by-InputProjectorSize matrix W′ = WQi and Qi, an LSTM projected layer can perform the multiplication W′Qi⊤xt and store only (4*NumHiddenUnits+InputSize)*InputProjectorSize learnable parameters.
Tip
To ensure that WQiQi⊤xt requires fewer learnable parameters, set the InputProjectorSize property to a value less than (4*numHiddenUnits*inputSize)/(4*numHiddenUnits+inputSize).
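For example, with illustrative sizes, you can evaluate this bound directly:
numHiddenUnits = 100;
inputSize = 12;
bound = (4*numHiddenUnits*inputSize)/(4*numHiddenUnits + inputSize)   % about 11.65
inputProjectorSize = 8;                                               % less than the bound
paramsLSTM = 4*numHiddenUnits*inputSize                               % 4800
paramsProjected = (4*numHiddenUnits + inputSize)*inputProjectorSize   % 3296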
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
OutputMode
—Output mode
'sequence' (default) | 'last'
This property is read-only.
Output mode, specified as one of these values:
'sequence' — Output the complete sequence.
'last' — Output the last time step of the sequence.
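For example, for sequence-to-label classification tasks, you can create a layer that returns only the final time step (the layer sizes here are illustrative):
layer = lstmProjectedLayer(100,30,16,OutputMode="last");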
HasStateInputs
—Flag for state inputs to layer
0 (false) (default) | 1 (true)
This property is read-only.
Flag for state inputs to the layer, specified as 0 (false) or 1 (true).
If the HasStateInputs property is 0 (false), then the layer has one input with the name 'in', which corresponds to the input data. In this case, the layer uses the HiddenState and CellState properties for the layer operation.
If the HasStateInputs property is 1 (true), then the layer has three inputs with the names 'in', 'hidden', and 'cell', which correspond to the input data, hidden state, and cell state, respectively. In this case, the layer uses the values passed to these inputs for the layer operation. If HasStateInputs is 1 (true), then the HiddenState and CellState properties must be empty.
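For example, this sketch (with illustrative sizes) creates a layer with state inputs and checks its input names:
layer = lstmProjectedLayer(100,30,16,HasStateInputs=true);
layer.InputNames   % {'in'}  {'hidden'}  {'cell'}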
HasStateOutputs
—Flag for state outputs from layer
0 (false) (default) | 1 (true)
This property is read-only.
Flag for state outputs from the layer, specified as 0 (false) or 1 (true).
If the HasStateOutputs property is 0 (false), then the layer has one output with the name 'out', which corresponds to the output data.
If the HasStateOutputs property is 1 (true), then the layer has three outputs with the names 'out', 'hidden', and 'cell', which correspond to the output data, hidden state, and cell state, respectively. In this case, the layer also outputs the state values that it computes.
InputSize
—Input size
'auto' (default) | positive integer
This property is read-only.
Input size, specified as a positive integer or 'auto'. If InputSize is 'auto', then the software automatically assigns the input size at training time.
Data Types: double | char
Activations
StateActivationFunction
—Activation function to update cell and hidden state
'tanh' (default) | 'softsign'
This property is read-only.
Activation function to update the cell and hidden state, specified as one of these values:
'tanh' — Use the hyperbolic tangent function (tanh).
'softsign' — Use the softsign function softsign(x) = x/(1 + |x|).
The layer uses this option as the function σc in the calculations to update the cell and hidden state. For more information on how an LSTM layer uses activation functions, see Long Short-Term Memory Layer.
GateActivationFunction
—Activation function to apply to gates
'sigmoid' (default) | 'hard-sigmoid'
This property is read-only.
Activation function to apply to the gates, specified as one of these values:
'sigmoid' — Use the sigmoid function, σ(x) = (1 + e^−x)^−1.
'hard-sigmoid' — Use the hard sigmoid function, which is 0 when x < −2.5, 0.2x + 0.5 when −2.5 ≤ x ≤ 2.5, and 1 when x > 2.5.
The layer uses this option as the function σg in the calculations for the layer gates.
State
CellState
—Cell state
[] (default) | numeric vector
Cell state to use in the layer operation, specified as a NumHiddenUnits-by-1 numeric vector. This value corresponds to the initial cell state when data is passed to the layer.
After you set this property manually, calls to the resetState function set the cell state to this value.
If HasStateInputs is 1 (true), then the CellState property must be empty.
Data Types: single | double
HiddenState
—Hidden state
[] (default) | numeric vector
Hidden state to use in the layer operation, specified as a NumHiddenUnits-by-1 numeric vector. This value corresponds to the initial hidden state when data is passed to the layer.
After you set this property manually, calls to the resetState function set the hidden state to this value.
If HasStateInputs is 1 (true), then the HiddenState property must be empty.
Data Types: single | double
Parameters and Initialization
InputWeightsInitializer
—Function to initialize input weights
'glorot' (default) | 'he' | 'orthogonal' | 'narrow-normal' | 'zeros' | 'ones' | function handle
Function to initialize the input weights, specified as one of these values:
'glorot' — Initialize the input weights with the Glorot initializer [1] (also known as the Xavier initializer). The Glorot initializer independently samples from a uniform distribution with zero mean and a variance of 2/(InputProjectorSize + numOut), where numOut = 4*NumHiddenUnits.
'he' — Initialize the input weights with the He initializer [2]. The He initializer samples from a normal distribution with zero mean and a variance of 2/InputProjectorSize.
'orthogonal' — Initialize the input weights with Q, the orthogonal matrix in the QR decomposition of Z = QR for a random matrix Z sampled from a unit normal distribution [3].
'narrow-normal' — Initialize the input weights by independently sampling from a normal distribution with zero mean and a standard deviation of 0.01.
'zeros' — Initialize the input weights with zeros.
'ones' — Initialize the input weights with ones.
Function handle — Initialize the input weights with a custom function. If you specify a function handle, then the function must be of the form weights = func(sz), where sz is the size of the input weights.
The layer only initializes the input weights when the InputWeights property is empty.
Data Types: char | string | function_handle
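The function-handle option has the same form for the other initializer properties on this page. For example, this sketch passes an anonymous function of the required form weights = func(sz); the 0.02 scaling is an arbitrary illustrative choice:
initFcn = @(sz) 0.02*randn(sz);
layer = lstmProjectedLayer(100,30,16,InputWeightsInitializer=initFcn);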
RecurrentWeightsInitializer
—Function to initialize recurrent weights
'orthogonal' (default) | 'glorot' | 'he' | 'narrow-normal' | 'zeros' | 'ones' | function handle
Function to initialize the recurrent weights, specified as one of the following:
'orthogonal' — Initialize the recurrent weights with Q, the orthogonal matrix in the QR decomposition of Z = QR for a random matrix Z sampled from a unit normal distribution [3].
'glorot' — Initialize the recurrent weights with the Glorot initializer [1] (also known as the Xavier initializer). The Glorot initializer independently samples from a uniform distribution with zero mean and a variance of 2/(numIn + numOut), where numIn = OutputProjectorSize and numOut = 4*NumHiddenUnits.
'he' — Initialize the recurrent weights with the He initializer [2]. The He initializer samples from a normal distribution with zero mean and a variance of 2/OutputProjectorSize.
'narrow-normal' — Initialize the recurrent weights by independently sampling from a normal distribution with zero mean and a standard deviation of 0.01.
'zeros' — Initialize the recurrent weights with zeros.
'ones' — Initialize the recurrent weights with ones.
Function handle — Initialize the recurrent weights with a custom function. If you specify a function handle, then the function must be of the form weights = func(sz), where sz is the size of the recurrent weights.
The layer only initializes the recurrent weights when the RecurrentWeights property is empty.
Data Types: char | string | function_handle
InputProjectorInitializer
—Function to initialize input projector
'orthogonal' (default) | 'glorot' | 'he' | 'narrow-normal' | 'zeros' | 'ones' | function handle
Function to initialize the input projector, specified as one of the following:
'orthogonal' — Initialize the input projector with Q, the orthogonal matrix in the QR decomposition of Z = QR for a random matrix Z sampled from a unit normal distribution [3].
'glorot' — Initialize the input projector with the Glorot initializer [1] (also known as the Xavier initializer). The Glorot initializer independently samples from a uniform distribution with zero mean and a variance of 2/(InputSize + InputProjectorSize).
'he' — Initialize the input projector with the He initializer [2]. The He initializer samples from a normal distribution with zero mean and a variance of 2/InputSize.
'narrow-normal' — Initialize the input projector by independently sampling from a normal distribution with zero mean and a standard deviation of 0.01.
'zeros' — Initialize the input projector with zeros.
'ones' — Initialize the input projector with ones.
Function handle — Initialize the input projector with a custom function. If you specify a function handle, then the function must be of the form weights = func(sz), where sz is the size of the input projector.
The layer only initializes the input projector when the InputProjector property is empty.
Data Types: char | string | function_handle
OutputProjectorInitializer
—Function to initialize output projector
'orthogonal' (default) | 'glorot' | 'he' | 'narrow-normal' | 'zeros' | 'ones' | function handle
Function to initialize the output projector, specified as one of the following:
'orthogonal' — Initialize the output projector with Q, the orthogonal matrix in the QR decomposition of Z = QR for a random matrix Z sampled from a unit normal distribution [3].
'glorot' — Initialize the output projector with the Glorot initializer [1] (also known as the Xavier initializer). The Glorot initializer independently samples from a uniform distribution with zero mean and a variance of 2/(NumHiddenUnits + OutputProjectorSize).
'he' — Initialize the output projector with the He initializer [2]. The He initializer samples from a normal distribution with zero mean and a variance of 2/NumHiddenUnits.
'narrow-normal' — Initialize the output projector by independently sampling from a normal distribution with zero mean and a standard deviation of 0.01.
'zeros' — Initialize the output projector with zeros.
'ones' — Initialize the output projector with ones.
Function handle — Initialize the output projector with a custom function. If you specify a function handle, then the function must be of the form weights = func(sz), where sz is the size of the output projector.
The layer only initializes the output projector when the OutputProjector property is empty.
Data Types: char | string | function_handle
BiasInitializer
—Function to initialize bias
'unit-forget-gate' (default) | 'narrow-normal' | 'ones' | function handle
Function to initialize the bias, specified as one of these values:
'unit-forget-gate' — Initialize the forget gate bias with ones and the remaining biases with zeros.
'narrow-normal' — Initialize the bias by independently sampling from a normal distribution with zero mean and a standard deviation of 0.01.
'ones' — Initialize the bias with ones.
Function handle — Initialize the bias with a custom function. If you specify a function handle, then the function must be of the form bias = func(sz), where sz is the size of the bias.
The layer only initializes the bias when the Bias property is empty.
Data Types: char | string | function_handle
InputWeights
—Input weights
[] (default) | matrix
Input weights, specified as a matrix.
The input weight matrix is a concatenation of the four input weight matrices for the components (gates) in the LSTM layer. The layer vertically concatenates the four matrices in this order:
Input gate
Forget gate
Cell candidate
Output gate
The input weights are learnable parameters. When you train a neural network using the trainNetwork function, if InputWeights is nonempty, then the software uses the InputWeights property as the initial value. If InputWeights is empty, then the software uses the initializer specified by InputWeightsInitializer.
At training time, InputWeights is a 4*NumHiddenUnits-by-InputProjectorSize matrix.
RecurrentWeights
—Recurrent weights
[] (default) | matrix
Recurrent weights, specified as a matrix.
The recurrent weight matrix is a concatenation of the four recurrent weight matrices for the components (gates) in the LSTM layer. The layer vertically concatenates the four matrices in this order:
Input gate
Forget gate
Cell candidate
Output gate
The recurrent weights are learnable parameters. When you train an RNN using the trainNetwork function, if RecurrentWeights is nonempty, then the software uses the RecurrentWeights property as the initial value. If RecurrentWeights is empty, then the software uses the initializer specified by RecurrentWeightsInitializer.
At training time, RecurrentWeights is a 4*NumHiddenUnits-by-OutputProjectorSize matrix.
InputProjector
—Input projector
[] (default) | matrix
Input projector, specified as a matrix.
The input projector weights are learnable parameters. When you train a network using the trainNetwork function, if InputProjector is nonempty, then the software uses the InputProjector property as the initial value. If InputProjector is empty, then the software uses the initializer specified by InputProjectorInitializer.
At training time, InputProjector is an InputSize-by-InputProjectorSize matrix.
Data Types: single | double
OutputProjector
—Output projector
[] (default) | matrix
Output projector, specified as a matrix.
The output projector weights are learnable parameters. When you train a network using the trainNetwork function, if OutputProjector is nonempty, then the software uses the OutputProjector property as the initial value. If OutputProjector is empty, then the software uses the initializer specified by OutputProjectorInitializer.
At training time, OutputProjector is a NumHiddenUnits-by-OutputProjectorSize matrix.
Data Types: single | double
Bias
—Layer biases
[] (default) | numeric vector
Layer biases, specified as a numeric vector.
The bias vector is a concatenation of the four bias vectors for the components (gates) in the layer. The layer vertically concatenates the four vectors in this order:
Input gate
Forget gate
Cell candidate
Output gate
The layer biases are learnable parameters. When you train a neural network, if Bias is nonempty, then trainNetwork uses the Bias property as the initial value. If Bias is empty, then trainNetwork uses the initializer specified by BiasInitializer.
At training time, Bias is a 4*NumHiddenUnits-by-1 numeric vector.
Learning Rate and Regularization
InputWeightsLearnRateFactor
—Learning rate factor for input weights
1 (default) | nonnegative scalar | 1-by-4 numeric vector
Learning rate factor for the input weights, specified as a nonnegative scalar or a 1-by-4 numeric vector.
The software multiplies this factor by the global learning rate to determine the learning rate factor for the input weights of the layer. For example, if InputWeightsLearnRateFactor is 2, then the learning rate factor for the input weights of the layer is twice the current global learning rate. The software determines the global learning rate based on the settings you specify with the trainingOptions function.
To control the value of the learning rate factor for the four individual matrices in InputWeights, specify a 1-by-4 vector. The entries of InputWeightsLearnRateFactor correspond to the learning rate factor of these components:
Input gate
Forget gate
Cell candidate
Output gate
To specify the same value for all the matrices, specify a nonnegative scalar.
Example: 2
Example: [1 2 1 1]
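For example, this sketch doubles the learning rate factor for the forget gate input weights only (the layer sizes are illustrative):
layer = lstmProjectedLayer(100,30,16,InputWeightsLearnRateFactor=[1 2 1 1]);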
RecurrentWeightsLearnRateFactor
—Learning rate factor for recurrent weights
1 (default) | nonnegative scalar | 1-by-4 numeric vector
Learning rate factor for the recurrent weights, specified as a nonnegative scalar or a 1-by-4 numeric vector.
The software multiplies this factor by the global learning rate to determine the learning rate for the recurrent weights of the layer. For example, if RecurrentWeightsLearnRateFactor is 2, then the learning rate for the recurrent weights of the layer is twice the current global learning rate. The software determines the global learning rate based on the settings you specify using the trainingOptions function.
To control the value of the learning rate factor for the four individual matrices in RecurrentWeights, specify a 1-by-4 vector. The entries of RecurrentWeightsLearnRateFactor correspond to the learning rate factor of these components:
Input gate
Forget gate
Cell candidate
Output gate
To specify the same value for all the matrices, specify a nonnegative scalar.
Example: 2
Example: [1 2 1 1]
InputProjectorLearnRateFactor
—Learning rate factor for input projector
1 (default) | nonnegative scalar
Learning rate factor for the input projector, specified as a nonnegative scalar.
The software multiplies this factor by the global learning rate to determine the learning rate factor for the input projector of the layer. For example, if InputProjectorLearnRateFactor is 2, then the learning rate factor for the input projector of the layer is twice the current global learning rate. The software determines the global learning rate based on the settings you specify using the trainingOptions function.
OutputProjectorLearnRateFactor
—Learning rate factor for output projector
1 (default) | nonnegative scalar
Learning rate factor for the output projector, specified as a nonnegative scalar.
The software multiplies this factor by the global learning rate to determine the learning rate factor for the output projector of the layer. For example, if OutputProjectorLearnRateFactor is 2, then the learning rate factor for the output projector of the layer is twice the current global learning rate. The software determines the global learning rate based on the settings you specify using the trainingOptions function.
BiasLearnRateFactor
—Learning rate factor for biases
1 (default) | nonnegative scalar | 1-by-4 numeric vector
Learning rate factor for the biases, specified as a nonnegative scalar or a 1-by-4 numeric vector.
The software multiplies this factor by the global learning rate to determine the learning rate for the biases in this layer. For example, if BiasLearnRateFactor is 2, then the learning rate for the biases in the layer is twice the current global learning rate. The software determines the global learning rate based on the settings you specify using the trainingOptions function.
To control the value of the learning rate factor for the four individual vectors in Bias, specify a 1-by-4 vector. The entries of BiasLearnRateFactor correspond to the learning rate factor of these components:
Input gate
Forget gate
Cell candidate
Output gate
To specify the same value for all the vectors, specify a nonnegative scalar.
Example: 2
Example: [1 2 1 1]
InputWeightsL2Factor
—L2 regularization factor for input weights
1 (default) | nonnegative scalar | 1-by-4 numeric vector
L2 regularization factor for the input weights, specified as a nonnegative scalar or a 1-by-4 numeric vector.
The software multiplies this factor by the global L2 regularization factor to determine the L2 regularization factor for the input weights of the layer. For example, if InputWeightsL2Factor is 2, then the L2 regularization factor for the input weights of the layer is twice the current global L2 regularization factor. The software determines the L2 regularization factor based on the settings you specify using the trainingOptions function.
To control the value of the L2 regularization factor for the four individual matrices in InputWeights, specify a 1-by-4 vector. The entries of InputWeightsL2Factor correspond to the L2 regularization factor of these components:
Input gate
Forget gate
Cell candidate
Output gate
To specify the same value for all the matrices, specify a nonnegative scalar.
Example: 2
Example: [1 2 1 1]
RecurrentWeightsL2Factor
—L2 regularization factor for recurrent weights
1 (default) | nonnegative scalar | 1-by-4 numeric vector
L2 regularization factor for the recurrent weights, specified as a nonnegative scalar or a 1-by-4 numeric vector.
The software multiplies this factor by the global L2 regularization factor to determine the L2 regularization factor for the recurrent weights of the layer. For example, if RecurrentWeightsL2Factor is 2, then the L2 regularization factor for the recurrent weights of the layer is twice the current global L2 regularization factor. The software determines the L2 regularization factor based on the settings you specify using the trainingOptions function.
To control the value of the L2 regularization factor for the four individual matrices in RecurrentWeights, specify a 1-by-4 vector. The entries of RecurrentWeightsL2Factor correspond to the L2 regularization factor of these components:
Input gate
Forget gate
Cell candidate
Output gate
To specify the same value for all the matrices, specify a nonnegative scalar.
Example: 2
Example: [1 2 1 1]
InputProjectorL2Factor
—L2 regularization factor for input projector
1 (default) | nonnegative scalar
L2 regularization factor for the input projector, specified as a nonnegative scalar.
The software multiplies this factor by the global L2 regularization factor to determine the L2 regularization factor for the input projector of the layer. For example, if InputProjectorL2Factor is 2, then the L2 regularization factor for the input projector of the layer is twice the current global L2 regularization factor. The software determines the global L2 regularization factor based on the settings you specify using the trainingOptions function.
OutputProjectorL2Factor
—L2 regularization factor for output projector
1 (default) | nonnegative scalar
L2 regularization factor for the output projector, specified as a nonnegative scalar.
The software multiplies this factor by the global L2 regularization factor to determine the L2 regularization factor for the output projector of the layer. For example, if OutputProjectorL2Factor is 2, then the L2 regularization factor for the output projector of the layer is twice the current global L2 regularization factor. The software determines the global L2 regularization factor based on the settings you specify using the trainingOptions function.
BiasL2Factor
—L2 regularization factor for biases
0 (default) | nonnegative scalar | 1-by-4 numeric vector
L2 regularization factor for the biases, specified as a nonnegative scalar or a 1-by-4 numeric vector.
The software multiplies this factor by the global L2 regularization factor to determine the L2 regularization for the biases in this layer. For example, if BiasL2Factor is 2, then the L2 regularization for the biases in this layer is twice the global L2 regularization factor. The software determines the global L2 regularization factor based on the settings you specify using the trainingOptions function.
To control the value of the L2 regularization factor for the four individual vectors in Bias, specify a 1-by-4 vector. The entries of BiasL2Factor correspond to the L2 regularization factor of these components:
Input gate
Forget gate
Cell candidate
Output gate
To specify the same value for all the vectors, specify a nonnegative scalar.
Example: 2
Example: [1 2 1 1]
Layer
Name
—Layer name
'' (default) | character vector | string scalar
Layer name, specified as a character vector or a string scalar. For Layer array input, the trainNetwork, assembleNetwork, layerGraph, and dlnetwork functions automatically assign names to layers with the name ''.
Data Types: char | string
NumInputs
—Number of inputs
1 | 3
This property is read-only.
Number of inputs to the layer.
If the HasStateInputs property is 0 (false), then the layer has one input with the name 'in', which corresponds to the input data. In this case, the layer uses the HiddenState and CellState properties for the layer operation.
If the HasStateInputs property is 1 (true), then the layer has three inputs with the names 'in', 'hidden', and 'cell', which correspond to the input data, hidden state, and cell state, respectively. In this case, the layer uses the values passed to these inputs for the layer operation. If HasStateInputs is 1 (true), then the HiddenState and CellState properties must be empty.
Data Types: double
InputNames
—Input names
{'in'} | {'in','hidden','cell'}
This property is read-only.
Input names of the layer.
If the HasStateInputs property is 0 (false), then the layer has one input with the name 'in', which corresponds to the input data. In this case, the layer uses the HiddenState and CellState properties for the layer operation.
If the HasStateInputs property is 1 (true), then the layer has three inputs with the names 'in', 'hidden', and 'cell', which correspond to the input data, hidden state, and cell state, respectively. In this case, the layer uses the values passed to these inputs for the layer operation. If HasStateInputs is 1 (true), then the HiddenState and CellState properties must be empty.
NumOutputs
—Number of outputs
1 | 3
This property is read-only.
Number of outputs of the layer.
If the HasStateOutputs property is 0 (false), then the layer has one output with the name 'out', which corresponds to the output data.
If the HasStateOutputs property is 1 (true), then the layer has three outputs with the names 'out', 'hidden', and 'cell', which correspond to the output data, hidden state, and cell state, respectively. In this case, the layer also outputs the state values that it computes.
Data Types: double
OutputNames
—Output names
{'out'} | {'out','hidden','cell'}
This property is read-only.
Output names of the layer.
If the HasStateOutputs property is 0 (false), then the layer has one output with the name 'out', which corresponds to the output data.
If the HasStateOutputs property is 1 (true), then the layer has three outputs with the names 'out', 'hidden', and 'cell', which correspond to the output data, hidden state, and cell state, respectively. In this case, the layer also outputs the state values that it computes.
Examples
Create LSTM Projected Layer
Create an LSTM projected layer with 100 hidden units, an output projector size of 30, an input projector size of 16, and the name "lstmp".
layer = lstmProjectedLayer(100,30,16,Name="lstmp")
layer = 
  LSTMProjectedLayer with properties:

                       Name: 'lstmp'
                 InputNames: {'in'}
                OutputNames: {'out'}
                  NumInputs: 1
                 NumOutputs: 1
             HasStateInputs: 0
            HasStateOutputs: 0

   Hyperparameters
                  InputSize: 'auto'
             NumHiddenUnits: 100
         InputProjectorSize: 16
        OutputProjectorSize: 30
                 OutputMode: 'sequence'
    StateActivationFunction: 'tanh'
     GateActivationFunction: 'sigmoid'

   Learnable Parameters
               InputWeights: []
           RecurrentWeights: []
                       Bias: []
             InputProjector: []
            OutputProjector: []

   State Parameters
                HiddenState: []
                  CellState: []

  Show all properties
Include an LSTM projected layer in a layer array.
inputSize = 12;
numHiddenUnits = 100;
outputProjectorSize = max(1,floor(0.75*numHiddenUnits));
inputProjectorSize = max(1,floor(0.25*inputSize));

layers = [
    sequenceInputLayer(inputSize)
    lstmProjectedLayer(numHiddenUnits,outputProjectorSize,inputProjectorSize)
    fullyConnectedLayer(10)
    softmaxLayer
    classificationLayer];
Compare Network Projection Sizes
Compare the sizes of networks that do and do not contain projected layers.
Define an LSTM network architecture. Specify the input size as 12, which corresponds to the number of features of the input data. Configure an LSTM layer with 100 hidden units that outputs the last element of the sequence. Finally, specify nine classes by including a fully connected layer of size 9, followed by a softmax layer and a classification layer.
inputSize = 12;
numHiddenUnits = 100;
numClasses = 9;

layers = [ ...
    sequenceInputLayer(inputSize)
    lstmLayer(numHiddenUnits,OutputMode="last")
    fullyConnectedLayer(numClasses)
    softmaxLayer
    classificationLayer]

layers = 
  5x1 Layer array with layers:

     1   ''   Sequence Input          Sequence input with 12 dimensions
     2   ''   LSTM                    LSTM with 100 hidden units
     3   ''   Fully Connected         9 fully connected layer
     4   ''   Softmax                 softmax
     5   ''   Classification Output   crossentropyex
Analyze the network using the analyzeNetwork function. The network has approximately 46,100 learnable parameters.
analyzeNetwork(layers)
Create an identical network with an LSTM projected layer in place of the LSTM layer.
For the LSTM projected layer:
Specify the same number of hidden units as the LSTM layer.
Specify an output projector size of 25% of the number of hidden units.
Specify an input projector size of 75% of the input size.
Ensure that the output and input projector sizes are positive by taking the maximum of the sizes and 1.
outputProjectorSize = max(1,floor(0.25*numHiddenUnits));
inputProjectorSize = max(1,floor(0.75*inputSize));

layersProjected = [ ...
    sequenceInputLayer(inputSize)
    lstmProjectedLayer(numHiddenUnits,outputProjectorSize,inputProjectorSize,OutputMode="last")
    fullyConnectedLayer(numClasses)
    softmaxLayer
    classificationLayer];
Analyze the network using the analyzeNetwork function. The network has approximately 17,500 learnable parameters, which is a reduction of more than half. The learnable parameters of the layers following the projected layer have the same sizes as in the network without the LSTM projected layer. Reducing the number of learnable parameters by projecting an LSTM layer rather than reducing the number of hidden units of the LSTM layer maintains the output size of the layer and, in turn, the sizes of the downstream layers, which can result in better prediction accuracy.
analyzeNetwork(layersProjected)
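As a rough cross-check (not part of the original example), you can reproduce these counts from the storage formulas on this page plus the fully connected layer:

fcParams = numClasses*numHiddenUnits + numClasses;              % 909
lstmParams = 4*numHiddenUnits*(inputSize + numHiddenUnits + 1); % weights, recurrent weights, and bias: 45200
projParams = (4*numHiddenUnits + inputSize)*inputProjectorSize ...
    + 5*numHiddenUnits*outputProjectorSize + 4*numHiddenUnits;  % 16608
totalLSTM = lstmParams + fcParams                               % 46109 (about 46,100)
totalProjected = projParams + fcParams                          % 17517 (about 17,500)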
Algorithms
Long Short-Term Memory Layer
An LSTM layer is an RNN layer that learns long-term dependencies between time steps in time series and sequence data.
The state of the layer consists of thehidden state(also known as theoutput state) and thecell state. The hidden state at time steptcontains the output of the LSTM layer for this time step. The cell state contains information learned from the previous time steps. At each time step, the layer adds information to or removes information from the cell state. The layer controls these updates usinggates.
These components control the cell state and hidden state of the layer.
Component | Purpose |
---|---|
Input gate (i) | Control level of cell state update |
Forget gate (f) | Control level of cell state reset (forget) |
Cell candidate (g) | Add information to cell state |
Output gate (o) | Control level of cell state added to hidden state |
This diagram illustrates the flow of data at time stept. This diagram shows how the gates forget, update, and output the cell and hidden states.
The learnable weights of an LSTM layer are the input weights W (InputWeights), the recurrent weights R (RecurrentWeights), and the bias b (Bias). The matrices W, R, and b are concatenations of the input weights, the recurrent weights, and the bias of each component, respectively. The layer concatenates the matrices according to these equations:
W = [Wi; Wf; Wg; Wo],  R = [Ri; Rf; Rg; Ro],  b = [bi; bf; bg; bo]
where i, f, g, and o denote the input gate, forget gate, cell candidate, and output gate, respectively.
The cell state at time step t is given by
ct = ft ⊙ ct−1 + it ⊙ gt
where ⊙ denotes the Hadamard product (element-wise multiplication of vectors).
The hidden state at time step t is given by
ht = ot ⊙ σc(ct)
where σc denotes the state activation function. By default, the lstmLayer function uses the hyperbolic tangent function (tanh) to compute the state activation function.
These formulas describe the components at time step t.
Component | Formula |
---|---|
Input gate | it = σg(Wi xt + Ri ht−1 + bi) |
Forget gate | ft = σg(Wf xt + Rf ht−1 + bf) |
Cell candidate | gt = σc(Wg xt + Rg ht−1 + bg) |
Output gate | ot = σg(Wo xt + Ro ht−1 + bo) |
In these calculations, σg denotes the gate activation function. By default, the lstmLayer function uses the sigmoid function, given by σ(x) = (1 + e^−x)^−1, to compute the gate activation function.
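The following is a minimal numeric sketch of one LSTM time step that follows these formulas; the sizes, random weights, and sigmoid helper are illustrative only:

numHiddenUnits = 4;
inputSize = 3;
W = randn(4*numHiddenUnits,inputSize);       % [Wi; Wf; Wg; Wo]
R = randn(4*numHiddenUnits,numHiddenUnits);  % [Ri; Rf; Rg; Ro]
b = randn(4*numHiddenUnits,1);               % [bi; bf; bg; bo]
xt = randn(inputSize,1);                     % input at time step t
ht = zeros(numHiddenUnits,1);                % previous hidden state
ct = zeros(numHiddenUnits,1);                % previous cell state

sig = @(x) 1./(1 + exp(-x));                 % gate activation (sigmoid)
z = W*xt + R*ht + b;                         % all four components at once
n = numHiddenUnits;
it = sig(z(1:n));                            % input gate
ft = sig(z(n+1:2*n));                        % forget gate
gt = tanh(z(2*n+1:3*n));                     % cell candidate
ot = sig(z(3*n+1:4*n));                      % output gate
ct = ft.*ct + it.*gt;                        % cell state update
ht = ot.*tanh(ct);                           % hidden state update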
LSTM Projected Layer
An LSTM projected layer is an RNN layer that learns long-term dependencies between time steps in time series and sequence data using projected learnable weights.
To compress a deep learning network, you can use projected layers. A projected layer is a type of deep learning layer that enables compression by reducing the number of stored learnable parameters. The layer introduces learnable projector matrices Q, replaces multiplications of the form Wx, where W is a learnable matrix, with the multiplication WQQ⊤x, and stores Q and W′ = WQ instead of storing W. Projecting x into a lower-dimensional space using Q typically requires less memory to store the learnable parameters and can have similarly strong prediction accuracy.
Reducing the number of learnable parameters by projecting an LSTM layer rather than reducing the number of hidden units of the LSTM layer maintains the output size of the layer and, in turn, the sizes of the downstream layers, which can result in better prediction accuracy.
The LSTM layer operation uses four matrix multiplications of the form Rht−1, where R denotes the recurrent weights and ht denotes the hidden state (or, equivalently, the layer output) at time step t.
The LSTM projected layer operation instead uses multiplications of the form RQoQo⊤ht−1, where Qo is a NumHiddenUnits-by-OutputProjectorSize matrix known as the output projector. The layer uses the same projector Qo for each of the four multiplications.
To perform the four multiplications of the form Rht−1, an LSTM layer stores four recurrent weights R, which necessitates storing 4*NumHiddenUnits^2 learnable parameters. By instead storing the 4*NumHiddenUnits-by-OutputProjectorSize matrix R′ = RQo and Qo, an LSTM projected layer can perform the multiplication R′Qo⊤ht−1 and store only 5*NumHiddenUnits*OutputProjectorSize learnable parameters.
The LSTM layer operation uses four matrix multiplications of the form Wxt, where W denotes the input weights and xt denotes the layer input at time step t.
The LSTM projected layer operation instead uses multiplications of the form WQiQi⊤xt, where Qi is an InputSize-by-InputProjectorSize matrix known as the input projector. The layer uses the same projector Qi for each of the four multiplications.
To perform the four multiplications of the form Wxt, an LSTM layer stores four weight matrices W, which necessitates storing 4*NumHiddenUnits*InputSize learnable parameters. By instead storing the 4*NumHiddenUnits-by-InputProjectorSize matrix W′ = WQi and Qi, an LSTM projected layer can perform the multiplication W′Qi⊤xt and store only (4*NumHiddenUnits+InputSize)*InputProjectorSize learnable parameters.
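This sketch illustrates the factored input multiplication and its storage saving; the random orthonormal Qi stands in for a trained projector, and the sizes are illustrative:

inputSize = 12;
numHiddenUnits = 100;
inputProjectorSize = 8;
W = randn(4*numHiddenUnits,inputSize);           % unprojected input weights
Qi = orth(randn(inputSize,inputProjectorSize));  % InputSize-by-InputProjectorSize projector
Wp = W*Qi;                                       % stored in place of W
xt = randn(inputSize,1);
y = Wp*(Qi'*xt);                                 % plays the role of W*xt in the layer operation
numel(W)                                         % 4800 stored values
numel(Wp) + numel(Qi)                            % 3296 stored values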
Layer Input and Output Formats
Layers in a layer array or layer graph pass data to subsequent layers as formatted dlarray objects. The format of a dlarray object is a string of characters in which each character describes the corresponding dimension of the data. The format consists of one or more of these characters:
"S" — Spatial
"C" — Channel
"B" — Batch
"T" — Time
"U" — Unspecified
For example, 2-D image data represented as a 4-D array, where the first two dimensions correspond to the spatial dimensions of the images, the third dimension corresponds to the channels of the images, and the fourth dimension corresponds to the batch dimension, can be described as having the format "SSCB" (spatial, spatial, channel, batch).
You can interact with these dlarray objects in automatic differentiation workflows, such as developing a custom layer, using a functionLayer object, or using the forward and predict functions with dlnetwork objects.
This table shows the supported input formats of LSTMProjectedLayer objects and the corresponding output format. If the output of the layer is passed to a custom layer that does not inherit from the nnet.layer.Formattable class, or a FunctionLayer object with the Formattable property set to 0 (false), then the layer receives an unformatted dlarray object with dimensions ordered according to the formats in this table.
Input Format | OutputMode | Output Format |
---|---|---|
"CBT" (channel, batch, time) | "sequence" | "CBT" (channel, batch, time) |
| "last" | "CB" (channel, batch) |
In dlnetwork objects, LSTMProjectedLayer objects also support additional input and output format combinations for each OutputMode value.
To use these input formats in trainNetwork workflows, convert the data to "CB" (channel, batch) or "CBT" (channel, batch, time) format using flattenLayer.
If the HasStateInputs property is 1 (true), then the layer has two additional inputs with the names "hidden" and "cell", which correspond to the hidden state and cell state, respectively. These additional inputs expect input format "CB" (channel, batch).
If the HasStateOutputs property is 1 (true), then the layer has two additional outputs with the names "hidden" and "cell", which correspond to the hidden state and cell state, respectively. These additional outputs have output format "CB" (channel, batch).
References
[1] Glorot, Xavier, and Yoshua Bengio. "Understanding the Difficulty of Training Deep Feedforward Neural Networks." In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 249–256. Sardinia, Italy: AISTATS, 2010. https://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf
[2] He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification." In Proceedings of the 2015 IEEE International Conference on Computer Vision, 1026–1034. Washington, DC: IEEE Computer Vision Society, 2015. https://doi.org/10.1109/ICCV.2015.123
[3] Saxe, Andrew M., James L. McClelland, and Surya Ganguli. "Exact Solutions to the Nonlinear Dynamics of Learning in Deep Linear Neural Networks." arXiv preprint arXiv:1312.6120 (2013).
Extended Capabilities
C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.
Usage notes and limitations:
LSTM projected layer objects support generic C and C++ code generation only.
Version History
Introduced in R2022b
See Also
trainingOptions | trainNetwork | sequenceInputLayer | lstmLayer | bilstmLayer | gruLayer | convolution1dLayer | neuronPCA | compressNetworkUsingProjection
Topics
- Train Network with LSTM Projected Layer
- Compress Neural Network Using Projection
- Sequence Classification Using Deep Learning
- Sequence Classification Using 1-D Convolutions
- Time Series Forecasting Using Deep Learning
- Sequence-to-Sequence Classification Using Deep Learning
- Sequence-to-Sequence Regression Using Deep Learning
- Sequence-to-One Regression Using Deep Learning
- Classify Videos Using Deep Learning
- Long Short-Term Memory Neural Networks
- List of Deep Learning Layers
- Deep Learning Tips and Tricks