Supported Networks, Layers, Boards, and Tools

Supported Pretrained Networks

Deep Learning HDL Toolbox™ supports code generation for series convolutional neural networks (CNNs or ConvNets). You can generate code for any trained CNN whose computational layers are supported for code generation. For a full list, see Supported Layers. You can use one of the pretrained networks listed in the table to generate code for your target Intel® or Xilinx® FPGA boards.

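Network	Network Description	Type	Single data type, ZCU102 (shipping bitstream)	Single data type, ZC706 (shipping bitstream)	Single data type, Arria10 SoC (shipping bitstream)	INT8 data type, ZCU102 (shipping bitstream)	INT8 data type, ZC706 (shipping bitstream)	INT8 data type, Arria10 SoC (shipping bitstream)	Application Area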
AlexNet	AlexNet convolutional neural network.	Series Network	No. To use the bitstream, enable the `LRNBlockGeneration` property of the processor configuration for the bitstream and generate the bitstream again.	No. To use the bitstream, enable the `LRNBlockGeneration` property of the processor configuration for the bitstream and generate the bitstream again.	No. To use the bitstream, enable the `LRNBlockGeneration` property of the processor configuration for the bitstream and generate the bitstream again.	No. To use the bitstream, enable the `LRNBlockGeneration` property of the processor configuration for the bitstream and generate the bitstream again.	No. To use the bitstream, enable the `LRNBlockGeneration` property of the processor configuration for the bitstream and generate the bitstream again.	No. To use the bitstream, enable the `LRNBlockGeneration` property of the processor configuration for the bitstream and generate the bitstream again.	Classification
LogoNet	Logo recognition network (LogoNet) is a MATLAB® developed logo identification network. For more information, see Logo Recognition Network.	Series Network	Yes	Yes	Yes	Yes	Yes	Yes	Classification
DigitsNet	Digit classification network. See Create Simple Deep Learning Neural Network for Classification.	Series Network	Yes	Yes	Yes	Yes	Yes	Yes	Classification
Lane detection	LaneNet convolutional neural network. For more information, see Deploy Transfer Learning Network for Lane Detection.	Series Network	No. To use the bitstream, enable the `LRNBlockGeneration` property of the processor configuration for the bitstream and generate the bitstream again.	No. To use the bitstream, enable the `LRNBlockGeneration` property of the processor configuration for the bitstream and generate the bitstream again.	No. To use the bitstream, enable the `LRNBlockGeneration` property of the processor configuration for the bitstream and generate the bitstream again.	No. To use the bitstream, enable the `LRNBlockGeneration` property of the processor configuration for the bitstream and generate the bitstream again.	No. To use the bitstream, enable the `LRNBlockGeneration` property of the processor configuration for the bitstream and generate the bitstream again.	No. To use the bitstream, enable the `LRNBlockGeneration` property of the processor configuration for the bitstream and generate the bitstream again.	Classification
VGG-16	VGG-16 convolutional neural network. For the pretrained VGG-16 model, see `vgg16`.	Series Network	No. Network exceeds PL DDR memory size.	No. Network exceeds FC module memory size.	Yes	Yes	No. Network exceeds FC module memory size.	Yes	Classification
VGG-19	VGG-19 convolutional neural network. For the pretrained VGG-19 model, see `vgg19`.	Series Network	No. Network exceeds PL DDR memory size.	No. Network exceeds FC module memory size.	Yes	Yes	No. Network exceeds FC module memory size.	Yes	Classification
Darknet-19	Darknet-19 convolutional neural network. For the pretrained Darknet-19 model, see `darknet19`.	Series Network	Yes	Yes	Yes	Yes	Yes	Yes	Classification
Radar Classification	Convolutional neural network that uses micro-Doppler signatures to identify and classify the object. For more information, see Bicyclist and Pedestrian Classification by Using FPGA.	Series Network	Yes	Yes	Yes	Yes	Yes	Yes	Classification and Software Defined Radio (SDR)
Defect Detection `snet_defnet`	`snet_defnet` is a custom AlexNet network used to identify and classify defects. For more information, see Defect Detection.	Series Network	No. To use the bitstream, enable the `LRNBlockGeneration` property of the processor configuration for the bitstream and generate the bitstream again.	No. To use the bitstream, enable the `LRNBlockGeneration` property of the processor configuration for the bitstream and generate the bitstream again.	No. To use the bitstream, enable the `LRNBlockGeneration` property of the processor configuration for the bitstream and generate the bitstream again.	No. To use the bitstream, enable the `LRNBlockGeneration` property of the processor configuration for the bitstream and generate the bitstream again.	No. To use the bitstream, enable the `LRNBlockGeneration` property of the processor configuration for the bitstream and generate the bitstream again.	No. To use the bitstream, enable the `LRNBlockGeneration` property of the processor configuration for the bitstream and generate the bitstream again.	Classification
Defect Detection `snet_blemdetnet`	`snet_blemdetnet` is a custom convolutional neural network used to identify and classify defects. For more information, see Defect Detection.	Series Network	No. To use the bitstream, enable the `LRNBlockGeneration` property of the processor configuration for the bitstream and generate the bitstream again.	No. To use the bitstream, enable the `LRNBlockGeneration` property of the processor configuration for the bitstream and generate the bitstream again.	No. To use the bitstream, enable the `LRNBlockGeneration` property of the processor configuration for the bitstream and generate the bitstream again.	No. To use the bitstream, enable the `LRNBlockGeneration` property of the processor configuration for the bitstream and generate the bitstream again.	No. To use the bitstream, enable the `LRNBlockGeneration` property of the processor configuration for the bitstream and generate the bitstream again.	No. To use the bitstream, enable the `LRNBlockGeneration` property of the processor configuration for the bitstream and generate the bitstream again.	Classification
DarkNet-53	DarkNet-53 convolutional neural network. For the pretrained DarkNet-53 model, see `darknet53`.	Directed acyclic graph (DAG) network based	Yes	Yes	Yes	Yes	Yes	No	Classification
ResNet-18	ResNet-18 convolutional neural network. For the pretrained ResNet-18 model, see `resnet18`.	Directed acyclic graph (DAG) network based	Yes	Yes	Yes	Yes	Yes	Yes	Classification
ResNet-50	ResNet-50 convolutional neural network. For the pretrained ResNet-50 model, see `resnet50`.	Directed acyclic graph (DAG) network based	No. Network exceeds PL DDR memory size.	No. Network exceeds PL DDR memory size.	Yes	Yes	Yes	Yes	Classification
ResNet-based YOLO v2	You only look once (YOLO) is an object detector that decodes the predictions from a convolutional neural network and generates bounding boxes around the objects. For more information, see Vehicle Detection Using DAG Network Based YOLO v2 Deployed to FPGA.	Directed acyclic graph (DAG) network based	Yes	Yes	Yes	Yes	Yes	Yes	Object detection
MobileNetV2	MobileNet-v2 convolutional neural network. For the pretrained MobileNet-v2 model, see `mobilenetv2`.	Directed acyclic graph (DAG) network based	Yes	Yes	Yes	Yes	Yes	Yes	Classification
GoogLeNet	GoogLeNet convolutional neural network. For the pretrained GoogLeNet model, see `googlenet`.	Directed acyclic graph (DAG) network based	No. To use the bitstream, enable the `LRNBlockGeneration` property of the processor configuration for the bitstream and generate the bitstream again.	No. To use the bitstream, enable the `LRNBlockGeneration` property of the processor configuration for the bitstream and generate the bitstream again.	No. To use the bitstream, enable the `LRNBlockGeneration` property of the processor configuration for the bitstream and generate the bitstream again.	No. To use the bitstream, enable the `LRNBlockGeneration` property of the processor configuration for the bitstream and generate the bitstream again.	No. To use the bitstream, enable the `LRNBlockGeneration` property of the processor configuration for the bitstream and generate the bitstream again.	No. To use the bitstream, enable the `LRNBlockGeneration` property of the processor configuration for the bitstream and generate the bitstream again.	Classification
PoseNet	Human pose estimation network.	Directed acyclic graph (DAG) network based	Yes	Yes	Yes	Yes	Yes	Yes	Segmentation
U-Net	U-Net convolutional neural network designed for semantic image segmentation.	Directed acyclic graph (DAG) network based	No. Network exceeds PL DDR memory size.	No. Network exceeds PL DDR memory size.	No. Network exceeds PL DDR memory size.	No. Network exceeds PL DDR memory size.	No. Network exceeds PL DDR memory size.	Yes	Segmentation
SqueezeNet-based YOLO v3	The you-only-look-once (YOLO) v3 object detector is a multi-scale object detection network that uses a feature extraction network and multiple detection heads to make predictions at multiple scales.	`dlnetwork` object	Yes	Yes	No	No	No	No	Object detection
Sequence-to-sequence classification	Classify each time step of sequence data using a long short-term memory (LSTM) network. See Run Sequence-to-Sequence Classification on FPGAs by Using Deep Learning HDL Toolbox.	Long short-term memory (LSTM) network	Yes	Yes	No	No	No	No	Sequence data classification
Time series forecasting	Forecast time series data using a long short-term memory (LSTM) network. See Run Sequence Forecasting on FPGA by Using Deep Learning HDL Toolbox.	Long short-term memory (LSTM) network	Yes	Yes	No	No	No	No	Forecast time series data
Word-by-word text generation	Generate text word-by-word by using a long short-term memory (LSTM) network. See Generate Word-By-Word Text on FPGAs by Using Deep Learning HDL Toolbox.	Long short-term memory (LSTM) network	Yes	Yes	No	No	No	No	Sequence data prediction
YAMNet	Pretrained audio classification network. See `yamnet` (Audio Toolbox) and Deploy YAMNet Networks to FPGAs With and Without Cross-Layer Equalization.	Series Network	Yes	Yes	Yes	Yes	Yes	Yes	Audio data classification
Semantic Segmentation Using Dilated Convolutions	Semantic segmentation using a dilated convolution layer to increase coverage area without increasing the number of computational parameters. See Deploy Semantic Segmentation Network Using Dilated Convolutions on FPGA.	Series Network	Yes	Yes	Yes	Yes	Yes	Yes	Segmentation
Time series forecasting	Forecast time series data using a gated recurrent unit (GRU) layer network. See Run Sequence Forecasting Using a GRU Layer on an FPGA.	Gated recurrent unit (GRU) layer network	Yes	Yes	No	No	No	No	Forecast time series data
Pruned image classification network	Pruned image classification network. See Deploy Image Recognition Network on FPGA With and Without Pruning.	Series Network	Yes	Yes	Yes	Yes	Yes	Yes	Image classification
Very-deep super-resolution (VDSR) network	Create high-resolution images from low-resolution images by using VDSR networks. See Increase Image Resolution Using VDSR Network Running on FPGA.	Series Network	Yes	Yes	Yes	Yes	Yes	Yes	Image processing

Supported Layers

Deep Learning HDL Toolbox supports the layers listed in these tables.

Input Layers

Layer	Layer Type Hardware (HW) or Software (SW)	Description and Limitations	INT8 Compatible
`imageInputLayer`	SW	An image input layer inputs 2-D images to a network and applies data normalization. The normalization options `zerocenter` and `zscore` can run on hardware if the `compile` method `HardwareNormalization` argument is enabled and the input data is of `single` data type. If the `HardwareNormalization` option is not enabled or the input data type is `int8`, the normalization runs in software. Normalization specified using a function handle is not supported. See Image Input Layer Normalization Hardware Implementation. When the `Normalization` property is set to `none`, the `activations` function cannot be used for the `imageInputLayer`.	Yes. Runs as single datatype in SW.
`featureInputLayer`	SW	A feature input layer inputs feature data to a network and applies data normalization.	No
`sequenceInputLayer`	SW	A sequence input layer inputs sequence data to a network.	No

Convolution and Fully Connected Layers

Layer	Layer Type Hardware (HW) or Software (SW)	Layer Output Format	Description and Limitations	INT8 Compatible
`convolution2dLayer`	HW	Convolution (Conv)	A 2-D convolutional layer applies sliding convolutional filters to the input. When generating code for a network using this layer, these limitations apply: Filter size must be 1-66. Stride size must be 1-15 and square. Padding size must be in the range 0-8. The dilation factor is supported up to [16 16] and must be square. Padding value is not supported. When the dilation factor is a multiple of three, the calculated dilated filter size must have a maximum value of the existing convolution filter size limit. In all other cases, the filter size can be as large as the maximum value of the existing convolution filter size.	Yes
`groupedConvolution2dLayer`	HW	Convolution (Conv)	A 2-D grouped convolutional layer separates the input channels into groups and applies sliding convolutional filters. Use grouped convolutional layers for channel-wise separable (also known as depth-wise separable) convolution. Code generation is now supported for a 2-D grouped convolution layer that has the `NumGroups` property set as `'channel-wise'`. When generating code for a network using this layer, these limitations apply: Filter size must be 1-15 and square. For example, [1 1] or [14 14]. When `NumGroups` is set as `'channel-wise'`, filter size must be 3-14. Stride size must be 1-15 and square. Padding size must be in the range 0-8. Dilation factor must be [1 1]. When `NumGroups` is not set as `'channel-wise'`, the number of groups must be 1 or 2. The input feature number must be greater than a single multiple of the square root of the `ConvThreadNumber`. When `NumGroups` is not set as `'channel-wise'`, the number of filters per group must be a multiple of the square root of the `ConvThreadNumber`.	Yes
`transposedConv2dLayer`	HW	Convolution (Conv)	A transposed 2-D convolution layer upsamples feature maps. When generating code for a network using this layer, these limitations apply: Filter size must be 1-8 and square. Stride size must be 1-66 and square. Padding size must be in the range 0-8. Padding value is not supported.	Yes
`fullyConnectedLayer`	HW	Fully Connected (FC)	A fully connected layer multiplies the input by a weight matrix, and then adds a bias vector. When generating code for a network using this layer, these limitations apply: The layer input and output size are limited by the values specified in `InputMemorySize` and `OutputMemorySize`.	Yes
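
The `convolution2dLayer` limits in the table can be checked before attempting code generation. The following Python sketch is not part of the toolbox; the function name `check_conv2d_limits` and its argument layout are hypothetical, and the lower bound of 1 on the dilation factor is an assumption. It does not cover the dilated filter size rule for dilation factors that are multiples of three.

```python
def check_conv2d_limits(filter_size, stride, padding, dilation=(1, 1)):
    """Return a list of violated convolution2dLayer limits (empty if none).

    Hypothetical helper: sizes are (height, width) tuples and padding is
    any iterable of per-side padding values.
    """
    problems = []
    # Filter size must be 1-66.
    if not all(1 <= f <= 66 for f in filter_size):
        problems.append("filter size must be 1-66")
    # Stride size must be 1-15 and square.
    if stride[0] != stride[1] or not 1 <= stride[0] <= 15:
        problems.append("stride must be square and 1-15")
    # Padding size must be in the range 0-8.
    if not all(0 <= p <= 8 for p in padding):
        problems.append("padding must be 0-8")
    # Dilation factor up to [16 16] and square (lower bound 1 is assumed).
    if dilation[0] != dilation[1] or not 1 <= dilation[0] <= 16:
        problems.append("dilation must be square and at most [16 16]")
    return problems


print(check_conv2d_limits((3, 3), (1, 1), (1, 1, 1, 1)))  # no violations
print(check_conv2d_limits((3, 3), (2, 1), (0, 0, 0, 0)))  # non-square stride
```

Returning the full list of violations, rather than stopping at the first, makes it easier to fix a layer definition in one pass.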

Activation Layers

Layer	Layer Type Hardware (HW) or Software (SW)	Layer Output Format	Description and Limitations	INT8 Compatible
`reluLayer`	HW	Layer is fused.	A ReLU layer performs a threshold operation to each element of the input where any value less than zero is set to zero. A ReLU layer is supported only when it is preceded by any of these layers: convolution, fully connected, adder.	Yes
`leakyReluLayer`	HW	Layer is fused.	A leaky ReLU layer performs a threshold operation where any input value less than zero is multiplied by a fixed scalar. A leaky ReLU layer is supported only when it is preceded by any of these layers: convolution, fully connected, adder.	Yes
`clippedReluLayer`	HW	Layer is fused.	A clipped ReLU layer performs a threshold operation where any input value less than zero is set to zero and any value above the clipping ceiling is set to that clipping ceiling value. A clipped ReLU layer is supported only when it is preceded by any of these layers: convolution, fully connected, adder.	Yes
`tanhLayer`	HW	Inherit from input	A hyperbolic tangent (tanh) activation layer applies the tanh function on the layer inputs.	No

Normalization, Dropout, and Cropping Layers

Layer	Layer Type Hardware (HW) or Software (SW)	Layer Output Format	Description and Limitations	INT8 Compatible
`batchNormalizationLayer`	HW	Layer is fused.	A batch normalization layer normalizes each input channel across a mini-batch. A batch normalization layer is supported when preceded by an image input layer or convolution layer.	Yes
`crossChannelNormalizationLayer`	HW	Convolution (Conv)	A channel-wise local response (cross-channel) normalization layer carries out channel-wise normalization. The `WindowChannelSize` must be in the range of 3-9 for code generation.	Yes. Runs as single datatype in HW.
`dropoutLayer`	NoOP on inference	NoOP on inference	A dropout layer randomly sets input elements to zero within a given probability.	Yes
`resize2dLayer` (Image Processing Toolbox)	HW	Inherit from input	A 2-D resize layer resizes 2-D input by a scale factor, to a specified height and width, or to the size of a reference input feature map. When generating code for a network using this layer, these limitations apply: The `Method` property must be set to `nearest`. The `GeometricTransformationMode` property must be set to `half-pixel`. The `NearestRoundingMode` property must be set to `round`. The ratio of the output size to input size must be an integer in the range 2 to 256.	No
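
The output-to-input ratio rule for `resize2dLayer` can be illustrated with a small check. This is a sketch only: the helper name `resize_ratio_ok` is hypothetical, and it assumes that the rule applies to height and width separately and that the bounds 2 and 256 are inclusive.

```python
def resize_ratio_ok(input_size, output_size):
    """Check that each output/input ratio is an integer from 2 to 256.

    Hypothetical helper: sizes are (height, width) tuples. Applying the
    rule per dimension and treating the bounds as inclusive are assumptions.
    """
    for n_in, n_out in zip(input_size, output_size):
        if n_out % n_in != 0:
            return False  # ratio is not an integer
        if not 2 <= n_out // n_in <= 256:
            return False  # ratio is outside the supported range
    return True


print(resize_ratio_ok((32, 32), (64, 64)))  # ratio 2 in both dimensions
print(resize_ratio_ok((32, 32), (48, 48)))  # ratio 1.5 is not an integer
```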

Pooling and Unpooling Layers

Layer	Layer Type Hardware (HW) or Software (SW)	Layer Output Format	Description and Limitations	INT8 Compatible
`maxPooling2dLayer`	HW	Convolution (Conv)	A max pooling layer performs downsampling by dividing the layer input into rectangular pooling regions and computing the maximum of each region. When generating code for a network using this layer, these limitations apply: Pool size must be 1-66. Stride size must be 1-15 and square. Padding size must be in the range 0-6. `HasUnpoolingOutputs` is supported. When this parameter is enabled, these limitations apply for code generation for this layer: Pool size must be 2-by-2 or 3-by-3. The stride size must be the same as the filter size. Padding size is not supported. Pool size and stride size must be square. For example, [2 2].	Yes. No when `HasUnpoolingOutputs` is enabled.
`maxUnpooling2dLayer`	HW	Convolution (Conv)	A max unpooling layer unpools the output of a max pooling layer.	No
`averagePooling2dLayer`	HW	Convolution (Conv)	An average pooling layer performs downsampling by dividing the layer input into rectangular pooling regions and computing the average values of each region. When generating code for a network using this layer, these limitations apply: Pool size must be 1-66. Stride size must be 1-15 and square. Padding size must be in the range 0-6.	Yes
`globalAveragePooling2dLayer`	HW	Convolution (Conv)	A global average pooling layer performs downsampling by computing the mean of the height and width dimensions of the input. When generating code for a network using this layer, these limitations apply: The pool size must be 1-66 and square.	Yes

Combination Layers

Layer	Layer Type Hardware (HW) or Software (SW)	Layer Output Format	Description and Limitations	INT8 Compatible
`additionLayer`	HW	Inherit from input.	An addition layer adds inputs from multiple neural network layers element-wise. You can now generate code for this layer with `int8` data type when the layer is combined with a Leaky ReLU or Clipped ReLU layer. When generating code for a network using this layer, these limitations apply: Both input layers must have the same output layer format. For example, both layers must have conv output format or fc output format.	Yes
`depthConcatenationLayer`	HW	Inherit from input.	A depth concatenation layer takes inputs that have the same height and width and concatenates them along the third dimension (the channel dimension). When generating code for a network using this layer, these limitations apply: The input activation feature number must be a multiple of the square root of the `ConvThreadNumber`. Layers that have a conv output format and layers that have an FC output format cannot be concatenated together.	Yes
`multiplicationLayer`	HW	Inherit from input	A multiplication layer multiplies inputs from multiple neural network layers element-wise.	No
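
The `depthConcatenationLayer` rule on input feature counts can be checked as follows. This is an illustrative sketch, not toolbox code: the helper name `concat_inputs_ok` is hypothetical, and it assumes that `ConvThreadNumber` is a perfect square so that its square root is an integer.

```python
import math


def concat_inputs_ok(input_feature_counts, conv_thread_number):
    """Check that every input feature count is a multiple of
    sqrt(ConvThreadNumber).

    Hypothetical helper; assumes conv_thread_number is a perfect square.
    """
    root = math.isqrt(conv_thread_number)
    if root * root != conv_thread_number:
        raise ValueError("conv_thread_number is assumed to be a perfect square")
    return all(n % root == 0 for n in input_feature_counts)


print(concat_inputs_ok([64, 128], 16))  # both counts are multiples of 4
print(concat_inputs_ok([64, 30], 16))   # 30 is not a multiple of 4
```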

Sequence Layers

Layer	Layer Type Hardware (HW) or Software (SW)	Description and Limitations	INT8 Compatible
`lstmLayer`	HW	An LSTM layer learns long-term dependencies between time steps in time series and sequence data. The layer performs additive interactions, which can help improve gradient flow over long sequences during training. When generating code for a network using this layer, these limitations apply: The input must be of single data type. The `OutputMode` property must be set to `sequence`.	No
`gruLayer`	HW	A GRU layer is an RNN layer that learns dependencies between time steps in time series and sequence data. When generating code for a network using this layer, these limitations apply: Inputs must be of single data type. You must set the GRU layer `OutputMode` to `sequence`.	No

Output Layer

Layer	Layer Type Hardware (HW) or Software (SW)	Description and Limitations	INT8 Compatible
`softmaxLayer`	SW and HW	A softmax layer applies a softmax function to the input. If the softmax layer is implemented in hardware: The inputs must be in the range -87 to 88. Softmax layer followed by adder layer or depth concatenation layer is not supported. The inputs to this layer must have one of the formats 1-by-N, N-by-1, 1-by-1-by-N, N-by-1-by-1, or 1-by-N-by-1. If the convolution module of the deep learning processor is enabled, the square root of the convolution thread number must be an integral power of two. If not, the layer is implemented in software.	Yes. Runs as single datatype in SW.
`classificationLayer`	SW	A classification layer computes the cross-entropy loss for multiclass classification problems that have mutually exclusive classes.	Yes
`regressionLayer`	SW	A regression layer computes the half mean squared error loss for regression problems.	Yes
`sigmoidLayer`	SW and HW	A sigmoid layer applies a sigmoid function to the input. When the data type is `single`, the sigmoid layer is implemented in the custom module of the deep learning processor configuration. When generating code for a network using this layer with `single` data type, these limitations apply: The inputs must be in the range -87 to 88. Runs as single datatype in SW.	Yes. When the data type is `int8`, the sigmoid layer is implemented in the fully connected (FC) module of the deep learning processor configuration. When generating code for a network using this layer with `int8` data type, these limitations apply: The inputs must be in the range -87 to 88. Sigmoid layer followed by adder layer or depth concatenation layer is not supported. The inputs to this layer must have one of the formats 1-by-N, N-by-1, 1-by-1-by-N, N-by-1-by-1, or 1-by-N-by-1. If the convolution module of the deep learning processor is enabled, the square root of the convolution thread number must be an integral power of two. If not, the layer is implemented in software.
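
The table does not say why softmax and sigmoid inputs are limited to the range -87 to 88. One observation that may explain it, offered here as an assumption rather than a documented reason, is that these are the integer inputs for which the exponential stays within the normal range of IEEE 754 single precision. The Python sketch below checks that numerically; the helper name `exp_fits_single` is hypothetical.

```python
import math

FLT_MAX = 3.4028234663852886e38          # largest finite single-precision value
FLT_MIN_NORMAL = 1.1754943508222875e-38  # smallest positive normal single value


def exp_fits_single(x):
    """True if exp(x) lies within the normal single-precision range."""
    return FLT_MIN_NORMAL <= math.exp(x) <= FLT_MAX


for x in (-88, -87, 88, 89):
    print(x, exp_fits_single(x))
```

With these constants, `exp(88)` is below the single-precision maximum while `exp(89)` exceeds it, and `exp(-87)` is above the smallest normal value while `exp(-88)` falls below it.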

Keras and ONNX Layers

Layer	Layer Type Hardware (HW) or Software (SW)	Layer Output Format	Description and Limitations	INT8 Compatible
`nnet.keras.layer.FlattenCStyleLayer`		Layer will be fused.	Flatten activations into 1-D layers assuming C-style (row-major) order. A `nnet.keras.layer.FlattenCStyleLayer` is supported only when it is followed by a fully connected layer.	Yes
`nnet.keras.layer.ZeroPadding2dLayer`		Layer will be fused.	Zero padding layer for 2-D input. A `nnet.keras.layer.ZeroPadding2dLayer` is supported only when it is preceded by a convolution layer or a maxpool layer. Zero padding layer is supported when followed by a grouped convolution layer.	Yes

Custom Layers

Layer	Layer Type Hardware (HW) or Software (SW)	Layer Output Format	Description and Limitations	INT8 Compatible
Custom Layers	HW	Inherit from input	Custom layers, with or without learnable parameters, that you define for your problem. To learn how to define your custom deep learning layers, see Create Deep Learning Processor Configuration for Custom Layers.	No

Supported Boards

These boards are supported by Deep Learning HDL Toolbox:

Xilinx Zynq®-7000 ZC706
Intel Arria® 10 SoC
Xilinx Zynq UltraScale+™ MPSoC ZCU102
Custom boards. For more information, see Deep Learning Processor IP Core Generation for Custom Board.

Third-Party Synthesis Tools and Version Support

Deep Learning HDL Toolbox has been tested with:

Xilinx Vivado® Design Suite 2022.1
Intel Quartus® Prime Standard 21.1

Image Input Layer Normalization Hardware Implementation

To enable hardware implementation of the normalization functions for the image input layer, set the `HardwareNormalization` argument of the `compile` method to `auto` or `on`. When `HardwareNormalization` is set to `auto`, the compile method looks for the presence of addition and multiplication layers to implement the normalization function on hardware. The normalization is implemented on hardware by:

Creating a new constant layer. This layer holds the value that is to be subtracted.
Using existing addition and multiplication layers. The layers to be used depend on the normalization function being implemented.

Constant Layer Buffer Content

This table describes the value stored in the constant layer buffer.

Normalization Function	Number of Constants	Constant Layer Buffer Value
`zerocenter`	1	`-Mean`
`zscore`	2	The first constant value is `-Mean`. The second constant value is `1/StandardDeviation`.
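
To show how the constants in the table reproduce the two normalization functions, the following Python sketch applies them with one addition step and, for `zscore`, one multiplication step. It is an illustration under stated assumptions, not the toolbox implementation: the function name `hardware_normalize` is hypothetical, and the mean and standard deviation are treated as scalars for simplicity.

```python
def hardware_normalize(pixels, normalization, mean, std=None):
    """Apply normalization using the constant-buffer values from the table.

    Hypothetical helper: returns (constants, normalized_pixels). Scalar
    mean and std are a simplifying assumption.
    """
    if normalization == "zerocenter":
        constants = [-mean]             # one constant: -Mean
    elif normalization == "zscore":
        constants = [-mean, 1.0 / std]  # -Mean and 1/StandardDeviation
    else:
        raise ValueError("unsupported normalization: " + normalization)

    # Addition step: add the first constant (-Mean) to every input value.
    out = [p + constants[0] for p in pixels]
    # Multiplication step (zscore only): scale by 1/StandardDeviation.
    if len(constants) == 2:
        out = [v * constants[1] for v in out]
    return constants, out


constants, out = hardware_normalize([12.0, 20.0, 28.0], "zscore", mean=20.0, std=4.0)
print(constants, out)
```

Storing `-Mean` and `1/StandardDeviation` means the normalization needs only addition and multiplication, which matches the layers that the text says are used on hardware.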