Latency Considerations with Native Floating Point

HDL Coder™ native floating-point technology can generate HDL code from your floating-point design. Native floating-point operators have a latency. When you generate HDL code, the code generator determines this latency and adds matching delays to balance parallel paths.

View Latency of a Floating-Point Operator

Open the hdlcoder_nfp_delay_allocation Simulink® model. The model uses single data types and computes the square root. The model has a parallel path to illustrate how the code generator balances delays.

load_system('hdlcoder_nfp_delay_allocation')
open_system('hdlcoder_nfp_delay_allocation/DUT')

To generate HDL code:

Right-click the DUT Subsystem and select HDL Code > Generate HDL for Subsystem.
To see the generated model after HDL code generation, at the command line, enter gm_hdlcoder_nfp_delay_allocation.
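
The same steps can also be run from the command line. The following is a minimal sketch, assuming HDL Coder is installed and the example model is on the MATLAB path. It calls makehdl with the model's existing settings, and the generated model name is taken from the step above.

```matlab
% Load the example model without opening a window.
load_system('hdlcoder_nfp_delay_allocation')

% Generate HDL code for the DUT subsystem (command-line counterpart of
% HDL Code > Generate HDL for Subsystem).
makehdl('hdlcoder_nfp_delay_allocation/DUT')

% Open the generated model, as in the step above.
open_system('gm_hdlcoder_nfp_delay_allocation')
```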

The NFP Sqrt block is the floating-point operator corresponding to the Sqrt block in your model, and has a latency of 28. The code generator determines this latency and adds a matching delay of length 28 in the parallel path. To see the latency of the square root operation, double-click the NFP Sqrt block. The Delay length of the Sqrt_pd1 block corresponds to the operator latency.
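
As an alternative to opening each block, the delay lengths in the generated model can be listed from the command line. This sketch assumes that HDL code has already been generated and that the matching delays appear in the generated model as Delay blocks (BlockType 'Delay'); delays inside masked operator blocks may not be listed.

```matlab
% Assumes the generated model exists from an earlier code generation run.
gm = 'gm_hdlcoder_nfp_delay_allocation';
load_system(gm)

% Find the Delay blocks in the generated model and print each length.
delayBlocks = find_system(gm, 'BlockType', 'Delay');
for k = 1:numel(delayBlocks)
    fprintf('%s: DelayLength = %s\n', delayBlocks{k}, ...
        get_param(delayBlocks{k}, 'DelayLength'));
end
```

If the assumption holds, the Sqrt_pd1 entry in this list should report the same value as the operator latency.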

You can customize the latency of your design. Use custom latency settings to design for trade-offs between latency and throughput. You can then optimize your design implementation on the target FPGA device for area and speed. Customize the latency by using:

Latency Strategy setting: Specify whether to map your entire Simulink model or individual blocks in your model to maximum, minimum, or zero latency of the floating-point operator.
Custom Latency: You can specify a custom latency for certain blocks that you use in your Simulink model. The custom latency setting can take values from zero to the maximum latency of the floating-point operator.
Oversampling factor: Increasing the Oversampling factor operates the design at a faster clock rate and absorbs the clock-rate pipelines with the latency of the floating-point operator.
Delay blocks in the model: If your Simulink model has a latency, HDL Coder can absorb some or all of the latency with the native floating-point implementation.

Latency Strategy Setting for Model

You can specify the latency strategy setting for an entire model or for individual blocks in your model.

To specify this setting for a model:

In the hdlcoder_nfp_delay_allocation model, right-click the DUT Subsystem and select HDL Code > HDL Coder Properties.
On the HDL Code Generation > Floating Point pane, select Use Floating Point.
For Latency Strategy, select MAX, MIN, or ZERO.

To specify this setting from the command line:

1. Create a hdlcoder.FloatingPointTargetConfig object for native floating point by using the hdlcoder.createFloatingPointTargetConfig function.

nfpconfig = hdlcoder.createFloatingPointTargetConfig('NativeFloatingPoint');
hdlset_param('hdlcoder_nfp_delay_allocation', 'FloatingPointTargetConfiguration', nfpconfig);

2. Specify the latency strategy by using the LatencyStrategy property of the nfpconfig object.

nfpconfig.LibrarySettings.LatencyStrategy = 'MAX'

nfpconfig =
  FloatingPointTargetConfig with properties:

                  Library: 'NATIVEFLOATINGPOINT'
          LibrarySettings: [1x1 fpconfig.NFPLatencyDrivenMode]
                 IPConfig: [1x1 hdlcoder.FloatingPointTargetConfig.IPConfig]
            VendorLibrary: []
    VendorLibrarySettings: []
           VendorIPConfig: []

To see the latency information, generate HDL code and then open the generated model. To open the generated model, enter the command gm_hdlcoder_nfp_delay_allocation.
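
The command-line steps above can be combined into one script. This is a sketch under the same assumptions as the example (HDL Coder installed, example model on the path). It sets the strategy before attaching the configuration to the model, and uses 'MIN' only as an illustration.

```matlab
model = 'hdlcoder_nfp_delay_allocation';
load_system(model)

% Create a native floating-point configuration and choose a strategy.
nfpconfig = hdlcoder.createFloatingPointTargetConfig('NativeFloatingPoint');
nfpconfig.LibrarySettings.LatencyStrategy = 'MIN';   % or 'MAX' / 'ZERO'

% Attach the configuration to the model, then regenerate code.
hdlset_param(model, 'FloatingPointTargetConfiguration', nfpconfig);
makehdl([model '/DUT'])

% Inspect the resulting operator latency in the generated model.
open_system(['gm_' model])
```

Rerunning the script with a different strategy and comparing the generated models is one way to see how the setting changes the operator latency and the matching delay.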

Custom Latency Strategy for Blocks

For blocks in your Simulink model, you can selectively customize the latency strategy. By default, the blocks inherit the latency strategy setting you specify for the model. For certain blocks, you can specify a custom latency value that is between zero and the maximum latency of the floating-point operator.

By specifying a custom latency, you can customize your design for trade-offs between:

Clock frequency and power consumption: A higher latency value increases the maximum clock frequency (Fmax) that you can achieve, which increases the dynamic power consumption.
Oversampling factor and sampling frequency: A combination of higher latency value and higher oversampling factor increases the Fmax that you can achieve but reduces the sampling frequency.

To learn more about this setting and how to specify the latency strategy for a block, see LatencyStrategy.

For example, if you have an Add block in the parallel path in your model, you can specify a custom latency value of 2 for the Add block by entering these commands.

load_system('hdlcoder_nfp_delay_allocation_custom')
open_system('hdlcoder_nfp_delay_allocation_custom')
hdlset_param('hdlcoder_nfp_delay_allocation_custom/DUT/Add', 'LatencyStrategy', 'Custom')
hdlset_param('hdlcoder_nfp_delay_allocation_custom/DUT/Add', 'NFPCustomLatency', 2)

To see the latency information, generate HDL code and then open the generated model. To open the generated model, enter the command gm_hdlcoder_nfp_delay_allocation_custom. In the generated model, you see that the NFP Add block has a latency of 2.
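
Before generating code, it can help to confirm that the block-level settings were applied. The sketch below reads them back with hdlget_param and then shows how to return the block to the model-level setting. It assumes the commands above have already been run on the example model.

```matlab
blk = 'hdlcoder_nfp_delay_allocation_custom/DUT/Add';
load_system('hdlcoder_nfp_delay_allocation_custom')

% Read back the block-level settings applied above.
strategy = hdlget_param(blk, 'LatencyStrategy')
latency  = hdlget_param(blk, 'NFPCustomLatency')

% Return the block to the model-level strategy when done experimenting.
hdlset_param(blk, 'LatencyStrategy', 'Inherit')
```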

Custom Latency Settings for Native Floating-Point IPs

For a model that has a large number of floating-point operators, you can customize the latency for native floating-point IPs by setting the global custom latency for NFP operators. The customization applies to all operators in the model.

For example, if you have a model that has multiple Add and Product blocks, by default, the blocks inherit the latency strategy setting specified for the model. By using these commands, you can customize the latency of all the NFP Add blocks to 4 and all the NFP mul blocks to 3.

load_system('hdlcoder_nfp_delay_allocation_global_custom')
open_system('hdlcoder_nfp_delay_allocation_global_custom/Sample_DUT');
hdlset_param('hdlcoder_nfp_delay_allocation_global_custom', 'FloatingPointTargetConfiguration', ...
    hdlcoder.createFloatingPointTargetConfig('NativeFloatingPoint', 'IPConfig', ...
    {{'ADDSUB', 'SINGLE', 'CustomLatency', 4}, ...
     {'ADDSUB', 'DOUBLE', 'CustomLatency', 4}, ...
     {'MUL', 'SINGLE', 'CustomLatency', 3}, ...
     {'MUL', 'DOUBLE', 'CustomLatency', 3}}))

To see the latency information, generate HDL code and then open the generated model. To open the generated model, enter the command gm_hdlcoder_nfp_delay_allocation_global_custom. In the generated model, you see that all the NFP Add blocks have a latency of 4 and all the NFP mul blocks have a latency of 3.

For the list of keywords to use for native floating-point IPs in this command, refer to the table in Latency Values of Floating-Point Operators.
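
When many operators need a custom latency, the nested cell array passed as 'IPConfig' can be built in a loop instead of typed by hand. The sketch below uses plain MATLAB and reproduces the four entries from the command above. The variable names (ops, latencies, dataTypes, ipSettings) are illustrative and not part of the HDL Coder API.

```matlab
% Operator keywords and the custom latency wanted for each one.
ops       = {'ADDSUB', 'MUL'};
latencies = [4, 3];
dataTypes = {'SINGLE', 'DOUBLE'};

% Build one {keyword, data type, 'CustomLatency', value} entry per pair.
ipSettings = {};
for i = 1:numel(ops)
    for j = 1:numel(dataTypes)
        ipSettings{end+1} = {ops{i}, dataTypes{j}, ...
            'CustomLatency', latencies(i)}; %#ok<SAGROW>
    end
end

% ipSettings can then be passed as the 'IPConfig' argument, for example:
% cfg = hdlcoder.createFloatingPointTargetConfig('NativeFloatingPoint', ...
%     'IPConfig', ipSettings);
```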

Oversampling Factor

When you design the blocks in your Simulink model at the data rate, specify an Oversampling factor greater than one. The Oversampling factor inserts pipeline registers at a faster clock rate, which improves clock frequency and reduces area usage. To learn more about clock-rate pipelining, see Clock-Rate Pipelining.

To see the effect of Oversampling factor on the model, in the hdlcoder_nfp_delay_allocation model:

Add a Delay block with a Delay length of 1 at the output of the Sqrt block.
Right-click the DUT and select HDL Code > HDL Coder Properties.
On the HDL Code Generation > Global Settings pane, enter a value of 40 for Oversampling factor.

After HDL code generation, the generated model shows the NFP Sqrt block operating at a clock rate that is 40 times faster than the Sqrt block in your model. The NFP Sqrt block absorbed the Delay block in your Simulink model. The Delay block now operates at the clock rate. This implementation saves area by absorbing the additional latency, and improves timing by operating at the faster clock rate.
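
The oversampling factor can also be set from the command line. This sketch assumes that the Delay block from the first step has already been added to the model by hand, and that the model-level HDL parameter is named 'Oversampling'.

```matlab
model = 'hdlcoder_nfp_delay_allocation';
load_system(model)

% Set the oversampling factor on the model (same value as in the steps above).
hdlset_param(model, 'Oversampling', 40)

% Regenerate code and open the generated model to inspect the rates.
makehdl([model '/DUT'])
open_system(['gm_' model])
```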

Delay Absorption in the Model

If your Simulink model has a Delay block with sufficient Delay length that is adjacent to the operator, or separated from the operator only by a component that does not produce a nonzero output from a zero input, HDL Coder absorbs the delays as part of the operator latency. (A NOT Logical Operator block, for example, outputs a nonzero value for a zero input.)

If the Delay length is equal to the latency of the floating-point operator, HDL Coder absorbs the delays and does not introduce any additional latency.

In the hdlcoder_nfp_delay_allocation model:

Double-click the Delay block at the output of the Sqrt block and change the Delay length to 28.
Generate HDL code for the DUT Subsystem.
After HDL code generation, at the command line, enter gm_hdlcoder_nfp_delay_allocation to open the generated model.

In the generated model, you see that the NFP Sqrt block absorbs the Delay block adjacent to the Sqrt block in your original model. This delay absorption occurs because the operator latency is equal to the Delay length. The code generator therefore avoids the additional latency in your model.
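
The Delay length can also be changed programmatically, which is convenient when repeating the experiment with several values. In this sketch the block path hdlcoder_nfp_delay_allocation/DUT/Delay is hypothetical; replace it with the actual name of the Delay block at the output of the Sqrt block in your model.

```matlab
model    = 'hdlcoder_nfp_delay_allocation';
delayBlk = [model '/DUT/Delay'];   % hypothetical path; adjust to your block name
load_system(model)

% Match the Delay length to the latency of 28 of the square root operator.
set_param(delayBlk, 'DelayLength', '28')

% Regenerate code and open the generated model to check the absorption.
makehdl([model '/DUT'])
open_system(['gm_' model])
```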

If the Delay length is less than the operator latency, HDL Coder absorbs the available delays and balances parallel paths by adding matching delays.

In the hdlcoder_nfp_delay_allocation model:

Double-click the Delay block at the output of the Sqrt block and change the Delay length to 21.
Generate HDL code for the DUT Subsystem.
After HDL code generation, at the command line, enter gm_hdlcoder_nfp_delay_allocation to open the generated model.

You see that the NFP Sqrt block absorbed a Delay of length 21 and added a matching delay of length 7 in the parallel path because the square root operation requires 28 delays.

If the delay length is greater than the operator latency, the code generator absorbs a certain number of delays equal to the latency and the excess delays appear outside the operator.

In the hdlcoder_nfp_delay_allocation model:

Double-click the Delay block at the output of the Sqrt block and change the Delay length to 34.
Generate HDL code for the DUT Subsystem.
After HDL code generation, at the command line, enter gm_hdlcoder_nfp_delay_allocation to open the generated model.
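
The three cases in this section follow simple bookkeeping, sketched below in plain MATLAB. This is an illustration of the arithmetic described above, not HDL Coder's actual algorithm. The function names absorbed, matching, and excess are made up for this example, and the sketch assumes the parallel path has no delays of its own.

```matlab
opLatency = 28;                           % latency of the square root operator
absorbed  = @(d) min(d, opLatency);       % delays absorbed into the operator
matching  = @(d) max(opLatency - d, 0);   % delays added on the parallel path
excess    = @(d) max(d - opLatency, 0);   % delays left outside the operator

% Evaluate the three Delay lengths used in this section.
for d = [28 21 34]
    fprintf('Delay length %2d: absorbed %2d, matching %d, excess %d\n', ...
        d, absorbed(d), matching(d), excess(d));
end
```

Under this model, a Delay length of 21 gives 21 absorbed delays and a matching delay of 7, which agrees with the result described above. A Delay length of 34 gives 28 absorbed delays with 6 delays remaining outside the operator.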