Complex Partial-Systolic Matrix Solve Using Q-less QR Decomposition with Forgetting Factor
Compute the value of X in the equation A'AX = B for complex-valued matrices with an infinite number of rows using Q-less QR decomposition
Since R2020b
![](http://www.tatmou.com/help/fixedpoint/ref/block_complexpartialsystolicmatrixsolveusingqlessqrdecompositionwithff.png)
Libraries:
Fixed-Point Designer HDL Support / Matrices and Linear Algebra / Linear System Solvers
Description
The Complex Partial-Systolic Matrix Solve Using Q-less QR Decomposition with Forgetting Factor block solves the system of linear equations, A'AX = B, using Q-less QR decomposition, where A and B are complex-valued matrices. A is an infinitely tall matrix representing streaming data.
When the regularization parameter is nonzero, the Complex Partial-Systolic Matrix Solve Using Q-less QR Decomposition with Forgetting Factor block initializes the first upper-triangular factor R to λI_n before factoring in the rows of A, where λ is the regularization parameter and I_n = eye(n).
Examples
Ports
Input
A(i,:) — Rows of matrix A
vector
Rows of matrix A, specified as a vector. A is an infinitely tall matrix of streaming data. If B is single or double, A must be the same data type as B. If A is a fixed-point data type, A must be signed, use binary-point scaling, and have the same word length as B. Slope-bias representation is not supported for fixed-point data types.
Data Types: single | double | fixed point
Complex Number Support: Yes
B — Matrix B
matrix | vector
Matrix B, specified as a vector or a matrix. B is an n-by-p matrix where n ≥ 2. If A is single or double, B must be the same data type as A. If B is a fixed-point data type, B must be signed, use binary-point scaling, and have the same word length as A. Slope-bias representation is not supported for fixed-point data types.
Data Types: single | double | fixed point
validInA — Whether A input is valid
Boolean scalar
Whether the A(i,:) input is valid, specified as a Boolean scalar. This control signal indicates when the data from the A(i,:) input port is valid. When this value is 1 (true) and the readyA value is 1 (true), the block captures the values at the A(i,:) input port. When this value is 0 (false), the block ignores the input samples.
After sending a true validInA signal, there may be some delay before readyA is set to false. To ensure all data is processed, you must wait until readyA is set to false before sending another true validInA signal.
Data Types: Boolean
validInB — Whether input B is valid
Boolean scalar
Whether input B is valid, specified as a Boolean scalar. This control signal indicates when the data from the B input port is valid. When this value is 1 (true) and the readyB value is 1 (true), the block captures the values at the B input port. When this value is 0 (false), the block ignores the input samples.
After sending a true validInB signal, there may be some delay before readyB is set to false. To ensure all data is processed, you must wait until readyB is set to false before sending another true validInB signal.
Data Types: Boolean
restart — Whether to clear internal states
Boolean scalar
Whether to clear internal states, specified as a Boolean scalar. When this value is 1 (true), the block stops the current calculation and clears all internal states. When this value is 0 (false) and the validInA and validInB values are 1 (true), the block begins a new subframe.
Data Types: Boolean
Output
X — Matrix X
matrix | vector
Matrix X, returned as a matrix or vector.
Data Types: single | double | fixed point
validOut — Whether output data is valid
Boolean scalar
Whether the output data is valid, returned as a Boolean scalar. This control signal indicates when the data at the output port X is valid. When this value is 1 (true), the block has successfully computed a row of X. When this value is 0 (false), the output data is not valid.
Data Types: Boolean
readyA — Whether block is ready for input A
Boolean scalar
Whether the block is ready for input A, returned as a Boolean scalar. This control signal indicates when the block is ready for new input data. When this value is 1 (true) and the validInA value is 1 (true), the block accepts input data in the next time step. When this value is 0 (false), the block ignores input data in the next time step.
After sending a true validInA signal, there may be some delay before readyA is set to false. To ensure all data is processed, you must wait until readyA is set to false before sending another true validInA signal.
Data Types: Boolean
readyB — Whether block is ready for input B
Boolean scalar
Whether the block is ready for input B, returned as a Boolean scalar. This control signal indicates when the block is ready for new input data. When this value is 1 (true) and the validInB value is 1 (true), the block accepts input data in the next time step. When this value is 0 (false), the block ignores input data in the next time step.
After sending a true validInB signal, there may be some delay before readyB is set to false. To ensure all data is processed, you must wait until readyB is set to false before sending another true validInB signal.
Data Types: Boolean
Parameters
Number of columns in matrix A and rows in matrix B — Number of columns in matrix A and rows in matrix B
4 (default) | positive integer-valued scalar
Number of columns in matrix A and rows in matrix B, specified as a positive integer-valued scalar.
Programmatic Use
Block Parameter: n
Type: character vector
Values: positive integer-valued scalar
Default: 4
Number of columns in matrix B — Number of columns in matrix B
1 (default) | positive integer-valued scalar
Number of columns in matrix B, specified as a positive integer-valued scalar.
Programmatic Use
Block Parameter: p
Type: character vector
Values: positive integer-valued scalar
Default: 1
Forgetting factor — Forgetting factor applied after each row of the matrix is factored
0.99 (default) | real positive scalar
Forgetting factor applied after each row of the matrix is factored, specified as a real positive scalar. The output is updated as each row of A is input indefinitely.
Programmatic Use
Block Parameter: forgettingFactor
Type: character vector
Values: real positive scalar
Default: 0.99
Regularization parameter — Regularization parameter
0 (default) | real nonnegative scalar
Regularization parameter, specified as a real nonnegative scalar. Small, positive values of the regularization parameter can improve the conditioning of the problem and reduce the variance of the estimates. Although the regularized estimate is biased, its reduced variance often results in a smaller mean squared error than that of the least-squares estimate.
Programmatic Use
Block Parameter: regularizationParameter
Type: character vector
Values: real nonnegative scalar
Default: 0
Output datatype — Data type of output matrix X
fixdt(1,18,14) (default) | double | single | fixdt(1,16,0) | data type expression
Data type of the output matrix X, specified as fixdt(1,18,14), double, single, fixdt(1,16,0), or as a user-specified data type expression. The type can be specified directly, or expressed as a data type object such as Simulink.NumericType.
Programmatic Use
Block Parameter: OutputType
Type: character vector
Values: 'fixdt(1,18,14)' | 'double' | 'single' | 'fixdt(1,16,0)' | data type expression
Default: 'fixdt(1,18,14)'
Tips
- Use fixed.forgettingFactor to compute the forgetting factor, α, for an infinite number of rows with the equivalent gain of a matrix with m rows.
- Use fixed.forgettingFactorInverse to compute the number of rows, m, of a matrix with equivalent gain corresponding to forgetting factor α.
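The relationship between α and m can be illustrated with a short Python sketch. It assumes the gain of the infinite, exponentially weighted stream is modeled by the integral of α^(2x) over x ≥ 0, which equals -1/(2 ln α), and sets that equal to m. The function names are hypothetical stand-ins, and the toolbox functions may use a different model or return type.

```python
import math

def forgetting_factor(m):
    """Forgetting factor whose infinite-stream gain matches m equally weighted rows.

    Assumed model: integral of alpha**(2*x) for x >= 0 equals m,
    which gives alpha = exp(-1 / (2*m)).
    """
    if m <= 0:
        raise ValueError("m must be positive")
    return math.exp(-1.0 / (2.0 * m))

def forgetting_factor_inverse(alpha):
    """Equivalent number of rows m for a forgetting factor 0 < alpha < 1."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return -1.0 / (2.0 * math.log(alpha))

alpha = forgetting_factor(16)
print(alpha, forgetting_factor_inverse(alpha))
```

Under this model, a larger m gives an α closer to 1, so older rows are forgotten more slowly.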
Algorithms
Q-less QR Decomposition with Forgetting Factor
The Complex Partial-Systolic Matrix Solve Using Q-less QR Decomposition with Forgetting Factor block implements the following recursion to compute the upper-triangular factor R of continuously streaming 1-by-n row vectors A(k,:) using forgetting factor α. It is as if matrix A is infinitely tall. A forgetting factor in the range 0 < α < 1 prevents the factor from integrating without bound.
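A form of the recursion consistent with this description is sketched below in MATLAB-style notation, where qr(·, 0) denotes the economy-size decomposition. It assumes a zero initial factor and that the forgetting factor is applied after each row is factored in.

```latex
\begin{aligned}
R_0 &= \mathrm{zeros}(n, n) \\
[\sim, R_1] &= \mathrm{qr}\left(\begin{bmatrix} R_0 \\ A(1,:) \end{bmatrix}, 0\right), \quad R_1 \leftarrow \alpha R_1 \\
[\sim, R_2] &= \mathrm{qr}\left(\begin{bmatrix} R_1 \\ A(2,:) \end{bmatrix}, 0\right), \quad R_2 \leftarrow \alpha R_2 \\
&\;\;\vdots \\
[\sim, R_k] &= \mathrm{qr}\left(\begin{bmatrix} R_{k-1} \\ A(k,:) \end{bmatrix}, 0\right), \quad R_k \leftarrow \alpha R_k
\end{aligned}
```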
Q-less QR Decomposition with Forgetting Factor and Tikhonov Regularization
The output X_k after processing the kth input A(k,:) is computed using the following iteration.
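One way to write this iteration, assuming the initial factor is λI_n (as stated in the Description) and the forgetting factor α is applied after each row is factored in, is:

```latex
\begin{aligned}
R_0 &= \lambda I_n \\
[\sim, R_k] &= \mathrm{qr}\left(\begin{bmatrix} R_{k-1} \\ A(k,:) \end{bmatrix}, 0\right), \quad R_k \leftarrow \alpha R_k, \qquad k = 1, 2, \ldots \\
X_k &= R_k \backslash \left( R_k' \backslash B \right)
\end{aligned}
```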
This is mathematically equivalent to computing A_k'A_kX = B, where A_k is defined as follows, though the block never actually creates A_k.
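Assuming R_0 = λI_n and scaling by α after every row, unrolling the recursion gives the following stacked matrix. Each step multiplies the accumulated Gram matrix by α², so row i ends up weighted by α^(k-i+1) and the regularization term by α^k.

```latex
A_k =
\begin{bmatrix}
\alpha^{k} \lambda I_n \\
\alpha^{k} A(1,:) \\
\alpha^{k-1} A(2,:) \\
\vdots \\
\alpha A(k,:)
\end{bmatrix},
\qquad
R_k' R_k = A_k' A_k
= \alpha^{2k} \lambda^2 I_n + \sum_{i=1}^{k} \alpha^{2(k-i+1)} A(i,:)' A(i,:)
```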
Forward and Backward Substitution
When an upper-triangular factor is ready, forward and backward substitution are computed with the current input B to produce output X.
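The whole flow (row-by-row update of R, then forward and backward substitution) can be sketched in floating-point Python. This is an illustrative model only, not the block's fixed-point hardware implementation. It assumes R starts at λI_n, that each row is folded in with complex Givens rotations, and that R is scaled by α after every row. All function names are hypothetical.

```python
import math

def qless_qr_update(R, row, alpha):
    """Fold one 1-by-n row into upper-triangular R, then scale the result by alpha."""
    n = len(R)
    R = [[complex(e) for e in r] for r in R]        # work on a copy
    v = [complex(e) for e in row]
    for j in range(n):
        a, b = R[j][j], v[j]
        norm = math.hypot(abs(a), abs(b))
        if norm == 0.0:
            continue                                # nothing to rotate in this column
        c, s = a / norm, b / norm
        # Unitary rotation [[conj(c), conj(s)], [-s, c]] maps (a, b) to (norm, 0).
        for k in range(j, n):
            x, y = R[j][k], v[k]
            R[j][k] = c.conjugate() * x + s.conjugate() * y
            v[k] = c * y - s * x
    return [[alpha * e for e in r] for r in R]

def solve_with_factor(R, B):
    """Solve R'R X = B: forward substitution with R', then back substitution with R."""
    n, p = len(R), len(B[0])
    Y = [[0j] * p for _ in range(n)]
    for i in range(n):                              # R' is lower triangular
        for col in range(p):
            acc = B[i][col] - sum(R[k][i].conjugate() * Y[k][col] for k in range(i))
            Y[i][col] = acc / R[i][i].conjugate()
    X = [[0j] * p for _ in range(n)]
    for i in reversed(range(n)):                    # R is upper triangular
        for col in range(p):
            acc = Y[i][col] - sum(R[i][k] * X[k][col] for k in range(i + 1, n))
            X[i][col] = acc / R[i][i]
    return X

def stream_solve(rows, B, alpha, lam):
    """Process all rows and return the final R and X (the block emits X continuously)."""
    n = len(B)
    R = [[complex(lam) if i == j else 0j for j in range(n)] for i in range(n)]
    for row in rows:
        R = qless_qr_update(R, row, alpha)
    return R, solve_with_factor(R, B)

rows = [[1 + 1j, 2], [0.5j, 1 - 1j], [3, -1j]]
B = [[1 + 0j], [1j]]
R, X = stream_solve(rows, B, alpha=0.9, lam=0.1)
print(X)
```

With λ = 0, the substitution divides by zero until at least n linearly independent rows have been processed, so this sketch is only safe to call once R has full rank.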
Choosing the Implementation Method
Partial-systolic implementations prioritize speed of computations over space constraints, while burst implementations prioritize space constraints at the expense of speed of the operations. The following table illustrates the tradeoffs between the implementations available for matrix decompositions and solving systems of linear equations.
Implementation | Ready | Latency | Area |
---|---|---|---|
Systolic | C | O(n) | O(mn^2) |
Partial-Systolic | C | O(m) | O(n^2) |
Partial-Systolic with Forgetting Factor | C | O(n) | O(n^2) |
Burst | O(n) | O(mn^2) | O(n) |
In the table, C is a constant proportional to the word length of the data, m is the number of rows in matrix A, and n is the number of columns in matrix A.
For additional considerations in selecting a block for your application, see Choose a Block for HDL-Optimized Fixed-Point Matrix Operations.
AMBA AXI Handshake Process
This block uses the AMBA AXI handshake protocol [1]. The valid/ready handshake process is used to transfer data and control information. This two-way control mechanism allows both the manager and subordinate to control the rate at which information moves between them. A valid signal indicates when data is available. The ready signal indicates that the block can accept the data. Transfer of data occurs only when both the valid and ready signals are high.
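The transfer rule can be illustrated with a small cycle-by-cycle model in Python. This is a simplified sketch, not a model of the block itself: the function name and the signal traces are hypothetical, and each list holds one value per clock cycle.

```python
def handshake_transfers(valid, ready, data):
    """Return (cycle, item) pairs for cycles in which valid and ready are both high."""
    transferred = []
    for cycle, (v, r, d) in enumerate(zip(valid, ready, data)):
        if v and r:
            transferred.append((cycle, d))
    return transferred

# The source holds row "r0" until it is accepted in cycle 1, idles in cycle 2,
# and then presents row "r1" in cycle 3.
valid = [1, 1, 0, 1]
ready = [0, 1, 1, 1]
data = ["r0", "r0", None, "r1"]
print(handshake_transfers(valid, ready, data))
```

In cycle 0 the data is valid but the receiver is not ready, and in cycle 2 the receiver is ready but no valid data is offered, so neither cycle transfers anything.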
Block Timing
The Burst Matrix Solve Using Q-less QR Decomposition with Forgetting Factor blocks accept matrix A row-by-row and matrix B as a single vector. After accepting the first valid pair of A and B matrices, the block outputs the X matrices row by row continuously. The matrix is output from the first row to the last row.
For example, assume that the input A matrix is 3-by-3. Additionally assume that validIn asserts before ready, meaning that the upstream data source is faster than the QR decomposition.
In the figure, A1r1 is the first row of the first A matrix, A1r2 is the second row of the first A matrix, and so on.
- validIn to ready — From a successful A row input to the block being ready to accept the next row.
- validOut to validOut — Because the Forward Backward Substitution block runs continuously, it generates output at a constant rate. This is the delay between two adjacent valid outputs.
- nth row validIn to validOut — From the nth row input to the block starting to output the first solution.
This block is always ready to accept B matrices, so readyB is always asserted.
The Partial-Systolic Matrix Solve Using Q-less QR Decomposition with Forgetting Factor blocks accept matrix A row-by-row and matrix B as a single vector. After accepting the first valid pair of A and B matrices, the block outputs the X matrices row by row continuously.
For example, assume that the input A matrix is 3-by-3. Additionally assume that validIn asserts before ready, meaning that the upstream data source is faster than the QR decomposition.
In the figure, A1r1 is the first row of the first A matrix, A1r2 is the second row of the first A matrix, and so on.
- validIn to ready — From a successful A row input to the block being ready to accept the next row.
- validOut to validOut — Because the Forward Backward Substitution block runs continuously, it generates output at a constant rate. This is the delay between two adjacent valid outputs.
- Last row validIn to validOut — From the last (mth) row input to the block starting to output the solution.
This block is always ready to accept B matrices, so readyB is always asserted.
The following table provides details of the timing for the Burst Matrix Solve Using Q-less QR Decomposition with Forgetting Factor and Partial-Systolic Matrix Solve Using Q-less QR Decomposition with Forgetting Factor blocks.
Block | Operation | validIn to ready (cycles) | validOut to validOut (cycles) | nth row validIn to validOut (cycles) |
---|---|---|---|---|
Real Burst Matrix Solve Using Q-less QR Decomposition with Forgetting Factor | Asynchronous | (wl + 5)*n + 2 + n | 4*n^2 + 25*n + 5 + 2*n*wl + 2*n*nextpow2(wl) | 4*n^2 + 25*n + 5 + 2*n*wl + 2*n*nextpow2(wl) + (wl + 5)*n + n |
Complex Burst Matrix Solve Using Q-less QR Decomposition with Forgetting Factor | Asynchronous | (wl*2 + 11)*n + 2 + n | 4*n^2 + 25*n + 5 + 2*n*wl + 2*n*nextpow2(wl) | 4*n^2 + 25*n + 5 + 2*n*wl + 2*n*nextpow2(wl) + (wl*2 + 11)*n + n |
Real Partial-Systolic Matrix Solve Using Q-less QR Decomposition with Forgetting Factor | Asynchronous | wl + 7 | 4*n^2 + 25*n + 5 + 2*n*wl + 2*n*nextpow2(wl) | 4*n^2 + 25*n + 5 + 2*n*wl + 2*n*nextpow2(wl) + (wl + 6)*n + 2 |
Complex Partial-Systolic Matrix Solve Using Q-less QR Decomposition with Forgetting Factor | Asynchronous | wl + 9 | 4*n^2 + 25*n + 5 + 2*n*wl + 2*n*nextpow2(wl) | 4*n^2 + 25*n + 5 + 2*n*wl + 2*n*nextpow2(wl) + (wl + 7.5)*2*n + 2 |
In the table, m represents the number of rows in matrix A, n is the number of columns in matrix A, and wl represents the word length of A.
- If the data type of A is fixed point, then wl is the word length.
- If the data type of A is double, then wl is 53.
- If the data type of A is single, then wl is 24.
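As a worked example, the Complex Partial-Systolic row of the table can be evaluated in a few lines of Python. The function names are hypothetical, and nextpow2 here mimics the MATLAB function of the same name for positive integers (the smallest exponent e with 2^e ≥ x).

```python
def nextpow2(x):
    """Smallest integer e such that 2**e >= x, for a positive integer x."""
    return (x - 1).bit_length()

def complex_partial_systolic_timing(n, wl):
    """Cycle counts from the Complex Partial-Systolic row of the timing table."""
    valid_out_to_valid_out = 4 * n**2 + 25 * n + 5 + 2 * n * wl + 2 * n * nextpow2(wl)
    return {
        "validIn_to_ready": wl + 9,
        "validOut_to_validOut": valid_out_to_valid_out,
        # (wl + 7.5)*2*n + 2 is written as (2*wl + 15)*n + 2 to stay in integers.
        "nth_row_validIn_to_validOut": valid_out_to_valid_out + (2 * wl + 15) * n + 2,
    }

# n = 16 columns with a 16-bit fixed-point input.
print(complex_partial_systolic_timing(n=16, wl=16))
```

For n = 16 and wl = 16 the three entries evaluate to 25, 2069, and 2823 cycles, respectively.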
Hardware Resource Utilization
This block supports HDL code generation using the Simulink® HDL Workflow Advisor. For an example, see HDL Code Generation and FPGA Synthesis from Simulink Model (HDL Coder) and Implement Digital Downconverter for FPGA (DSP HDL Toolbox).
In R2022b: The following tables show the post place-and-route resource utilization results and timing summary, respectively.
This example data was generated by synthesizing the block on a Xilinx® Zynq® UltraScale™+ RFSoC ZCU111 evaluation board. The synthesis tool was Vivado® v.2020.2 (win64).
The following parameters were used for synthesis.
Block parameters:
n = 16
p = 1
Matrix A dimension: inf-by-16
Matrix B dimension: 16-by-1
Input data type: sfix16_En14
Target frequency: 250 MHz
Resource | Usage | Available | Utilization (%) |
---|---|---|---|
CLB LUTs | 334280 | 425280 | 78.60 |
CLB Registers | 261319 | 850560 | 30.72 |
DSPs | 12 | 4272 | 0.28 |
Block RAM Tile | 0 | 1080 | 0.00 |
URAM | 0 | 80 | 0.00 |
Timing | Value |
---|---|
Requirement | 4 ns |
Data Path Delay | 3.892 ns |
Slack | 0.088 ns |
Clock Frequency | 255.62 MHz |
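The percentages and the clock frequency in these tables can be cross-checked with simple arithmetic. The sketch below assumes that utilization is usage divided by availability and that the reported clock frequency is the reciprocal of the requirement minus the slack. That second assumption matches the numbers above but is not stated by the source. The function names are hypothetical.

```python
def utilization_percent(used, available):
    """Resource utilization as a percentage, rounded to two decimals."""
    return round(100.0 * used / available, 2)

def achieved_clock_mhz(requirement_ns, slack_ns):
    """Clock frequency in MHz implied by a timing requirement and positive slack."""
    return round(1000.0 / (requirement_ns - slack_ns), 2)

print(utilization_percent(334280, 425280))   # CLB LUTs
print(utilization_percent(261319, 850560))   # CLB Registers
print(utilization_percent(12, 4272))         # DSPs
print(achieved_clock_mhz(4.0, 0.088))        # clock frequency from the 4 ns requirement
```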
References
[1] "AMBA AXI and ACE Protocol Specification Version E." https://developer.arm.com/documentation/ihi0022/e/AMBA-AXI3-and-AXI4-Protocol-Specification/Single-Interface-Requirements/Basic-read-and-write-transactions/Handshake-process
Extended Capabilities
C/C++ Code Generation
Generate C and C++ code using Simulink® Coder™.
Slope-bias representation is not supported for fixed-point data types.
HDL Code Generation
Generate Verilog and VHDL code for FPGA and ASIC designs using HDL Coder™.
HDL Coder™ provides additional configuration options that affect HDL implementation and synthesized logic.
This block has one default HDL architecture.
General | |
---|---|
ConstrainedOutputPipeline | Number of registers to place at the outputs by moving existing delays within your design. Distributed pipelining does not redistribute these registers. The default is |
InputPipeline | Number of input pipeline stages to insert in the generated code. Distributed pipelining and constrained output pipelining can move these registers. The default is |
OutputPipeline | Number of output pipeline stages to insert in the generated code. Distributed pipelining and constrained output pipelining can move these registers. The default is |
Supports fixed-point data types only.
Version History
Introduced in R2020b
R2023a: Smart unrolling for improved resource utilization
This block depends on a partial-systolic QR decomposition block. Since R2023a, when you update the diagram, the loop that composes the partial-systolic pipeline in the QR decomposition block is unrolled. This updated internal architecture removes dead operations in simulation and generated code, thus requiring fewer hardware resources. This block simulates with clock and bit-true fidelity with respect to library versions of these blocks in previous releases.
R2022a: Support for Tikhonov regularization parameter
The Complex Partial-Systolic Matrix Solve Using Q-less QR Decomposition with Forgetting Factor block now supports the Tikhonov Regularization parameter.
R2021a: Reduced HDL resource utilization
This block now has an improved algorithm to reduce resource utilization on hardware-constrained target platforms.