grouptransform
Syntax
Description
Table Data
G = grouptransform(T,groupvars,method) returns transformed data in the place of the nongrouping variables in table or timetable T. The group-wise computations in method are applied to each nongrouping variable. Groups are defined by rows in the variables in groupvars that have the same unique combination of values. For example, G = grouptransform(T,"HealthStatus","norm") normalizes the data in T by health status using the 2-norm.
G = grouptransform(T,groupvars,groupbins,method) bins rows in groupvars according to the binning scheme groupbins before grouping and appends the bins to the output table as additional variables. For example, G = grouptransform(T,"SaleDate","year","rescale") bins by sale year and scales the data in T to the range [0, 1].
G = grouptransform(___,Name,Value) specifies additional properties using one or more name-value arguments for any of the previous syntaxes. For example, G = grouptransform(T,"Temp","linearfill","ReplaceValues",false) appends the filled data as an additional variable of T instead of replacing the nongrouping variables.
Array Data
B = grouptransform(A,groupvars,method) returns transformed data in the place of column vectors in the input vector or matrix A. The group-wise computations in method are applied to all column vectors in A. Groups are defined by rows in the column vectors in groupvars that have the same unique combination of values.
B = grouptransform(___,Name,Value) specifies additional properties using one or more name-value arguments for either of the previous syntaxes for an input array.
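As a rough illustration of the array syntaxes, the sketch below mean-centers a two-column matrix within two groups. The data values and the variable names A and g are made up for this example, and the second output BG follows the form used in the vector example later on this page.

```matlab
% Two data columns and one grouping vector (illustrative values only).
A = [1 10; 2 20; 3 30; 4 40];
g = [1; 1; 2; 2];

% Mean-center each column of A within each group.
% Rows 1-2 form group 1 and rows 3-4 form group 2.
[B,BG] = grouptransform(A,g,"meancenter");

% Column 1: [1 2] has mean 1.5 and [3 4] has mean 3.5.
% Column 2: [10 20] has mean 15 and [30 40] has mean 35.
% B is therefore [-0.5 -5; 0.5 5; -0.5 -5; 0.5 5].
disp(B)
```

B has the same size as A because the transformed values replace the input columns by default.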
Examples
Fill Missing Data by Group
Create a timetable that contains a progress status for three teams.
timeStamp = days([1 1 1 2 2 2 3 3 3]');
teamNumber = [1 2 3 1 2 3 1 2 3]';
percentComplete = [14.2 28.1 11.5 NaN NaN 19.3 46.1 51.2 30.3]';
T = timetable(timeStamp,teamNumber,percentComplete)

T=9×2 timetable
    timeStamp    teamNumber    percentComplete
    _________    __________    _______________
    1 day            1              14.2
    1 day            2              28.1
    1 day            3              11.5
    2 days           1               NaN
    2 days           2               NaN
    2 days           3              19.3
    3 days           1              46.1
    3 days           2              51.2
    3 days           3              30.3
Fill missing status percentages, indicated with NaN, for each team using linear interpolation.
G = grouptransform(T,"teamNumber","linearfill","percentComplete")

G=9×2 timetable
    timeStamp    teamNumber    percentComplete
    _________    __________    _______________
    1 day            1              14.2
    1 day            2              28.1
    1 day            3              11.5
    2 days           1             30.15
    2 days           2             39.65
    2 days           3              19.3
    3 days           1              46.1
    3 days           2              51.2
    3 days           3              30.3
To append the filled data to the original table instead of replacing the percentComplete variable, set ReplaceValues to false.
Gappend = grouptransform(T,"teamNumber","linearfill","percentComplete","ReplaceValues",false)

Gappend=9×3 timetable
    timeStamp    teamNumber    percentComplete    linearfill_percentComplete
    _________    __________    _______________    __________________________
    1 day            1              14.2                    14.2
    1 day            2              28.1                    28.1
    1 day            3              11.5                    11.5
    2 days           1               NaN                   30.15
    2 days           2               NaN                   39.65
    2 days           3              19.3                    19.3
    3 days           1              46.1                    46.1
    3 days           2              51.2                    51.2
    3 days           3              30.3                    30.3
Normalize Data by Day Name
Create a table of dates and corresponding profits.
timeStamps = datetime([2017 3 4; 2017 3 2; 2017 3 15; 2017 3 10;...
                       2017 3 14; 2017 3 31; 2017 3 25;...
                       2017 3 29; 2017 3 21; 2017 3 18]);
profit = [2032 3071 1185 2587 1998 2899 3112 909 2619 3085]';
T = table(timeStamps,profit)

T=10×2 table
    timeStamps     profit
    ___________    ______
    04-Mar-2017     2032
    02-Mar-2017     3071
    15-Mar-2017     1185
    10-Mar-2017     2587
    14-Mar-2017     1998
    31-Mar-2017     2899
    25-Mar-2017     3112
    29-Mar-2017      909
    21-Mar-2017     2619
    18-Mar-2017     3085
Bin the dates by day name and normalize the profits within each bin using the 2-norm.
G = grouptransform(T,"timeStamps","dayname","norm")

G=10×3 table
    timeStamps     profit     dayname_timeStamps
    ___________    _______    __________________
    04-Mar-2017    0.42069        Saturday
    02-Mar-2017          1        Thursday
    15-Mar-2017    0.79344        Wednesday
    10-Mar-2017    0.66582        Friday
    14-Mar-2017    0.60654        Tuesday
    31-Mar-2017    0.74612        Friday
    25-Mar-2017    0.64428        Saturday
    29-Mar-2017    0.60864        Wednesday
    21-Mar-2017    0.79506        Tuesday
    18-Mar-2017    0.63869        Saturday
Group Operations with Vector Data
Create a vector of dates and a vector of corresponding profit values.
timeStamps = datetime([2017 3 4; 2017 3 2; 2017 3 15; 2017 3 10;...
                       2017 3 14; 2017 3 31; 2017 3 25;...
                       2017 3 29; 2017 3 21; 2017 3 18]);
profit = [2032 3071 1185 2587 1998 2899 3112 909 2619 3085]';
Bin the dates by day name and normalize the profits within each bin using the 2-norm. Display the transformed data and the group that each value corresponds to.
[normDailyProfit,dayName] = grouptransform(profit,timeStamps,"dayname","norm")
normDailyProfit = 10×1

    0.4207
    1.0000
    0.7934
    0.6658
    0.6065
    0.7461
    0.6443
    0.6086
    0.7951
    0.6387

dayName = 10x1 categorical
     Saturday
     Thursday
     Wednesday
     Friday
     Tuesday
     Friday
     Saturday
     Wednesday
     Tuesday
     Saturday
Input Arguments
T — Input table
table | timetable
Input table, specified as a table or timetable.
A — Input array
column vector | matrix
Input array, specified as a column vector or a group of column vectors stored as a matrix.
groupvars — Grouping variables or vectors
scalar | vector | matrix | cell array | pattern | function handle | table vartype subscript
Grouping variables or vectors, specified as one of these options:
- For array input data, groupvars can be either a column vector with the same number of rows as A or a group of column vectors arranged in a matrix or cell array.
- For table or timetable input data, groupvars indicates which variables to use to compute groups in the data. You can specify the grouping variables with any of the options in this table.
Indexing Scheme | Examples |
---|---|
Variable names: a string, character vector, or cell array; or a pattern object | "A" or 'A' — A variable named A. ["A","B"] or {'A','B'} — Two variables named A and B. "Var"+digitsPattern(1) — Variables named "Var" followed by a single digit |
Variable index: an index number that refers to the location of a variable in the table; a vector of numbers; or a logical vector. Typically, this vector is the same length as the number of variables, but you can omit trailing 0 or false values | 3 — The third variable from the table. [2 3] — The second and third variables from the table. [false false true] — The third variable |
Function handle: a function handle that takes a table variable as input and returns a logical scalar | @isnumeric — All the variables containing numeric values |
Variable type: a vartype subscript that selects variables of a specified type | vartype("numeric") — All the variables containing numeric values |
Example: grouptransform(T,"Var3",method)
method — Transformation method
"zscore" | "norm" | "meancenter" | "rescale" | "meanfill" | "linearfill" | function handle
Transformation method, specified as one of these values:
Method | Description |
---|---|
"zscore" | Normalize data to have mean 0 and standard deviation 1 |
"norm" | Normalize data by 2-norm |
"meancenter" | Normalize data to have mean 0 |
"rescale" | Rescale range to [0,1] |
"meanfill" | Fill missing values with the mean of the group data |
"linearfill" | Fill missing values by linear interpolation of nonmissing group data |
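To make two of the built-in methods concrete, the following sketch compares "norm" and "rescale" against a by-hand calculation on a small vector. The data and the names x and g are assumptions chosen so that the arithmetic is easy to follow. The manual formulas are one reading of the descriptions in the table, not a statement of how the function is implemented.

```matlab
% Four values in two groups (illustrative data only).
x = [3; 4; 6; 8];
g = [1; 1; 2; 2];

% "norm": each value divided by the 2-norm of its own group.
Bnorm = grouptransform(x,g,"norm");
manualNorm = [x(1:2)/norm(x(1:2)); x(3:4)/norm(x(3:4))];

% "rescale": each group mapped so its minimum is 0 and its maximum is 1.
Brescale = grouptransform(x,g,"rescale");
rescaleGroup = @(v) (v - min(v))/(max(v) - min(v));
manualRescale = [rescaleGroup(x(1:2)); rescaleGroup(x(3:4))];

% The largest absolute differences should be at round-off level.
disp(max(abs(Bnorm - manualNorm)))
disp(max(abs(Brescale - manualRescale)))
```

Here the group norms are 5 and 10, so both groups normalize to 0.6 and 0.8, and both groups rescale to 0 and 1.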
You can also specify a function handle that returns one array whose first dimension has length 1 or has the same number of rows as the input data. If the function returns an array with first dimension length equal to 1, then grouptransform repeats that value so that the output has the same number of rows as the input.
Data Types: char | string | function_handle
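The two kinds of function handle described above can be sketched as follows. The anonymous function and the data are hypothetical and exist only to show the difference between a handle that returns one value per row and a handle that returns a single value per group.

```matlab
% Six values in two groups (illustrative data only).
x = [1; 2; 6; 3; 5; 10];
g = [1; 1; 1; 2; 2; 2];

% Handle returning one value per input row: subtract the group median.
% Group 1 median is 2 and group 2 median is 5.
Bcentered = grouptransform(x,g,@(v) v - median(v));

% Handle returning a single value per group: the group maximum is
% repeated for every row of that group.
Bmax = grouptransform(x,g,@max);

disp([Bcentered Bmax])
```

Bcentered is [-1; 0; 4; -2; 0; 5], and Bmax is [6; 6; 6; 10; 10; 10] because each group maximum is expanded to the number of rows in its group.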
groupbins — Binning scheme
"none" (default) | scalar | vector | cell array
Binning scheme, specified as one of these options:
- "none", indicating no binning
- A list of bin edges, specified as a numeric vector, or a datetime vector for datetime grouping variables or vectors
- A number of bins, specified as a positive integer scalar
- A time duration, specified as a scalar of type duration or calendarDuration indicating bin widths (for datetime or duration grouping variables or vectors only)
- A cell array listing binning methods for each grouping variable or vector
- A time bin for datetime and duration grouping variables or vectors only, specified as one of these strings.
Value | Description | Data Type |
---|---|---|
"second" | Each bin is 1 second. | datetime and duration |
"minute" | Each bin is 1 minute. | datetime and duration |
"hour" | Each bin is 1 hour. | datetime and duration |
"day" | Each bin is 1 calendar day. This value accounts for daylight saving time shifts. | datetime and duration |
"week" | Each bin is 1 calendar week. | datetime only |
"month" | Each bin is 1 calendar month. | datetime only |
"quarter" | Each bin is 1 calendar quarter. | datetime only |
"year" | Each bin is 1 calendar year. This value accounts for leap days. | datetime and duration |
"decade" | Each bin is 1 decade (10 calendar years). | datetime only |
"century" | Each bin is 1 century (100 calendar years). | datetime only |
"secondofminute" | Bins are seconds from 0 to 59. | datetime only |
"minuteofhour" | Bins are minutes from 0 to 59. | datetime only |
"hourofday" | Bins are hours from 0 to 23. | datetime only |
"dayofweek" | Bins are days from 1 to 7. The first day of the week is Sunday. | datetime only |
"dayname" | Bins are full day names, such as "Sunday". | datetime only |
"dayofmonth" | Bins are days from 1 to 31. | datetime only |
"dayofyear" | Bins are days from 1 to 366. | datetime only |
"weekofmonth" | Bins are weeks from 1 to 6. | datetime only |
"weekofyear" | Bins are weeks from 1 to 54. | datetime only |
"monthname" | Bins are full month names, such as "January". | datetime only |
"monthofyear" | Bins are months from 1 to 12. | datetime only |
"quarterofyear" | Bins are quarters from 1 to 4. | datetime only |
When multiple grouping variables or vectors are specified, you can provide a single binning method that is applied to all grouping variables or vectors, or a cell array containing a binning method for each grouping variable or vector, such as {"none",[0 2 4 Inf]}.
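A per-variable binning scheme such as {"none",[0 2 4 Inf]} might be used as in the sketch below. The table, its variable names (region, dose, response), and its values are hypothetical. The first grouping variable is used as-is, and the second is binned by the listed edges before groups are formed.

```matlab
% Hypothetical table with one text and one numeric grouping variable.
region   = ["east"; "east"; "east"; "west"; "west"; "west"];
dose     = [0.5; 1.5; 3; 1; 2.5; 5];
response = [10; 12; 20; 8; 14; 30];
T = table(region,dose,response);

% Group by region without binning and by dose binned at the edges 0, 2, 4, Inf.
G = grouptransform(T,["region","dose"],{"none",[0 2 4 Inf]},"meancenter");

% Only rows 1 and 2 share a group (east, dose between 0 and 2), so their
% responses 10 and 12 are centered around 11. Every other row is alone in
% its group and centers to 0.
disp(G)
```

The output table also carries the dose bins as an additional variable, as described for the groupbins syntax.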
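datavars — Table variables to operate on
scalar | vector | cell array | pattern | function handle | table vartype subscript
Table variables to operate on, specified as one of the options in this table. datavars indicates which variables of the input table or timetable to apply the methods to. Other variables not specified by datavars pass through to the output without being operated on. When datavars is not specified, grouptransform operates on each nongrouping variable.
Indexing Scheme | Examples |
---|---|
Variable names: a string, character vector, or cell array; or a pattern object | "A" or 'A' — A variable named A. ["A","B"] or {'A','B'} — Two variables named A and B. "Var"+digitsPattern(1) — Variables named "Var" followed by a single digit |
Variable index: an index number that refers to the location of a variable in the table; a vector of numbers; or a logical vector. Typically, this vector is the same length as the number of variables, but you can omit trailing 0 or false values | 3 — The third variable from the table. [2 3] — The second and third variables from the table. [false false true] — The third variable |
Function handle: a function handle that takes a table variable as input and returns a logical scalar | @isnumeric — All the variables containing numeric values |
Variable type: a vartype subscript that selects variables of a specified type | vartype("numeric") — All the variables containing numeric values |
Example: grouptransform(T,groupvars,method,["Var1" "Var2" "Var4"])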
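The pass-through behavior of datavars can be illustrated with a small sketch. The table and its variable names (team, score, weight) are made up for this example. Only score is named in datavars, so weight is expected to reach the output unchanged.

```matlab
% Hypothetical table with one grouping variable and two data variables.
team   = ["a"; "a"; "b"; "b"];
score  = [1; 3; 10; 20];
weight = [5; 7; 9; 11];
T = table(team,score,weight);

% Mean-center only the score variable within each team.
G = grouptransform(T,"team","meancenter","score");

% Team a has mean score 2 and team b has mean score 15, so score becomes
% [-1; 1; -5; 5]. The weight variable is not operated on.
disp(G)
```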
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Example: G = grouptransform(T,groupvars,groupbins,"zscore",IncludedEdge="right")
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: G = grouptransform(T,groupvars,groupbins,"zscore","IncludedEdge","right")
IncludedEdge — Included bin edge
"left" (default) | "right"
Included bin edge, specified as either "left" or "right", indicating which end of the bin interval is inclusive.
This name-value argument can be specified only when groupbins is specified, and the value applies to all binning schemes for all grouping variables or vectors.
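The effect of IncludedEdge shows up when a grouping value sits exactly on an interior bin edge. The sketch below uses made-up data in which the value 2 lies on the edge between two bins, and it assumes the usual convention that the outermost bin is closed at both ends so that 0 and 4 are always binned.

```matlab
% Grouping vector with a value (2) exactly on the interior edge.
x = [0; 1; 2; 3; 4];
v = [10; 20; 30; 40; 50];
edges = [0 2 4];

% Default "left": 2 falls in the second bin, together with 3 and 4.
Bleft = grouptransform(v,x,edges,@sum);

% "right": 2 falls in the first bin, together with 0 and 1.
Bright = grouptransform(v,x,edges,@sum,"IncludedEdge","right");

% Each row shows the sum of the group that the row belongs to.
disp([x Bleft Bright])
```

Under that assumption, the group sums are 30 and 120 with the left edge included, and 60 and 90 with the right edge included.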
ReplaceValues — Replace values indicator
true or 1 (default) | false or 0
Replace values indicator, specified as one of these values:
- true or 1 — Replace nongrouping table variables or column vectors in the input data with table variables or column vectors containing transformed data.
- false or 0 — Append the table variables or column vectors containing transformed data to the input data.
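For array input, the two settings of ReplaceValues can be compared with a short sketch. The data and variable names are illustrative assumptions. With false, the output is expected to keep the input column and gain one transformed column.

```matlab
% One data column in two groups (illustrative data only).
A = [1; 3; 10; 20];
g = [1; 1; 2; 2];

% Default (ReplaceValues is true): the output holds only transformed data.
Breplaced = grouptransform(A,g,"meancenter");

% ReplaceValues is false: the transformed column is appended to the input
% data, so the output has more columns than A.
Bappended = grouptransform(A,g,"meancenter","ReplaceValues",false);

disp(size(Breplaced))
disp(size(Bappended))
```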
Output Arguments
G — Output table
table | timetable
Output table for table or timetable input data, returned as a table or timetable. G contains the transformed data for each group.
B — Output array
vector | matrix
Output array for array input data, returned as a vector or matrix. B contains the transformed data in the place of the nongrouping vectors.
BG — Grouping vectors
column vector | cell array of column vectors
Grouping vectors for array input data, returned as a column vector or cell array of column vectors. BG contains the unique grouping vector or binned grouping vector combinations that correspond to the rows in B.
Tips
- When making many calls to grouptransform, consider converting grouping variables to type categorical or logical when possible for improved performance. For example, if you have a string array grouping variable (such as HealthStatus with elements "Poor", "Fair", "Good", and "Excellent"), you can convert it to a categorical variable using the command categorical(HealthStatus).
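One way to apply this tip is to convert the grouping variable once and then reuse the table across calls, as in the sketch below. The table contents are hypothetical, and no timing claim is made here. The sketch shows only the conversion pattern.

```matlab
% Hypothetical table with a string grouping variable.
HealthStatus = ["Poor"; "Good"; "Good"; "Fair"; "Excellent"; "Poor"];
Weight = [180; 150; 160; 170; 140; 200];
T = table(HealthStatus,Weight);

% Convert the grouping variable once, before the repeated calls.
T.HealthStatus = categorical(T.HealthStatus);

% Subsequent calls group on the categorical variable.
Gcentered = grouptransform(T,"HealthStatus","meancenter");
Gnormed   = grouptransform(T,"HealthStatus","norm");

disp(Gcentered)
```

Grouping results do not depend on the conversion: the "Poor" rows (180 and 200) center to -10 and 10, and the "Good" rows (150 and 160) center to -5 and 5.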
Extended Capabilities
Tall Arrays
Calculate with arrays that have more rows than fit in memory.
Usage notes and limitations:
- If A and groupvars are both tall matrices, then they must have the same number of rows.
- If the first input is a tall matrix, then groupvars can be a cell array containing tall grouping vectors.
- The groupvars and datavars arguments do not support function handles.
- If the method argument is a function handle, then it must be a valid input for splitapply operating on a tall array.
- When grouping by discretized datetime arrays, the categorical group names are different compared to in-memory grouptransform calculations.
For more information, see Tall Arrays.
C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.
Usage notes and limitations:
- Sparse inputs are not supported.
- Binning scheme is not supported for datetime or duration data.
- Input tables that contain multidimensional arrays are not supported.
- Computation methods must be constant.
- Grouping variables must be constant when the first input argument is a table.
- Data variables must be constant.
- Binning scheme specified as character vectors or strings must be constant.
- Name-value arguments must be constant.
- Computation methods cannot return sparse or multidimensional results.
Thread-Based Environment
Run code in the background using MATLAB® backgroundPool or accelerate code with Parallel Computing Toolbox™ ThreadPool.
This function fully supports thread-based environments. For more information, see Run MATLAB Functions in Thread-Based Environment.
Version History
Introduced in R2018b
R2022b: Code generation support
Generate C or C++ code for the grouptransform function. For usage notes and limitations, see C/C++ Code Generation.
R2022a: Improved performance with small group size
The grouptransform function shows improved performance, especially when the data count in each group is small.
For example, this code applies a group-wise transformation to a 5000-element column vector that is split into 500 groups of 10 elements each. The code is about 6.20x faster than in the previous release.
function timingGrouptransform
    data = (1:5000)';
    groups = repelem(1:length(data)/10,10)';
    p = randperm(length(data));
    data = data(p);
    groups = groups(p);
    tic
    for k = 1:290
        G = grouptransform(data,groups,"norm");
    end
    toc
end
The approximate execution times are:
R2021b: 6.26 s
R2022a: 1.01 s
The code was timed on a Windows® 10, Intel® Xeon® CPU E5-1650 v4 @ 3.60 GHz test system by calling the timingGrouptransform function.