grouptransform

Transform by group

Since R2018b

Description

Table Data

G = grouptransform(T,groupvars,method) returns transformed data in the place of the nongrouping variables in table or timetable T. The group-wise computations in method are applied to each nongrouping variable. Groups are defined by rows in the variables in groupvars that have the same unique combination of values. For example, G = grouptransform(T,"HealthStatus","norm") normalizes the data in T by health status using the 2-norm.

G = grouptransform(T,groupvars,groupbins,method) specifies to bin rows in groupvars according to binning scheme groupbins prior to grouping and appends the bins to the output table as additional variables. For example, G = grouptransform(T,"SaleDate","year","rescale") bins by sale year and scales the data in T to the range [0, 1].

G = grouptransform(___,datavars) specifies the table variables to apply the method to for either of the previous syntaxes.

G = grouptransform(___,Name,Value) specifies additional properties using one or more name-value arguments for any of the previous syntaxes. For example, G = grouptransform(T,"Temp","linearfill","ReplaceValues",false) appends the filled data as an additional variable of T instead of replacing the nongrouping variables.

Array Data

B = grouptransform(A,groupvars,method) returns transformed data in the place of column vectors in the input vector or matrix A. The group-wise computations in method are applied to all column vectors in A. Groups are defined by rows in the column vectors in groupvars that have the same unique combination of values.

B = grouptransform(A,groupvars,groupbins,method) specifies to bin rows in groupvars according to binning scheme groupbins prior to grouping.

B = grouptransform(___,Name,Value) specifies additional properties using one or more name-value arguments for either of the previous syntaxes for an input array.

[B,BG] = grouptransform(A,___) also returns the values of the grouping vectors or binned grouping vectors corresponding to the rows in B.

Examples

Create a timetable that contains a progress status for three teams.

    timeStamp = days([1 1 1 2 2 2 3 3 3]');
    teamNumber = [1 2 3 1 2 3 1 2 3]';
    percentComplete = [14.2 28.1 11.5 NaN NaN 19.3 46.1 51.2 30.3]';
    T = timetable(timeStamp,teamNumber,percentComplete)

    T = 9×2 timetable
        timeStamp    teamNumber    percentComplete
        _________    __________    _______________
        1 day        1             14.2
        1 day        2             28.1
        1 day        3             11.5
        2 days       1             NaN
        2 days       2             NaN
        2 days       3             19.3
        3 days       1             46.1
        3 days       2             51.2
        3 days       3             30.3

Fill missing status percentages, indicated with NaN, for each team using linear interpolation.

    G = grouptransform(T,"teamNumber","linearfill","percentComplete")

    G = 9×2 timetable
        timeStamp    teamNumber    percentComplete
        _________    __________    _______________
        1 day        1             14.2
        1 day        2             28.1
        1 day        3             11.5
        2 days       1             30.15
        2 days       2             39.65
        2 days       3             19.3
        3 days       1             46.1
        3 days       2             51.2
        3 days       3             30.3

To append the filled data to the original table instead of replacing the percentComplete variable, use ReplaceValues.

    Gappend = grouptransform(T,"teamNumber","linearfill","percentComplete","ReplaceValues",false)

    Gappend = 9×3 timetable
        timeStamp    teamNumber    percentComplete    linearfill_percentComplete
        _________    __________    _______________    __________________________
        1 day        1             14.2               14.2
        1 day        2             28.1               28.1
        1 day        3             11.5               11.5
        2 days       1             NaN                30.15
        2 days       2             NaN                39.65
        2 days       3             19.3               19.3
        3 days       1             46.1               46.1
        3 days       2             51.2               51.2
        3 days       3             30.3               30.3

Create a table of dates and corresponding profits.

    timeStamps = datetime([2017 3 4; 2017 3 2; 2017 3 15; 2017 3 10; ...
        2017 3 14; 2017 3 31; 2017 3 25; ...
        2017 3 29; 2017 3 21; 2017 3 18]);
    profit = [2032 3071 1185 2587 1998 2899 3112 909 2619 3085]';
    T = table(timeStamps,profit)

    T = 10×2 table
        timeStamps     profit
        ___________    ______
        04-Mar-2017    2032
        02-Mar-2017    3071
        15-Mar-2017    1185
        10-Mar-2017    2587
        14-Mar-2017    1998
        31-Mar-2017    2899
        25-Mar-2017    3112
        29-Mar-2017    909
        21-Mar-2017    2619
        18-Mar-2017    3085

Bin the dates by day name, and normalize the profits within each bin using the 2-norm.

    G = grouptransform(T,"timeStamps","dayname","norm")

    G = 10×3 table
        timeStamps     profit     dayname_timeStamps
        ___________    _______    __________________
        04-Mar-2017    0.42069    Saturday
        02-Mar-2017    1          Thursday
        15-Mar-2017    0.79344    Wednesday
        10-Mar-2017    0.66582    Friday
        14-Mar-2017    0.60654    Tuesday
        31-Mar-2017    0.74612    Friday
        25-Mar-2017    0.64428    Saturday
        29-Mar-2017    0.60864    Wednesday
        21-Mar-2017    0.79506    Tuesday
        18-Mar-2017    0.63869    Saturday

Create a vector of dates and a vector of corresponding profit values.

    timeStamps = datetime([2017 3 4; 2017 3 2; 2017 3 15; 2017 3 10; ...
        2017 3 14; 2017 3 31; 2017 3 25; ...
        2017 3 29; 2017 3 21; 2017 3 18]);
    profit = [2032 3071 1185 2587 1998 2899 3112 909 2619 3085]';

Bin the dates by day name, and normalize the profits within each bin using the 2-norm. Display the transformed data and the group that each value corresponds to.

    [normDailyProfit,dayName] = grouptransform(profit,timeStamps,"dayname","norm")

    normDailyProfit = 10×1
        0.4207
        1.0000
        0.7934
        0.6658
        0.6065
        0.7461
        0.6443
        0.6086
        0.7951
        0.6387

    dayName = 10×1 categorical
        Saturday
        Thursday
        Wednesday
        Friday
        Tuesday
        Friday
        Saturday
        Wednesday
        Tuesday
        Saturday

Input Arguments

T: Input table, specified as a table or timetable.

A: Input array, specified as a column vector or a group of column vectors stored as a matrix.

groupvars: Grouping variables or vectors, specified as one of these options:

  • For array input data, groupvars can be either a column vector with the same number of rows as A or a group of column vectors arranged in a matrix or cell array.

  • For table or timetable input data, groupvars indicates which variables to use to compute groups in the data. You can specify the grouping variables with any of the following indexing schemes.

    Indexing schemes and examples:

    Variable names:

    • A string, character vector, or cell array

    • A pattern object

    Examples:

    • "A" or 'A': A variable named A

    • ["A","B"] or {'A','B'}: Two variables named A and B

    • "Var"+digitsPattern(1): Variables named "Var" followed by a single digit

    Variable index:

    • An index number that refers to the location of a variable in the table

    • A vector of numbers

    • A logical vector. Typically, this vector is the same length as the number of variables, but you can omit trailing 0 or false values

    Examples:

    • 3: The third variable from the table

    • [2 3]: The second and third variables from the table

    • [false false true]: The third variable

    Function handle:

    • A function handle that takes a table variable as input and returns a logical scalar

    Example:

    • @isnumeric: All the variables containing numeric values

    Variable type:

    • A vartype subscript that selects variables of a specified type

    Example:

    • vartype("numeric"): All the variables containing numeric values

Example: grouptransform(T,"Var3",method)
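As a minimal sketch of grouping by more than one table variable, the following uses a small made-up table. The variable names Team, Shift, and Output are illustrative assumptions and do not come from this page. Rows that share the same combination of Team and Shift values form one group.

```matlab
% Hypothetical data: output counts per team and shift, with missing entries.
Team = ["a" "a" "a" "a" "b" "b"]';
Shift = ["day" "day" "night" "night" "day" "day"]';
Output = [4 NaN 10 NaN 7 NaN]';
T = table(Team,Shift,Output);

% Group by the unique combinations of Team and Shift, selected by name,
% and fill each missing value with the mean of its group.
G = grouptransform(T,["Team","Shift"],"meanfill");

% Select the same grouping variables by their position in the table.
Gidx = grouptransform(T,[1 2],"meanfill");
```

Both calls should produce the same result, because the name and index forms select the same two grouping variables.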

method: Transformation method, specified as one of these values:

Method          Description
"zscore"        Normalize data to have mean 0 and standard deviation 1
"norm"          Normalize data by 2-norm
"meancenter"    Normalize data to have mean 0
"rescale"       Rescale range to [0,1]
"meanfill"      Fill missing values with the mean of the group data
"linearfill"    Fill missing values by linear interpolation of nonmissing group data

You can also specify a function handle that returns one array whose first dimension has length 1 or has the same number of rows as the input data. If the function returns an array with first dimension length equal to 1, then grouptransform repeats that value so that the output has the same number of rows as the input.

Data Types: char | string | function_handle
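The two function-handle cases described above can be sketched as follows. The data and the variable names group and score are made up for illustration.

```matlab
% Hypothetical sample data: two groups of three scores each.
group = [1 1 1 2 2 2]';
score = [10 20 60 5 7 9]';

% @max returns one value per group (first dimension of length 1),
% so grouptransform repeats that value for every row in the group.
groupMax = grouptransform(score,group,@max);

% This handle returns as many rows as the group data it receives,
% so each row gets its own transformed value.
centered = grouptransform(score,group,@(x) x - median(x));
```

Here groupMax holds each group's maximum repeated on every row of that group, and centered holds each score minus its group median.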

groupbins: Binning scheme, specified as one of these options:

  • "none", indicating no binning

  • A list of bin edges, specified as a numeric vector, or a datetime vector for datetime grouping variables or vectors

  • A number of bins, specified as a positive integer scalar

  • A time duration, specified as a scalar of type duration or calendarDuration indicating bin widths (for datetime or duration grouping variables or vectors only)

  • A cell array listing the binning method for each grouping variable or vector

  • A time bin for datetime and duration grouping variables or vectors only, specified as one of these strings.

    Value             Description                                                                       Data Type
    "second"          Each bin is 1 second.                                                             datetime and duration
    "minute"          Each bin is 1 minute.                                                             datetime and duration
    "hour"            Each bin is 1 hour.                                                               datetime and duration
    "day"             Each bin is 1 calendar day. This value accounts for daylight saving time shifts.  datetime and duration
    "week"            Each bin is 1 calendar week.                                                      datetime only
    "month"           Each bin is 1 calendar month.                                                     datetime only
    "quarter"         Each bin is 1 calendar quarter.                                                   datetime only
    "year"            Each bin is 1 calendar year. This value accounts for leap days.                   datetime and duration
    "decade"          Each bin is 1 decade (10 calendar years).                                         datetime only
    "century"         Each bin is 1 century (100 calendar years).                                       datetime only
    "secondofminute"  Bins are seconds from 0 to 59.                                                    datetime only
    "minuteofhour"    Bins are minutes from 0 to 59.                                                    datetime only
    "hourofday"       Bins are hours from 0 to 23.                                                      datetime only
    "dayofweek"       Bins are days from 1 to 7. The first day of the week is Sunday.                   datetime only
    "dayname"         Bins are full day names, such as "Sunday".                                        datetime only
    "dayofmonth"      Bins are days from 1 to 31.                                                       datetime only
    "dayofyear"       Bins are days from 1 to 366.                                                      datetime only
    "weekofmonth"     Bins are weeks from 1 to 6.                                                       datetime only
    "weekofyear"      Bins are weeks from 1 to 54.                                                      datetime only
    "monthname"       Bins are full month names, such as "January".                                     datetime only
    "monthofyear"     Bins are months from 1 to 12.                                                     datetime only
    "quarterofyear"   Bins are quarters from 1 to 4.                                                    datetime only

When multiple grouping variables or vectors are specified, you can provide a single binning method that is applied to all grouping variables or vectors, or a cell array containing a binning method for each grouping variable or vector, such as {"none",[0 2 4 Inf]}.
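The following sketch illustrates binning with explicit bin edges and with a per-variable cell array of binning methods. The vectors age, team, and score are hypothetical data chosen for illustration.

```matlab
% Hypothetical data: ages, team numbers, and scores for six people.
age = [23 35 47 51 62 19]';
team = [1 1 1 1 2 2]';
score = [10 20 40 30 50 30]';

% Bin ages into [0,30), [30,50), and [50,Inf], then mean-center the
% scores within each age bin. BG holds the bin of each row.
[B,BG] = grouptransform(score,age,[0 30 50 Inf],"meancenter");

% Two grouping vectors in a cell array: leave team unbinned and bin age
% with the same edges, using one binning method per grouping vector.
B2 = grouptransform(score,{team,age},{"none",[0 30 50 Inf]},"meancenter");
```

In the second call, a group is a unique combination of team and age bin, so only rows 2 and 3 (team 1, ages 35 and 47) share a group.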

datavars: Table variables to operate on, specified using one of the following indexing schemes. datavars indicates which variables of the input table or timetable to apply the methods to. Other variables not specified by datavars pass through to the output without being operated on. When datavars is not specified, grouptransform operates on each nongrouping variable.

Indexing schemes and examples:

Variable names:

  • A string, character vector, or cell array

  • A pattern object

Examples:

  • "A" or 'A': A variable named A

  • ["A","B"] or {'A','B'}: Two variables named A and B

  • "Var"+digitsPattern(1): Variables named "Var" followed by a single digit

Variable index:

  • An index number that refers to the location of a variable in the table

  • A vector of numbers

  • A logical vector. Typically, this vector is the same length as the number of variables, but you can omit trailing 0 or false values

Examples:

  • 3: The third variable from the table

  • [2 3]: The second and third variables from the table

  • [false false true]: The third variable

Function handle:

  • A function handle that takes a table variable as input and returns a logical scalar

Example:

  • @isnumeric: All the variables containing numeric values

Variable type:

  • A vartype subscript that selects variables of a specified type

Example:

  • vartype("numeric"): All the variables containing numeric values

Example: grouptransform(T,groupvars,method,["Var1" "Var2" "Var4"])
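As a brief sketch of selecting data variables by type, the following mean-centers only the numeric variables of a made-up table and lets the text variable pass through. The variable names Team, Label, Score, and Hours are illustrative assumptions.

```matlab
% Hypothetical table with one text variable that cannot be mean-centered.
Team = ["a" "a" "b" "b"]';
Label = ["w" "x" "y" "z"]';
Score = [1 3 10 20]';
Hours = [2 4 6 10]';
T = table(Team,Label,Score,Hours);

% Apply the method only to the numeric variables (Score and Hours).
% Label is not selected by datavars, so it passes through unchanged.
G = grouptransform(T,"Team","meancenter",vartype("numeric"));
```

Listing the names explicitly, as in grouptransform(T,"Team","meancenter",["Score","Hours"]), would select the same variables.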

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: G = grouptransform(T,groupvars,groupbins,"zscore",IncludedEdge="right")

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: G = grouptransform(T,groupvars,groupbins,"zscore","IncludedEdge","right")

IncludedEdge: Included bin edge, specified as either "left" or "right", indicating which end of the bin interval is inclusive.

This name-value argument can be specified only when groupbins is specified, and the value applies to all binning schemes for all grouping variables or vectors.
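The effect of the included edge is easiest to see when grouping values fall exactly on a bin edge. The following sketch uses made-up vectors x and y, and the quoted name-value form so that it also runs in releases before R2021a.

```matlab
% Hypothetical data: the grouping value 10 lies exactly on an interior edge.
x = [0 5 10 15 20]';
y = [1 2 3 4 5]';
edges = [0 10 20];

% Left edges inclusive: x = 10 belongs to the second bin.
[Bleft,binsLeft] = grouptransform(y,x,edges,"meancenter","IncludedEdge","left");

% Right edges inclusive: x = 10 belongs to the first bin.
[Bright,binsRight] = grouptransform(y,x,edges,"meancenter","IncludedEdge","right");
```

Comparing binsLeft and binsRight shows that only the third row changes bins, which in turn changes the group means used for centering.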

ReplaceValues: Replace values indicator, specified as one of these values:

  • true or 1: Replace nongrouping table variables or column vectors in the input data with table variables or column vectors containing transformed data.

  • false or 0: Append the input data with the table variables or column vectors containing transformed data.
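For array input, the two settings can be compared with a small sketch. The matrix A and grouping vector g are made-up data, and the column order noted in the comments follows from the description above (input data first, transformed data appended).

```matlab
% Hypothetical data: a 4-by-2 matrix with two groups of two rows each.
A = [1 10; 3 30; 5 20; 9 40];
g = [1 1 2 2]';

% Default behavior: the transformed columns replace the input columns.
Breplace = grouptransform(A,g,"meancenter");

% Append instead: the output keeps the input columns and adds the
% transformed columns after them, giving a 4-by-4 matrix.
Bappend = grouptransform(A,g,"meancenter","ReplaceValues",false);
```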

Output Arguments

G: Output table for table or timetable input data, returned as a table or timetable. G contains the transformed data for each group.

B: Output array for array input data, returned as a vector or matrix. B contains the transformed data in the place of the nongrouping vectors.

BG: Grouping vectors for array input data, returned as a column vector or cell array of column vectors. BG contains the unique grouping vector or binned grouping vector combinations that correspond to the rows in B.

Tips

  • When making many calls to grouptransform, consider converting grouping variables to type categorical or logical when possible for improved performance. For example, if you have a string array grouping variable (such as HealthStatus with elements "Poor", "Fair", "Good", and "Excellent"), you can convert it to a categorical variable using the command categorical(HealthStatus).
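The conversion suggested in this tip can be sketched as follows. The table contents are made up; only the HealthStatus name comes from the tip itself.

```matlab
% Hypothetical table with a string grouping variable.
HealthStatus = ["Poor" "Good" "Fair" "Good" "Poor" "Excellent"]';
Weight = [70 60 65 80 90 75]';
T = table(HealthStatus,Weight);

% Convert the grouping variable once, before repeated grouptransform calls.
T.HealthStatus = categorical(T.HealthStatus);

% Later calls group on the categorical variable.
G = grouptransform(T,"HealthStatus","meancenter");
```

The transformed values do not depend on the conversion. Only the type of the grouping variable changes.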

Version History

Introduced in R2018b
