
dummyvar

Create dummy variables

Description


D = dummyvar(group) returns a matrix D containing zeros and ones, whose columns are dummy variables for the grouping variables in group. Each column of group is a single grouping variable, with values indicating category levels. The rows of group represent observations across all variables.
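For a single numeric grouping variable, the result can be reproduced by hand, which may help clarify what each column means. The sketch below is a hypothetical illustration of the documented 1:max(group) behavior, not the toolbox implementation of dummyvar; it assumes MATLAB R2016b or later for implicit expansion, and the final comparison assumes Statistics and Machine Learning Toolbox is installed.

```matlab
% Hypothetical illustration: dummy variables for one numeric grouping
% variable, built by hand. Mirrors the documented 1:max(g) behavior only.
g = [2 1 1 1 2 3 3 2]';          % grouping variable with levels 1, 2, 3

levels = 1:max(g);               % numeric levels are assumed to be 1:max(g)

% Implicit expansion: comparing the 8-by-1 column g with the 1-by-3 row
% of levels gives an 8-by-3 logical matrix, one column per level
Dmanual = double(g == levels);

% Each row has exactly one 1, in the column of that observation's level
disp(Dmanual)

% Compare with the toolbox function (requires Statistics and Machine Learning Toolbox)
isequal(Dmanual, dummyvar(g))
```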

Examples


Create a column vector of categorical data specifying color types.

Colors = {'Red';'Blue';'Green';'Red';'Green';'Blue'};
Colors = categorical(Colors);

Create dummy variables for each color type.

D = dummyvar(Colors)
D = 6×3

     0     0     1
     1     0     0
     0     1     0
     0     0     1
     0     1     0
     1     0     0

The columns in D correspond to the levels in Colors. For example, the first column of D corresponds to the first level, 'Blue', in Colors.

Display the category levels of Colors.

categories(Colors)
ans = 3×1 cell
    {'Blue' }
    {'Green'}
    {'Red'  }
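Because categories returns the levels in the same order as the columns of D, one way to keep that correspondence visible is to label the columns. This is a minimal sketch using array2table from core MATLAB; the table name T is only for illustration, and dummyvar assumes Statistics and Machine Learning Toolbox.

```matlab
% Label each dummy column with the category level it encodes
Colors = categorical({'Red';'Blue';'Green';'Red';'Green';'Blue'});
D = dummyvar(Colors);

% categories(Colors) lists the levels in column order of D;
% transpose it to a row of variable names
T = array2table(D, 'VariableNames', categories(Colors)');

disp(T)
```

With the columns named, an expression such as T.Red selects the indicator for one level without relying on its position.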

Create a matrix group of data containing the effects of two machines and three operators on a process.

machine = [1 1 1 1 2 2 2 2]';
operator = [1 2 3 1 2 3 1 2]';
group = [machine operator]
group = 8×2

     1     1
     1     2
     1     3
     1     1
     2     2
     2     3
     2     1
     2     2

Create dummy variables for the data in group.

D = dummyvar(group)
D = 8×5

     1     0     1     0     0
     1     0     0     1     0
     1     0     0     0     1
     1     0     1     0     0
     0     1     0     1     0
     0     1     0     0     1
     0     1     1     0     0
     0     1     0     1     0

The first two columns of D represent observations of machine 1 and machine 2, respectively. The remaining columns represent observations of the three operators.

Create a cell array of phone types and a numeric vector of area codes.

phone = {'mobile';'landline';'mobile';'mobile';'mobile';'landline';'landline'};
codes = [802 802 603 603 802 603 802]';

Because the area code data has only two levels, 603 and 802 (rather than 802 levels corresponding to the integers 1:802), convert codes to a categorical vector.

newcodes = categorical(codes);

Combine the phone and newcodes grouping variables into the cell array group.

group = {phone,newcodes};

Create dummy variables for the groups in group.

D = dummyvar(group)
D = 7×4

     1     0     0     1
     0     1     0     1
     1     0     1     0
     1     0     1     0
     1     0     0     1
     0     1     1     0
     0     1     0     1

The first two columns of D correspond to the phone types ('mobile' in the first column and 'landline' in the second), and the last two columns correspond to the area codes 603 and 802.
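To see why the conversion to categorical matters in this example, the sketch below compares dummyvar applied to the raw numeric codes with dummyvar applied to the categorical version. It assumes Statistics and Machine Learning Toolbox, and the variable names Dnum and Dcat are only for illustration.

```matlab
codes = [802 802 603 603 802 603 802]';

% Numeric input: levels are assumed to be 1:max(codes), so there is one
% column for every integer from 1 to 802, almost all of them zero
Dnum = dummyvar(codes);
size(Dnum)                          % [7 802]

% Only the columns for 603 and 802 contain any ones
find(any(Dnum, 1))                  % [603 802]

% Categorical input: one column per level that occurs ('603', then '802')
Dcat = dummyvar(categorical(codes));
size(Dcat)                          % [7 2]
```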

Create dummy variables, and then decode them back into the original data.

Create a column vector of categorical data specifying color types.

colorsOriginal = ["red";"blue";"red";"green";"yellow";"blue"];
colorsOriginal = categorical(colorsOriginal)
colorsOriginal = 6×1 categorical
     red
     blue
     red
     green
     yellow
     blue

Determine the classes in the categorical vector.

classes = categories(colorsOriginal);

Create dummy variables for each color type by using the dummyvar function.

dummyColors = dummyvar(colorsOriginal)
dummyColors = 6×4

     0     0     1     0
     1     0     0     0
     0     0     1     0
     0     1     0     0
     0     0     0     1
     1     0     0     0

Decode the dummy variables in the second dimension by using the onehotdecode function.

colorsDecoded = onehotdecode(dummyColors,classes,2)
colorsDecoded = 6×1 categorical
     red
     blue
     red
     green
     yellow
     blue

The decoded variables match the original color types.
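Decoding a dummy row amounts to finding the column that holds the 1 and looking up the matching class. The sketch below does this with max instead of onehotdecode; it is an illustrative alternative, not a description of how onehotdecode works internally, and it assumes every row contains exactly one 1 (rows of NaN from missing data would be decoded incorrectly).

```matlab
colorsOriginal = categorical(["red";"blue";"red";"green";"yellow";"blue"]);
classes = categories(colorsOriginal);
dummyColors = dummyvar(colorsOriginal);

% max along dimension 2 returns, for each row, the column index of the 1
[~, idx] = max(dummyColors, [], 2);

% Look up the class names and rebuild a categorical vector that keeps
% the original list and order of categories
colorsDecoded = categorical(classes(idx), classes);

isequal(colorsDecoded, colorsOriginal)
```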

Input Arguments

group

Grouping variables, specified as a positive integer vector or categorical column vector representing levels within a single variable, a cell array containing one or more grouping variables, or a positive integer matrix representing levels within multiple variables.

If group is a categorical vector, then the groups and their order match the output of the categories function applied to group. If group is a numeric vector, then dummyvar assumes that the groups and their order are 1:max(group), so D contains a column for every integer from 1 to max(group), including levels that never occur. In this respect, dummyvar treats a numeric grouping variable differently from grp2idx, which numbers only the distinct values present. For information on the order of groups within grouping variables, see Grouping Variables.

Example: [2 1 1 1 2 3 3 2]'

Example: {Origin,Cylinders}

Data Types: single | double | categorical | cell
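The difference between dummyvar and grp2idx for numeric input shows up as soon as a level is skipped. This minimal sketch assumes Statistics and Machine Learning Toolbox, which provides both functions.

```matlab
g = [1; 3];                  % level 2 never occurs

% dummyvar assumes levels 1:max(g), so it returns three columns;
% the column for level 2 is all zeros
D = dummyvar(g)              % [1 0 0; 0 0 1]

% grp2idx numbers only the distinct values that are present
idx = grp2idx(g)             % [1; 2]
```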

Output Arguments

D

Dummy variables, returned as an n-by-s numeric matrix, where n is the number of rows of group and s is the sum of the number of levels in each column of group. From left to right, the columns of D are dummy variables created from the first column of group, followed by dummy variables created from the second column of group, and so on.

Data Types: single | double
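Because the columns of D are laid out block by block, the columns belonging to each grouping variable can be recovered from the number of levels per variable. The sketch below assumes a numeric group matrix, where column j contributes max(group(:,j)) columns; the names nLevels, edges, Dmachine, and Doperator are hypothetical.

```matlab
machine  = [1 1 1 1 2 2 2 2]';
operator = [1 2 3 1 2 3 1 2]';
group = [machine operator];
D = dummyvar(group);                     % 8-by-5

% Number of dummy columns contributed by each column of group
nLevels = max(group, [], 1);             % [2 3]
edges   = [0 cumsum(nLevels)];           % [0 2 5]

% Split D into one block per grouping variable
Dmachine  = D(:, edges(1)+1:edges(2));   % columns 1-2
Doperator = D(:, edges(2)+1:edges(3));   % columns 3-5

% Within each block, every row contains exactly one 1
all(sum(Dmachine, 2) == 1) && all(sum(Doperator, 2) == 1)
```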

Tips

  • Use dummy variables in regression analysis and ANOVA to indicate values of categorical predictors.

  • dummyvar treats NaN values and undefined categorical levels in group as missing data and returns NaN values in D.

  • If you add a column of ones to D, the resulting matrix X = [ones(size(D,1),1) D] is rank deficient, because the dummy variables produced from any single column of group always sum to a column of ones. For the same reason, if group has multiple columns, then the matrix D itself is rank deficient: the dummy variables from each column of group sum to the same column of ones. Regression and ANOVA calculations often address this issue by eliminating one dummy variable from each group of dummy variables produced by a column of group, which implicitly sets the coefficient of each dropped column to zero.

  • If group is a numeric vector with levels that do not correspond exactly to the integers 1:max(group), first convert the data to a categorical vector by using categorical. You can then pass the result to dummyvar. For an example, see Create Dummy Variables from Multiple Grouping Variables.
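The rank deficiency described in these tips can be checked numerically with the machine and operator data from the examples. This sketch drops the first dummy variable of each grouping variable as one common choice of reference level; other choices work equally well, and dummyvar assumes Statistics and Machine Learning Toolbox.

```matlab
machine  = [1 1 1 1 2 2 2 2]';
operator = [1 2 3 1 2 3 1 2]';
D = dummyvar([machine operator]);          % 8-by-5

% Columns 1-2 and columns 3-5 each sum to a column of ones,
% so the five columns of D are linearly dependent
rank(D)                                    % 4

% Adding an intercept column does not increase the rank
X = [ones(size(D,1),1) D];                 % 8-by-6
rank(X)                                    % 4

% Drop the first dummy of each grouping variable (columns 1 and 3)
Xr = [ones(size(D,1),1) D(:, [2 4 5])];    % 8-by-4
rank(Xr)                                   % 4, full column rank
```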

Alternative Functionality

Alternatively, use onehotencode to encode data labels. Consider using onehotencode instead of dummyvar in these cases:

  • To encode a table of categorical data labels

  • To specify the dimension to expand for encoding the data labels
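For a categorical column vector, onehotencode with the feature dimension set to 2 appears to produce the same indicator matrix as dummyvar, differing only in data type. This sketch assumes Deep Learning Toolbox (for onehotencode) and Statistics and Machine Learning Toolbox (for dummyvar), and that onehotencode returns single values by default.

```matlab
colors = categorical(["red";"blue";"red";"green";"yellow";"blue"]);

% Expand along the second dimension, giving one column per category
encoded = onehotencode(colors, 2);   % 6-by-4

% dummyvar returns the same pattern of zeros and ones as a double matrix
Dv = dummyvar(colors);               % 6-by-4

isequal(double(encoded), Dv)
```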


Version History

Introduced before R2006a