crosstab

Cross-tabulation

Syntax

tbl = crosstab(x1,x2)
tbl = crosstab(x1,...,xn)
[tbl,chi2,p] = crosstab(___)
[tbl,chi2,p,labels] = crosstab(___)

Description

tbl = crosstab(x1,x2) returns a cross-tabulation, tbl, of two vectors of the same length, x1 and x2.

tbl = crosstab(x1,...,xn) returns a multi-dimensional cross-tabulation, tbl, of data for multiple input vectors, x1, x2, ..., xn.

[tbl,chi2,p] = crosstab(___) also returns the chi-square statistic, chi2, and its p-value, p, for a test that tbl is independent in each dimension. You can use any of the previous syntaxes.

[tbl,chi2,p,labels] = crosstab(___) also returns a cell array, labels, which contains one column of labels for each input argument, x1, ..., xn.

Examples

Create two sample data vectors, containing three and four distinct values, respectively.

x = [1 1 2 3 1]; y = [1 2 5 3 1];

Cross-tabulate x and y.

table = crosstab(x,y)

table =

     2     1     0     0
     0     0     0     1
     0     0     1     0

The rows in table correspond to the three distinct values in x, and the columns correspond to the four distinct values in y.

Generate two independent vectors, x1 and x2, each containing 50 discrete uniform random numbers in the range 1:3.

rng default; % for reproducibility
x1 = unidrnd(3,50,1);
x2 = unidrnd(3,50,1);

Cross-tabulate x1 and x2.

[table,chi2,p] = crosstab(x1,x2)

table =

     1     6     7
     5     5     2
    11     7     6

chi2 = 7.5449

p = 0.1097

The returned p value of 0.1097 indicates that, at the 5% significance level, crosstab fails to reject the null hypothesis that table is independent in each dimension.
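As a rough cross-check of these numbers, the following sketch recomputes the statistic by hand from the table shown above. It assumes the standard Pearson form (expected counts built from the row and column totals) and (r-1)(c-1) degrees of freedom for a two-way table. The names observed, expected, chi2Manual, and pManual are illustrative and are not part of crosstab.

```matlab
% Observed counts, copied from the table output above.
observed = [1 6 7; 5 5 2; 11 7 6];

n = sum(observed(:));            % total number of observations
rowTotals = sum(observed,2);     % 3-by-1 column of row totals
colTotals = sum(observed,1);     % 1-by-3 row of column totals

% Expected counts under independence: outer product of the margins over n.
expected = rowTotals*colTotals/n;

% Pearson chi-square statistic and its degrees of freedom (assumed form).
chi2Manual = sum((observed(:) - expected(:)).^2 ./ expected(:));
df = (size(observed,1) - 1)*(size(observed,2) - 1);

% Upper-tail probability of the chi-square distribution.
pManual = 1 - chi2cdf(chi2Manual,df);
```

Under these assumptions, chi2Manual and pManual should agree with the displayed chi2 and p to the printed precision.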

Load the sample data, which contains measurements of large model cars during the years 1970-1982.

load carbig

Cross-tabulate the data of four-cylinder cars (cyl4) based on model year (when) and country of origin (org).

[table,chi2,p,labels] = crosstab(cyl4,when,org);

Use labels to determine the index location in table for the number of four-cylinder cars made in the USA during the late period of the data.

labels

labels = 3×3 cell array

    'Other'    'Early'    'USA'
    'Four'     'Mid'      'Europe'
    []         'Late'     'Japan'

The first column of labels corresponds to the data in cyl4, and indicates that row 2 of table contains data on cars with four cylinders. The second column of labels corresponds to the data in when, and indicates that column 3 of table contains data on cars made during the late period. The third column of labels corresponds to the data in org, and indicates that location 1 of the third dimension of table contains data on cars made in the USA.

Therefore, table(2,3,1) contains the number of four-cylinder cars made in the USA during the late period.

table(2,3,1)
ans = 38

The data contains 38 four-cylinder cars made in the USA during the late period.
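Instead of reading the positions off the display, the indices can also be looked up programmatically. The sketch below is one possible approach using strcmp. The labels cell array is copied from the output above so that the snippet runs without carbig, and the names iCyl, iWhen, iOrg, and idx are illustrative.

```matlab
% Labels as displayed above, copied here so the sketch is self-contained.
labels = {'Other','Early','USA'; 'Four','Mid','Europe'; [],'Late','Japan'};

% strcmp returns false for the empty (non-text) cell, so it is skipped.
iCyl  = find(strcmp(labels(:,1),'Four'));   % index along dimension 1
iWhen = find(strcmp(labels(:,2),'Late'));   % index along dimension 2
iOrg  = find(strcmp(labels(:,3),'USA'));    % index along dimension 3

idx = [iCyl iWhen iOrg];
```

With the table returned by crosstab still in the workspace, table(iCyl,iWhen,iOrg) then addresses the same entry as table(2,3,1).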

Load the hospital data.

load hospital

The hospital dataset array contains data on 100 hospital patients, including last name, gender, age, weight, smoking status, and systolic and diastolic blood pressure measurements.

To determine whether smoking status is independent of gender, use crosstab to create a 2-by-2 contingency table of smokers and nonsmokers, grouped by gender.

[tbl,chi2,p,labels] = crosstab(hospital.Sex, hospital.Smoker)

tbl =

    40    13
    26    21

chi2 = 4.5083

p = 0.0337

labels = 2×2 cell array

    'Female'    '0'
    'Male'      '1'

The rows of the resulting contingency table tbl correspond to the patient's gender, with row 1 containing data for females and row 2 containing data for males. The columns correspond to the patient's smoking status, with column 1 containing data for nonsmokers and column 2 containing data for smokers. The returned result chi2 = 4.5083 is the value of the chi-squared test statistic for a Pearson's chi-squared test of independence. The returned value p = 0.0337 is an approximate p-value based on the chi-squared distribution.
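To see the direction of the association behind the test result, it can help to convert the counts to row proportions. The following is a minimal sketch that uses the counts shown above; rowProp is an illustrative name, and bsxfun is used only so that the snippet does not depend on implicit expansion.

```matlab
% Counts copied from the contingency table above
% (rows: Female, Male; columns: nonsmoker, smoker).
tbl = [40 13; 26 21];

% Divide each row by its total to get the within-gender proportions.
rowProp = bsxfun(@rdivide,tbl,sum(tbl,2));

% Proportion of smokers among females and among males.
smokerShare = rowProp(:,2);
```

Here smokerShare(1) is 13/53 and smokerShare(2) is 21/47, so the share of smokers in this sample is higher among males than among females.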

Input Arguments

x1: Input vector, specified as a vector of grouping variables. All input vectors, including x1, x2, ..., xn, must be the same length.

Data Types: single | double | char | logical

x2: Input vector, specified as a vector of grouping variables. All input vectors, including x1, x2, ..., xn, must be the same length.

Data Types: single | double | char | logical

x1,...,xn: Input vectors, specified as vectors of grouping variables. If you use this syntax to specify more than two input vectors, then crosstab generates a multi-dimensional cross-tabulation table. All input vectors, including x1, x2, ..., xn, must be the same length.

Data Types: single | double | char | logical

Output Arguments

tbl: Cross-tabulation table, returned as a matrix of integer values.

If you specify two input vectors, x1 and x2, then tbl is an m-by-n matrix, where m is the number of distinct values in x1 and n is the number of distinct values in x2.

If you specify three or more input vectors, then tbl(i,j,...,n) is a count of indices where grp2idx(x1) is i, grp2idx(x2) is j, grp2idx(x3) is k, and so on.

chi2: Chi-square statistic, returned as a positive scalar value. The null hypothesis is that the proportion in any entry of tbl is the product of the proportions in each dimension.
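For the two-way case, this null hypothesis leads to the usual Pearson statistic, written out below as a sketch. The symbols O, E, and N are notation introduced here rather than names used by crosstab: O_{ij} is the observed count tbl(i,j), E_{ij} is the count expected under independence, and N is the total number of observations.

```latex
\chi^2 = \sum_{i=1}^{m}\sum_{j=1}^{n} \frac{\left(O_{ij} - E_{ij}\right)^2}{E_{ij}},
\qquad
E_{ij} = \frac{\left(\sum_{j'=1}^{n} O_{ij'}\right)\left(\sum_{i'=1}^{m} O_{i'j}\right)}{N},
\qquad
N = \sum_{i=1}^{m}\sum_{j=1}^{n} O_{ij}
```

Dividing E_{ij} by N gives the product of the row proportion and the column proportion, which is the "product of the proportions in each dimension" named in the null hypothesis.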

p: p-value for the chi-square test statistic, returned as a scalar value in the range [0,1]. crosstab tests that tbl is independent in each dimension.

labels: Data labels, returned as a cell array. The entries in the first column are labels for the rows of tbl, the entries in the second column are labels for the columns, and so on, for a multi-dimensional tbl.

Algorithms

crosstab uses grp2idx to assign a positive integer to each distinct value. tbl(i,j) is a count of indices where grp2idx(x1) is i and grp2idx(x2) is j. The numerical order of grp2idx(x1) and grp2idx(x2) determines the order of the rows and columns of tbl, respectively.

If you specify more than two input vectors, the returned value of tbl(i,j,...,n) is a count of indices where grp2idx(x1) is i, grp2idx(x2) is j, grp2idx(x3) is k, and so on.
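The counting step described above can be sketched for the two-vector case with grp2idx and accumarray. This is an illustration of the idea under simplifying assumptions, not the actual implementation of crosstab: it assumes no missing values (grp2idx returns NaN for those, which accumarray does not accept as a subscript), and the names gx, gy, and counts are illustrative.

```matlab
% Same data as in the first example.
x = [1 1 2 3 1];
y = [1 2 5 3 1];

% Map each distinct value to a positive integer group index.
gx = grp2idx(x(:));
gy = grp2idx(y(:));

% Count how often each (gx,gy) pair occurs; rows follow gx, columns follow gy.
counts = accumarray([gx gy],1);
```

For this data, counts is the same 3-by-4 matrix that crosstab(x,y) returns in the first example.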

Extended Capabilities

Introduced before R2006a
