splitlabels

Find indices to split labels according to specified proportions

Syntax

idxs = splitlabels(lblsrc,p)

idxs = splitlabels(lblsrc,p,'randomized')

idxs = splitlabels(___,Name,Value)

Description

Use this function when you are working on a machine or deep learning classification problem and you want to split a dataset into training, testing, and validation sets that hold the same proportion of label values.

`idxs = splitlabels(lblsrc,p)` finds logical indices that split the labels in `lblsrc` based on the proportions or number of labels specified in `p`.

`idxs = splitlabels(lblsrc,p,'randomized')` randomly assigns the specified proportion of label values to each index set in `idxs`.

`idxs = splitlabels(___,Name,Value)` specifies additional input arguments using name-value pairs. For example, `'UnderlyingDatastoreIndex',3` splits the labels only in the third underlying datastore of a combined datastore.

Examples


Split Vowels

Read William Shakespeare's sonnets with the `fileread` function. Extract all the vowels from the text and convert them to lowercase.

sonnets = fileread("sonnets.txt");
vowels = lower(sonnets(regexp(sonnets,"[AEIOUaeiou]")))';

Count the number of instances of each vowel.

cnts = countlabels(vowels)

cnts=5×3 table
    Label    Count    Percent
    _____    _____    _______
      a      4940     18.368
      e      9028     33.569
      i      4895     18.201
      o      5710     21.232
      u      2321     8.6302

Split the vowels into a training set containing 500 instances of each vowel, a validation set containing 300, and a testing set with the rest. All vowels are represented with equal weights in the first two sets but not in the third.

spltn = splitlabels(vowels,[500 300]);
for kj = 1:length(spltn)
    cntsn{kj} = countlabels(vowels(spltn{kj}));
end
cntsn{:}

ans=5×3 table
    Label    Count    Percent
    _____    _____    _______
      a       500       20
      e       500       20
      i       500       20
      o       500       20
      u       500       20

ans=5×3 table
    Label    Count    Percent
    _____    _____    _______
      a       300       20
      e       300       20
      i       300       20
      o       300       20
      u       300       20

ans=5×3 table
    Label    Count    Percent
    _____    _____    _______
      a      4140     18.083
      e      8228     35.94
      i      4095     17.887
      o      4910     21.447
      u      1521     6.6437

Split the vowels into a training set containing 50% of the instances, a validation set containing another 30%, and a testing set with the rest. All vowels are represented with the same weight across all three sets.

spltp = splitlabels(vowels,[0.5 0.3]);
for kj = 1:length(spltp)
    cntsp{kj} = countlabels(vowels(spltp{kj}));
end
cntsp{:}

ans=5×3 table
    Label    Count    Percent
    _____    _____    _______
      a      2470     18.367
      e      4514     33.566
      i      2448     18.203
      o      2855     21.23
      u      1161     8.6333

ans=5×3 table
    Label    Count    Percent
    _____    _____    _______
      a      1482     18.371
      e      2708     33.569
      i      1468     18.198
      o      1713     21.235
      u       696     8.6277

ans=5×3 table
    Label    Count    Percent
    _____    _____    _______
      a       988     18.368
      e      1806     33.575
      i       979     18.2
      o      1142     21.231
      u       464     8.6261

Split Vowels and Consonants

Read William Shakespeare's sonnets with the `fileread` function. Remove all nonalphabetic characters from the text and convert to lowercase.

sonnets = fileread("sonnets.txt");
letters = lower(sonnets(regexp(sonnets,"[A-z]")))';

Classify the letters as consonants or vowels and create a table with the results. Show the first few rows of the table.

type = repmat("consonant",size(letters));
type(regexp(letters',"[aeiou]")) = "vowel";
T = table(letters,type,'VariableNames',["Letter" "Type"]);
head(T)

ans=8×2 table
    Letter       Type
    ______    ___________
      t       "consonant"
      h       "consonant"
      e       "vowel"
      s       "consonant"
      o       "vowel"
      n       "consonant"
      n       "consonant"
      e       "vowel"

Display the number of instances of each category.

cnt = countlabels(T,'TableVariable',"Type")

cnt=2×3 table
    Type         Count    Percent
    _________    _____    _______
    consonant    46516    63.365
    vowel        26894    36.635

Split the table into two sets, one containing 60% of the consonants and vowels and the other containing 40%. Display the number of instances of each category.

splt = splitlabels(T,0.6,'TableVariable',"Type");
sixty = countlabels(T(splt{1},:),'TableVariable',"Type")

sixty=2×3 table
    Type         Count    Percent
    _________    _____    _______
    consonant    27910    63.366
    vowel        16136    36.634

forty = countlabels(T(splt{2},:),'TableVariable',"Type")

forty=2×3 table
    Type         Count    Percent
    _________    _____    _______
    consonant    18606    63.363
    vowel        10758    36.637

Split the table into two sets, one containing 60% of each particular letter and the other containing 40%. Exclude the letter y, which sometimes acts as a consonant and sometimes as a vowel. Display the number of instances of each category.

splt = splitlabels(T,0.6,'Exclude',"y");
sixti = countlabels(T(splt{1},:),'TableVariable',"Type")

sixti=2×3 table
    Type         Count    Percent
    _________    _____    _______
    consonant    26719    62.346
    vowel        16137    37.654

forti = countlabels(T(splt{2},:),'TableVariable',"Type")

forti=2×3 table
    Type         Count    Percent
    _________    _____    _______
    consonant    17813    62.349
    vowel        10757    37.651

Split the table into two sets of the same size. Include only the letters e and s. Randomize the sets.

halves = splitlabels(T,0.5,'randomized','Include',["e" "s"]);
cnt = countlabels(T(halves{1},:))

cnt=2×3 table
    Letter    Count    Percent
    ______    _____    _______
      e       4514     64.385
      s       2497     35.615

Split Data in Datastore

Create a dataset that consists of 100 Gaussian random numbers. Label 40 of the numbers as A, 30 as B, and 30 as C. Store the data in a combined datastore containing two datastores. The first datastore has the data and the second datastore contains the labels.

dsData = arrayDatastore(randn(100,1));
dsLabels = arrayDatastore([repmat("A",40,1); repmat("B",30,1); repmat("C",30,1)]);
dsDataset = combine(dsData,dsLabels);
cnt = countlabels(dsDataset,'UnderlyingDatastoreIndex',2)

cnt=3×3 table
    Label    Count    Percent
    _____    _____    _______
      A       40        40
      B       30        30
      C       30        30

Split the data set into two sets, one containing 60% of the numbers and the other with the rest.

splitIndices = splitlabels(dsDataset,0.6,'UnderlyingDatastoreIndex',2);
dsDataset1 = subset(dsDataset,splitIndices{1});
cnt1 = countlabels(dsDataset1,'UnderlyingDatastoreIndex',2)

cnt1=3×3 table
    Label    Count    Percent
    _____    _____    _______
      A       24        40
      B       18        30
      C       18        30

dsDataset2 = subset(dsDataset,splitIndices{2});
cnt2 = countlabels(dsDataset2,'UnderlyingDatastoreIndex',2)

cnt2=3×3 table
    Label    Count    Percent
    _____    _____    _______
      A       16        40
      B       12        30
      C       12        30

Input Arguments


`lblsrc`—Input label source
categorical vector | string vector | logical vector | numeric vector | cell array | table | datastore | `CombinedDatastore` object

Input label source, specified as one of these:

A categorical vector.
A string vector or a cell array of character vectors.
A numeric vector or a cell array of numeric scalars.
A logical vector or a cell array of logical scalars.
A table with variables containing any of the previous data types.
A datastore whose `readall` function returns any of the previous data types.
A `CombinedDatastore` object containing an underlying datastore whose `readall` function returns any of the previous data types. In this case, you must specify the index of the underlying datastore that has the label values.

`lblsrc` must contain labels that can be converted to a vector with a discrete set of categories.

Example: `lblsrc = categorical(["B" "C" "A" "E" "B" "A" "A" "B" "C" "A"],["A" "B" "C" "D"])` creates the label source as a ten-sample categorical vector with four categories: `A`, `B`, `C`, and `D`.

Example: `lblsrc = [0 7 2 5 11 17 15 7 7 11]` creates the label source as a ten-sample numeric vector.

`p`—Proportions or numbers of labels
integer scalar | scalar in (0, 1) | vector of integers | vector of fractions

Proportions or numbers of labels, specified as an integer scalar, a scalar in the range (0, 1), a vector of integers, or a vector of fractions.

If `p` is a scalar, `splitlabels` finds two splitting index sets and returns a two-element cell array in `idxs`.
- If `p` is an integer, the first element of `idxs` contains a vector of indices pointing to the first `p` values of each label category. The second element of `idxs` contains indices pointing to the remaining values of each label category.
- If `p` is a value in the range (0, 1) and `lblsrc` has K_i elements in the ith category, the first element of `idxs` contains a vector of indices pointing to the first p × K_i values of each label category. The second element of `idxs` contains the indices of the remaining values of each label category.
If `p` is a vector with N elements of the form p₁, p₂, …, p_N, `splitlabels` finds N + 1 splitting index sets and returns an (N + 1)-element cell array in `idxs`.
- If `p` is a vector of integers, the first element of `idxs` is a vector of indices pointing to the first p₁ values of each label category, the next element of `idxs` contains the next p₂ values of each label category, and so on. The last element in `idxs` contains the remaining indices of each label category.
- If `p` is a vector of fractions and `lblsrc` has K_i elements of the ith category, the first element of `idxs` is a vector of indices concatenating the first p₁ × K_i values of each category, the next element of `idxs` contains the next p₂ × K_i values of each label category, and so on. The last element in `idxs` contains the remaining indices of each label category.

Note

If `p` contains fractions, then the sum of its elements must not be greater than one.
If `p` contains numbers of label values, then the sum of its elements must not be greater than the smallest number of labels available for any of the label categories.
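
The difference between an integer `p` and a fractional `p` can be illustrated with a small label vector. This is a minimal sketch, not part of the original examples: the ten-element vector `lbls` and the variable names are assumptions chosen so that p × K_i is a whole number for every category.

```matlab
% Hypothetical label vector: four labels in category A, six in category B.
lbls = categorical([repmat("A",4,1); repmat("B",6,1)]);

% Integer p: the first index set takes the first 2 labels of each category,
% and the second index set takes whatever remains.
idxsCount = splitlabels(lbls,2);
cntFirst = countlabels(lbls(idxsCount{1}));   % 2 of A, 2 of B
cntRest  = countlabels(lbls(idxsCount{2}));   % 2 of A, 4 of B

% Fractional p: the first index set takes p*K_i labels of each category,
% so the category proportions carry over to both sets.
idxsFrac = splitlabels(lbls,0.5);
cntHalf = countlabels(lbls(idxsFrac{1}));     % 2 of A, 3 of B
```

With an integer `p`, the categories have equal weight in the first set but not in the remainder. With a fractional `p`, each set keeps the 40/60 proportion of the full vector.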

Name-Value Arguments

Specify optional comma-separated pairs of `Name,Value` arguments. `Name` is the argument name and `Value` is the corresponding value. `Name` must appear inside quotes. You can specify several name and value pair arguments in any order as `Name1,Value1,...,NameN,ValueN`.

Example: `'TableVariable',"AreaCode",'Exclude',["617" "508"]` specifies that the function splits labels based on telephone area code and excludes numbers from Boston and Natick.

`Include`—Labels to include in index sets
vector of label categories | cell array of label categories

Labels to include in the index sets, specified as a vector or cell array of label categories. The categories specified with this argument must be of the same type as the labels in `lblsrc`. Each category in the vector or cell array must match one of the label categories in `lblsrc`.

`Exclude`—Labels to exclude from index sets
vector of label categories | cell array of label categories

Labels to exclude from the index sets, specified as a vector or cell array of label categories. The categories specified with this argument must be of the same type as the labels in `lblsrc`. Each category in the vector or cell array must match one of the label categories in `lblsrc`.
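
As a rough illustration of the two arguments, the sketch below splits a string label vector while leaving one category out. The labels `"cat"`, `"dog"`, and `"bird"` and all variable names are hypothetical. The categories are passed as strings so that they have the same type as the labels.

```matlab
% Hypothetical string labels: 4 "cat", 6 "dog", and 2 "bird".
lbls = [repmat("cat",4,1); repmat("dog",6,1); repmat("bird",2,1)];

% Split half of each category, leaving "bird" out of every index set.
idxsEx = splitlabels(lbls,0.5,'Exclude',"bird");
firstEx = lbls(idxsEx{1});

% The same categories selected with Include instead of Exclude.
idxsIn = splitlabels(lbls,0.5,'Include',["cat" "dog"]);
firstIn = lbls(idxsIn{1});
```

In both calls, neither index set should point to a `"bird"` label.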

`TableVariable`—Table variable to read
first table variable (default) | character vector | string scalar

Table variable to read, specified as a character vector or string scalar. If this argument is not specified, then `splitlabels` uses the first table variable.

`UnderlyingDatastoreIndex`—Underlying datastore index
integer scalar

Underlying datastore index, specified as an integer scalar. This argument applies when `lblsrc` is a `CombinedDatastore` object. `splitlabels` counts the labels in the datastore obtained using the `UnderlyingDatastores` property of `lblsrc`.

Output Arguments


`idxs`— Splitting indices
cell array

Splitting indices, returned as a cell array.
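
Each element of `idxs` can be used to index into the data and the labels alike. The following is a minimal sketch under stated assumptions: the arrays `data` and `lbls` and the train/test variable names are hypothetical and only mirror the datastore example above.

```matlab
% Hypothetical dataset: 100 samples labeled A (40), B (30), and C (30).
data = randn(100,1);
lbls = categorical([repmat("A",40,1); repmat("B",30,1); repmat("C",30,1)]);

% A single proportion yields a two-element cell array of index sets.
idxs = splitlabels(lbls,0.6,'randomized');

% Apply the same index set to the data and to the labels.
trainData = data(idxs{1});
trainLbls = lbls(idxs{1});
testData  = data(idxs{2});
testLbls  = lbls(idxs{2});

% Every sample should land in exactly one of the two sets.
numTotal = numel(trainLbls) + numel(testLbls);
```

Using the same element of `idxs` for both arrays keeps each sample paired with its label after the split.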
