Find indices to split labels according to specified proportions
Use this function when you are working on a machine or deep learning classification problem and you want to split a dataset into training, testing, and validation sets that hold the same proportion of label values.
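A minimal sketch of that workflow. It assumes the toolbox that provides splitlabels and countlabels is installed, and it uses a small synthetic label vector. The data and the variable names lbls and idxs are illustrative only. Half of each category goes to training, a quarter to validation, and the remainder to testing. The proportions 0.5 and 0.25 were picked so that every per-category count is a whole number.

```matlab
% Synthetic labels: 60 "cat", 40 "dog", 20 "bird" (illustrative data only)
lbls = categorical([repmat("cat",60,1); repmat("dog",40,1); repmat("bird",20,1)]);

% 50% training, 25% validation; the last cell holds the rest (testing)
idxs = splitlabels(lbls,[0.5 0.25]);

% Each set should show the same 50 / 33.3 / 16.7 percent mix as the full vector
for k = 1:numel(idxs)
    disp(countlabels(lbls(idxs{k})))
end
```

Each cell of idxs indexes back into lbls, so the same index sets can also select the matching observations from a feature array.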
idxs = splitlabels(___,Name,Value) specifies additional input arguments using name-value pairs. For example, 'UnderlyingDatastoreIndex',3 splits the labels only in the third underlying datastore of a combined datastore.
Read William Shakespeare's sonnets with the fileread function. Extract all the vowels from the text and convert them to lowercase.
sonnets = fileread("sonnets.txt");
vowels = lower(sonnets(regexp(sonnets,"[AEIOUaeiou]")))';
Count the number of instances of each vowel.
cnts = countlabels(vowels)
cnts=5×3 table
Label    Count    Percent
_____    _____    _______
a        4940     18.368
e        9028     33.569
i        4895     18.201
o        5710     21.232
u        2321     8.6302
Split the vowels into a training set containing 500 instances of each vowel, a validation set containing 300, and a testing set with the rest. All vowels are represented with equal weights in the first two sets but not in the third.
spltn = splitlabels(vowels,[500 300]);
for kj = 1:length(spltn)
    cntsn{kj} = countlabels(vowels(spltn{kj}));
end
cntsn{:}
ans=5×3 table
Label    Count    Percent
_____    _____    _______
a        500      20
e        500      20
i        500      20
o        500      20
u        500      20

ans=5×3 table
Label    Count    Percent
_____    _____    _______
a        300      20
e        300      20
i        300      20
o        300      20
u        300      20

ans=5×3 table
Label    Count    Percent
_____    _____    _______
a        4140     18.083
e        8228     35.94
i        4095     17.887
o        4910     21.447
u        1521     6.6437
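The imbalance in the testing set is plain arithmetic. Each vowel loses the same 800 instances to the first two sets, so the remaining category sizes still differ. Here is a quick base-MATLAB check, using totals copied by hand from the countlabels table of all vowels:

```matlab
% Vowel totals from the countlabels output of all vowels (a, e, i, o, u)
totals = [4940 9028 4895 5710 2321];

% 500 of each vowel go to training and 300 to validation
remaining = totals - (500 + 300);

% Percentage of each vowel in the testing set
pct = 100 * remaining / sum(remaining);
disp(table(["a";"e";"i";"o";"u"],remaining',pct', ...
    'VariableNames',["Vowel" "Count" "Percent"]))
```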
Split the vowels into a training set containing 50% of the instances, a validation set containing another 30%, and a testing set with the rest. All vowels are represented with the same weight across all three sets.
spltp = splitlabels(vowels,[0.5 0.3]);
for kj = 1:length(spltp)
    cntsp{kj} = countlabels(vowels(spltp{kj}));
end
cntsp{:}
ans=5×3 table
Label    Count    Percent
_____    _____    _______
a        2470     18.367
e        4514     33.566
i        2448     18.203
o        2855     21.23
u        1161     8.6333

ans=5×3 table
Label    Count    Percent
_____    _____    _______
a        1482     18.371
e        2708     33.569
i        1468     18.198
o        1713     21.235
u        696      8.6277

ans=5×3 table
Label    Count    Percent
_____    _____    _______
a        988      18.368
e        1806     33.575
i        979      18.2
o        1142     21.231
u        464      8.6261
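The documentation does not state how a fractional share such as 0.5 × 4895 = 2447.5 is rounded. The sketch below uses one rule that reproduces every count in the three tables above. It rounds each cumulative boundary K × cumsum(p) to the nearest integer and takes differences, so the last set absorbs the remainder. The helper setSizes is hypothetical. Treat the rule as an assumption, not as a description of how splitlabels is implemented.

```matlab
% Vowel totals from the countlabels output of all vowels (a, e, i, o, u)
totals = [4940 9028 4895 5710 2321];
p = [0.5 0.3];

% ASSUMPTION (hypothetical helper, not the toolbox implementation):
% round each cumulative boundary K*cumsum(p), then take differences.
% The last entry is the remainder that goes to the final set.
setSizes = @(K) diff([0, round(K * cumsum(p)), K]);

% One row per vowel: training, validation, and testing set sizes
sizes = zeros(numel(totals), numel(p) + 1);
for k = 1:numel(totals)
    sizes(k,:) = setSizes(totals(k));
end
disp(sizes)
```

Rounding the cumulative boundaries, rather than each share on its own, guarantees that the three sizes for a category always add up to its total.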
Read William Shakespeare's sonnets with the fileread function. Remove all nonalphabetic characters from the text and convert the remaining letters to lowercase.
sonnets = fileread("sonnets.txt");
letters = lower(sonnets(regexp(sonnets,"[A-Za-z]")))';
Classify the letters as consonants or vowels and create a table with the results. Show the first few rows of the table.
type = repmat("consonant",size(letters));
type(regexp(letters',"[aeiou]")) = "vowel";
T = table(letters,type,'VariableNames',["Letter" "Type"]);
head(T)
ans=8×2 table
Letter    Type
______    ___________
t         "consonant"
h         "consonant"
e         "vowel"
s         "consonant"
o         "vowel"
n         "consonant"
n         "consonant"
e         "vowel"
Display the number of instances of each category.
cnt = countlabels(T,'TableVariable',"Type")
cnt=2×3 table
Type         Count    Percent
_________    _____    _______
consonant    46516    63.365
vowel        26894    36.635
Split the table into two sets, one containing 60% of the consonants and vowels and the other containing 40%. Display the number of instances of each category.
splt = splitlabels(T,0.6,'TableVariable',"Type");
sixty = countlabels(T(splt{1},:),'TableVariable',"Type")
sixty=2×3 table
Type         Count    Percent
_________    _____    _______
consonant    27910    63.366
vowel        16136    36.634
forty = countlabels(T(splt{2},:),'TableVariable',"Type")
forty=2×3 table
Type         Count    Percent
_________    _____    _______
consonant    18606    63.363
vowel        10758    36.637
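A 60% share rarely produces a whole count. In this split, the counts shown are consistent with rounding $0.6 \times K_i$ to the nearest integer and giving the second set the remainder. The exact rounding rule is not documented here, so read this as an observation about these outputs:

$$
\begin{aligned}
\text{consonants: } & 0.6 \times 46516 = 27909.6 \;\rightarrow\; 27910, & 46516 - 27910 &= 18606 \\
\text{vowels: } & 0.6 \times 26894 = 16136.4 \;\rightarrow\; 16136, & 26894 - 16136 &= 10758
\end{aligned}
$$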
Split the table into two sets, one containing 60% of each particular letter and the other containing 40%. Exclude the letter y, which sometimes acts as a consonant and sometimes as a vowel. Display the number of instances of each category.
splt = splitlabels(T,0.6,'Exclude',"y");
sixti = countlabels(T(splt{1},:),'TableVariable',"Type")
sixti=2×3 table
Type         Count    Percent
_________    _____    _______
consonant    26719    62.346
vowel        16137    37.654
forti = countlabels(T(splt{2},:),'TableVariable',"Type")
forti=2×3 table
Type         Count    Percent
_________    _____    _______
consonant    17813    62.349
vowel        10757    37.651
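The consonant share drops from about 63.37% to about 62.35% because every excluded y was labeled a consonant. The vowel total is unchanged, so the consonants missing from the two sets are exactly the y's. The numbers below are copied by hand from the outputs above:

```matlab
% Totals read from the countlabels outputs
allConsonants  = 46516;          % consonants before excluding "y"
keptConsonants = 26719 + 17813;  % consonants in the two Exclude-"y" sets
keptVowels     = 16137 + 10757;  % vowels in the two Exclude-"y" sets (still 26894)

% With the vowel total unchanged, the missing consonants are the excluded y's
numY = allConsonants - keptConsonants
```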
Split the table into two sets of the same size. Include only the letters e and s. Randomize the sets.
halves = splitlabels(T,0.5,'randomized','Include',["e" "s"]);
cnt = countlabels(T(halves{1},:))
cnt=2×3 table
Letter    Count    Percent
______    _____    _______
e         4514     64.385
s         2497     35.615
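Here is the same call pattern on a toy string vector, to show what Include does to the other labels. This sketch assumes the toolbox that provides splitlabels is installed, and the data are invented. Labels outside the Include list ("t" here) should appear in neither set, and each half receives half of every included category, in random order.

```matlab
% Toy label vector (illustrative): 6 "e", 4 "s", 5 "t"
lbls = [repmat("e",6,1); repmat("s",4,1); repmat("t",5,1)];

% Randomly split only the "e" and "s" labels into two halves
halves = splitlabels(lbls,0.5,'randomized','Include',["e" "s"]);

% Each half should hold 3 "e" and 2 "s" and no "t"
disp(lbls(halves{1})')
disp(lbls(halves{2})')
```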
Create a dataset that consists of 100 Gaussian random numbers. Label 40 of the numbers as A, 30 as B, and 30 as C. Store the data in a combined datastore containing two datastores: the first contains the data and the second contains the labels.
dsData = arrayDatastore(randn(100,1));
dsLabels = arrayDatastore([repmat("A",40,1); repmat("B",30,1); repmat("C",30,1)]);
dsDataset = combine(dsData,dsLabels);
cnt = countlabels(dsDataset,'UnderlyingDatastoreIndex',2)
cnt=3×3 table
Label    Count    Percent
_____    _____    _______
A        40       40
B        30       30
C        30       30
Split the dataset into two sets, one containing 60% of the numbers and the other containing the rest.
splitIndices = splitlabels(dsDataset,0.6,'UnderlyingDatastoreIndex',2);
dsDataset1 = subset(dsDataset,splitIndices{1});
cnt1 = countlabels(dsDataset1,'UnderlyingDatastoreIndex',2)
cnt1=3×3 table
Label    Count    Percent
_____    _____    _______
A        24       40
B        18       30
C        18       30
dsDataset2 = subset(dsDataset,splitIndices{2});
cnt2 = countlabels(dsDataset2,'UnderlyingDatastoreIndex',2)
cnt2=3×3 table
Label    Count    Percent
_____    _____    _______
A        16       40
B        12       30
C        12       30
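A minimal sanity check on the datastore split. It assumes each cell of splitIndices holds numeric indices, which is what the argument descriptions call a vector of indices. The two sets should be disjoint and should together cover all 100 observations. The combined datastore is rebuilt so the snippet runs on its own.

```matlab
% Rebuild the combined datastore of 100 values labeled A, B, and C
dsData = arrayDatastore(randn(100,1));
dsLabels = arrayDatastore([repmat("A",40,1); repmat("B",30,1); repmat("C",30,1)]);
dsDataset = combine(dsData,dsLabels);
splitIndices = splitlabels(dsDataset,0.6,'UnderlyingDatastoreIndex',2);

% The two index sets should not overlap and together should cover every observation
overlap = intersect(splitIndices{1},splitIndices{2});
covered = union(splitIndices{1},splitIndices{2});
fprintf("Overlap: %d, covered: %d of %d\n",numel(overlap),numel(covered),100)
```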
lblsrc — Input label source
Input label source, specified as one of these:
A categorical vector.
A string vector or a cell array of character vectors.
A numeric vector or a cell array of numeric scalars.
A logical vector or a cell array of logical scalars.
A table whose variables contain any of the previous data types.
A datastore whose readall function returns any of the previous data types.
A CombinedDatastore object containing an underlying datastore whose readall function returns any of the previous data types. In this case, you must specify the index of the underlying datastore that has the label values.
lblsrc must contain labels that can be converted to a vector with a discrete set of categories.
Example: lblsrc = categorical(["B" "C" "A" "E" "B" "A" "A" "B" "C" "A"],["A" "B" "C" "D"]) creates the label source as a ten-sample categorical vector with four categories: A, B, C, and D. Because "E" is not one of the listed categories, that sample is <undefined>.
Example: lblsrc = [0 7 2 5 11 17 15 7 7 11] creates the label source as a ten-sample numeric vector.
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | logical | char | string | table | cell | categorical
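Numeric labels are treated as discrete categories. Here is a small sketch with a made-up numeric label source, assuming the toolbox that provides splitlabels is installed. With an integer p, the first set takes the same count from every value. The order of indices within a set is not stated here, so the values are sorted only for display.

```matlab
% Toy numeric label source (illustrative): the values 1, 2, and 3 each appear three times
lblsrc = [3 1 3 2 1 3 2 1 2];

% Integer p: two labels of each value go to the first set, the rest to the second
idxs = splitlabels(lblsrc,2);

sort(lblsrc(idxs{1}))   % two 1s, two 2s, and two 3s
sort(lblsrc(idxs{2}))   % one 1, one 2, and one 3
```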
p — Proportions or numbers of labels
Proportions or numbers of labels, specified as an integer scalar, a scalar in the range (0, 1), a vector of integers, or a vector of fractions.
If p is a scalar, splitlabels finds two splitting index sets and returns a two-element cell array in idxs.
If p is an integer, the first element of idxs contains a vector of indices pointing to the first p values of each label category. The second element of idxs contains indices pointing to the remaining values of each label category.
If p is a value in the range (0, 1) and lblsrc has $K_i$ elements in the $i$th category, the first element of idxs contains a vector of indices pointing to the first $p \times K_i$ values of each label category. The second element of idxs contains the indices of the remaining values of each label category.
If p is a vector with $N$ elements of the form $p_1, p_2, \ldots, p_N$, splitlabels finds $N + 1$ splitting index sets and returns an $(N + 1)$-element cell array in idxs.
If p is a vector of integers, the first element of idxs is a vector of indices pointing to the first $p_1$ values of each label category, the next element of idxs contains the next $p_2$ values of each label category, and so on. The last element of idxs contains the remaining indices of each label category.
If p is a vector of fractions and lblsrc has $K_i$ elements in the $i$th category, the first element of idxs is a vector of indices concatenating the first $p_1 \times K_i$ values of each category, the next element of idxs contains the next $p_2 \times K_i$ values of each label category, and so on. The last element of idxs contains the remaining indices of each label category.
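To make the integer rule concrete, here is a base-MATLAB sketch of the scalar-integer case. For each category, the first p occurrences, in order of appearance, go to the first set and the rest go to the second. This is a hypothetical re-implementation for illustration only. The label data are invented, and the order in which splitlabels concatenates categories within a set is not documented here.

```matlab
% Hypothetical sketch of the integer-p rule (not the toolbox implementation)
lbls = ["b" "a" "b" "c" "a" "b" "c" "a" "c"];
p = 2;

cats = unique(lbls);   % sorted categories: "a", "b", "c"
set1 = [];
set2 = [];
for c = cats
    pos = find(lbls == c);          % positions of this category, in order of appearance
    set1 = [set1, pos(1:p)];        %#ok<AGROW> first p occurrences
    set2 = [set2, pos(p+1:end)];    %#ok<AGROW> remaining occurrences
end
idxs = {set1, set2}
```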
Note
If p contains fractions, then the sum of its elements must not be greater than one.
If p contains numbers of label values, then the sum of its elements must not be greater than the smallest number of labels available for any of the label categories.
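The two constraints can be written as a small predicate. isValidP is a hypothetical helper written for illustration, not a toolbox function. It treats a p that mixes fractions and counts as invalid, a case the documentation does not address.

```matlab
% Hypothetical check of the documented constraints on p (not a toolbox function).
% counts holds the number of labels in each category.
isValidP = @(p, counts) ...
    (all(p > 0 & p < 1) && sum(p) <= 1) || ...               % fractions
    (all(p >= 1 & p == round(p)) && sum(p) <= min(counts));   % numbers of labels

counts = [4940 9028 4895 5710 2321];  % vowel totals from the first example

isValidP([0.5 0.3], counts)    % true: fractions summing to 0.8
isValidP([500 300], counts)    % true: 800 does not exceed the smallest category (2321)
isValidP([0.7 0.4], counts)    % false: fractions summing to 1.1
isValidP([2000 400], counts)   % false: 2400 exceeds the smallest category (2321)
```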
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.
Example: 'TableVariable',"AreaCode",'Exclude',["617" "508"] specifies that the function split labels based on telephone area code and exclude numbers from Boston (617) and Natick (508).
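Here is a small sketch of the same idea on an invented phone-number table, assuming the toolbox that provides splitlabels is installed. AreaCode is not the first table variable, so 'TableVariable' is required. Each remaining area code appears twice, so a 0.5 split gives whole counts.

```matlab
% Hypothetical phone-number table (illustrative data only)
AreaCode = ["617"; "508"; "413"; "617"; "978"; "508"; "413"; "978"];
Number = (1:8)';
T = table(Number, AreaCode);

% Split on AreaCode, leaving out 617 (Boston) and 508 (Natick)
idxs = splitlabels(T, 0.5, 'TableVariable', "AreaCode", 'Exclude', ["617" "508"]);

T(idxs{1},:)   % one 413 row and one 978 row
T(idxs{2},:)   % the other 413 row and the other 978 row
```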
Include — Labels to include in index sets
Labels to include in the index sets, specified as a vector or cell array of label categories. The categories specified with this argument must be of the same type as the labels in lblsrc. Each category in the vector or cell array must match one of the label categories in lblsrc.
Exclude — Labels to exclude from index sets
Labels to exclude from the index sets, specified as a vector or cell array of label categories. The categories specified with this argument must be of the same type as the labels in lblsrc. Each category in the vector or cell array must match one of the label categories in lblsrc.
TableVariable — Table variable to read
Table variable to read, specified as a character vector or string scalar. If this argument is not specified, then splitlabels uses the first table variable.
UnderlyingDatastoreIndex — Underlying datastore index
Underlying datastore index, specified as an integer scalar. This argument applies when lblsrc is a CombinedDatastore object. splitlabels splits the labels in the underlying datastore at this index, obtained using the UnderlyingDatastores property of lblsrc.
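idxs — Splitting indices
Splitting indices, returned as a cell array. Each cell contains the indices of one set. Use them to index lblsrc or, when lblsrc is a datastore, pass them to subset.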