Remove empty documents from a tokenized document array, bag-of-words model, or bag-of-n-grams model
newDocuments = removeEmptyDocuments(documents)
newBag = removeEmptyDocuments(bag)
[___,idx] = removeEmptyDocuments(___)
newDocuments = removeEmptyDocuments(documents) removes documents that have no words from documents.
newBag = removeEmptyDocuments(bag) removes documents that have no words or n-grams from the bag-of-words or bag-of-n-grams model bag.
[___,idx] = removeEmptyDocuments(___) also returns the indices of the removed documents.
Remove documents containing no words from an array of tokenized documents.
Create an array of tokenized documents which includes empty documents.
documents = tokenizedDocument(["an example of a short sentence"; ""; "a second short sentence"; ""])
documents =
  4x1 tokenizedDocument:
    6 tokens: an example of a short sentence
    0 tokens:
    4 tokens: a second short sentence
    0 tokens:
Remove the empty documents.
newDocuments = removeEmptyDocuments(documents)
newDocuments =
  2x1 tokenizedDocument:
    6 tokens: an example of a short sentence
    4 tokens: a second short sentence
Remove documents containing no words from a bag-of-words model.
Create a bag-of-words model from an array of tokenized documents.
documents = tokenizedDocument(["An example of a short sentence."; ""; "A second short sentence."; ""]);
bag = bagOfWords(documents)
bag =
  bagOfWords with properties:
          Counts: [4x9 double]
      Vocabulary: ["An" "example" "of" "a" "short" ... ]
        NumWords: 9
    NumDocuments: 4
Remove the empty documents from the bag-of-words model.
newBag = removeEmptyDocuments(bag)
newBag =
  bagOfWords with properties:
          Counts: [2x9 double]
      Vocabulary: ["An" "example" "of" "a" "short" ... ]
        NumWords: 9
    NumDocuments: 2
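The same call applies to a bag-of-n-grams model, which this page describes but does not demonstrate. The following is a minimal sketch, assuming Text Analytics Toolbox is installed. It reuses the four strings from the example above and relies on the default behavior of bagOfNgrams, which counts bigrams. The variable names are illustrative only.

```matlab
% Tokenize four strings, two of which are empty.
documents = tokenizedDocument(["An example of a short sentence."; ""; "A second short sentence."; ""]);

% Build a bag-of-n-grams model (bigrams by default).
bag = bagOfNgrams(documents);

% Drop the documents that contain no n-grams.
newBag = removeEmptyDocuments(bag);

% Compare the document counts before and after removal.
fprintf("Before: %d documents, after: %d documents\n", ...
    bag.NumDocuments, newBag.NumDocuments);
```

Because the returned model has the same type as the input, newBag here is again a bagOfNgrams object.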
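Remove documents containing no words from an array, and use the indices of the removed documents to also remove the corresponding labels.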
Create a vector of labels.
labels = ["T"; "F"; "F"; "T"]
labels = 4x1 string
    "T"
    "F"
    "F"
    "T"
Remove the empty documents and get the indices of the removed documents.
[newDocuments, idx] = removeEmptyDocuments(documents)
idx = 2×1
     2
     4
Remove the corresponding labels from labels.
labels(idx) = []
labels = 2x1 string
    "T"
    "F"
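The steps of this example can be collected into one self-contained script. This is a sketch under the assumption that Text Analytics Toolbox is installed and that the documents array is the four-element array used in the earlier examples. The final assert is an added sanity check, not part of the original example.

```matlab
% Four documents, two of them empty, with one label per document.
documents = tokenizedDocument(["An example of a short sentence."; ""; "A second short sentence."; ""]);
labels = ["T"; "F"; "F"; "T"];

% Remove the empty documents and capture the indices of the removed ones.
[newDocuments, idx] = removeEmptyDocuments(documents);

% Delete the labels at the same positions so that documents and labels stay aligned.
labels(idx) = [];

% Sanity check: one label remains for each remaining document.
assert(numel(newDocuments) == numel(labels));
```

Deleting the labels by index, rather than rebuilding the label vector, keeps any other per-document arrays easy to update in the same way.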
documents: Input documents, specified as a tokenizedDocument array.
bag: Input bag-of-words or bag-of-n-grams model, specified as a bagOfWords object or a bagOfNgrams object.
newDocuments: Output documents, returned as a tokenizedDocument array.
newBag: Output model, returned as a bagOfWords object or a bagOfNgrams object. The type of newBag is the same as the type of bag.
idx: Indices of removed documents, returned as a vector of positive integers.
bagOfWords | bagOfNgrams | addDocument | removeDocument | tokenizedDocument