removeEmptyDocuments

Remove empty documents from tokenized document array, bag-of-words model, or bag-of-n-grams model

Syntax

newDocuments = removeEmptyDocuments(documents)

newBag = removeEmptyDocuments(bag)

[___,idx] = removeEmptyDocuments(___)

Description

newDocuments = removeEmptyDocuments(documents) removes documents which have no words from documents.

newBag = removeEmptyDocuments(bag) removes documents which have no words or n-grams from the bag-of-words or bag-of-n-grams model bag.

[___,idx] = removeEmptyDocuments(___) also returns the indices of the removed documents.

Examples

Remove Empty Documents from Array

Remove documents containing no words from an array of tokenized documents.

Create an array of tokenized documents which includes empty documents.

documents = tokenizedDocument([
    "an example of a short sentence"
    ""
    "a second short sentence"
    ""])

documents = 
  4x1 tokenizedDocument:

    6 tokens: an example of a short sentence
    0 tokens:
    4 tokens: a second short sentence
    0 tokens:

Remove the empty documents.

newDocuments = removeEmptyDocuments(documents)

newDocuments = 
  2x1 tokenizedDocument:

    6 tokens: an example of a short sentence
    4 tokens: a second short sentence

Remove Empty Documents from Bag-of-Words Model

Remove documents containing no words from a bag-of-words model.

Create a bag-of-words model from an array of tokenized documents.

documents = tokenizedDocument([
    "An example of a short sentence."
    ""
    "A second short sentence."
    ""]);
bag = bagOfWords(documents)

bag = 
  bagOfWords with properties:

          Counts: [4x9 double]
      Vocabulary: ["An" "example" "of" "a" "short" ... ]
        NumWords: 9
    NumDocuments: 4

Remove the empty documents from the bag-of-words model.

newBag = removeEmptyDocuments(bag)

newBag = 
  bagOfWords with properties:

          Counts: [2x9 double]
      Vocabulary: ["An" "example" "of" "a" "short" ... ]
        NumWords: 9
    NumDocuments: 2
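
The same call pattern should apply to a bag-of-n-grams model, since the function accepts either model type. The following is a minimal sketch rather than an example from the original page; it assumes the default bagOfNgrams settings (bigrams) and reuses the documents from this example.

```matlab
% Sketch: remove empty documents from a bag-of-n-grams model.
documents = tokenizedDocument([
    "An example of a short sentence."
    ""
    "A second short sentence."
    ""]);

% Build a bag-of-n-grams model (bigrams by default).
bag = bagOfNgrams(documents);

% Remove documents that contribute no n-grams and keep their indices.
[newBag, idx] = removeEmptyDocuments(bag);

% Number of documents remaining in the model.
numRemaining = newBag.NumDocuments;
```

Because a document needs at least n tokens to produce an n-gram, a very short but non-empty document may also be removed from a bag-of-n-grams model.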

Remove Documents and Corresponding Labels

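Remove documents containing no words from an array, and use the indices of the removed documents to remove the corresponding labels as well.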

Create an array of tokenized documents which includes empty documents.

documents = tokenizedDocument([
    "an example of a short sentence"
    ""
    "a second short sentence"
    ""])

documents = 
  4x1 tokenizedDocument:

    6 tokens: an example of a short sentence
    0 tokens:
    4 tokens: a second short sentence
    0 tokens:

Create a vector of labels.

labels = ["T";"F";"F";"T"]

labels = 4x1 string
    "T"
    "F"
    "F"
    "T"

Remove the empty documents and get the indices of the removed documents.

[newDocuments, idx] = removeEmptyDocuments(documents)

newDocuments = 
  2x1 tokenizedDocument:

    6 tokens: an example of a short sentence
    4 tokens: a second short sentence

idx = 2×1

     2
     4

Remove the corresponding labels from labels.

labels(idx) = []

labels = 2x1 string
    "T"
    "F"
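
For comparison, the same filtering can be sketched by hand with a logical mask. This is an alternative technique, not a description of how removeEmptyDocuments works internally; it assumes that doclength returns the number of tokens in each document, and the names isEmptyDoc and newLabels are illustrative only.

```matlab
% Sketch: filter documents and labels with a logical mask.
documents = tokenizedDocument([
    "an example of a short sentence"
    ""
    "a second short sentence"
    ""]);
labels = ["T";"F";"F";"T"];

% True for documents that contain no tokens.
isEmptyDoc = doclength(documents) == 0;

% Keep only the non-empty documents and their labels.
newDocuments = documents(~isEmptyDoc);
newLabels = labels(~isEmptyDoc);
```

A mask keeps the original labels array intact, which can be convenient when the unfiltered data is still needed later.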

Input Arguments

`documents` — Input documents
`tokenizedDocument` array

Input documents, specified as a tokenizedDocument array.

`bag` — Input bag-of-words or bag-of-n-grams model
`bagOfWords` object | `bagOfNgrams` object

Input bag-of-words or bag-of-n-grams model, specified as a bagOfWords object or a bagOfNgrams object.

Output Arguments

`newDocuments` — Output documents
`tokenizedDocument` array

Output documents, returned as a tokenizedDocument array.

`newBag` — Output model
`bagOfWords` object | `bagOfNgrams` object

Output model, returned as a bagOfWords object or a bagOfNgrams object. The type of newBag is the same as the type of bag.

`idx`— Indices of removed documents
vector of positive integers

Indices of removed documents, returned as a vector of positive integers.
