transform

Transform documents into lower-dimensional space


Syntax

dscores = transform(lsaMdl,documents)

dscores = transform(lsaMdl,bag)

dscores = transform(lsaMdl,counts)

dscores = transform(ldaMdl,documents)

dscores = transform(ldaMdl,bag)

dscores = transform(ldaMdl,counts)

dscores = transform(___,Name,Value)

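Description

dscores = transform(lsaMdl,documents) transforms documents into the semantic space of the latent semantic analysis (LSA) model lsaMdl.

dscores = transform(lsaMdl,bag) transforms documents represented by the bag-of-words or bag-of-n-grams model bag into the semantic space of the LSA model lsaMdl.

dscores = transform(lsaMdl,counts) transforms documents represented by the matrix of word counts into the semantic space of the LSA model lsaMdl.

dscores = transform(ldaMdl,documents) transforms documents into the latent Dirichlet allocation (LDA) topic probability space of the LDA model ldaMdl. The rows of dscores are the topic mixture representations of the documents.

dscores = transform(ldaMdl,bag) transforms documents represented by the bag-of-words or bag-of-n-grams model bag into the LDA topic probability space of the LDA model ldaMdl.

dscores = transform(ldaMdl,counts) transforms documents represented by the matrix of word counts into the LDA topic probability space of the LDA model ldaMdl.

dscores = transform(___,Name,Value) specifies additional options using one or more name-value pair arguments. These name-value pairs apply only if the input model is an ldaModel object.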

Examples

Transform Documents into LSA Semantic Space


Load the example data. The file sonnetsPreprocessed.txt contains preprocessed versions of Shakespeare's sonnets. The file contains one sonnet per line, with words separated by a space. Extract the text from sonnetsPreprocessed.txt, split the text into documents at newline characters, and then tokenize the documents.

filename = "sonnetsPreprocessed.txt";
str = extractFileText(filename);
textData = split(str,newline);
documents = tokenizedDocument(textData);

Create a bag-of-words model using bagOfWords.

bag = bagOfWords(documents)

bag = 
  bagOfWords with properties:

          Counts: [154x3092 double]
      Vocabulary: ["fairest"    "最"    "creatures"    "desire"    ...]
        NumWords: 3092
    NumDocuments: 154

Fit an LSA model with 20 components.

numComponents = 20;
mdl = fitlsa(bag,numComponents)

mdl = 
  lsaModel with properties:

              NumComponents: 20
           ComponentWeights: [2.7866e+03 515.5889 443.6428 316.4191 ...]
             DocumentScores: [154x20 double]
                 WordScores: [3092x20 double]
                 Vocabulary: [1x3092 string]
    FeatureStrengthExponent: 2

Use transform to transform the first 10 documents into the semantic space of the LSA model.

dscores = transform(mdl,documents(1:10))

dscores = 10×20

    5.6059   -1.8559    0.9286   -0.7086   -0.4652   -0.8340    0.6751    0.0611    0.2268    1.9320   -0.7289   -1.0864    0.7131   -0.0571   -0.3401    0.0940   -0.4406    1.7507   -1.1534    0.1785
    7.3069   -2.3578    1.8359   -2.3442   -1.5776   -2.0310    0.7948    1.3411   -1.1700    1.8839    0.0883    0.4734   -1.1244    0.6795    1.3585   -0.0247    0.3627   -0.5414   -0.0272   -0.0114
    7.1056   -2.3508   -2.8837   -1.0688   -0.3462   -0.6962    0.0334   -0.0472    0.4916    0.6496   -1.1959   -1.0171   -0.4020    1.2953   -0.4583    0.5984   -0.3890    1.1780    0.6413    0.6575
    8.6292   -3.0471   -0.8512   -0.4356   -0.3055    0.4671   -1.4219   -0.8454   -0.8270    0.4122    2.2082   -1.1770    1.7775   -2.2344   -2.7813    1.4979    0.7486   -2.0593    0.6376    1.0721
    1.0434    1.7490    0.8703   -2.2315   -1.1221    0.2848   -2.0522   -0.6975    1.7191   -0.2852    0.8879    0.9950   -0.5555    0.8842   -0.0360    1.0050    0.4158    0.5061    0.9602    0.4672
    6.8358   -2.0806   -3.3798   -1.0452   -0.2075    2.0970   -0.4477    0.2080    0.9532    1.6203    0.6653    0.0036    1.0825    0.6396   -0.2154   -0.0794    0.7108    1.8007   -4.0326   -0.3872
    2.3847    0.3923   -0.4323   -1.5340    0.4023   -1.0396   -1.0326    0.3776    0.2101   -1.0944   -0.7513   -0.2894    0.4303    0.1864    0.4922    0.4844    0.5191   -0.2378    0.9528    0.4817
    3.7925   -0.3941   -4.4610   -0.4930    0.4651    0.3404   -0.5493    0.1470    0.5065    0.2566    0.3394   -1.1529   -0.0391   -0.8800   -0.4712    0.9672    0.5457   -0.3639   -0.3085    0.5637
    4.6522    0.7188   -1.1787   -0.8996    0.3360    0.4531   -0.1935    0.3328   -0.8640   -1.6679   -0.8056   -2.1993    0.1808    0.0163   -0.9520   -0.8982    0.6603    3.6451    1.2412    1.9621
    8.8218   -0.8168   -2.5101    1.1197   -0.8673   -1.2336    0.0768    0.1943   -0.7629   -0.1222    0.3786    1.1611    0.2326    0.3415   -0.3327   -0.3792    1.7554    0.2526   -2.1574   -0.0193

Transform Documents into LDA Topic Mixtures


To reproduce the results in this example, set rng to 'default'.

rng('default')

filename = "sonnetsPreprocessed.txt";
str = extractFileText(filename);
textData = split(str,newline);
documents = tokenizedDocument(textData);

Create a bag-of-words model using bagOfWords.

bag = bagOfWords(documents)

bag = 
  bagOfWords with properties:

          Counts: [154x3092 double]
      Vocabulary: ["fairest"    "最"    "creatures"    "desire"    ...]
        NumWords: 3092
    NumDocuments: 154

Fit an LDA model with five topics.

numTopics = 5;
mdl = fitlda(bag,numTopics)

Initial topic assignments sampled in 0.102958 seconds.
=====================================================================================
| Iteration  |  Time per  |  Relative  |  Training  |     Topic     |     Topic     |
|            | iteration  | change in  | perplexity | concentration | concentration |
|            | (seconds)  |   log(L)   |            |               |   iterations  |
=====================================================================================
|          0 |       0.00 |            |  1.212e+03 |         1.250 |             0 |
|          1 |       0.01 | 1.2300e-02 |  1.112e+03 |         1.250 |             0 |
|          2 |       0.02 | 1.3254e-03 |  1.102e+03 |         1.250 |             0 |
|          3 |       0.01 | 2.9402e-05 |  1.102e+03 |         1.250 |             0 |
=====================================================================================

mdl = 
  ldaModel with properties:

                     NumTopics: 5
             WordConcentration: 1
            TopicConcentration: 1.2500
      CorpusTopicProbabilities: [0.2000 0.2000 0.2000 0.2000 0.2000]
    DocumentTopicProbabilities: [154x5 double]
        TopicWordProbabilities: [3092x5 double]
                    Vocabulary: [1x3092 string]
                    TopicOrder: 'initial-fit-probability'
                       FitInfo: [1x1 struct]

Use transform to transform the documents into vectors of topic probabilities. You can visualize these mixtures using stacked bar charts. View the topic mixtures of the first 10 documents.

topicMixtures = transform(mdl,documents(1:10));
figure
barh(topicMixtures,'stacked')
xlim([0 1])
title("Topic Mixtures")
xlabel("Topic Probability")
ylabel("Document")
legend("Topic " + string(1:numTopics),'Location','northeastoutside')

[Figure: stacked horizontal bar chart titled "Topic Mixtures", with x-axis "Topic Probability" and y-axis "Document". The five bar series represent Topic 1 through Topic 5.]

Transform Word Count Matrix into LDA Topic Mixtures


Load the example data. sonnetsCounts.mat contains a matrix of word counts and a corresponding vocabulary of preprocessed versions of Shakespeare's sonnets.

load sonnetsCounts.mat
size(counts)

ans = 1×2

   154        3092

Fit an LDA model with 20 topics. To reproduce the results in this example, set rng to 'default'.

rng('default')
numTopics = 20;
mdl = fitlda(counts,numTopics)

Initial topic assignments sampled in 0.13535 seconds.
=====================================================================================
| Iteration  |  Time per  |  Relative  |  Training  |     Topic     |     Topic     |
|            | iteration  | change in  | perplexity | concentration | concentration |
|            | (seconds)  |   log(L)   |            |               |   iterations  |
=====================================================================================
|          0 |       0.03 |            |  1.159e+03 |         5.000 |             0 |
|          1 |       0.05 | 5.4884e-02 |  8.028e+02 |         5.000 |             0 |
|          2 |       0.04 | 4.7400e-03 |  7.778e+02 |         5.000 |             0 |
|          3 |       0.03 | 3.4597e-03 |  7.602e+02 |         5.000 |             0 |
|          4 |       0.03 | 3.4662e-03 |  7.430e+02 |         5.000 |             0 |
|          5 |       0.03 | 2.9259e-03 |  7.288e+02 |         5.000 |             0 |
|          6 |       0.03 | 6.4180e-05 |  7.291e+02 |         5.000 |             0 |
=====================================================================================

mdl = 
  ldaModel with properties:

                     NumTopics: 20
             WordConcentration: 1
            TopicConcentration: 5
      CorpusTopicProbabilities: [0.0500 0.0500 0.0500 0.0500 0.0500 0.0500 ...]
    DocumentTopicProbabilities: [154x20 double]
        TopicWordProbabilities: [3092x20 double]
                    Vocabulary: ["1"    "2"    "3"    "4"    "5"    ...]
                    TopicOrder: 'initial-fit-probability'
                       FitInfo: [1x1 struct]

Use transform to transform the documents into vectors of topic probabilities.

topicMixtures = transform(mdl,counts(1:10,:))

topicMixtures = 10×20

    0.0167    0.0035    0.1645    0.0977    0.0433    0.0833    0.0987    0.0033    0.0299    0.0234    0.0033    0.0345    0.0235    0.0958    0.0667    0.0167    0.0300    0.0519    0.0833    0.0300
    0.0711    0.0544    0.0116    0.0044    0.0033    0.0033    0.0431    0.0053    0.0145    0.0421    0.0971    0.0033    0.0040    0.1632    0.1784    0.0937    0.0683    0.0398    0.0954    0.0037
    0.0293    0.0482    0.1078    0.0322    0.0036    0.0036    0.0464    0.0036    0.0064    0.0612    0.0036    0.0176    0.0036    0.0464    0.0906    0.1169    0.0888    0.1115    0.1180    0.0607
    0.0055    0.0962    0.2403    0.0033    0.0296    0.1613    0.0164    0.0955    0.0163    0.0045    0.0172    0.0033    0.0415    0.0404    0.0342    0.0176    0.0417    0.0642    0.0033    0.0676
    0.0341    0.0224    0.0341    0.0645    0.0948    0.0038    0.0189    0.1099    0.0187    0.0560    0.1045    0.0356    0.0668    0.1196    0.0038    0.0931    0.0493    0.0038    0.0038    0.0626
    0.0445    0.0035    0.1167    0.0034    0.0446    0.0583    0.1268    0.0169    0.0034    0.1135    0.0034    0.0034    0.0047    0.0993    0.0909    0.0582    0.0308    0.0887    0.0856    0.0034
    0.1720    0.0764    0.0090    0.0180    0.0325    0.1213    0.0036    0.0036    0.0505    0.0472    0.0348    0.0477    0.0039    0.0038    0.0122    0.0041    0.0036    0.1605    0.1487    0.0465
    0.0043    0.0033    0.1248    0.0033    0.0299    0.0033    0.0690    0.1699    0.0695    0.0982    0.0033    0.0039    0.0620    0.0833    0.0040    0.0700    0.0033    0.1479    0.0033    0.0433
    0.0412    0.0387    0.0555    0.0165    0.0166    0.0433    0.0033    0.0038    0.0048    0.0033    0.0473    0.0474    0.1290    0.1107    0.0089    0.0112    0.0167    0.1555    0.2423    0.0040
    0.0362    0.0035    0.1117    0.0304    0.0034    0.1248    0.0439    0.0340    0.0168    0.0714    0.0034    0.0214    0.0056    0.0449    0.1438    0.0036    0.0290    0.1437    0.0980    0.0304

Input Arguments

`lsaMdl` - Input LSA model
`lsaModel` object

Input LSA model, specified as an lsaModel object.

`ldaMdl` - Input LDA model
`ldaModel` object

Input LDA model, specified as an ldaModel object.

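`documents` - Input documents
`tokenizedDocument` array | string array of words | cell array of character vectors

Input documents, specified as a tokenizedDocument array, a string array of words, or a cell array of character vectors. If documents is a tokenizedDocument array, then it must be a column vector. If documents is a string array or a cell array of character vectors, then it must be a row of the words of a single document.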
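The two accepted forms can be sketched as follows. This is a minimal illustration, assuming a tiny made-up corpus; the sentences and the variable names `textData`, `docScores`, `words`, and `singleScores` are hypothetical and are not part of the sonnets examples.

```matlab
% Illustrative corpus; assumed data for this sketch only.
textData = [
    "the quick brown fox jumps over the lazy dog"
    "a lazy dog sleeps in the warm sun"
    "the brown fox hunts in the cold night"
    "warm sun and cold night follow each other"
    "the quick dog chases the brown fox"
    "a fox sleeps while the dog hunts"];
documents = tokenizedDocument(textData);   % 6-by-1 column vector
bag = bagOfWords(documents);

numComponents = 2;
mdl = fitlsa(bag,numComponents);

% Form 1: tokenizedDocument column vector, one row of scores per document.
docScores = transform(mdl,documents(1:3));

% Form 2: a single document given as a 1-by-N string array of its words.
words = ["the" "lazy" "fox" "sleeps"];
singleScores = transform(mdl,words);

size(docScores)
size(singleScores)
```

To transform several documents in one call, pass a tokenizedDocument column vector; a string row is read as the words of one document.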

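Tip

To ensure that the function does not discard useful information, you must first preprocess the input documents using the same steps used to preprocess the documents used to train the model.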
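One way to follow this tip is to wrap the preprocessing steps in a single function and apply it to both the training text and any new text. The sketch below is illustrative only: the steps in `preprocess` (lowercasing, erasing punctuation, removing stop words) and the sample sentences are assumptions, not the steps used to produce sonnetsPreprocessed.txt.

```matlab
% Hypothetical preprocessing pipeline; these particular steps are an
% assumption for illustration. Use the steps that prepared your training data.
preprocess = @(str) removeStopWords(erasePunctuation(lower(tokenizedDocument(str))));

% Illustrative training text; assumed data for this sketch only.
trainText = [
    "The quick brown fox jumps over the lazy dog."
    "A lazy dog sleeps in the warm sun!"
    "The brown fox hunts in the cold night."
    "Warm sun and cold night follow each other."
    "The quick dog chases the brown fox."
    "A fox sleeps while the dog hunts."];
trainDocs = preprocess(trainText);
bag = bagOfWords(trainDocs);

rng('default')                          % for reproducibility
numTopics = 2;
mdl = fitlda(bag,numTopics,'Verbose',0);

% Apply the identical pipeline to new text before calling transform.
newText = "The Quick, brown FOX sleeps!";
newDocs = preprocess(newText);
topicMixtures = transform(mdl,newDocs)
```

Without the shared pipeline, tokens such as "FOX" or "Quick," would not match the lowercase, punctuation-free vocabulary the model was trained on.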

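`bag` - Input model
`bagOfWords` object | `bagOfNgrams` object

Input bag-of-words or bag-of-n-grams model, specified as a bagOfWords object or a bagOfNgrams object. If bag is a bagOfNgrams object, then the function treats each n-gram as a single word.

`counts` - Frequency counts of words
matrix of nonnegative integers

Frequency counts of words, specified as a matrix of nonnegative integers. If you specify 'Documents' to be 'rows', then the value counts(i,j) corresponds to the number of times the jth word of the vocabulary appears in the ith document. Otherwise, the value counts(i,j) corresponds to the number of times the ith word of the vocabulary appears in the jth document.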

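Name-Value Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'IterationLimit',200 sets the iteration limit to 200.

Note

These name-value pairs apply only if the input model is an ldaModel object.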

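`Documents` - Orientation of documents
`'rows'` (default) | `'columns'`

Orientation of documents in the word count matrix, specified as the comma-separated pair consisting of 'Documents' and one of the following:

'rows' – Input is a matrix of word counts with rows corresponding to documents.
'columns' – Input is a transposed matrix of word counts with columns corresponding to documents.

This option only applies if you specify the input documents as a matrix of word counts.

Note

If you orient your word count matrix so that documents correspond to columns and specify 'Documents','columns', then you might experience a significant reduction in optimization-execution time.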
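A minimal sketch of both orientations, assuming a small synthetic count matrix; the sizes and the names `newCounts`, `mixRows`, and `mixCols` are hypothetical and chosen only for this illustration.

```matlab
rng('default')                              % for reproducibility
numDocs = 30;
numWords = 50;
numTopics = 4;

% Synthetic word counts (assumed data): rows are documents, columns are words.
counts = randi([0 3],numDocs,numWords);
mdl = fitlda(counts,numTopics,'Verbose',0);

newCounts = counts(1:10,:);                 % 10 documents stored as rows

% Default orientation: each row of the input is a document.
mixRows = transform(mdl,newCounts);

% Transposed input: each column is a document, so state that explicitly.
mixCols = transform(mdl,newCounts','Documents','columns');

size(mixRows)
```

If you pass a transposed matrix without 'Documents','columns', the function reads its rows as documents, so the word dimension no longer lines up with the model vocabulary.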

`IterationLimit` - Maximum number of iterations
`100` (default) | positive integer

Maximum number of iterations, specified as the comma-separated pair consisting of 'IterationLimit' and a positive integer.

Example: 'IterationLimit',200

`LogLikelihoodTolerance` - Relative tolerance on log-likelihood
`0.0001` (default) | positive scalar

Relative tolerance on log-likelihood, specified as the comma-separated pair consisting of 'LogLikelihoodTolerance' and a positive scalar. The optimization terminates when this tolerance is reached.

Example: 'LogLikelihoodTolerance',0.001
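The two stopping options can be combined in one call. The sketch below assumes a small synthetic count matrix and compares the default settings against the example values quoted above; the data and the names `mixDefault`, `mixTuned`, and `maxDiff` are hypothetical.

```matlab
rng('default')                              % for reproducibility
numTopics = 3;

% Synthetic word counts (assumed data): 40 documents, 25 vocabulary words.
counts = randi([0 4],40,25);
mdl = fitlda(counts,numTopics,'Verbose',0);

newCounts = counts(1:5,:);

% Default stopping criteria (IterationLimit 100, LogLikelihoodTolerance 0.0001).
mixDefault = transform(mdl,newCounts);

% Allow more iterations but accept a looser tolerance.
mixTuned = transform(mdl,newCounts, ...
    'IterationLimit',200, ...
    'LogLikelihoodTolerance',0.001);

% Largest absolute difference between the two sets of topic mixtures.
maxDiff = max(abs(mixDefault(:) - mixTuned(:)))
```

Both options apply only when the input model is an ldaModel object, as noted at the start of this section.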

Output Arguments

`dscores` - Output document scores
matrix

Output document scores, returned as a matrix of score vectors.


Introduced in R2017b
