Word-by-Word Text Generation Using Deep Learning

This example shows how to train a deep learning LSTM network to generate text word by word.

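To train a deep learning network for word-by-word text generation, train a sequence-to-sequence LSTM network to predict the next word in a sequence of words. To train the network to predict the next word, specify the responses to be the input sequences shifted by one time step.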
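The plain-Python sketch below illustrates the one-step shift. It is not MATLAB code and not the example's datastore code. The predictors start with a start-of-text token and the responses end with an end-of-text token, so each response is the word that follows the corresponding predictor. The token strings "startOfText" and "EndOfText" follow the names used later in this example. The function name shift_pair is hypothetical.

```python
START = "startOfText"   # start-of-text token name used in this example
END = "EndOfText"       # end-of-text class name used in this example


def shift_pair(tokens):
    """Return (predictors, responses) for one tokenized document.

    predictors: the start token followed by the document's words.
    responses:  the document's words followed by the end token, so
                responses[t] is the word that follows predictors[t].
    """
    predictors = [START] + list(tokens)
    responses = list(tokens) + [END]
    return predictors, responses


if __name__ == "__main__":
    x, y = shift_pair(["Down", "down", "down"])
    for p, r in zip(x, y):
        print(f"{p:>12} -> {r}")
```

The predictor and response sequences always have the same length, which is what a sequence-to-sequence network expects.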

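This example reads text from a website. It reads and parses the HTML code to extract the relevant text. It then uses a custom mini-batch datastore, documentGenerationDatastore, to input the documents to the network as mini-batches of sequence data. The datastore converts documents to sequences of numeric word indices. The deep learning network is an LSTM network that contains a word embedding layer.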

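A mini-batch datastore is an implementation of a datastore that supports reading data in batches. You can use a mini-batch datastore as a source of training, validation, test, and prediction data sets for deep learning applications. Use mini-batch datastores to read out-of-memory data or to perform specific preprocessing operations when reading batches of data.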

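You can adapt the custom mini-batch datastore defined in documentGenerationDatastore.m to your data by customizing its functions. For an example showing how to create your own custom mini-batch datastore, see Develop Custom Mini-Batch Datastore.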
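The hypothetical Python class below makes reading in batches concrete. Its has_data/read/reset methods loosely echo a datastore's cycle of reading until the data is exhausted and then rewinding. It is not MATLAB's datastore interface. The optional transform argument stands in for per-batch preprocessing.

```python
class MiniBatchReader:
    """Hypothetical reader that serves observations in fixed-size mini-batches."""

    def __init__(self, observations, mini_batch_size, transform=None):
        if mini_batch_size < 1:
            raise ValueError("mini_batch_size must be at least 1")
        self._observations = list(observations)
        self.mini_batch_size = mini_batch_size
        self._transform = transform  # optional per-observation preprocessing
        self._position = 0

    def has_data(self):
        """True while unread observations remain."""
        return self._position < len(self._observations)

    def read(self):
        """Return the next mini-batch; the final one may be smaller."""
        start = self._position
        batch = self._observations[start:start + self.mini_batch_size]
        self._position += len(batch)
        if self._transform is not None:
            batch = [self._transform(item) for item in batch]
        return batch

    def reset(self):
        """Rewind to the first observation, e.g. at the start of an epoch."""
        self._position = 0


if __name__ == "__main__":
    reader = MiniBatchReader(["a b", "c", "d e f"], 2, transform=str.split)
    while reader.has_data():
        print(reader.read())
```

Only the current position is stored, so the underlying data could be replaced by a lazy source (for example, files read on demand) without changing the read loop.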

Load Training Data

Load the training data. Read the HTML code of Alice's Adventures in Wonderland by Lewis Carroll from Project Gutenberg.

url = "https://www.gutenberg.org/files/11/11-h/11-h.htm";
code = webread(url);

Parse HTML Code

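The HTML code contains the relevant text inside <p> (paragraph) elements. Extract the relevant text by parsing the HTML code using htmlTree and then finding all the elements with element name "p".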

tree = htmlTree(code);
selector = "p";
subtrees = findElement(tree,selector);
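The step of collecting the text of every <p> element can be sketched with Python's standard-library html.parser. This is an illustrative stand-in, not htmlTree, findElement, or extractHTMLText. It collapses whitespace, treats a new <p> as closing an unclosed one, and keeps empty paragraphs as empty strings.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the text inside each <p> element of an HTML document."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True: entities arrive decoded
        self.paragraphs = []
        self._inside = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._flush()        # a new <p> implicitly ends an unclosed one
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()

    def handle_data(self, data):
        if self._inside:
            self._buffer.append(data)

    def close(self):
        super().close()
        self._flush()            # keep a trailing <p> that was never closed

    def _flush(self):
        if self._inside:
            # Collapse runs of whitespace (including newlines) to single spaces.
            self.paragraphs.append(" ".join("".join(self._buffer).split()))
        self._inside = False
        self._buffer = []


def extract_paragraphs(html):
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


if __name__ == "__main__":
    print(extract_paragraphs("<p>Down, <i>down</i>, down.</p><p></p>"))
```

Text outside <p> elements (headings, tables of contents, scripts) is ignored, which is why selecting paragraph elements is a simple way to isolate the narrative text.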

Extract the text data from the HTML subtrees using extractHTMLText and view the first 10 paragraphs.

textData = extractHTMLText(subtrees);
textData(1:10)
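ans = 10×1 string array
    ""
    ""
    ""
    ""
    ""
    ""
    "Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do: once or twice she had peeped into the book her sister was reading, but it had no pictures or conversations in it, ‘and what is the use of a book,’ thought Alice ‘without pictures or conversations?’ "
    "So she was considering in her own mind (as well as she could, for the hot day made her feel very sleepy and stupid), whether the pleasure of making a daisy-chain would be worth the trouble of getting up and picking the daisies, when suddenly a White Rabbit with pink eyes ran close by her. "
    "There was nothing so very remarkable in that; nor did Alice think it so very much out of the way to hear the Rabbit say to itself, ‘Oh dear! Oh dear! I shall be late!’ (when she thought it over afterwards, it occurred to her that she ought to have wondered at this, but at the time it all seemed quite natural); but when the Rabbit actually took a watch out of its waistcoat-pocket, and looked at it, and then hurried on, Alice started to her feet, for it flashed across her mind that she had never before seen a rabbit with either a waistcoat-pocket, or a watch to take out of it, and burning with curiosity, she ran across the field after it, and fortunately was just in time to see it pop down a large rabbit-hole under the hedge. "
    "In another moment down went Alice after it, never once considering how in the world she was to get out again. "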

Remove the empty paragraphs and view the first 10 remaining paragraphs.

textData(textData == "") = [];
textData(1:10)
ans = 10×1 string array
    "Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do: once or twice she had peeped into the book her sister was reading, but it had no pictures or conversations in it, ‘and what is the use of a book,’ thought Alice ‘without pictures or conversations?’ "
    "So she was considering in her own mind (as well as she could, for the hot day made her feel very sleepy and stupid), whether the pleasure of making a daisy-chain would be worth the trouble of getting up and picking the daisies, when suddenly a White Rabbit with pink eyes ran close by her. "
    "There was nothing so very remarkable in that; nor did Alice think it so very much out of the way to hear the Rabbit say to itself, ‘Oh dear! Oh dear! I shall be late!’ (when she thought it over afterwards, it occurred to her that she ought to have wondered at this, but at the time it all seemed quite natural); but when the Rabbit actually took a watch out of its waistcoat-pocket, and looked at it, and then hurried on, Alice started to her feet, for it flashed across her mind that she had never before seen a rabbit with either a waistcoat-pocket, or a watch to take out of it, and burning with curiosity, she ran across the field after it, and fortunately was just in time to see it pop down a large rabbit-hole under the hedge. "
    "In another moment down went Alice after it, never once considering how in the world she was to get out again. "
    "The rabbit-hole went straight on like a tunnel for some way, and then dipped suddenly down, so suddenly that Alice had not a moment to think about stopping herself before she found herself falling down a very deep well. "
    "Either the well was very deep, or she fell very slowly, for she had plenty of time as she went down to look about her and to wonder what was going to happen next. 
First, she tried to look down and make out what she was coming to, but it was too dark to see anything; then she looked at the sides of the well, and noticed that they were filled with cupboards and book-shelves; here and there she saw maps and pictures hung upon pegs. She took down a jar from one of the shelves as she passed; it was labelled ‘ORANGE MARMALADE’, but to her great disappointment it was empty: she did not like to drop the jar for fear of killing somebody, so managed to put it into one of the cupboards as she fell past it. "
    "‘Well!’ thought Alice to herself, ‘after such a fall as this, I shall think nothing of tumbling down stairs! How brave they’ll all think me at home! Why, I wouldn’t say anything about it, even if I fell off the top of the house!’ (Which was very likely true.) "
    "Down, down, down. Would the fall never come to an end! ‘I wonder how many miles I’ve fallen by this time?’ she said aloud. ‘I must be getting somewhere near the centre of the earth. Let me see: that would be four thousand miles down, I think-’ (for, you see, Alice had learnt several things of this sort in her lessons in the schoolroom, and though this was not a very good opportunity for showing off her knowledge, as there was no one to listen to her, still it was good practice to say it over) ‘-yes, that’s about the right distance-but then I wonder what Latitude or Longitude I’ve got to?’ (Alice had no idea what Latitude was, or Longitude either, but thought they were nice grand words to say.) "
    "Presently she began again. ‘I wonder if I shall fall right through the earth! How funny it’ll seem to come out among the people that walk with their heads downward! The Antipathies, I think-’ (she was rather glad there was no one listening, this time, as it didn’t sound at all the right word) ‘-but I shall have to ask them what the name of the country is, you know. 
Please, Ma’am, is this New Zealand or Australia?’ (and she tried to curtsey as she spoke-fancy curtseying as you’re falling through the air! Do you think you could manage it?) ‘And what an ignorant little girl she’ll think me for asking! No, it’ll never do to ask: perhaps I shall see it written up somewhere.’ "
    "Down, down, down. There was nothing else to do, so Alice soon began talking again. ‘Dinah’ll miss me very much to-night, I should think!’ (Dinah was the cat.) ‘I hope they’ll remember her saucer of milk at tea-time. Dinah my dear! I wish you were down here with me! There are no mice in the air, I’m afraid, but you might catch a bat, and that’s very like a mouse, you know. But do cats eat bats, I wonder?’ And here Alice began to get rather sleepy, and went on saying to herself, in a dreamy sort of way, ‘Do cats eat bats? Do cats eat bats?’ and sometimes, ‘Do bats eat cats?’ for, you see, as she couldn’t answer either question, it didn’t much matter which way she put it. She felt that she was dozing off, and had just begun to dream that she was walking hand in hand with Dinah, and saying to her very earnestly, ‘Now, Dinah, tell me the truth: did you ever eat a bat?’ when suddenly, thump! thump! down she came upon a heap of sticks and dry leaves, and the fall was over. "

Visualize the text data in a word cloud.

figure
wordcloud(textData);
title("Alice's Adventures in Wonderland")

Prepare Data for Training

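Create a datastore that contains the data for training using documentGenerationDatastore. To create the datastore, first save the custom mini-batch datastore documentGenerationDatastore.m to the path. For the predictors, this datastore converts the documents into sequences of word indices using a word encoding. The first word index for each document corresponds to a "start of text" token, given by the string "startOfText". For the responses, the datastore returns categorical sequences of the words shifted by one time step.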

Tokenize the text data using tokenizedDocument.

documents = tokenizedDocument(textData);

Create a document generation datastore using the tokenized documents.

ds = documentGenerationDatastore(documents);
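The predictor encoding that the datastore applies can be sketched in Python as a word-to-index map that reserves index 1 for "startOfText". This matches the wordIndex = 1 result shown later in the example. Numbering the remaining words by first appearance is an assumption of this sketch and may not match the order that MATLAB's word encoding uses. The function names are hypothetical.

```python
START = "startOfText"


def build_encoding(documents):
    """Map words to 1-based indices, reserving index 1 for the start token.

    Remaining words are numbered in order of first appearance (an
    assumption of this sketch).
    """
    word2ind = {START: 1}
    for doc in documents:
        for word in doc:
            if word not in word2ind:
                word2ind[word] = len(word2ind) + 1
    return word2ind


def encode(doc, word2ind):
    """Predictor sequence: the start index, then one index per word."""
    return [word2ind[START]] + [word2ind[word] for word in doc]


if __name__ == "__main__":
    docs = [["Down", "down", "down"], ["Alice", "fell", "down"]]
    enc = build_encoding(docs)
    print(enc)
    print(encode(docs[1], enc))
```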

To reduce the amount of padding added to the sequences, sort the documents in the datastore by sequence length.

ds = sort(ds);
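Sorting helps because each mini-batch is padded to the length of its longest sequence, so grouping sequences of similar length leaves less padding. The Python sketch below counts padding tokens for a hypothetical set of document lengths. It assumes that mini-batches are consecutive slices of the data, as when 'Shuffle' is 'never', and that each mini-batch is padded to its longest sequence.

```python
def padding_needed(lengths, mini_batch_size):
    """Count padding tokens when each consecutive mini-batch is padded
    to the length of its longest sequence."""
    total = 0
    for start in range(0, len(lengths), mini_batch_size):
        batch = lengths[start:start + mini_batch_size]
        longest = max(batch)
        total += sum(longest - n for n in batch)
    return total


if __name__ == "__main__":
    lengths = [2, 9, 3, 8, 4, 10]   # hypothetical document lengths (in words)
    print("unsorted:", padding_needed(lengths, 2))
    print("sorted:  ", padding_needed(sorted(lengths), 2))
```

For these lengths, unsorted batches need 18 padding tokens and sorted batches need 6.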

Create and Train LSTM Network

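Define the LSTM network architecture. To input sequence data into the network, include a sequence input layer and set the input size to 1. Next, include a word embedding layer of dimension 100 with the same number of words as the word encoding. Next, include an LSTM layer with hidden size 100, followed by a dropout layer with dropout probability 0.2. Finally, add a fully connected layer with the same size as the number of classes, a softmax layer, and a classification layer. The number of classes is the number of words in the vocabulary plus one extra class for "end of text".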

inputSize = 1;
embeddingDimension = 100;
numWords = numel(ds.Encoding.Vocabulary);
numClasses = numWords + 1;

layers = [
    sequenceInputLayer(inputSize)
    wordEmbeddingLayer(embeddingDimension,numWords)
    lstmLayer(100)
    dropoutLayer(0.2)
    fullyConnectedLayer(numClasses)
    softmaxLayer
    classificationLayer];
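As a sanity check on the layer sizes, the Python sketch below counts the learnable parameters of the LSTM and fully connected layers using the standard formulas. An LSTM with hidden size H and input size D has input weights (4H×D), recurrent weights (4H×H), and one bias (4H). A fully connected layer has one weight per input-output pair plus one bias per output. The vocabulary size of 5000 is hypothetical, because the real value depends on the encoding. Some frameworks, for example PyTorch, use two LSTM bias vectors, which changes the LSTM count slightly.

```python
def lstm_params(input_size, hidden_size):
    """Input weights (4H x D) + recurrent weights (4H x H) + bias (4H)."""
    return (4 * hidden_size * input_size
            + 4 * hidden_size * hidden_size
            + 4 * hidden_size)


def fully_connected_params(input_size, output_size):
    """Weights (output x input) plus one bias per output."""
    return output_size * input_size + output_size


embedding_dimension = 100
hidden_size = 100
num_words = 5000                  # hypothetical vocabulary size
num_classes = num_words + 1       # one extra class for "end of text"

if __name__ == "__main__":
    print("LSTM:", lstm_params(embedding_dimension, hidden_size))
    print("fully connected:", fully_connected_params(hidden_size, num_classes))
```

With a vocabulary of this size, the fully connected output layer (505,101 parameters) dominates the LSTM (80,400 parameters). For large vocabularies, the output layer is usually the most expensive part of this architecture.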

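Specify the training options. Specify the solver to be 'adam'. Train for 300 epochs with a learning rate of 0.01. Set the mini-batch size to 32. To keep the data sorted by sequence length, set the 'Shuffle' option to 'never'. To monitor the training progress, set the 'Plots' option to 'training-progress'. To suppress verbose output, set 'Verbose' to false.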

options = trainingOptions('adam', ...
    'MaxEpochs',300, ...
    'InitialLearnRate',0.01, ...
    'MiniBatchSize',32, ...
    'Shuffle','never', ...
    'Plots','training-progress', ...
    'Verbose',false);

Train the network using trainNetwork.

net = trainNetwork(ds,layers,options);

Generate New Text

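Generate the first word of text by sampling a word from a probability distribution according to the first words of the text in the training data. Generate the remaining words by using the trained LSTM network to predict the next time step from the current sequence of generated text. Keep generating words one at a time until the network predicts the "end of text" word.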

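To make the first prediction using the network, input the index that represents the "start of text" token. Find the index by using the word2ind function with the word encoding used by the document datastore.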

enc = ds.Encoding;
wordIndex = word2ind(enc,"startOfText")
wordIndex = 1

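For the remaining predictions, sample the next word according to the prediction scores of the network. The prediction scores represent the probability distribution of the next word. Sample the words from the vocabulary given by the class names of the output layer of the network.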

vocabulary = string(net.Layers(end).Classes);
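The weighted sampling step can be sketched in Python with the inverse-CDF method. Draw a uniform number in [0, total score) and pick the first word whose running score total exceeds it. This illustrates what weighted sampling does. It is not the implementation of datasample. Python's random.choices(vocabulary, weights=scores) is an equivalent one-call alternative.

```python
import bisect
import itertools
import random


def sample_word(vocabulary, scores, rng=random):
    """Pick one word with probability proportional to its score."""
    if not vocabulary or len(vocabulary) != len(scores):
        raise ValueError("need one score per word and at least one word")
    cumulative = list(itertools.accumulate(scores))   # running totals
    total = cumulative[-1]
    if total <= 0:
        raise ValueError("scores must sum to a positive number")
    u = rng.random() * total                          # uniform in [0, total)
    # First position whose running total exceeds u; clamp guards rounding.
    i = bisect.bisect_right(cumulative, u)
    return vocabulary[min(i, len(vocabulary) - 1)]


if __name__ == "__main__":
    rng = random.Random(0)
    vocabulary = ["the", "Rabbit", "EndOfText"]
    print([sample_word(vocabulary, [0.7, 0.2, 0.1], rng) for _ in range(10)])
```

Sampling, rather than always taking the highest-scoring word, keeps the generated text varied. Always taking the highest score tends to repeat the same few phrases.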

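Make predictions word by word using predictAndUpdateState. For each prediction, input the index of the previous word. Stop predicting when the network predicts the end-of-text word or when the generated text is 500 characters long. For large collections of data, long sequences, or large networks, predictions on the GPU are usually faster to compute than predictions on the CPU. Otherwise, predictions on the CPU are usually faster. For single-time-step predictions, use the CPU. To use the CPU for prediction, set the 'ExecutionEnvironment' option of predictAndUpdateState to 'cpu'.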

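generatedText = "";
maxLength = 500;
while strlength(generatedText) < maxLength
    % Predict the next word scores.
    [net,wordScores] = predictAndUpdateState(net,wordIndex,'ExecutionEnvironment','cpu');

    % Sample the next word.
    newWord = datasample(vocabulary,1,'Weights',wordScores);

    % Stop predicting at the end of text.
    if newWord == "EndOfText"
        break
    end

    % Add the word to the generated text.
    generatedText = generatedText + " " + newWord;

    % Find the word index for the next input.
    wordIndex = word2ind(enc,newWord);
end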
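The Python sketch below shows the control flow of that loop without the network. The next_scores callable is a hypothetical stand-in for predictAndUpdateState that maps the previous word index to scores over the vocabulary. A real LSTM also carries hidden state from one call to the next, which this toy lookup table does not model.

```python
import random

END = "EndOfText"


def generate(next_scores, vocabulary, word2ind, start_index,
             max_length=500, rng=None):
    """Sample words one at a time until END or max_length characters.

    next_scores(index) -> scores over `vocabulary` (hypothetical stand-in
    for the trained network).
    """
    rng = rng if rng is not None else random.Random()
    text = ""
    index = start_index
    while len(text) < max_length:
        scores = next_scores(index)                              # predict
        word = rng.choices(vocabulary, weights=scores, k=1)[0]   # sample
        if word == END:                                          # stop at end of text
            break
        text += " " + word                                       # append with a space
        index = word2ind[word]                                   # next input index
    return text


if __name__ == "__main__":
    vocabulary = ["Down", ",", END]
    word2ind = {"startOfText": 1, "Down": 2, ",": 3}
    table = {1: [1, 0, 0], 2: [0, 1, 0], 3: [1, 0, 1]}  # toy "network"
    print(repr(generate(table.get, vocabulary, word2ind, 1,
                        rng=random.Random(0))))
```

The length check runs before each prediction, so the result can slightly exceed max_length by the last appended word. The MATLAB loop behaves the same way.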

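The generation process inserts a whitespace character between predictions, so some punctuation characters appear with unnecessary spaces before or after them. Reconstruct the generated text by removing the spaces before and after the appropriate punctuation characters.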

Remove the spaces that appear before the specified punctuation characters.

punctuationCharacters = ["." "," "’" ")" ":" "?" "!"];
generatedText = replace(generatedText," " + punctuationCharacters,punctuationCharacters);

Remove the spaces that appear after the specified punctuation characters.

punctuationCharacters = ["(" "‘"];
generatedText = replace(generatedText,punctuationCharacters + " ",punctuationCharacters)
generatedText = " ‘Sure, it’s a good Turtle!’ said the Queen in a low, trembling voice."
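The two replace calls amount to plain substring replacement, sketched in Python below. The character lists are assumed to match the MATLAB lists, including the curly quotes ‘ and ’ that appear in the generated text. The function name tidy_punctuation is hypothetical. Replacements are applied one character at a time, which gives the same result as the MATLAB calls for text like the example output.

```python
BEFORE = [".", ",", "’", ")", ":", "?", "!"]   # remove the space before these
AFTER = ["(", "‘"]                             # remove the space after these


def tidy_punctuation(text):
    """Undo the spaces that word-by-word generation puts around punctuation."""
    for mark in BEFORE:
        text = text.replace(" " + mark, mark)
    for mark in AFTER:
        text = text.replace(mark + " ", mark)
    return text


if __name__ == "__main__":
    raw = " ‘ Sure , it ’s a good Turtle ! ’ said the Queen ."
    print(tidy_punctuation(raw))
```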

To generate multiple pieces of text, reset the network state between generations using resetState.

net = resetState(net);
