
Analyze Text Data Containing Emojis

This example shows how to analyze text data containing emojis.

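Emojis are pictorial symbols that appear inline in text. When writing text on mobile devices such as smartphones and tablets, people use emojis to keep the text short and to convey emotions and feelings.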

You can also use emojis to analyze text data. For example, you can use them to identify relevant strings of text or to visualize the sentiment or emotion of the text.

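When you work with text data, emojis can behave unpredictably. Depending on your system fonts, your system might not display some emojis correctly. Therefore, if an emoji does not display correctly, the data is not necessarily missing. Your system might simply be unable to display the emoji in the current font.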

Composing Emojis

In most cases, you can read emojis from a file (for example, by using extractFileText, extractHTMLText, or readtable) or by copying and pasting them directly into MATLAB®. Otherwise, you must compose the emoji using Unicode UTF-16 code units.

Some emojis consist of multiple Unicode UTF-16 code units. For example, the "smiling face with sunglasses" emoji (😎, code point U+1F60E) is a single glyph but comprises two UTF-16 code units, "D83D" and "DE0E". Create a string containing this emoji using the compose function, and specify the two code units with the prefix "\x".

emoji = compose("\xD83D\xDE0E")
emoji = "😎"
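The two code units "D83D" and "DE0E" are not arbitrary: any code point above U+FFFF is stored in UTF-16 as a surrogate pair computed with fixed arithmetic from the Unicode standard. The following is a minimal sketch in base MATLAB that derives the pair for U+1F60E; the variable names (codePoint, highSurrogate, lowSurrogate, units, emojiFromUnits) are illustrative, not part of any toolbox API.

```matlab
% Code point of the "smiling face with sunglasses" emoji
codePoint = hex2dec('1F60E');

% Code points above U+FFFF become a surrogate pair: subtract 0x10000,
% then split the remaining 20 bits into two 10-bit halves
offset = codePoint - hex2dec('10000');
highSurrogate = hex2dec('D800') + floor(offset/1024);  % top 10 bits
lowSurrogate  = hex2dec('DC00') + mod(offset,1024);    % bottom 10 bits

% Hexadecimal code units, one per row: 'D83D' and 'DE0E'
units = dec2hex([highSurrogate; lowSurrogate])

% Build the same emoji directly from the numeric code units
emojiFromUnits = string(char([highSurrogate lowSurrogate]));
```

With these values, emojiFromUnits is the same string that compose("\xD83D\xDE0E") returns, so you can start from a code point listed in a Unicode chart instead of looking up the code units.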

To get the Unicode UTF-16 code units of an emoji, use char to convert the string to a character array of code units, and then use dec2hex to get the corresponding hexadecimal values.

codeUnits = dec2hex(char(emoji))
codeUnits = 2×4 char array
    'D83D'
    'DE0E'

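Reconstruct the emoji using the compose function. To build the format specification, prefix each code unit with "\x" and join them using the strjoin function with the empty delimiter "".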

formatSpec = strjoin("\x" + codeUnits,"")
formatSpec = "\xD83D\xDE0E"
emoji = compose(formatSpec)
emoji = "😎"
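You can also go the other way and recover the code point from the two code units by reversing the surrogate-pair arithmetic. This is a minimal sketch in base MATLAB; it assumes the string holds exactly one emoji stored as a surrogate pair (a single-unit character such as "❤", U+2764, needs no decoding), and the names units, codePoint, and label are illustrative.

```matlab
emoji = compose("\xD83D\xDE0E");

% Numeric UTF-16 code units of the string, high surrogate first
units = double(char(emoji));

% Undo the surrogate-pair encoding to recover the code point
codePoint = hex2dec('10000') + (units(1) - hex2dec('D800'))*1024 ...
    + (units(2) - hex2dec('DC00'));

% Display the code point in the usual U+XXXX notation: "U+1F60E"
label = "U+" + string(dec2hex(codePoint))
```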

Import Text Data

Extract the text data in the file weekendUpdates.xlsx using readtable. The file weekendUpdates.xlsx contains status updates with the hashtags "#weekend" and "#vacation".

filename = "weekendUpdates.xlsx";
tbl = readtable(filename,'TextType','string');
head(tbl)
ans = 8×2 table
    ID    TextData
    __    __________________________________________________________________________________
    1     "Happy anniversary! ❤ Next stop: Paris! ✈ #vacation"
    2     "Haha, BBQ on the beach, engage smug mode!   ❤  #vacation"
    3     "getting ready for Saturday night  #yum #weekend "
    4     "Say it with me - I NEED A #VACATION!!! ☹"
    5     " Chilling  at home for the first time in ages…This is the life!  #weekend"
    6     "My last #weekend before the exam  ."
    7     "can’t believe my #vacation is over  so unfair"
    8     "Can’t wait for tennis this #weekend  "

Extract the text data from the TextData variable and view the first few status updates.

textData = tbl.TextData;
textData(1:5)
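ans = 5×1 string
    "Happy anniversary! ❤ Next stop: Paris! ✈ #vacation"
    "Haha, BBQ on the beach, engage smug mode!   ❤  #vacation"
    "getting ready for Saturday night  #yum #weekend "
    "Say it with me - I NEED A #VACATION!!! ☹"
    " Chilling  at home for the first time in ages…This is the life!  #weekend"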

Visualize the text data in a word cloud.

figure
wordcloud(textData);

Filter Text Data by Emoji

Extract the status updates that contain a particular emoji using the contains function. Find the indices of the documents containing the "smiling face with sunglasses" emoji (😎, code point U+1F60E). This emoji comprises the two Unicode UTF-16 code units "D83D" and "DE0E".

emoji = compose("\xD83D\xDE0E");
idx = contains(textData,emoji);
textDataSunglasses = textData(idx);
textDataSunglasses(1:5)
ans = 5×1 string
    "Haha, BBQ on the beach, engage smug mode!   ❤  #vacation"
    "getting ready for Saturday night  #yum #weekend "
    " Chilling  at home for the first time in ages…This is the life!  #weekend"
    " Check the out-of-office crew, we are officially ON #VACATION!! "
    "Who needs a #vacation when the weather is this good ☀ "
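The contains function also accepts an array of patterns and returns true for each string that contains any one of them, which lets you filter by several emojis at once. The following sketch assumes you want updates with either the sunglasses emoji or the heart "❤" (U+2764, a single UTF-16 code unit); it runs on three hypothetical sample strings rather than on the file data, and the names sunglasses, heart, samples, and idxAny are illustrative.

```matlab
sunglasses = compose("\xD83D\xDE0E");  % U+1F60E, two code units
heart = compose("\x2764");             % U+2764, one code unit

% Hypothetical sample updates, for illustration only
samples = ["Beach day " + sunglasses; "No emojis here"; "Paris " + heart];

% With an array of patterns, contains returns true for each element
% that contains at least one of the patterns
idxAny = contains(samples,[sunglasses heart]);
matches = samples(idxAny)
```

To apply the same filter to the status updates, replace samples with textData.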

Visualize the extracted text data in a word cloud.

figure
wordcloud(textDataSunglasses);

Extract and Visualize Emojis

Visualize all the emojis in text data using a word cloud.

Extract the emojis. First tokenize the text using tokenizedDocument, and then view the first few documents.

documents = tokenizedDocument(textData);
documents(1:5)
ans = 
  5×1 tokenizedDocument:

    11 tokens: Happy anniversary ! ❤ Next stop : Paris ! ✈ #vacation
    16 tokens: Haha , BBQ on the beach , engage smug mode !   ❤  #vacation
    9 tokens: getting ready for Saturday night  #yum #weekend 
    13 tokens: Say it with me - I NEED A #VACATION ! ! ! ☹
    19 tokens:  Chilling  at home for the first time in ages … This is the life !  #weekend

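The tokenizedDocument function automatically detects emojis and assigns them the token type "emoji". View the first few token details of the documents using the tokenDetails function.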

tdetails = tokenDetails(documents);
head(tdetails)
ans = 8×5 table
       Token         DocumentNumber    LineNumber       Type        Language
    _____________    ______________    __________    ___________    ________
    "Happy"                1               1         letters           en
    "anniversary"          1               1         letters           en
    "!"                    1               1         punctuation       en
    "❤"                    1               1         emoji             en
    "Next"                 1               1         letters           en
    "stop"                 1               1         letters           en
    ":"                    1               1         punctuation       en
    "Paris"                1               1         letters           en
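Because the token details table has one row per token together with its DocumentNumber, you can also count how many emojis each document contains. The following is a minimal sketch; so that it runs without Text Analytics Toolbox, it builds a small hypothetical table (tdetailsExample) with the same Token, DocumentNumber, and Type variables, and with real data you would use tdetails instead.

```matlab
% Hypothetical stand-in for a few rows of tokenDetails(documents)
Token = ["Happy"; "!"; compose("\x2764"); ...
    "Haha"; compose("\xD83D\xDE0E"); compose("\x2764")];
DocumentNumber = [1; 1; 1; 2; 2; 2];
Type = categorical(["letters"; "punctuation"; "emoji"; ...
    "letters"; "emoji"; "emoji"]);
tdetailsExample = table(Token,DocumentNumber,Type);

% Keep only the emoji tokens, then count them per document number
isEmoji = tdetailsExample.Type == "emoji";
numDocuments = max(tdetailsExample.DocumentNumber);
emojiCounts = accumarray(tdetailsExample.DocumentNumber(isEmoji),1,[numDocuments 1])
```

Passing [numDocuments 1] as the output size makes documents without any emojis appear with a count of zero instead of being dropped.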

Visualize the emojis in a word cloud by extracting the tokens with the token type "emoji" and inputting them into the wordcloud function.

idx = tdetails.Type == "emoji";
tokens = tdetails.Token(idx);
figure
wordcloud(tokens);
title("Emojis")
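The word cloud sizes the emojis by how often they occur. To see those frequencies as numbers, you can count the tokens directly. This is a minimal sketch in base MATLAB; tokensExample is a hypothetical array standing in for the tokens variable above.

```matlab
sunglasses = compose("\xD83D\xDE0E");
heart = compose("\x2764");

% Hypothetical emoji tokens, standing in for the extracted tokens
tokensExample = [heart; sunglasses; heart; heart; sunglasses];

% Map each token to its distinct emoji and count the occurrences
[uniqueEmojis,~,ic] = unique(tokensExample);
counts = accumarray(ic,1);

% Sort from most to least frequent
[counts,order] = sort(counts,'descend');
uniqueEmojis = uniqueEmojis(order);

% These counts can also be plotted with wordcloud(uniqueEmojis,counts)
```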

See Also


Related Topics