Loren on the Art of MATLAB

Turn ideas into MATLAB

String Things

Working with text in MATLAB has evolved over time. Way back, text data was stored in double arrays with an internal flag to denote that it was meant to be text. We then transformed this representation so character arrays were their very own type. And I mentioned earlier that we introduced a string datatype to make working with text data more efficient and natural. Let me show you a little more.

How to Compare Text: the Olden Days

Early on in MATLAB, we used the function strcmp to compare strings. A big caveat for many people is that strcmp does not behave the same way as its C-language counterpart: MATLAB's strcmp returns true when the inputs match, whereas C's strcmp returns 0 in that case. Over time, we added a few more comparison functions (strcmpi, strncmp, and strncmpi) to allow case-insensitive matches and to constrain the match to at most n characters.

Let's do some comparisons now. First on cell arrays of strings...

cellChars = {'Mercury','Venus','Earth','Mars'}
cellChars = 1×4 cell array {'Mercury'} {'Venus'} {'Earth'} {'Mars'}
TF = strcmp('fred',cellChars)
TF = 1×4 logical array 0 0 0 0
TF = strcmp('Venus',cellChars)
TF = 1×4 logical array 0 1 0 0
TF = strncmp('Mars', cellChars, 2)
TF = 1×4 logical array 0 0 0 1
TF = strncmp('Marvelous', cellChars, 2)
TF = 1×4 logical array 0 0 0 1
TF = strncmp('Marvelous', cellChars, 4)
TF = 1×4 logical array 0 0 0 0
TF = strcmpi('mars', cellChars)
TF = 1×4 logical array 0 0 0 1
TF = strcmpi('mar', cellChars)
TF = 1×4 logical array 0 0 0 0
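
MATLAB also provides strncmpi, which combines the two options used above (ignore case, and compare only the first n characters). Here is a minimal sketch of it, together with a reminder of the return convention. The variable name isEqual is only illustrative.

```matlab
cellChars = {'Mercury','Venus','Earth','Mars'};

% strcmp returns logical true for a match (C's strcmp would return 0 here).
isEqual = strcmp('Mars', 'Mars');

% strncmpi: case-insensitive comparison of the first n characters (n = 3).
TF = strncmpi('MARVELOUS', cellChars, 3);
```

Only 'Mars' shares its first three characters with 'MARVELOUS' once case is ignored, so only the last element of TF is true.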

More Modern, Not Identical Use

We also introduced categorical arrays for cases where limiting the set of string choices was appropriate. When using categorical variables, you may use == for comparisons.

catStr = categorical(cellChars)
catStr = 1×4 categorical array Mercury Venus Earth Mars
TF = 'Mars' == catStr
TF = 1×4 logical array 0 0 0 1
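
To make the "limited set of choices" idea concrete, here is a small sketch that inspects the allowed values of the categorical array and checks membership before comparing. The variable names allowed, hasPluto, and hasMars are illustrative and not from the original post.

```matlab
cellChars = {'Mercury','Venus','Earth','Mars'};
catStr = categorical(cellChars);

% The allowed values are the array's categories.
% By default they are the sorted unique values of the input.
allowed = categories(catStr);

% Check whether a value belongs to the allowed set.
hasPluto = iscategory(catStr, 'Pluto');
hasMars  = iscategory(catStr, 'Mars');
```

Note that the categories are stored in sorted order, which is independent of the order of the elements in catStr itself.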

String Comparisons Circa 2020

And now for string comparisons.

str = string(cellChars) % or ["Mercury","Venus","Earth","Mars"]
str = 1×4 string array "Mercury" "Venus" "Earth" "Mars"

I can still use the str*cmp* functions. But we are not restricted to them.

TF = strcmp('Mars', str)
TF = 1×4 logical array 0 0 0 1

We can now use == and related operators without worrying about indexing issues that might arise with character arrays.

TF = str ~= "Mars"
TF = 1×4 logical array 1 1 1 0
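
Here is a short sketch of the kind of trouble == can cause with character arrays, and how strings avoid it. The variable names charTF, charFailed, and strTF are just for illustration.

```matlab
% Character vectors are compared element by element, so sizes must be compatible.
charTF = 'Mars' == 'Mark';          % 1×4 logical, one result per character

try
    tmp = 'Mars' == 'Venus';        % 4 vs. 5 characters: incompatible sizes
    charFailed = false;
catch
    charFailed = true;              % the comparison above throws an error
end

% Strings are compared as whole values, so the result is a single logical.
strTF = "Mars" == "Venus";
```

With character vectors you get either a per-character result or an error, neither of which answers "are these the same word?". With strings, you get one true or false per string.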

And most recently, we introduced the function matches.

TF = matches(str,"Earth")
TF = 1×4 logical array 0 0 1 0

It's got some nifty features for handling string arrays. Like looking for planets with an orbit inside Earth's.

TF = matches(str,["Mercury","Venus"])
TF = 1×4 logical array 1 1 0 0

And I can, of course, ignore case, with code that, to me, appears less cryptic.

TF = matches(str,"earth","IgnoreCase",true)
TF = 1×4 logical array 0 0 1 0

As is true in all of these cases, we can index into the original array with the logical output to extract the relevant item(s).

str(TF)
ans = "Earth"
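
These pieces combine naturally. As a rough sketch (the variable names inner and others are mine), you can match several candidates at once, ignore case, and index in a single expression:

```matlab
str = ["Mercury","Venus","Earth","Mars"];

% Match any of several candidates, ignoring case, then extract the hits.
inner = str(matches(str, ["mercury","VENUS"], "IgnoreCase", true));

% The complement is just as easy with logical negation.
others = str(~matches(str, ["Mercury","Venus"]));
```

The extracted elements keep their order and capitalization from str, since matches only produces the logical mask.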

My Advice: Err on the Side of Code Readability

I haven't touched on performance here, but one of the drivers for the recent string datatype is efficiency and performance. We've worked hard to overlay that with functions that make your code highly readable. This makes code maintenance and code transfer go much more smoothly. I tend to favor this over eking out the last fractional second of speed. In the case of strings, you may not even need to make that tradeoff.

String Adoption

Have you seen enough evidence that strings are the future for working with textual data in MATLAB? Tell us what you think here.




Published with MATLAB® R2019b
