tall

Create tall array

Description

t = tall(ds) creates a tall array on top of datastore ds.

  • If ds is a datastore for tabular data (so that the read and readall methods of the datastore return tables or timetables), then t is a tall table or tall timetable, depending on what the datastore is configured to return. Tabular data is data that is arranged in a rectangular fashion with each row having the same number of entries.

  • Otherwise, t is a tall cell array.

t = tall(A) converts the in-memory array A into a tall array. The underlying data type of t is the same as class(A). This syntax is useful when you need to quickly create a tall array, such as for debugging or prototyping algorithms.

In R2019b and later, you can cast in-memory arrays into tall arrays for more efficient operations on the array. After you convert into a tall array, MATLAB® avoids making temporary copies of the entire array and instead works on the data in smaller blocks. This lets you perform a wider range of operations on the array without running out of memory.
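As a minimal sketch of the t = tall(A) syntax, the following converts a small single-precision matrix into a tall array and gathers it back to confirm that the class and values are preserved. The variable names and the choice of magic(4) are illustrative assumptions, not part of the reference page.

```matlab
% Illustrative in-memory data (assumption: any supported array type works).
A = single(magic(4));

% Convert the in-memory array into a tall array.
tA = tall(A);

% istall reports whether a variable is a tall array.
isTallArray = istall(tA);

% gather evaluates the tall array and returns an ordinary in-memory array.
B = gather(tA);

% The gathered result should match the original in class and in value.
sameClass  = strcmp(class(B), class(A));
sameValues = isequal(A, B);
```

Here sameClass and sameValues are both expected to be true, because the conversion does not change the underlying data type.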

Examples

Convert a datastore into a tall array.

First, create a datastore for the data set. You can specify either a full or relative file location for the data set using datastore(location) to create the datastore. The location argument can specify:

  • A single file, such as 'airlinesmall.csv'

  • Several files with the same extension, such as '*.csv'

  • An entire folder of files, such as 'C:\MyData'

tabularTextDatastore also has several options to specify file and text format properties when you create the datastore.

Create a datastore for the airlinesmall.csv data set. Treat 'NA' values as missing data so that they are replaced with NaN values. Select a small subset of the variables to work with.

varnames = {'ArrDelay', 'DepDelay', 'Origin', 'Dest'};
ds = tabularTextDatastore('airlinesmall.csv', 'TreatAsMissing', 'NA', ...
    'SelectedVariableNames', varnames);

Use tall to create a tall array for the data in the datastore. Since the data in ds is tabular, the result is a tall table. If the data is not tabular, then tall creates a tall cell array instead.

T = tall(ds)

T =

  Mx4 tall table

    ArrDelay    DepDelay    Origin      Dest
    ________    ________    _______    _______

       8           12       {'LAX'}    {'SJC'}
       8            1       {'SJC'}    {'BUR'}
      21           20       {'SAN'}    {'SMF'}
      13           12       {'BUR'}    {'SJC'}
       4           -1       {'SMF'}    {'LAX'}
      59           63       {'LAX'}    {'SJC'}
       3           -2       {'SAN'}    {'SFO'}
      11           -1       {'SEA'}    {'LAX'}
       :            :          :          :
       :            :          :          :

You can use many common MATLAB® operators and functions with tall arrays. To see if a function works with tall arrays, check the Extended Capabilities section at the bottom of the function reference page.

Convert a datastore into a tall table, calculate its size using a deferred calculation, and then perform the calculation and return the result in memory.

First, create a datastore for the airlinesmall.csv data set. Treat 'NA' values as missing data so that they are replaced with NaN values. Set the text format of a few columns so that they are read as a cell array of character vectors. Convert the datastore into a tall table.

ds = tabularTextDatastore('airlinesmall.csv', 'TreatAsMissing', 'NA');
ds.SelectedFormats{strcmp(ds.SelectedVariableNames, 'TailNum')} = '%s';
ds.SelectedFormats{strcmp(ds.SelectedVariableNames, 'CancellationCode')} = '%s';
T = tall(ds)
T = Mx29高表年月DayofMonth DayOfWeek DepTime CRSDepTime ArrTime CRSArrTime UniqueCarrier FlightNum TailNum ActualElapsedTime CRSElapsedTime AirTime ArrDelay DepDelay Origin Dest Distance TaxiIn TaxiOut Cancelled CancellationCode Diverted CarrierDelay WeatherDelay NASDelay SecurityDelay LateAircraftDelay ____ _____ __________ _________ _______ __________ _______ __________ _____________ _________ _______ _________________ ______________ _______ ________ ________ _______ _______ ________ ______ _______ _________ ________________ ________ ____________ ____________ ________ _____________ _________________ 1987 10 21 3 642 630 735 727 {'PS'} 1503 {'NA'} 53 57 NaN 8 12 {'LAX'} {'SJC'} 308 NaN NaN 0 {'NA'} 0 NaN NaN NaN NaN NaN 1987 10 26 1 1021 1020 1124 1116 {'PS'} 1550 {'NA'} 63 56 NaN 8 1 {'SJC'} {'BUR'} 296 NaN NaN 0 {'NA'} 0 NaN NaN NaN NaN NaN 1987 10 23 5 2055 2035 2218 2157 {'PS'} 1589 {'NA'} 83 82 NaN 21 20 {'SAN'} {'SMF'} 480 NaN NaN 0 {'NA'} 0 NaN NaN NaN NaN NaN 1987 10 23 5 1332 1320 1431 1418 {'PS'} 1655 {'NA'} 59 58 NaN 13 12 {'BUR'} {'SJC'} 296 NaN NaN 0 {'NA'} 0 NaN NaN NaN NaN NaN 1987 10 22 4 629 630 746 742 {'PS'} 1702 {'NA'} 77 72 NaN 4 -1 {'SMF'} {'LAX'} 373 NaN NaN 0 {'NA'} 0 NaN NaN NaN NaN NaN 1987 10 28 3 1446 1343 1547 1448 {'PS'} 1729 {'NA'} 61 65 NaN 59 63 {'LAX'} {'SJC'} 308 NaN NaN 0 {'NA'} 0 NaN NaN NaN NaN NaN 1987 10 8 4 928 930 1052 1049 {'PS'} 1763 {'NA'} 84 79 NaN 3 -2 {'SAN'} {'SFO'} 447 NaN NaN 0 {'NA'} 0 NaN NaN NaN NaN NaN 1987 10 10 6 859 900 1134 1123 {'PS'} 1800 {'NA'} 155 143 NaN 11 -1 {'SEA'} {'LAX'} 954 NaN NaN 0 {'NA'} 0 NaN NaN NaN NaN NaN : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : :

The display of the tall table indicates that MATLAB® does not yet know how many rows of data are in the table.

Calculate the size of the tall table. Since calculating the size of a tall array requires a full pass through the data, MATLAB does not immediately calculate the value. Instead, like most operations with tall arrays, the result is an unevaluated tall array whose values and size are currently unknown.

s = size(T)

s =

  1x2 tall double row vector

    ?    ?

Use the gather function to perform the deferred calculation and return the result in memory. The result returned by size is a small 1-by-2 vector, which easily fits in memory.

sz = gather(s)

Evaluating tall expression using the Local MATLAB Session:
- Pass 1 of 1: Completed in 2.1 sec
Evaluation completed in 2.5 sec

sz = 1×2

      123523          29

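If you use gather on an unreduced tall array, then the result might not fit in memory. If you are unsure whether the result returned by gather can fit in memory, use gather(head(X)) or gather(tail(X)) to bring only a small portion of the calculation result into memory.

Create an in-memory array of random numbers, and then convert it into a tall array. Creating tall arrays from in-memory arrays in this manner is useful for debugging or prototyping new programs. The in-memory array is still bound by normal memory constraints: even after it is converted into a tall array, it cannot grow beyond the limits of memory.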
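The following is a minimal sketch of gathering only part of a tall array, assuming a small random matrix stands in for data that would normally be too large to gather whole. The variable names are illustrative.

```matlab
% Stand-in for a large tall array (assumption: 1000-by-3 random data).
X = tall(rand(1000, 3));

% head returns the first rows of a tall array (eight by default), so
% gathering it brings only those rows into memory.
firstRows = gather(head(X));

% tail with a second argument returns the requested number of last rows.
lastRows = gather(tail(X, 5));

% A reduction such as mean collapses the rows, so its gathered result
% is a small 1-by-3 vector regardless of how many rows X has.
colMeans = gather(mean(X));
```

In this sketch firstRows is 8-by-3 and lastRows is 5-by-3, so neither call depends on the full height of X fitting in memory.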

A = rand(100,4);
tA = tall(A)

tA =

  100x4 tall double matrix

    0.8147    0.1622    0.6443    0.0596
    0.9058    0.7943    0.3786    0.6820
    0.1270    0.3112    0.8116    0.0424
    0.9134    0.5285    0.5328    0.0714
    0.6324    0.1656    0.3507    0.5216
    0.0975    0.6020    0.9390    0.0967
    0.2785    0.2630    0.8759    0.8181
    0.5469    0.6541    0.5502    0.8175
      :         :         :         :
      :         :         :         :

In R2019b and later releases, when you convert in-memory arrays into tall arrays, you can perform calculations on the array without requiring extra memory for temporary copies of the data. For example, this code normalizes the data in a large matrix and then calculates the sum of all the rows and columns. An in-memory version of this calculation needs to not only store the 5GB array but also have enough memory available to create temporary copies of the array.

N = 25000;
tA = tall(rand(N));
tB = tA - mean(tA);
S = gather(sum(tB, [1,2]))

Evaluating tall expression using the Local MATLAB Session:
- Pass 1 of 2: Completed in 6.7 sec
- Pass 2 of 2: Completed in 16 sec
Evaluation completed in 23 sec

S = -3.0786e-10
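To check that the tall version of this calculation agrees with the ordinary in-memory one, a scaled-down sketch can run both and compare them. The matrix size of 200 and the tolerance of 1e-9 are assumptions chosen so that the example runs quickly. They are not values from the reference page.

```matlab
% Small matrix so that both versions fit comfortably in memory (assumption).
N = 200;
A = rand(N);

% Tall version: center each column, then sum over rows and columns.
tA = tall(A);
tB = tA - mean(tA);
S_tall = gather(sum(tB, [1,2]));

% In-memory version of the same calculation.
B = A - mean(A);
S_mem = sum(B, [1,2]);

% Both sums are near zero; they can differ slightly because the
% floating-point additions may be carried out in a different order.
resultsAgree = abs(S_tall - S_mem) < 1e-9;
```

Because each column is centered on its own mean, both sums are close to zero, which matches the small magnitude of S in the output above.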

Input Arguments

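Input datastore ds, specified as a datastore object. See Datastore for more information on creating a datastore object for your data set.

Tall arrays work only with datastores that are deterministic. That is, if you use read on the datastore, reset the datastore with reset, and then read the datastore again, then the data returned must be the same in both instances. Tall array calculations involving a datastore that is not deterministic can produce unpredictable results. See Select Datastore for File Format or Application for more information.

Example: ds = tabularTextDatastore('airlinesmall.csv') specifies a single file.

Example: ds = tabularTextDatastore('*.csv') specifies a collection of .csv files.

Example: ds = spreadsheetDatastore('C:\MyData') specifies a folder of spreadsheet files.

Example: ds = datastore('hdfs:///data/') specifies a data set in an HDFS file system.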
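The read, reset, read requirement described above can be spot-checked before building a tall array. This is a minimal sketch, assuming the airlinesmall.csv sample file that ships with MATLAB is on the path. It compares only the first block returned by read, so it is a quick sanity check and not proof of determinism.

```matlab
% Assumption: airlinesmall.csv is available on the MATLAB path.
ds = tabularTextDatastore('airlinesmall.csv', 'TreatAsMissing', 'NA', ...
    'SelectedVariableNames', {'ArrDelay', 'DepDelay'});

% Read the first block, reset the datastore, and read the same block again.
reset(ds);
firstRead = read(ds);
reset(ds);
secondRead = read(ds);

% isequaln treats NaN values as equal, which matters here because
% missing entries are read as NaN.
isDeterministic = isequaln(firstRead, secondRead);

% Rewind so that a later tall(ds) starts from the beginning of the data.
reset(ds);
```

A file-backed datastore such as this one is expected to return true. A datastore that wraps a changing or randomized source might not.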

In-memory variable A, specified as an array.

Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | logical | table | timetable | string | cell | categorical | datetime | duration | calendarDuration
Complex Number Support: Yes

Output Arguments

Tall array t, returned as one of these types:

  • When converting a datastore, t is a tall table or tall timetable for tabular datastores. Otherwise, t is a tall cell array.

  • When converting an in-memory array, the underlying data type of t is the same as class(A).

See Deferred Evaluation of Tall Arrays for information about how to effectively work with tall arrays.

Tips

  • See Extend Tall Arrays with Other Products for information about how to use tall arrays with:

    • Statistics and Machine Learning Toolbox™

    • Parallel Computing Toolbox™

    • MATLAB® Parallel Server™

    • Database Toolbox™

    • MATLAB Compiler™

Extended Capabilities

Tall Arrays
Calculate with arrays that have more rows than fit in memory.

Introduced in R2016b