
Histograms of Tall Arrays

This example shows how to use histogram and histogram2 to analyze and visualize data contained in a tall array.

Create Tall Table

Use the airlinesmall.csv data set. Treat 'NA' values as missing data so that they are replaced with NaN values. Select a subset of the variables to work with. Convert the datastore into a tall table.

varnames = {'ArrDelay','DepDelay','Year','Month'};
ds = tabularTextDatastore('airlinesmall.csv','TreatAsMissing','NA',...
    'SelectedVariableNames',varnames);
T = tall(ds)
T =

  Mx4 tall table

    ArrDelay    DepDelay    Year    Month
    ________    ________    ____    _____

        8          12       1987     10
        8           1       1987     10
       21          20       1987     10
       13          12       1987     10
        4          -1       1987     10
       59          63       1987     10
        3          -2       1987     10
       11          -1       1987     10
       :           :         :        :
       :           :         :        :

Plot Histogram of Arrival Delays

Plot a histogram of the ArrDelay variable to examine the frequency distribution of arrival delays.

h = histogram(T.ArrDelay);
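Evaluating tall expression using the Local MATLAB Session:
- Pass 1 of 2: Completed in 0.87 sec
- Pass 2 of 2: Completed in 0.71 sec
Evaluation completed in 2.2 sec
title('Flight arrival delays, 1987 - 2008')
xlabel('Arrival Delay (minutes)')
ylabel('Frequency')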

Figure contains an axes object. The axes object with title Flight arrival delays, 1987 - 2008 contains an object of type histogram.

The arrival delay is most frequently a small number near 0, so these values dominate the plot and make it difficult to see other details.

Adjust Bin Limits of Histogram

Restrict the histogram bin limits to plot only arrival delays between -50 and 150 minutes. After you create a histogram object from a tall array, you cannot change any properties that would require recomputing the bins, including BinWidth and BinLimits. Also, you cannot use morebins or fewerbins to adjust the number of bins. In these cases, use histogram to reconstruct the histogram from the raw data in the tall array.

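figure
histogram(T.ArrDelay,'BinLimits',[-50,150])
Evaluating tall expression using the Local MATLAB Session:
- Pass 1 of 2: Completed in 0.51 sec
- Pass 2 of 2: Completed in 0.37 sec
Evaluation completed in 1.3 sec
title('Flight arrival delays between -50 and 150 minutes, 1987 - 2008')
xlabel('Arrival Delay (minutes)')
ylabel('Frequency')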

Figure contains an axes object. The axes object with title Flight arrival delays between -50 and 150 minutes, 1987 - 2008 contains an object of type histogram.
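To see what restricting the bin limits does to the counts, the following minimal sketch uses histcounts on a small in-memory vector. The values in x are hypothetical and are not taken from airlinesmall.csv; the point is only that observations outside the inclusive limits are left out of every bin.

```matlab
% Hypothetical arrival delays in minutes (illustration only)
x = [-120 -30 0 5 20 149 150 151 400];

% Bin only the values that fall within the inclusive limits [-50, 150]
[n, edges] = histcounts(x, 'BinLimits', [-50 150]);

% Count the in-range values directly for comparison
numInRange = sum(x >= -50 & x <= 150);

% Values outside the limits (-120, 151, 400) contribute to no bin
numBinned = sum(n);
```

Here numBinned and numInRange agree, which is why a histogram with restricted limits shows only part of the data.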

From this plot, it appears that long delays might be more common than initially expected. To investigate further, find the probability of an arrival delay that is one hour or greater.

Probability of Delays One Hour or Greater

The original histogram returned an object h that contains the bin values in the Values property and the bin edges in the BinEdges property. You can use these properties to perform in-memory calculations.

Determine which bins contain arrival delays of one hour (60 minutes) or more. Remove the last bin edge from the logical index vector so that it is the same length as the vector of bin values.

idx = h.BinEdges >= 60;
idx(end) = [];
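The edge-to-bin indexing can be checked on a small in-memory example. This is a sketch under stated assumptions: the delays and edges vectors are hypothetical, histcounts stands in for the Values and BinEdges properties of the histogram object, and 60 is chosen to be a bin edge so that the selected bins contain exactly the delays of 60 minutes or more.

```matlab
% Hypothetical arrival delays in minutes (illustration only)
delays = [-5 0 12 30 59 60 61 95 130 180];

% Assumed bin edges: 8 edges define 7 bins, and 60 is one of the edges
edges = -30:30:180;
counts = histcounts(delays, edges);

% One logical value per edge; after the last edge is dropped,
% idx(k) is true when the left edge of bin k is at least 60
idx = edges >= 60;
idx(end) = [];

% Percentage of observations in the selected bins
pct = sum(counts(idx)) * 100 / numel(delays);

% Direct count from the raw data for comparison
nDirect = sum(delays >= 60);
```

If 60 were not a bin edge, the bin that straddles 60 would be excluded and the histogram-based figure would undercount slightly.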

Use idx to retrieve the value associated with each selected bin. Add the bin values together, divide by the total number of samples, and multiply by 100 to determine the overall probability of a delay greater than or equal to one hour. Since the total number of samples is computed from the original data set, use gather to explicitly evaluate the calculation and return an in-memory scalar.

N = numel(T.ArrDelay); P = gather(sum(h.Values(idx))*100/N)
P = 4.4809
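The role of gather can be illustrated with a tall array built from a small in-memory vector. This is a minimal sketch: the vector x is hypothetical, with NaN standing in for a missing value. It also shows that numel counts every element, so missing values remain in the denominator.

```matlab
% Hypothetical delays in minutes; NaN marks a missing value
x = [5; 75; NaN; 120; 10; 61];
tx = tall(x);                 % tall array backed by in-memory data

nLong = sum(tx >= 60);        % deferred; NaN >= 60 evaluates to false
nTotal = numel(tx);           % counts all elements, including the NaN

% gather evaluates the pending operations and returns an in-memory scalar
pct = gather(nLong * 100 / nTotal);
```

Three of the six elements are 60 or greater, so pct is 50 even though one element is missing.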

Overall, the chance of an arrival delay of one hour or more is about 4.5%.

Plot Bivariate Histogram of Delays by Month

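Plot a bivariate histogram of the arrival delays that are 60 minutes or longer, by month. This plot examines how seasonality affects arrival delay.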

figure
h2 = histogram2(T.Month,T.ArrDelay,[12 50],'YBinLimits',[60 1100],...
    'Normalization','probability','FaceColor','flat');
Evaluating tall expression using the Local MATLAB Session:
- Pass 1 of 1: Completed in 0.71 sec
Evaluation completed in 0.87 sec
Evaluating tall expression using the Local MATLAB Session:
- Pass 1 of 1: Completed in 0.79 sec
Evaluation completed in 0.86 sec
title('Probability of arrival delays 1 hour or greater (by month)')
xlabel('Month (1-12)')
ylabel('Arrival Delay (minutes)')
zlabel('Probability')
xticks(1:12)
view(-126,23)

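Figure contains an axes object. The axes object with title Probability of arrival delays 1 hour or greater (by month) contains an object of type histogram2.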

Delay Statistics by Month

Use the bivariate histogram object to calculate the probability of having an arrival delay one hour or greater in each month, and the mean arrival delay for each month. Put the results in a table with the variable P containing the probability information and the variable MeanByMonth containing the mean arrival delay.

monthNames = {'Jan','Feb','Mar','Apr','May','Jun',...
    'Jul','Aug','Sep','Oct','Nov','Dec'}';
G = findgroups(T.Month);
M = splitapply(@(x) mean(x,'omitnan'),T.ArrDelay,G);
delayByMonth = table(monthNames, sum(h2.Values,2)*100, gather(M), ...
    'VariableNames',{'Month','P','MeanByMonth'})
Evaluating tall expression using the Local MATLAB Session:
- Pass 1 of 2: Completed in 0.41 sec
- Pass 2 of 2: Completed in 0.99 sec
Evaluation completed in 2 sec
delayByMonth=12×3 table
     Month       P       MeanByMonth
    _______    ______    ___________

    {'Jan'}    9.6497      8.5954
    {'Feb'}    7.7058      7.3275
    {'Mar'}    9.0543      7.5536
    {'Apr'}    7.2504      6.0081
    {'May'}    7.4256      5.2949
    {'Jun'}     10.35      10.264
    {'Jul'}    10.228      8.7797
    {'Aug'}    8.5989      7.4522
    {'Sep'}    5.4116      3.6308
    {'Oct'}     6.042      4.6059
    {'Nov'}    6.9002      5.2835
    {'Dec'}    11.384      10.571
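The two building blocks of this calculation, row sums of a bivariate count matrix and a per-group mean that ignores NaN, can be tried on a small in-memory example. This is a sketch under stated assumptions: month and delay are hypothetical vectors, histcounts2 with explicit edges stands in for the Values property of the histogram2 object, and raw counts are used instead of normalized probabilities.

```matlab
% Hypothetical month numbers and arrival delays in minutes
month = [1 1 1 2 2 3 3 3]';
delay = [70 NaN 10 65 200 5 90 61]';

% Rows of N correspond to month bins and columns to delay bins;
% only delays within [60, 1100] are counted
N = histcounts2(month, delay, 0.5:1:3.5, [60 100 1100]);

% Summing across columns gives the number of long delays per month
longPerMonth = sum(N, 2);

% Group by month and average each group, ignoring missing values
G = findgroups(month);
meanPerMonth = splitapply(@(x) mean(x, 'omitnan'), delay, G);
```

Summing along the second dimension collapses the delay bins, leaving one value per month, which mirrors the sum(h2.Values,2) step.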

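The results indicate that December, a holiday month, has the largest value of P at 11.4%, with an average delay of about 10.6 minutes. June and July follow in the summer, each with a P of about 10% and an average delay of roughly 9 to 10 minutes. Because the values in P sum to 100, each value is best read as that month's share of all delays of one hour or longer, not as the chance that a given flight in that month is delayed that long (the overall chance computed earlier is about 4.5%).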
