
Visualization of Tall Arrays

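Visualizing large data sets requires the data to be summarized, binned, or sampled in some way to reduce the number of points plotted on the screen. In some cases, functions such as histogram and pie bin the data to reduce its size, while other functions such as plot and scatter use a more complex approach that avoids plotting duplicate pixels on the screen. For problems where the pixel overlap is relevant to the analysis, the binscatter function also offers an efficient way to visualize density patterns.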

Visualizing tall arrays does not require the use of gather. MATLAB® immediately evaluates and displays visualizations of tall arrays. Currently, you can visualize tall arrays using the functions and methods in this table.

| Function | Required Toolboxes | Notes |
| --- | --- | --- |
| plot, scatter, binscatter | None (base MATLAB) | These functions plot in iterations, progressively adding to the plot as more data is read. During the updates, a progress indicator shows the proportion of data that has been plotted. Zooming and panning are supported during the updating process, before the plot is complete. To stop the update process, press the pause button in the progress indicator. |
| histogram, histogram2 | None (base MATLAB) | |
| pie | None (base MATLAB) | For visualizing categorical data only. |
| binScatterPlot | Statistics and Machine Learning Toolbox™ | The figure contains a slider to control the brightness and color detail in the image. The slider adjusts the value of the Gamma image correction parameter. |
| ksdensity | Statistics and Machine Learning Toolbox | Produces a probability density estimate for the data, evaluated at 100 points for univariate data, or 900 points for bivariate data. |
| datasample | Statistics and Machine Learning Toolbox | datasample enables you to extract a subsample of a tall array in a statistically sound way compared to simple indexing. If the subset of data is small enough to fit in memory, then you can use plotting and fitting functions on the subset that do not directly support tall arrays. |
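As a rough illustration of the two ideas above (immediate plotting without gather, and subsampling with datasample), the following sketch uses synthetic data. It assumes Statistics and Machine Learning Toolbox is installed; the sample size of 1000 and the use of boxplot are arbitrary choices for illustration, and 'Replace',false is specified because tall inputs are understood to require sampling without replacement.

```matlab
% Assumption: a synthetic in-memory vector stands in for a large data set.
rng(0)                         % make the random numbers reproducible
tx = tall(randn(1e5,1));       % convert to a tall array (evaluation is deferred)

% Plotting a tall array evaluates and draws immediately; no gather is needed.
histogram(tx)

% Draw a random subsample without replacement, then bring it into memory.
sub = datasample(tx,1000,'Replace',false);   % still a tall array
sub = gather(sub);                           % ordinary in-memory vector

% The in-memory subset can be passed to any plotting or fitting function.
% boxplot is used here only as an example of an in-memory call.
figure
boxplot(sub)
```

The subsample is gathered only after it has been reduced to a size that comfortably fits in memory.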

Tall Array Plotting Examples

This example shows several different ways you can visualize tall arrays.

Create a datastore for the airlinesmall.csv data set, which contains rows of airline flight data. Select a subset of the table variables to work with and remove rows that contain missing values.

ds = tabularTextDatastore('airlinesmall.csv','TreatAsMissing','NA');
ds.SelectedVariableNames = {'Year','Month','ArrDelay','DepDelay','Origin','Dest'};
T = tall(ds);
T = rmmissing(T)
T =

  Mx6 tall table

    Year    Month    ArrDelay    DepDelay    Origin      Dest
    ____    _____    ________    ________    _______    _______

    1987     10          8          12       {'LAX'}    {'SJC'}
    1987     10          8           1       {'SJC'}    {'BUR'}
    1987     10         21          20       {'SAN'}    {'SMF'}
    1987     10         13          12       {'BUR'}    {'SJC'}
    1987     10          4          -1       {'SMF'}    {'LAX'}
    1987     10         59          63       {'LAX'}    {'SJC'}
    1987     10          3          -2       {'SAN'}    {'SFO'}
    1987     10         11          -1       {'SEA'}    {'LAX'}
     :        :          :           :          :          :
     :        :          :           :          :          :

Pie Chart of Flights by Month

Convert the numeric Month variable into a categorical variable that reflects the name of the month. Then plot a pie chart showing how many flights are in the data for each month of the year.

T.Month = categorical(T.Month,1:12,{'Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec'})
T =

  Mx6 tall table

    Year    Month    ArrDelay    DepDelay    Origin      Dest
    ____    _____    ________    ________    _______    _______

    1987     Oct         8          12       {'LAX'}    {'SJC'}
    1987     Oct         8           1       {'SJC'}    {'BUR'}
    1987     Oct        21          20       {'SAN'}    {'SMF'}
    1987     Oct        13          12       {'BUR'}    {'SJC'}
    1987     Oct         4          -1       {'SMF'}    {'LAX'}
    1987     Oct        59          63       {'LAX'}    {'SJC'}
    1987     Oct         3          -2       {'SAN'}    {'SFO'}
    1987     Oct        11          -1       {'SEA'}    {'LAX'}
     :        :          :           :          :          :
     :        :          :           :          :          :
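pie(T.Month)
Evaluating tall expression using the Local MATLAB Session:
- Pass 1 of 2: Completed in 1.3 sec
- Pass 2 of 2: Completed in 1 sec
Evaluation completed in 3.1 sec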
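To make the conversion step more concrete, here is a minimal in-memory sketch of how categorical maps month numbers to names and how the per-category counts that a pie chart summarizes can be inspected. The six month numbers are made-up values, not rows from airlinesmall.csv.

```matlab
% Assumption: a tiny hand-made vector of month numbers, for illustration only.
monthNum = [10; 10; 11; 12; 12; 12];
monthNames = {'Jan','Feb','Mar','Apr','May','Jun', ...
              'Jul','Aug','Sep','Oct','Nov','Dec'};

% Map the values 1..12 to the month names, in that order.
m = categorical(monthNum,1:12,monthNames);

% Count how many elements fall into each of the 12 categories.
counts = countcats(m);

% Show only the months that actually occur.
present = counts > 0;
disp(table(monthNames(present)',counts(present), ...
    'VariableNames',{'Month','Flights'}))
```

Categories with no data keep a count of zero, which is why the sketch filters them before displaying the table.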

Histogram of Delays

Plot a histogram of the arrival delays for each flight in the data. Since the data has a long tail, limit the plotting area using the BinLimits name-value pair.

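histogram(T.ArrDelay,'BinLimits',[-50 150])
Evaluating tall expression using the Local MATLAB Session:
- Pass 1 of 2: Completed in 2.3 sec
- Pass 2 of 2: Completed in 0.86 sec
Evaluation completed in 3.9 sec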

Figure contains an axes object. The axes object contains an object of type histogram.
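The effect of BinLimits can be checked on a small in-memory vector with histcounts, which accepts the same name-value pair. This is only a sketch with made-up delay values; it shows that values outside the limits are left out of the bins rather than being clamped to the edges.

```matlab
% Assumption: hand-picked delay values chosen to straddle the limits.
delays = [-60 -10 0 5 20 149 150 151 900];

% Bin only the values that fall within [-50, 150], inclusive.
[N,edges] = histcounts(delays,'BinLimits',[-50 150]);

% Compare the number of binned values with a direct count.
nInside = nnz(delays >= -50 & delays <= 150);
fprintf('Binned %d of %d values; edges run from %g to %g.\n', ...
    sum(N),numel(delays),edges(1),edges(end));
```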

Scatter Plot of Delays

Plot a scatter plot of arrival and departure delays. You can expect a strong correlation between these variables since flights that leave late are also likely to arrive late.

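When operating on tall arrays, the plot, scatter, and binscatter functions plot the data in iterations, progressively adding to the plot as more data is read. During the updates, a progress indicator at the top of the plot shows how much data has been plotted. Zooming and panning are supported during the updates, before the plot is complete.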

scatter(T.ArrDelay,T.DepDelay)
xlabel('Arrival Delay')
ylabel('Departure Delay')
xlim([-140 1000])
ylim([-140 1000])

Figure contains an axes object. The axes object contains an object of type scatter.

The progress bar also includes a Pause/Resume button. Use the button to stop the plot updates early.

Fit Trend Line

Use the polyfit and polyval functions to overlay a linear trend line on the plot of arrival and departure delays.

hold on
p = polyfit(T.ArrDelay,T.DepDelay,1);
x = sort(T.ArrDelay,1);
yp = polyval(p,x);
plot(x,yp,'r-')
hold off

Figure contains an axes object. The axes object contains 2 objects of type scatter, line.
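For readers less familiar with polyfit and polyval, the same two calls can be tried on a small in-memory data set where the answer is known in advance. The slope 0.9 and intercept 3 below are invented for illustration and say nothing about the airline data.

```matlab
% Assumption: synthetic, noise-free "delays" that lie exactly on a line.
arr = (-20:10:100)';          % made-up arrival delays
dep = 0.9*arr + 3;            % made-up departure delays on a known line

% Fit a first-degree polynomial: p(1) is the slope, p(2) the intercept.
p = polyfit(arr,dep,1);

% Evaluate the fitted line at sorted x values so it plots left to right.
x = sort(arr,1);
yp = polyval(p,x);

scatter(arr,dep)
hold on
plot(x,yp,'r-')
hold off
```

With noise-free input, the fitted coefficients should match the chosen slope and intercept up to floating-point rounding.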

Visualize Density

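Scatter plots are helpful up to a point, but when the points overlap extensively it can be hard to decipher information from the plot. In that case, it helps to visualize the density of points in the plot to spot trends.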

Use the binscatter function to visualize the density of points in the plot of arrival and departure delays.

binscatter(T.ArrDelay,T.DepDelay,'XLimits',[-100 1000],'YLimits',[-100 1000])
xlim([-100 1000])
ylim([-100 1000])
xlabel('Arrival Delay')
ylabel('Departure Delay')

Figure contains an axes object. The axes object contains an object of type binscatter.

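Adjust the CLim property of the axes so that all bin values greater than 150 are colored the same. This prevents a few bins with very large values from dominating the plot.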

ax = gca;
ax.CLim = [0 150];

Figure contains an axes object. The axes object contains an object of type binscatter.
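One way to think about the CLim adjustment is as a cap on the bin counts used for coloring. The sketch below computes 2-D bin counts for synthetic in-memory data with histcounts2 and applies the same cap of 150 numerically. The data, the 50-by-50 grid, and the use of histcounts2 are assumptions for illustration; binscatter chooses its own bins.

```matlab
% Assumption: synthetic correlated data in place of the airline delays.
rng(0)
x = randn(1e5,1);
y = x + 0.5*randn(1e5,1);

% Count the points in a 50-by-50 grid of bins.
N = histcounts2(x,y,[50 50]);

% Capping the counts mimics what a color limit of [0 150] does to the
% colors: every bin above 150 is shown the same as a bin at exactly 150.
Ncapped = min(N,150);

fprintf('Largest raw count: %d, largest capped count: %d\n', ...
    max(N(:)),max(Ncapped(:)));
```

The raw counts are unchanged by CLim; only the mapping from count to color saturates.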
