
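Cluster Data with a Self-Organizing Map

Clustering data is another excellent application for neural networks. This process involves grouping data by similarity. For example, you might perform:

  • Market segmentation by grouping people according to their buying patterns

  • Data mining by partitioning data into related subsets

  • Bioinformatic analysis by grouping genes with related expression patterns

Suppose that you want to cluster flower types according to petal length, petal width, sepal length, and sepal width. You have 150 example cases for which you have these four measurements.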

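As with function fitting and pattern recognition, there are two ways to solve this problem: use the Neural Network Clustering app (nctool), as described in Using the Neural Network Clustering App, or use command-line functions, as described in Using Command-Line Functions.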

Defining a Problem

To define a clustering problem, arrange the Q input vectors to be clustered as columns in an input matrix (see "Data Structures" for a detailed description of data formatting for static and time series data). For instance, you might want to cluster this set of 10 two-element vectors:

inputs = [7 0 6 2 6 5 6 1 0 1; 6 2 5 0 7 5 5 1 2 2]
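As a quick check of this layout, the following minimal sketch confirms that each column of the example matrix is one input vector. The names R, Q, and firstVector are used here only for illustration.

```matlab
% Each column of inputs is one two-element input vector.
inputs = [7 0 6 2 6 5 6 1 0 1; 6 2 5 0 7 5 5 1 2 2];

% R is the number of elements per vector, Q is the number of vectors.
[R, Q] = size(inputs);

% The first input vector is the first column of the matrix.
firstVector = inputs(:, 1);

fprintf('%d vectors with %d elements each\n', Q, R);
```

If your own data stores one sample per row, transpose it before passing it to the network.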

The next section shows how to train a network using the nctool GUI.

Using the Neural Network Clustering App

  1. Open the Neural Network Clustering app using nctool.

    nctool

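  2. Click Next. The Select Data window appears.

  3. Click Load Example Data Set. The Clustering Data Set Chooser window appears.

  4. In this window, select Simple Clusters, and then click Import. You return to the Select Data window.

  5. Click Next to continue to the Network Size window.

    For clustering problems, the self-organizing feature map (SOM) is the most commonly used network, because after the network has been trained, many visualization tools are available for analyzing the resulting clusters. This network has one layer, with neurons organized in a grid. (For more information on the SOM, see "Self-Organizing Feature Maps".) When creating the network, you specify the numbers of rows and columns in the grid. Here, the number of rows and columns is set to 10. The total number of neurons is 100. You can change this number in another run if you want.

  6. Click Next. The Train Network window appears.

  7. Click Train.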

    The training runs for the maximum number of epochs, which is 200.

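  8. For SOM training, the weight vector associated with each neuron moves to become the center of a cluster of input vectors. In addition, neurons that are adjacent to each other in the topology should also move close to each other in the input space, so it is possible to visualize a high-dimensional input space in the two dimensions of the network topology. Investigate some of the visualization tools for the SOM. Under the Plots pane, click SOM Sample Hits.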

    The default topology of the SOM is hexagonal. This figure shows the neuron locations in the topology, and indicates how many of the training data are associated with each of the neurons (cluster centers). The topology is a 10-by-10 grid, so there are 100 neurons. The maximum number of hits associated with any neuron is 31. Thus, there are 31 input vectors in that cluster.

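  9. You can also visualize the SOM by displaying weight planes (also referred to as component planes). Click SOM Weight Planes in the Neural Network Clustering App.

    This figure shows a weight plane for each element of the input vector (two, in this case). They are visualizations of the weights that connect each input to each of the neurons. (Darker colors represent larger weights.) If the connection patterns of two inputs are very similar, you can assume that the inputs are highly correlated. In this case, the connections of input 1 are very different from those of input 2.

  10. In the Neural Network Clustering App, click Next to evaluate the network.

    At this point, you can test the network against new data.

    If you are dissatisfied with the network's performance on the original or new data, you can increase the number of neurons, or get a larger training data set.

  11. When you are satisfied with the network performance, click Next.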

  12. Use this panel to generate a MATLAB function or Simulink diagram for simulating your neural network. You can use the generated code or diagram to better understand how your neural network computes outputs from inputs or deploy the network with MATLAB Compiler tools and other MATLAB and Simulink code generation tools.

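  13. Use the buttons on this screen to save your results.

    • You can click Simple Script or Advanced Script to create MATLAB® code that can be used to reproduce all of the previous steps from the command line. Creating MATLAB code can be helpful if you want to learn how to use the command-line functionality of the toolbox to customize the training process. In Using Command-Line Functions, you will investigate the generated scripts in more detail.

    • You can also save the network as net in the workspace. You can perform additional tests on it or put it to work on new inputs.

  14. When you have generated scripts and saved your results, click Finish.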
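Once a trained network is available in the workspace, you can count hits per neuron and classify new inputs from the command line. The following is a minimal sketch, not the code the app generates. So that it runs on its own, it retrains a 10-by-10 SOM on the Simple Clusters example data instead of relying on a saved net. The variable names hits, newInput, and winner, and the test point itself, are hypothetical.

```matlab
% Load the Simple Clusters example inputs (one input vector per column).
x = simplecluster_dataset;

% Create and train a 10-by-10 SOM, as in the app session.
net = selforgmap([10 10]);
net.trainParam.showWindow = false;   % suppress the training window
net = train(net, x);

% Each output column has a single 1 in the row of the winning neuron.
y = net(x);

% Hits per neuron: the same counts that the SOM Sample Hits plot displays.
hits = sum(y, 2);
fprintf('Largest cluster holds %d input vectors\n', max(hits));

% Assign a new (hypothetical) input vector to a neuron.
newInput = [0.5; 0.5];
winner = vec2ind(net(newInput));
fprintf('New input maps to neuron %d\n', winner);
```

Because training starts from the data and the grid size, the exact hit counts can differ from those shown in the app if the settings differ.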

Using Command-Line Functions

The easiest way to learn how to use the command-line functionality of the toolbox is to generate scripts from the GUIs, and then modify them to customize the network training. As an example, look at the simple script that was created in step 13 of the previous section.

% Solve a Clustering Problem with a Self-Organizing Map
% Script generated by NCTOOL
%
% This script assumes these variables are defined:
%
%   simpleclusterInputs - input data.

inputs = simpleclusterInputs;

% Create a Self-Organizing Map
dimension1 = 10;
dimension2 = 10;
net = selforgmap([dimension1 dimension2]);

% Train the Network
[net,tr] = train(net,inputs);

% Test the Network
outputs = net(inputs);

% View the Network
view(net)

% Plots
% Uncomment these lines to enable various plots.
% figure, plotsomtop(net)
% figure, plotsomnc(net)
% figure, plotsomnd(net)
% figure, plotsomplanes(net)
% figure, plotsomhits(net,inputs)
% figure, plotsompos(net,inputs)

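You can save the script, and then run it from the command line to reproduce the results of the previous GUI session. You can also edit the script to customize the training process. In this case, follow each of the steps in the script.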

  1. The script assumes that the input vectors are already loaded into the workspace. To show the command-line operations, you can use a different data set than you used for the GUI operation. Use the flower data set as an example. The iris data set consists of 150 four-element input vectors.

    load iris_dataset
    inputs = irisInputs;

  2. Create a network. In this example, you use a self-organizing map (SOM). This network has one layer, with neurons organized in a grid. (For more information, see "Self-Organizing Feature Maps".) When creating the network with selforgmap, you specify the number of rows and columns in the grid:

    dimension1 = 10;
    dimension2 = 10;
    net = selforgmap([dimension1 dimension2]);

  3. Train the network. The SOM network uses the default batch SOM algorithm for training.

    [net,tr] = train(net,inputs);

  4. During training, the training window opens and displays the training progress. To interrupt training at any point, click Stop Training.

  5. Test the network. After the network has been trained, you can use it to compute the network outputs.

    outputs = net(inputs);

  6. View the network diagram.

    view(net)

  7. For SOM training, the weight vector associated with each neuron moves to become the center of a cluster of input vectors. In addition, neurons that are adjacent to each other in the topology should also move close to each other in the input space, so it is possible to visualize a high-dimensional input space in the two dimensions of the network topology. The default SOM topology is hexagonal; to view it, enter the following commands.

    figure, plotsomtop(net)

    In this figure, each of the hexagons represents a neuron. The grid is 10-by-10, so there are a total of 100 neurons in this network. There are four elements in each input vector, so the input space is four-dimensional. The weight vectors (cluster centers) fall within this space.

    Because this SOM has a two-dimensional topology, you can visualize in two dimensions the relationships among the four-dimensional cluster centers. One visualization tool for the SOM is the weight distance matrix (also called the U-matrix).

  8. To view the U-matrix, click SOM Neighbor Distances in the training window.

    In this figure, the blue hexagons represent the neurons. The red lines connect neighboring neurons. The colors in the regions containing the red lines indicate the distances between neurons. The darker colors represent larger distances, and the lighter colors represent smaller distances. A band of dark segments crosses from the lower-center region to the upper-right region. The SOM network appears to have clustered the flowers into two distinct groups.
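The idea behind the neighbor distance plot can be sketched numerically. The example below is an illustration under stated assumptions, not necessarily how the plot itself is computed: it treats two neurons as neighbors when their link distance in the grid is 1, and it averages the Euclidean distances between their weight vectors. The names W, links, D, and meanNeighborDist are hypothetical.

```matlab
% Train a 10-by-10 SOM on the iris inputs, as in the steps above.
load iris_dataset
inputs = irisInputs;
net = selforgmap([10 10]);
net.trainParam.showWindow = false;   % suppress the training window
net = train(net, inputs);

% One weight vector (cluster center) per neuron: 100-by-4 for this data.
W = net.IW{1};

% Neighbors in the topology: pairs of neurons one link apart in the grid.
links = linkdist(net.layers{1}.positions) == 1;

% Pairwise Euclidean distances between weight vectors in input space.
D = dist(W, W');

% Average distance from each neuron to its topological neighbors.
meanNeighborDist = sum(D .* links, 2) ./ sum(links, 2);
fprintf('Largest mean neighbor distance: %.3f\n', max(meanNeighborDist));
```

Neurons with a large average distance to their neighbors tend to sit on the dark band that separates the groups in the plot.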

To get more experience in command-line operations, try some of these tasks:

  • During training, open a plot window (such as the SOM weight position plot) and watch it animate.

  • Plot from the command line with functions such as plotsomhits, plotsomnc, plotsomnd, plotsomplanes, plotsompos, and plotsomtop. (For more information on using these functions, see their reference pages.)

Also, see the advanced script for more options when training from the command line.
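As one example of customizing a run, the following sketch changes the grid size and the maximum number of epochs before training, and then opens two of the plots listed above. The 5-by-5 grid and the 100-epoch limit are arbitrary values chosen for illustration, not recommendations.

```matlab
% Load the iris inputs (150 four-element vectors, one per column).
load iris_dataset
inputs = irisInputs;

% Use a smaller 5-by-5 grid instead of the 10-by-10 grid used earlier.
net = selforgmap([5 5]);

% Train for at most 100 epochs instead of the 200 used in the app session.
net.trainParam.epochs = 100;
net.trainParam.showWindow = false;   % suppress the training window

[net, tr] = train(net, inputs);
outputs = net(inputs);

% Plot the hits per neuron and the neighbor distances.
figure, plotsomhits(net, inputs)
figure, plotsomnd(net)
```

A smaller grid gives coarser clusters with more input vectors per neuron, which can be easier to interpret on a data set of only 150 samples.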