From the series: Introduction to Machine Learning
Seth DeLand, MathWorks
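Get an overview of unsupervised machine learning, which looks for patterns in datasets that don’t have labeled responses. You’d use this technique when you want to explore your data but don’t yet have a specific goal, or when you’re not sure what information the data contains. It’s also a good way to reduce the dimensionality of your data.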
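Most unsupervised learning techniques are a form of cluster analysis. Clustering algorithms fall into two broad groups: hard clustering, where each data point belongs to only one cluster, and soft clustering, where each data point can belong to more than one cluster.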
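This video uses examples to illustrate hard and soft clustering algorithms, and it shows why you’d want to use unsupervised machine learning to reduce the number of features in your dataset.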
Unsupervised machine learning looks for patterns in datasets that don’t have labeled responses.
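You’d use this technique when you want to explore your data but don’t yet have a specific goal, or when you’re not sure what information the data contains.
It’s also a good way to reduce the dimensionality of your data.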
As we’ve previously discussed, most unsupervised learning techniques are a form of cluster analysis, which separates data into groups based on shared characteristics.
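Clustering algorithms fall into two broad groups: hard clustering, where each data point belongs to only one cluster, and soft clustering, where each data point can belong to more than one cluster.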
For context, here’s a hard clustering example:
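Say you’re an engineer building cell phone towers. You need to decide where to put the towers and how many to build. To make sure you’re providing the best signal reception, you need to locate the towers within clusters of people.
To start, you need an initial guess at the number of clusters. To do this, compare scenarios with three towers and four towers to see how well each one is able to provide service.
Because a phone can only talk to one tower at a time, this is a hard clustering problem.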
For this, you could use k-means clustering, because the k-means algorithm treats each observation in the data as an object having a location in space. It finds cluster centers, or means, that reduce the total distance from data points to their cluster centers.
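To make the tower example concrete, here is a minimal sketch of k-means (Lloyd's algorithm) in plain NumPy. The phone locations and the four population centers are made-up data, and the function name `kmeans` is just an illustrative helper, not something from the video. Note that standard k-means minimizes the total squared distance from points to their centers, so that is the quantity reported here.

```python
import numpy as np

def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm: returns (centers, labels, total squared distance)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Hard assignment: each point belongs to exactly one (nearest) center.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of the points assigned to it.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment and objective value for the returned centers.
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    inertia = float((dists[np.arange(len(points)), labels] ** 2).sum())
    return centers, labels, inertia

# Hypothetical data: phone locations scattered around four made-up population centers.
rng = np.random.default_rng(42)
towns = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
phones = np.vstack([town + rng.normal(scale=1.0, size=(50, 2)) for town in towns])

# Compare the three-tower and four-tower scenarios.
for k in (3, 4):
    centers, labels, inertia = kmeans(phones, k)
    print(f"{k} towers: total squared distance = {inertia:.1f}")
```

Each phone ends up with exactly one label, which is what makes this a hard clustering. Because the result depends on the random starting centers, in practice you would typically run it several times and keep the best result.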
So, that was hard clustering. Let’s see how you might use a soft clustering algorithm in the real world.
Pretend you’re a biologist analyzing the genes involved in normal and abnormal cell division. You have data from two tissue samples, and you want to compare them to determine whether certain patterns of gene features correlate to cancer.
Because the same genes can be involved in several biological processes, no single gene is likely to belong to only one cluster.
Apply a fuzzy c-means algorithm to the data, and then visualize the clusters to see which groups of genes behave in similar ways.
You can then use this model to help see which features correlate with normal or abnormal cell division.
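As a rough illustration of soft clustering, here is a minimal fuzzy c-means sketch in plain NumPy. The two-dimensional "gene" data is synthetic, and the function name `fuzzy_c_means` and the fuzzifier value `m=2.0` are assumptions chosen for the example, not details from the video. Instead of a single label, each point gets a degree of membership in every cluster.

```python
import numpy as np

def fuzzy_c_means(data, n_clusters, m=2.0, n_iter=200, tol=1e-6, seed=0):
    """Basic fuzzy c-means: returns (centers, memberships); each membership row sums to 1."""
    rng = np.random.default_rng(seed)
    # Random initial memberships, normalized per data point.
    u = rng.random((len(data), n_clusters))
    u /= u.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # Centers are membership-weighted means of the data.
        weights = u ** m
        centers = (weights.T @ data) / weights.sum(axis=0)[:, None]
        # Distances from every point to every center (clipped to avoid division by zero).
        dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        dists = np.maximum(dists, 1e-12)
        # Closer centers get higher membership; rows are renormalized to sum to 1.
        inv = dists ** (-2.0 / (m - 1.0))
        new_u = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(new_u - u).max() < tol
        u = new_u
        if converged:
            break
    return centers, u

# Hypothetical data: two groups of genes plus a few that sit between the groups.
rng = np.random.default_rng(1)
group_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(30, 2))
group_b = rng.normal(loc=[4.0, 4.0], scale=0.5, size=(30, 2))
shared = rng.normal(loc=[2.0, 2.0], scale=0.3, size=(5, 2))
genes = np.vstack([group_a, group_b, shared])

centers, memberships = fuzzy_c_means(genes, n_clusters=2)
print("memberships of three group-A genes:\n", memberships[:3].round(2))
print("memberships of three in-between genes:\n", memberships[-3:].round(2))
```

The in-between points get memberships split across both clusters, which is the behavior you want when one gene can take part in several processes.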
That covers the two main techniques (hard and soft clustering) for exploring data with unlabeled responses.
Remember, though, that you can also use unsupervised machine learning to reduce the number of features, or the dimensionality, of your data.
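You’d do this to make your data less complex, especially if you’re working with data that has hundreds or thousands of variables. By reducing the complexity of your data, you can focus on the important features and gain better insights.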
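To see what reducing the number of features can look like, here is a minimal sketch using principal component analysis (PCA), picked here purely as an illustration of one widely used technique. The data is synthetic: 50 observed variables that are really driven by 2 hidden factors. The function name `pca` is an illustrative helper.

```python
import numpy as np

def pca(data, n_components):
    """PCA via SVD: returns (projected data, fraction of variance explained per component)."""
    # Center each variable so the components describe variation around the mean.
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    explained = singular_values ** 2 / np.sum(singular_values ** 2)
    return centered @ components.T, explained[:n_components]

# Hypothetical data: 200 observations of 50 variables driven by 2 hidden factors plus noise.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 50))
observations = hidden @ mixing + 0.1 * rng.normal(size=(200, 50))

reduced, explained = pca(observations, n_components=2)
print(reduced.shape)  # (200, 2)
print(f"variance kept by 2 components: {explained.sum():.3f}")
```

In this made-up case, two components keep almost all of the variation, so 50 variables can be replaced by 2 with little loss.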
Let’s take a look at 3 common dimensionality reduction algorithms:
In this video, we took a closer look at hard and soft clustering algorithms, and we also showed why you’d want to use unsupervised machine learning to reduce the number of features in your dataset.
As for your next steps:
Unsupervised learning might be your end goal. If you’re just looking to segment data, a clustering algorithm is an appropriate choice.
On the other hand, you might want to use unsupervised learning as a dimensionality reduction step for supervised learning. In our next video we’ll take a closer look at supervised learning.
For now, that wraps up this video. Don’t forget to check out the description below for more resources and links.