C# | Implementing the KMeans Clustering Algorithm: Grouping Data Points into Clusters with Similar Features
Posted by: shili8
Published: 2023-07-01 10:01
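KMeans is a widely used unsupervised learning algorithm that groups data points into clusters with similar features. In this article we implement KMeans in C# and walk through the code, with comments, to explain how each step works.
First, we define a class for a data point. It stores the point's features and the cluster the point belongs to: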
csharp
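public class DataPoint
{
    // Feature vector of the data point (one value per dimension)
    public double[] Features { get; set; }
    // Index of the cluster the point is currently assigned to
    public int Cluster { get; set; }
}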
Next, we implement the main logic of the KMeans algorithm:
csharp
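using System;
using System.Collections.Generic;
using System.Linq;

public class KMeans
{
    private int k; // number of clusters
    private List<DataPoint> dataPoints; // the data points to cluster
    private List<DataPoint> centroids; // the current centroids

    public KMeans(int k, List<DataPoint> dataPoints)
    {
        this.k = k;
        this.dataPoints = dataPoints;
        this.centroids = new List<DataPoint>();
    }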
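    public void Cluster()
    {
        // Initialize the centroids
        InitializeCentroids();
        bool converged = false;
        while (!converged)
        {
            // Assign each data point to its nearest centroid
            AssignDataPointsToCentroids();
            // Update the centroid positions; stop once none of them moves
            converged = UpdateCentroids();
        }
    }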
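    private void InitializeCentroids()
    {
        // Pick k distinct data points at random as the initial centroids
        if (k > dataPoints.Count)
        {
            throw new InvalidOperationException("k must not exceed the number of data points.");
        }
        centroids.Clear();
        Random random = new Random();
        foreach (int index in Enumerable.Range(0, dataPoints.Count).OrderBy(_ => random.Next()).Take(k))
        {
            // Copy the features so that moving a centroid never overwrites a data point
            centroids.Add(new DataPoint { Features = (double[])dataPoints[index].Features.Clone() });
        }
    }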
    private void AssignDataPointsToCentroids()
    {
        foreach (DataPoint dataPoint in dataPoints)
        {
            // Find the index of the centroid closest to this data point
            double minDistance = double.MaxValue;
            int minIndex = -1;
            for (int i = 0; i < k; i++)
            {
                double distance = CalculateDistance(dataPoint.Features, centroids[i].Features);
                if (distance < minDistance)
                {
                    minDistance = distance;
                    minIndex = i;
                }
            }
            dataPoint.Cluster = minIndex;
        }
    }
    private bool UpdateCentroids()
    {
        // Returns true when no centroid moved, i.e. the algorithm has converged
        bool converged = true;
        for (int i = 0; i < k; i++)
        {
            List<DataPoint> clusterDataPoints = dataPoints.Where(dp => dp.Cluster == i).ToList();
            // An empty cluster keeps its previous centroid
            if (clusterDataPoints.Count > 0)
            {
                // The new centroid is the per-dimension mean of the cluster's points
                double[] newCentroid = new double[dataPoints[0].Features.Length];
                for (int j = 0; j < dataPoints[0].Features.Length; j++)
                {
                    double sum = 0;
                    foreach (DataPoint dataPoint in clusterDataPoints)
                    {
                        sum += dataPoint.Features[j];
                    }
                    newCentroid[j] = sum / clusterDataPoints.Count;
                }
                if (!centroids[i].Features.SequenceEqual(newCentroid))
                {
                    centroids[i].Features = newCentroid;
                    converged = false;
                }
            }
        }
        return converged;
    }
    private double CalculateDistance(double[] features1, double[] features2)
    {
        // Euclidean distance between two feature vectors
        double sum = 0;
        for (int i = 0; i < features1.Length; i++)
        {
            sum += Math.Pow(features1[i] - features2[i], 2);
        }
        return Math.Sqrt(sum);
    }
}
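For reference, the two steps that `Cluster()` alternates between can be written compactly. The notation here ($x_n$ for a data point, $\mu_i$ for the centroid of cluster $i$, $C_i$ for the set of points assigned to cluster $i$) is introduced only for illustration and does not appear in the code above.

```latex
% Assignment step: each point goes to the nearest centroid (Euclidean distance)
c(x_n) = \arg\min_{i \in \{0,\dots,k-1\}} \lVert x_n - \mu_i \rVert_2

% Update step: each non-empty cluster's centroid becomes the mean of its points
\mu_i = \frac{1}{|C_i|} \sum_{x_n \in C_i} x_n

% Neither step increases the within-cluster sum of squares
J = \sum_{i=0}^{k-1} \sum_{x_n \in C_i} \lVert x_n - \mu_i \rVert_2^2
```

Because neither step increases $J$, the loop settles on a stable assignment, although that assignment may be a local optimum rather than the best possible grouping.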
Now we can use the code above to cluster some data points:
csharp
List<DataPoint> dataPoints = new List<DataPoint>
{
    new DataPoint { Features = new double[] { 1, 2 } },
    new DataPoint { Features = new double[] { 2, 1 } },
    new DataPoint { Features = new double[] { 5, 6 } },
    new DataPoint { Features = new double[] { 6, 5 } },
    new DataPoint { Features = new double[] { 10, 12 } },
    new DataPoint { Features = new double[] { 12, 10 } }
};
KMeans kMeans = new KMeans(2, dataPoints);
kMeans.Cluster();
foreach (DataPoint dataPoint in dataPoints)
{
    Console.WriteLine($"Data point: [{string.Join(", ", dataPoint.Features)}] Cluster: {dataPoint.Cluster}");
}
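In the code above, we create a list of 6 data points and use the KMeans algorithm to split them into 2 clusters. Finally, we print each data point's features and the cluster it was assigned to.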
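The initial centroids are chosen at random, so separate runs can group the same points differently. One common way to compare two groupings with the same number of clusters is the within-cluster sum of squares. The following is a minimal sketch of that measure. The `ClusterQuality` class and its method are hypothetical helpers, not part of the article's code, and they take plain arrays so that the snippet stands on its own.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of the article's KMeans class.
public static class ClusterQuality
{
    // Sum of squared Euclidean distances from each point to the mean of its cluster.
    // points[n] is a feature vector; labels[n] is the cluster index assigned to it.
    public static double WithinClusterSumOfSquares(double[][] points, int[] labels)
    {
        double total = 0;
        foreach (var group in Enumerable.Range(0, points.Length).GroupBy(n => labels[n]))
        {
            int[] members = group.ToArray();
            int dims = points[members[0]].Length;
            for (int j = 0; j < dims; j++)
            {
                // Mean of dimension j over the cluster, then squared deviations from it
                double mean = members.Average(n => points[n][j]);
                total += members.Sum(n => (points[n][j] - mean) * (points[n][j] - mean));
            }
        }
        return total;
    }
}
```

A lower value means the points sit closer to their cluster means. The value is only comparable between runs that use the same number of clusters.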
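We hope this article helps you understand and implement the KMeans clustering algorithm. Note that the code above is for demonstration only and may need to be adapted and optimized for real-world use.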
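As one illustration of such adaptations, here is a sketch of a variant with a fixed random seed (so runs are repeatable) and a cap on the number of iterations. The `KMeansSketch` class and the `seed` and `maxIterations` parameters are assumptions made for this example. It works on plain arrays instead of `DataPoint` so that it compiles on its own.

```csharp
using System;
using System.Linq;

// Hypothetical variant for illustration; not the article's KMeans class.
public static class KMeansSketch
{
    // Returns one cluster index per input point.
    public static int[] Run(double[][] points, int k, int seed, int maxIterations = 100)
    {
        if (k < 1 || k > points.Length)
            throw new ArgumentOutOfRangeException(nameof(k));

        var random = new Random(seed); // fixed seed => repeatable initial centroids
        double[][] centroids = Enumerable.Range(0, points.Length)
            .OrderBy(_ => random.Next())                 // shuffle the indices
            .Take(k)                                     // k distinct points
            .Select(n => (double[])points[n].Clone())    // copy, never alias the input
            .ToArray();

        int dims = points[0].Length;
        var labels = new int[points.Length];
        for (int iter = 0; iter < maxIterations; iter++)
        {
            // Assignment step: squared distance picks the same nearest centroid
            // as the true distance, without the square root.
            for (int n = 0; n < points.Length; n++)
            {
                double best = double.MaxValue;
                for (int i = 0; i < k; i++)
                {
                    double d = 0;
                    for (int j = 0; j < dims; j++)
                    {
                        double diff = points[n][j] - centroids[i][j];
                        d += diff * diff;
                    }
                    if (d < best) { best = d; labels[n] = i; }
                }
            }

            // Update step: move each non-empty cluster's centroid to the mean of its members.
            bool moved = false;
            for (int i = 0; i < k; i++)
            {
                int[] members = Enumerable.Range(0, points.Length)
                    .Where(n => labels[n] == i).ToArray();
                if (members.Length == 0) continue;
                for (int j = 0; j < dims; j++)
                {
                    double mean = members.Average(n => points[n][j]);
                    if (mean != centroids[i][j]) { centroids[i][j] = mean; moved = true; }
                }
            }
            if (!moved) break; // converged before reaching the cap
        }
        return labels;
    }
}
```

Calling `KMeansSketch.Run(points, 2, seed: 42)` twice on the same input returns the same labels, which makes results easier to reproduce while debugging. Which grouping a given seed produces still depends on the initial centroids it selects.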

