## 1 Introduction

*et al.*, 2016; Benati *et al.*, 2017; Truong *et al.*, 2017; Pham *et al.*, 2018; Motlagh *et al.*, 2019; Borg and Boldt, 2016; Mokhtari and Salmasnia, 2015).

*k*-means (MacQueen, 1967; Mehdizadeh *et al.*, 2017) or hierarchical clustering method (Johnson, 1967), each data object can only be partitioned into one cluster. In contrast, fuzzy c-means (FCM) (Bezdek *et al.*, 1984; Zhao *et al.*, 2013) introduced the concept of membership degree, so that each object can belong to two or more clusters with a certain membership degree. FCM is an extension of hard *k*-means clustering, and the rich information conveyed by the membership degree and the fuzzifier in FCM has further expanded its application areas. The FCM algorithm was first proposed by Dunn and generalized by Bezdek (Dunn, 1973; Bezdek, 1981), and it has become a popular and widely used fuzzy clustering method in pattern recognition (Ahmed *et al.*, 2002; Dembélé and Kastner, 2003; Park, 2009; Hou *et al.*, 2007).

*et al.*, 1986; Hall *et al.*, 1992; Shen *et al.*, 2001; Ozkan and Turksen, 2004, 2007; Wu, 2012). However, there is still no generally accepted criterion, and little theoretical guidance, for the selection of the fuzzifier in FCM (Fadili *et al.*, 2001). In many cases, users select the fuzzifier value subjectively when using FCM clustering.

*et al.*, 2012). It has been demonstrated that clustering performance is always affected by data distributions (Xiong *et al.*, 2009; Wu *et al.*, 2009c). In our previous work (Zhou and Yang, 2016), we also found that FCM has a uniform effect similar to that of *k*-means clustering. The clustering results of FCM can be significantly influenced by the cluster size distribution. Therefore, to improve the performance of FCM for data sets with different cluster size distributions, it is important to select an appropriate fuzzifier value. In this study, a new fuzzifier selection criterion and a corresponding algorithm, called the CSD-m algorithm, are proposed from the perspective of cluster size distribution. The cluster size distribution mainly refers to the variation of cluster sizes. First, we use the coefficient of variance (CV) to measure the variation in cluster sizes. Then, the values of DCV, which indicate the change of variation in cluster sizes after FCM clustering, are calculated iteratively with different fuzzifier values within an initial search interval. Finally, the optimal fuzzifier value is determined as the one that yields the minimum absolute value of DCV. Our experiments on both synthetic and real-world data sets illustrate the effectiveness of the proposed criterion and the CSD-m algorithm. The experimental results also reveal that the widely used fuzzifier value $m=2$ is not optimal for many data sets, especially for data sets with large variation in cluster sizes.

*m* in FCM, is an important parameter which can significantly influence the performance of FCM clustering. There have been considerable studies on fuzzifier selection. Bezdek proposed a range for the fuzzifier, $1.1\leqslant m\leqslant 5$, based on experience (Bezdek, 1981). Pal and Bezdek presented a heuristic criterion for the selection of the optimal fuzzifier value, and the interval they suggested was $[1.5,2.5]$ (Pal and Bezdek, 1995). They also pointed out that the midpoint of this interval, namely $m=2$, can be selected when there are no other specific constraints. Some studies (Cannon *et al.*, 1986; Hall *et al.*, 1992; Shen *et al.*, 2001) made suggestions similar to those of Pal and Bezdek (1995). In addition, Bezdek studied the physical interpretation of FCM when $m=2$ and pointed out that $m=2$ was the best selection (Bezdek, 1976). The study of Bezdek *et al.* further demonstrated that the value of *m* should be greater than $n/(n-2)$, where *n* is the total number of sample objects (Bezdek *et al.*, 1987). Based on their work on word recognition, Chan and Cheung suggested that the value range of *m* should be $[1.25,1.75]$ (Chan and Cheung, 1992). However, Choe and Jordan argued, on the basis of fuzzy decision theory, that the performance of FCM is not sensitive to the value of *m* (Choe and Jordan, 1992). Ozkan and Turksen presented an entropy assessment for *m* considering the uncertainty contained (Ozkan and Turksen, 2004). To obtain the uncertainty generated by *m* in FCM, Ozkan and Turksen also identified the lower and upper values of *m* as 1.4 and 2.6, respectively (Ozkan and Turksen, 2007). Wu proposed a new guideline for the selection of *m* based on a robust analysis of FCM, and suggested implementing FCM with $m\in [1.5,4]$ (Wu, 2012).

*et al.*, 2004). In most practical applications, the fuzzifier value is selected subjectively by users, and $m=2$ is the most common selection (Pal and Bezdek, 1995; Cannon *et al.*, 1986; Hall *et al.*, 1992; Shen *et al.*, 2001). However, this selection may not always be optimal, and an inappropriate fuzzifier value can significantly affect the clustering results of FCM. Additionally, few of the above studies have focused on the cluster size distribution when addressing fuzzifier selection. The characteristics of the cluster size distribution may have an impact on the performance of FCM clustering, and the fuzzifier is a key parameter that influences the clustering results of FCM. Furthermore, some studies presented only ranges of empirical reference values, without a specific criterion or method for selecting the optimal fuzzifier value in practical applications. Therefore, the motivation of this study is to explore and measure the influence of the fuzzifier value on FCM clustering results, and to further investigate fuzzifier selection from a cluster size distribution perspective. The main contributions of this study are as follows. First, the mechanism by which the fuzzifier influences the FCM clustering result is revealed. Second, we point out that the widely used fuzzifier value $m=2$ is not optimal for many data sets with large variation in cluster sizes. Third, a criterion and a CSD-m algorithm for fuzzifier selection in FCM are presented from a cluster size distribution perspective.

## 2 FCM Clustering

*et al.*, 1984; Bezdek, 1981) starts with determining the number of clusters, followed by guessing the initial cluster centres. Then every sample point is assigned a membership degree for each cluster. The cluster centres and the corresponding membership degrees are updated iteratively by minimizing the objective function until the stopping criteria are met. The stopping criteria are that the number of iterations *t* reaches the maximum number ${t_{\max }}$, or that the difference of the cluster centres between two consecutive iterations is within a small enough threshold *ε*, i.e. $\| {v_{i,t}}-{v_{i,t-1}}\| \leqslant \varepsilon $. The objective function of the FCM algorithm is defined as

$${J_{m}}(U,V)={\sum \limits_{i=1}^{c}}{\sum \limits_{j=1}^{n}}{\mu _{ij}^{m}}{d_{ij}^{2}},$$

where *U* is the membership degree matrix, *V* is the cluster centre matrix, *n* is the total number of data objects in the data set, *c* is the number of clusters, and *m* is the fuzzifier. ${\mu _{ij}}$ is the membership degree of the *j*th data object ${x_{j}}$ to the *i*th cluster ${C_{i}}$. ${v_{i}}$ is the cluster centre of ${C_{i}}$. ${d_{ij}^{2}}$ is the squared Euclidean distance between ${x_{j}}$ and the cluster centre ${v_{i}}$, i.e. ${d_{ij}^{2}}=\| {x_{j}}-{v_{i}}{\| ^{2}}$.
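The iterative procedure described above can be sketched in NumPy as follows. This is a minimal illustration rather than the implementation used in the experiments: it assumes the standard FCM alternating updates (memberships proportional to $d_{ij}^{-2/(m-1)}$, centres as means weighted by $\mu_{ij}^{m}$), and the function names `fcm` and `hard_sizes`, the optional `V0` argument, and the default parameter values are all hypothetical.

```python
import numpy as np


def fcm(X, c, m=2.0, t_max=100, eps=1e-5, V0=None, seed=0):
    """Fuzzy c-means sketch (t_max >= 1). Returns (U, V), U of shape (c, n)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if V0 is None:
        # Assumption: initial centres are c distinct data points chosen at
        # random; the text only says that the centres are "guessed".
        rng = np.random.default_rng(seed)
        V = X[rng.choice(n, size=c, replace=False)].copy()
    else:
        V = np.asarray(V0, dtype=float).copy()
    for _ in range(t_max):
        # Squared Euclidean distances d_ij^2 between centre i and object j.
        d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)  # guard against division by zero
        # Membership update; each column of U sums to 1.
        w = d2 ** (-1.0 / (m - 1.0))
        U = w / w.sum(axis=0, keepdims=True)
        # Centre update: mean of the data weighted by mu_ij^m.
        Um = U ** m
        V_new = (Um @ X) / Um.sum(axis=1, keepdims=True)
        shift = np.linalg.norm(V_new - V, axis=1).max()
        V = V_new
        if shift <= eps:  # stopping criterion on the centre displacement
            break
    return U, V


def hard_sizes(U):
    """Cluster sizes after assigning each object to its largest membership."""
    return np.bincount(U.argmax(axis=0), minlength=U.shape[0])
```

Calling `U, V = fcm(X, c=3, m=1.8)` followed by `hard_sizes(U)` gives the cluster sizes obtained for one fuzzifier value, which is the quantity the cluster size distribution analysis works with.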

## 3 Fuzzifier Selection Method from Cluster Size Distribution Perspective

### 3.1 Measure of Cluster Size Distribution

*et al.*, 2009; Wu *et al.*, 2009c).

##### Definition 1 *(Coefficient of Variance,* $\mathit{CV}$*).*

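$$\mathit{CV}=\frac{\sigma }{\bar{n}},\quad \bar{n}=\frac{1}{c}{\sum \limits_{i=1}^{c}}{n_{i}},\quad \sigma =\sqrt{\frac{{\textstyle\sum _{i=1}^{c}}{({n_{i}}-\bar{n})^{2}}}{c-1}},$$

where *c* is the number of clusters, ${n_{i}}$ is the number of objects in cluster ${C_{i}}$, $\bar{n}$ is the average size of all the clusters, and *σ* is the standard deviation of the cluster size distribution.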
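As a small worked check, the $\mathit{CV}$ of a list of cluster sizes can be computed as below. The sketch assumes that *σ* is the sample standard deviation (denominator $c-1$); this assumption reproduces the ${\mathit{CV}_{0}}$ values listed in Table 2 for the two-cluster data sets, whose sizes are fully determined by MinSize and MaxSize. The function name `coefficient_of_variation` is hypothetical.

```python
import numpy as np


def coefficient_of_variation(sizes):
    """CV of cluster sizes: sample standard deviation divided by the mean."""
    sizes = np.asarray(sizes, dtype=float)
    # ddof=1 gives the (c - 1) denominator assumed in the lead-in.
    return sizes.std(ddof=1) / sizes.mean()


# Two-cluster synthetic sets: sizes are MinSize and MaxSize from Table 2.
print(round(coefficient_of_variation([100, 900]), 3))   # SD21000 -> 1.131
print(round(coefficient_of_variation([200, 1300]), 3))  # SD31500 -> 1.037
```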

##### Definition 2 *(DCV).*

*et al.*, 2009a, 2009b). From the perspective of cluster size distribution, a clustering partition which results in a minor change of variation in cluster sizes (i.e. a smaller absolute value of $\mathit{DCV}$) corresponds to a steady state of the clustering result. Based on this, we propose a criterion for fuzzifier selection in FCM from a cluster size distribution perspective.
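To illustrate how a smaller absolute value of $\mathit{DCV}$ singles out a fuzzifier value, the sketch below applies the idea to the ${\mathit{CV}_{0}}$ and ${\mathit{CV}_{1}}$ values reported for SD21000 in Table 3. It assumes $\mathit{DCV}={\mathit{CV}_{0}}-{\mathit{CV}_{1}}$; the sign convention is an assumption and does not affect the outcome, because only the absolute value is compared. The function name `select_fuzzifier` is hypothetical.

```python
def select_fuzzifier(cv0, cv1_by_m):
    """Return (m, DCV) for the candidate m with the smallest |DCV|."""
    # Assumed definition: DCV = CV_0 - CV_1 for each candidate fuzzifier.
    dcv_by_m = {m: cv0 - cv1 for m, cv1 in cv1_by_m.items()}
    best_m = min(dcv_by_m, key=lambda m: abs(dcv_by_m[m]))
    return best_m, dcv_by_m[best_m]


# CV_0 and CV_1 values for SD21000, copied from Table 3.
cv0 = 1.131
cv1_by_m = {
    1.2: 1.095, 1.4: 1.081, 1.6: 1.064, 1.8: 1.027, 2.0: 1.001,
    2.2: 0.950, 2.4: 0.857, 2.6: 0.713, 2.8: 0.619, 3.0: 0.580,
}
best_m, dcv = select_fuzzifier(cv0, cv1_by_m)
print(best_m, round(dcv, 3))  # 1.2 0.036
```

For this data set the smallest change in cluster size variation occurs at $m=1.2$, while $m=2$ gives $|\mathit{DCV}|=0.130$.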

##### Criterion 1 *(Fuzzifier selection criterion from cluster size distribution perspective).*

### 3.2 CSD-m Algorithm for Fuzzifier Selection

*m* selection algorithm (CSD-m algorithm), as described in Algorithm 2.
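The search over fuzzifier values can be sketched as a simple scan. This is not a transcription of Algorithm 2; it only follows the outline given in the Introduction (compute the $\mathit{CV}$ before clustering, run FCM for each *m* in an initial search interval, and keep the *m* with the minimum absolute $\mathit{DCV}$). It assumes that ${\mathit{CV}_{0}}$ is computed from known reference cluster sizes, as in Table 2, and that $\mathit{DCV}={\mathit{CV}_{0}}-{\mathit{CV}_{1}}$. The names `csd_m_search` and `cluster_sizes_for_m`, the interval $[1.2, 3.0]$ and the step 0.2 are hypothetical defaults chosen to match the grid in Table 3.

```python
import numpy as np


def cv(sizes):
    """CV of cluster sizes (sample standard deviation over mean)."""
    sizes = np.asarray(sizes, dtype=float)
    return sizes.std(ddof=1) / sizes.mean()


def csd_m_search(true_sizes, cluster_sizes_for_m, m_min=1.2, m_max=3.0, step=0.2):
    """Scan m over [m_min, m_max]; return (best_m, best_dcv, all_results).

    cluster_sizes_for_m is a user-supplied callable (hypothetical) mapping a
    fuzzifier value m to the cluster sizes obtained by FCM with that m.
    """
    cv0 = cv(true_sizes)
    n_steps = int(round((m_max - m_min) / step))
    # Rounding keeps the grid free of floating-point drift (e.g. 2.0000000004).
    grid = [round(m_min + k * step, 10) for k in range(n_steps + 1)]
    results = []
    for m in grid:
        cv1 = cv(cluster_sizes_for_m(m))
        results.append((m, cv0 - cv1))  # assumed DCV = CV_0 - CV_1
    best_m, best_dcv = min(results, key=lambda r: abs(r[1]))
    return best_m, best_dcv, results
```

Passing the clustering step in as a callable keeps the scan independent of any particular FCM implementation; in practice the callable would run FCM on the data, assign each object to its highest-membership cluster, and count the objects per cluster.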

## 4 Experimental Study

### 4.1 Experimental Setup

*m* value for each data set, and the average values are obtained as the final results.

##### Table 1

| Dataset | No. of clusters | No. of dimensions | Cluster centre bounds | Std. of each cluster |
| --- | --- | --- | --- | --- |
| SD21000 | 2 | 2 | (2, 4); (4, 4) | 0.4; 0.3 |
| SD20550 | 3 | 2 | (1, 1); (2, 3); (4, 2) | 0.4; 0.4; 0.4 |
| SD21800 | 4 | 2 | (2, 2); (2, 7); (5, 2); (6, 7) | 0.7; 0.8; 0.4; 0.5 |
| SD21950 | 5 | 2 | (2, 2); (2, 6); (6, 2); (6, 6); (4, 4) | 0.5; 0.4; 0.4; 0.4; 0.6 |
| SD31500 | 2 | 3 | (2, 2, 2); (4, 4, 3) | 0.5; 0.5 |
| SD32050 | 3 | 3 | (2, 2, 2); (4, 4, 3); (5, 3, 2) | 0.6; 0.4; 0.4 |
| SD32800 | 4 | 3 | (2, 2, 2); (4, 4, 3); (5, 3, 2); (6, 6, 4) | 0.7; 0.4; 0.4; 0.7 |
| SD34000 | 5 | 3 | (2, 2, 2); (4, 4, 3); (5, 3, 2); (6, 6, 4); (6, 7, 2) | 0.7; 0.4; 0.5; 0.6; 0.5 |

*abalone* data set is a real-world data set used to predict the age of abalone from physical measurements. The *balance-scale* data set contains information about balance scale weight and distance. The *breast-cancer* data set includes the original Wisconsin breast cancer related information of 699 instances. The *page-blocks* data set measures the blocks of the page layout of a document that has been detected by a segmentation process.

##### Table 2

| Group | Data set | # Objects | # Features | # Classes | MinSize | MaxSize | AvgSize | ${\mathit{CV}_{0}}$ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Synthetic data sets | SD21000 | 1000 | 2 | 2 | 100 | 900 | 500 | 1.131 |
| | SD20550 | 550 | 2 | 3 | 50 | 350 | 183 | 0.833 |
| | SD21800 | 1800 | 2 | 4 | 200 | 950 | 450 | 0.754 |
| | SD21950 | 1950 | 2 | 5 | 100 | 1200 | 390 | 1.176 |
| | SD31500 | 1500 | 3 | 2 | 200 | 1300 | 750 | 1.037 |
| | SD32050 | 2050 | 3 | 3 | 200 | 1500 | 683 | 1.041 |
| | SD32800 | 2800 | 3 | 4 | 200 | 1500 | 700 | 0.849 |
| | SD34000 | 4000 | 3 | 5 | 200 | 2000 | 800 | 0.923 |
| Real-world data sets | abalone | 4177 | 8 | 29 | 1 | 689 | 144 | 1.414 |
| | balance-scale | 625 | 4 | 3 | 49 | 288 | 208 | 0.662 |
| | breast-cancer | 699 | 10 | 8 | 17 | 367 | 87 | 1.320 |
| | pageblocks | 5473 | 10 | 5 | 28 | 4913 | 1095 | 1.953 |
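A synthetic data set with the characteristics listed in Tables 1 and 2 could be generated along the following lines. This is only a sketch under several assumptions that the tables do not settle: clusters are taken to be isotropic Gaussians around the listed centres with the listed standard deviations, and the assignment of the sizes 100 and 900 to the two SD21000 clusters is arbitrary. The function name `make_gaussian_clusters` and the seed are hypothetical.

```python
import numpy as np


def make_gaussian_clusters(centres, stds, sizes, seed=0):
    """Draw isotropic Gaussian clusters; returns (points, integer labels)."""
    rng = np.random.default_rng(seed)
    points, labels = [], []
    for k, (centre, std, size) in enumerate(zip(centres, stds, sizes)):
        # One block of `size` points around `centre` with spread `std`.
        points.append(rng.normal(loc=centre, scale=std, size=(size, len(centre))))
        labels.append(np.full(size, k))
    return np.vstack(points), np.concatenate(labels)


# SD21000-like set: centres and stds from Table 1, sizes from Table 2.
# Which cluster is the small one is an assumption made for illustration.
X, y = make_gaussian_clusters(
    centres=[(2, 4), (4, 4)], stds=[0.4, 0.3], sizes=[100, 900]
)
```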

### 4.2 Results and Discussion

##### Table 3

| Group | Data set | ${\mathit{CV}_{0}}$ | ${\mathit{CV}_{1}}$ ($m=1.2$) | ${\mathit{CV}_{1}}$ ($m=1.4$) | ${\mathit{CV}_{1}}$ ($m=1.6$) | ${\mathit{CV}_{1}}$ ($m=1.8$) | ${\mathit{CV}_{1}}$ ($m=2.0$) | ${\mathit{CV}_{1}}$ ($m=2.2$) | ${\mathit{CV}_{1}}$ ($m=2.4$) | ${\mathit{CV}_{1}}$ ($m=2.6$) | ${\mathit{CV}_{1}}$ ($m=2.8$) | ${\mathit{CV}_{1}}$ ($m=3.0$) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Synthetic data sets | SD21000 | 1.131 | 1.095 | 1.081 | 1.064 | 1.027 | 1.001 | 0.950 | 0.857 | 0.713 | 0.619 | 0.580 |
| | SD20550 | 0.833 | 0.824 | 0.824 | 0.824 | 0.819 | 0.819 | 0.819 | 0.819 | 0.814 | 0.814 | 0.814 |
| | SD21800 | 0.754 | 0.738 | 0.736 | 0.736 | 0.735 | 0.732 | 0.732 | 0.732 | 0.732 | 0.730 | 0.728 |
| | SD21950 | 1.176 | 1.075 | 1.072 | 1.069 | 1.063 | 0.640 | 0.623 | 0.610 | 0.600 | 0.593 | 0.589 |
| | SD31500 | 1.037 | 1.033 | 1.033 | 1.033 | 1.031 | 1.030 | 1.030 | 1.030 | 1.020 | 1.015 | 1.005 |
| | SD32050 | 1.041 | 0.162 | 0.162 | 0.162 | 0.163 | 0.168 | 0.180 | 0.187 | 0.188 | 0.187 | 0.194 |
| | SD32800 | 0.849 | 0.739 | 0.790 | 0.725 | 0.716 | 0.704 | 0.170 | 0.171 | 0.174 | 0.179 | 0.180 |
| | SD34000 | 0.923 | 0.489 | 0.308 | 0.307 | 0.306 | 0.306 | 0.305 | 0.303 | 0.301 | 0.299 | 0.299 |
| Real-world data sets | abalone | 1.414 | 0.661 | 0.564 | 0.558 | 0.509 | 0.511 | 0.453 | 0.406 | 0.355 | 0.378 | 0.354 |
| | balance-scale | 0.662 | 0.183 | 0.083 | 0.030 | 0.023 | 0.145 | 0.316 | 0.211 | 0.294 | 0.227 | 0.287 |
| | breast-cancer | 1.320 | 0.929 | 0.966 | 0.978 | 0.802 | 0.747 | 0.858 | 0.901 | 0.850 | 0.831 | 0.879 |
| | pageblocks | 1.953 | 1.547 | 1.547 | 1.564 | 1.518 | 1.485 | 1.562 | 1.474 | 1.277 | 1.233 | 1.276 |

*m* and DCV values are not in a simple linear relationship. Nevertheless, for most data sets which have large variation in cluster sizes, smaller fuzzifier values tend to produce better clustering results. Generally, small clusters tend to merge with parts of the large clusters as the fuzzifier value increases, as illustrated in Fig. 2.

*m*, to measure the influence of the fuzzifier parameter *m* on FCM clustering results. The ICF indicator is defined as:

With the change of *m*, if the change of ${\mathit{CV}_{1}}$ is large, then the value of the $\mathit{ICF}$ indicator is large, which indicates that the influence of *m* on FCM clustering is large. In contrast, within a similar range of *m*, a smaller $\Delta {\mathit{CV}_{1}}$ value indicates that the influence of *m* on FCM clustering is relatively small.

*m* values from 1.2 to 3.0, and then the $\mathit{ICF}$ values on the 12 experimental data sets can be obtained. To discover the different influences of the fuzzifier value on different data sets, the relationship between the $\mathit{ICF}$ values and the ${\mathit{CV}_{0}}$ values is fitted, as shown in Fig. 7.

*et al.*, 2000; Kersten, 1999). However, the focus of this study is the influence of fuzzifier values in FCM. Without modifying the FCM algorithm itself, small clusters can be effectively identified with an appropriate fuzzifier value using our proposed CSD-m algorithm. Therefore, our method also contributes to the identification of noise and outliers when using traditional FCM clustering.