實驗報告材料 聚類分析報告
《實驗報告材料 聚類分析報告》由會員分享,可在線閱讀,更多相關《實驗報告材料 聚類分析報告(28頁珍藏版)》請在裝配圖網上搜索。
1、word 實驗報告 聚類分析 實驗原理:K均值聚類、中心點聚類、系統(tǒng)聚類和EM算法聚類分析技術。 實驗題目:用鳶尾花的數(shù)據(jù)集,進行聚類挖掘分析。 實驗要求:探索鳶尾花數(shù)據(jù)的基本特征,利用不同的聚類挖掘方法,獲得基本結論并簡明解釋。 實驗題目--分析報告:data(iris) > rm(list=ls()) > gc() used (Mb) gc trigger (Mb) max used (Mb) Ncells 431730 23.1 929718 49.7 607591 32.5 Vcells 787605 6.1 8388608 6
2、4.0 1592403 12.2 > data(iris) > data<-iris > head(data) Sepal.Length Sepal.Width Petal.Length Petal.Width Species 1 5.1 3.5 1.4 0.2 setosa 2 4.9 3.0 1.4 0.2 setosa 3 4.7 3.2 1.3 0.2 s
3、etosa 4 4.6 3.1 1.5 0.2 setosa 5 5.0 3.6 1.4 0.2 setosa 6 5.4 3.9 1.7 0.4 setosa #Kmean聚類分析 > newiris <- iris > newiris$Species <- NULL > (kc <- kmeans(newiris, 3)) K-means clustering w
4、ith 3 clusters of sizes 62, 50, 38 Cluster means: Sepal.Length Sepal.Width Petal.Length Petal.Width 1 5.901613 2.748387 4.393548 1.433871 2 5.006000 3.428000 1.462000 0.246000 3 6.850000 3.073684 5.742105 2.071053 Clustering vector: [1] 2 2 2
5、2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 [41] 2 2 2 2 2 2 2 2 2 2 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1 [81] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 3 3 3 3 1 3 3 3 3 3 3 1 1 3 3 3 3 1 [121] 3 1 3 1 3 3 1 1 3 3 3 3 3 1 3 3 3 3 1 3 3 3 1
6、 3 3 3 1 3 3 1 Within cluster sum of squares by cluster: [1] 39.82097 15.15100 23.87947 (between_SS / total_SS = 88.4 %) Available ponents: [1] "cluster" "centers" "totss" "withinss" "tot.withinss" [6] "betweenss" "size" "iter" "ifault" >
7、table(iris$Species, kc$cluster) 1 2 3 setosa 0 50 0 versicolor 48 0 2 virginica 14 0 36 > plot(newiris[c("Sepal.Length", "Sepal.Width")], col = kc$cluster) > points(kc$centers[,c("Sepal.Length", "Sepal.Width")], col = 1:3, pch = 8, cex=2) #K-Mediods
8、 進行聚類分析 > install.packages("cluster") > library(cluster) > iris.pam<-pam(iris,3) > table(iris$Species,iris.pam$clustering) 1 2 3 setosa 50 0 0 versicolor 0 3 47 virginica 0 49 1 > layout(matrix(c(1,2),1,2)) > plot(iris.pam) > layout(matrix(1)) #hc
9、> iris.hc <- hclust( dist(iris[,1:4])) > plot( iris.hc, hang = -1) > plclust( iris.hc, labels = FALSE, hang = -1) > re <- rect.hclust(iris.hc, k = 3) > iris.id <- cutree(iris.hc, 3) #利用剪枝函數(shù)cutree()參數(shù)h控制輸出height=18時的系譜類別 > sapply(unique(iris.id), + function(g)iris$Species[iris.id==g])
10、 [[1]] [1] setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa [12] setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa [23] setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa [34] setosa setosa setosa setosa setosa se
11、tosa setosa setosa setosa setosa setosa [45] setosa setosa setosa setosa setosa setosa Levels: setosa versicolor virginica [[2]] [1] versicolor versicolor versicolor versicolor versicolor versicolor versicolor [8] versicolor versicolor versicolor versicolor versicolor versicolor versicolor
12、 [15] versicolor versicolor versicolor versicolor versicolor versicolor versicolor [22] versicolor versicolor virginica virginica virginica virginica virginica [29] virginica virginica virginica virginica virginica virginica virginica [36] virginica virginica virginica virginica v
13、irginica virginica virginica [43] virginica virginica virginica virginica virginica virginica virginica [50] virginica virginica virginica virginica virginica virginica virginica [57] virginica virginica virginica virginica virginica virginica virginica [64] virginica vi
14、rginica virginica virginica virginica virginica virginica [71] virginica virginica Levels: setosa versicolor virginica [[3]] [1] versicolor versicolor versicolor versicolor versicolor versicolor versicolor [8] versicolor versicolor versicolor versicolor versicolor versicolor versico
15、lor [15] versicolor versicolor versicolor versicolor versicolor versicolor versicolor [22] versicolor versicolor versicolor versicolor versicolor versicolor virginica Levels: setosa versicolor virginica > plot(iris.hc) > rect.hclust(iris.hc,k=4,border="light grey")#用淺灰色矩形框出4分類聚類結果 > rect.hclu
16、st(iris.hc,k=3,border="dark grey")#用淺灰色矩形框出3分類聚類結果 > rect.hclust(iris.hc,k=7,which=c(2,6),border="dark grey") # DBSCAN #基于密度的聚類 > install.packages("fpc") > library(fpc) > ds1=dbscan(iris[,1:4],eps=1,MinPts=5)#半徑參數(shù)為1,密度閾值為5 > ds1 dbscan Pts=150 MinPts=5 eps=1 1 2 border 0 1 s
17、eed 50 99 total 50 100 > ds2=dbscan(iris[,1:4],eps=4,MinPts=5) > ds3=dbscan(iris[,1:4],eps=4,MinPts=2) > ds4=dbscan(iris[,1:4],eps=8,MinPts=2) > par(mfcol=c(2,2)) > plot(ds1,iris[,1:4],main="1: MinPts=5 eps=1") > plot(ds3,iris[,1:4],main="3: MinPts=2 eps=4") > plot(ds2,iris[,1:4],main="2
18、: MinPts=5 eps=4") > plot(ds4,iris[,1:4],main="4: MinPts=2 eps=8") > d=dist(iris[,1:4])#計算數(shù)據(jù)集的距離矩陣d > max(d);min(d)#計算數(shù)據(jù)集樣本的距離的最值 [1] 7.085196 [1] 0 > install.packages("ggplot2") > library(ggplot2) > interval=cut_interval(d,30) > table(interval) interval [0,0.236] (0.236,0.472] (0.4
19、72,0.709] (0.709,0.945] (0.945,1.18] (1.18,1.42] 88 585 876 891 831 688 (1.42,1.65] (1.65,1.89] (1.89,2.13] (2.13,2.36] (2.36,2.6] (2.6,2.83] 543 369 379 339 335
20、 406 (2.83,3.07] (3.07,3.31] (3.31,3.54] (3.54,3.78] (3.78,4.01] (4.01,4.25] 458 459 465 480 468 505 (4.25,4.49] (4.49,4.72] (4.72,4.96] (4.96,5.2] (5.2,5.43] (5.43,5.67] 349 385
21、 321 291 187 (5.67,5.9] (5.9,6.14] (6.14,6.38] (6.38,6.61] (6.61,6.85] (6.85,7.09] 97 92 78 50 18 4 > which.max(table(interval)) (0.709,0.945] 4 > for(i in 3:
22、5) + { for(j in 1:10) + { ds=dbscan(iris[,1:4],eps=i,MinPts=j) + print(ds) + } + } dbscan Pts=150 MinPts=1 eps=3 1 seed 150 total 150 dbscan Pts=150 MinPts=2 eps=3 1 seed 150 total 150 dbscan Pts=150 MinPts=3 eps=3 1 seed 150 total 150 dbscan Pt
23、s=150 MinPts=4 eps=3 1 seed 150 total 150 dbscan Pts=150 MinPts=5 eps=3 1 seed 150 total 150 dbscan Pts=150 MinPts=6 eps=3 1 seed 150 total 150 dbscan Pts=150 MinPts=7 eps=3 1 seed 150 total 150 dbscan Pts=150 MinPts=8 eps=3 1 seed 150
24、total 150 dbscan Pts=150 MinPts=9 eps=3 1 seed 150 total 150 dbscan Pts=150 MinPts=10 eps=3 1 seed 150 total 150 dbscan Pts=150 MinPts=1 eps=4 1 seed 150 total 150 dbscan Pts=150 MinPts=2 eps=4 1 seed 150 total 150 dbscan Pts=150 MinPts=3 eps=4
25、 1 seed 150 total 150 dbscan Pts=150 MinPts=4 eps=4 1 seed 150 total 150 dbscan Pts=150 MinPts=5 eps=4 1 seed 150 total 150 dbscan Pts=150 MinPts=6 eps=4 1 seed 150 total 150 dbscan Pts=150 MinPts=7 eps=4 1 seed 150 total 150 dbscan Pts=1
26、50 MinPts=8 eps=4 1 seed 150 total 150 dbscan Pts=150 MinPts=9 eps=4 1 seed 150 total 150 dbscan Pts=150 MinPts=10 eps=4 1 seed 150 total 150 dbscan Pts=150 MinPts=1 eps=5 1 seed 150 total 150 dbscan Pts=150 MinPts=2 eps=5 1 seed 150 to
27、tal 150 dbscan Pts=150 MinPts=3 eps=5 1 seed 150 total 150 dbscan Pts=150 MinPts=4 eps=5 1 seed 150 total 150 dbscan Pts=150 MinPts=5 eps=5 1 seed 150 total 150 dbscan Pts=150 MinPts=6 eps=5 1 seed 150 total 150 dbscan Pts=150 MinPts=7 eps=5
28、 1 seed 150 total 150 dbscan Pts=150 MinPts=8 eps=5 1 seed 150 total 150 dbscan Pts=150 MinPts=9 eps=5 1 seed 150 total 150 dbscan Pts=150 MinPts=10 eps=5 1 seed 150 total 150 #30次dbscan的聚類結果 > ds5=dbscan(iris[,1:4],eps=3,MinPts=2) > ds6=dbscan(iris[,1
29、:4],eps=4,MinPts=5) > ds7=dbscan(iris[,1:4],eps=5,MinPts=9) > par(mfcol=c(1,3)) > plot(ds5,iris[,1:4],main="1: MinPts=2 eps=3") > plot(ds6,iris[,1:4],main="3: MinPts=5 eps=4") > plot(ds7,iris[,1:4],main="2: MinPts=9 eps=5") # EM 期望最大化聚類 > install.packages("mclust") > library(mclust) > fit
30、_EM=Mclust(iris[,1:4]) fitting ... |===========================================================================| 100% > summary(fit_EM) ---------------------------------------------------- Gaussian finite mixture model fitted by EM algorithm ------------------------------------------------
31、---- Mclust VEV (ellipsoidal, equal shape) model with 2 ponents: log.likelihood n df BIC ICL -215.726 150 26 -561.7285 -561.7289 Clustering table: 1 2 50 100 > summary(fit_EM,parameters=TRUE) ---------------------------------------------------- Gaussian f
32、inite mixture model fitted by EM algorithm ---------------------------------------------------- Mclust VEV (ellipsoidal, equal shape) model with 2 ponents: log.likelihood n df BIC ICL -215.726 150 26 -561.7285 -561.7289 Clustering table: 1 2 50 100 Mix
33、ing probabilities: 1 2 0.3333319 0.6666681 Means: [,1] [,2] Sepal.Length 5.0060022 6.261996 Sepal.Width 3.4280049 2.871999 Petal.Length 1.4620007 4.905992 Petal.Width 0.2459998 1.675997 Variances: [,,1] Sepal.Length Sepal.Width Pe
34、tal.Length Petal.Width Sepal.Length 0.15065114 0.13080115 0.02084463 0.01309107 Sepal.Width 0.13080115 0.17604529 0.01603245 0.01221458 Petal.Length 0.02084463 0.01603245 0.02808260 0.00601568 Petal.Width 0.01309107 0.01221458 0.00601568 0.01042365 [,,2]
35、Sepal.Length Sepal.Width Petal.Length Petal.Width Sepal.Length 0.4000438 0.10865444 0.3994018 0.14368256 Sepal.Width 0.1086544 0.10928077 0.1238904 0.07284384 Petal.Length 0.3994018 0.12389040 0.6109024 0.25738990 Petal.Width 0.1436826 0.07284384 0.2573899 0.168
36、08182 > plot(fit_EM)#對EM聚類結果作圖 Model-based clustering plots: 1: BIC 2: classification 3: uncertainty 4: density Selection:(下面顯示選項) #選1 #選2 #選3 #選4 Selection: 0 > iris_BIC=mclustBIC(iris[,1:4]) fitting ... |==================================================================
37、=========| 100% > iris_BICsum=summary(iris_BIC,data=iris[,1:4]) > iris_BICsum #獲取數(shù)1據(jù)集iris在各模型和類別數(shù)下的BIC值 Best BIC values: VEV,2 VEV,3 VVV,2 BIC -561.7285 -562.5522369 -574.01783 BIC diff 0.0000 -0.8237748 -12.28937 Classification table for model (VEV,2):
38、 1 2 50 100 > iris_BIC Bayesian Information Criterion (BIC): EII VII EEI VEI EVI VVI EEE 1 -1804.0854 -1804.0854 -1522.1202 -1522.1202 -1522.1202 -1522.1202 -829.9782 2 -1123.4117 -1012.2352 -1042.9679 -956.2823 -1007.3082 -857.5515
39、 -688.0972 3 -878.7650 -853.8144 -813.0504 -779.1566 -797.8342 -744.6382 -632.9647 4 -893.6140 -812.6048 -827.4036 -748.4529 -837.5452 -751.0198 -646.0258 5 -782.6441 -742.6083 -741.9185 -688.3463 -766.8158 -711.4502 -604.8131 6 -715.7136 -705.7811 -693.7908 -676.1697 -774
40、.0673 -707.2901 -609.8543 7 -731.8821 -698.5413 -713.1823 -680.7377 -813.5220 -766.6500 -632.4947 8 -725.0805 -701.4806 -691.4133 -679.4640 -740.4068 -764.1969 -639.2640 9 -694.5205 -700.0276 -696.2607 -702.0143 -767.8044 -755.8290 -653.0878 EVE VEE VVE
41、 EEV VEV EVV VVV 1 -829.9782 -829.9782 -829.9782 -829.9782 -829.9782 -829.9782 -829.9782 2 -657.2263 -656.3270 -605.1841 -644.5997 -561.7285 -658.3306 -574.0178 3 -666.5491 -605.3982 -636.4259 -644.7810 -562.5522 -656.0359 -580.8396 4 -705.5435 -604.8371 -639.7078 -699.8684 -
42、602.0104 -725.2925 -630.6000 5 -723.7199 NA -632.2056 -652.2959 -634.2890 NA -676.6061 6 -661.9497 -609.5584 -664.8224 -664.4537 -679.5116 NA -754.7938 7 -699.5102 NA -690.6108 -709.9530 -704.7699 -809.8276 -806.9277 8 -700.4277 -654.8237 -709.9392 -735.4463 -712.8788
43、 -831.7520 -830.6373 9 -729.6651 NA -734.2997 -758.9348 -748.8237 -882.4391 -883.6931 Top 3 models based on the BIC criterion: VEV,2 VEV,3 VVV,2 -561.7285 -562.5522 -574.0178 > par(mfcol=c(1,1)) > plot(iris_BIC,G=1:7,col="yellow") > mclust2Dplot(iris[,1:2], +
44、 classification=iris_BICsum$classification, + parameters=iris_BICsum$parameters,col="yellow") > iris_Dens=densityMclust(iris[,1:2])# 對每一個樣本進行密度估計 fitting ... |===========================================================================| 100% > iris_Dens 'densityMclust' model objec
45、t: (VEV,2) Available ponents: [1] "call" "data" "modelName" "n" [5] "d" "G" "BIC" "bic" [9] "loglik" "df" "hypvol" "parameters" [13] "z" "classification" "un
46、certainty" "density" > plot(iris_Dens,iris[,1:2],col="yellow",nlevels=55) ##輸入1或2 Model-based density estimation plots: 1: BIC 2: density Selection:(下面顯示選項) #選1 #選2 Selection: 0 > plot(iris_Dens,type = "persp",col = grey(0.8)) Model-based density estimation plots: 1: BIC 2: density Selection:(下面顯示選項) #選1 #選2 Selection: 0 28 / 28
- 溫馨提示:
1: 本站所有資源如無特殊說明,都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請下載最新的WinRAR軟件解壓。
2: 本站的文檔不包含任何第三方提供的附件圖紙等,如果需要附件,請聯(lián)系上傳者。文件的所有權益歸上傳用戶所有。
3.本站RAR壓縮包中若帶圖紙,網頁內容里面會有圖紙預覽,若沒有圖紙預覽就沒有圖紙。
4. 未經權益所有人同意不得將文件中的內容挪作商業(yè)或盈利用途。
5. 裝配圖網僅提供信息存儲空間,僅對用戶上傳內容的表現(xiàn)方式做保護處理,對用戶上傳分享的文檔內容本身不做任何修改或編輯,并不能對任何下載內容負責。
6. 下載文件中如有侵權或不適當內容,請與我們聯(lián)系,我們立即糾正。
7. 本站不保證下載資源的準確性、安全性和完整性, 同時也不承擔用戶因使用這些下載資源對自己和他人造成任何形式的傷害或損失。