五. 结点连接图(node link plot)
(源代码:NodeLinks.m)
有时,我们需要绘制出不同结点之间的连通关系,即结点连接图。以下以绘制美国128座城市之间的连通关系为例,介绍两种结点连接图的画法。
1) 定义每座城市与距它最近的城市连通,与其余视为不连通,然后根据连通性,利用gplot命令,直观的绘出结点连接图。(图9)
- %% #1
- %% 定义数据
- [XYCoord] = xlsread('inter_city_distances.xlsx','Sheet3');
- [intercitydist citynames] = xlsread('inter_city_distances.xlsx','Distances');
-
- % 避免将自身判定为最近的城市
- howManyCities = 128;
- for i =1:howManyCities;intercitydist(i,i)=Inf;end
-
- %% 定义临接矩阵
- % n阶方阵,1表示相连,0表示不相连; 这是我们规定城市只与它最近的城市相连
- adjacency = zeros(howManyCities,howManyCities);
- for i = 1:howManyCities
- alls = find(intercitydist(i,:)==min(intercitydist(i,:)));
- for j = 1:length(alls)
- adjacency(i,alls(j)) = 1;
- adjacency(alls(j),i) = 1;
- end
- clear alls
- end
- figure('units','normalized','position',[ 0.2813 0.2676 0.3536 0.3889]);
- plot(XYCoord(1:howManyCities,1),XYCoord(1:howManyCities,2),'ro');hold on;
- title('距离最近的城市彼此相连');
- gplot(adjacency,XYCoord);
- xlabel('南北');
- ylabel('东西');
- set(gcf,'Color',[1 1 1],'paperpositionmode','auto');
图 9
2) 定义距离小于100英里的城市为相连,以更紧致的方式绘出连接情况。(图10)
- %% 定义临接关系(距离小于100英里的城市之间认为是相邻的)
- % 重新排列数据,以使距离短的城市大致靠近
- for i = 1:howManyCities; intercitydist(i,i) = 0; end
- [balh I] = sort(intercitydist(1,:));
- citynames = citynames(I);
- XYCoord = XYCoord(I,:);
- %% 重新计算距离矩阵
- for i = 1:howManyCities;
- for j = 1:howManyCities
- if i==j
- intercitydist(i,i) = Inf;
- else
- intercitydist(i,j) = sqrt((XYCoord(i,1)-XYCoord(j,1))^2 + (XYCoord(i,2)-XYCoord(j,2))^2);
- end
- end
- end
- %% 计算临接矩阵
- adjacency = zeros(howManyCities,howManyCities);
- for i = 1:howManyCities
- alls = find(intercitydist(i,:)<100);
- for j = 1:length(alls)
- adjacency(i,alls(j)) = 1;
- adjacency(alls(j),i) = 1;
- end
- clear alls
- end
-
- %% 图像大小和位置
- figure('units','normalized','position',[0.0844 0.2259 0.8839 0.4324]);
- axes('Position',[0.0371 0.2893 0.9501 0.6296]);
- xlim([1 howManyCities]);
- ylim([0 100]);
- hold on;
- %% 在X轴上标注城市名称
- set(gca,'xtick',1:howManyCities,'xticklabel',citynames,...
- 'ticklength',[0.001 0]);
- box on;
- rotateXLabels(gca,90);
- %% 为不同弧线分配不同的颜色(距离越近,颜色越深)
- m = colormap(pink(howManyCities+1));
- cmin = min(min(intercitydist));
- cmax = 150;
- %% 绘制弧线
- for i = 1:howManyCities
- for j = 1:howManyCities
- if adjacency(i,j)==1
- x=[i (i+j)/2 j];
- y=[0 intercitydist(i,j) 0];
- pol_camp=polyval(polyfit(x,y,2),linspace(i,j,25));
- plot(linspace(i,j,25),pol_camp,'Color',m(fix((intercitydist(i,j)-cmin)/(cmax-cmin)*howManyCities)+1,:),'linewidth',100/intercitydist(i,j));
- end
- end
- end
- %% 标注
- title('公路距离小于100英里的城市','fontsize',14);
- ylabel('城市间距离');
- set(gca,'Position',[0.0371 0.2893 0.9501 0.6296]);
- %% 绘制水平网格以增加可读性
- line(repmat(get(gca,'xlim'),9,1)',[linspace(10,90,9); linspace(10,90,9)],'Color',[.8 .8 .8]);
- set(gcf,'Color',[1 1 1],'paperpositionmode','auto');
图 10
六. 日历热图(Heat Map)
(源代码:CalendarHeatMap.m)
数据大小可以用颜色的深浅直观的表示,这可以通过imagesc或pcolor命令获得(二者参考点的位置不同)。以下即是用Imagesc命令,绘制Google公司10到11年的股价数据。(图11)
- %% Add path to utilities
- addpath('.\utilities');
-
-
- %% 数据预处理
- [GOOG dateGOOG] = xlsread('GOOG_090784_012412.csv');
- dateGOOG = datenum({dateGOOG{2:end,1}});
- dateGOOG = dateGOOG(end:-1:1);
- GOOG = GOOG(end:-1:1,:);
- % 选择最后两年的数据
- newData = [];
- newDateData=[];
- for i =1:numel(dateGOOG)-1
- if abs(dateGOOG(i+1)-dateGOOG(i))==1
- newData = [newData; GOOG(i)];
- newDateData = [newDateData dateGOOG(i)];
- else
- delta = abs(dateGOOG(i+1)-dateGOOG(i));
- for j = 1:delta
- newData = [newData; NaN];
- newDateData = [newDateData dateGOOG(i)+j-1];
- end
- end
- end
- newData = [newData; GOOG(end)];
- newDateData = [newDateData dateGOOG(end)];
- idx = find(newDateData<=datenum('12/31/2011')&newDateData>=datenum('1/1/2010'));
- newData=newData(idx);
- newDateData = newDateData(idx);
-
-
- %% 日历布局
- % 1行6个月,两年共4行
- figure('units','normalized','Position',[ 0.3380 0.0889 0.6406 0.8157]);
- colormap('summer');
- xs = [0.03 .03+.005*1+1*.1525 0.03+.005*2+2*.1525 0.03+.005*3+3*.1525 0.03...
- +.005*4+4*.1525 0.03+.005*5+5*.1525];
- ys = [0.14 .14+0.04*1+1*.165 .14+0.04*2+2*.165 .14+0.04*3+3*.165];
-
-
- % 估计每月的天数
- isthereALeapyear = find(~(mod(unique(str2num(datestr(newDateData,'yyyy'))),4)| ...
- mod(unique(str2num(datestr(newDateData,'yyyy'))),400)));
- if isempty(isthereALeapyear)
- D = [31 28 31 30 31 30; 31 31 30 31 30 31;31 28 31 30 31 30; 31 31 30 31 30 31];
- else
- if isthereALeapyear==1
- D = [31 29 31 30 31 30; 31 31 30 31 30 31;31 28 31 30 31 30; 31 31 30 31 30 31];
- else
- D = [31 28 31 30 31 30; 31 31 30 31 30 31;31 29 31 30 31 30; 31 31 30 31 30 31];
- end
- end
-
- %% 开始绘图
- Dcnt=0;
- for i = 1:4
- for j = 1:6
- % 绘制月视图
- axes('Position',[xs(j) ys(i) .1525 0.165]);
- % 计算所在的月份
- idx = find(newDateData>=datenum([datestr(newDateData(1)+Dcnt,'mm') ...
- '/01/' datestr(newDateData(1)+Dcnt,'yyyy')]) & ...
- newDateData<=datenum([datestr(newDateData(1)+Dcnt,'mm') '/31/' ...
- datestr(newDateData(1)+Dcnt,'yyyy')]));
- % 得到当月信息
- A = calendar(newDateData(1)+Dcnt);
- % 填入股价数据
- data = NaN(size(A));
- for k = 1:max(max(A))
- [xx yy] = find(A==k);
- data(xx,yy) = newData(idx(k));
- end
- % 上色
- imagesc(data); alpha(.4);hold on;set(gca,'fontweight','bold');
- xlim([.5 7.5]); ylim([0 6.5]);
- for m = 1:6
- for n= 1:7
- if A(m,n)~=0
- text(n,m,num2str(A(m,n)));
- end
- end
- end
- % 添加日历头
- text(.75,.25,'S','fontweight','bold'); text(1.75,.25,'M','fontweight','bold');
- text(2.75,.25,'T','fontweight','bold');
- text(3.75,.25,'W','fontweight','bold');text(4.75,.25,'R','fontweight','bold');
- text(5.75,.25,'F','fontweight','bold');text(6.75,.25,'S','fontweight','bold');
-
- title([datestr(newDateData(1)+Dcnt,'mmm') datestr(newDateData(1)+Dcnt,'yy')]);
- set(gca,'xticklabel',[],'yticklabel',[],'ticklength',[0 0]);
- line([-.5:7.5; -.5:7.5], [zeros(1,9); 6.5*ones(1,9)],'Color',[.8 .8 .8]);
- line([zeros(1,9); 7.5*ones(1,9)],[-.5:7.5; -.5:7.5], 'Color',[.8 .8 .8]);
- box on;
- Dcnt=Dcnt+D(i,j);
- end
- end
-
-
- %%增加图标和标题
- colorbar('Location','SouthOutside','Position',[ 0.1227 0.0613 0.7750 0.0263]);
- alpha(.4);
- annotation('textbox',[0.30 0.9354 0.8366 0.0571],...
- 'String','2010年1月到2011年12月Google股价日记录',...
- 'LineStyle','none','Fontsize',14);
-
-
- set(gcf,'Color',[1 1 1],'paperpositionmode','auto');
图 11
七. 数据分布的可视化分析
(源代码:DistributionAnalysis.m)
1) 首先利用散点图和直方图对数据进行初步观察。(图 12)
- %% 加载数据
- load distriAnalysisData;
-
- %% 观测数据分布
- % 绘制直方图
- figure('units','normalized','position',[0.2099 0.6269 0.4354 0.2778]);
- subplot(1,2,1);plot(sort(B),'.');xlabel('序号');ylabel('观察值');title('一维散点图');
- subplot(1,2,2);hist(B);xlabel('间隔');ylabel('观测频率');title('直方图');
- set(gcf,'Color',[1 1 1],'paperpositionmode','auto');
图 12
2) 进一步优化直方图,对数据进行更细致考查。(图13)
- %% 优化直方图
- figure('units','normalized','position',[0.2099 0.6269 0.4354 0.2778]);
- [N c] = hist(B,round(sqrt(length(B))));
- bar(c,N);
- title('间隔大小 = sqrt(n)');
- xlabel('间隔');ylabel('观测频率');title('优化直方图间隔以发现隐藏的结构');
- set(gcf,'Color',[1 1 1],'paperpositionmode','auto');
图 13
3) 利用样条插值,为直方图添加包络线。(图14)
- %% 利用样条插值绘制直方图的包络
- [N c] = hist(B,200);
- % 用样条计算包络
- env = interp1(c,N,c,'spline');
- % 绘制归一化的包络
- figure('units','normalized','position',[ 0.2099 0.6269 0.4354 0.2778]);
- bar(c,N./max(N));hold;
- plot(c,env./max(env),'r','Linewidth',3);
- xlabel('间隔'); ylabel({'归一化后的包络','间隔数为200'});
- title('直方图包络是数据经验分布建模的有力工具');
- set(gcf,'Color',[1 1 1],'paperpositionmode','auto');
图 14
4) 绘制QQ图,观察数据分布的正态性。(图15)
- %% MATLAB qqplot 命令
- figure;qqplot(B);box on
- title({get(get(gca,'title'),'String'),'数据点越接近曲线,数据分布的正态性越好'});
- set(gcf,'Color',[1 1 1],'paperpositionmode','auto');
图 15
5) 残差分析,评估模型对数据真实分布的近似程度。(图16)
- %% 分析残差
- figure;
- sigma_ampl = [79.267229 8.121365 5 6.254915 5.062882 11.117357 577.45966 ...
- 531.38438 962.45674 1800 800 357.92132];
- mu=[29 38 51 70 103 133];
- % 高斯混合模型
- f_sum=0;x=1:200;
- for i=1:6
- f_sum=f_sum+sigma_ampl(i+6)./(sigma_ampl(i)).*exp(-(x-mu(i)).^2./(2*sigma_ampl(i).^2));
- end
- subplot(2,1,1);
- clear h;
- h(1)=plot(c,env,'Linewidth',1.5);hold on;
- h(2)=plot(c,f_sum,'r','Linewidth',1.5); axis tight
- legendflex(h,{'直方图轮廓','高斯混合模型'},'ref',gcf,'anchor',{'ne','ne'},'xscale',.5,'buffer',[-50 -50]);
- title({'对数据分布建模后,通过残差分析评估','模型对数据描述的符合程度'});
- subplot(2,1,2);
- plot(c,env-f_sum,'.');axis tight;
- title(['残差 = 观测值 - 拟合值, 均方误差 = ' num2str(sqrt(sum(abs(env-f_sum).^2)))]);
- set(gcf,'Color',[1 1 1],'paperpositionmode','auto');
图 16
八. 时间序列分析
(源代码:TimeSeriesAnalysis.m)
1) 绘制散点图。(图17)
- load timeseriesAnalysis;
-
- %% 数据散点图
- figure;
- subplot(2,1,1);plot(x,ydata1);title('样本1实时心率(采样间隔0.5秒)');xlabel('时间(秒)');ylabel('心率(心跳次数/分钟)');
- subplot(2,1,2);plot(x,ydata2);title('样本2实时心率(采样间隔0.5秒)');xlabel('时间(秒)');ylabel('心率(心跳次数/分钟)');
- set(gcf,'color',[1 1 1],'paperpositionmode','auto');
图 17
2) 过滤信号中的线性成分(detrend)。(图18)
- %% 过滤数据的线性趋势(detrend)
- figure;
- y_detrended1 = detrend(ydata1);
- y_detrended2 = detrend(ydata2);
- subplot(2,1,1);plot(x, ydata1,'-',x, ydata1-y_detrended1,'r');title('去除线性成分后的信号1');
- legend({'信号','线性成分'});
- xlabel('时间 (秒)');ylabel('心率(心跳次数/分钟)');
- subplot(2,1,2);plot(x, ydata2,'-',x, ydata2-y_detrended2,'r');title('去除线性成分后的信号2');
- legend({'信号','线性成分'});
- xlabel('时间(秒)');ylabel('心率(心跳次数/分钟)');
- set(gcf,'color',[1 1 1],'paperpositionmode','auto');
图 18
3) 计算自相关函数,绘制自相关图(correlogram)。(图19)
- %% 自相关函数
- figure;
- y_autoCorr1 = acf(subplot(2,1,1),ydata1,100);
- set(get(gca,'title'),'String','心率数据自相关函数(样本1)');
- set(get(gca,'xlabel'),'String','间期(秒)');
- tt = get(gca,'xtick');
- for i = 1:length(tt); ttc{i} = sprintf('%.2f ',0.5*tt(i)); end
- set(gca,'xticklabel',ttc);
- y_autoCorr2 = acf(subplot(2,1,2),ydata2, 100);
- set(gca,'xticklabel',ttc);
- set(get(gca,'title'),'String','心率数据自相关函数(样本2)');
- set(get(gca,'xlabel'),'String','间期(秒)');
- set(gcf,'color',[1 1 1],'paperpositionmode','auto');
图 19
4) 利用傅里叶变换寻找周期成分。(图20)
- %% 利用傅里叶变换寻找周期成分
- figure;
-
- nfft = 2^(nextpow2(length(x)));
- % 快速傅里叶变换(信号长度小于nfft时补零)
- ySpectrum1 = fft(y_detrended1,nfft);
- ySpectrum2 = fft(y_detrended2,nfft);
-
- NumUniquePts = ceil((nfft+1)/2);
- % FFT is symmetric, throw away second half and use the magnitude of the coeeicients only
- powerSpectrum1 = abs(ySpectrum1(1:NumUniquePts));
- powerSpectrum2 = abs(ySpectrum2(1:NumUniquePts));
- % Scale the fft so that it is not a function of the length of x
- powerSpectrum1 = powerSpectrum1./max(powerSpectrum1);
- powerSpectrum2 = powerSpectrum2./max(powerSpectrum2);
- powerSpectrum1 = powerSpectrum1.^2;
- powerSpectrum2 = powerSpectrum2.^2;
- % Since we dropped half the FFT, we multiply the coeffixients we have by 2 to keep the same energy.
- % The DC component and Nyquist component, if it exists, are unique and should not be multiplied by 2.
- if rem(nfft, 2) % odd nfft excludes Nyquist point
- powerSpectrum1(2:end) = powerSpectrum1(2:end)*2;
- powerSpectrum2(2:end) = powerSpectrum2(2:end)*2;
- else
- powerSpectrum1(2:end -1) = powerSpectrum1(2:end -1)*2;
- powerSpectrum2(2:end -1) = powerSpectrum2(2:end -1)*2;
- end
- % This is an evenly spaced frequency vector with NumUniquePts points.
- % Sampling frequency
- Fs = 1/(x(2)-x(1));
- f = (0:NumUniquePts-1)*Fs/nfft;
- plot(f,powerSpectrum1,'-',f,powerSpectrum2,'r');
- title('心率信号能量谱');
- xlabel('频率(赫兹)'); ylabel('能量');
- xlim([0 .25]);
- annotation_pinned('textarrow',[.15,.085],[.25,.03],'String',{'0.1Hz处的尖峰很可能','是该样本呼吸的频率'});
- annotation_pinned('textarrow',[.1,.02],[.75,.87],'String',{'0.02 Hz处的尖峰很可能','是该样本呼吸的频率'});
- set(gcf,'color',[1 1 1],'paperpositionmode','auto');
图 20
5) 对信号平滑处理,去掉频谱上系数较小的成分。(图21)
- %% 去除能量较小的频率成分
- figure;
- % 不使用0垫位
- ySpectrum1 = fft(ydata1);
- ySpectrum2 = fft(ydata2);
- % 去掉很小的系数
- freqInd1=find(abs(ySpectrum1)<400);
- freqInd2=find(abs(ySpectrum2)<400);
- ySpectrum1(freqInd1)=0;
- ySpectrum2(freqInd2)=0;
- % 重建信号
- y_cyclic1=ifft(ySpectrum1);
- y_cyclic2=ifft(ySpectrum2);
- subplot(2,1,1);
- h(1)= plot(x,ydata1,'b');hold on;h(2)=plot(x,y_cyclic1,'r','linewidth',1.5);
- title('心率信号1');axis tight;
- legendflex(h,... %handle to plot lines
- {'原始信号','平滑后信号'},... %corresponding legend entries
- 'ref', gcf, ... %which figure
- 'anchor', {'e','e'}, ... %location of legend box
- 'buffer',[-10 0], ... % an offset wrt the location
- 'fontsize',8,... %font size
- 'xscale',.5); %a scale factor for actual symbols
- xlabel('时间(秒)');ylabel('心率(心跳次数/分钟)');
- subplot(2,1,2);
- h(1) = plot(x,ydata2,'b');hold on;h(2)=plot(x,y_cyclic2,'r','linewidth',1.5);
- title('心率信号2');axis tight;
- xlabel('时间(秒)');ylabel('心率(心跳次数/分钟)');
- set(gcf,'color',[1 1 1],'paperpositionmode','auto');
图 21
|