一文读懂合成控制法及Matlab操作及应用

liyu_sun 2020-05-03

展开全文

Synth MATLAB Code (11/07/2006) written for MATLAB 7.0

by Alberto Abadie, Alexis Diamond, and Jens Hainmueller (all Harvard University)

Contact: jhainm@harvard.edu

关于合成控制法简介请详细阅读一文读懂合成控制法 (Synthetic Control Method)操作及Stata应用

Synth MATLAB使用Abadie和Gardeazabal(2003)以及Abadie、Diamond和Hainmueller(2006)中开发的汇总数据，在比较案例研究中实现因果推理的合成控制方法。

合成控制法Matlab操作之数据介绍

39 by 39 data matrix to run the example:

First row contains state numbers:Alabama 1; Arkansas 2; Colorado 4; Connecticut 5; Delaware 6;Georgia 7; Idaho 8; Illinois 9; Indiana 10; Iowa 11; Kansas 12;Kentucky 13; Louisiana 14; Maine 15; Minnesota 16; Mississippi 17; Missouri 18; Montana 19; Nebraska 20; Nevada 21; New Hampshire 22; New Mexico 23; North Carolina 24; North Dakota 25; Ohio 26; Oklahoma 27;Pennsylvania 28; Rhode Island 29; South Carolina 30; South Dakota 31; Tennessee 32; Texas 33; Utah 34; Vermont 35; Virginia 36; West Virginia 37; Wisconsin 38; Wyoming 39; California

Predictors are stored in rows 2 to 8:

row 2: income, row 3: retail price, row 4: percent_15-19; row 5: beer

consumption (all averaged 1980 to 1988);row 6: smoking 1988, row 7: smoking 1980; row 8: smoking 1975;

Outcome Data (smoking consumption in packs per capita) is stored in

rows 9 to 39 for the years 1970, 1971,...,2000

合成控制法Matlab操作代码

1、导入数据

load MLAB_data.txt;data = MLAB_data;

2、Built Indices (see data description in readme file)

% California is state no 3, stored in the last column no 39index_tr  = [39];
% 38 Control states are 1,2 & 4,5,...,38, stored in columns 1 to 38index_co  = [1:38];% Predcitors are stored in rows 2 to 8index_predict = [2:8];% Outcome Data is stored in rows 9 to 39; for 1970, 1971,...,2000index_Y = [9:39];

3、Define Matrices for Predictors

% X0 : 7 X 38 matrix (7 smoking predictors for 38 control states)X0 = data(index_predict,index_co);

% X1 : 10 X 1 matrix (10 crime predictors for 1 treated states)X1 = data(index_predict,index_tr);

% Normalization (probably could be done more elegantly)bigdata = [X0,X1];divisor = std(bigdata');scamatrix = (bigdata' * diag(( 1./(divisor) * eye(size(bigdata,1))) ))';X0sca = scamatrix([1:size(X0,1)],[1:size(X0,2)]);X1sca = scamatrix(1:size(X1,1),[size(scamatrix,2)]);X0 = X0sca;X1 = X1sca;clear divisor X0sca X1sca scamatrix bigdata;

4、Define Matrices for Outcome Data

% Y0 : 31 X 38 matrix (31 years of smoking data for 38 control states)Y0 = data(index_Y,index_co);% Y1 : 31 X 1 matrix (31 years of smoking data for 1 treated state)Y1 = data(index_Y,index_tr);
% Now pick Z matrices, i.e. the pretreatment period% over which the loss function should be minmized% Here we pick Z to go from 1970 to 1988  % Z0 : 19 X 38 matrix (31 years of pre-treatment smoking data for 38 control states)Z0 = Y0([1:19],1:38);% Z1 : 19 X 1 matrix (31 years of pre-treatment smoking data for 1 treated state)Z1 = Y1([1:19],1);

5、Now we implement Optimization

% Check and maybe adjust optimization settings if necessaryoptions = optimset('fmincon')

% Get Starting Values s = std([X1 X0]')';s2 = s; s2(1)=[];s1 = s(1);v20 =((s1./s2).^2);

[v2,fminv,exitflag] = fmincon('loss_function',v20,[],[],[],[],... zeros(size(X1)),[],[],options,X1,X0,Z1,Z0);display(sprintf('%15.4f',fminv));v = [1;v2];% V-weightsv

% Now recover W-weightsD = diag(v);H = X0'*D*X0;f = - X1'*D*X0;options = optimset('quadprog')[w,fval,e]=quadprog(H,f,[],[],ones(1,length(X0)),1,zeros(length(X0),1),ones(length(X0),1),[],options);w = abs(w);

% W-weightsw

6、绘制结果图

Y0_plot = Y0*w;years = [1970:2000]'; plot(years,Y1,'-', years,Y0_plot,'--');axis([1970 2000 0 150]);xlabel('year');ylabel('smoking consumtion per capita (in packs)');legend('Solid Real California','Dashed Synthetic California',4);

7、结果为：