[Original] Weekend Statistics Question (17): What Multiple Linear Regression Is Used For (this one is a bit tricky)

妙趣横生统计学 2021-07-23


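From September 2008 through 2015, two experts in epidemiology and statistics published more than 300 installments of the Statistical Question series in the BMJ (British Medical Journal) without interruption. In each installment, the two authors posed a multiple-choice statistics question, gave the answer, and explained it. I have selected 300 of these Statistical Questions and prepared a Chinese edition; interested readers are invited to try answering them.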

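Statistics Question (17): What multiple linear regression is used for (more than one answer may be correct; this one is a bit tricky)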

*Before reading the answer and explanation, please cast your vote first, and see how everyone else answered.

Question

Which, if any, of the following statements are true of multiple linear regression?

a) It may be used to control confounding in cohort studies

b) It can only assess a linear (straight line) relationship between variables

c) It gives misleading results when two or more independent variables are highly correlated

d) It can be used with dichotomous or categorical independent variables

e) It can be used with dichotomous or categorical dependent variables

Answer

An example of a linear regression equation would be y = ax1 + bx2 + c. This type of equation can be used to express the relationship between independent variables on the right hand side and one dependent variable that is on the left hand side.
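As a rough illustration of how the coefficients a, b, and c in such an equation can be estimated, here is a minimal sketch that fits y = ax1 + bx2 + c by ordinary least squares using only the Python standard library. The function name `fit_linear` and the data are made up for this example and are not part of the BMJ article; real analyses would normally use a statistics package.

```python
def fit_linear(predictors, y):
    """Least-squares fit of y = b1*x1 + ... + bk*xk + c via the normal equations.

    predictors: list of rows, each row a list [x1, ..., xk].
    Returns [b1, ..., bk, c] (the intercept c comes last).
    """
    X = [list(row) + [1.0] for row in predictors]  # append an intercept column
    p = len(X[0])
    # Augmented normal-equation system [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [u - factor * v for u, v in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Hypothetical, noise-free data generated from y = 2*x1 - 3*x2 + 5.
rows = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 8], [6, 4]]
y = [2 * x1 - 3 * x2 + 5 for x1, x2 in rows]

a, b, c = fit_linear(rows, y)
print(round(a, 6), round(b, 6), round(c, 6))
```

Because the data contain no noise, the fit should recover the generating coefficients up to floating-point rounding; with real data the estimates would carry sampling error.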

The dependent variable must be able to take on a wide range of values and ideally should be a continuous variable such as height or blood pressure. The independent variables, by contrast, may be of almost any type including, for example, dichotomous (sex), categorical (eye colour) or continuous (height).
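One common way to put dichotomous or categorical independent variables into a regression equation is to recode them as 0/1 indicator ("dummy") columns, leaving one level out as the reference. The sketch below shows that recoding; the helper name `dummy_code` and the sample values are assumptions made for illustration, not something taken from the BMJ article.

```python
def dummy_code(values, reference):
    """Return (levels, rows): one 0/1 indicator column per non-reference level."""
    levels = sorted(set(values) - {reference})
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows


# Categorical variable with three levels: two indicator columns.
eye_colour = ["brown", "blue", "green", "brown", "blue"]
eye_levels, eye_rows = dummy_code(eye_colour, reference="brown")
print(eye_levels)  # ['blue', 'green']
print(eye_rows)    # [[0, 0], [1, 0], [0, 1], [0, 0], [1, 0]]

# Dichotomous variable: a single indicator column.
sex = ["F", "M", "M", "F"]
sex_levels, sex_rows = dummy_code(sex, reference="F")
print(sex_levels)  # ['M']
print(sex_rows)    # [[0], [1], [1], [0]]
```

Each indicator's coefficient is then read as the difference in the dependent variable between that level and the reference level, holding the other independent variables fixed.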

Controlling for confounding is a major use of regression equations in medical statistics. Whereas correlation assesses a straight line relationship, regression can model more complex curves by using polynomials or smoothed functions, for example.
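Both points in this paragraph can be sketched with the same two-predictor fit. In the first part, a made-up confounder (age) drives both the exposure and the outcome, so the crude slope overstates the exposure effect while the adjusted slope recovers it. In the second part, the two "predictors" are x and x squared, so a model that is linear in its coefficients traces a curve. All names and numbers are hypothetical and chosen so the arithmetic is exact.

```python
def mean(v):
    return sum(v) / len(v)


def css(u, v):
    """Centered sum of cross-products of u and v."""
    mu, mv = mean(u), mean(v)
    return sum((p - mu) * (q - mv) for p, q in zip(u, v))


def fit_two(x1, x2, y):
    """Least-squares a, b, c in y = a*x1 + b*x2 + c (closed form, two predictors)."""
    s11, s22, s12 = css(x1, x1), css(x2, x2), css(x1, x2)
    s1y, s2y = css(x1, y), css(x2, y)
    det = s11 * s22 - s12 ** 2  # zero when x1 and x2 are perfectly collinear
    a = (s1y * s22 - s2y * s12) / det
    b = (s2y * s11 - s1y * s12) / det
    return a, b, mean(y) - a * mean(x1) - b * mean(x2)


# Part 1: confounding. Age raises both exposure and outcome.
age = [0, 0, 1, 1, 2, 2, 3, 3]
exposure = [z + e for z, e in zip(age, [0, 1, 0, 1, 0, 1, 0, 1])]
outcome = [0.5 * x + 2 * z + 10 for x, z in zip(exposure, age)]  # true effect 0.5

crude_slope = css(exposure, outcome) / css(exposure, exposure)
adjusted_slope, age_slope, intercept = fit_two(exposure, age, outcome)
print(round(crude_slope, 3), round(adjusted_slope, 3))

# Part 2: a curve. The model is linear in its coefficients, not in x.
x = [0, 1, 2, 3, 4, 5]
curve = [1 + 2 * xi - 0.5 * xi ** 2 for xi in x]
lin_coef, quad_coef, const = fit_two(x, [xi ** 2 for xi in x], curve)
print(round(lin_coef, 3), round(quad_coef, 3), round(const, 3))
```

The crude slope comes out well above the true exposure effect of 0.5 because it absorbs part of the age effect; including age as a second independent variable removes that distortion in this constructed example.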

When two independent variables that are highly correlated are included in a regression equation, they will compete for statistical significance and may not appear as independent predictors of outcome. For example, in a cohort study of risk factors for osteoarthritis of the knee, including both the possession of running shoes and participation in regular exercise as independent variables may produce unreliable results from a regression analysis.
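A simple way to spot this problem before fitting is to check how strongly the independent variables are correlated with each other. The sketch below uses invented data for 40 hypothetical cohort members in which running-shoe ownership and regular exercise almost always coincide, then computes their Pearson correlation and the variance inflation factor (VIF), which for two predictors is 1 / (1 - r^2). The data and variable names are assumptions for illustration only.

```python
from math import sqrt


def pearson_r(u, v):
    """Pearson correlation coefficient between two equal-length lists."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    suv = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    suu = sum((p - mu) ** 2 for p in u)
    svv = sum((q - mv) ** 2 for q in v)
    return suv / sqrt(suu * svv)


# Hypothetical cohort: 1 = yes, 0 = no.
exercise = [1] * 20 + [0] * 20
shoes = list(exercise)
shoes[0] = 0    # one person exercises but owns no running shoes
shoes[20] = 1   # one person owns running shoes but does not exercise

r = pearson_r(exercise, shoes)
vif = 1 / (1 - r ** 2)  # variance inflation factor for the two-predictor case
print(round(r, 3), round(vif, 2))
```

A VIF well above 1 means the standard errors of both coefficients are inflated, which is one way the two variables end up "competing" for statistical significance.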

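Explanation (translated from the Chinese):

An example of a linear regression equation is y = ax1 + bx2 + c. An equation of this type can express the relationship between the independent variables on the right-hand side and the single dependent variable on the left-hand side.

The dependent variable must be one whose observed values span a reasonable range, and ideally it should be a continuous variable such as height or blood pressure. The independent variables, by contrast, can be of almost any type, for example dichotomous (sex), categorical (eye colour), or continuous (height).

Controlling for confounding is a major use of regression equations in medical statistics. Correlation assesses a straight-line relationship, whereas regression can describe more complex curves by using polynomials or smoothed functions.

When two highly correlated independent variables are included in a regression equation, they compete for statistical significance and may not appear as independent predictors of the outcome. For example, in a cohort study of risk factors for osteoarthritis of the knee, including both possession of running shoes and participation in regular exercise may make the results of the regression analysis unreliable.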

So the correct answers are a, c, and d.

Source: BMJ
