Research Methods | Publishing in Top Management Journals (Part 5): Methods and Results

liyu_sun 2020-09-05

Source: Nankai Business Review

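Once the arduous but exciting work of selecting an intriguing and appropriate topic, designing and executing a sound data collection, crafting a compelling “hook,” and developing a solid theory is finished, it is tempting to sit back, relax, and cruise through the Methods and Results. The remaining work seems straightforward, perhaps even a little mundane: report to readers (1) how and why the data were obtained and (2) how the data were analyzed and what was found. Indeed, it is unlikely that many readers of AMJ have been waiting with bated breath for an entertaining narrative in this installment of the Publishing in AMJ series. If we fall short of being compelling, therefore, we hope at least to be informative.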

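As authors ourselves, we must admit that we have succumbed to the temptation to relax our concentration when writing these sections. We have heard colleagues say that they hand these sections to junior members of their research teams so that they can “get their feet wet” in manuscript drafting, as though these sections mattered less than the opening, hypothesis development, and Discussion sections. Perhaps this is so. But as members of the current editorial team for the past two years, we have come face to face with the reality that the Methods and Results sections, if not the most critical, often play a major role in how reviewers evaluate a manuscript. Instead of providing a clear, detailed account of the data collection procedures and findings, these sections often leave reviewers perplexed and raise more questions than they answer about the procedures the authors used and what they found. In contrast, an effective presentation can have a crucial impact on the extent to which authors convince readers that their theoretical arguments (or parts of them) are supported. High-quality Methods and Results sections also send positive signals about the conscientiousness of the authors. Knowing that the authors were careful and rigorous in preparing these sections may make a difference for reviewers debating whether to recommend a rejection or a revision request.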

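To better understand the concerns reviewers commonly raise, we evaluated every decision letter for a rejected manuscript written so far in our term. We found that several issues arose much more frequently in rejected manuscripts than in manuscripts for which revisions were requested. The results of our evaluation, if not surprising, revealed a remarkably consistent set of major concerns for both sections, which we summarize as “the three C’s” (3C): completeness, clarity, and credibility.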

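Methods

Completeness

In our review of the decision letters, perhaps the most common theme related to Methods sections was that authors failed to provide a complete description of how they obtained their data, how they operationalized the constructs they used, and what types of analyses they conducted. When authors have collected the data themselves (a primary data collection), they need to explain in detail not only what happened but also why they made certain decisions. A good example is Bommer, Dierdorff, and Rubin’s (2007) study of group-level citizenship behaviors and job performance. We learn in their Methods how the participants were contacted (i.e., on site, by the study’s first author), how the data were obtained (i.e., in an on-site training room, from groups of 20–30 employees), what kinds of encouragement for participation were used (i.e., letters from both the company president and the researchers), and who reported the information for the different constructs in the model (i.e., employees, supervisors, and managers of the supervisors). These authors also reported other relevant information about their data collection. For example, they noted that employees and their supervisors were never scheduled to complete their questionnaires in the same room. They also reported a system of “checks and balances” to make sure supervisors reported performance for all of their direct reports. Providing these details, along with a full description of the characteristics of the analysis sample at the individual and team levels, allows reviewers to evaluate the strengths and weaknesses of a research design. Although it is reasonable to highlight the strengths of one’s research, reporting sufficient detail on both the strengths and the potential weaknesses of the data collection is preferable to concealing important details, because certain compromises or flaws can also yield advantages. Consider data collected with a snowball sampling approach in two waves separated by a few months. A disadvantage of this approach would likely be that the sample matched over the two waves will be smaller than the sample resulting if the researchers only contact wave 1 participants to participate in wave 2. But the approach also has certain advantages. In particular, the large number of one-wave participants (i.e., those who participated in either the first wave or the second wave) can be used to address response bias and representativeness issues directly.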

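In many other cases, the data for a study are obtained from archival sources. Here a researcher may not have access to every detail of the data collection procedures, but completeness in reporting is no less important. Most, if not all, archival data sets come with technical reports or usage manuals that provide a good deal of detail. Armed with these, the researcher can attempt to match the level of detail about data collection procedures and measures that is found in primary data collections. For a good example, using the National Longitudinal Survey of Youth 1979 cohort (NLSY79), see Lee, Gerhart, Weller, and Trevor (2008). In other archival studies, authors construct the data set themselves, perhaps by coding corporate filings and media accounts or by building variables from other sources. In these cases, a complete description of how they identified the sample, how many observations were lost for different reasons, how they conducted the coding, and what judgment calls were made is necessary.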

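Regardless of the type of data set a researcher has used, the goals in this section are the same. First, authors should disclose the hows, whats, and whys of the research procedures. Including an appendix with a full list of measures (and items, where appropriate), for example, is often a nice touch. Second, completeness allows readers to evaluate the advantages and disadvantages of the approach taken, which on balance creates a more positive impression of the study. Third, a primary goal of the Methods section should be to provide enough information that someone using exactly the same procedure and data could replicate the study and get the same results. After reading the Methods section, readers should be confident that they could replicate the primary data collection or compile the same archival database that the authors report.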

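Clarity

Far too often, authors fail to explain clearly what they have done. Although there are many potential examples, a typical and very common problem concerns descriptions of measures. Reviewers are often concerned by language such as “we adapted items” or “we used items from several sources.” Indeed, in our evaluation of the decision letters, not reporting how measures were adapted was the modal issue related to measurement. Ideally, authors can avoid these problems simply by using the full, validated measures of constructs when they are available. When this is not possible, it is imperative to justify the modifications and, ideally, to provide additional empirical validation of the altered measures. If this information is not included initially, reviewers will invariably ask for it; providing it up front improves the chances of a revision request.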

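Another very common clarity issue concerns the justification for variable coding. Coding decisions are made in nearly every quantitative study, but they are perhaps most frequent in research involving archival data sets, experimental designs, and the assignment of numerical codes based on qualitative responses. For example, Ferrier (2001) used structured content analysis to code news headlines for measures of competitive attacks. In an excellent example of clarity, Ferrier described, in an organized fashion and in straightforward language, how the research team made the coding decisions for each dimension and how these decisions resulted in operationalizations that matched the constitutive definitions of the competitive attack dimensions.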

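Credibility

Authors can do several uncomplicated things in their Methods sections to enhance perceptions of credibility. First, it is important to address why a particular sample was chosen. Reviewers often question why a particular sample was used, especially when it is not immediately obvious why the phenomenon of interest is important in the setting used. For example, in Tangirala and Ramanujam’s study of voice, personal control, and organizational identification, the authors opened the Methods by describing why they chose to sample front-line hospital nurses to test their hypotheses, noting (1) “they are well positioned to observe early signs of unsafe conditions in patient care and bring them to the attention of the hospital” and (2) “there is a growing recognition that the willingness of nurses to speak up about problems in care delivery is critical for improving patient safety and reducing avoidable medical errors (such as administration of the wrong drug), a leading cause of patient injury and death in the United States” (2008: 1,193). Second, it is always good practice to summarize the conceptual definition of a construct before describing the measure used for it. This not only spares readers from flipping back and forth in the paper to find the constitutive definitions but, when done well, also lessens reader concerns about whether the theory a paper presents matches the tests that were conducted. Third, it is always important to explain why a particular operationalization was used. For example, organizational performance has numerous dimensions. Some may be relevant to the hypotheses at hand, and others are not. We have often seen authors “surprise” reviewers by introducing certain dimensions with no justification. When alternative measures are available, authors should report which other measures they considered and why these were not chosen. If alternative measures are available in the data set, it is often a good idea to report the findings obtained with them. Fourth, it is crucial to justify model specification and data analysis approaches. We have often seen authors include control variables without sufficiently justifying why they should be controlled for. For some types of data, multiple analysis methods exist, and authors need to justify why one method rather than another was used. Panel data, for example, can be analyzed using fixed-effect models or random-effect models. Multiple event history analysis methods can analyze survival data. Each method has its specific assumptions. In some cases, additional analysis is warranted to make the choice (for example, a Hausman test to choose between fixed- and random-effect models for panel data).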

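Results

Completeness

Writing a Results section effectively is not an easy task, and completeness becomes all the more important when one’s theoretical framework and/or research design is complex. For starters, including a table of means, standard deviations, and correlations is a piece of “low-hanging fruit.” The information in this table may not directly test hypotheses, yet it paints an overall picture of the data, which is critical for judging the credibility of findings. For example, high correlations between variables often raise concerns about multicollinearity. A large standard deviation relative to the mean of a variable can raise concerns about outliers. Indeed, it is good practice to check data ranges and outliers during data analysis so as to avoid significant findings that are driven mainly by a few outliers. The distributional properties of variables reported in a table (such as means and minimum and maximum values) are informative in themselves. For example, in a study on CEO succession, the means of variables that measure different types of CEO successions show how the new CEOs in the sample are distributed across recruitment sources. These distributional properties describe the phenomenon of CEO succession and have important practical implications.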

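In reporting results, it is important to specify the unit of analysis, sample size, and dependent variable used in each model. This is especially crucial when such information varies across models. Take Arthaud-Day, Certo, Dalton, and Dalton (2006) as an example. These authors examined executive and director turnover following corporate financial restatements. They had four dependent variables: CEO turnover, CFO turnover, outside director turnover, and audit committee member turnover. In the models of CEO and CFO turnover, because they were able to identify the month of the turnover, they constructed the data using “CEO/CFO” as the unit of analysis and used a Cox model to examine the timing of executive turnover. The sample size of the CEO turnover model was 485, and the sample size of the CFO turnover model was 407. In comparison, when examining turnover of outside directors and audit committee members, Arthaud-Day and her colleagues were unable to determine the month in which these individuals left office, so they constructed the data using the director or audit committee member-year as the unit of analysis and used logistic regression to examine the likelihood of turnover. The sample size of the outside director turnover model was 2,668, and the sample size of the audit committee member turnover model was 1,327. The take-away is that careful descriptions such as those Arthaud-Day and colleagues provided help readers calibrate their interpretations of results and prevent reviewers from raising questions of clarification.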

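Clarity

The purpose of a Results section is to answer the research questions that have been posed and to provide empirical evidence for the hypotheses (or to note that evidence is lacking). We often see, however, that authors do not relate their findings to the study’s hypotheses. We also see authors report the results in the Results section but discuss their linkage with hypotheses in the Discussion section or, conversely, begin to discuss the implications of the findings prematurely in the Results rather than in the Discussion. In these cases, the authors fail to describe clearly what the results indicate with respect to the focal topic of the study. To avoid this problem, it helps to summarize each hypothesis before reporting the related results. Try this format: “Hypothesis X suggests that . . . We find that . . . in model . . . in Table . . . Thus, Hypothesis X is (or isn’t) supported.” Although this format may sound mechanical or even boring, it is a very effective way to report results clearly (see also Bem, 1987). We encourage and welcome authors to experiment with novel and clear ways to present results. We also suggest that authors report the results associated with their hypotheses in order, beginning with the first hypothesis and continuing sequentially to the last one, unless some compelling reason suggests that a different order is better.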

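In many studies, the results do not support all the hypotheses. Yet results that are not statistically significant and those with signs opposite to prediction are just as important as those that are supported. However, as one editor noted, “If the results are contrary to expectations, I find authors will often try to ‘sweep them under the rug.’” Needless to say, sometimes such results reflect inadequate theorizing (e.g., the hypotheses are wrong, or at least there are alternative arguments and predictions). Other times, however, unsupported results are great fodder for new, critical thinking in a Discussion section. The point is that all results, significant or not, supporting or opposite to hypotheses, need to be addressed directly and clearly.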

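It is also good practice to reference variables across sections in the same order: for example, describe their measures in the Methods section, list them in tables, and discuss results in the Results section, all in the same order. Such consistency improves the clarity of exposition and helps readers both follow the manuscript and find information quickly. It also gives authors a checklist so that they remember to include relevant information (for example, it catches the case in which a variable included in the models is not mentioned in the Methods section and/or in the correlation matrix).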

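Credibility

Although every part of a paper plays an important role in helping or hurting its credibility (e.g., adequate theorizing and rigorous research design), there are some things authors can do in their Results sections to enhance the perceived credibility of findings. First, it is crucial to demonstrate to readers why one’s interpretations of results are correct. For example, a negative coefficient for an interaction term may suggest that the positive effect of the predictor became weaker, disappeared, or even became negative as the value of the moderator increased. Plotting a significant interaction effect helps one visualize the finding and thus demonstrate whether it is consistent with the intended hypothesis. Aiken and West (1991) provided some “golden rules” on how to plot interaction effects in regressions. Beyond these, determining whether the simple slopes are statistically significant is often important in assessing whether one’s results fully support the hypotheses; techniques developed by Preacher, Curran, and Bauer (2006) are helpful in these calculations.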

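Second, if alternative measurements, methods, and/or model specifications could be used for a study, but authors report results using only one possible choice, readers may get the impression that the authors “cherry-picked” findings that were consistent with the hypotheses. Supplementary analyses and robustness checks can address these concerns. For example, Tsai and Ghoshal (1998) examined the value creation role of a business unit’s position in intrafirm networks. Although they proposed the hypotheses at the level of the individual business unit, they generated several measures of business units’ attributes from data at the dyadic level. These steps raised some concerns about level of analysis and the reliability of the results. To address these concerns, they also analyzed data at the dyadic level and obtained consistent results.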

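Third, even if a result is statistically significant, readers may still ask, So what? A statistically significant effect is not necessarily a practically important effect. Authors typically discuss the practical implications of a study in their Discussion; they can, however, conduct and report additional analyses in the Results to demonstrate the practical relevance of findings. A good example is Barnett and King’s (2008) study of spillover harm. These authors stated the following hypothesis: “An error at one firm harms other firms in the same industry” (Barnett & King, 2008: 1,153). In addition to reporting the statistical significance of the predictor, the authors provided information to communicate the average scale of such spillovers. They reported that “following an accident that injured an average number of employees (3.5), a chemical firm with operations in the same industry as that in which an accident occurred could expect to lose 0.15 percent of its stock price” and that “after an accident that caused the death of an employee, the firm could expect to lose an additional 0.83 percent” (Barnett & King, 2008: 1,160). In other cases, authors may want to discuss the implications of small effect sizes, perhaps by noting how difficult it is to explain variance in a given dependent variable or, in the case of an experiment, by noting that a significant effect was found even though the manipulation of the independent variable was quite minimal (Prentice & Miller, 1992).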

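Conclusions

Crafting Methods and Results sections may not sound exciting or challenging. As a result, authors tend to pay less attention when writing them. Sometimes these sections are delegated to the junior members of research teams. However, in our experience as editors, we find that these sections often play a major, if not a critical, role in reviewers’ evaluations of a manuscript. We urge authors to take greater care in crafting these sections. The three-C rule of completeness, clarity, and credibility is one recipe to follow in that regard.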

Authors:

Yan (Anthea) Zhang

Rice University

Jason D. Shaw

University of Minnesota

Translation reviewed by:

Zhou Xuan, Editorial Office, Nankai Business Review

Original source:

Academy of Management Journal 2012, Vol. 55, No. 1, 8-12.

English original:

FROM THE EDITORS

PUBLISHING IN AMJ—PART 5: CRAFTING THE METHODS AND RESULTS

Once the arduous, but exciting, work of selecting an intriguing and appropriate topic, designing and executing a sound data collection, crafting a compelling “hook,” and developing a solid theory is finished, it is tempting to sit back, relax, and cruise through the Methods and Results. It seems straightforward, and perhaps a little mundane, to report to the readers (1) how and why the data were obtained; (2) how the data were analyzed and what was found. Indeed, it is unlikely that many readers of AMJ have waited with bated breath for an entertaining narrative in this installment of the Publishing in AMJ editorial series. If we fall short of being compelling, therefore, we hope to at least be informative.

As authors ourselves, we have, admittedly, succumbed to the temptation of relaxing our concentration when it is time to write these sections. We have heard colleagues say that they pass off these sections to junior members of their research teams to “get their feet wet” in manuscript crafting, as though these sections were of less importance than the opening, hypothesis development, and Discussion sections. Perhaps this is so. But as members of the current editorial team for the past two years, we have come face-to-face with the reality that the Methods and Results sections, if not the most critical, often play a major role in how reviewers evaluate a manuscript. Instead of providing a clear, detailed account of the data collection procedures and findings, these sections often leave reviewers perplexed and raise more questions than they answer about the research procedures and findings that the authors used. In contrast, an effective presentation can have a crucial impact on the extent to which authors can convince their audiences that their theoretical arguments (or parts of them) are supported. High-quality Methods and Results sections also send positive signals about the conscientiousness of the author(s). Knowing that they were careful and rigorous in their preparation of these sections may make a difference for reviewers debating whether to recommend a rejection or a revision request.

To better understand the common concerns raised by reviewers, we evaluated each of our decision letters for rejected manuscripts to this point in our term. We found several issues arose much more frequently in rejected manuscripts than they did in manuscripts for which revisions were requested. The results of our evaluation, if not surprising, revealed a remarkably consistent set of major concerns for both sections, which we summarize as “the three C’s”: completeness, clarity, and credibility.

THE METHODS

Completeness

In the review of our decision letters, perhaps the most common theme related to Methods sections was that the authors failed to provide a complete description of the ways they obtained the data, the operationalizations of the constructs that they used, and the types of analyses that they conducted. When authors have collected their data—a primary data collection—it is important for them to explain in detail not only what happened, but why they made certain decisions. A good example is found in Bommer, Dierdorff, and Rubin’s (2007) study of group-level citizenship behaviors and job performance. We learn in their Methods how the participants were contacted (i.e., on site, by the study’s first author), how the data were obtained (i.e., in an on-site training room, from groups of 20–30 employees), what kinds of encouragement for participation were used (i.e., letters from both the company president and the researchers), and who reported the information for different constructs in the model (i.e., employees, supervisors, and managers of the supervisors). In addition, these authors reported other relevant pieces of information about their data collection. For example, they noted that employees and their supervisors were never scheduled to complete their questionnaires in the same room together. In addition, they reported a system of “checks and balances” to make sure supervisors reported performance for all of their direct reports. Providing these details, in addition to a full description of the characteristics of the analysis sample at the individual and team levels, allows reviewers to evaluate the strengths and weaknesses of a research design. Although it is reasonable to highlight the strengths of one’s research, reporting sufficient details on the strengths and potential weaknesses of the data collection is preferred over an approach that conceals important details, because certain compromises or flaws can also yield advantages. 
Consider the example of data collected with a snowball sampling approach in two waves separated by a few months. A disadvantage of this approach would likely be that the sample matched over the two waves will be smaller than the sample resulting if the researchers only contact wave 1 participants to participate in wave 2. But, this approach also has certain advantages. In particular, large numbers of one-wave participants (i.e., those that participated either in the first wave or the second wave) can be used to address response bias and representativeness issues straightforwardly.

In many other cases, the data for a study were obtained from archival sources. Here a researcher may not have access to all the nitty-gritty details of the data collection procedures, but completeness in reporting is no less important. Most, if not all, archival data sets come with technical reports or usage manuals that provide a good deal of detail. Armed with these, the researcher can attempt to replicate the detail of the data collection procedures and measures that is found in primary data collections. For a good example, using the National Longitudinal Survey of Youth 1979 cohort (NLSY79), see Lee, Gerhart, Weller, and Trevor (2008). For other archival data collections, authors construct the data set themselves, perhaps by coding corporate filings and media accounts or by building variables from other sources. In these cases, a complete description of how they identified the sample, how many observations were lost for different reasons, how they conducted the coding, and what judgment calls were made is necessary.

Regardless of the type of data set a researcher has used, the goals in this section are the same. First, authors should disclose the hows, whats, and whys of the research procedures. Including an Appendix with a full list of measures (and items, where appropriate), for example, is often a nice touch. Second, completeness allows readers to evaluate the advantages and disadvantages of the approach taken, which on balance, creates a more positive impression of the study. Third, a primary goal of the Methods section should be to provide sufficient information that someone could replicate the study and get the same results, if they used exactly the same procedure and data. After reading the Methods section, readers should have confidence that they could replicate the primary data collection or compile the same archival database that the authors are reporting.

Clarity

Far too often, authors fail to clearly explain what they have done. Although there are many potential examples, a typical, very common, problem concerns descriptions of measures. Reviewers are often concerned with language such as “we adapted items” or “we used items from several sources.” Indeed, not reporting how measures were adapted was the modal issue related to measurement in the evaluation of our decision letters. Ideally, authors can avoid these problems simply by using the full, validated measures of constructs when they are available. When this is not possible, it is imperative to provide a justification for the modifications and, ideally, to provide additional, empirical validation of the altered measures. If this information is not initially included, reviewers will invariably ask for it; providing the information up front improves the chances of a revision request.

Another very common clarity issue concerns the justification for variable coding. Coding decisions are made in nearly every quantitative study, but are perhaps most frequently seen in research involving archival data sets, experimental designs, and assignment of numerical codes based on qualitative responses. For example, Ferrier (2001) used structured content analysis to code news headlines for measures of competitive attacks. In an excellent example of clarity, Ferrier described in an organized fashion and with straightforward language how the research team made the coding decisions for each dimension and how these decisions resulted in operationalizations that matched the constitutive definitions of the competitive attack dimensions.

Credibility

Authors can do several uncomplicated things to enhance perceptions of credibility in their Methods sections. First, it is important to address why a particular sample was chosen. Reviewers often question why a particular sample was used, especially when it is not immediately obvious why the phenomenon of interest is important in the setting used. For example, in Tangirala and Ramanujam’s study of voice, personal control, and organizational identification, the authors opened the Methods by describing why they chose to sample front-line hospital nurses to test their hypotheses, noting (1) “they are well positioned to observe early signs of unsafe conditions in patient care and bring them to the attention of the hospital” and (2) “there is a growing recognition that the willingness of nurses to speak up about problems in care delivery is critical for improving patient safety and reducing avoidable medical errors (such as administration of the wrong drug), a leading cause of patient injury and death in the United States” (2008: 1,193). Second, it is always good practice to summarize the conceptual definition of a construct before describing the measure used for it. This not only makes it easier for readers—they don’t have to flip back and forth in the paper to find the constitutive definitions— but when done well will lessen reader concerns about whether the theory a paper presents matches the tests that were conducted. Third, it is always important to explain why a particular operationalization was used. For example, organizational performance has numerous dimensions. Some may be relevant to the hypotheses at hand, and others are not. We have often seen authors “surprise” reviewers by introducing certain dimensions with no justification. In cases in which alternative measures are available, authors should report what other measures they considered and why they were not chosen. 
If alternative measures are available in the data set, it is often a good idea to report the findings obtained when those alternative measures were used. Fourth, it is crucial to justify model specification and data analysis approaches. We have often seen authors include control variables without sufficiently justifying why they should be controlled for. For some types of data, multiple possible methods for analysis exist. Authors need to justify why one method rather than the other(s) was used. Panel data, for example, can be analyzed using fixed-effect models or random-effect models. Multiple event history analysis methods can analyze survival data. Each method has its specific assumption(s). In some cases, additional analysis is warranted to make the choice (for example, doing a Hausman test to choose between fixed- and random-effect models for panel data).
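As a rough illustration of the Hausman comparison mentioned above, the sketch below covers only the special case of a single coefficient of interest, where the statistic reduces to a scalar with one degree of freedom. The function name `hausman_single` and the numbers in the usage note are hypothetical; a real analysis would normally compare the full coefficient vectors using a statistics package.

```python
from math import erfc, sqrt


def hausman_single(b_fe, var_fe, b_re, var_re):
    """Scalar Hausman statistic for one coefficient (illustrative only).

    b_fe, var_fe: fixed-effects estimate and its sampling variance.
    b_re, var_re: random-effects estimate and its sampling variance.
    Returns (statistic, p_value), using a chi-square with 1 degree of freedom.
    """
    var_diff = var_fe - var_re
    if var_diff <= 0:
        # The test presumes the random-effects estimator is the more efficient one.
        raise ValueError("var_fe must exceed var_re for the statistic to be defined")
    stat = (b_fe - b_re) ** 2 / var_diff
    # Survival function of a chi-square variable with 1 df: P(Z^2 > stat).
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value
```

With made-up inputs such as `hausman_single(0.50, 0.04, 0.30, 0.03)`, a small p-value would suggest that the two estimators differ systematically, which is the usual argument for reporting the fixed-effect model.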

THE RESULTS

Completeness

Effectively writing a Results section is not an easy task, especially when one’s theoretical framework and/or research design is complex, making completeness all the more important. For starters, including a table for means, standard deviation, and correlations is a piece of “low-hanging fruit.” The information in this table may not have directly tested hypotheses, yet it paints an overall picture of the data, which is critical for judging the credibility of findings. For example, high correlations between variables often raise concerns about multicollinearity. A large standard deviation relative to the mean of a variable can raise concerns about outliers. Indeed it is a good practice to check data ranges and outliers in the process of data analyses so as to avoid having significant findings mainly driven by a few outliers. Distributional properties of variables (such as means and minimum and maximum values) reported in a table are informative by themselves. For example, in a study on CEO succession, means of variables that measured different types of CEO successions can tell the distribution of new CEOs in the sample recruited from different sources. These distributional properties describe the phenomenon of CEO successions and have important practical implications.
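The descriptive table described above can be assembled with very little code. The following is a minimal sketch using only the Python standard library; the variable names, the made-up data, and the 0.7 correlation cutoff are assumptions chosen for illustration, not recommendations from the editorial.

```python
from math import sqrt
from statistics import mean, stdev


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = mean(x), mean(y)
    cross = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)


def describe(data, threshold=0.7):
    """Summarize each variable and flag highly correlated pairs.

    data: dict mapping variable name -> list of observations.
    threshold: illustrative cutoff for |r|, not a rule from the editorial.
    """
    names = list(data)
    summary = {
        name: {
            "mean": mean(values),
            "sd": stdev(values),  # sample standard deviation
            "min": min(values),
            "max": max(values),
        }
        for name, values in data.items()
    }
    # Lower triangle of the correlation matrix, keyed by (later, earlier) variable.
    corr = {
        (a, b): pearson(data[a], data[b])
        for i, a in enumerate(names)
        for b in names[:i]
    }
    flagged = [pair for pair, r in corr.items() if abs(r) > threshold]
    return summary, corr, flagged


# Made-up data, for illustration only.
sample = {
    "tenure": [1, 2, 3, 4, 5],
    "pay": [2, 4, 6, 8, 10],
    "size": [5, 3, 4, 1, 2],
}
summary, corr, flagged = describe(sample)
```

Reporting the minimum and maximum next to the mean and standard deviation makes it easier for a reader to spot implausible ranges, and the flagged pairs are candidates for a multicollinearity check.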

In reporting results, it is important to specify the unit of analysis, sample size, and dependent variable used in each model. This is especially crucial when such information varies across models. Take Arthaud-Day, Certo, Dalton, and Dalton (2006) as an example. These authors examined executive and director turnover following corporate financial restatements. They had four dependent variables: CEO turnover, CFO turnover, outside director turnover, and audit committee member turnover. In models of CEO and CFO turnover, because they were able to identify the month of the turnover, they constructed the data using “CEO/CFO” as the unit of analysis and used a Cox model to examine the timing of the executive turnover. The sample size of the model on CEO turnover was 485, and the sample size of the model on CFO turnover was 407. In comparison, in examining turnover of outside directors and audit committee members, because Arthaud-Day and her colleagues were unable to determine the month in which outside directors and audit committee members left office, they constructed the data using the director or audit committee member-year as the unit of analysis and used logistic regression to examine the likelihood of their turnover. The sample size of the model on outside director turnover was 2,668, and the sample size for audit committee member turnover was 1,327. The take-away here is that careful descriptions such as those Arthaud-Day and colleagues provided help readers calibrate their interpretations of results and prevent reviewers from raising questions of clarification.

Clarity

The purpose of a Results section is to answer the research questions that have been posed and provide empirical evidence for the hypotheses (or note that evidence is lacking). We often see, however, that authors do not relate their findings to the study’s hypotheses. We also see that authors report the results in the Results section, but discuss their linkage with hypotheses in the Discussion section or, conversely, begin to discuss the implications of the findings in the Results prematurely, rather than doing this in the Discussion. In these cases, the authors fail to describe what the results indicate with respect to the focal topic of the study in a clear manner. To avoid this problem, it helps to summarize each hypothesis before reporting the related results. Try this format: “Hypothesis X suggests that . . . We find that . . . in model . . . in Table . . . Thus, Hypothesis X is (or isn’t) supported.” Although this format may sound mechanical or even boring, it is a very effective way to clearly report results (see also Bem, 1987). We encourage and welcome authors to experiment with novel and clear ways to present results. We also suggest that authors report the results associated with their hypotheses in order, beginning with the first hypothesis and continuing sequentially to the last one, unless some compelling reasons suggest that a different order is better.

In many studies, the results do not support all the hypotheses. Yet results that are not statistically significant and those with signs opposite to prediction are just as important as those that are supported. However, as one editor noted, “If the results are contrary to expectations, I find authors will often try to ‘sweep them under the rug.’” Of course, reviewers will catch this immediately. Needless to say, sometimes such results reflect inadequate theorizing (e.g., the hypotheses are wrong, or at least there are alternative arguments and predictions). Other times, however, unsupported results are great fodder for new, critical thinking in a Discussion section. The point is that all results—significant or not, supporting or opposite to hypotheses— need to be addressed directly and clearly.

It is also a good practice to reference variables across sections in the same order—for example, describe their measures in the Methods section, list them in tables, and discuss results in the Results section all in the same order. Such consistency improves the clarity of exposition and helps readers to both follow the manuscript and find information easily. It also provides authors with a checklist so that they will remember to include relevant information (e.g., a variable included in the models is not mentioned in the Methods section and/or in the correlation matrix).
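The checklist idea in the preceding paragraph can be made mechanical. The sketch below is one hypothetical way to do it: the function name `missing_variables` and the three variable lists are assumptions, and the authors would maintain those lists by hand from their own manuscript.

```python
def missing_variables(model_vars, methods_vars, table_vars):
    """Return model variables absent from the Methods text or the correlation table."""
    model = set(model_vars)
    return {
        "not_in_methods": sorted(model - set(methods_vars)),
        "not_in_table": sorted(model - set(table_vars)),
    }


# Hypothetical variable lists, for illustration only.
report = missing_variables(
    model_vars=["firm_size", "ceo_tenure", "restatement"],
    methods_vars=["firm_size", "restatement"],
    table_vars=["firm_size", "ceo_tenure", "restatement"],
)
```

Here `report["not_in_methods"]` would list `ceo_tenure`, the kind of omission reviewers tend to ask about.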

Credibility

Although every part of a paper plays an important role in helping or hurting its credibility (e.g., adequate theorizing and rigorous research design), there are some things authors can do in their Results sections to enhance the perceived credibility of findings. First, it is crucial to demonstrate to readers why one’s interpretations of results are correct. For example, a negative coefficient for an interaction term may suggest that the positive effect of the predictor became weaker, or disappeared, or even became negative as the value of the moderator increased. Plotting a significant interaction effect helps one visualize the finding and thus demonstrate whether the finding is consistent with the intended hypothesis. Aiken and West (1991) provided some “golden rules” on how to plot interaction effects in regressions. Beyond these, determining whether the simple slopes are statistically significant is often important in assessing whether one’s results fully support hypotheses; techniques developed by Preacher, Curran, and Bauer (2006) are helpful in these calculations.
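To make the simple-slope idea concrete, here is a minimal sketch of the standard calculation for a two-way interaction, assuming a fitted model of the form y = b0 + b_x·x + b_z·z + b_xz·x·z. The function name and the numbers in the usage note are hypothetical; the variances and covariance would come from the coefficient covariance matrix of one’s own regression output.

```python
from math import sqrt


def simple_slope(b_x, b_xz, var_x, var_xz, cov_x_xz, z):
    """Simple slope of x at moderator value z, with its standard error.

    Assumes the fitted model y = b0 + b_x*x + b_z*z + b_xz*x*z.
    var_x, var_xz, cov_x_xz come from the coefficient covariance matrix.
    Returns (slope, standard_error, t_ratio).
    """
    slope = b_x + b_xz * z
    se = sqrt(var_x + 2 * z * cov_x_xz + z ** 2 * var_xz)
    return slope, se, slope / se
```

A common convention is to evaluate the slope at one standard deviation below and above the moderator’s mean. For instance, with a mean-centered moderator whose standard deviation is 1, `simple_slope(0.5, -0.2, 0.01, 0.0025, 0.0, 1.0)` and the same call with `z=-1.0` give the two slopes to plot, and each t ratio is compared against a t distribution with the model’s residual degrees of freedom.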

Second, if alternative measurements, methods, and/or model specifications could be used for a study, but authors only report results using one possible choice, readers may have the impression that the authors “cherry-picked” findings that were consistent with the hypotheses. Supplementary analyses and robustness checks can address these concerns. For example, Tsai and Ghoshal (1998) examined the value creation role of a business unit’s position in intrafirm networks. Although they proposed the hypotheses at the individual business unit level, they generated several measures of business units’ attributes from data at the dyadic level. These steps raised some concerns about level of analysis and the reliability of the results. To address these concerns, they also analyzed data at the dyadic level and obtained consistent results.

Third, even if a result is statistically significant, readers may still ask, So what? A statistically significant effect is not necessarily a practically important effect. Authors typically discuss the practical implications of a study in their Discussion; they can, however, conduct and report additional analyses in Results to demonstrate the practical relevance of findings. A good example is found in Barnett and King’s (2008) study of spillover harm. These authors stated the following Hypothesis 1: “An error at one firm harms other firms in the same industry” (Barnett & King, 2008: 1,153). In addition to reporting the statistical significance of the predictor, the authors provided information to communicate the average scale of such spillovers. They reported that “following an accident that injured an average number of employees (3.5), a chemical firm with operations in the same industry as that in which an accident occurred could expect to lose 0.15 percent of its stock price” and that “after an accident that caused the death of an employee, the firm could expect to lose an additional 0.83 percent” (Barnett & King, 2008: 1,160). In other cases, authors may want to discuss the implications of small effect sizes, perhaps by noting how difficult it is to explain variance in a given dependent variable or, in the case of an experiment, by noting that a significant effect was found even though the manipulation of the independent variable was quite minimal (Prentice & Miller, 1992).

Conclusions

Crafting Methods and Results sections may not sound exciting or challenging. As a result, authors tend to pay less attention in writing them. Sometimes these sections are delegated to the junior members of research teams. However, in our experience as editors, we find that these sections often play a major, if not a critical, role in reviewers’ evaluations of a manuscript. We urge authors to take greater care in crafting these sections. The three-C rule—completeness, clarity, and credibility—is one recipe to follow in that regard.

REFERENCES

Aiken, L. S., & West, S. G. 1991. Multiple regression: Testing and interpreting interactions. Newbury Park, CA: Sage.

Arthaud-Day, M. L., Certo, S. T., Dalton, C. M., & Dalton, D. R. 2006. A changing of the guard: Executive and director turnover following corporate financial restatements. Academy of Management Journal, 49: 1119–1136.

Barnett, M. L., & King, A. A. 2008. Good fences make good neighbors: A longitudinal analysis of an industry self-regulatory institution. Academy of Management Journal, 51: 1150–1170.

Bem, D. J. 1987. Writing the empirical journal article. In M. P. Zanna & J. M. Darley (Eds.), The compleat academic: A practical guide for the beginning social scientist: 171–201. New York: Random House.

Bommer, W. H., Dierdorff, E. C., & Rubin, R. S. 2007. Does prevalence mitigate relevance? The moderating effect of group-level OCB on employee performance. Academy of Management Journal, 50: 1481–1494.

Ferrier, W. J. 2001. Navigating the competitive landscape: The drivers and consequences of competitive aggressiveness. Academy of Management Journal, 44: 858–877.

Lee, T. H., Gerhart, B., Weller, I., & Trevor, C. O. 2008. Understanding voluntary turnover: Path-specific job satisfaction effects and the importance of unsolicited job offers. Academy of Management Journal, 51: 651–671.

Preacher, K. J., Curran, P. J., & Bauer, D. J. 2006. Computational tools for probing interaction effects in multiple linear regression, multilevel modeling, and latent curve analysis. Journal of Educational and Behavioral Statistics, 31: 437–448.

Prentice, D. A., & Miller, D. T. 1992. When small effects are impressive. Psychological Bulletin, 112: 160–164.

Tangirala, S., & Ramanujam, R. 2008. Exploring nonlinearity in employee voice: The effects of personal control and organizational identification. Academy of Management Journal, 51: 1189–1203.

Tsai, W., & Ghoshal, S. 1998. Social capital and value creation: The role of intrafirm networks. Academy of Management Journal, 41: 464–474.

Yan (Anthea) Zhang

Rice University

Jason D. Shaw

University of Minnesota