Hypothesis Formalization: Empirical Findings, Software Limitations, and Design Implications

Author:

Jun Eunice (1), Birchfield Melissa (1), De Moura Nicole (2), Heer Jeffrey (1), Just René (1)

Affiliation:

1. University of Washington, Seattle, WA

2. Eastlake High School, Seattle, WA

Abstract

Data analysis requires translating higher-level questions and hypotheses into computable statistical models. We present a mixed-methods study aimed at identifying the steps, considerations, and challenges involved in operationalizing hypotheses into statistical models, a process we refer to as hypothesis formalization. In a formative content analysis of 50 research papers, we find that researchers highlight three key steps: decomposing a hypothesis into sub-hypotheses, selecting proxy variables, and formulating statistical models based on data collection design. In a lab study, we find that analysts fixated on implementation and shaped their analyses to fit familiar approaches, even when those approaches were sub-optimal. In an analysis of software tools, we find that tools provide inconsistent, low-level abstractions that may limit the statistical models analysts use to formalize hypotheses. Based on these observations, we characterize hypothesis formalization as a dual-search process that balances conceptual and statistical considerations, constrained by data and computation, and we discuss implications for future tools.

Publisher

Association for Computing Machinery (ACM)

Subject

Human-Computer Interaction

Cited by 5 articles.

1. Technology transfer for sustainable rural development: evidence from homestead withdrawal with compensation in Chengdu–Chongqing;The Journal of Technology Transfer;2023-10-03

2. Model Sketching: Centering Concepts in Early-Stage Machine Learning Model Design;Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems;2023-04-19

3. Empowering domain experts to author valid statistical analyses;The Adjunct Publication of the 35th Annual ACM Symposium on User Interface Software and Technology;2022-10-28

4. Tisane: Authoring Statistical Models via Formal Reasoning from Conceptual and Data Relationships;CHI Conference on Human Factors in Computing Systems;2022-04-29

5. Apéritif: Scaffolding Preregistrations to Automatically Generate Analysis Code and Methods Descriptions;CHI Conference on Human Factors in Computing Systems;2022-04-29
