Abstract
Machine learning (ML) makes it possible to analyze large volumes of data and is an important tool in biomedical research. The use of ML methods can lead to improvements in the diagnosis, treatment, and prevention of diseases. During the COVID pandemic, ML methods were used for predictions at the patient and community levels. Given the ubiquity of ML, it is important that future doctors, researchers, and teachers become acquainted with ML and its contributions to research. Our goal is to make it easier for students and their professors to learn about ML. The learning module we present here is based on a small but relevant COVID dataset, videos, annotated code, and the use of cloud computing platforms. The benefit of cloud computing platforms is that students don’t have to set up a coding environment on their own computers. This saves time and is also an important democratization factor, because it allows students to use old or borrowed computers (e.g., from a library), tablets, or Chromebooks. This should in turn benefit colleges that serve underserved populations and have limited computing infrastructure. We developed a beginner-friendly module focused on learning the basics of decision trees by applying them to COVID tabular data. It introduces students to the basic terminology used in supervised ML and its relevance to research. The module includes two Python notebooks with pre-written code, one with practice exercises and another with the solutions. Our experience with biology students at San Francisco State University suggests that the material increases interest in ML.
Publisher
Cold Spring Harbor Laboratory