This book provides a practical, application-driven guide to using R for public health and health data science. It is tailored to readers in health-related fields and is accessible both to beginners and to those with some coding experience. Each module starts with data as the driver of the analysis. It then introduces the programming concepts needed to carry out that analysis and breaks them down step by step. Going beyond simple R examples, the book develops programming principles and skills that readers can apply to their own research. Practical case studies in public health are provided throughout to reinforce learning.
Topics include data structures in R, exploratory analysis, distributions, hypothesis testing, regression analysis, and larger-scale programming with functions and control flow. The presentation focuses on implementation in R and assumes that readers have had an introduction to probability, statistical inference, and regression analysis.
Key features:
· Includes practical case studies.
· Explains how to write larger programs.
· Covers writing reports in Quarto.
Alice Paul is an Assistant Professor of Biostatistics and Teaching Scholar. She holds a Ph.D. in Operations Research from Cornell University. She has six years of teaching experience at the undergraduate, master’s, and Ph.D. levels, and she has taught students in diverse fields, including biostatistics, engineering, computer science, and data science, at Brown University and at Olin College of Engineering.
Contents:
Preface
Acknowledgments
I Introduction to R
1 Getting Started with R
  1.1 Why R?
    1.1.1 Installation of R and RStudio
  1.2 The R Console
    1.2.1 Basic Computations and Objects
    1.2.2 Naming Conventions
  1.3 RStudio and Quarto
    1.3.1 Panes
    1.3.2 Calling Functions
    1.3.3 Working Directories and Paths
    1.3.4 Installing and Loading Packages
  1.4 RStudio Projects and RStudio Global Options
  1.5 Tips and Reminders
2 Data Structures in R
  2.1 Data Types
  2.2 Vectors
    2.2.1 Indexing a Vector
    2.2.2 Modifying a Vector and Calculations
    2.2.3 Practice Question
    2.2.4 Common Vector Functions
  2.3 Factors
  2.4 Matrices
    2.4.1 Indexing a Matrix
    2.4.2 Modifying a Matrix
    2.4.3 Practice Question
  2.5 Data Frames
    2.5.1 Indexing a Data Frame
    2.5.2 Modifying a Data Frame
    2.5.3 Practice Question
  2.6 Lists
  2.7 Exercises
3 Working with Data Files in R
  3.1 Importing and Exporting Data
  3.2 Summarizing and Creating Data Columns
    3.2.1 Column Summaries
    3.2.2 Practice Question
    3.2.3 Other Summary Functions
    3.2.4 Practice Question
    3.2.5 Missing, Infinite, and NaN Values
    3.2.6 Dates in R
  3.3 Using Logic to Subset, Summarize, and Transform
    3.3.1 Practice Question
    3.3.2 Other Selection Functions
  3.4 Exercises
II Exploratory Analysis
4 Intro to Exploratory Data Analysis
  4.1 Univariate Distributions
    4.1.1 Practice Question
  4.2 Bivariate Distributions
    4.2.1 Practice Question
  4.3 Autogenerated Plots
  4.4 Tables
  4.5 Exercises
5 Data Transformations and Summaries
  5.1 Tibbles and Data Frames
  5.2 Subsetting Data
    5.2.1 Practice Question
  5.3 Updating Rows and Columns
    5.3.1 Practice Question
  5.4 Summarizing and Grouping
    5.4.1 Practice Question
  5.5 Exercises
6 Case Study: Cleaning Tuberculosis Screening Data
7 Merging and Reshaping Data
  7.1 Tidy Data
  7.2 Reshaping Data
    7.2.1 Practice Question
  7.3 Merging Data with Joins
    7.3.1 Practice Question
  7.4 Exercises
8 Visualization with ggplot2
  8.1 Intro to ggplot
    8.1.1 Practice Question
  8.2 Adjusting the Axes and Aesthetics
  8.3 Adding Groups
    8.3.1 Practice Question
  8.4 Extra Options
  8.5 Exercises
9 Case Study: Exploring Early COVID-19 Data
  9.1 Pre-processing
  9.2 Mobility and Cases Over Time
III Distributions and Hypothesis Testing
10 Probability Distributions in R
  10.1 Probability Distributions in R
    10.1.1 Random Samples
    10.1.2 Density Function
    10.1.3 Cumulative Distribution
    10.1.4 Quantile Distribution
    10.1.5 Reference List for Probability Distributions
    10.1.6 Practice Question
  10.2 Empirical Distributions and Sampling Data
    10.2.1 Practice Question
  10.3 Exercises
11 Hypothesis Testing
  11.1 Univariate Distributions and One-Sample Tests
    11.1.1 Practice Question
  11.2 Correlation and Covariance
  11.3 Two-Sample Tests for Continuous Variables
    11.3.1 Practice Question
    11.3.2 Two-Sample Variance Tests
  11.4 Two-Sample Tests for Categorical Variables
    11.4.1 Practice Question
  11.5 Adding Hypothesis Tests to Summary Tables
  11.6 Exercises
12 Case Study: Analyzing Blood Lead Level and Hypertension
IV Regression
13 Linear Regression
  13.1 Simple Linear Regression
    13.1.1 Practice Question
  13.2 Multiple Linear Regression
  13.3 Diagnostic Plots and Measures
    13.3.1 Normality
    13.3.2 Homoscedasticity, Linearity, and Collinearity
    13.3.3 Practice Question
    13.3.4 Leverage and Influence
  13.4 Interactions and Transformations
    13.4.1 Practice Question
  13.5 Evaluation Metrics
  13.6 Stepwise Selection
  13.7 Exercises
14 Logistic Regression
  14.1 Generalized Linear Models in R
    14.1.1 Practice Question
  14.2 Residuals, Discrimination, and Calibration
    14.2.1 Receiver Operating Characteristic (ROC) Curve
    14.2.2 Calibration Plot
    14.2.3 Practice Question
  14.3 Variable Selection and Likelihood Ratio Tests
  14.4 Extending Beyond Binary Outcomes
  14.5 Exercises
15 Model Selection
  15.1 Regularized Regression
  15.2 Elastic Net
  15.3 Best Subset
  15.4 Exercises
16 Case Study: Predicting Tuberculosis Risk
  16.1 Model Selection
  16.2 Evaluate Model on Validation Data
V Writing Larger Programs
17 Logic and Loops
  17.1 Logic and Conditional Expressions
    17.1.1 Practice Question
  17.2 Loops
    17.2.1 Practice Question
  17.3 Avoiding Control Flows with Functions
  17.4 Exercises
18 Functions
  18.1 Components of a Function
    18.1.1 Arguments
    18.1.2 Practice Question
    18.1.3 Return Values
    18.1.4 Scope of Objects
    18.1.5 Functions within Functions and Returning Functions
  18.2 Documenting Functions
    18.2.1 Practice Question
  18.3 Debugging and Testing
    18.3.1 Unit tests
    18.3.2 Practice Question
  18.4 Exercises
19 Case Study: Designing a Simulation Study
  19.1 Outlining Our Approach
  19.2 Coding Our Simulation Study
  19.3 Results
20 Writing Efficient Code
  20.1 Use Fast and Vectorized Functions
    20.1.1 Practice Question
  20.2 Avoid Copies and Duplicates
    20.2.1 Practice Question
  20.3 Parallel Programming
  20.4 Exercises
21 Expanding your R Skills
  21.1 Reading Documentation for New Packages
  21.2 Trying Simple Examples
  21.3 Deciphering Error Messages and Warnings
    21.3.1 Debugging Code
  21.4 General Programming Tips
  21.5 Exercises
22 Writing Reports in Quarto
  22.1 Starting a Quarto file
    22.1.1 Adding Code Chunks
    22.1.2 Customizing Chunks
  22.2 Formatting Text in Markdown
  22.3 Formatting Figures and Tables
    22.3.1 Using References
  22.4 Adding in Equations
  22.5 Exercises
References
Weight: 453.00