Exploratory data analysis

Motivation: In my graduate program we had an excellent text on Regression Methods in Biostatistics (which I will refer to as VGSM). Unfortunately, all of the examples and code from the courses, labs, and text were in Stata. As an R user, I kept having to translate the topics from Stata into R. But this […]

Scraping Wikipedia tables

I recently ran into an issue in which I needed to create a long regular expression to match medications (antibiotics, specifically) in a giant medications table from an electronic health record (EHR). I wasn’t given a list of antibiotics to look for, but I did know this would end up as a binary (yes/no) variable […]

Reproduce Tables and Figures (3/3)

Martin Frigaard, 11/15/2017. This is the third of three posts for setting up a data project. The first post dealt with creating project folders and downloading files and data from the Internet. The second post went over a bit of data wrangling. In this post I will attempt to recreate some of the table counts […]

Reading and wrangling data (2/3)

Martin Frigaard, 11/15/2017. In my last post I set up my project folders and downloaded the data sets for the article, “Contagion in Mass Killings and School Shootings”. Reading data into RStudio: I am going to read the data into the RStudio environment. I recommend using read_csv for plain text files (this function is from […]

Setting Up Data Project Folders (1/3)

Martin Frigaard, 11/15/2017. I re-wrote and published these after reading David Robinson’s excellent post on varianceexplained.org. Check him out on Twitter and take his new tidyverse course on DataCamp. File and folder organization are topics I was never explicitly taught, and I think it’s tragic. Organizing your project files can help you think through […]

Tools & Resources for Learning Stata

The Resources for learning Stata page has most of the sites I describe below. Unfortunately, their list is also riddled with link rot, and many of the resources use ancient versions/commands. I’ve only included the active sites that I’ve actually used for analysis/projects. *I’ll continue updating as I find more resources. My first post on this […]

Tools & Resources for Learning R

R is quickly becoming one of the most popular statistical software choices for data science, so it’s probably a good idea for anyone interested in analysis/statistics to become familiar with its commands/interface. It might seem a bit daunting at first, but R has so many resources available for free it seems like the only things […]