As you can see, tribble_paste() parsed this table into a tibble::tribble() function (and it was over 14000 rows!).
But wait…there’s more!
What if I found a problem with the table I just pasta’d into R (or another data set in R)? I don’t know about you, but I’m constantly needing to ask questions or share code on Stack Overflow or RStudio Community.
Well, the handy-dandy dpasta() function helps me create excellent reproducible examples.
Let’s assume I needed to share a sample from the height and weight data I just imported (WordHtWt).
First, I’ll add some meaningful names to the columns in WordHtWt (using magrittr::set_names()).
Then I’ll take a small sample with help from dplyr (it’s a good idea to always use the smallest possible data frame to re-create the problem):
# new names
world_height_weight_names <- c("country", "male_avg_ht_m", "male_avg_wt_kg",
                               "male_bmi", "female_avg_ht_m",
                               "female_avg_wt_kg", "female_bmi")
# clean and set names
WordHtWt %>%
  # set some better names
  magrittr::set_names(world_height_weight_names) %>%
  # get a sample for reprex
  dplyr::sample_frac(size = 0.10) %>%
  # PASTA!!!
  dpasta()
Now I have a nice bit of code I can post (and hopefully get my questions answered).
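Because the sample is drawn at random, a reader who reruns the code may get different rows. Below is a minimal base R sketch of the same rename-then-sample steps with a fixed seed. It assumes a hypothetical stand-in table (raw_table) filled with placeholder numbers, not the real height and weight data, and it swaps in setNames() and sample() for the magrittr and dplyr calls.

```r
# Hypothetical stand-in for the imported table: 20 rows, 7 columns.
# The values are placeholders, not real height or weight data.
raw_table <- as.data.frame(matrix(seq_len(140), nrow = 20, ncol = 7))

world_height_weight_names <- c("country", "male_avg_ht_m", "male_avg_wt_kg",
                               "male_bmi", "female_avg_ht_m",
                               "female_avg_wt_kg", "female_bmi")

# Base R counterpart of magrittr::set_names()
named_table <- setNames(raw_table, world_height_weight_names)

# Fix the seed so the same rows are drawn on every run
set.seed(2018)
n_sample <- round(0.10 * nrow(named_table))
sample_rows <- sample(nrow(named_table), size = n_sample)
reprex_sample <- named_table[sample_rows, ]

print(reprex_sample)
```

Setting the seed before sampling means anyone who copies the example draws the same rows, which makes the question easier to answer.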
Friends and alternatives to datapasta
datapasta plays well with the reprex package. If you aren’t sure what reprex does, you should watch the webinar from Jenny Bryan (the package author). If you are looking for the base R alternative to dpasta(), there’s dput(), but the output is not as clean (and it doesn’t have a direct analog to the _paste() functions).
# clean and set names
WordHtWt %>%
  # set some better names
  magrittr::set_names(world_height_weight_names) %>%
  # get a sample for reprex
  dplyr::sample_frac(size = 0.10) %>%
  # try this with dput()
  dput()
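To show what a reader does with dput() output, here is a small base R sketch of the round trip. The data frame (sample_df) and its values are hypothetical placeholders, and capture.output() is used here only so the printed text can be fed straight back into R within one script.

```r
# Hypothetical two-row data frame; the values are placeholders only.
sample_df <- data.frame(
  country = c("Country A", "Country B"),
  male_avg_ht_m = c(1.70, 1.80),
  stringsAsFactors = FALSE
)

# Capture what dput() would print, as a single string
dput_text <- paste(capture.output(dput(sample_df)), collapse = "\n")
cat(dput_text, "\n")

# Someone reading the post can paste that text back into R
# to rebuild the object
rebuilt_df <- eval(parse(text = dput_text))

# Check that the rebuilt object matches the original
print(identical(sample_df, rebuilt_df))
```

The printed structure() call is what you would paste into a question. It is complete, but harder to read than the tribble that dpasta() produces.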
Check out the vignette for datapasta here, the tidyverse packages, and an excellent description of how to write a reproducible example from Advanced R by Hadley Wickham. Be sure to thank the datapasta author Miles McBain for all the future headaches he just saved you from.
This quick tutorial covers how to set up and query a MySQL database from the command line (Terminal) on macOS Sierra.
What is MySQL?
The SQL in MySQL stands for structured query language. There are half a dozen flavors of SQL, and MySQL is one of the most common. The My comes from the name of co-founder Michael Widenius’s daughter (fun fact: another flavor of SQL, MariaDB, is named after his younger daughter).
MySQL is an open-source relational database management system. Read more about MySQL on Wikipedia. Or check out the reference manual here.
Download and install the community edition of MySQL. You will be asked to create an account, but you can opt out and just click on “No thanks, just start my download.”
After downloading the dmg, you will be guided through the installation steps. On the Configuration options, I chose Use Strong Password Encryption, and on the next window I entered a password and checked the box for Start MySQL Server once the installation is complete.
Alternatively, run brew install mysql if you have Homebrew installed.
After the install finishes, you should see the MySQL icon in System Preferences.
MySQL Workbench (optional)
Download and install the Workbench if you want to use an IDE for querying MySQL (I prefer DataGrip). You should read this documentation on the Workbench.
Install database drivers (using Homebrew)
In a future post, I will be using RStudio to query a database using the RMySQL and RMariaDB packages. Follow these instructions for installing the database drivers on your Mac.
You will be prompted for the password you used to set up MySQL; enter it into the Terminal. You should see this:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 12
Server version: 8.0.13 MySQL Community Server - GPL
Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
The MySQL command line is below:
After installing MySQL community edition, you can choose to either run commands from the terminal or within a .sql script in the workbench. Below I demonstrate using MySQL from the command line.
Using MySQL commands in Terminal
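To see the users and their passwords, enter a query against the mysql.user table at the MySQL prompt. For example, SELECT User, authentication_string FROM mysql.user; returns output like the table below. The authentication_string column holds the passwords (stored as hashes, not as plain text).
NOTE: a semicolon is needed at the end of each MySQL command.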
| User | authentication_string |
| mysql.infoschema | $A$005$THISISACOMBINATIONOFINVALIDSALTANDPASSWORDTHATMUSTNEVERBRBEUSED |
| mysql.session | $A$005$THISISACOMBINATIONOFINVALIDSALTANDPASSWORDTHATMUSTNEVERBRBEUSED |
| mysql.sys | $A$005$THISISACOMBINATIONOFINVALIDSALTANDPASSWORDTHATMUSTNEVERBRBEUSED |
| root | *D932DC725A9210F3B4C903D69F88EDC3AD447A06 |
4 rows in set (0.00 sec)
The MySQL commands are working! Let’s build a database!
Building a MySQL database
The lahman2016 database is freely available and full of information on baseball players and teams from 1871 through 2016. You can download it here.
After downloading the zipped database into a local data folder, I find the following files.
I recently purchased and read Ty Cobb: A Terrible Beauty by Charles Leerhsen. I decided to get this book after watching Leerhsen’s lecture at Hillsdale College. I’d always thought of Ty Cobb as the racist curmudgeon portrayed by Tommy Lee Jones in the 1994 film Cobb. Even before seeing this film, Ty Cobb’s reputation for being rotten was pervasive. When referring to him in Field of Dreams, Shoeless Joe Jackson states, “No one liked that son of a bitch.”
Unfortunately for Cobb (and anyone interested in the truth), these portrayals of the baseball great’s life and character are highly fictionalized. Most of the popular opinions of Ty Cobb come from two biographies: Charles C. Alexander’s Ty Cobb, and Al Stump’s Cobb. These authors construct a narrative that depicts Cobb as a drunk, belligerent bully who used to sharpen his cleats and scream racial epithets at his hired help.
Leerhsen does a fantastic job addressing how these stories are more likely to be based on fiction than facts, pushed by the authors to increase their book sales. After all, a baseball star who is a racist jerk will elicit a (well deserved) sense of outrage and disgust, thereby attracting more attention.
From the epilogue,
This Cobb was someone they could shake their head at, denounce, and feel superior to. Spinning stories in a way that made him look immoral was a convenient way to say, “I am not a racist because I reject this man who is.” Cultures change as values change, wars are waged and the harvest waxes and wanes, but a villain who inspires self-congratulation makes for one hell of a tenacious myth.
The tragedy of Ty Cobb’s narrative is that it buries the insightful baseball and general life lessons the man had to offer. Leerhsen distills Cobb’s philosophy on baseball into two words: pay attention. Cobb would spend endless hours mentally rehearsing the game, taking notes, and thinking up possible scenarios and plays. He also paid attention to the minds of his opponents.
Another example from Leerhsen,
After [Cobb] noticed how upset the good-hearted Big Train got when he beaned batters, Cobb stood in against him as he did against nobody else, hunching over the plate and sticking his head into the strike zone. He could have gotten killed; instead, very often, he got walked.
Anyone who has read Moneyball and knows the importance of walks and on-base percentage sees the genius of Ty Cobb at work here.
The takeaway lesson I have from this book isn’t actually from the book. It comes from a woman who stood and gave praise during the Q&A portion of Leerhsen’s lecture:
“…you’ve written a cautionary tale that in a complicated political season has a lesson for us…what happened to Ty Cobb could not happen today because everybody knows everything, but it does happen. So thank you for your courage in writing a book that reminds us that we don’t know everything until we really know somebody and that everything that we think we know we should re-examine several times with a clear conscience and our own integrity before we make those judgments. Thank you very much…”
We should have been more careful in how we used the data to help guide where to report out our stories on inadequate internet, and we were reminded of an important lesson: that just because a data set comes from reputable institutions doesn’t necessarily mean it’s reliable.
An article like this takes courage. In the era of ‘Fake News’ and ‘alternative facts’, it’s refreshing to see this kind of honesty from a media source that relies so much on evidence-based reporting. I imagine an article like this must’ve been painful to write, but I respect the authors more after reading it. That’s when I thought of the comment from Leerhsen’s lecture, and when I noticed how important it is to think about valuing integrity.
Wikipedia defines integrity in ethics as, “the honesty and truthfulness or accuracy of one’s actions.” I tend to think of it as, “doing what you know is right even when no one is looking.”