Quickly import and export data from R with datapasta

EDIT: As usual, Mara Averick was way ahead of me on these two packages 🙂 She wrote a great example (with much cleaner .gifs), so be sure to check out her post too!

https://maraaverick.rbind.io/2018/10/reprex-with-datapasta/
Follow her on Twitter too: https://twitter.com/dataandme

The datapasta package is great for importing and exporting data. Its copy + paste ability transforms just about any rectangular data you can drag your mouse over into data.frames or tibbles.

Using datapasta

Check out the data in this table on male and female heights and weights. I can highlight it all and press cmd + c (or ctrl + c).

Now I can head back to RStudio and enter the following in a fresh .R script.

library(datapasta)
library(tidyverse)
datapasta::tribble_paste()

You should see something like this:

ahhh pure bliss!

Pretty slick, huh? Unfortunately, when I try to run this code, I get the following error:

Error in list2(...) : object 'NANA' not found

It looks like the NANA values are throwing the tribble_paste() function off. No worries, datapasta also has a df_paste() function!

Can this get any better?

Now I can also go through and edit the data.frame() function a bit to remove the pesky NA columns and values.
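If the pasted table is too big to tidy by hand, the same cleanup can be done in code. Here is a minimal sketch in base R, assuming a small made-up data frame (pasted, with invented values) standing in for whatever df_paste() gave you. It drops any column or row that is entirely NA.

```r
# Hypothetical stand-in for df_paste() output (names and values are made up)
pasted <- data.frame(
  country = c("A", "B", NA),
  empty   = c(NA, NA, NA),
  bmi     = c(25.0, 22.2, NA),
  stringsAsFactors = FALSE
)

# Keep only columns that contain at least one non-missing value
keep_cols <- colSums(!is.na(pasted)) > 0
cleaned <- pasted[, keep_cols, drop = FALSE]

# Keep only rows that contain at least one non-missing value
keep_rows <- rowSums(!is.na(cleaned)) > 0
cleaned <- cleaned[keep_rows, , drop = FALSE]

cleaned
```

This only removes columns and rows that are completely empty, so a stray NA inside an otherwise good row is left alone for you to inspect.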

What if my dataset is really big?

Maybe you have a huge dataset you want to import into R, but you’re not sure if datapasta can handle it? All it takes is a quick call to the datapasta::dp_set_max_rows() function to raise the row limit.

For example, if I wanted to copy + paste this table into an RStudio session, I could enter datapasta::dp_set_max_rows(num_rows = 15000) in my .R script just above the tribble_paste() function.

datapasta::dp_set_max_rows(num_rows = 15000)
datapasta::tribble_paste()
it’s slow, but it gets there…

As you can see, tribble_paste() parsed this table into a tibble::tribble() function (and it was over 14000 rows!).

But wait…there’s more!

What if I found a problem with the table I just pasta’d into R (or another data set in R)? I don’t know about you, but I’m constantly needing to ask questions or share code on Stack Overflow or RStudio Community.

Well, the handy-dandy dpasta() function helps me create excellent reproducible examples.

Let’s assume I needed to share a sample from the height and weight data I just imported (WordHtWt).

  • First I’ll add some meaningful names to the columns in WordHtWt (using magrittr::set_names()),
  • Then take a small sample with help from dplyr (it’s a good idea to always use the smallest possible data frame to re-create the problem),
  • And…
# new names
world_height_weight_names <- c("country", "male_avg_ht_m",
                               "male_avg_wt_kg", "male_bmi",
                               "female_avg_ht_m", "female_avg_wt_kg",
                               "female_bmi")
# clean and set names
WordHtWt %>%
    # set some better names
    magrittr::set_names(world_height_weight_names) %>%
    # get a sample for reprex
    dplyr::sample_frac(size = 0.10) %>%
    # PASTA!!!
    dpasta()
Voila!

Now I have a nice bit of code I can post (and hopefully get my questions answered).
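One thing to keep in mind: the sampling step is random, so running it twice gives different rows. Setting a seed first makes the sample repeatable. The sketch below uses base R's sample() on a made-up data frame (ht_wt is a hypothetical stand-in, not the real table); dplyr's sampling functions draw from the same random number generator, so calling set.seed() before them should have the same effect.

```r
# Hypothetical stand-in for the imported data (values are made up)
ht_wt <- data.frame(
  country  = paste0("country_", 1:20),
  male_bmi = seq(21, 30.5, by = 0.5),
  stringsAsFactors = FALSE
)

# Number of rows in a 10% sample
n_sample <- round(0.10 * nrow(ht_wt))

# Fix the random seed so the same rows are drawn every time
set.seed(2018)
rows_a <- sample(nrow(ht_wt), size = n_sample)

# Resetting the same seed reproduces the same draw
set.seed(2018)
rows_b <- sample(nrow(ht_wt), size = n_sample)

identical(rows_a, rows_b)
ht_wt[rows_a, ]
```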

Friends and alternatives to datapasta

datapasta plays well with the reprex package. If you aren’t sure what reprex does, you should watch the webinar from Jenny Bryan (the package author). If you are looking for the base R alternative to dpasta(), there’s dput(), but the output is not as clean (and it doesn’t have a direct analog to the _paste() functions).

# clean and set names
WordHtWt %>%
    # set some better names
    magrittr::set_names(world_height_weight_names) %>%
    # get a sample for reprex
    dplyr::sample_frac(size = 0.10) %>%
    # try this with dput()
    dput()

structure(list(country = c("Taiwan", "Burma", "Kazakhstan", "Bolivia",
"Belgium", "Mali", "Mauritius", "Laos", "Burundi", "France",
"Mexico", "Nigeria", "Turkey"), male_avg_ht_m = c("1.73 m", "1.65 m",
"1.72 m", "1.67 m", "1.81 m", "1.72 m", "1.71 m", "1.60 m", "1.68 m",
"1.79 m", "1.68 m", "1.67 m", "1.74 m"), male_avg_wt_kg = c("74.8 kg",
"60.4 kg", "77.8 kg", "70.6 kg", "87.8 kg", "67.7 kg", "71.9 kg",
"57.9 kg", "61.5 kg", "83.3 kg", "77.6 kg", "63.0 kg", "82.4 kg"
), male_bmi = c(25, 22.2, 26.3, 25.3, 26.8, 22.9, 24.6, 22.6,
21.8, 26, 27.5, 22.6, 27.2), female_avg_ht_m = c("1.60 m", "1.54 m",
"1.60 m", "1.53 m", "1.65 m", "1.61 m", "1.57 m", "1.51 m", "1.55 m",
"1.65 m", "1.56 m", "1.58 m", "1.60 m"), female_avg_wt_kg = c("60.7 kg",
"54.5 kg", "68.1 kg", "64.8 kg", "70.0 kg", "59.9 kg", "64.1 kg",
"52.4 kg", "51.7 kg", "66.4 kg", "69.4 kg", "59.9 kg", "73.7 kg"
), female_bmi = c(23.7, 23, 26.6, 27.7, 25.7, 23.1, 26, 23, 21.5,
24.4, 28.5, 24, 28.8)), row.names = c(NA, -13L), class = "data.frame")
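It isn't pretty, but the dput() output is valid R code, and that is what makes it usable in a reproducible example. Here is a small base R sketch of the round trip, using a three-row data frame (sample_df is a hypothetical name; the values are copied from the output above).

```r
# Small sample; values taken from the dput() output above
sample_df <- data.frame(
  country  = c("Taiwan", "Burma", "Kazakhstan"),
  male_bmi = c(25, 22.2, 26.3),
  stringsAsFactors = FALSE
)

# Capture the text that dput() would print to the console
dput_text <- capture.output(dput(sample_df))

# Evaluate that text to rebuild the object
rebuilt <- eval(parse(text = dput_text))

# Compare the rebuilt object with the original
all.equal(rebuilt, sample_df)
```

Anyone who pastes the structure(...) call into their own session gets the same object back, which is the whole point of sharing it.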

Additional resources

Check out the datapasta vignette, the tidyverse packages, and an excellent description of how to write a reproducible example from Advanced R by Hadley Wickham. Be sure to thank the datapasta author Miles McBain for all the future headaches he just saved you from.