## Quickly import and export data from R with datapasta

EDIT: As usual, Mara Averick was way ahead of me on these two packages 🙂 She wrote a great example (with much cleaner .gifs), so be sure to check out her post too!

https://maraaverick.rbind.io/2018/10/reprex-with-datapasta/

Follow her on Twitter too: https://twitter.com/dataandme

The `datapasta` package is great for importing and exporting data. Its `copy` + `paste` ability transforms just about any rectangular data you can drag your mouse over into `data.frame`s or `tibble`s.

## Using datapasta

Check out the data in this table on male and female heights and weights. I can highlight it all and press `cmd` + `c` (or `ctrl` + `c`) to copy it.

Now I can head back to RStudio and enter the following in a fresh .R script.

```r
library(datapasta)
library(tidyverse)

datapasta::tribble_paste()
```

You should see a `tibble::tribble()` call built from the copied table.

Pretty slick, huh? Unfortunately, when I try to run this code, I get the following error:

```
Error in list2(...) : object 'NANA' not found
```

It looks like the `NANA` values are throwing the `tribble_paste()` function off. No worries, `datapasta` also has a `df_paste()` function!

Now I can go through the generated `data.frame()` call and edit out the pesky `NA` columns and values.
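Instead of deleting `NA` columns by hand, you could also drop them in base R after the `data.frame()` code runs. This is a minimal sketch, not part of `datapasta`: it assumes a small, hypothetical `heights` data frame standing in for the pasted table, and keeps only the columns and rows that contain at least one non-missing value.

```r
# Hypothetical stand-in for a pasted table: the "notes" column and the
# last row came through entirely as NA
heights <- data.frame(
  country = c("Taiwan", "Burma", NA),
  male_bmi = c(25, 22.2, NA),
  notes = c(NA, NA, NA)
)

# Keep only columns with at least one non-missing value
keep_cols <- colSums(!is.na(heights)) > 0
heights <- heights[, keep_cols, drop = FALSE]

# Keep only rows with at least one non-missing value
keep_rows <- rowSums(!is.na(heights)) > 0
heights <- heights[keep_rows, , drop = FALSE]

heights
```

Columns or rows that are only partly missing are left alone, so real gaps in the data stay visible.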

## What if my dataset is really big?

Maybe you have a **huge** dataset you want to import into R, but you’re not sure if `datapasta` can handle it? All it takes is a quick call to `datapasta::dp_set_max_rows()`.

For example, if I wanted to `copy` + `paste` this table into an RStudio session, I could enter `datapasta::dp_set_max_rows(num_rows = 15000)` in my .R script just above the `tribble_paste()` call.

```r
datapasta::dp_set_max_rows(num_rows = 15000)
datapasta::tribble_paste()
```

`tribble_paste()` parsed this table into a `tibble::tribble()` call, and it was over 14,000 rows!

## But wait…there’s more!

What if I found a problem with the table I just `pasta`’d into R (or with another data set in R)? I don’t know about you, but I’m *constantly* needing to ask questions or share code on Stack Overflow or RStudio Community.

Well, the handy-dandy `dpasta()` function helps me create excellent reproducible examples.

Let’s assume I needed to share a sample from the height and weight data I just imported (`WordHtWt`).

- First I’ll add some meaningful names to the columns in `WordHtWt` (using `magrittr::set_names()`),
- Then take a small sample with help from `dplyr` (it’s a good idea to always use the smallest possible data frame to re-create the problem),
- And…

```r
# new names
world_height_weight_names <- c("country", "male_avg_ht_m",
                               "male_avg_wt_kg", "male_bmi",
                               "female_avg_ht_m", "female_avg_wt_kg",
                               "female_bmi")

# clean and set names
WordHtWt %>%
  # set some better names
  magrittr::set_names(world_height_weight_names) %>%
  # get a sample for reprex
  dplyr::sample_frac(size = 0.10) %>%
  # PASTA!!!
  dpasta()
```

Now I have a nice bit of code I can post (and hopefully get my questions answered).
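One caveat: `dplyr::sample_frac()` draws a different random sample every time the pipeline runs. If you want to regenerate the exact same sample later, calling `set.seed()` before the pipeline should do it, since the sampling relies on R’s random number generator. The base R sketch below uses the built-in `mtcars` data as a stand-in for `WordHtWt` and checks that the same seed returns the same rows.

```r
# Draw a ~10% sample of rows; set.seed() makes the draw repeatable
set.seed(42)
n_rows <- nrow(mtcars)
sample_idx <- sample(n_rows, size = round(0.10 * n_rows))
mtcars_sample <- mtcars[sample_idx, ]

# Re-running with the same seed selects the same row indices
set.seed(42)
same_idx <- sample(n_rows, size = round(0.10 * n_rows))

identical(sample_idx, same_idx)
```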

## Friends and alternatives to datapasta

`datapasta` plays well with the `reprex` package. If you aren’t sure what `reprex` does, you should watch the webinar from Jenny Bryan (the package author). If you are looking for the base R alternative to `dpasta()`, there’s `dput()`, but the output is not as clean (and it doesn’t have a direct analog to the `_paste()` functions).

```r
# clean and set names
WordHtWt %>%
  # set some better names
  magrittr::set_names(world_height_weight_names) %>%
  # get a sample for reprex
  dplyr::sample_frac(size = 0.10) %>%
  # try this with dput()
  dput()
```

```r
structure(list(country = c("Taiwan", "Burma", "Kazakhstan", "Bolivia",
"Belgium", "Mali", "Mauritius", "Laos", "Burundi", "France",
"Mexico", "Nigeria", "Turkey"), male_avg_ht_m = c("1.73 m", "1.65 m",
"1.72 m", "1.67 m", "1.81 m", "1.72 m", "1.71 m", "1.60 m", "1.68 m",
"1.79 m", "1.68 m", "1.67 m", "1.74 m"), male_avg_wt_kg = c("74.8 kg",
"60.4 kg", "77.8 kg", "70.6 kg", "87.8 kg", "67.7 kg", "71.9 kg",
"57.9 kg", "61.5 kg", "83.3 kg", "77.6 kg", "63.0 kg", "82.4 kg"
), male_bmi = c(25, 22.2, 26.3, 25.3, 26.8, 22.9, 24.6, 22.6,
21.8, 26, 27.5, 22.6, 27.2), female_avg_ht_m = c("1.60 m", "1.54 m",
"1.60 m", "1.53 m", "1.65 m", "1.61 m", "1.57 m", "1.51 m", "1.55 m",
"1.65 m", "1.56 m", "1.58 m", "1.60 m"), female_avg_wt_kg = c("60.7 kg",
"54.5 kg", "68.1 kg", "64.8 kg", "70.0 kg", "59.9 kg", "64.1 kg",
"52.4 kg", "51.7 kg", "66.4 kg", "69.4 kg", "59.9 kg", "73.7 kg"
), female_bmi = c(23.7, 23, 26.6, 27.7, 25.7, 23.1, 26, 23, 21.5,
24.4, 28.5, 24, 28.8)), row.names = c(NA, -13L), class = "data.frame")
```
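Clunky as it looks, `dput()` output like this is valid R code: whoever pastes it into a session gets the same data frame back. Here is a small base R sketch of that round trip, using a hypothetical two-row `sample_df` built from a few values in the output above.

```r
# Hypothetical two-row stand-in for the sampled data
sample_df <- data.frame(
  country = c("Taiwan", "Burma"),
  male_bmi = c(25, 22.2),
  female_bmi = c(23.7, 23)
)

# dput() prints a structure() expression; capture it as a single string
dput_text <- paste(capture.output(dput(sample_df)), collapse = "\n")
cat(dput_text, "\n")

# Parsing and evaluating that text rebuilds the original object
rebuilt <- eval(parse(text = dput_text))
identical(rebuilt, sample_df)
```

This round trip is what makes both `dput()` and `dpasta()` output useful in a reprex: the data travels as code.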

## Additional resources

Check out the `datapasta` vignette, the tidyverse packages, and Hadley Wickham’s excellent description of how to write a reproducible example in *Advanced R*. Be sure to thank `datapasta` author Miles McBain for all the future headaches he just saved you from.