
An easy trick to sort out all the line breaks when copying text from a PDF

Author: Cara Thompson
Affiliation: Building Stories with Data
Published: September 1, 2022

Copy-pasting from a PDF into another format (say, Rmd or qmd or Word) and getting fed up with the line breaks? stringr::str_squish() is your friend!

#rstats - also useful for removing manual processes from text editing regardless of the format of the destination document 🤫

The output from a straight copy-paste looks something like this…

This text was taken
from a PDF and there are loads of
random page
breaks. Boo!

# Quick solution:
stringr::str_squish(
  "This text was taken
   from a PDF and there are loads of
   random page
   breaks. Boo!"
)

# No more deleting line breaks by hand!
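
If you're curious what's going on under the hood, or can't install stringr where you're working, here's a rough base R sketch of the same idea: collapse every run of whitespace into a single space, then trim the ends. The `squish_base()` helper is a made-up name for illustration, and this is an approximation of the behaviour rather than stringr's actual implementation.

```r
# Base R sketch of the same idea (an approximation, not stringr's source code)
# squish_base() is a hypothetical helper name, not part of any package
squish_base <- function(text) {
  # Collapse every run of whitespace (spaces, tabs, line breaks) into one space
  collapsed <- gsub("[[:space:]]+", " ", text)
  # Remove any whitespace left at the start or end
  trimws(collapsed)
}

squish_base(
  "This text was taken
   from a PDF and there are loads of
   random page
   breaks. Boo!"
)
```

For everyday use, stringr::str_squish() is still the tidier option; the sketch is only there to show that nothing magic is happening.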

See also str_to_sentence and str_to_title for getting rid of all caps, or for… creating a title 🪄

stringr::str_to_sentence(
  "THIS IS IN ALL CAPS BUT IT WOULD BE BETTER IF IT WASN'T!"
)
stringr::str_to_title(
  "Let's make this a title instead"
)

P.S. str_to_sentence() doesn’t currently keep a capital I by default. Has this already been explored and decided against?

stringr::str_to_sentence(
  "THIS IS A SENTENCE I'LL NEED TO EDIT MANUALLY. SAD FACE."
)
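
One possible workaround, as a minimal sketch: after converting to sentence case, put the capital back on any standalone "i" with a regular expression. The `fix_capital_i()` helper is a hypothetical name of my own, not a stringr function, and it's deliberately naive: it will also capitalise the "i" in something like "i.e.", so check the result before trusting it.

```r
library(stringr)

# Hypothetical helper (not part of stringr): sentence case, then restore "I"
fix_capital_i <- function(text) {
  sentence_case <- str_to_sentence(text)
  # \\b marks a word boundary, so this only matches "i" standing alone
  # (including before an apostrophe, as in "i'll"), not the "i" inside "is"
  str_replace_all(sentence_case, "\\bi\\b", "I")
}

fix_capital_i(
  "THIS IS A SENTENCE I'LL NEED TO EDIT MANUALLY. SAD FACE."
)
```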

P.P.S. str_squish() will also remove paragraph breaks!
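
If you need to keep the paragraph breaks, one option is to split the text on blank lines first, squish each paragraph on its own, then glue them back together. Here's a minimal sketch, assuming paragraphs in the pasted text are separated by at least one blank line; `squish_paragraphs()` is a hypothetical helper name, not something stringr provides.

```r
library(stringr)

# Hypothetical helper (not part of stringr): squish each paragraph separately
squish_paragraphs <- function(text) {
  # Split wherever a blank line (possibly containing spaces) separates paragraphs
  paragraphs <- str_split(text, "\\n\\s*\\n")[[1]]
  # Tidy the whitespace inside each paragraph
  paragraphs <- str_squish(paragraphs)
  # Drop any empty pieces left over from leading or trailing blank lines
  paragraphs <- paragraphs[paragraphs != ""]
  # Put the paragraph breaks back
  paste(paragraphs, collapse = "\n\n")
}

pdf_text <- "This text was taken
   from a PDF.

   This is a second
   paragraph."

cat(squish_paragraphs(pdf_text))
```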


When I posted this online, I had no idea it would be so popular. Clearly, this is a more common problem than I realised! I couldn’t help thinking there must be better solutions than my hacky off-label use of str_squish().

Here’s a Shiny app you can try instead!

Citation

For attribution, please cite this work as:
Thompson, Cara. 2022. “An Easy Trick to Sort Out All the Line Breaks When Copying Text from a PDF.” September 1, 2022. https://www.cararthompson.com/posts/2022-09-01-copy-pasting-from-a-pdf-into/.