"

How many characters are there in 700000? It depends!

Today’s fun #rstats discovery while working on a client project for rfortherest
Author: Cara Thompson
Affiliation: Building Stories with Data
Published: May 13, 2022

Quick demo and explanation below!

Console display showing 'nchar' function applied to values. The function returns character count: 123456 gives 6, 700000 as a number gives 5, and '700000' as a string gives 6. Highlights difference in count between numeric and string inputs.

First, why? I’m working on an image-driven dashboard which users can sort by the value represented in the plot. We’ll share more about that soon; the short story is that I’m creating character-sortable strings containing the value, using str_pad() to make them all the same width.

So, what’s going on?

The answer is default scientific formatting behind the scenes!

R console screenshot showing two commands. The first, `nchar(as.character(700000))`, returns 5. The second, `as.character(700000)`, returns '7e+05'. Demonstrates character length and scientific notation in R. Black terminal with colourful text for clarity.
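The two screenshots above can be reproduced in base R. This is a minimal sketch; the variable name `as_text` is only for illustration, and the outputs in the comments are the ones shown in the screenshots, assuming R's default options.

```r
# A number passed to nchar() is first converted to a string,
# and that conversion may choose scientific notation.
nchar(123456)      # 6
nchar(700000)      # 5
nchar("700000")    # 6: already a string, so nothing is converted

# The conversion is where the "missing" character goes
as_text <- as.character(700000)
as_text            # "7e+05"
nchar(as_text)     # 5
```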

Good to know!

How do we avoid that?

You can switch off scientific formatting in your options(), but if you don’t want to do that, just use format(x, scientific = FALSE).

A tad convoluted, but it did the job I needed within the constraints of the project!

R code output showing formatting of the number 700,000. First line formats it in scientific notation as '7e+05'. Second line uses non-scientific format, displaying '700000'. Third line calculates the character length of the formatted number, showing 6.
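For readers who cannot see the screenshot, the same steps look roughly like this in base R. The names `x` and `fixed` are placeholders for illustration. Writing `FALSE` in full is a little safer than `F`, since `F` is an ordinary variable that can be reassigned.

```r
x <- 700000

# Default formatting picks the shorter, scientific representation
format(x)                                # "7e+05"

# Asking for fixed notation keeps every digit
fixed <- format(x, scientific = FALSE)
fixed                                    # "700000"
nchar(fixed)                             # 6
```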

What did others suggest?

For what it’s worth, str_length() has the same behaviour. The format() approach also isn’t the safest long-term solution, because it is based on how the numbers are printed rather than on the numbers themselves. It just did the trick within the constraints of the project (i.e. I knew the max number wouldn’t break this solution).

When I shared this on Twitter, the best solution came from Hadley Wickham (thank you Hadley!):

I’d suggest using sprintf() instead — it’s designed specifically for the use case of padding numbers. str_pad() would also break here if you ever get a non integer
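As a rough illustration of that suggestion, here is how zero-padding with sprintf() might look. The values and the fixed width of 6 are made up for the example and are not taken from the project.

```r
# Hypothetical example values, not taken from the project
values <- c(42, 1500, 700000)

# "%06d": format as an integer, zero-padded to a width of 6
padded <- sprintf("%06d", values)
padded    # "000042" "001500" "700000"

# Zero-padded strings sort in the same order as the numbers they hold
sort(padded)

# Non-integers need a floating-point format instead; "%09.2f" pads to a
# total width of 9, counting the decimal point and the two decimals
with_decimals <- sprintf("%09.2f", 1234.5)
with_decimals    # "001234.50"
```

Because sprintf() works from the number itself and not from its printed form, scientific notation never enters the picture.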

When I asked how I would make sure it would pad to the same nchar as the longest integer in the data, here was his suggestion:

You can figure that out with something like floor(max(log10(x)))
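One way to put that hint to work is sketched below. Note that floor(max(log10(x))) is one less than the number of digits in the largest value (it gives 5 for 700000), so this sketch adds 1 to get the padding width. It assumes the data are positive whole numbers (log10() does not give a usable result for zero or negative values), and the names `values`, `width` and `fmt` are placeholders.

```r
# Hypothetical example values; assumes positive whole numbers
values <- c(5, 42, 1500, 700000)

# floor(log10(x)) is one less than the digit count, hence the + 1
width <- floor(max(log10(values))) + 1
width     # 6

# Build the format string (here "%06d") from the computed width
fmt <- paste0("%0", width, "d")
padded <- sprintf(fmt, values)
padded    # "000005" "000042" "001500" "700000"
```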

Citation

For attribution, please cite this work as:
Thompson, Cara. 2022. “How Many Characters Are There in 700000? It Depends!” May 13, 2022. https://www.cararthompson.com/posts/2022-05-13-todays-fun-rstats-discovery-while/.