Errorbars on barcharts


An errorbar is a graphical indication of the uncertainty of a measurement, typically the standard deviation or a confidence interval.

A measurement gives us a mean value, and we want to illustrate how confident we are of that value.

How do we plot that?

I am going to use three libraries:

ggplot2 for plotting, dplyr to make manipulating data easier, and tibble to support a single line later.
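Loading them is straightforward (assuming the packages are installed; the comment on tibble is my reading of "a single line later" – remove_rownames() lives in that package):

```r
library(ggplot2)  # plotting
library(dplyr)    # data manipulation and the %>% pipe
library(tibble)   # provides remove_rownames(), used later
```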

Then, we need some data.

data <- data.frame(
  name = c("A", "B", "C"),   # example values
  mean = c(5, 7, 4),
  sd   = c(1, 1.5, 0.8)
)

I now have a dataframe with three variables, name, mean and standard deviation (sd).

How do we plot that?


data %>% 
  ggplot() +
  geom_bar( aes(x=name, y=mean), stat="identity", fill="skyblue") +
  geom_errorbar(aes(x=name, ymin=mean-sd, ymax = mean+sd), width=0.1)


So. What on earth was that?

The pipe-operator %>% takes whatever is on the left-hand side, and inserts it as the first argument of whatever is on the right-hand side.
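A minimal illustration of the pipe (sqrt is just an arbitrary function here; %>% itself comes along when dplyr is loaded):

```r
library(dplyr)

# x %>% f() is rewritten as f(x):
c(1, 4, 9) %>% sqrt()
## [1] 1 2 3

# ...which is exactly the same call as:
sqrt(c(1, 4, 9))
## [1] 1 2 3
```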

What is on the right-hand side is the ggplot function. That would normally have a data=something as the first argument. Here that is the data we constructed earlier.

To that initial plot, which is completely empty, we add a geom_bar. That plots the bars. It takes an x-value, name, and a y-value, mean. And with stat="identity" we tell the function that rather than counting the number of observations of each x-value (the default behavior of geom_bar), it should use the y-values provided. We also want a nice light blue color for the bars.

To that bar-chart, we now add errorbars. geom_errorbar needs to know the x-values of the bars, in order to place the errorbars correctly. It also needs to know where to place the upper and the lower end of each errorbar. We supply the information that ymin, the lower end, should be the mean minus the standard deviation, and the upper end, ymax, the sum of the mean and sd. Finally we decide how wide the horizontal caps of the errorbars should be, by writing width=0.1. We do not actually have to, but the default value results in an ugly plot.

And there you go, a barchart with errorbars!

Next step

That was all very nice. However! We do not usually have a nice dataframe with means and standard deviations calculated directly. More often, we have a dataframe like this:

mtcars %>% 
  remove_rownames() %>% 
  select(cyl, disp) %>% 
  head()
##   cyl disp
## 1   6  160
## 2   6  160
## 3   4  108
## 4   6  258
## 5   8  360
## 6   6  225

I’ll get back to what the code actually means later.

Here we have 32 observations (only 6 shown above) of the variables “cyl” and “disp”. I would now like to make a barplot of the mean value of disp for each of the three different values, or groups, in cyl (4, 6 and 8). And add the errorbars.

You could scroll through all the data, sort them by cyl, manually count the number of observations in each group, sum the disp values, divide, and so on.

But there is a simpler way:

mtcars %>% 
  remove_rownames() %>% 
  select(cyl, disp) %>% 
  group_by(cyl) %>% 
  summarise(mean=mean(disp), sd=sd(disp))
## # A tibble: 3 x 3
##     cyl  mean    sd
##   <dbl> <dbl> <dbl>
## 1     4  105.  26.9
## 2     6  183.  41.6
## 3     8  353.  67.8

mtcars is a built-in dataset (data on cars in the US, 1974). I send that, using the pipe-operator, to the function remove_rownames, which does exactly that. We don't need the rownames, and they will just confuse us. That result is then sent to the function select, which selects the two columns/variables cyl and disp, and discards the rest. Next, we group the data according to the value of cyl. There are three different values: 4, 6 and 8. And then we use the summarise function to calculate the mean and the standard deviation of disp for each of the three groups.

Now we should be ready to plot. We just send the result above to the plot function from before:

mtcars %>% 
  remove_rownames() %>% 
  select(cyl, disp) %>% 
  group_by(cyl) %>% 
  summarise(mean=mean(disp), sd=sd(disp)) %>% 
  ggplot() +
    geom_bar( aes(x=cyl, y=mean), stat="identity", fill="skyblue") +
    geom_errorbar(aes(x=cyl, ymin=mean-sd, ymax = mean+sd), width=0.1)

All we need to remember is to change “name” in the original to “cyl”. All done!

But wait! There is more!!

Those errorbars can be shown in more than one way.

Let us start by saving our means and sds in a dataframe:

data <- mtcars %>% 
  remove_rownames() %>% 
  select(cyl, disp) %>% 
  group_by(cyl) %>% 
  summarise(mean=mean(disp), sd=sd(disp))

geom_crossbar results in this:

data %>% 
ggplot() +
  geom_bar( aes(x=cyl, y=mean), stat="identity", fill="skyblue") +
  geom_crossbar( aes(x=cyl, y=mean, ymin=mean-sd, ymax=mean+sd))


I think it is ugly. But whatever floats your boat.

Then there is just a vertical bar, geom_linerange. I think it makes it a bit more difficult to compare the errorbars. On the other hand, it results in a plot that is a bit cleaner:

data %>% ggplot() +
  geom_bar( aes(x=cyl, y=mean), stat="identity", fill="skyblue") +
  geom_linerange( aes(x=cyl, ymin=mean-sd, ymax=mean+sd))


And here is geom_pointrange. The mean is shown as a point. This probably works best without the bars.

data %>% ggplot() +
  geom_bar( aes(x=cyl, y=mean), stat="identity", fill="skyblue", alpha=0.5) +
  geom_pointrange( aes(x=cyl, y=mean, ymin=mean-sd, ymax=mean+sd))


Project Euler 5 – Smallest multiple

What is the smallest, positive, number that can be divided by all numbers from 1 to 20 without any remainder?

We are given that 2520 is the smallest that can be divided by all numbers from 1:10.

One number that can definitely be divided by all numbers from 1:20 is the product of them all:

factorial(20)
## [1] 2.432902e+18

But given that

factorial(10)
## [1] 3628800

is rather larger than 2520, it is definitely not the answer.

The answer must be a multiple of all the primes smaller than 20. A number that is divisible by 15 will be divisible by 3 and 5.

The library “numbers” has a lot of useful functions. Primes(20) returns all primes up to 20, and prod() returns the product of all those primes:

library(numbers)
prod(Primes(20))
## [1] 9699690

Could that be the answer?

What we need is the modulo-operator %%. 9699690 modulo 2 – what is the remainder? We know that all the remainders, dividing by 1 to 20, must be 0.

prod(Primes(20)) %% 2
## [1] 0

And our large product is divisible by 2 without a remainder.

Thankfully the operator is vectorized, so we can do all the divisions in one go:

9699690 %% 1:20
##  [1]  0  0  0  2  0  0  0  2  3  0  0  6  0  0  0 10  0 12  0 10


Not all of them are 0. Dividing by 4, for example:

9699690 %% 4
## [1] 2

leaves a remainder. Doubling the number fixes that particular division:

(2*9699690) %% 4
## [1] 0

Now I just need to find the number to multiply 9699690 with, in order for all the divisions to have a remainder of 0.
That is, change i in this code until the answer is true.

i <- 2
all((i*9699690) %% 1:20 == 0)
## [1] FALSE

Starting with 1*9699690, I test whether all the remainders of the divisions by the numbers from 1 to 20 are zero.
As long as they are not, I increase i by 1, save i*9699690 as the answer, and test again.
When the test is TRUE, that is, all the remainders are 0, the while-loop quits, and I have the answer.

i <- 1
answer <- 9699690
while(!all((i*9699690) %% 1:20 == 0)){
  i <- i + 1
  answer <- i*9699690
}
answer
## [1] 232792560
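By the way, the same answer can be computed directly as the least common multiple of 1 to 20. This sketch uses only base R; gcd and lcm are helper functions defined here, not built-ins:

```r
# Greatest common divisor, Euclid's algorithm
gcd <- function(a, b) if (b == 0) a else gcd(b, a %% b)

# Least common multiple of two numbers
lcm <- function(a, b) a / gcd(a, b) * b

# Fold lcm over all numbers from 1 to 20
Reduce(lcm, 1:20)
## [1] 232792560
```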

Weird behavior of is.numeric()

An interesting observation. Please note that I have absolutely no idea why this happens (at least I did not at the time of first writing this).

In our datalab, ingeniously named “Datalab” (the one at KUB-North – because, contrary to the labs at the libraries for the social sciences and the humanities, we are not allowed to have a proper name), we were visited by a student.

She wanted to make a dose-response plot. Something about the concentration of something, giving some clotting of some blood. Or something…

Anyway, she wanted to fit a 4-parameter logistic model. That's nice; I had never heard about that before, but that is the adventure of running a datalab, and what makes it fun.

Of course there is a package for it, dr4pl. After an introduction to the wonderful world of dplyr, we set out to actually fit the model. This is a minimal working example of what happened:


data <- tibble(dose= 1:10, response = 2:11)

dr4pl(data, dose, response)
## Error in dr4pl.default(dose = dose, response = response, init.parm = init.parm, : Both doses and responses should be numeric.

WTF? “Both doses and responses should be numeric”? But they are!

is.numeric(data$dose)
## [1] TRUE
is.numeric(data$response)
## [1] TRUE

The error is thrown by these lines in the source:

if(!is.numeric(dose)||!is.numeric(response)) {
    stop("Both doses and responses should be numeric.")
}

Lets try:

if(!is.numeric(data$dose)||!is.numeric(data$response)) {
    stop("Both doses and responses should be numeric.")
  } else {
    print("Where did the problem go?")
}
## [1] "Where did the problem go?"

No idea. Did it disappear? No:

dr4pl(data, dose, response)
## Error in dr4pl.default(dose = dose, response = response, init.parm = init.parm, : Both doses and responses should be numeric.

Looking at the data might give us an idea of the source:

str(data)
## Classes 'tbl_df', 'tbl' and 'data.frame':    10 obs. of  2 variables:
##  $ dose    : int  1 2 3 4 5 6 7 8 9 10
##  $ response: int  2 3 4 5 6 7 8 9 10 11

Both dose and response are integers. Might that be the problem?

data <- tibble(dose= (1:10)*1.1, response = (2:11)*1.1)
dr4pl(data, dose, response)
## Error in dr4pl.default(dose = dose, response = response, init.parm = init.parm, : Both doses and responses should be numeric.

Nope. Both dose and response are now definitely numeric:

str(data)
## Classes 'tbl_df', 'tbl' and 'data.frame':    10 obs. of  2 variables:
##  $ dose    : num  1.1 2.2 3.3 4.4 5.5 6.6 7.7 8.8 9.9 11
##  $ response: num  2.2 3.3 4.4 5.5 6.6 7.7 8.8 9.9 11 12.1

But the problem persists.

Might this be the reason?

## # A tibble: 2 x 2
##    dose response
##   <dbl>    <dbl>
## 1   1.1      2.2
## 2   2.2      3.3

It is a tibble. And the variables are reported to be doubles.

But according to the documentation:

“numeric is identical to double (and real)”

That should not be a problem then.

However. In desperation, I tried this:

c <- data %>% 
  dr4pl(dose, response)

And it works!

Why? Absolutely no idea!

Or do I?

The problem is that subsetting a tibble with [ returns a new tibble. But doesn't subsetting a dataframe return a dataframe as well?

It does. Unless you subset out a single variable or observation.

For a dataframe, df[, "var"] drops the dimension and returns a vector, containing the values of var in df.

If, however, df is a tibble, df[, "var"] will return a tibble, with just one variable. And is.numeric() on a one-column tibble is FALSE, which would explain the error. (df$var, by contrast, returns a plain vector in both cases.)
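A minimal sketch of the difference (the column name x is just an example):

```r
library(tibble)

df <- data.frame(x = 1:3)
tb <- tibble(x = 1:3)

is.numeric(df[, "x"])  # TRUE:  a data.frame drops a single column to a vector
is.numeric(tb[, "x"])  # FALSE: a tibble stays a tibble
is.numeric(tb$x)       # TRUE:  $ returns the underlying vector in both cases
```

So whether the check fires depends entirely on how a function pulls the columns out of the data.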

StarLog – The main character. Enterprise!

In what ways does the U.S.S. Enterprise function as a character, not just a vehicle in Star Trek? Does “she” have a personality? Do the other ships in the Star Trek universe have the same level of character development?

To be honest, I do not think that the starships can actually be viewed as characters. Yes, Enterprise is named, gendered, and the “real” characters talk to her. And the computer responds. But as a general rule, the computer has no self-awareness, does not solve problems (only at the prompt of the real characters), and only has a personality to the extent that the real characters project their own perceptions and ideas onto the ship.

We like to talk about Enterprise as a character. And Star Trek would not have been Star Trek without Enterprise. But I flatly dismiss the idea that Enterprise is actually a character in her own right.

StarLog – the future of propulsion

Where do you think ion propulsion and future engine technology will take us? What are the dangers? Are there other applications?

It will take us further out into space! But it will not only take us to places far out in space; it will also take us to more local spaces. Solving the problems related to it will give us new technology. And we have no idea where that will take us, in the same way that we did not know that the general theory of relativity and quantum mechanics would bring us GPS-navigation.

The dangers? Hard to say. I would claim that we should be mindful not to use up all our reserves of Xenon for this. Other than that: probably none. Unless it turns out that ion propulsion damages subspace in some way.

Finally. All four pips:

Starlog – diversity

More homework for Star Trek: Inspiring Culture and Technology.

Why is it important to see yourself on television? Why is television an important subject for scholarly study and how does what we watch shape the world we live in?


Scott asks if you think we’re getting closer to realizing the Vulcan philosophy of IDIC (Infinite Diversity in Infinite Combinations) here on Earth. What would it take for that to happen? What would it look like? How might things be different?

The first question – well, Scott answers it. It is important to see yourself on Star Trek, because it shows a vision of a future that has room for people like me.

I know it from myself. Seeing Jadzia Dax kiss her former wife in “Rejoined” was important. It was not a lesbian kiss as such. And yet it was, and as a gay man, that actually meant something. Especially since we had waited for so long to actually see any representation of gay and lesbian characters in Star Trek, a thing Roddenberry had promised would be addressed in season five of TNG.

This makes a difference. We all, to a certain degree, view ourselves through the stories we are told or shown. Small girls (and some boys) imagine themselves to be princesses when they are read fairytales. Grown men (and some women) imagine themselves to be action heroes when they watch Die Hard. And seeing yourself, or someone like you, portrayed positively makes a huge difference.

That is the true promise of Star Trek: that the diversity we have on Earth today – not always treated ideally – will live on in the future, in a way that is more positive and meaningful for young people, no matter what their circumstances. In the words of Dan Savage: that it will get better.

That Vulcan ideal, Infinite Diversity in Infinite Combinations, is still in the future. But not as far off as it has been. What would it take to get closer? I could come up with a lot of politically correct suggestions about educating the ignorant and fighting racism. But I think the underlying problem is scarcity, and a clash of cultures. Make sure that immigration does not mean that I have to pay more in taxes to support unemployed immigrants, and that they do not threaten my livelihood by putting pressure on wages. And make sure that people of all cultures accept the basic foundations of a liberal society (in the original sense, not necessarily the American political sense): democracy, equality of the sexes, and non-discrimination. Then I think we will be all right. Not that we won't still have idiots discriminating. But we should try to move to a point where idiots cannot get away with justifying their discrimination with religion and/or culture.

If we can do that – and, despite bumps on the road, we are getting ever closer – we will be able to realise the alien ideal of IDIC.

Maybe that also explains why television makes sense as a subject of scholarly study. Television is one of the most important common representations of popular culture. We are placed in front of this entertainment for an inordinate amount of time every day. It affects a lot of people, in diverse ways. If that should not be an important subject of study, I don’t know what should be.

And that gives me the rank I would really like. Commander:

StarLog – we are explorers

At the end of the video, Margaret says that space exploration was controversial in the 1970s and 1980s. People wondered why the government was spending time and money exploring the solar system when critical problems existed here on Earth. What do you think? Should the government resolve Earthly issues before exploring space? Or is a scientific investigation of distant worlds a fundamentally human endeavor of exploration? Explain your argument.

No. Governments should not attempt to solve all problems on the globe before exploring. For one, human life is one continuous series of problems and challenges. We will never get past the goal-post, because it will continuously move. It is an impossible goal.

Should we not try anyway? Why pour money into a space program when we could instead help homeless people, to take just one of the, relatively, smaller problems we are facing? It is hard to argue for exploration, especially in an endeavour as expensive as space exploration, in the face of a homeless man.

But humanity is fundamentally an exploring species. The reason we are everywhere on this planet, for good and bad, is that we have explored. The reason that life-expectancy is at a record high, even in poor countries in the third world, is that we are explorers. The introductory video mentions that. We are explorers, not only in a geographical or astronomical sense, but in every sense of the word. Abandoning exploration, even the expensive space-version of it, betrays our future. To take just one very basic example, the images brought home from the Moon, for the first time showing how small, beautiful and fragile our planet is, made it clear to humanity how extraordinarily lucky we are to live. And how important it is to take care of the one planet in the universe where we know life exists.

Now we just need to prove that intelligent life exists somewhere in the universe. Demonstrating that it exists on Earth would be a good place to start.

StarLog – utopia

And even more homework for Star Trek: Inspiring Culture and Technology

Think of a global issue that we are facing today that causes fear or concern. What would be the plot of a television show that depicted a utopian and optimistic vision of the future of that issue? 

Climate, no doubt. The consequences of globalisation, and the problems arising from that – populist political leaders in most of the world, to begin with – could be another. But the most pressing is probably climate change.

And the plot? Well, it could simply be Star Trek. We see a lot of plots for episodes and movies in that universe that address that specific issue. “The one with the whales” (Star Trek IV – The Voyage Home) is the most obvious. But we see a lot of other episodes: “Night” in VOY, where Janeway confronts the Malons. The existential threat to Star Fleet and the Federation in “Force of Nature”. So I will abstain from trying to devise my own show, and simply point to Star Trek.

Star Trek confronts the problems head-on. The problems are more or less simply solved. Or at the very least they try to solve them. I am not sure the problems in “Force of Nature” are actually solved. But the Federation, at least in that episode and some of the following, actually tries to mitigate the effects of its environmentally damaging actions. We see similar issues in Discovery, where the use of the spore-drive must be said to present some environmental problems. That might be the reason we do not see the spore-drive in the later series.

StarLog – Artificial Intelligence

Next piece of homework for Star Trek: Inspiring Culture and Technology:

Where do you see Artificial Intelligence going? Will it be Data, The Doctor or something new? Do we need to fear it, embrace it or something in between? 

In the very long term – definitely something like Data or the EMH, that is, general artificial intelligence with an awareness of self. Intelligence and self-awareness appear to be emergent phenomena of complex networks. We have them. The more we look for them, the more higher animals we observe them in. That means that it is probably just a question of time before we are able to build a neural network that is sufficiently complex that self-awareness, and the ability to learn, will emerge.

That will take some time. A lot longer than what we are led to believe by the marketing. In the shorter term, we are going to see a lot more specific artificial intelligence.

So – where is it going? Definitely towards Data. In very small steps. And there is a very long way to go before we get to the EMH.

Should we fear it? We should fear it kinda like we fear the eventual death of the Sun. Yes, it will burn out, and it will end life on Earth. But not in our lifetimes, and not in our grandchildren's lifetimes. Right now we should embrace it. As mentioned in the interview, artificial intelligence as we know it now has given us more time to do what truly brings value to our work, ourselves, and our fellow human beings. That is not a threat.

But we should begin to think hard about the moral and ethical implications of how we use artificial intelligence. The US drones are now capable of taking off by themselves, flying to the target area on autopilot, identifying targets, and returning after the killshot. As it appears from outside the intelligence community, the only reason that the artificial intelligence in the drones does not also pull the trigger and launch the missile on its own is ethical considerations.

And those are things we need to think about. And perhaps be a little fearful of. Not the artificial intelligence, but rather the all too real human stupidity. We should think about who will be held responsible for mistakes. If a missile launch is determined to be a breach of the laws of war, who is responsible? The programmer? The people inputting data into the neural networks? The designers of the training sets? When the self-driving car makes a mistake and hits someone – who is responsible? When it makes a choice between hitting the stroller and the retiree – was that the right answer?

We might as well get started on those issues. They are only going to get more complicated. And don’t get me started on the moral implications of what we should do when we reboot the computer network that has gained self-awareness.

A good place to start would be to watch some Star Trek, because these issues have been discussed before. The trial determining the humanity of Data, and the issues regarding holodeck malfunctions, to take just two cases.

And that should get me to the next rank – Lieutenant Junior Grade:

StarLog – technology must-haves

More homework for “Star Trek: Inspiring Culture and Technology”

Scott asked, “What Star Trek technology is on your list of must-haves?” Could the Star Trek universe exist without this type of technology? How would it be better (or worse) with (or without) this technology? Be sure to use evidence to support your argument.

The obvious answer: Star Trek could not exist without the warp-drive. Getting from planet to planet would be impossible. What is really cool is the transporter. But that is not a must-have. Neither are phasers, quantum-torpedoes, (medical) tricorders etc. Would Star Trek be worse without warp-drive? No, it would simply not exist.

But I think the most important technology is the replicator – in combination with, for all practical purposes, unlimited energy. That, in my opinion is the important technology.

What it does is make the Star Trek universe a universe without scarcity. Picard sums it up in “First Contact”:

“The economics of the future is somewhat different. You see, money doesn’t exist in the 24th century. The acquisition of wealth is no longer the driving force in our lives. We work to better ourselves and the rest of humanity.”

Humanity no longer wants for anything. No one needs to starve, no one has to be without housing, food and clean water. And assuming that whatever delivers the required energy does so in a clean way, we can do it without destroying the environment.

That – the unlimited energy, the instant delivery of “Earl Grey. Hot” – is the most important technology.

Star Trek can work without it. Voyager frequently has to ration the energy reserves, and in TOS at least a couple of episodes are centered around the need to get fresh dilithium crystals.

That does not imply that money is a thing of the past. It might be in the ideal world of the Star Fleet flagship. But we see several instances of Star Fleet officers having to pay for things. And the other races – the Ferengi are the obvious example – do have money. But hunger, and the need for material things, is a thing of the past in the Federation. And that makes the replicator, and unlimited energy, the most important technologies in Star Trek.