Proof of identity vs. identification

Most of this blog is in English. I work in an international field. But here my English falls a bit short – at least in writing.

We have a problem with CPR numbers in Denmark. And it is rooted in a fundamental misunderstanding.

The CPR number is the unique identification of a person in Denmark. The ten digits attached to me identify me. When we talk about the person with CPR number 123456-7890, there is one, and only one, person we can be talking about.

We also know that this is a person born in the unknown month 34, and that the person was born with a biology that makes them a "she" (the last digit is even).

So when I walk into a shop and sign a loan agreement stating that the CPR number of the borrower is 123456-7890, it is unambiguously identified that this very fictive person is liable for the money.

Does that mean that I, the person standing in the shop, was that person?

No. I may have stolen a health insurance card. For that matter, I may have produced one myself – it is not that hard. In the most egregious cases, the borrower has not shown anything at all, but simply stated the CPR number. If I have not presented proof that I am the person identified by that CPR number, there is no guarantee that I actually am that person.

The problem arises because people cannot tell identification and proof of identity apart. The CPR number is treated as a secret code: if you can state it, you must be that person. But it is not the least bit secret. It is printed on your health insurance card. It hides in the barcode. You happily give the first six digits to anyone. And the last digit can, for 99.4% of the population, be narrowed down to one of five.

You believe it is secret, and that if you can state it, there can be no doubt that you really are you. Providers of payday loans pretend it is secret, and that they have the right person if they are told it. I am quite sure they are well aware of the problem, but simply do not care. Everyone pretends that the CPR number can be used as proof of identity. And because we pretend it can, we get identity theft when people can take out consumer loans merely by stating a CPR number. And people panic when their CPR number is made public.

We ought to publish all CPR numbers. Then it would hopefully be clear to everyone that a CPR number proves nothing whatsoever.

Working from home – it’s not all bad

As a follow-up to the previous post.

Yes, a sub-optimal meeting culture does not automatically become optimal because it goes online.

But! This corona/covid-19/Chinese flu crisis amplifies stuff. The racial problems in the US become more pronounced. The problems with the British class system are suddenly visible even from across the North Sea. The challenges facing workers in precarious positions in a global economy are magnified.

And the problematic elements of a meeting culture become large enough that something might actually be done about them!

Working from home

Denmark has been in lockdown – or something like it – for close to two months now. On March 11th our prime minister announced that all public employees were to be sent home as soon as possible. A lot of other institutions, companies and functions were similarly closed down. Further restrictions were introduced in the following weeks.

My husband was also told to work from home. He had his first full day at home on the 12th. I had to show up at work on the 12th, and was sent home early because I had an online meeting in the afternoon.

How is this working? The first 1½ weeks were great! Finally I had time to get on top of things, uninterrupted by meetings, by colleagues asking if I wanted to drink coffee, or if I could help change the toner in the printer.

The next two weeks – a timeline excluding the 1½ weeks I was on mandatory vacation at home in connection with Easter – were not that productive. The work that could be done without talking to colleagues was running out.

And after that – now we have found our rhythm. Online meetings are working; debating how we should design a general, introductory workshop on visualization of data is as frustrating as it would have been otherwise. And the number of irrelevant meetings is back to normal. Do not misunderstand me – meeting with colleagues without an agenda is nice. And necessary, especially since we are in almost complete isolation at home. On the other hand, not every meeting is actually necessary. I started the lockdown joking that we would now discover which meetings could have been emails. Now I think we are beginning to discover which emails could have been meetings. If we can't meet, at least we can have a meeting.

In other words – the bad habits of the physical office are entering the online office.

Managers and colleagues have no problem booking online meetings at the same time as you are in another meeting – with little to no understanding that you are actually in another meeting. As I wrote to my boss: I can't wait for us to get physically back to the library, so I can tell you that I cannot attend that meeting, since I am out of town at another meeting.

Usually it is possible to get people to understand that if your meeting at location A ends at 12.00, you are not able to attend another meeting at location B at 12.00. Or they can understand it if there is 30 minutes of transportation between the two locations. Not so in the virtual world. This led to a day (two weeks ago as of this writing) with the first meeting beginning at 9 and ending at 9.30. The next meeting from 9.30 to 11.00. The next again from 11.00 to 12.00. And the next from 12.00 to 13.00. The fifth meeting started at 13.00 and was finished at 14.00. Not that we were actually finished, but the sixth meeting was at 14.00. Happily it was a short one, only lasting 15 minutes. So I had 45 minutes for a biobreak before the final meeting started at 15.00.

 

Errorbars on barcharts

Errorbars

An errorbar is a graphical indication of the standard deviation of a measurement. Or of a confidence interval.

A measurement has a mean. We want to illustrate how confident we are in that value.

How do we plot that?

I am going to use three libraries:

library(ggplot2)
library(dplyr)
library(tibble)

ggplot2 for plotting, dplyr to make manipulating data easier, and tibble to support a single line later.

Then, we need some data.

data <- data.frame(
  name=letters[1:3],
  mean=sample(seq(4,15),3),
  sd=c(1,0.2,3)
)

I now have a dataframe with three variables: name, mean and standard deviation (sd).

How do we plot that?

Simple:

data %>% 
  ggplot() +
  geom_bar( aes(x=name, y=mean), stat="identity", fill="skyblue") +
  geom_errorbar(aes(x=name, ymin=mean-sd, ymax = mean+sd), width=0.1)

(Plot: barchart of the means for a, b and c, with errorbars.)

So. What on earth was that?

The pipe operator %>% takes whatever is on the left-hand side, and inserts it as the first argument of whatever is on the right-hand side.
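In other words, these two lines do exactly the same thing:

head(data)
data %>% head()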

Here, what is on the right-hand side is the ggplot function. That would normally have a data=something as its first argument. The pipe supplies the data we constructed earlier.

To that initial plot, which is completely empty, we add a geom_bar. That plots the bars. It takes an x-value, name, and a y-value, mean. And we tell the function that rather than counting the number of observations of each x-value (the default behavior of geom_bar), it should use the y-values provided – that is what stat="identity" means. We also want a nice light blue color for the bars.
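As an aside, ggplot2 also provides geom_col(), which is geom_bar() with stat="identity" built in. The bar layer could equally well be written like this:

data %>% 
  ggplot() +
  geom_col(aes(x=name, y=mean), fill="skyblue")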

To that barchart, we now add errorbars. geom_errorbar needs to know the x-values of the bars, in order to place the errorbars correctly. It also needs to know where to place the upper and lower end of each errorbar. We supply the information that ymin, the lower end, should be the mean value minus the standard deviation. And the upper end, ymax, the sum of the mean and sd. Finally we need to decide how broad the horizontal lines should be. We do that by writing width=0.1. We do not actually have to, but the default value results in an ugly plot.

And there you go, a barchart with errorbars!

Next step

That was all very nice. However! We do not usually have a nice dataframe with means and standard deviations calculated directly. More often, we have a dataframe like this:

mtcars %>% 
  remove_rownames() %>% 
  select(cyl, disp) %>% 
  head()
##   cyl disp
## 1   6  160
## 2   6  160
## 3   4  108
## 4   6  258
## 5   8  360
## 6   6  225

I’ll get back to what the code actually means later.

Here we have 32 observations (only 6 shown above) of the variables "cyl" and "disp". I would now like to make a barplot of the mean value of disp for each of the three different values, or groups, in cyl (4, 6 and 8). And add the errorbars.

You could scroll through all the data, sort them by cyl, manually count the number of observations in each group, add up the disp values, divide, etc.

But there is a simpler way:

mtcars %>% 
  remove_rownames() %>% 
  select(cyl, disp) %>% 
  group_by(cyl) %>% 
  summarise(mean=mean(disp), sd=sd(disp))
## # A tibble: 3 x 3
##     cyl  mean    sd
##   <dbl> <dbl> <dbl>
## 1     4  105.  26.9
## 2     6  183.  41.6
## 3     8  353.  67.8

mtcars is a built-in dataset (cars in the US in 1974). I send it, using the pipe operator, to the function remove_rownames, which does exactly that. We don't need the rownames, and they would just confuse us. The result is then sent to the function select, which selects the two columns/variables cyl and disp, and discards the rest. Next, we group the data according to the value of cyl. There are three different values: 4, 6 and 8. And then we use the summarise function to calculate the mean and the standard deviation of disp for each of the three groups.

Now we should be ready to plot. We just send the result above to the plot function from before:

mtcars %>% 
  remove_rownames() %>% 
  select(cyl, disp) %>% 
  group_by(cyl) %>% 
  summarise(mean=mean(disp), sd=sd(disp)) %>% 
  ggplot() +
    geom_bar( aes(x=cyl, y=mean), stat="identity", fill="skyblue") +
    geom_errorbar(aes(x=cyl, ymin=mean-sd, ymax = mean+sd), width=0.1)

(Plot: barchart of the mean disp for each value of cyl, with errorbars.)
All we need to remember is to change “name” in the original to “cyl”. All done!
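As mentioned at the start, errorbars do not have to show the standard deviation – a confidence interval works too. Here is a minimal sketch of how that could look, computing the standard error and a t-based 95% confidence interval inside summarise (the column names se and ci are just my own choices):

mtcars %>% 
  select(cyl, disp) %>% 
  group_by(cyl) %>% 
  summarise(mean = mean(disp),
            se = sd(disp)/sqrt(n()),          # standard error of the mean
            ci = qt(0.975, n() - 1)*se) %>%   # half-width of a 95% confidence interval
  ggplot() +
    geom_bar(aes(x=cyl, y=mean), stat="identity", fill="skyblue") +
    geom_errorbar(aes(x=cyl, ymin=mean-ci, ymax=mean+ci), width=0.1)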

But wait! There is more!!

Those errorbars can be shown in more than one way.

Let us start by saving our means and sds in a dataframe:

data <- mtcars %>% 
  remove_rownames() %>% 
  select(cyl, disp) %>% 
  group_by(cyl) %>% 
  summarise(mean=mean(disp), sd=sd(disp))

geom_crossbar results in this:

data %>% 
ggplot() +
  geom_bar( aes(x=cyl, y=mean), stat="identity", fill="skyblue") +
  geom_crossbar( aes(x=cyl, y=mean, ymin=mean-sd, ymax=mean+sd))

(Plot: barchart with crossbars marking mean ± sd.)

I think it is ugly. But whatever floats your boat.

Then there is just a vertical line, geom_linerange. I think it makes it a bit more difficult to compare the errorbars. On the other hand, it results in a plot that is a bit more clean:

data %>% ggplot() +
  geom_bar( aes(x=cyl, y=mean), stat="identity", fill="skyblue") +
  geom_linerange( aes(x=cyl, ymin=mean-sd, ymax=mean+sd))

(Plot: barchart with plain vertical errorbars.)

And here is geom_pointrange. The mean is shown as a point. This probably works best without the bars.

data %>% ggplot() +
  geom_bar( aes(x=cyl, y=mean), stat="identity", fill="skyblue", alpha=0.5) +
  geom_pointrange( aes(x=cyl, y=mean, ymin=mean-sd, ymax=mean+sd))

(Plot: semi-transparent barchart with pointranges.)
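And, as suggested above, we can drop the bars entirely:

data %>% ggplot() +
  geom_pointrange(aes(x=cyl, y=mean, ymin=mean-sd, ymax=mean+sd))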

Project Euler 5 – Smallest multiple

What is the smallest, positive, number that can be divided by all numbers from 1 to 20 without any remainder?

We are given that 2520 is the smallest number that can be divided by all numbers from 1 to 10.

One number that can definitely be divided by all numbers from 1 to 20 is:

factorial(20)
## [1] 2.432902e+18

But given that

factorial(10)
## [1] 3628800

is rather larger than 2520, it is definitely not the answer.

The answer must be a multiple of all the primes smaller than 20. The composite numbers take care of themselves: a number that is divisible by both 3 and 5 will also be divisible by 15.

The library "numbers" has a lot of useful functions. Primes(20) returns all primes up to 20, and prod() returns the product of those primes:

library(numbers)
prod(Primes(20))
## [1] 9699690

Could that be the answer?

What we need is the modulo operator. 9699690 modulo 2 – what is the remainder? We know that all the remainders, dividing the answer by 1 to 20, must be 0.

prod(Primes(20)) %% 2
## [1] 0

And our large product is divisible by 2 without a remainder.

Thankfully the operator is vectorized, so we can do all the divisions in one go:

9699690 %% 1:20
##  [1]  0  0  0  2  0  0  0  2  3  0  0  6  0  0  0 10  0 12  0 10

Nope.

9699690 %% 4
## [1] 2

Leaves a remainder.

(2*9699690) %% 4
## [1] 0

Now I just need to find the number to multiply 9699690 by, in order for all the divisions to have a remainder of 0. That is, change i in this code until the answer is TRUE:

i <- 2
all((i*9699690) %% 1:20 == 0)
## [1] FALSE

Starting with 1*9699690, I test whether all the remainders of the divisions by the numbers from 1 to 20 are zero. As long as they are not, I increase i by 1, save i*9699690 as the answer, and test again. When the test is TRUE, that is, when all the remainders are 0, the while loop quits, and I have the answer.

i <- 1
while(!all((i*9699690) %% 1:20 == 0)){
 i <- i + 1
 answer <- i*9699690
}
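By the way: what we are computing here is really the least common multiple of the numbers from 1 to 20. Unless I am mistaken, the numbers package already has a function for exactly that, mLCM, which should give the same answer without any looping:

library(numbers)
mLCM(1:20)  # least common multiple of 1:20 - should equal the answer found by the loop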

Weird behavior of is.numeric()

An interesting observation. Please note that I have absolutely no idea why this happens. (At least not at the time of first writing this.)

In our datalab, ingeniously named "Datalab" (the one at KUB-North – because, contrary to the labs at the libraries for social sciences and the humanities, we are not allowed to have a name), we were visited by a student.

She wanted to make a dose-response plot. Something about the concentration of something, giving some clotting of some blood. Or something…

Anyway, she wanted to fit a 4-parameter logistic model. That's nice – I had never heard about that before, but that is the adventure of running a datalab, and what makes it fun.

Of course there is a package for it, dr4pl. After an introduction to the wonderful world of dplyr, we set out to actually fit the model. This is a minimal working example of what happened:

library(tidyverse)
library(dr4pl)

data <- tibble(dose= 1:10, response = 2:11)

dr4pl(data, dose, response)
## Error in dr4pl.default(dose = dose, response = response, init.parm = init.parm, : Both doses and responses should be numeric.

WTF? "Both doses and responses should be numeric"? But they are!

is.numeric(data$dose)
## [1] TRUE
is.numeric(data$response)
## [1] TRUE

The error is thrown by these lines in the source:

if(!is.numeric(dose)||!is.numeric(response)) {
    stop("Both doses and responses should be numeric.")
  }

Let's try:

if(!is.numeric(data$dose)||!is.numeric(data$response)) {
    stop("Both doses and responses should be numeric.")
  } else {
    print("Where did the problem go?")
  }
## [1] "Where did the problem go?"

No idea. Did it disappear? No:

dr4pl(data, dose, response)
## Error in dr4pl.default(dose = dose, response = response, init.parm = init.parm, : Both doses and responses should be numeric.

Looking at the data might give us an idea of the source of the problem:

str(data)
## Classes 'tbl_df', 'tbl' and 'data.frame':    10 obs. of  2 variables:
##  $ dose    : int  1 2 3 4 5 6 7 8 9 10
##  $ response: int  2 3 4 5 6 7 8 9 10 11

Both dose and response are integers. Might that be the problem?

data <- tibble(dose= (1:10)*1.1, response = (2:11)*1.1)
dr4pl(data, dose, response)
## Error in dr4pl.default(dose = dose, response = response, init.parm = init.parm, : Both doses and responses should be numeric.

Nope. Both dose and response are now definitely numeric:

str(data)
## Classes 'tbl_df', 'tbl' and 'data.frame':    10 obs. of  2 variables:
##  $ dose    : num  1.1 2.2 3.3 4.4 5.5 6.6 7.7 8.8 9.9 11
##  $ response: num  2.2 3.3 4.4 5.5 6.6 7.7 8.8 9.9 11 12.1

But the problem persists.

Might this be the reason?

head(data,2)
## # A tibble: 2 x 2
##    dose response
##   <dbl>    <dbl>
## 1   1.1      2.2
## 2   2.2      3.3

It is a tibble. And the variables are reported to be doubles.

But according to the documentation:

“numeric is identical to double (and real)”

That should not be a problem then.

However. In desperation, I tried this:

c <- data %>% 
  as.data.frame() %>% 
  dr4pl(dose, response)

And it works!

Why? Absolutely no idea!

Or do I?

The problem is that subsetting a tibble returns a new tibble. Well, subsetting a dataframe returns a dataframe as well?

It does. Unless you subset out a single variable or observation.

For a dataframe, df[, "var"] returns a vector, containing the values of var in df.

If, however, df is a tibble, df[, "var"] will return a new tibble with just one variable. And a one-column tibble is not numeric, no matter what it contains – which is exactly what the is.numeric() check inside dr4pl complained about.
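A quick way to see the difference (a minimal sketch):

df <- data.frame(dose = 1:10)
tb <- tibble(dose = 1:10)

is.numeric(df[, "dose"])  # TRUE - the dataframe drops to a plain vector
is.numeric(tb[, "dose"])  # FALSE - the tibble stays a one-column tibble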

StarLog – The main character. Enterprise!

In what ways does the U.S.S. Enterprise function as a character, not just a vehicle in Star Trek? Does “she” have a personality? Do the other ships in the Star Trek universe have the same level of character development?

To be honest, I do not think that the starships can actually be viewed as characters. Yes, Enterprise is named, gendered, and the "real" characters talk to her. And the computer responds. But as a general rule, the computer has no self-awareness, does not solve problems (except at the prompt of the real characters), and only has a personality to the extent that the real characters project their own perceptions and ideas onto the ship.

We like to talk about Enterprise as a character. And Star Trek would not have been Star Trek without Enterprise. But I flatly dismiss the idea that Enterprise is actually a character in her own right.

StarLog – the future of propulsion

Where do you think ion propulsion and future engine technology will take us? What are the dangers? Are there other applications?

It will take us further out into space! But it will not only take us to places far out in space – it will also take us to more local spaces. Solving the problems related to it will give us new technology. And we have no idea where that will take us, in the same way that we did not know that the general theory of relativity and quantum mechanics would bring us GPS navigation.

The dangers? Hard to say. I would claim that we should be mindful not to use up all our reserves of xenon for this. Other than that: probably none. Unless it turns out that ion propulsion damages subspace in some way.

Finally. All four pips:

Starlog – diversity

More homework for Star Trek: Inspiring Culture and Technology.

Why is it important to see yourself on television? Why is television an important subject for scholarly study and how does what we watch shape the world we live in?

Also:

Scott asks if you think we’re getting closer to realizing the Vulcan philosophy of IDIC (Infinite Diversity in Infinite Combinations) here on Earth. What would it take for that to happen? What would it look like? How might things be different?

The first question – well, Scott answers it. It is important to see yourself on Star Trek, because it shows a vision of a future that has room for people like me.

I know it from myself. Seeing Jadzia Dax kiss her former wife in "Rejoined" was important. It was not a lesbian kiss as such. And yet it was, and as a gay man, that actually meant something. Especially since we had waited so long to see any representation of gay and lesbian characters in Star Trek – something Roddenberry had promised would be addressed in season five of TNG.

This makes a difference. We all, to a certain degree, view ourselves through the stories we are told or shown. Small girls (and some boys) imagine themselves to be princesses when they are read fairytales. Grown men (and some women) imagine themselves to be action heroes when they watch Die Hard. And seeing yourself, or someone like you, portrayed positively makes a huge difference.

That is the real, true promise of Star Trek: that the diversity we have on Earth today – not always ideal – will live on in the future, in a more positive and meaningful way than it does for young people today, no matter what their circumstances. In the words of Dan Savage: it will get better.

That Vulcan ideal, Infinite Diversity in Infinite Combinations, is still in the future. But not as far off as it has been. What would it take to get closer? I could come up with a lot of politically correct suggestions about educating the ignorant and fighting racism. But I think the underlying problem is scarcity, and a clash of cultures. Make sure that immigration does not mean that I have to pay more in taxes to support unemployed immigrants, and that they do not threaten my livelihood by putting pressure on wages. And make sure that people of all cultures accept the basic foundations of a liberal society (liberal in the original sense, not necessarily the American political sense): democracy, equality of the sexes, and non-discrimination. Then I think we will be all right. Not that we won't still have idiots discriminating. But we should try to move to a point where idiots cannot get away with justifying their discrimination with religion and/or culture.

If we can do that – and, despite bumps in the road, we are getting ever closer – we will be able to realise the alien ideal of IDIC.

Maybe that also explains why television makes sense as a subject of scholarly study. Television is one of the most important common representations of popular culture. We place ourselves in front of this entertainment for an inordinate amount of time every day. It affects a lot of people, in diverse ways. If that should not be an important subject of study, I don't know what should be.

And that gives me the rank I would really like. Commander: