Starlog – diversity

More homework for Star Trek: Inspiring Culture and Technology.

Why is it important to see yourself on television? Why is television an important subject for scholarly study and how does what we watch shape the world we live in?

Also:

Scott asks if you think we’re getting closer to realizing the Vulcan philosophy of IDIC (Infinite Diversity in Infinite Combinations) here on Earth. What would it take for that to happen? What would it look like? How might things be different?

The first question. Well, Scott answers it: it is important to see yourself on Star Trek, because it shows a vision of a future that has room for people like me.

I know it from myself. Seeing Jadzia Dax kiss her former wife in "Rejoined" was important. It was not a lesbian kiss as such. And yet it was, and as a gay man, that actually meant something. Especially since we had waited so long to see any representation of gay and lesbian characters in Star Trek, something Roddenberry had promised would be addressed in season five of TNG.

This makes a difference. We all, to a certain degree, view ourselves through the stories we are told or shown. Small girls (and some boys) imagine themselves to be princesses when they are read fairytales. Grown men (and some women) imagine themselves to be action heroes when they watch Die Hard. And seeing yourself, or someone like you, portrayed positively makes a huge difference.

That is the true promise of Star Trek: that the diversity we have on Earth today – not always handled ideally – will live on in the future, in a more positive and meaningful way than young people experience it now, no matter what their circumstances. In the words of Dan Savage: it gets better.

That Vulcan ideal, Infinite Diversity in Infinite Combinations, is still in the future. But not as far off as it has been. What would it take to get closer? I could come up with a lot of politically correct suggestions about educating the ignorant and fighting racism. But I think the underlying problems are scarcity and a clash of cultures. Make sure that immigration does not mean that I have to pay more in taxes to support unemployed immigrants, and that they do not threaten my livelihood by putting pressure on wages. And make sure that people of all cultures accept the basic foundations of a liberal society (in the original sense, not necessarily the American political sense): democracy, equality of the sexes, and non-discrimination. Then I think we will be all right. Not that we won't still have idiots discriminating. But we should try to move to a point where idiots cannot get away with justifying their discrimination with religion and/or culture.

If we can do that – and despite bumps in the road, we are getting ever closer – we will be able to realise the alien ideal of IDIC.

Maybe that also explains why television makes sense as a subject of scholarly study. Television is one of the most important common representations of popular culture. We sit in front of this entertainment for an inordinate amount of time every day. It affects a lot of people, in diverse ways. If that should not be an important subject of study, I don't know what should be.

And that gives me the rank I would really like: Commander.

Where to see Giant Pandas

Zoos with Giant Pandas in their exhibitions, as of mid-March 2019.

And Copenhagen Zoo, which will get its pandas in April.

Corresponding value to a max-value

One of our users needs to find the maximum value of a variable. He also needs to find the corresponding value in another variable.
As in: the maximum value in column A is in row 42. What is the value in column B, row 42?

And of course we need to do it for several groups.

Let us begin by making a dataset, with three groups in id:

library(tidyverse)

id  <- 1:3
val <- c(10, 20)
kor <- c("a", "b", "c")

# all combinations of id and val, sorted by group, with a letter column added
example <- expand.grid(id, val) %>% 
  as_tibble() %>% 
  arrange(Var1) %>% 
  cbind(kor, stringsAsFactors = FALSE) %>% 
  rename(group = Var1, value = Var2, corr = kor)

example
##   group value corr
## 1     1    10    a
## 2     1    20    b
## 3     2    10    c
## 4     2    20    a
## 5     3    10    b
## 6     3    20    c

We have six observations, divided into three groups. They all have a value, and a letter in “corr” that is the corresponding value we are interested in.

So. In group 1 we should find the maximum value 20, and the corresponding value “b”.
In group 2 the max value is still 20, but the corresponding value we are looking for is "a".
And in group 3 the max value is yet again 20, but the corresponding value is now “c”.

How to do that?

example %>%
  group_by(group) %>% 
  mutate(max = max(value)) %>%               # the maximum within each group
  mutate(max_corr = corr[value == max]) %>%  # corr at the position of that maximum
  ungroup()
## # A tibble: 6 x 5
##   group value corr    max max_corr
##   <int> <dbl> <chr> <dbl> <chr>   
## 1     1   10. a       20. b       
## 2     1   20. b       20. b       
## 3     2   10. c       20. a       
## 4     2   20. a       20. a       
## 5     3   10. b       20. c       
## 6     3   20. c       20. c

The maximum value for all groups is 20. And the corresponding value to that in the groups is b, a and c respectively.

Isn't there an easier solution using the summarise() function? Probably. But our user needs to do this for a lot of variables, and their names have nothing in common.
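
For this simple case, there is a slightly shorter variant of the same idea, using which.max() to pick the position of the maximum directly. A sketch – like the logical indexing above, it assumes the maximum is unique within each group:

example %>%
  group_by(group) %>% 
  mutate(max = max(value),
         max_corr = corr[which.max(value)]) %>% 
  ungroup()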

Digital Natives

One can only hope that the concept of "Digital Natives" will soon be laid to rest. Or at least all the ideas about what they supposedly can do.

A digital native is a person who grew up in the digital age, in contrast to digital immigrants, who gained their familiarity with digital systems as adults.

And there are differences. Digital natives assume that everything is online. Stuff that is not online does not exist. Their first instinct is digital.

However, in the library world, and in a lot of other places, the idea has been that digital natives, because they have never experienced a world without computers, grok them. That they just know how to use them, and how to use them in a responsible and effective way.

That is, to use a technical term, bovine feces. And for far too long, libraries (and others) have ignored the real needs, assuming that there was suddenly no need for instruction in IT-related issues. Because digital natives.

Being a digital native does not mean that you know how to code.

Being a digital native does not mean that you know how to google efficiently.

Being a digital native does not mean that you are magically endowed with the ability to discern fake news from facts.

I myself am a "car native". I have grown up in an age where cars were ubiquitous. And I still had to take the test twice before getting my license. I was not able to drive a car safely just because I had never known a world without cars. So why do we assume that a digital native should be able to use a computer efficiently?

The next project

For many years, from 1977 to 2006, there was a regular feature in the journal of the Danish Chemical Society: "Kemiske småforsøg", or "Small chemical experiments". It was edited by the founder of the Danish Society for Historical Chemistry, and contained a lot of interesting chemistry, some of it with a historical angle.

The Danish Society for Historical Chemistry is considering collecting these experiments, and publishing them. It has been done before, but more experiments were published after that.

We still don’t know if we will be allowed to do it. And it is a pretty daunting task, as there are several hundred experiments. But that is what I’m spending my free time on at the moment. If we get it published, it will be for sale on the website of the Danish Society for Historical Chemistry.

Project Euler 39

We’re looking at Pythagorean triplets, that is, sets of integers a, b and c where:

a^2 + b^2 = c^2

The triangle defined by a,b,c has a perimeter.

The triplet 20, 48, 52 fulfills the equation: 20^2 + 48^2 = 52^2 (400 + 2304 = 2704). And the perimeter of the triangle is 20 + 48 + 52 = 120.

Which perimeter p, smaller than 1000, has the most solutions?

So, we have two equations:

a^2 + b^2 = c^2

p = a + b + c

We can write

c = p – a – b

And substitute that into the first equation:

a^2 + b^2 = (p – a – b)^2

Expanding the parenthesis:

a^2 + b^2 = p^2 + a^2 + b^2 – 2ap – 2bp + 2ab

Cancelling a^2 and b^2 on both sides:

0 = p^2 – 2ap – 2bp + 2ab

Isolating b:

0 = p^2 – 2ap – b(2p – 2a)

b(2p – 2a) = p^2 – 2ap

b = (p^2 – 2ap)/(2p – 2a)
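
As a quick check, take the triplet from before, p = 120 and a = 20: b = (120^2 – 2·20·120)/(2·120 – 2·20) = (14400 – 4800)/(240 – 40) = 9600/200 = 48, as expected.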

So: for a given value of p, we can run through all possible values of a and compute b. If b is an integer, we have a solution that satisfies the constraints.

The smallest value of a we need to check is 1. But what is the largest value of a for a given value of p?

We can assume that a <= b < c: a might be larger than b, but then we can just switch a and b, and b < c follows from the Pythagorean equation. Since p = a + b + c >= 3a, it follows that a <= p/3.

What else? If a and b are both even, a^2 and b^2 are also even, so c^2 is even, and then c is even. Even + even + even is even, so p = a + b + c is also even.

If a and b are both odd, a^2 and b^2 are both odd, so c^2 is even, and c is then even. Odd + odd + even is even, so p must again be even.

If exactly one of a and b is odd, exactly one of a^2 and b^2 is odd. Then c^2 is odd, and c is then odd. Odd + even + odd is even, so p must be even in this case too.

So: p is always even, and I only need to check even values of p. That halves the number of values to check.

Alright, time to write some code:

current_best_number_of_solutions <- 0

for(p in seq(2, 1000, by = 2)){        # only even perimeters need checking
  solutions_for_current_p <- 0
  for(a in 1:ceiling(p/3)){            # a <= p/3, as argued above
    # b = (p^2 - 2*a*p)/(2*p - 2*a); a remainder of 0 means b is an integer
    if((p^2 - 2*a*p) %% (2*p - 2*a) == 0){
      solutions_for_current_p <- solutions_for_current_p + 1
    }
  }
  if(solutions_for_current_p > current_best_number_of_solutions){
    current_best_p <- p
    current_best_number_of_solutions <- solutions_for_current_p
  }
}

answer <- current_best_p

current_best_number_of_solutions is initialized to 0.

For every p from 2 to 1000, in steps of 2 (only checking even values of p), I set solutions_for_current_p to 0.

For every value of a from 1 to p/3, rounded up to an integer: if (p^2 - 2*a*p) %% (2*p - 2*a) == 0, that is, if the remainder of (p^2 - 2*a*p)/(2*p - 2*a) is 0, then b is an integer, and I increment solutions_for_current_p.

After running through all possible values of a for the value of p we have reached in the for-loop:

If the number of solutions for this value of p is larger than the previous current_best_number_of_solutions, we have found a value of p with more solutions than any value of p we have examined so far. In that case, set current_best_p to the current value of p, and current_best_number_of_solutions to the number of solutions we have found for it.

If not, don't change anything; just reset solutions_for_current_p and check a new value of p.
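
For comparison, here is a sketch of the same search in a more vectorised style, with the same even-p restriction and the same bound on a (count_solutions is just an illustrative helper name):

# count the values of a for which b comes out as an integer, for a given p
count_solutions <- function(p) {
  a <- 1:ceiling(p/3)
  sum((p^2 - 2*a*p) %% (2*p - 2*a) == 0)
}

p_values <- seq(2, 1000, by = 2)
counts <- sapply(p_values, count_solutions)
p_values[which.max(counts)]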

Project Euler 4

A palindromic number is similar to a palindrome: it reads the same from left to right and from right to left.

Project Euler tells us that the largest palindrome made from the product of two 2-digit numbers is 9009. That number is made by multiplying 91 and 99.

I must now find the largest palindrome, made from the product of two 3-digit numbers.

What is given is that the three-digit numbers cannot end with a zero: the product would then also end in zero, and a palindrome ending in zero would have to begin with zero, which is impossible.

There are probably other restrictions as well.

I’ll need a function that tests if a given number is palindromic.

library(stringr)

palindromic <- function(x){
  # reverse the digits and compare with the original number (as character)
  sapply(x, function(x) str_c(rev(unlist(str_split(as.character(x), ""))), collapse = "") == as.character(x))
}

The inner function converts x to character, splits it into individual characters, unlists the result, reverses it, and concatenates it back into a string. That string is then compared to the original x, converted to a character.
The sapply part more or less vectorises it. But it is still the slow part.
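
A quick sanity check – 9009 is palindromic, 9008 is not:

palindromic(c(9009, 9008))
## [1]  TRUE FALSE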

If I could pare the number of numbers down, that would be nice.

One way would be to compare the first and last digits in the number.

first_last <- function(x) { 
  # does the first digit (integer division by 10^(digits - 1)) equal the last (x %% 10)?
  x %/% 10^(floor(log10(x))) == x %% 10
}

floor(log10(x)) is the number of digits in x minus 1. Integer division of x by 10 to that power gives the first digit, which I compare with the last digit, x %% 10. If the first and last digits are the same, the function returns TRUE.
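
A quick check – 9009 starts and ends with 9, 9876 does not:

first_last(c(9009, 9876))
## [1]  TRUE FALSE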

Now I am ready. Generate a vector of all three-digit numbers from 101 to 999, and expand the grid to get all combinations. Convert to a tibble,
filter out the three-digit numbers that end with 0, calculate a new column as the product of the two numbers, drop the results where the first and last digit are not identical, and then drop the results that are not palindromic. Finally, pass it to max() (using %$% to access the individual variables), and get the result.

library(dplyr)
library(magrittr)

res <- 101:999 %>% 
  expand.grid(., .) %>% 
  as_tibble() %>% 
  filter(Var1 %% 10 != 0, Var2 %% 10 != 0) %>%  # drop numbers ending in 0
  mutate(pal = Var1 * Var2) %>% 
  filter(first_last(pal)) %>% 
  filter(palindromic(pal)) %$% 
  max(pal)

There are probably faster ways of doing this…
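
One obvious saving, sketched below: every product appears twice in the grid above (as Var1*Var2 and as Var2*Var1), so keeping only the pairs where the first factor is less than or equal to the second halves the work. The column names a and b are just illustrative:

res2 <- expand.grid(a = 101:999, b = 101:999) %>% 
  filter(a <= b, a %% 10 != 0, b %% 10 != 0) %>%  # each pair once, no trailing zeros
  mutate(pal = a * b) %>% 
  filter(first_last(pal), palindromic(pal)) %$% 
  max(pal)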

Vision. Mission. Strategy. Tactics? Values?

I have a love-hate relationship with visions and missions. The ones that companies and organizations spend a lot of time and resources on developing. They often take a rather formulaic form:

We exist to restore intellectual capital whilst continuing to collaboratively coordinate information.

This is actually a mission statement from a mission statement generator.

I do love the idea of missions, visions and strategies. It speaks to the engineer in me. We should define the goal we want to achieve, and then break that goal down into individual work-packages. When we have completed all of them, we have achieved our goal. The logical framework approach is a good example.

I also like values. I need consistency. Or, I don't need it, but it is important to me. I like people and organizations to actually be consistent in their actions. Walk the talk! If you want a work environment that does not discriminate – do not discriminate. Against anyone. In any way. I do not really mind if you discriminate based on gender. Go ahead. Just be honest about it. In reality, I will of course hate you if you discriminate based on gender. But the hate will take on a deeper and more incandescent nature if you discriminate based on gender while claiming that you are all about equality.

So – I love strategies. I love visions. I love missions. I love values.

On the other hand, I hate them. Most of them are exactly like the example above – an example taken from a mission statement generator. No one is able to explain the difference between mission and vision, the values all fail the negation test, and they tend to end up being rather tautological. Often they are self-contradictory. You can have loyalty in all situations. Or you can have honesty in all situations. But you can't have both: sometimes being loyal means holding back on honesty a bit. How do you prioritize your values? I have never seen a description of a hierarchy.

Usually it doesn’t matter. Most values in organizations tend to be nothing more than hot air. And as hot air, they are easy to dismiss when it is opportune. Try it yourself: tell your boss that her decision is wrong, because it is in conflict with the defined values of the organization.

So – I love values. But not when they are meaningless.

And if they are – don’t bother defining them. They are just going to be a waste of time.

Openstreetmap data – for Florence

Not that advanced, but I wanted to play around a bit with plotting the raw data from OpenStreetMap.

We’re going to Florence this fall. It’s been five years since we last visited the fair city, which has played such an important role in Western history.

OpenStreetMap is, as the name implies, open.

I’m going to need some libraries:

library(osmar)
library(ggplot2)
library(broom)
library(geosphere)
library(dplyr)

osmar provides functions to interact with OpenStreetMap, ggplot2 is used for the plots, broom for tidying some objects, and dplyr for manipulating data. (geosphere is loaded as well, but is not actually used in what follows.)

Getting the raw data requires me to define a bounding box, encompassing the part of Florence I would like to work with. Looking at https://www.openstreetmap.org/export#map=13/43.7715/11.2717, I choose these coordinates:

top <- 43.7770
bottom <- 43.7642
left <- 11.2443
right <- 11.2661

After that, I can define the bounding box, and tell the osmar functions at what URL the relevant API can be found (this is just the default). Then I can retrieve the data via get_osm(). I immediately save it to disk: the download takes some time, and there is no reason to do it more than once.

box <- corner_bbox(left, bottom, right, top)
src <- osmsource_api(url = "https://api.openstreetmap.org/api/0.6/")
florence <- get_osm(box, source=src)
saveRDS(florence, "florence.rda")
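
On later runs, the download can be skipped by reading the saved object back from disk:

florence <- readRDS("florence.rda")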

Let’s begin by making a quick plot:

plot(florence, xlim = c(left, right), ylim = c(bottom, top))

[Plot: the raw OSM data for the Florence bounding box]

Note that the plot includes, among other things, all lines that are only partly inside the box. If a line extends beyond the box, we get the whole line.

Looking at the data:

summary(florence$ways)
## osmar$ways object
## 6707 ways, 9689 tags, 59052 refs 
## 
## ..$attrs data.frame: 
##     id, visible, timestamp, version, changeset, user, uid 
## ..$tags data.frame: 
##     id, k, v 
## ..$refs data.frame: 
##     id, ref 
##  
## Key-Value contingency table:
##         Key         Value Freq
## 1  building           yes 4157
## 2    oneway           yes  456
## 3   highway    pedestrian  335
## 4   highway   residential  317
## 5   bicycle           yes  316
## 6       psv           yes  122
## 7   highway  unclassified  108
## 8   highway       footway  101
## 9   barrier          wall   98
## 10  surface paving_stones   87

I would like to plot the roads and buildings. There are a lot of highways of a kind I would probably not call highways: in OpenStreetMap, "highway" is the generic key for any kind of road, street or path, down to footpaths.

Anyway, let’s make a list of tags. tags() finds the elements that have a key in tag_list, way() finds the ways that are represented by these elements, and find() finds the IDs of the objects in "florence" matching this.
find_down() finds all the elements related to these IDs. And finally we take the subset of the large florence dataset that has IDs matching the IDs we found before.

tag_list <- c("highway", "bicycle", "oneway", "building")
dat <- find(florence, way(tags(k %in% tag_list)))  # ids of ways with a relevant tag
dat <- find_down(florence, way(dat))               # all elements belonging to those ways
dat <- subset(florence, ids = dat)

Now, in a couple of lines, I’m going to tidy the data. That removes the information about the type of line. As I would like to be able to color highways differently from buildings, I need to keep that information.
So first I save the key part of the tags, together with the id:

types <- data.frame(dat$ways$tags$k, dat$ways$tags$id)
names(types) <- c("type", "id")

This gives me the key parts of all the tags. I’m only interested in a subset of them:

types <- types %>% 
  filter(type %in% tag_list)

# convert id to character, so it matches the id column in the tidied data below
types$id <- as.character(types$id)

Next as_sp() converts the osmar object to a spatial object (just taking the lines):

dat <- as_sp(dat, "lines")

tidy() (from the broom library) converts it to a tidy data frame:

dat <- tidy(dat)

That data frame is missing the types, so those are joined back on:

new_df <- left_join(dat, types, by="id")

And now we can plot:

new_df %>% 
  ggplot(aes(x = long, y = lat, group = group)) +
  geom_path(aes(color = type)) +
  scale_color_brewer() +
  xlim(left, right) +
  ylim(bottom, top) +
  theme_void() +
  theme(legend.position = "none")

[Plot: roads and buildings in central Florence, colored by type]

Nice.

What’s next? Something like what is on this page: https://github.com/ropensci/osmplotr