Project Euler 5 – Smallest multiple

What is the smallest, positive, number that can be divided by all numbers from 1 to 20 without any remainder?

We are given that 2520 is the smallest that can be divided by all numbers from 1:10.

One number that can definitely be divided by all numbers from 1:20 is:

factorial(20)
## [1] 2.432902e+18

But given that

factorial(10)
## [1] 3628800

is rather larger than 2520, it is definitely not the answer.

The answer must be a multiple of all the primes smaller than 20. A number that is divisible by 15, will be divisible by
3 and 5.

The library “numbers” have a lot of useful functions. Primes(20) returns all primes smaller than 20, and prod() returns the product of all those primes

library(numbers)
prod(Primes(20))
## [1] 9699690

Could that be the answer?

What we are looking at is the modulo-operator. 9699690 modulo 2 – what is the remainder? We know that all the remainders, dividing by 1 to 20 must be 0.

prod(Primes(20)) %% 2
## [1] 0

And our large product is divisible by 2 without a remainder.

Thankfully the operator is vectorized, so we can do all the divisions in one go:

9699690 %% 1:20
##  [1]  0  0  0  2  0  0  0  2  3  0  0  6  0  0  0 10  0 12  0 10

Nope.

9699690 %% 4
## [1] 2

Leaves a remainder.

(2*9699690) %% 4
## [1] 0

Now I just need to find the number to multiply 9699690 with, in order for all the divisions to have a remainder of 0.
That is, change i in this code until the answer is true.

i <- 2
all((i*9699690) %% 1:20 == 0)
## [1] FALSE

Starting with 1*9699690, I test if all the remainders of the divisions by all numbers from 1 to 20 is zero.
As long as they are not, I increase i by 1, save i*9699690 as the answer, and test again.
If the test is TRUE, that is all the remainders are 0, the while-loop quits, and I have the answer.

i <- 1
while(!all((i*9699690) %% 1:20 == 0)){
 i <- i + 1
 answer <- i*9699690
}

Project Euler 187

Project Euler 187. Semiprimes

I had to look it up. Semiprimes are numbers that are the product of two prime numbers. And only two, although they may be equal.

There are ten of them below 30: 4, 6, 9, 10, 14, 15, 21, 22, 25 and 26.

16 is not. The only primefactor is 2, but it occurs four times.

How many of these semiprimes are there below 108?

That should be pretty straightforward: Generate all primes below 108, make all the multiplications, and count how many uniqe numbers there are below n, where n=108.

One problem:

n <- 10**8
numbers <- primes(n)
length(numbers)
## [1] 5761455

That is a lot of numbers to multiply.

A trick: 2 times all the primes below n/2 will give all the semiprimes that have 2 as one of the primefactors (smaller than n).

3 times all the primes below n/3 will in the same way give all the semiprimes, that have 3 as one of the primefactors.

If I can figure out how many primes there are below n/2, I get the number of semiprimes that has 2 as one of the two primefactors. The same for the number of primes below n/3. If continue that to \(\sqrt(n)\), and add them all together, I should get the total number of semiprimes below n.

One issue though. The first prime below n/2 that I multiply by 2, is 3. And the first prime below n/3 that I multiply by 3 is 2. Both giving 6. I need to figure out how to only count 6 one time.

I just generated all the primes below n. The number of primes below n/2 is:

length(numbers[numbers<n/2])
## [1] 3001134

And the number of primes below n/3 is:

length(numbers[numbers<n/3])
## [1] 2050943

I do want to multiply 3 by 3. But I need to exclude 2.

length(numbers[numbers<n/3 & numbers>=3])
## [1] 2050942

Qap’la! One less prime.

I just need to do it for all of them.

n_semi_primes <- function(x){
  counter <- 0
  for(i in numbers[numbers<=sqrt(x)]){
    counter <- counter + length(numbers[numbers<x/i & numbers>=i])
  }
  return(counter)
}

I’m writing this as a function, taking a limit x. A counter is set to 0. And for all primes i less than \(\sqrt(n)\), I add the number of primes between i and < x/i.

I can test it on the example given:

n_semi_primes(30)
## [1] 10

That was the number of semiprimes below 30. And then it is just a question of running it on 108:

answer <- n_semi_primes(10**8)

Project Euler problem 62

Euler problem 62

The cube, 41063625 (3453), can be permuted to produce two other cubes: 56623104 (3843) and 66430125 (4053). In fact, 41063625 is the smallest cube which has exactly three permutations of its digits which are also cube.

Find the smallest cube for which exactly five permutations of its digits are cube.

Alright. I need to find five cubes, that are permutations of the same digits.

How to check if two numbers are permutations of each other?

We can generate the largest permutation of a given number. If the largest permutation of two numbers are identical, the two numbers are permutations of each other.

So I need a function, that returns the largest permutation of a number. It would be nice, if that function was vectorized.

max_perm <- function(t){
  require(magrittr)
  options(scipen=5)
  t %>% 
    as.character() %>% 
    str_split("") %>% 
    lapply(sort, decreasing=TRUE) %>% 
    lapply(paste0, collapse="") %>% 
    unlist() %>% 
    as.double()
}

Convert the input to character. Split at “”. That returns a list with vectors containing the individual digits of the input. lapply sorts the individual vectors in the list in decreasing order. Then lapply pastes the elements in each vector together with paste0 and “” as the separator. Then it is unlisted, and returned as numeric.

What is worth noting is a thing I was struggling with for far too long. R likes to write numbers in scientific notation. As in “1e+06”. I have not studied the phenomenon in detail. But options(scipen=5) solves the problem. It is the “penalty” used to decide when a number should be written in scientific notation. Unless I change that (trial and error, but it should be larger than whatever is default), as.character(1000000) will return “1e+06”. And the permutations of “1” “e” “+” “0” “6” are not terribly useful in this context.

I’m hazarding a guess that I don’t need to handle cubes of values of more than four digits.

Beginning with a vector of all numbers from 1 to 9999, I convert it to a dataframe. I transmute the first column to a column with the name x.
Then I mutate a second column, cube, into existence, and calculate it as the cube of the x-value. A third column, max_cube, is mutated with the result from my max_perm function above. And tha column is immediately used to group the data, so I get date grouped by identical maximum values of the permutations. I filter on the count of those groups, and only keep the groups that contain 5 elements. Then I ungroup it, and select just the cube column.

I now have a data frame with a single column containing 10 values. They are all cubes, five of them are permutations of each other. The other five are also permutaions of each other. And now I just have to take the smallest of them.

result <- 1:9999 %>% 
  as.double() %>% 
  as.data.frame() %>% 
  transmute(., x = .) %>% 
  mutate(cube = x**3) %>% 
  mutate(max_cube = max_perm(cube)) %>% 
  group_by(max_cube) %>% 
  filter(n()==5) %>% 
  ungroup() %>% 
  select(cube) %>% 
  min()

Before I print the result, so I can enter it into Project Euler, I set options(digits=17).

Done! A good exercise. And a valuable lesson in the importance of the options in R.

Project Euler 203

Project Euler 203

Squarefree Binomial Coefficients. Building Pascals triangle to 51 rows (look it up – its a bloody nightmare to write out here): Locate all distinct numbers in that triangle, that does not divide by a square of a prime.

OK. First step is to get all the numbers in 51 rows of Pascals triangle.

R has a built in function, choose(i,j) that returns the number for row i, position j.

We can use that to iterate through all the possible positions in a 51 row triangle:

numbers <- 1
for(i in 0:50){
  for(j in 0:i){
    numbers <- c(numbers, choose(i,j))
  }
}

Next step is to make sure that we only have unique, distinct, values:

numbers <- unique(numbers)

We now need to divide each number by the square of a prime. My first instinct was to generate all primes smaller than the squareroot of the largest number in numbers.

That would be all primes lower than 11,243,247.

I would then square all those primes, and see if one of them divided neatly into the numbers in the triangel.

Thats an awfull lot of primes.

Much easier would be to note, that if the square of a prime divides neatly into a number, then the prime does as well. And is in fact a prime factor in that number.

And since we have a nice library that makes it easy to get the primefactors, thats the way to do it.

library(numbers)
library(purrr)

res <- numbers %>%
    discard(function(x) any(x%%(factors(x)**2)==0))

answer <- sum(res)+1

Passing the numbers vector to the discard function. Discard the element, if the element, modulo the primefactors in that element squared, has one (or more) results that are equal to 0.

The answer is the sum of the elements that are left. Plus 1, since 1 is discarded by the function. factors(1) – for some reason – returns 1.

Nice and simple really.

An engineers’ dream come true

Project Euler, problem 263 – An engineers’ dream come true

This is, at the time of coding, the highest numbered Project Euler problem I’ve tried to tackle. With a difficulty rating of 75% it is also the most difficult. At least on paper. But An engineers’ dream come true? How can I not, as an engineer, not try to solve it?

We har looking for an n, for which it holds that:

  • n-9 and n-3 must be consecutive primes
  • n-3 and n+3 must also be consecutive primes
  • n+3 and n+9 must also be consecutive primes.

These are primepairs that are “sexy”, that is that have differences of 6.

Also, n, n-8, n-4, n+4 and n+8 must be practical numbers, that is numbers where the numbes 1:n can be written as sums of distinct divisors of n.

So if a number n gives sexy prime pairs, and are very practical – that is what an engineer dreams of – hence the paradise.

The fastest way seems to be to generate a list of primes, sieve those out that conforms to the requirements for consecutive primes, and then test those numbers for practicality.

Lets get started!

The trusty numbers library provides the primes, up to 1000000. Then for each of those primes, return the n-value, if the differences between the sets of primes, are 6.

library(numbers)

primenumbers <- Primes(1000000)
primecandidates <- numeric()
for(i in 1:(length(primenumbers)-4)){
if((primenumbers[i+1] - primenumbers[i] == 6) & (primenumbers[i+2] - primenumbers[i+1]== 6) & (primenumbers[i+3] - primenumbers[i+2]== 6)){
  primcandidates <- c(primecandidates,(primenumbers[i]+9))
  }
}

What we get from this, is not primenumbers, but values of n, that gives the consecutive primes we need.

Now I have a list of candidate n’s based on the requirements for primes. Next I need to check for practicality.

First I tried a naive way. Generate the list of numbers that I need to sum to using distinct divisors, for a given x.
Then get the divisors of x. I dont need to check if I can sum to the numbers that are themselves divisors, so excluding them leaves me with at slightly smaller set. Then I get all the ways I can take two divisors from the set of divisors. Sum them, and exclude them from the list of numbers. I continue until I have taken all the possible combinations of 2, 3, 4 etc divisors, based on how many there are. If there are no numbers left in the vector of numbers that I need to be able to sum to, I was able to express all those numbers as sums of distinct divisors. And then x was a practical number.

practical <- function(x){
  test <- 1:x
  divs <- divisors(x)
  test <- setdiff(test,divs)
  for(i in 2:length(divs)){
    test <- setdiff(test,combn(divs,i,sum))
  }
  !as.logical(length(test))
}

Two problems. One that can be fixed. I continue to generate combinations of divisors and summing them, even if I have already found ways to sum all the numbers. The second problem is more serious. When I have to test a number with really many divisors – it takes a long time. Also, handling a vector containing all numbers 1:1000000 takes a lot of time.

I need a faster way of checking for practicality.

Wikipedia to the rescue. There exists a way of checking. I have no idea why it works. But it does.

For a number x, larger than 1, and where the first primefactor is 2. All primefactors are ordered. Taking each primefactor, that has to be smaller than or equal to the sum of the divisors of the product of all the smaller primefactors. Plus one. Oh! And that sum – if 3 is a primefactor twice, that is if 32 is a factor, I should square 3 in the product.

That sounds simple.

For a number x, get the primefactors. Use table to get the counts of the primefactors, ie that 3 is there twice. Those are the values of the result from the table function. The names of the table function are the primefactors.

For each factor from number 2 to the end of the number of factors, get the names of the primefactors from number 1 to just before that factor we are looking at (as numeric). Exponentiate with the values from the table – that is how many times a primefactor is a primefactor. Generate the product, get the divisors of that product, sum them, and add 1. If the factor we were looking at is larger that that, we have determined that x is not practical – and can return FALSE. If x gets through that process, it is practial.

I need to handle the case where there is only one primefactor – 2. Those numbers are practial, but the way I have done the check breaks when there is only one primefactor. Its simple enough, just check if there is only one distinct primefactor, and return TRUE in that case.

practical <- function(x){
  all_factors <- factors(x)
  all_factors <- table(all_factors)
  n_factors <- length(all_factors)
  res <- TRUE
    if(n_factors ==1){
    return(TRUE)
    break()
  }

  for(i in 2:n_factors){
    if((as.numeric(names(all_factors)[i]))>(sum(divisors(prod(as.numeric(names(all_factors)[1:i-1])**as.numeric(all_factors)[1:i-1])))+1)){
      return(FALSE)
      break()
      }
  }
  return(TRUE)
}

So, quite a bit faster!

Now I can take my candidate n’s based on the requirements for primepairs, and just keep the n’s that are themselves practical. And where n-8, n-4, n+4 and n+8 are also practial:

eng_numbers <- primecandidates %>%
  keep(function(x) practical(x-8)) %>%
  keep(function(x) practical(x-4)) %>%
  keep(function(x) practical(x)) %>%
  keep(function(x) practical(x+4)) %>%
  keep(function(x) practical(x+8))

eng_numbers
## numeric(0)

OK. There was no n’s in this set.

This is kinda problem. The n we are looking for are actually pretty large. I know this, because this writeup is written after I found the solution. So it is not because the code is flawed.

Nu har vi så den udfordring, at vi skal have fat i ret høje tal.

primenumbers <- primes(500000000)
primecandidates <- numeric()
for(i in 1:(length(primenumbers)-4)){
if((primenumbers[i+1] - primenumbers[i] == 6) & (primenumbers[i+2] - primenumbers[i+1]== 6) & (primenumbers[i+3] - primenumbers[i+2]== 6)){
  primecandidates <- c(primecandidates,(primenumbers[i]+9))
  }
}
eng_numbers <- primecandidates %>%
  keep(function(x) practical(x-8)) %>%
  keep(function(x) practical(x-4)) %>%
  keep(function(x) practical(x))   %>%
  keep(function(x) practical(x+4)) %>%
  keep(function(x) practical(x+8))

length(eng_numbers)
## [1] 3

That gives us three. We need four.

Now – I could just test larger and larger sets of primes. I run into memory problems when I try that. Then I could write some code, that generates primenumbers between some two numbers, and increse those numbers until I get number four.

I have not. Instead I just tried again and again, until I found number 4. We need to have an upper limit for primes that are some four times larger than the one I just used. Anyway, I found number four. And got the green tickmark.

Lesson learned

Sometimes you really need to look into the math. This would have been impossible if I had not found a quicker way to check for practicality.

Also, I should refactor this code. The check for practicality could certainly be optimised a bit.

And – I should probably check for primality, rather than generating the list of primes.

Stacked barcharts with ggplot2

Barcharts to percentages

Maybe not the most correct headline.

Anyway. Assume we have five cases, with two values for a given treatment, each labelled “AVO” and “INT”. How to get a barchart, where each cases is one bar, and the bars are coloured based on the percentages of the two values?

Lets make some sample data:

set.seed(47)
cases <- 1:5
cases <- c(cases, cases)
treat1 <- rep("AVO", 5)
treat2 <- rep("INT", 5)
treat <- c(treat1, treat2)
val <- runif(10)

df <- data.frame(cases=cases, treatment = treat, value = val, stringsAsFactors = FALSE)

How does that look?

df
##    cases treatment     value
## 1      1       AVO 0.9769620
## 2      2       AVO 0.3739160
## 3      3       AVO 0.7615020
## 4      4       AVO 0.8224916
## 5      5       AVO 0.5735444
## 6      1       INT 0.6914124
## 7      2       INT 0.3890619
## 8      3       INT 0.4689460
## 9      4       INT 0.5433097
## 10     5       INT 0.9248920

OK, nice and tidy data.

What I want is a bar for case 1, filled with percentages. One color with 0.976/(0.976+0.691) and another color for 0.691/(0.976+0.691).

library(ggplot2)
ggplot(df)+
  geom_col(aes(y=value, x=cases, fill=treatment), position = "fill")

plot of chunk unnamed-chunk-3

We’re using geom_col. That is a shorthand for geom_bar with stat=“identity”, and is the shortcut to expressing values in the barchart.

The y-value is the value from df, the x-value is the cases. And we’re colouring the bars based on the value in treatment. The position=“fill” stack the bars, and normalize the heights. If we wanted to just stack the elements, we would use position=“stack”:

ggplot(df)+
  geom_col(aes(y=value, x=cases, fill=treatment), position = "stack")

plot of chunk unnamed-chunk-4

This is a great, like really great, cheatsheet for ggplot:

https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf

Project Euler 54

Poker hands. Project Euler number 54

Given 1000 random hands of 5 playing cards dealt to two players, how many hands does player 1 win?

We have the following notation for the cards:

2, 3, 4, 5, 6, 7, 8, 9, T, J, Q, K, A

Where T is ten, J is Jack, Q is Queen, K is King and A is Ace.

And the suits:

H, C, S, D

We have a number of different combinations:

  1. High Card: Highest value card.
  2. One Pair: Two cards of the same value.
  3. Two Pairs: Two different pairs.
  4. Three of a Kind: Three cards of the same value.
  5. Straight: All cards are consecutive values.
  6. Flush: All cards of the same suit.
  7. Full House: Three of a kind and a pair.
  8. Four of a Kind: Four cards of the same value.
  9. Straight Flush: All cards are consecutive values of same suit.
  10. Royal Flush: Ten, Jack, Queen, King, Ace, in same suit.

If a given hand is a Flush, it gets the rank of 6. Which means it beat a Straight, which has the rank of 5.

That should be simple. Write a function that returns the rank of a given hand, and determine if one hand beats another hand.

However, two hands with the rank 2, are not necessarily equal. One pair of fours is better than one pair of twos.

So the function should also return some value that can be used for discriminating between different hands with the same rank.

A hand comes on the form:

5H 5C 6S 7S KD

I think it might be useful to have a function that returns the suits and the values.

library(dplyr)
suits <- function(s){
  delres <- unlist(strsplit(s, split = " "))
  sapply(delres, function(x) substr(x,2,2), USE.NAMES = FALSE)
}

values <- function(s){
  delres <- unlist(strsplit(s, split = " "))
  sapply(delres, function(x) substr(x,1,1), USE.NAMES = FALSE)
}

Testing for a Royal Flush. All values of suits should be identical. And the values should be T, J, Q, K, A. That is simple. Check if all the values that should be in the hand, if it is a Royal Flush are there. Then check if the suit is the same for all cards in the hand. If both are true, return TRUE.

is_royal_flush <- function(testhand){
  testvalue <- c("T", "J", "Q", "K", "A")
  value_match <- all(testvalue %in% values(testhand))
  suits_match <- length(unique(suits(testhand)))==1
  return(as.logical(value_match*suits_match))
}

Straight Flush: All cards are consecutive values of same suit.

Testing for the same suit is copy-paste from above. Testing for consecutive values is a little bit more tricky. I cant just sort the values, since K is before Q in the alphabet. So I’m using dplyr::recode() to recode the T, J, Q, K, A values to numbers. The I can convert them to numeric and sort them. And I’ll just have to figure out if the values are consecutive.

I had to google a bit. But diff() does it. It calculates the difference between the elements. It is possible to define a lag, but here the default of 1 does the trick. It returns the second element in the vector, minus the first element. And the third element minus the second.

When I have five elements, and they have consecutive values, diff() will return a vector of 4 ones.

is_straight_flush <- function(testhand){
  suits_match <- length(unique(suits(testhand)))==1
  testvalues <- values(testhand)
  testvalues <- recode(testvalues, T = "10", J = "11", Q = "12", K = "13", A = "14")
  testvalues <- as.numeric(testvalues)
  testvalues <- sort(testvalues)
  value_match <- all(diff(testvalues) == c(1,1,1,1))
  return(as.logical(value_match*suits_match))
}

Four of a Kind: Four cards of the same value.

We are not interested in the suits here. First I get the values. Then I use table() to get the counts of the different values. Passing that to max() returns 4, if there are four identical values.

is_four_of_a_kind <- function(testhand){
  testvalues <- values(testhand)
  testvalues <- table(testvalues)
  return(max(testvalues)==4)
}

Full House: Three of a kind and a pair. I use the same functionality as in is_four_of_a_kind. The trick here is, that the max of the tabulation of values should be 3. But the minimum should also be 2.

is_full_house <- function(testhand){
  testvalues <- values(testhand)
  testvalues <- table(testvalues)
  (max(testvalues)==3)&(min(testvalues)==2)
}

Flush: All cards of the same suit. In is_straight_flush I test if the suits are the same. That is what I need here, I just skip the part where I test if the values are consecutive.

is_flush <- function(testhand){
  suits_match <- length(unique(suits(testhand)))==1
  return(as.logical(suits_match))
}

Straight: All cards are consecutive values. This is more or less the same function is_straight_flush(). I just skip the test for suits.

is_straight <- function(testhand){
  testvalues <- values(testhand)
  testvalues <- recode(testvalues, T = "10", J = "11", Q = "12", K = "13", A = "14")
  testvalues <- as.numeric(testvalues)
  testvalues <- sort(testvalues)
  value_match <- all(diff(testvalues) == c(1,1,1,1))
  return(as.logical(value_match))
}

Three of a Kind: Three cards of the same value. That is basically the same as the function is_four_of_a_kind() above. The maximum value from the table should just be 3 instead of 4:

is_three_of_a_kind <- function(testhand){
  testvalues <- values(testhand)
  testvalues <- table(testvalues)
  return(max(testvalues)==3)
}

Two Pairs: Two different pairs. First I use values() to get the values of the hand. I’m not interested in the suits.
Then I tabulate the testvalues. I get a count of the number each value occurs in the hand.
But I’m no interested in that number. If there are two pairs in the hand, I will get two values, that occurs two times.

So if I run table() on the result again, I get a count of how many times the count 2 occurs. If the count 2 occurs twice, there are two hands.

is_two_pairs <- function(testhand){
  testvalues <- values(testhand)
  testvalues <- table(testvalues)
  testvalues <- table(testvalues)
  2 %in% testvalues
}

One Pair: Two cards of the same value. Again, basically the same as is_three_of_a_kind(), with 2 instead of 3 in the test:

is_pair <- function(testhand){
  testvalues <- values(testhand)
  testvalues <- table(testvalues)
  return(max(testvalues)==2)
}

It will return FALSE if there are three of a kind. I’m interested in the highest rank a hand can get, so that should not be a problem.

High Card: Highest value card. I don’t need to test for this. If a hand has not gotten a higher rank, it will always have this.

Nice. I now have a set of functions that I can use to figure out what hand has the highest rank.

Lets put that together:

rank <- function(testhand){
  res <- 1 # The hand will never get a rank lower than 1.
  if(is_pair(testhand)){res <- 2}
  if(is_two_pairs(testhand)){res <- 3}
  if(is_three_of_a_kind(testhand)){res <- 4}
  if(is_straight(testhand)){res <- 5}
  if(is_flush(testhand)){res <- 6}
  if(is_full_house(testhand)){res <- 7}
  if(is_four_of_a_kind(testhand)){res <- 8}
  if(is_straight_flush(testhand)){res <- 9}
  if(is_royal_flush(testhand)){res <- 10}
  return(res)
}

That should solve a lot of the cases. However. What if the rank of two hands are the same?

If both hands have rank 9, that is if they are both straight flush, the hand with the highest card wins. There is no reason to test for the rest of the cards.

For all the others, I need functions that breaks the tie. It it should be broken.

Most of those determinations will be based on the value of the cards. So I might as well have a function that recodes ace to 14 etc.

recode_value <- function(values){
  as.numeric(recode(values, T = "10", J = "11", Q = "12", K = "13", A = "14"))
}

I’m probably going to need the reverse function:

rev_recode_value <- function(values){
  recode(as.character(values), "10" = "T", "11" = "J", "12" = "Q", "13" = "K", "14" = "A")
}

For High Card, the hand with the highest value card wins. If the highest value card in both hand 1 and hand 2 is equal, we look at the second highest card in each hand. Etc. Only if the values are identical in the two hands is there a tie.

break_high_card <- function(hand1, hand2){
  v1 <- values(hand1)
  v2 <- values(hand2)
  v1 <- sort(recode_value(v1))
  v2 <- sort(recode_value(v2))
  if((identical(v1,v2))){
    return(0)
    break()
  } else{
  while(v1[length(v1)]==v2[length(v2)]){
    v1 <- v1[-length(v1)]
    v2 <- v2[-length(v2)]
  }  
  }
  if(max(v1)>max(v2)){
    return(1)
  } else {
    return(2)
  }
}

All right. That was not easy… First I get the values for the two hands. Then I recode them to numerical values, and sort them.

If the two hands have equal values, it is a tie. Otherwise, while the last element in the first vector containing the values is equal to the last element in the second vector, I get rid of the last element in both vectors.

When that is done, the last element in the two vectors is compared. The hand with the highest value wins.
The funcion returns 0 if it is a tie, 1 if hand1 wins, and 2 if hand2 wins.

This was actually the hardest part to write.

Next. Breaking the tie between two hands each having one pair.

break_one_pair <- function(hand1, hand2){
  v1 <- values(hand1)  
  v2 <- values(hand2)
  v1 <- recode_value(v1)
  v2 <- recode_value(v2)
  v1 <- table(v1)
  v2 <- table(v2)
  v1_pair <- as.numeric(names(v1)[v1==2])
  v2_pair <- as.numeric(names(v2)[v2==2])
  if(v1_pair>v2_pair){
    return(1)
    break()
  }
  if(v2_pair>v1_pair){
    return(2)
    break()
  }
  if(v2_pair==v1_pair){
    v1 <- names(v1)[!v1==2]
    v2 <- names(v2)[!v2==2]
    v1 <- rev_recode_value(v1)
    v2 <- rev_recode_value(v2)
    return(break_high_card(v1,v2))
    break()
  }
}

First I’m picking out the values of the pairs in the two hands. If hand1 has a pair of 8’s and hand2 has a pair of 7’s, hand1 wins.

If the pairs are of the same value. Then I pick out the values of the rest of the cards. I am making a dangerous assumption here. This will only work if the two hands actually only have one pair. I hope it works, but this is one of the places where it could go wrong.

Two Pairs: Two different pairs.

The hand with the highest value pair wins. If those values are identical, the value of the second pair determines the winner. If that is also equal, the value of the fifth remaining card, determines the winner.
Only if that is also equal, is there a tie. Therefore, there can only be a tie, if all values of all cards are equal.

break_two_pairs <- function(hand1, hand2){
  v1 <- values(hand1)  
  v2 <- values(hand2)
  v1 <- recode_value(v1)
  v2 <- recode_value(v2)
  v1 <- table(v1)
  v2 <- table(v2)
  v1_pair <- sort(as.numeric(names(v1)[v1==2]))
  v2_pair <- sort(as.numeric(names(v2)[v2==2])) # now v1_pair and v2_pair contains the values of each of the pairs in each of the hands.
  if(identical(v1_pair,v2_pair)){        # If the pairs have the same value, pick out the fifth value, and compare them.
    v1 <- as.numeric(names(v1)[!v1==2])
    v2 <- as.numeric(names(v2)[!v2==2])
    if(v1>v2){
      return(1)
      break()
    }
    if(v2>v1){
      return(2)
      break()
    }
    if(v1==v2){
      return(0)
      break()
    }
  } else {
    v1 <- rev_recode_value(v1_pair)
    v2 <- rev_recode_value(v2_pair)
    return(break_high_card(v1,v2))
  }
}

Kinda the same as for one pair. v1_pair and v2_pair ends up with vectors containing the values of the two pairs in the two hands.

If these two vectors are identical, we need to look at the fifth value.

If not, I reverse_recode the values of the pairs, and use break_high_card, to determine which is larger.

Three of a Kind: Three cards of the same value.

Again I’m making the assumption that the two hands passed to this function have three of a kind. And nothing better than that.

This is a bit simpler. There is no way that both hands can have three of a kind with the same value. So I don’t need to consider the remaining cards. The assumption here is that there is only on deck of cards involved. This is another place where things can go wrong. But lets try!

break_three_of_a_kind <- function(hand1, hand2){
  v1 <- values(hand1)
  v2 <- values(hand2)
  v1 <- recode_value(v1)
  v2 <- recode_value(v2)
  v1 <- table(v1)
  v2 <- table(v2)
  v1 <- as.numeric(names(v1)[v1==3])
  v2 <- as.numeric(names(v2)[v2==3])
  if(v1>v2){
    return(1)
    break()
  } else {
    return(2)
    break()
  }
}

I convert to values, recode them, tabulates them. And pick out the values that occur three times. Then I just compare them.

Straight: All cards are consecutive values.

Once more I assume that the two hands passed to this function actually only contains a straigth. It is rather simple. As the values must be consecutive, the highest value in each hand determines the winner.

break_straigth <- function(hand1,hand2){
  v1 <- values(hand1)
  v2 <- values(hand2)
  v1 <- recode_value(v1)
  v2 <- recode_value(v2)
  v1 <- max(v1)
  v2 <- max(v2)
  if(v1>v2){
    return(1)
    break()
  } 
  if(v2>v1){
    return(2)
    break()
  }
  if(v2==v1){
    return(0)
    break()
  }
}

Flush: All cards of the same suit.

Again – I assume that flush is the highest combination in these hands. This is basically the same as high card.

break_flush <- function(hand1, hand2){
  return(break_high_card(hand1,hand2))
}

Full House: Three of a kind and a pair.

The same assumption applies. These two hands both have Full House as the highest rank.

The winner is determined by the higher value of the three of a kind.

break_full_house <- function(hand1,hand2){
  return(break_three_of_a_kind(hand1,hand2))
}

Four of a Kind: Four cards of the same value.

Same as for the full house. The winner is determined by the higher value of the four of a kind.

break_four_of_a_kind <- function(hand1,hand2){
  v1 <- values(hand1)  
  v2 <- values(hand2)
  v1 <- recode_value(v1)
  v2 <- recode_value(v2)
  v1 <- table(v1)
  v2 <- table(v2)
  v1 <- sort(as.numeric(names(v1)[v1==4]))
  v2 <- sort(as.numeric(names(v2)[v2==4]))
  if(v1>v2){
    return(1)
    break()
  } else {
    return(2)
    break()
  }
}

Straight Flush: All cards are consecutive values of same suit.

Again I’m making dangerous assumptions. But, this is functionally the same as high card.

break_straight_flush <- function(hand1,hand2){
  return(break_high_card(hand1,hand2))
}

Royal Flush: Ten, Jack, Queen, King, Ace, in same suit.

If both hand are royal flush – it will always be a tie.

break_royal_flush <- function(hand1,hand2){
  return(0)
}

Phew…

I now have a function that will return the rank of a given hand.

If I compare the rank of two hands, I will either get a clear winner, or need to run one of the break functions to determine the result.

winner <- function(hand1, hand2){
  if(rank(hand1)>rank(hand2)){
    return(1)
    break()
  }
  if(rank(hand2)>rank(hand1)){
    return(2)
    break()
  }
  if(rank(hand2)==rank(hand1)){
    rang <- rank(hand2)
    if(rang==1){
      return(break_high_card(hand1,hand2))
      break()
    }
    if(rang==2){
      return(break_one_pair(hand1,hand2))
      break()
    }
    if(rang==3){
      return(break_two_pairs(hand1,hand2))
      break()
    }
    if(rang==4){
      return(break_three_of_a_kind(hand1,hand2))
      break()
    }
    if(rang==5){
      return(break_straigth(hand1,hand2))
      break()
    }
    if(rang==6){
      return(break_flush(hand1,hand2))
      break()
    }
    if(rang==7){
      return(break_full_house(hand1,hand2))
      break()
    }
    if(rang==8){
      return(break_four_of_a_kind(hand1,hand2))
      break()
    }
    if(rang==9){
      return(break_straight_flush(hand1,hand2))
      break()
    }
    if(rang==10){
      return(break_royal_flush(hand1,hand2))
      break()
    }
  }
}

Phew… Again.

What should be left now, is to load the hands, and test them.

data <- read.csv("https://projecteuler.net/project/resources/p054_poker.txt", header=FALSE, stringsAsFactors = FALSE)
library(tidyr)
library(stringr)

answer <- data %>%
  mutate("h1" = str_sub(V1, 1,14)) %>%
  mutate("h2" = str_sub(V1, 16,30)) %>%
  rowwise() %>%
  mutate("w" = winner(h1,h2)) %>%
  filter(w==1) %>%
  nrow()

I read in the file to data, make a column h1 containing the first 14 characters of each line in the datafile, and a column h2 containing the rest of the line.

rowwise() groups the dataframe by row – so it doesn’t matter that my winner() function is not that vectorized. For each set of hands I calculate the winner, filter the result so I only have the rows where hand 1 wins. And then I get the number of rows – which are the answer.

Lessons learned

Some. rowwise() is a usefull function. I’m certain I dont understand everything about it, but it would seem to solve a fair number of the problems I’ve had using non-vectorized functions.

I should probably also learn to set aside some time for re-factoring my code before publishing it.

Project Euler 37

Project Euler 37

3797 is a prime. What is interesting about this prime, is that if we truncate it one digit at a time from left, ie:
3797, 797, 97, 7 every step is also prime. And if we do the same from the right, we see the same property:
3797, 379, 37, 3.

There are 11 primes with this property. What is the sum of them?

2, 3, 5 and 7 are not considered to be truncatable primes.

And that gives us a hint. The first digit of a truncatble prime must be either 2, 3, 5 or 7. And the last digit must be 3, 5 or 7.

Lets begin by loading the package numbers:

library(numbers)

That gives us couple of useful functions. isPrime(x) returns true if x is prime. And Primes(x,y) returns all primes between x and y.

I am going to need a couple of functions. One that repeatedly truncates a number from the left, and returns true if all the steps in the sequence are prime. And another that does the same, just from the rigth.

truncleft <- function(x){
  res <- TRUE
  while(x>9){
    x <- x %/% 10
    res <- isPrime(x)*res
  }
  return(as.logical(res))
}

truncright <- function(x){
  res <- TRUE
  while(x>9){
    x <- x - (x %/% 10**floor(log10(x)))*(10**floor(log10(x)))
    res <- isPrime(x)*res
  }
  return(as.logical(res))
}

Now I can determine if a given number is truncatable prime from both left and right. Next, finding all 11 primes with that property.

Lets take a chance, and see if not number 11 is found before we reach 1000000

library(purrr)
answer <- Primes(9,1000000) %>%
  keep(truncleft) %>%
  keep(truncright) %>%
  sum()

Yep. The difficult part is finding number 11. The first four primes with this property have two digits – you can find them manually. The next four are a bit more difficult. Number 10 is the example Project Euler provides.

Lessons learned

Well – purrr is pretty useful. But I already knew that.

Project Euler – 28

Number spiral diagonals

We can make squares of numbers, by starting with 1, and add numbers in a spiral.

It is easier to show than descripe:

square <- matrix(c(21, 22, 23, 24, 25,20,  7,  8,  9, 10,19,  6,  1,  2, 11,18,  5,  4,  3, 12,17, 16, 15, 14, 13), 5,5,byrow=T)
square
##      [,1] [,2] [,3] [,4] [,5]
## [1,]   21   22   23   24   25
## [2,]   20    7    8    9   10
## [3,]   19    6    1    2   11
## [4,]   18    5    4    3   12
## [5,]   17   16   15   14   13

This is a 5 by 5 square. Start at 1, add the 2 to the right of that, and add numbers in a clockwise spiral.

If we look at all the numbers in the diagonals, we can see that the sum of those numbers, is 101. Ie, 21 + 7 + 1 + 3 + 13 + 25 + 9 + 5 + 17 = 101

We are now asked: What is the sum of the numbers in the diagonals in a 1001 by 1001 square?

Looking at an n by n square, where n is uneven, we can see that the north-eastern corner of the square will always be n2.

We know that the side is n. So the value of the north-western corner will always be n2 – (n-1).

Similar the south-western corner is n2 – 2(n-1). And the south-eastern is n2 – 3(n-1).

The first “square” is a special case. But I can now calculate the corners in all the squares with unequal n.

The sum of those corners is:

n2 + n2 -(n-1) + n2 – 2(n-1) + n2 – 3(n-1)

Which reduces to:

4*n2 – 6n + 6

If I calculate that for all unequal n from 3 to 1001, sum them, and add 1, I have my answer.

library(purrr)
answer <- seq(3,1001,by=2) %>%
  map(function(x) 4*x*x - 6*x + 6) %>%
  unlist() %>%
  sum() 
answer <- answer + 1

Lessons learned

Not really.

Sudokus – Project Euler 96

This problem is basically about making a sudoku solver, that can be automated. If you don’t know what a sudoku is – google it and come back.

I have solved a LOT of sudokus. Manually. There are several techniques, but some of them are not that easy to code. And of course I’ll need a method that guarantees a solution.

Fortunately a brute force solution exists:

https://en.wikipedia.org/wiki/Sudoku_solving_algorithms

It is called backtracking. The process is to look at all the empty cells.

In the first one – insert the lowest allowed value, and progress to the next empty cell. Insert the lowest allowed value in it. Continue until you reach a cell with no allowed values. When you do that, go back to the previous cell, and insert the next-lowest allowed value. And then go back (forward?) to the next cell. If the cell you went back to also have no allowed values, you go one step futher back. Continue until you reach the last cell in the puzzle.

So. We’ll need a few things.

First of all we’ll need a matrix in which to keep the sudoku.

I’m going with a matrix rather than a dataframe after reading the R Inferno (https://www.burns-stat.com/pages/Tutor/R_inferno.pdf), that suggests that there is no reason to work with a dataframe if a matrix will do.

I’ll need a couple, or three (actually more like five) functions to figure out what values are allowd for a given cell in a given sudoku.

And I’ll need a way to keep track of which cells that are “frozen”, and which cells should be filled out.

Lets begin by making a matrix to keep the puzzle in. Project Euler provides us with a puzzle and the matching solution. And a file with 50 puzzles that needs to be solved.

Lets read in that file.

problems <- read.csv("p096_sudoku.txt",header=FALSE, stringsAsFactors = FALSE)

The structure is rather simple. 500 rows. First one row with an identifier, eg “Grid 01”. And then 9 rows with just the rows of digits, 0 for the empty cells.

We’ll need to pick out first the rows 1 to 10, then rows 11 to 20 etc.

Lets make a dataframe, and read it in:

start <- seq(1,500,by=10)
end <- seq(10,500,by=10)
prob_df <- data.frame(id=character(),problem=character(), stringsAsFactors = FALSE)
for(i in 1:50){
  prob_df[i,2] <- paste(problems[start[i]:end[i],][-1], collapse="")
}
prob_df[1,2]
## [1] "003020600900305001001806400008102900700000008006708200002609500800203009005010300"

Nice and simple, 50 rows with an 81 character long string of digits. This is the first puzzle in the file. It is also the puzzle where we are provided with a solution.

We now need to convert that to a matrix.

Or – we don’t need to. There are certainly techniques that would allow me to handle it as-is. But I think it is more intuitive to get it into a matrix.

So, lets do that.

mat <- matrix(as.numeric(unlist(strsplit(prob_df[1,2],""))),9,9,byrow=TRUE)  
mat
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
##  [1,]    0    0    3    0    2    0    6    0    0
##  [2,]    9    0    0    3    0    5    0    0    1
##  [3,]    0    0    1    8    0    6    4    0    0
##  [4,]    0    0    8    1    0    2    9    0    0
##  [5,]    7    0    0    0    0    0    0    0    8
##  [6,]    0    0    6    7    0    8    2    0    0
##  [7,]    0    0    2    6    0    9    5    0    0
##  [8,]    8    0    0    2    0    3    0    0    9
##  [9,]    0    0    5    0    1    0    3    0    0

Neat. Split the string in individual characters. Unlist it, and cast it to numeric. Then pass it all to matrix(), noting that we want 9 rows and 9 columns. And that the matrix should be filled by row.

Depending on where you count from, the first empty cell is 1,2. Given the values that are already filled in, what values are allowed in that cell?

First, what values are allowed based on the row? From inspection it is obvious that only the values 1, 4, 5, 6, 7, 8 and 9 are allowed. Lets write a function for that:

rowall <- function(x,mat){
  setdiff(1:9, mat[x,])
}
rowall(1,mat)
## [1] 1 4 5 7 8 9

The function takes a row-number, in this example 1, and a matrix. mat[x,] returns the values in that row. And setdiff returns the values that are in 1:9 but not in the list of values already in the row.

More or less the exact same thing for the column:

# returnerer tilladte værdier i kolonne y i en matrix mat
colall <- function(y,mat){
  setdiff(1:9, mat[,y])
}
colall(2, mat)
## [1] 1 2 3 4 5 6 7 8 9

What about the frame, the 3×3 set of numbers?

I need a subset of the matrix. For our example, cell 1,2, I need the columns 1, 2 and 3. And the rows 1, 2 and 3. Or a vector 1:3. Thankfully there is symmetry, so column 1 gives the same vector as row 1. I need a function that returns 1:3 when I pass it 1, 2 or 3. 4:6 for the values 4, 5 or 6. and 7:9 for the values 7, 8 or 9.

My good colleague Henrik has tried to solve the same problem using Python. And he suggested that the way to do it is by integer division. In this way:

framelookup <- list(a=c(1,2,3),b=c(4,5,6),c=c(7,8,9))

getint <- function(x){
  x <- (x -1)%/%3+1
  unlist(framelookup[x])
}
  • Take the value x.
  • Substract 1 and divide the result by 3.
  • Then add 1. For x = 1,2 or 3, the result is 1, for x = 4, 5 or 6, the result is 2. And for 7 through 9, we get 3.
  • We can then use that to look up the vector we need in a list, and return the unlisted number.

At first I had done it with three if-statements. This solution is a bit more elegant, and Henrik thinks that if-statements should be outlawed. So in the interest of keeping the peace in our office, I’m going with that.

Now I can write a function that returns the values allowed in a given cell in a given matrix, given the values that are already filled out in the frame that cell is in:

fraall <- function(x,y,mat){
  res <- mat[getint(x),getint(y)]
  setdiff(1:9, res)
}

The function takes the coordinates of the cell, and a matrix. Based on that, getint() returns the vectors describing the frame, and saves that subframe in res. Then I use the setdiff() function in the same way I’ve done previously.

That gives me all the constraints. What values are allowed based on row, column and frame. The intersection between these three vectors gives me the allowed values for the cell. It is now simple to write a function that returns the values allowed for a given cell in a given matrix:

allowed <- function(x,y,mat){
  res <- intersect(rowall(x,mat), colall(y,mat))
  res <- intersect(res, fraall(x,y,mat))
  return(res)
}

Actually this is all I need for solving quite a lot of sudokus. Go through all the empty cells. Figure out what values are allowed. If only one value is allowed, plot it into the cell. Repeat until there are no more empty cells where only one value is allowed. A lot of sudokus can be solved with only that method.

The ones that cannot be solved in this way can be solved by the backtracking algorithm. But that is slow, so it might be a good idea to simplify the those puzzles by filling out what can be filled out with this method first.
So let’s write a function that does exactly that.

First I’ll need a list of the empty cells. The which() function can help me.

test <- which(mat ==0, arr.ind=T)
nrow(test)
## [1] 49

Which parts of the matrix is equal to 0? and arr.ind tells that what we want returned (when which is used on an array) is the array indeces. There are 56 0’es in the puzzle.

The matrix test has this structure:

head(test)
##      row col
## [1,]   1   1
## [2,]   3   1
## [3,]   4   1
## [4,]   6   1
## [5,]   7   1
## [6,]   9   1

The first pair of values is 2,1, the next is 4,1 etc. And if I take each one of those, looks at the allowed values for those positions in the matrix, and, if there is only one value allowed, place that value in that position, I get a step closer to the solution.

This does exactly that:

for(i in 1:nrow(test)){
  all <- allowed(test[i,1],test[i,2],mat)
  if(length(all)==1){
    mat[test[i,1],test[i,2]] <- all
  }
}
nrow(which(mat ==0, arr.ind=T))
## [1] 43

And, hey presto, 6 empty cells were filled. The logic is: Find the list of allowed values based on the values in the list of empty cells. If the length is 1, set the current cell to that value.

Now I need to get a new list of empty cells:

test <- which(mat ==0, arr.ind=T)

And I could do it again:

for(i in 1:nrow(test)){
  all <- allowed(test[i,1],test[i,2],mat)
  if(length(all)==1){
    mat[test[i,1],test[i,2]] <- all
  }
}
nrow(which(mat ==0, arr.ind=T))
## [1] 29

14 previously empty cells filled!

Lets make a function that does that again and again, until there is no more improvement.

We’ll start by writing a function that does it one time:

singlevalues <- function(mat){
  test <- which(mat ==0, arr.ind=T)
  for(i in 1:nrow(test)){
    all <- allowed(test[i,1],test[i,2],mat)
  if(length(all)==1){
    mat[test[i,1],test[i,2]] <- all
  }
  }
  return(mat)
}

Then I need to run that several times. When the number of empty cells after running it is the same as the number of empty cells before, stop. No more values can be filled in.

repsinglevalues <- function(mat){
  oldlen <- 1
  newlen <- 0
  while(oldlen > newlen){
  oldlen <- length(which(mat == 0))
  mat <- singlevalues(mat)
  newlen <- length(which(mat == 0))
  if(newlen==0) break()
  }
  return(mat)
}

Take a matrix. Set oldlen to 1, and newlen to 0. As long ans oldlen is larger than newlen, do this:
Set oldlen to the number of 0’es in the matrix. Run the singlevalues function from above. Find out how many empty cells, or zeroes there are now, and set newlen to that. If newlen is equal to 0, stop everything, otherwise repeat. And finally return the changed matrix.

Lets test it:

repsinglevalues(mat)
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
##  [1,]    4    8    3    9    2    1    6    5    7
##  [2,]    9    6    7    3    4    5    8    2    1
##  [3,]    2    5    1    8    7    6    4    9    3
##  [4,]    5    4    8    1    3    2    9    7    6
##  [5,]    7    2    9    5    6    4    1    3    8
##  [6,]    1    3    6    7    9    8    2    4    5
##  [7,]    3    7    2    6    8    9    5    1    4
##  [8,]    8    1    4    2    5    3    7    6    9
##  [9,]    6    9    5    4    1    7    3    8    2

Yeah! The sudoku is solved!

But not every puzzle can be solved in that way. Now for the brutish way…

First of all, lets get a fresh copy of the original puzzle read in. And a fresh list of empyt cells.

All righty! Nu kan jeg løse en del sudokuer alene ved denne metode. Så er der resten…

mat <- matrix(as.numeric(unlist(strsplit(prob_df[1,2],""))),9,9,byrow=TRUE)  
test <- which(mat ==0, arr.ind=T)
nrow(test)
## [1] 49

Back to the 49 empty cells.

The algorithm is as follows.

  • Set i <- 1
  • Get the row and column indeces from test[i,1] and test[i,2] respectively.
  • Identify the list of allowed values for that cell.
  • Set the value of the cell to the lowest of the allowed values.
  • Increment i <- i + 1.
  • Get the row and column indeces again.
  • Get the list of allowed values for that cell. If there are no allowed values, decrement i <- i -1.
  • If there are allowed values, set the value of the cell to the lowest of the allowed values.
  • When we return to a cell that already has a value, we should not set it to the lowest allowed value. We should set it to the next lowest. That is the lowest of the allowed values, excluding the value that was already there.
  • If we get to a cell with no allowed values (excluding the one that might already be there), the value of the cell should be set to 0, and i should be decrementet.

This function does that.

solve <- function(mat){
  i <- 1
  test <- which(mat ==0, arr.ind=T)
  while(nrow(which(mat==0,arr.ind=T))>0){
    allowedvalues <- allowed(test[i,1],test[i,2],mat)
    currentvalue <- mat[test[i,1],test[i,2]]
    allowedvalues <- allowedvalues[allowedvalues>currentvalue]
    if(length(allowedvalues)==0){
      mat[test[i,1],test[i,2]] <- 0
      i <- i -1
    } else{
      mat[test[i,1],test[i,2]] <- min(allowedvalues)
      i <- i + 1
    }
  }
  mat
  }

The function takes a matrix. It sets the counter/pointer i to 1.

Then a test-matrix is generated, containing all the positions that are empty.

While the number of empty places in the matrix is larger than zero do this:

  • Find the allowed values.
  • Find the current value in the cell.
  • Remove all allowed values that are smaller than the current value. In that way, taking the minimum of the remaining allowed values, will always give us the next lowest value. And if the original value was 0, we will get the lowest.
  • If there are no allowed values, set the value of the cell to zero, and decrement the counter.
  • If there are allowed values, set the value of the cell to the lowest of the remaining allowed values, and increment the counter.
  • End by returning the now filled matrix.

Does it work?

solve(mat)
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
##  [1,]    4    8    3    9    2    1    6    5    7
##  [2,]    9    6    7    3    4    5    8    2    1
##  [3,]    2    5    1    8    7    6    4    9    3
##  [4,]    5    4    8    1    3    2    9    7    6
##  [5,]    7    2    9    5    6    4    1    3    8
##  [6,]    1    3    6    7    9    8    2    4    5
##  [7,]    3    7    2    6    8    9    5    1    4
##  [8,]    8    1    4    2    5    3    7    6    9
##  [9,]    6    9    5    4    1    7    3    8    2

It does! Qap’la!

I now have two functions that solves sudokus. solve(), that bruteforces it and will always get a solution. And
repsinglevalues() that uses logic, gets a solution sometimes, but not always.

I also have a dataframe with 50 sudokus that I need to solve.

And what was the task again? Take the three first digits in the first row. Concatenate them to a single three digit number. Do that for all 50 puzzles. Add them all up. That should be the answer.

answer <- 0
for(i in 1:50){
  mat <- matrix(as.numeric(unlist(strsplit(prob_df[i,2],""))),9,9,byrow=TRUE)  
  mat <- repsinglevalues(mat)
  mat <- solve(mat)
  answer <<- answer + as.numeric(paste(mat[1,1:3],collapse=""))
}
  • Set a variable answer to zero.
  • Pick out the sudokus one by one.
  • Run the logic-solving function on it.
  • Then run the brute-force function on the result of that.
  • Pick out the three first digits in row 1, collapse them, cast as numeric, and add to the answer-variable.

Done!

Lessons learned

  1. Errors in the logic are a bit difficult to locate. I forgot to take into account that the list of allowed values in the brute-force solution does not include the value that is already in the cell. That made the function run for eternity. Or something pretty close. Had I not made that mistake, I would finished this problem a day earlier.
  2. Technically the lesson has not been learned yet. But below there is a note ot speed. Adding just one extra value to the puzzle reduces the time it takes to bruteforce the solution significantly.
  3. Set-functions, here specifically setdiff(), are neat!

Notes on speed

OK. I needed a function for returning a vector for getting the relevant frame in the matrix. As I mentioned, my colleague Henrik thought it should be done wit integer division rather than if-statements.

This was the way I originally did it:

ifgetint <- function(x){
if(x %in% c(1:3)){
  res <- c(1:3)
} else if(x %in% c(4:6)){
  res <- c(4:6)
  } else {  
      res <- c(7:9)
  }
  return(res)
}

But what way is faster? Lets find out:

library(microbenchmark)
mbm <- microbenchmark(ifgetint, getint, times=1000)
mbm
## Unit: nanoseconds
##      expr min lq   mean median uq   max neval cld
##  ifgetint  50 55 60.824     56 58  3906  1000   a
##    getint  52 57 78.630     59 60 17763  1000   a

Actually my original way of doing it was a little bit faster.

library(ggplot2)
autoplot(mbm)
plot of chunk unnamed-chunk-23

So. I found two methods for solving sudokus. It would be interesting to compare them. I can’t do that on just any sudoku. I need to use one that can be solved with both methods. Luckily the first problem in the set can do just that.

mat <- matrix(as.numeric(unlist(strsplit(prob_df[1,2],""))),9,9,byrow=TRUE)  
solve(mat)
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
##  [1,]    4    8    3    9    2    1    6    5    7
##  [2,]    9    6    7    3    4    5    8    2    1
##  [3,]    2    5    1    8    7    6    4    9    3
##  [4,]    5    4    8    1    3    2    9    7    6
##  [5,]    7    2    9    5    6    4    1    3    8
##  [6,]    1    3    6    7    9    8    2    4    5
##  [7,]    3    7    2    6    8    9    5    1    4
##  [8,]    8    1    4    2    5    3    7    6    9
##  [9,]    6    9    5    4    1    7    3    8    2
repsinglevalues(mat)
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
##  [1,]    4    8    3    9    2    1    6    5    7
##  [2,]    9    6    7    3    4    5    8    2    1
##  [3,]    2    5    1    8    7    6    4    9    3
##  [4,]    5    4    8    1    3    2    9    7    6
##  [5,]    7    2    9    5    6    4    1    3    8
##  [6,]    1    3    6    7    9    8    2    4    5
##  [7,]    3    7    2    6    8    9    5    1    4
##  [8,]    8    1    4    2    5    3    7    6    9
##  [9,]    6    9    5    4    1    7    3    8    2
mbm <- microbenchmark(solve(mat),repsinglevalues(mat), times=100)
mbm
## Unit: milliseconds
##                  expr       min        lq      mean    median        uq
##            solve(mat) 47.681675 52.437511 55.053319 53.852274 54.621338
##  repsinglevalues(mat)  6.637329  7.001639  7.631094  7.157844  7.393044
##        max neval cld
##  171.67914   100   b
##   11.03237   100  a

No surprise there. The logical method is much faster. Something like 8 times faster.

autoplot(mbm)
plot of chunk unnamed-chunk-25

How much of a difference does it make when part of the sudoku has been solved already? By inspection (ie, trial and error) I find sudoku number 11. The original puzzle has 53 empty cells. By filling out what can be filled out using logic, that is reduced to 52 empty cells.

How much faster is it to solve sudoku number 11, when an additional cell has been filled out using logic?
mat is the original sudoku. redmat is the sudoku partially solved by logic.

i <- 11
mat <- matrix(as.numeric(unlist(strsplit(prob_df[i,2],""))),9,9,byrow=TRUE)  
redmat <- repsinglevalues(mat)

mbm <- microbenchmark(solve(mat),solve(redmat), times=10)
mbm
## Unit: milliseconds
##           expr      min       lq     mean   median       uq      max neval
##     solve(mat) 767.3325 777.2788 794.4597 781.4264 791.7815 911.4314    10
##  solve(redmat) 544.3811 546.2505 574.9207 549.2904 553.6324 690.5734    10
##  cld
##    b
##   a

Wow! That really makes a difference!! Filling in just a single extra value using logic cuts about one third of the time it takes to solve the sudoku.

autoplot(mbm)
plot of chunk unnamed-chunk-27