Advent Of Code 2023

on a Commodore 64

Jukka Jylänki

Last year I participated in Advent of Code for the first time, using the second ever computer we had, a 16MHz MikroMikko 386SX with Borland Turbo C++ 3.0.

At first I thought it was a bit of a crazy endeavour. But still, quite a lot of fun!

Afterwards, it got me wondering what it would be like to do the same using the first computer we ever had - a Commodore 64 that my mom bought in 1989, when I was just four years old.

Well, only one way to find out.

Table of Contents


0. Preparation
1. Day 1: Trebuchet?!
2. Day 2: Cube Conundrum
3. Day 3: Gear Ratios
4. Day 4: Scratchcards
5. Day 5: If You Give A Seed A Fertilizer
a. Interlude: Typing on the C64 Keyboard
6. Day 6: Wait For It
7. Day 7: Camel Cards
8. Day 8: Haunted Wasteland
9. Day 9: Mirage Maintenance
10. Day 10: Pipe Maze
b. Interlude: My Original Commodore 64 Games
11. Day 11: Cosmic Expansion
12. Day 12: Hot Springs
13. Day 13: Point of Incidence
14. Day 14: Parabolic Reflector Dish
15. Day 15: Lens Library
c. Interlude: My Unoriginal Commodore 64 Games
16. Day 16: The Floor Will Be Lava
17. Day 17: Clumsy Crucible
18. Day 18: Lavaduct Lagoon
19. Day 19: Aplenty
20. Day 20: Pulse Propagation
d. Interlude: Three Scary Commodore 64 Games
21. Day 21: Step Counter
22. Day 22: Sand Slabs
23. Day 23: A Long Walk
24. Day 24: Never Tell Me The Odds
25. Day 25: Snowverload
e. Total Score
f. Runtime Benchmarks
g. Conclusion
h. Credits

Preparation

When thinking about Commodore 64 development, traditionally one can use one of two languages: BASIC or 6502 assembly.

However, I find that neither of those would be ideal for Advent of Code, since they both lack (or I lack?) the productivity needed to keep solving the problems past the first few days. BASIC is more high level, but I doubt it would have enough performance. Assembly, on the other hand, would be super fast, but I'd just get bogged down by its complexity later on. I would like something that is more performant than BASIC *and* more productive than assembly.

As luck would have it, some expert developers undertook a port of LLVM/Clang to target the 6502 CPU some time ago: check out the llvm-mos project.

When I first saw that project this year, it felt like all the (50) stars had aligned. I just knew that it would be what I would want to use.

However, the typical modern C64 development workflow is to write the code on a PC, and either run the compiled program (a ".PRG" file) on an emulator like VICE, or cross-deploy it to a real C64 to execute.

While that workflow is super efficient, in the Christmas spirit of a journey like AoC, I really want to feel like I am operating the actual device, rather than cross-developing from a PC.

I want to feel what it is like to have a text editor run on the C64 directly, type code on the C64 keyboard, and see what it feels like to use it on an original CRT. (Note though that I am not any kind of a CRT purist in general)

So in early November this year, I started to prepare for the event a little in advance, to build up a workflow that would make this possible. The missing pieces that I needed to develop were:

It took about a month of development, but with the help of the Lemon64 community and the llvm-mos and Mega65 Discord channels I was able to finish the above projects in time for the start of December. \o/

In my development setup, I use a custom-built Advent of Code client program and a RR-Net 3 cartridge to visit the Advent of Code website.

The client program allows me to read the problem descriptions and save the problem inputs either to tape or disk.

Because the adventofcode.com website uses HTTPS, and getting OpenSSL on the C64 would be asking a bit too much, I wrote a proxy server on the PC that provides the C64 with API-based access to the Advent of Code server. This is built on top of the scarvalhojr/aoc-cli project. Thanks scarvalhojr for such a useful CLI tool!

And then I use my own C++64 "IDE" program to develop the code to solve the puzzles. The written code is streamed over to a PC, which compiles the written code with llvm-mos, and the compiled .PRG result (or the programming error diagnostics) are finally sent back over to the C64.

The result is an oddly native C/C++ development experience, and with the help of the Advent of Code client program, I can attempt to solve AoC puzzles without leaving the C64! I can even submit puzzle answers directly from the C64. How cool is that? :)

Day 1

Well, this is finally it. Both hardware and software are ready for the big start day and I've got my streaming setup prepared as well.

I am thinking that I'll stream the experience like I did last year. Unfortunately, I learned that Twitch at some point started deleting old videos, so last year's 2022 videos got lost. This year I'll need to be careful to download the videos into a local archive.

The Advent of Code client is working out well in action, and I can read the puzzle descriptions directly on the C64's 1084S monitor screen. What a cool experience. 40-column text brings shivers :D.


"Let's prepare the trebuchet" - Reddit u/ArnaudValensi

This year with the power of AI image generation, we have gotten to enjoy many really creative AI prompts that visualize the Advent of Code journey nicely. I will showcase my favorites here, like the one above.

From day 1 we are getting incompetent elves in action again. Not a surprise. The Elves are going to trebuchet us into the sky to find out why no snow is being produced.

This gives me flashbacks to my childhood: my parents would tease me that Santa wouldn't come if there was no snow, and I would get freaked out if the temperature rose above zero close to the 24th and the snow started to melt. Oh dear, better help the Elves.

Back in the day, we never had a disk drive for the C64, only a Datassette that was a 3rd party clone model from manufacturer "Phonemark". That is the very same unit and C64 that I am using for this challenge - my mom saved the C64 in the attic all these years.

That Phonemark Datassette is a bit special in that it has this "Load-It!" mechanism that shows a bar of LED lights to indicate the signal quality. As a kid, I thought all C64 Datassettes had these lights, but only much later learned that was not the case.

So for Day 1, I am solving the challenge using the Datassette.

While doing so, I ended up authoring a cassette tape that has all the problem inputs (.TXT files), the program code (.CPP files) and the compiled output programs (.PRG files) on it.

Something that was surprising is how much of the 60-minute tape is taken up just by the problem input (the file 2023.IN1.TXT, at 22,043 bytes, alone fills almost half of side A!).

Typing code on the 1084S monitor turns out to be a super fun experience. The above image shows a helper function I wrote to solve the second part of the problem.

Also, something I found out is that when using a Datassette, every single coding mistake really burns time. To test out my code, I have to:

  1. load the C++64 editor,
  2. write the code and save the .PRG on tape,
  3. reset the C64, load the compiled program from tape,
  4. run it, wait for it to load the problem input from tape,
  5. reset the C64 again, wait for the AOC program to load from tape, then submit the solution (to discover it was wrong),
  6. reset the C64, and go back to step 1 to try again.

In the middle of all of this comes a good deal of switching tapes, rewinding and fast-forwarding. This takes some patience!

Time to solve: 3 hours 40 minutes! This Datassette setup really is not that productive!

To take a step back, the Datassette itself might not be to blame for all of the slowdown; part of the problem is that I was not able to develop support for shared residency between my compiled puzzle .PRG and the C++64 IDE, i.e. the two cannot be present in memory at the same time.

The main reason for that is that my C++64 IDE is just too big (about 44KB of RAM), so the compiled PRG wouldn't be able to fit side by side with the IDE. (also, llvm-mos does not support this kind of side-by-side deployment)

Nevertheless, I persisted, and in the end did solve the puzzle, even though it took some serious patience.

Mistakes made:

Day 1 Part 2 Code Listing

#include <stdio.h>                      // llvm-mos-sdk does not yet have a full
#include <stdint.h>                     // libc or stdlib++ implementation, but
#include <string.h>                     // many of the functions already exist.
#include <ctype.h>                      // I wrote some more for this project.

int8_t find_digit(const char *s) {      // Given a ptr to middle of a string, 
  switch(*s) {                          // that is either a digit 0-9 or a word
  case 'z':                             // "zero" - "nine", returns the integer
    return strncmp(s, "zero", 4)? -1:0; // value, or -1 if string represented
  case 'o':                             // none of that.
    return strncmp(s, "one", 3) ? -1:1;
  case 't':
    if (!strncmp(s, "two",3)) return 2;
    return strncmp(s, "three", 5)?-1:3;
  case 'f':
    if (!strncmp(s,"four",4)) return 4;
    return strncmp(s, "five", 4) ?-1:5;
  case 's':
    if (!strncmp(s, "six",3)) return 6;
    return strncmp(s, "seven", 5)?-1:7;
  case 'e':
    return strncmp(s, "eight", 5)?-1:8;
  case 'n':
    return strncmp(s, "nine", 4) ?-1:9;
  default:
    return isdigit(*s) ? *s - '0' : -1;
  }
}

int main() {
  FILE *f = fopen("in1.txt", "r");
  char line[80];
  uint32_t result = 0;
  while(fgets(line, sizeof(line), f)) { // Read input line by line
    int8_t f, l, d;
    char *s = line;
    for(; *s; ++s)
      if ((f = l = find_digit(s)) >= 0) // Scan forward until the first digit
        break;                          // on line, initialize first & last

    for(; *s; ++s)                      // Continue scanning to find the last
      if ((d = find_digit(s)) >= 0)     // digit
        l = d;

    result += f*10 + l;                 // Accumulate final result
  }
  fclose(f);
  printf("Calibration: %lu\n", result);
}

I am uploading the video recording from each day to YouTube. Be warned though, the quality is not really that great. This is more to have a historical record, rather than "for content". In the Day 1 video I had technical difficulties for the first ~30 minutes. Also, a loud noises warning: only from Day 4 onwards did I realize to apply an audio dynamic range compression filter to help normalize the audio levels.

Day 2

For today, I really want to start improving the productivity of my setup; otherwise it would be really easy to get fatigued as the AoC days go by. This means that it is *bye bye* to the Datassette. It was a fun thing to try, but not really something that I have the patience to persist with. So it was a "proof of concept": it works, now move on.

Instead, say hello to the 1541-II.

I actually have never owned a disk drive for the C64 before, so it is pretty cool to finally have one.

Using a disk drive will speed up puzzle input file and program loading, but it actually enables something much more important: With a disk drive, I can hotswap between the compiled program and the C++64 IDE without needing to reboot and reload the IDE.

The reason for that is that with a disk drive I can utilize a Turbo Chameleon 2 cartridge that I have. I couldn't use this before, because TC2 is incompatible with Datassettes.

Turbo Chameleon 2 is a multifunction cartridge that has a lot of features, but I actually only need the following two:

  1. TC 2 can be used together with the RR-Net 3 cartridge, since it has a built-in port where RR-Net 3 can slot into, and
  2. TC 2 can emulate a Commodore 64 RAM Expansion Unit.

With a REU, I can hotswap between the compiled code and the C++64 editor. This massively improves iteration speed on the code, as it enables practically interactive swapping between editing code and executing it. That is something that I didn't even have on the MikroMikko 386SX last year!

I got the idea for implementing this from this 8-bit Show and Tell video.


"The Elf would like to play a game" - Reddit u/MarkusGamers

In the puzzle problem for today, we are playing a silly "show cubes from a bag" game. In this game, we see sequences of red, green or blue cubes, and want to calculate the maximum count of each that were seen.

The problem itself is pretty easy, but here I found out that I don't actually have an implementation of the C runtime sscanf() function at all! (Only afterwards would I realize that the problem is easy to parse with fscanf() as well; at first that felt a bit awkward.)

So instead, I opted to write some janky manual string scanning with atoi()/strstr()-based parsing. At least it is fast in a way that the C64 likes, so execution performance was good.

The first part went through smoothly without issues, but in the second part I got the answer wrong two times.

Mistakes made:

Afterwards, I found out I could condense the solution for part B to fit in a single screenful, as seen below.

At this point you may get the feeling that I am not going for any kind of "mature" or "robust" or "good" or "clean" code result, rather, I find it cute if it is possible to condense the result into a single image like that. 40x25 displays FTW!

Time to solve: 2h 17min.

Day 2 Part 2 Code Listing

#include <libc.h>                       // libc.h is not a standard header file,
                                        // but I got bored of typing these, so
int main()                              // created a custom amalgamation file.
{
  FILE *f = fopen("in2.txt", "r");
  u32 sum = 0;                          // Likewise, u32 is not a standard type,
  int id = 0;                           // but my own extended typedef in libc.h
  while(fscanf(f, "Game %d:", &id) > 0)
  {
    int r = 0, g = 0, b = 0, n;
    char clr[8];
    while(fscanf(f,"%d %s",&n,clr)==2)  // The ',' and ';' chars get read into
      if      (*clr=='r') r = MAX(r,n); // the string clr, only partially parse
      else if (*clr=='g') g = MAX(g,n); // the string and interpret from the
      else if (*clr=='b') b = MAX(b,n); // first char which color the cube is
    sum += r*g*b;
  }
  fclose(f);
  printf("Sum of powers: %lu\n", sum);
}

Day 3

Today we are going up a mountain in a gondola. The problem at hand is to find digits that neighbor certain symbols in a 2D map.


Reddit u/EnvironmentalCow2662

The solution for today's problem still allows a one-pass "online" algorithm that streams through the problem input, scanning it only once without loading the whole input file into memory.

By maintaining a rolling window of three lines of input (y-1, the current y, and y+1), it is possible to avoid holding the whole 2D map in memory at once, so memory usage stays small even for a C64. This way less than 512 bytes of work RAM is needed - even a VIC-20 could do it!

PETSCII vs ASCII encoding issues came back to bite. After I had fixed the Day 1 issue where my AoC downloader would inconsistently convert ASCII to PETSCII in one code path but not in another, I realized it had cascaded into a regression in my C standard library fgets() implementation, which I had originally written to operate on ASCII files only (with \n newlines). In the Commodore 64 convention, newlines are marked with the \r character.

Figuring out that regression took some half an hour of debugging. My program code for the puzzle itself didn't have any issues after all!

Apart from that snafu, I find that the productivity of my workflow is starting to become pretty good. Today I improved it even further by adopting a "fast loader" cartridge.

From back in the day, I own this Action Replay Mark 6 cartridge. I remember buying it from Konemyynti Pekka Saari OY (who are still in operation today!) quite late in my C64 period, maybe in 1991 or even 1992, for some 10 or 20 Finnish Marks. Someone must have traded in their old hardware there when upgrading their computer.

(It looks like there may(?) have been a 5.25" floppy disk that accompanied the cartridge, but it didn't come with my second-hand purchase.)

I basically knew just barely enough to use it to scan for changes in C64 memory locations to create infinite-lives cheat codes in games, even though it is capable of much more.

The instruction manual is in Finnish, and says:

"Congratulations. You have acquired the most sophisticated utility module for the C64 and the C128 in C64 mode. Action Replay is not only the world's best copying module, but also the fastest loader module for disk drives and a versatile utility module, as you will come to find out."

That sounds just perfect, a faster disk drive is exactly what I need to improve my Advent of Code productivity.

Unfortunately I don't have room in the C64 to use this cartridge - but fortunately, the Turbo Chameleon 2 that I am using can emulate the features of this module right out of the box. So I set up my TC2 to emulate a "Retro Replay" cartridge, which is a revised clone of this Action Replay module with many bugfixes and improvements (such as REU compatibility, which is just perfect).

Loading up C++64 and AOC programs is now much faster than it was yesterday!

In the second part, I seem to have pulled off every programmer's dream: the compiled code worked on the first try without any issues. A chat viewer aptly called that a "hole-in-one". Let's hope there will be many more of those on later days :)

Mistakes made:

Time to solve: 2h 01min.

Day 3 Part 2 Code Listing

#include <libc.h>

char *READ(char *d, int max, FILE *f) { // Create a "sentineled" read function
 static u8 eof = 0;                     // which reads from a file, but returns
 if (!fgets(d, max, f)) {               // an extra empty line after the end of
  if (eof++) return 0;                  // the last line (on the C64 this was
  memset(d, 0, max);                    // easier as I do not have a text editor
 }                                      // program there to modify the input
 return d;                              // files)
}

int NUM(const char *s) {                // Given a ptr to middle of a number,
 while(isdigit(s[-1])) --s;             // winds the pointer back and converts
 return atoi(s);                        // it to an integer.
}

int main() {
 FILE *f = fopen("in3.txt", "r");       // Use three line rolling window,
 char *C = (char*)calloc(1, 3*256)+1,   // C:current, P:previous, N:next
      *N = C + 256, *P = N + 256;       // +1 after calloc to create a safe
 u32 sum = 0;                           // sentinel at start of each line.
 READ(C, 255, f);                       // Read first line in advance
 while(READ(N, 255, f)) {               // Loop each input line (y-coord)
  for(u8 x = 0; C[x]; ++x) {            // Loop each input char (x-coord)
   if (C[x] == '*') {                   // If we have a cog at (x,y)
    int d[6];
    u8 n = 0;
#define DIG isdigit                     // Condense isdigit() to fit in 40 cols
    if (DIG(C[x-1])) d[n++]=NUM(C+x-1); // Find number to left of cog
    if (DIG(C[x+1])) d[n++]=NUM(C+x+1); // .. and right of cog
    if (DIG(P[x]))   d[n++]=NUM(P+x);   // Is above of cog just a single num?
    else {                              // If not,
     if (DIG(P[x-1]))d[n++]=NUM(P+x-1); // check and parse two separate nums
     if (DIG(P[x+1]))d[n++]=NUM(P+x+1); // (top-left and top-right)
    }
    if (DIG(N[x]))   d[n++]=NUM(N+x);   // A single number below the cog?
    else {                              // If not,
     if (DIG(N[x-1]))d[n++]=NUM(N+x-1); // check bottom-left and bottom-right
     if (DIG(N[x+1]))d[n++]=NUM(N+x+1); // for possibly two separate numbers
    }
    if (n == 2) sum += (u32)d[0]*d[1];  // Sum cogs with exactly two neighbors
   }
  }
  char *tmp = P; P = C; C = N; N = tmp; // Advance three line rolling window,
 }                                      // Prev <- Current <- Next
 fclose(f);
 printf("Sum of ratios: %lu\n", sum);
}

Day 4

All the problems so far have been solvable with a single linear pass through the input data. This type of algorithm structure is nice and easy on the C64, since it keeps memory usage low. As long as this structure holds, the C64 will be able to solve any problem that comes our way (AoC problem inputs seem to be at most some 20-30 KB in size).

Fortunately, today will be no different.


"Scratchcards" - Reddit u/dimpopo

Some elf seems to have an addiction to scratch cards, so the task is to calculate their scratch card winnings. (basically match winning lottery numbers to a played line to see how many matches come up)

The first part is nicely solved by building a set data structure using an array of buckets. Most of the runtime seems to be dominated by the Commodore 64's slow disk I/O. I would like to benchmark "PC vs C64" times against each other later on, and I am pondering whether to cache the disk I/O out of the equation... we'll see.

The second part is a little more complex, but fortune was on my side, and the problem structure for both parts clicked immediately. With only minor typos and short-lived code mistakes, the problem was quickly solved, and in both cases working code for the example case directly translated to the right answer for the puzzle input.

It took the C64 about 2 minutes to churn out the answer for the second part. Later, running the same code on my AMD Ryzen 5950X PC took about 0.00075 seconds, i.e. ~160,000x faster!

Time to solve: 1 hour 00 minutes exactly, the fastest so far.

Mistakes made:

Reading the subreddit afterwards, it looks like some people tripped themselves up by implementing an exponential algorithm via recursion. Fortunately, as with e.g. the Fibonacci sequence, no recursion is necessary, and the problem can still be solved via a one-pass linear scan through the input data.

I wrote a couple of subreddit comments to try to explain the problem structure and how to solve part 2 with a linear scan. ([1], [2])

Day 4 Part 2 Code Listing

#include <libc.h>
u24 cards[256];                          // Tallies # of cards of each type won

int main() {
  FILE *f = fopen("in4.txt", "r");
  int id, n;
  while(fscanf(f, "Card %d:", &id)>0) {
    ++cards[id];                         // Store first card of this type won
    bool nums[100] = {};                 
    while(fscanf(f, "%d", &n) > 0)       // Create a bucket set of winning
      nums[n] = true;                    // numbers
    u8 dummy, wins = 0;                  // wins: number of hits on this card
    fscanf(f, "%c", &dummy);             // Consume the dummy '|' char
    while(fscanf(f, "%d", &n) > 0)       // Read our numbers
      if (nums[n])                       // Did our number hit? If so,
        cards[id + ++wins] += cards[id]; // propagate winnings to later cards
  }
  fclose(f);
  u32 total = 0;
  for(u32 c : cards)
    total += c;                          // Sum up the total number of cards
  printf("TOTAL: %lu\n", total);
}

Day 5

Today an absolute disaster struck!

Not necessarily due to the advent calendar problem (which definitely did go up a level), but today really showed how brittle my C64 code editor actually is. Crashes. Crashes here and crashes there. Left and right.

These types of rough edges definitely slow down my solution times, and I expect they will become a major showstopper already in the next few days if I am unable to find the root causes quickly. Yikes.

Fortunately tomorrow is a national holiday (Finnish Independence Day), and I have the rest of the week off for a long weekend, so I will hopefully have time to look into tackling this.


"If You Give A Seed A Fertilizer" - Reddit u/dimpopo

Today, we meet the island gardener, who has a complicated seed->soil->fertilizer->water->light->temperature->humidity->location mapping scheme. Basically, seven stages of piecewise linear u32->u32 maps. The first part asks us to compute the minimum output value through this series of functions, given a set of input values.

I decided to build a binary evaluation tree for each mapping, so seven evaluation trees in total. To map a seed value to its output, the binary tree is walked from root to leaf to find which output value it generates.

This wasn't too hard to do, and modulo some small issues, my code was up and running and solved the first part well. That was a relief, since I had wondered whether the problem input might have been crafted adversarially, so that naively building a non-balancing binary tree would lead to a degenerate tree (a near-flat linear list). Fortunately that was not the case, and there was no need to worry about rebalancing the tree.

In the second part, the problem is complicated by asking us to instead map long integer ranges of numbers through these seven piecewise linear functions, instead of just mapping individual numbers.

This kind of mapping is a little more involved than mapping individual values, though it quickly felt like it was possible to retain the same binary evaluation tree structure and split each input range into up to three parts at each step.

To make this work, I really needed a vector container, but as llvm-mos-sdk does not yet have a libstdc++ implementation, there is no std::vector available that I could utilize.

I started to implement one, but instead got an editor crashfest, and had to pause for a call and dinner. Afterwards, to sidestep the brittle development experience, I copied a custom minimal vec type I had written before into my standard library implementation, so I could focus on the problem.

While implementing the binary evaluation tree approach for part two, I was worried that it would cause an exponential blow-up of memory usage, as every input number range could potentially split into three output ranges (left/middle/right parts), and run out of memory on the C64.

After finishing the code, I was positively surprised to realize that such exponential behavior was not at all possible (since the ranges are disjoint after all), and the whole program fit in the Commodore 64's 64KB of RAM, and ran to completion in just 26.1 seconds. 20.5 seconds was for initial disk loading, and 5.6s was for computing the seed range mappings through the seven binary evaluation trees.

Not gonna lie, after such a fast 5.6 seconds runtime, browsing all the long CPU runtime memes on the subreddit afterwards does feel pretty giddy :) Good job C64, you did me a solid.

Time to solve: 3 hours 04 minutes. Probably close to an hour lost due to editor crashes. :(

Mistakes made:

Day 5 Part 2 Code Listing

#include <libc.h>
#include "vec.h"                        // I don't have STL, use custom vector.

struct sr { u32 lo, hi; };              // A seed range is half-open [lo, hi[
struct node {                           // Node of binary evaluation tree
  u32 lo, hi, ofs;                      // [lo, hi[ also half-open, ofs: the
  node *l, *r;                          // offset/delta that this map node adds
  vec<sr> seeds;                        // l,r: left/right child nodes
};                                      // seeds: an array of seed ranges that
                                        // are waiting to be mapped through
void insert(node *n, node *p) {         // Insert node n into BST evaluation
  for(;;)                               // tree, under parent/root p.
   if (n->hi <= p->lo) {                // Node n should walk to left of p?
     if (!p->l) { p->l = n; return; }   // If no left child, we are it. Otherwise
     p = p->l;                          // traverse deeper into BST.
   } else {                             // We weren't left child, so must belong
     if (!p->r) { p->r = n; return; }   // in right subtree. Walk down that path
     p = p->r;                          // or insert as child.
   }
}

node *read_nodes(FILE *f) {
  char dummy[40];
  fgets(dummy, sizeof(dummy), f);       // Consume dummy "seed-to-soil" lines
  u32 dst, src, len;
  node *root = 0;
  while(fscanf(f, "%lu %lu %lu",        // Read all piecewise linear maps in
               &dst, &src, &len) >= 3){ // this stage
    node *n=(node*)calloc(1,
                          sizeof(node));// Create a new expression evaluation
    n->lo = src;                        // node to represent this node
    n->hi = src + len;
    n->ofs = dst - src;
    if (root) insert(n, root);          // And insert it to the binary eval tree
    else root = n;
  }
  return root;
}
                                        // Maps all pending seed ranges from src
void map(node *src, node *dst) {        // through its children, and finally
  for(sr &i : src->seeds) {             // into the next stage 'dst'.
    if (i.lo < src->lo) {               // If some seeds in this range should
      node *d = src->l ? src->l : dst;  // walk down the left subtree?
      d->seeds.push(                    // Add pending seeds to go down left
        { i.lo, MIN(i.hi, src->lo) });
      i.lo = src->lo;                   // And clip those seeds off of this sr.
    }
    if (i.hi > src->hi) {               // If some seeds in this range should
      node *d = src->r ? src->r : dst;  // walk down the right subtree?
      d->seeds.push(                    // Split this range in two and add the
        { MAX(i.lo, src->hi), i.hi });  // pending seeds to walk the right tree.
      i.hi = src->hi;                   // Clip the seeds off of this seedrange.
    }
    if (i.lo < i.hi)                    // Did any seeds map through this node?
      dst->seeds.push({i.lo+src->ofs,   // If so, translate these seeds and
                       i.hi+src->ofs });// directly send them out to next stage.
  }
  src->seeds.free();                    // Free mem from this stage.
  if (src->l) map(src->l, dst);         // And process onwards to all seeds in
  if (src->r) map(src->r, dst);         // the child ranges.
}

int main() {
  FILE *f = fopen("in5.txt", "r");
  fscanf(f, "seeds:");                  // Skip initial seeds word.
  vec<sr> seeds;
  u32 lo, hi;
  while(fscanf(f,"%lu %lu",&lo,&hi)>1)  // Read input seed ranges to vector
    seeds.push({lo, lo+hi});

  node *maps[8];
  for(int i = 0; i < 7; ++i)            // Build binary evaluation trees for all
    maps[i] = read_nodes(f);            // seven seed mapping layers.
  fclose(f);

  maps[0]->seeds = seeds;               // First layer gets all seeds as input
  maps[7]=(node*)calloc(1,sizeof(node));// Add one extra layer to store final
                                        // outputs.
  for(int m = 0; m < 7; ++m)            // Do the actual processing: map all
    map(maps[m], maps[m+1]);            // seed ranges through each seed stage.

  u32 min = -1;
  for(sr &s : maps[7]->seeds)           // Finally from all the seed ranges that
    min = MIN(min, s.lo);               // made it, find the smallest seed #.
  printf("Smallest: %lu\n", min);
}

Interlude: Typing on the C64 Keyboard

I have now had a good number of days to practice writing code on the Commodore 64 keyboard. A big part of this "challenge", or journey, or experience, whatever you might call it (more than one person has referred to it as an act of masochism), was to do as much as possible hands-on on the Commodore 64.

None of this comes from a perspective that "emulators are bad" or anything like that; rather, a big part of reliving the nostalgia for me is to physically feel what operating the machine directly used to be like.

Feeling how the cassette tapes rattle a bit when you handle them, or how 5.25" floppy disks bend slightly under your thumb when you grab them. And the exact feel of typing on the C64 keyboard...

This keyboard was the main motivation why I specifically wanted to develop a code editor for the C64. The C64 never had a good "on-device" C/C++ development capability, and what really charmed me was to think what it would be like if one did exist.

With this llvm-mos and RR-Net setup, I am able to in a way walk into this fantasy revisionist history to experience what developing C/C++ on the C64 would have been like, had it been powerful enough to carry a proper native toolchain. (Power C really was not it :)) This setup is a splendid mix of the current day and the history.

The Keyboard

The C64 keys are quite enjoyable to type on. They are non-tactile/non-clicky, with smooth linear-ish action that does get slightly harder to press towards the end of the range. The movement range is nice, maybe slightly longer than what I think modern keyboards generally have. In the 80s era, the C64 really shined with this keyboard over the 8-bit home micro competition.

The shape of the keycaps is pleasing. They are slightly concavely curved in all four directions, not just to the left and right like on many modern keyboards. There is an oddly satisfying feel to the larger Return key, the Restore key and the function keys, which can be nicely hammered with two fingers (or three in the case of Return) to make a statement of your intended action. F1 - *thump* - "Yes! Copy that block of text!"

However, the keyboard is not all perfect and could never beat a Model M... there are a couple of elephants in the room to talk about.

The first one is the arrow keys... what on earth are these?!

(If you are a C64 veteran, bear with me here..)

The way the C64 arrows work is that there are only two arrow keys, and they move down and right. To move up or left, one needs to hold down the Shift modifier key and then press the corresponding arrow key.

I remember from ages ago that this scheme felt really hard to use, and nowadays it is hard to wrap one's head around how it was even possible for anyone to approve this kind of design into production. With modern hindsight, the four arrow key formation seems so obviously superior.

These shifted arrow keys were one of the biggest reasons that I initially hesitated to start the C64 text editor project. I thought to myself, "yeah, I bet I'd go crazy after a couple of hours of just trying to move the cursor".

Then, sometime around October, my feelings shifted toward taking it as 'a part of the experience' thing.

Now, after more than a week of typing code on the C64 keyboard for Advent of Code, it is odd to realize that this Shift-Down-Right key scheme has become almost second nature. It is really not that hard at all (for text editing at least, that is).

The thing I anticipated to likely be the first dealbreaker was actually not that big of a deal at all. How fascinating.

Looking at the other side of the keyboard, we can see that there are the same Ctrl and Shift keys as on modern keyboards. It's just that the Ctrl key is located on the top row in place of the modern Tab key (and there is no Tab key at all).

By default, the Ctrl and Shift keys operate like modern modifier keys in that they don't provide any action on their own when read using the C64 KERNAL (the BIOS of the C64). There is a third modifier key, the Commodore key at the very bottom corner, which could be paralleled with the Windows or Apple Cmd keys of modern times.

On the right side of the keyboard, there is a nice four-button function key row. I really <3 a vertical function button row, it is so much more usable than a horizontal function row at the top (yes, I may have more than one IBM Model F lying around..). I have been occasionally eyeing these modern "build your DIY keyboard" sites to see if it might be possible to create a custom-layout keyboard like that, though I haven't found a good one yet.

There is a peculiar RESTORE key also on the right-hand side. This key is not actually handled using the regular keyboard scanning mechanism on the C64. Instead, it is hooked directly into generating a hardware interrupt. This is actually super useful, and I repurposed this key in my editor to return to the editor when running a compiled program or the AoC client. This way I can switch between the editor and the compiled program easily.

Keyboard Shortcuts

While designing the keyboard shortcut functionality for my code editor, I was presented with a dilemma. For example, I wanted to have a keyboard shortcut for "Select all text", which typically is Ctrl+A in modern text editors.

On the C64, however, the Ctrl key sits in an unusual place, so shortcuts like these feel quite different under the fingers.

In the end, I decided to follow muscle memory as much as possible. Here are some of the common keyboard shortcuts that my editor supports:


-: Open File
-: Save File
--: Save File As...
-: Close File
-: Quit to BASIC
-: Open New File

Since those aren't accessed that frequently, it does not really matter that the Ctrl key is in an odd place.

After the arrow keys, the most used navigation shortcuts are probably those that move to the end/beginning of a line (the Home and End keys on normal keyboards), to the next/previous word (Ctrl-Left/Right arrow keys), and one page up or down. The C64 has only the Home key (which moves the cursor to the top-left corner in BASIC), but none of the others. So getting these shortcuts required a little bit of creativity. Here is what I came up with:


-: Move to Previous Word
-: Move to Next Word
: Move to Line Start
-: Move to Line End
-: Toggle Between File Start & End
:Page 1/3rd Up
:Page 1/3rd Down
-:Full Page Up
-:Full Page Down

Commodore couldn't afford separate Backspace and Delete keys (just like Apple), so deleting text needed some adjustment as well:


: Backspace (del left)
-: Delete (del right)
-: Delete Current Line

For copy-paste, I did implement the Ctrl-C, Ctrl-V and Ctrl-X shortcuts, but realized that I didn't want to use them. Instead, I dedicated the F1, F2 and F3 keys for these:


: Enter/Exit Copy Text Mode
-:Cut Selected Text
: Paste Text

Finally, to manage compilation and running subprograms, I added:


-: Run Compiled Program
-: Save Compiled Program to Disk
-: Run Advent of Code Client

That's almost all the editor shortcuts, but there are a couple of important editor features that are missing:

What is peculiar is that those features haven't turned out to be that critical for my AoC editing, as long as I am mindful about them missing.

Character Set

When writing actual code, I use the Commodore 64 "Business PETSCII" (or Business CBM ASCII) character set, sometimes called the "shifted PETSCII", since the Shift-Commodore key is used to access this charset. The following characters exist out of the box:

The C64 does not support the modern ASCII character set at all, even though this PETSCII set does have many characters in the same places.

A notable difference is that the C64 uses the \r character (0x0D) as a newline, instead of the \n character (0x0A). This difference has been a source of multiple bugs during my AoC run so far.

Also notable is that the C64 PETSCII character set does not have the critical {, }, \ or ^ characters either. These four characters are very important for C/C++ programming.

Fortunately the character set on the C64 is programmable, so it is possible to redefine custom glyphs in the font. So I drew up my own font characters for these four missing symbols in order to use them in my editor. The resulting visual look can be seen above on a CRT.

In Closing

What surprised me throughout the first week of programming in my editor is that the set of keyboard shortcuts and the cursor key handling is actually surprisingly usable. Paired with the interactive code editing experience and fast program launch and quit support, this setup has some merit at least.

The 40x25 character display naturally imposes big limitations, but it has been kind of fun to author 40-column solutions like the ones you've seen throughout this page. Also, as we know, Commodore did provide a path to evolve over to 80 columns with the C128. Maybe for the next AoC? ;)

Day 6

Happy Independence Day Finland! What would be a better way to celebrate today than by operating the Tasavallan Tietokone, "The Computer Of The Republic".

Yes, that is what the Commodore 64 was advertised as in a Finnish magazine campaign in the late 80s.

"Commodore 64 is the overwhelming number one in our market - a real computer of the republic.

Quality, reliability and features have been proven to be the trump cards of the Commodore in numerous tests. That is why tens of thousands of people at homes, in schools and in businesses have chosen Commodore as their own."

Then they go on to state four "laws" of Commodore and present a scientific graph that depicts the market statistics of how Commodore adoption has grown in Finland in the past years to surpass the 70,000 users mark.

"Be part of the rule in the era of the computer of the republic that goes on. And on."

Yes please! 🤩

I started the morning by tending to some of the biggest problems from yesterday. Anticipating that the programs are only going to get bigger from now on, I added a memory saving feature to tackle the Out-Of-Memory errors in the editor: when the C64 runs out of memory, the editor will disable syntax color highlighting to free up the memory that it uses. This should free up some 3-4KB more RAM for more lines of text in the document itself.

Another improvement I made was to acquire a second disk drive, so that I wouldn't have to swap disks whenever reloading the editor and the program code. A few months ago I didn't own a single C64 floppy drive, now I have two!

The power light colors don't quite match, but maybe it is a good distinguishing feature. The red LED is not an error indication, just the way it was colored. I configured the drive with the green light to be the drive number #8, whereas the top drive with the red light shall be the drive number #9. Slick!


"Wait For It" - Reddit u/dimpopo

Onwards with today's puzzle. We are attending a boat race in order to win a trip over to Desert Island. The boats in the race are seemingly modeled after those wind-up/pull-back toy car types: they are "charged" at the start of the race (which costs some time), and then they spring off as fast as they were charged.

The task is to calculate, given a fixed duration of a race, t (milli)seconds, how many different possible ways there are to beat a given record distance traveled, d (milli)meters.

I started by scribbling down the equation for the distance traveled as a function of the decision of how long one chooses to charge up the toy boat, say k (milli)seconds, where k is restricted to the integer domain.

It quickly turned out that this function is just a good old parabola (i.e. a second degree polynomial):

  dist(k) = k * (t - k) = kt - k^2

When asked for which values k this parabola reaches a result distance higher than a record limit distance d, i.e.

  kt - k^2 > d,

by applying the quadratic formula to find the zeros we get

  k = (t ± sqrt(t^2 - 4d)) / 2,

so flooring and ceiling the result into the integer domain, the final(*) closed form answer is

  count = floor((t + sqrt(t^2 - 4d)) / 2) - ceil((t - sqrt(t^2 - 4d)) / 2) + 1.

(* Actually in the above formulation, there is a small technicality for when the discriminant is a square number and the numerator comes out to be even, since the asked domain is of form "strictly greater than" rather than a "greater or equal than". Later after finishing the puzzle, I realized that this technicality is solvable by changing the quadratic to form kt - k^2 >= d+1 instead)

This could be typed into a calculator without the use of programming, but of course I want to make the Commodore 64 do it.

However, there is a bit of a problem: calculating a square root involves handling floating point numbers, and the C64 does not have those in hardware. Additionally, as of writing, llvm-mos-sdk does not have a soft-float emulation library either.

(While solving this puzzle, I was thinking that maybe the trick in part two would be that the author would ask to calculate this expression on such large numbers that the square root would round incorrectly due to lack of precision, so that people would have had to precisely examine the rounding part; but then I realized that it would have required at least 128-bit long input numbers to make that happen, making it somewhat unfair)

I ended up manually writing a pair of integer sqrt() functions that rounded up and down, depending on the desired direction.

These implementations were quite naive, so I was worried I would have to optimize them into something faster to make the C64 cope. But when I ran it, the C64 still computed the result immediately in the blink of an eye, so all good there. (How could I even doubt the mighty C64's speed?)

The Day 6 meme factory served us the above. For me today felt a bit more like:

Time to solve: 1 hour 21 minutes.

Mistakes made:

Day 6 Part 2 Code Listing

#include <stdio.h>

u64 sqrt_ceil(u64 val) {                // Naively searches sqrt using integer
  for(u64 i = 1;; ++i)                  // arithmetic, rounding up the result.
    if (i*i >= val) return i;
}

int main()
{
  u64 t = 56717999;
  u64 d = 334113513502430ull;
  u64 sqrtdisc = sqrt_ceil(t*t-4*d-4);  // Calculate sqrt of the discriminant,
  u64 lo = (t - sqrtdisc) / 2;          // and conservatively estimate the low
  u64 hi = (t + sqrtdisc) / 2;          // and high ends of the winning priming
  if (lo * (t-lo) <= d) ++lo;           // times. Then fix up these guesses by
  if (hi * (t-hi) <= d) --hi;           // nudging them to the right answer.
  printf("Range: %llu\n", hi - lo + 1);
}

Day 7

Yesterday I realized that I spent quite a lot of time juggling between the Advent of Code client program and the editor. While working on a puzzle, I have to repeatedly check the example inputs or the puzzle description, and with just pen and paper for notes, it is easy to forget to write down the right things, which leads to swapping back and forth.

Doing that swapping takes quite a bit of time. If only the Commodore 64 had dual display support, so that I could have the editor on one screen, and the Advent of Code client on another. If only...

Well, I learned today that the C64 indeed can have two displays! The other display is just more advanced hi-fi tech, maybe licensed from some kind of an Amazon Kindle patent, as it seemingly shares a heritage with e-paper display technology. :)

The readability is just amazing, and vertical resolution feels nearly infinite. This will be a complete game changer!

Yeah, above you see a Commodore MPS-803, along with a tractor feed add-on to support continuous feed paper. This was an eBay find from Italy, which arrived just in time for the harder AoC puzzles.

I am shocked to realize that new ink ribbon cartridges are still being manufactured for this printer model (probably since there are several dozen matrix printer models that used this ink ribbon cartridge type), so I am now printing with brand new ink as well.

Getting messages from the elves will be easier than ever, plus I can make notes directly "on the screen". Take that LCD.

Overall, this Commodore 64 setup is getting pretty rad if I may say so.

Or "Hella gnarly", as they probably would've said back in the 80s 🌟


"Camel Cards" - Reddit u/dimpopo

Ok, on to the actual puzzle finally. Today we are ranking Camel Cards hands.

Camel Cards hands are like in poker, except that only multiplicities of values matter, so there are no straights or flushes; and in case of a tie, the lexicographic order of the cards in your hand is used. How fantastically weird. This was supposedly to make it easier to play while riding a camel, but I would think that when sailing on the back of a desert animal, keeping the order of cards preserved might not be the easiest thing to do..

Well, anyway, I do have prior experience from developing a poker hand evaluator - some decade and a half ago I worked as the AI developer for a World Series of Poker game on the Nintendo DS. ([IGN 5.5/10], but hey, we did try at least.. If I had the chance to go back in time, I'd lean the game heavily into a caricature-like direction similar to the NES Punchout).

So this puzzle should be easy from that experience, but it was anything but, and proved to be a tough adversary. More on that in a bit.

Sorting

Due to how the puzzle wanted the final result to be calculated, it became necessary to keep all the hands in memory and sort them in value order.

I wasn't worried that the C64 would run out of memory - there were 1000 hands in total, and each hand would take about 10 bytes of memory - peanuts for the C64 with its industry leading 64KB of random access work memory.

At first I started to write an implementation of quicksort on the spot, but after a bit of pondering, realized that I was a bit hazy on the details. To not waste too much time on that, I whipped together a quadratic time insertion sort instead so I could focus on the hand evaluator.

In the first part, to my surprise the C64 had no problems with running the quadratic time sort, and took only a couple of minutes to process through the problem. Wow, so fast!

At the very end after having solved the puzzle, I did revisit the sorting code and implemented quick sort to hide my earlier sins. You can see the finished sort code in the code listing below.

Hand Evaluation

The big issue today was in part two however. There the problem was changed so that J would mean a Joker instead of a Jack, and we were asked to re-score the thousand Camel Cards hands with this new interpretation.

This sounded easy, "nothing I shouldn't be able to handle". But nope, this was the first time for this Advent of Code where I got to really experience the dreaded "it works on the example, but not on the real input" pit (that is by far the most common post title on the AoC subreddit).

Yeah, today the part two of the problem really took me for a ride.

It turned out that my hand evaluator was implementing the evaluation order between five of a kind and four of a kind incorrectly when exactly four jokers were involved, and the hand J2JJJ would be incorrectly labeled as a Four of a Kind. That is, before seeing "a '2' plus four 'J's make a five of a kind", it would see "four jacks, so that's a four of a kind".

In the end, that was about an hour spent in deep troubleshooting, and I did have to eventually debug the issue on my PC where I could dump out the full one thousand line output into a text file to read through all of the test cases.

Nevertheless, today's puzzle was quite enjoyable. Revisiting quicksort logic was a nice refresher, and optimizing the card evaluator a nice sandbox to play in after the main puzzle. I'd totally Monte Carlo with this eval() now.

While cleaning up the code on the stream, I got an awesome tip from a chat viewer, who suggested that the card evaluator could be designed so that Joker cards would simply be assigned to the card value that appeared the most in the input hand. This was a great simplification that I hadn't thought of, but then quickly adopted. The final code listing I ended up with, that idea applied, is presented below.

Time to solve: 3 hours 32 minutes.

Mistakes made:

Day 7 Part 2 Code Listing

#include <stdio.h>
#include "vec.h"

u8 idx(char ch) {                       // Converts card character to integer
  switch(ch) {                          // based on its strength
  case 'A': return 12;                  
  case 'K': return 11;
  case 'Q': return 10;
  case 'J': return 0;                   // Joker is the worst single card (zero)
  case 'T': return 9;
  default: return ch - '2' + 1;         // integer digits '2'-'9' get 1..8
  }
}

struct hand {
  u32 val;                              // val: packed u32 that represents total
  int bid;                              // strength of the hand (bigger better)
  bool operator<(const hand &r) const { // bits 23..20: hand type
    return val < r.val;                 // bits 19..16: first card value
  }                                     // bits 15..12: second card value
};                                      // ...
                                        // bits 3..0: last card value
template<typename T>
void qsort(T *arr, u16 n) {             // Ad hoc quicksort :)
  T pivot = arr[n/2];                   // Choose middle elem as pivot
  arr[n/2] = arr[n-1];                  // Move last elem in place of middle
  u16 l = 0, r = n-1;
  while(l < r)                          // Pivot all elems to left or right
    if (arr[l] < pivot) ++l;            // if l should stay to left, skip it
    else arr[r]=arr[l], arr[l]=arr[--r];// otherwise locate it to the right
  arr[r] = pivot;                       // Finally put pivot back in the hole
  if (l > 1)   qsort(arr, l);           // and recurse to left
  if (r < n-2) qsort(arr+r+1, n-(r+1)); // and right sides of the pivot.
}

u32 eval(char *h) {                     // Calculate hand strength.
  u8 ct[13]={}, pairs=0, max=1,threes=0;// ct: count of each card value.
  for(int i = 0; i < 5; ++i)            // convert cards to their index values
    ++ct[(h[i] = idx(h[i]))];           // and tally up counts of each card.
  u32 val = (h[0]<<16) | (h[1]<<12)     // pack five cards to bits 19...0 for
   | h[4] | (h[3]<< 4) | (h[2]<< 8);    // the lexicographic tiebreaker values.
  for(u8 i = 2; i < 13; ++i)            // Find which card appears most times
    if (ct[i] > ct[max]) max = i;       
  ct[max] += ct[0];                     // and assign jokers to that card.
  for(int i = 1; i < 13; ++i)           // Search non-jokers to label the hand:
    if      (ct[i]==5)return(6<<20)|val;// Five of a kind? -> can early out
    else if (ct[i]==4)return(5<<20)|val;// Four of a kind? -> can early out
    else if (ct[i]==3) threes = 3;      // Mark if we see a three of something
    else if (ct[i]==2) ++pairs;         // And how many times we have pairs
  return ((threes+pairs) << 20) | val;  // Form hand score from threes and twos
}                                       // (full houses,threes,twop,pair,high)

int main() {
  vec<hand> hands;
  FILE *f = fopen("in7.txt", "r");
  char str[40];
  while(fgets(str, sizeof(str), f))     // Parse input hands
    hands.push({eval(str),atoi(str+6)});// And do hand evaluations
  fclose(f);
  qsort(hands.data(), hands.size());    // Sort hands from worst to best
  u32 winnings = 0;
  for(int i = 0; i < hands.size(); ++i)
    winnings += (u32)hands[i].bid*(i+1);// And accumulate total winnings
  printf("Winnings: %lu\n", winnings);
}
  

Day 8

The difficulty is slowly ramping up in the puzzles.

In the problem for today, we were given a directed graph with exactly two outgoing edges at each node, along with infinitely repeating walk instructions to follow, and the question was first posed: how long will it take to find a goal node in this graph?

The first part of the problem seemed easy - if it wasn't for me botching up an implementation of a hash table (forgetting to set the needed fields during an insert), and then incorrectly diagnosing the issue to be that I had too high an occupancy/not enough memory on the C64 to grow the hash table big enough to be usable.

As a result, I refactored the code to use a binary tree instead to identify nodes, which turned out to be a completely unnecessary refactoring (although with that change I did get the code to work).


"Probably should've learned by now not to trust elf maps" - Reddit u/flwyd

In part two, the question was modified so that instead of finding the path to a goal from a single node, a total of six nodes would simultaneously walk according to the same instructions, and the task was to find when all six nodes would reach goal nodes at the same time.

On video, I wrote a naive implementation that would compute that, and concluded that it would not finish (at least not on the C64). For the first time in this Advent of Code, this was a problem that I was not able to solve on the C64 directly using my editor. I reasoned that I would need to memoize the possible walks on the map to fast track through hundreds of thousands of steps, and I had no idea of how large such a memoization cache would have to be or if it would ever fit in the C64 memory.

So I took a break and headed for lunch. Afterwards I opted to solve the problem on my PC where I could explore the problem constraints better. Implementing the memoization caching idea, I got a solution for Part 2 that finishes in about 750 milliseconds on my Ryzen 5950X. A nice time for the PC, but far too long for the same code to ever complete in a reasonable time on a C64.

Nevertheless, I did want to give it a try, so I ported the same code to run on the C64.

For the first time, I wrote code to target the C64 RAM Expansion Unit, as you can see below. This was a great exercise, and it enables storing larger data structures on the external REU cartridge (a secret weapon that I hope to use on later days to break through the 64KB RAM barrier).

Day 8 Part 2 Code Listing (memoized solution)

#include <stdio.h>
#ifdef __C64__                          // reu.h is my own library that enables
#include <reu.h>                        // targeting the C64 RAM Expansion
#else                                   // Cartridge from C/C++ code
#define reucpy(...) ((void)0)           // On native PC, reucpy() is a no-op
#endif
#include "vec.h"
#include "tick.h"

char instr[310];                        // Contains the instructions "LRLRL..."
int instrlen;                           // Length until instructions loop

struct node {                           // A node in the traversed graph
  u16 name;                             // name: Id of this node, 10 bits long
  node *l, *r;                          // left and right directions in map
  bool is_end() { return name > 1000; } // If true, this is a goal node
} nodes[2048];                          // Static hash table to find nodes by
                                        // their ID.
int find_or_add(vec<u24> &v, u24 n) {   // To make the code computable on C64,
  int i = v.index(n);                   // find_or_add() reindexes the names of
  if (i >= 0) return i+1;               // map nodes to tight sequential order.
  v.push(n);                            // This way they take only 10 bits
  return v.size();                      // instead of 15 bits, to fit in REU.
}

vec<u24> names, ends;                   // Temp data structures for reindexing
u16 name(const char *s) {               // map nodes
  u32 n =((u32)s[0]<<16)|(s[1]<<8)|s[2];// First pack "AAZ" to 15 bits, and then
  if (s[2] == 'Z')                      // resequence it to 10 bits
    return 1024-find_or_add(ends, n);
  else return   find_or_add(names, n);
}

node *find(u16 name) {                  // Given a node name, finds an existing
  u16 hash = name & 2047;               // node structure to it
  while(nodes[hash].name                // Linear probing
   && nodes[hash].name != name)
    hash = (hash+1) & 2047;
  nodes[hash].name = name;              // Add node if did not exist
  return &nodes[hash];
}

struct hash {                           // Single hash table for three purposes:
  u48 key;                              // 1) dist from node*map to next goal
  i32 value;                            // 2) dist from node1*node2*map to next
#ifdef __mos__                          //    time both nodes are at goal
} HASH;                                 // 3) end node if node*map travels N
#else                                   //    steps
} hash[65536];                          // On PC use 64K element hash table, on
#define HASH hash[hsh]                  // C64 a REU memory location
#endif
u16 hsh;                                // Hash index of last accessed entry

void hash_table_find(u48 key) {         // Given a key, access the element from
  hsh = ((u16)key) ^ ((u16)(key >> 16)) // hash table
                   ^ ((u16)(key >> 32));
  reucpy(&HASH, hsh*sizeof(hash),       // On C64 call REU to read table entries
         sizeof(hash), REU2RAM);
  while(HASH.key && HASH.key != key) {
    ++hsh;                              // Linear probing
    reucpy(&HASH, hsh*sizeof(hash),
           sizeof(hash), REU2RAM);
  }
}

void hash_table_update(u48 key,i32 val){// Updates hash table entry
  HASH.key = key;
  HASH.value = val;
  reucpy(&HASH, hsh*sizeof(hash),       // Write new entry to REU
         sizeof(hash), RAM2REU);
}

node *step(node *n, u16 pos, int steps){// Advance from node/pos given # steps
  u48 p = pos | (((u48)n->name)<<9)     // Search if we have memoized this in 
        | ((u48)steps << 19)|(1ull<<43);// hash table? (key=9+10+24 bits+1-bit
  hash_table_find(p);                   // marker sentinel == 44 bits)
  if (HASH.key == p)
    return find(HASH.value);            // Memoization hit, return result
  node *t = n;                          // No hit, must manually wind the result
  while(steps--) {
    t = (instr[pos] == 'L') ?t->l:t->r;
    if (++pos >= instrlen) pos = 0;
  }
  hash_table_update(p, t->name);        // Save result in hash table
  return t;
}

int dist1(u16 pos, node *n) {           // Return distance of node n to goal
  u32 p = pos | (u32)(n->name << 9);    // Check if we have memoized this in 
  hash_table_find(p);                   // hash table? (9 + 10 == 19 bits)
  if (HASH.key == p) return HASH.value; // If hit, return existing result
  int steps = 0;                        // Otherwise do manual walk
  node *t = n;
  do {
    ++steps;
    t = (instr[pos] == 'L') ?t->l:t->r;
    if (++pos >= instrlen) pos = 0;
  } while(!t->is_end());
  hash_table_update(p, steps);          // Save result in hash table
  return steps;
}

int dist2(u16 pos, node *n, node *n2) { // Return distance of node pair n1*n2
  u48 p = pos | (((u48)n ->name) << 9)  // to goal. (hash table key 9+10+10 =
              | (((u48)n2->name) << 19);// 29 bits)
  hash_table_find(p);                   // Test hash table hit first
  if (HASH.key == p) return HASH.value;
  int steps = 0;                        // No hit, do manual walk
  do {
    int s1 = dist1(pos, n);
    int s2 = dist1(pos, n2);
    if (s2 > s1) s1 = s2;
    n = step(n, pos, s1);
    n2 = step(n2, pos, s1);
    steps += s1;
    pos = (pos + s1) % instrlen;
  } while(!n->is_end()|| !n2->is_end());
  hash_table_find(p);                   // Save result to hash table. Must do a
  hash_table_update(p, steps);          // new lookup first, because calls to 
  return steps;                         // dist1() have also done table lookups.
}

bool all_at_end(vec<node*> &v) {        // Returns true if all nodes in vec are
  for(node *n : v)                      // at goal state
    if (!n->is_end()) return false;
  return true;
}

int main() {
  FILE *f = fopen("in8.txt", "r");
  fgets(instr, sizeof(instr), f);       // Read walk instructions
  instrlen = strlen(instr);
  instr[--instrlen] = 0;                // Remove pesky \n that fgets adds
  vec<node*> cur;
  char s[8], l[8], r[8];
  while(fscanf(f,"%s = (%s %s",s,l,r)>2)// Read map nodes
  {
    node *n = find(name(s));            // Add nodes to graph and populate left
    n->l = find(name(l));               // and right children
    n->r = find(name(r));
    if (s[2] == 'A') cur.push(n);       // Store start states
  }
  fclose(f);
  u64 steps = 0;
  u16 pos = 0;
  tick_t t0 = tick();
  while(!all_at_end(cur)) {
    u32 s1 = dist2(pos, cur[0], cur[1]);// Walk nodes in three pairs, in
    u32 s2 = dist2(pos, cur[2], cur[3]);// memoized fashion.
    u32 s3 = dist2(pos, cur[4], cur[5]);
    if (s2 > s1) s1 = s2;               // Calc max distance to next goal?
    if (s3 > s1) s1 = s3;
    for(node *&n:cur) n=step(n,pos,s1); // Walk all nodes that amount
    steps += s1;
    pos = (pos + s1) % instrlen;
  }
  (tick()-t0).print();
  printf("%llu steps\n", steps);
}

In the above code, I use memoization in three different ways to speed up computation:

  1. Each node caches how many steps it will need to walk to reach the next goal node.
  2. Each pair of nodes cache how many steps they will need to walk together until they both reach goal nodes simultaneously, and
  3. Each node caches which node it will reach after having walked a given number of steps.

Then instead of individually advancing six "walkers" through the map, I formed three pairs of nodes from these, and handled them effectively as a cartesian product of each other, memoizing their results together. I used the same hash table to memoize all three results to simplify the REU accesses for the C64. (even though this code never really finished execution on the C64)

I realized that due to the problem construction, each goal node must lie on a cycle that leads back to a goal node, so that the goals are visited infinitely many times (otherwise solutions would not generally exist, and if a solution were acyclic, it would be very quick to compute).

Today's puzzle is very similar to Day 17 from last year, where an infinitely tall tetris board would exhibit a periodic cycling pattern.

However, the big issue I got stuck on was wondering: "what if a node reaches a cycle that has two goal nodes in the cycle's path?" Then these two would have the same cycle length, but different phases, so one would have to figure out which one of those goal nodes would be the first one to reach a solution. (and any of the six walkers might have different numbers of such offset periodic visits to goal nodes, creating a combinatorial question)

When browsing the subreddit after having solved the puzzle, I discovered that the problem inputs had been deliberately constructed in such a way that multiple visits to goal nodes would not occur during the cycle from goal to goal. Additionally, the problem inputs had been further simplified so that the distance of each start node to a goal node was the same as the length of each cycle.

This property made the code shorter and faster, as one can simply compute the Least Common Multiple of each start node's distance to its first visit to a goal node. This was a blind spot for me; it did not occur to me to check whether the problem inputs had been "rigged" like this.

The thing is, most programming exercises teach people to think about general cases and corner cases, and to not make any unsound assumptions that might not hold. The challenge in today's puzzle was to opportunistically discover specific properties that existed in the provided puzzle input and to exploit them as discovered. This is not easy!

I was still quite happy to have solved the problem in less than a second on a PC with my general memoization based solver that didn't assume the unique cycles and matching start-to-cycle-length property - even though the resulting code became quite a bit longer than usual.

Day 8 Part 2 Code Listing (LCM calculation)

#include <stdio.h>
#include "vec.h"

struct node {                           // A node in the traversed graph
  u16 name;                             // name: Id of this node, 15 bits long
  node *l, *r;                          // left and right directions in map
} n[2048];                              // Static hash table to find nodes by
                                        // their ID.
node *name(const char *s) {             // Returns (or adds) node by name
  u16 name = 1 + (((s[0]-'A') << 10)
    | ((s[1]-'A') << 5) | (s[2]-'A'));
  u16 hash = name & 2047;               // Hash to table
  while(n[hash].name
   && n[hash].name != name)
    hash = (hash + 1) & 2047;           // Linear probing
  n[hash].name = name;                  // Add node if did not exist
  return &n[hash];
}

u64 lcm(u64 a, u64 b) {                 // Computes Least Common Multiple of
  u64 prod = a*b, t;                    // a and b
  while(b) t = b, b = a%b, a = t;
  return prod / a;
}

int main() {
  FILE *f = fopen("in8.txt", "r");
  char instr[310], s[8], l[8], r[8];
  fgets(instr, sizeof(instr), f);       // Read walk instructions ("LRLRLR..")
  int len = strlen(instr);
  instr[--len] = 0;                     // Remove pesky \r that fgets keeps
  vec<node*> cur;
  while(fscanf(f,"%s = (%s %s",s,l,r)>2)// Read map nodes
  {
    node *n = name(s);                  // Add nodes to graph and populate left
    n->l = name(l);                     // and right children
    n->r = name(r);
    if (s[2] == 'A') cur.push(n);       // Store start states
  }
  fclose(f);

  u64 total = 1;
  for(node *n : cur) {                  // For each start node, find how long
    int steps = 0;                      // until they reach the goal node
    while((n->name & 31) != 26)         // 26 == 'Z'-'A'+1, i.e. stop at "..Z"
      n = (instr[steps++ % len] == 'L')
        ? n->l : n->r;
    total = lcm(total, steps);          // And compute the Least Common Multiple
  }                                     // as the result when they all need to
  printf("%llu\n", total);              // reach the goal node (by problem
}                                       // construction)

Finally, as an exercise, I wrote a solution that utilizes these assumptions for comparison, as shown above. This one does complete on the C64 in a total of 1 minute and 46.6 seconds, quite a good time.

Phew, for a while I thought this might be the first day when the C64 would get a DNF - but nope, still in the game.

Time to solve: 2 hours 49 minutes on video + 2 hours offline.

Mistakes made:

Day 9

After yesterday's involved problem, it was nice that today's was considerably shorter.

In this puzzle, a list of numbers was given, and the task was to "extrapolate" this list using a specific rule that repeatedly derived pairwise differences between subsequent numbers.

Then a new column/edge of numbers needed to be added at the right end of this computed triangle.

Doing this naively would result in O(n^2) time and space complexity. That is typically a warning sign in AoC puzzles, so I started by checking the length of these input lists, and discovered that in my input, the lists all had exactly 21 elements in them. So 21*21*sizeof(u32) = 1,764 bytes. Admittedly, even though quadratic, that would not really be much of a problem for the C64's memory capacity.

But there were a lot of problem instances to solve, 200 total. So performing a quadratic time calculation 200 times might be a bit intensive for the C64. Not that it couldn't calculate it.. but it might take on the order of minutes.

Since I have this advanced dual screen technology with inking support, I might as well use it. I started to dabble at the pairwise differences on the second display with a small four-number problem instance to see what a solution formula would look like. Hopefully I would get a clue so I would not have to expand out the full intermediate triangle for each input.

Doing that made it apparent that the numbers from Pascal's triangle are involved. If the input has four numbers, the fifth extrapolated number will be a dot product of the four input numbers and the coefficient vector (4, -6, 4, -1). That is almost the fifth row of Pascal's triangle, except the first 1 is missing, and the signs alternate.

Surely this pattern would carry all the way up to inputs with 21 numbers. So if I computed the 22nd row of Pascal's triangle, I would have the coefficients that I'd need to extrapolate the 22nd number of the sequence.

This approach has the benefit that I will only need to do a quadratic number of operations once at program startup to enumerate Pascal's triangle, but processing each of the 200 problem instances is then doable in O(n) linear time each.

The final code shows a function pascal() that does this enumeration. In my solution, I computed the full triangle (for O(n^2) memory usage), even though it would have been possible to only do the 22nd row in linear time and memory usage (using the n-choose-k formula for the cells of the Pascal triangle). I was anticipating that maybe I'd need the full triangle for Part Two, so I didn't jump into any kind of optimizations here.

After some negative number parsing fixes, a couple of index fixes, and double checking against i32-to-i64 overflow (which didn't occur), I got the right answer submitted on the first try. Yay, that always feels good.

Day 9 Part 2 Code Listing

#include <stdio.h>
#define SZ 22
#define P(y,x) p[(y)*SZ+(x)]            // Access pascal triangle at row=y,col=x

u32 p[SZ*SZ];                           // A square storage fit for a triangle
void pascal() {                         // Precomputes Pascal's triangle
  P(0,0) = P(1,0) = P(1,1) = 1;         // First two rows are all ones
  for(i8 y = 2; y < SZ; ++y) {
    P(y, 0) = 1;                        // Leftmost column/edge is all ones
    for(i8 x = 1; x <= y; ++x)
      P(y,x) = P(y-1,x) + P(y-1,x-1);   // Each cell is sum of their parents
  }
}

int main() {
  pascal();
  FILE *f = fopen("in9.txt", "r");
  char str[200];                        // Input lines
  i32 num[SZ], oasis = 0;
  while(fgets(str, sizeof(str), f)) {   // Read input line by line
    i8 n = 0;                           // n: total number of numbers in input
    for(char *s = str; *s;) {           // Parse numbers from line, use atol as
      num[n++] = atol(s);               // on C64 int is 2 bytes, long 4 bytes
      while(*s >= '-') ++s;             // Skip the number (any of '-', '0'-'9')
      while(*s && *s < '-') ++s;        // Skip the whitespace to next number
    }
    for(i8 i = 0; i < n-1; i += 2)      // Accumulate with alternating signs in
      oasis += P(n, i+1) * num[i]       // + and - pairs to avoid modern silly
             - P(n, i+2) * num[i+1];    // -1 * multiplications or branches
    if (n%2) oasis += num[n-1];         // And the last leftover number
  }
  fclose(f);
  printf("%ld\n", oasis);
}

In the second part of the problem, the task was slightly modified: instead of the right edge, the new column of numbers needed to be extrapolated at the left edge of this triangle.

On video I did quite the verbose rederivation of what the solution for this new case would look like in terms of my original indexing. My notation got a bit unwieldy, and this degenerated into a brute force calculation that did not involve much thinking.

Fortunately in the end I didn't make a mistake, and came to the right conclusion that this scenario was a mirrored calculation from the first part, so changing indexing from num[n-1-...] to num[...] to flip the number list backwards was all it took to account for that.

Coarsely clocking the execution time from the video, it took about 1 minute and 18 seconds for the Commodore 64 to compute the result. The Pascal triangle definitely helped here.

Afterwards it made me chuckle a bit to realize that when solving the first part, I accidentally made a mistake that resulted in correctly computing the answer for the second part in advance! (video at about 1h 08 min mark)

Time to solve: 1 hour 49 minutes.

Mistakes made:

๐ŸŒŸ๐ŸŒŸ and onwards to the next day.

Later, I found out there is a very relevant Mathologer video that illustrates the mathematics of where this puzzle comes from.

Day 10

I was somewhat hoping that today's puzzle would be another quick and easy one so that I would find more time to debug some of the recurring bugs in my editor setup.

And today kind of could have been that, but instead my code writing went into the silly bugs camp.


"It's a new day... " - Reddit u/jovani_lukino

In the first part the task was to follow a loop of pipes on a 2D map to calculate the farthest point in the loop from a start position. In this kind of pipe loop, the farthest point is just half of the length of the loop, so I measured the whole loop length to find out the answer. I finished this one on the first attempt in just under an hour.

The second part of the puzzle was an interesting one. In this problem the task was to calculate how many map cells were enclosed fully inside the loop that the pipe complex formed.

My strategy was to run a flood fill starting from "seed" points that I could positively conclude to lie inside the interior of the shape during the initial perimeter walk.

Here issues started to pile on top of each other. I wrote a couple of silly typo bugs that turned out to be hard to find on the C64 editor in the printf() debugging land. The program code started getting bigger, and again I was in a situation where I would be running out of memory in my editor, and it created all sorts of problems.

After an hour of debugging part two, I had a lunch break, and then ran the code natively in the Visual Studio debugger (yay for portability, I can take my C64 C/C++ code to Visual Studio just like that). There I immediately noticed my bug: I had defined a macro AT() for a different purpose that made it unsuitable to call in the flood fill code. Duh.

There was another logical mistake I had made: at first I thought that only fully horizontal and vertical pipes could enclose cells inside the pipe polygon, but that was not correct: a pipe piece of any shape can do that. To figure that one out, I did have to write an interactive visualizer for the problem, which can be seen above. Unfortunately, with a 40x25 screen, I would only be able to visualize the examples, but fortunately that was enough to highlight the mistake in my logic.

I was then pretty close to being able to run the computation through for part two. But then, surprisingly, I got into a situation where the program code file I had saved from memory to the 5.25" disk was too large to fit back into memory (from a clean slate). Now how could that be possible?!

This editor problem was a hard blocker for continuing on the C64, so I had to finish off the puzzle on a PC.

Resuming on a PC and after submitting a wrong solution that was too low, I understood that the design I had crafted before hitting the editor block was still not logically complete. I had missed a (literal) corner case: when passing a corner piece, there are three possible neighboring cells that can end up "winding" on the inside of the pipe complex area.

I added all those possibilities into the flood fill "seed" set and that fixed up the mismatch in the calculation. With that, the second star appeared.

Getting It Back To The C64

Porting the code to run on the C64 required one more trick. Naively implementing the flood fill algorithm would result in a very long flood fill set, 30KB+ of entries. This would no longer fit easily in the C64 RAM, because the main map already takes another 20KB in RAM and a flood fill "visited" bitmap another 2.5KB. Program code was ~10KB.

To resolve this memory challenge, I needed to make sure that the flood fill set would not acquire any unnecessary or duplicate seed entries. This was resolved by walking the pipe loop twice: the first time marks all the cells that are part of the pipe. Then only on the second go-around all the unmarked inside cells are actually enqueued. This way the flood fill set would not append any duplicate entries, or entries that would be part of the pipe loop itself.

With this optimization, the flood fill set fit in under 256 entries, removing all memory issues from the C64.

Another difference between the code on the video and the finished code below was that I threw in an optimization to remove all intermediate multiplications. Recall that the C64's 6510 CPU does not have a multiplication instruction, so multiplications by variables need to be emulated in software (which the compiler fortunately does automatically whenever needed).

My code only used multiplications to linearize a 2D address into a 1D array via the common y*w+x code, so it was possible to rewrite it to use these 1D linearized addresses in the first place.

Under such scheme, y+1 and y-1 calculations instead become pos+width and pos-width respectively.

With these memory and CPU optimizations, the finished code runs on the C64 in 1 minute and 59 seconds. Not bad!

Day 10 Combined Parts 1 and 2 Code Listing

#include <stdio.h>
enum DIR { U, D, L, R } dir;            // Which direction are we walking?
u16 width, fill_queue[256];             // Contains a flood fill set
char map[20000];                        // Receives the read map
u8 fill_seen[20000/8], qsz, queueing;   // fill_seen is a bitmask of done cells
#define SEEN(pos) \
  (fill_seen[(pos)>>3] & (1<<((pos)&7)))// qsz: current fill queue size
#define MARK(pos) \
  (fill_seen[(pos)>>3] |= 1<<((pos)&7)) // SEEN & MARK: test and set flood fill
                                        // bitmap elements.
void queue(u16 pos) {                   // Enqueues the given position for flood
  if (queueing && !SEEN(pos)) {         // fill purposes, but only if queue
    fill_queue[qsz++] = pos;            // writing is currently enabled.
    MARK(pos);                          // Mark the position so it is not queued
  }                                     // again. (which would waste memory)
}

u16 loop(u16 pos, u16 endpos, DIR dir) {// Loops through the pipe complex.
  MARK(pos);                            // Mark the starting position as seen.
  u16 steps = 0;                        // Calculates total pipe length.
  while(pos != endpos) {                // Walk the pipe until end(start)
    if (map[pos] == '|') {              // At each character, queue the left-
      queue(pos+(dir==D?1:-1));         // hand side of that character to be
      pos += (dir==D) ? width : -width; // flood filled later. Then look at the
    } else if (map[pos] == '-') {       // character under current position, and
      queue(pos+(dir==L?width:-width)); // combine the information with the dir
      pos += (dir==R) ? 1 : -1;         // we came from, to figure out which
    } else if (map[pos] == 'F') {       // direction in the pipe we will walk
      if (dir == U) {                   // next.
        queue(pos-1);                   // Gotcha with corners: add the three
        queue(pos-1-width);             // outer edge corner positions into the
        queue(pos-width);               // flood fill set as well, to not miss
        ++pos;                          // any possible important cells.
        dir = R;
      } else {
        queue(pos+1+width);
        pos += width;
        dir = D;
      }
    } else if (map[pos] == 'L') {       // For example,
      if (dir == D) {                   //      .|a
        queue(pos+1-width);             //      bL-
        ++pos;                          //      bb.
        dir = R;                        // If we are entering an L from top side
      } else {                          // then cell a winds to the left. If
        queue(pos-1);                   // instead we enter L from right side,
        queue(pos-1+width);             // then all the cells marked b wind to
        queue(pos+width);               // the left, so add all of them in the
        pos -= width;                   // fill queue.
        dir = U;
      }
    } else if (map[pos] == '7') {
      if (dir==R) {
        queue(pos+1);
        queue(pos-width);
        queue(pos+1-width);
        pos += width;
        dir = D;
      } else {
        queue(pos-1+width);
        --pos;
        dir = L;
      }
    } else if (map[pos] == 'J') {
      if (dir==R) {
        queue(pos-1-width);
        pos -= width;
        dir = U;
      } else {
        queue(pos+1);
        queue(pos+width);
        queue(pos+1+width);
        --pos;
        dir = L; 
      }
    }
    MARK(pos);
    ++steps;
  }
  return steps;
}
int main() {
  BENCH();                              // Print total execution time at the end
  FILE *f = fopen("in10.txt","r");
  fread(map,1,sizeof(map),f);           // Read the whole 2D map in memory
  fclose(f);
  while(!isspace(map[width])) ++width;  // Find stride (width+\n) of input map
  ++width;                              // Account +1 for the newline
  u16 pos = strstr(map, "S")-map;       // Find linear index of start pos S.
  u16 endpos = pos;                     // We want to end at start.
  char r=map[pos+1], d=map[pos+width];  // Calculate which way from S to start
  if (r == '-' || r == 'J' || r == '7') // walking. This choice is a bit
    ++pos, dir = R;                     // arbitrary and will lead to 50%-50%
  else if (d=='|'||d=='J'||d=='L')      // right choice. My strategy was to
    pos += width, dir = D;              // calculate the result, and if wrong,
  else --pos, dir = L;                  // then flip the winding order to
                                        // the opposite handedness.
  loop(pos, endpos, dir);               // Because C64 has so little memory, 
  queueing = true;                      // I find I need to walk the pipe twice.
  u16 steps = loop(pos, endpos, dir);   // First time to mark which cells are
  printf("Farthest: %u\n",(steps+1)/2); // part of the pipe complex. Then on the
                                        // second loop, we actually start
  u16 num_tiles = 0;                    // enqueueing neighboring tiles into the 
  while(qsz--) {                        // flood fill set. This saves memory
    ++num_tiles;                        // as the queue will then not receive
    MARK(fill_queue[qsz]);              // any cells that end up to be part of
    queue(fill_queue[qsz]-1);           // the pipe complex itself. (this two-
    queue(fill_queue[qsz]+1);           // pass strategy shrunk the queue size
    queue(fill_queue[qsz]-width);       // from ~11k nodes down to < 256 nodes)
    queue(fill_queue[qsz]+width);       // Flood fill loop 4x neighbors.
  }
  printf("Tiles: %u.\n", num_tiles);
}

The above code actually has a 50%-50% chance of being incorrect for different problem inputs. This is because the winding order around the pipe loop is not checked; it is assumed that while walking the pipe loop, the left-hand side forms the inside of the pipe. If this assumption does not hold, the program will likely just crash. If so, then just flip the winding assumption and rerun. Absolutely love it, that is the level of robustness that is fit for my AoC solutions :)

Time to solve: 3 hours 10 minutes on video on the C64 + 25 minutes offline on the PC.

Mistakes made:

After having solved the problem and simmering in my thoughts in a hot sauna, I realized that it should be possible to use the parity-based inside-outside-inside point-in-polygon algorithm to solve the second part as well.

Late in the evening, I itched to implement that code. And indeed, that version is about 30% shorter (77 lines vs 113 lines for the original), uses less memory, and is twice as fast as well (56.6 seconds on the C64 instead of the 1 min 59s from above)! It is also easier to understand and quicker to write than the winding order based code above.

Day 10 Point-in-Polygon Variant Code Listing

#include <stdio.h>
enum DIR { U, D, L, R };                // Which direction are we walking?
u16 w;                                  // Map width (actually +1 since newline)
u8 map[20000];                          // Map chars
#define MARK(pos)    ((pos) |= 0x80)    // Flags given map pos as visited
#define VISITED(pos) ((pos) &  0x80)    // Checks if given pos has been visited

u16 loop(u16 pos, u16 endpos, DIR dir) {// Walks through the pipe loop, marking
  u16 steps = 0;                        // all visited cells of the pipe.
  while(pos != endpos) {                // Loop until we reach back at S
    u8 ch = map[pos];
    MARK(map[pos]);                     // Mark the current map position visited
    if      (ch=='|')pos+=(dir==D)?w:-w;// Move on the map according to the 
    else if (ch=='-')pos+=(dir==R)?1:-1;// direction of the pipe at the current
    else if   (ch == 'F') {             // cell. Also update the current walk
      if (dir == U) pos += 1, dir = R;  // direction accordingly.
      else          pos += w, dir = D;
    } else if (ch == 'L') {     
      if (dir == D) pos += 1, dir = R;
      else          pos -= w, dir = U;
    } else if (ch == '7') {
      if (dir == R) pos += w, dir = D;
      else          pos -= 1, dir = L;
    } else if (ch == 'J') {
      if (dir == R) pos -= w, dir = U;
      else          pos -= 1, dir = L; 
    }
    ++steps;
  }                                     // Finally remember to mark the last map
  MARK(map[pos]);                       // position, and return farthest point
  return (steps+1)/2;                   // on the loop.
}

u16 num_inside() {                      // Runs the point-in-polygon test to
  u16 num = 0;                          // calculate the number of cells inside
  bool inside = false;                  // the closed pipe loop.
  for(u8 *m = map; *m; ++m)
    if (*m < 32) inside = false;        // If we are at newline, reset parity
    else if (VISITED(*m)) {             // If this cell is part of the pipe
      u8 c = *m & 0x7F;                 // Read pipe direction
      if (c=='|' || c=='L' || c=='J')   // If pipe direction intersects with
        inside = !inside;               // horizontal line, flip point-in-poly
    }                                   // parity
    else if (inside) ++num;             // This cell was not part of loop, incr.
  return num;
}

int main() {
  BENCH();                              // Return elapsed time at end of main
  FILE *f = fopen("in10.txt","r");
  fread(map,1,sizeof(map),f);           // Read the whole 2D map in memory
  fclose(f);
  while(!isspace(map[w])) ++w;          // Find stride (width+\n) of input map
  ++w;                                  // Account +1 for the newline
  u16 pos = strstr((char*)map, "S")
          - (char*)map, ep = pos;       // Find linear index of start pos S.
  char r=map[pos+1], d=map[pos+w];
  char l=map[pos-1], u=map[pos-w];
  u8 s = 0;
  if (r=='-'||r=='J'||r=='7') s|=(1<<R);// Calculate what kind of pipe character
  if (d=='|'||d=='J'||d=='L') s|=(1<<D);// must lie under the S symbol
  if (l=='-'||l=='F'||l=='L') s|=(1<<L);
  if (u=='|'||u=='F'||u=='7') s|=(1<<U);
#define SHAPE(a,b) ((1<<(a))|(1<<(b)))
  if (s == SHAPE(R,D)) map[pos] = 'F';  // Write the correct character to
  if (s == SHAPE(R,L)) map[pos] = '-';  // replace the S symbol on the map, so
  if (s == SHAPE(R,U)) map[pos] = 'L';  // that the point-in-polygon test will
  if (s == SHAPE(D,L)) map[pos] = '7';  // work correctly.
  if (s == SHAPE(D,U)) map[pos] = '|';
  if (s == SHAPE(L,U)) map[pos] = 'J';
  DIR dir;
  if      ((s&(1<<R))!=0)pos+=1,dir = R;// Calculate which direction we should
  else if ((s&(1<<D))!=0)pos+=w,dir = D;// start walking from the start cell
  else                   pos-=1,dir = L;
  printf("Dist: %u\n",loop(pos,ep,dir));// Part 1: Find farthest cell in loop
  printf("Inside: %u\n",num_inside());  // Part 2: Count how many cells in loop
}

One drawback here is that this code does not have that lovely 50%-50% chance to crash if run with the wrong starting winding orientation. Darn, that feeling of danger had really started to grow on me.

Maybe the lesson here is that even if you think you have something smart going the first time, an even smarter method may still exist. Maybe there is a method that is even simpler than the above?

As a closing note, running the same code on a Ryzen 5950X yielded a 0.29 *milli*second runtime. This means the 5950X PC ran the code 195,172 times faster! Yes, that is about two hundred thousand times faster, although most of that speed difference is attributable to the ridiculously slow 1541 disk drive.

Interlude: My Original Commodore 64 Games

When I was a kid, I only had a couple of original Commodore 64 games, which I treasured a lot. I still have all of them. Here are most of them:

All my games are on cassette, since I never had a disk drive when I was a kid.

I know I have a couple more original games, but I think they are in a box in the basement somewhere, and I couldn't find them right now. (I am missing at least Delta, Road Runner, Motos, Jack The Nipper 2 and a Yogi Bear game that I can remember)

My best original C64 games are Turtles 1 (the blue box), Bubble Bobble and Rainbow Islands.

Other games you can see there are Turtles 2 (Turtles: The Coin-op!), The New Zealand Story, Topper The Copper, Captain Fizz, Montezuma's Revenge, and then a six pack collection of Eagle's Nest, Batty, Ace, Shockway Rider, International Karate and Light Force.

I've sometimes pondered why we have exactly these games.. I was way too young to understand or read game reviews, my mom followed the C64 games industry very little, and only one of my friends at the time had a Commodore 64 (although by then he was more into the NES). Back then, I didn't yet read any games magazines to know about games before buying.

Maybe my mom would go to the local book store and ask the salesperson there to recommend a game, or maybe she asked my uncle for a recommendation..

On one occasion I recall visiting a store called "Komentokeskus" in Oulu, Finland, where she asked me to pick one that had interesting cover art. I would probably have been seven or eight years old (although I vaguely recall that maybe during that visit it was already the PC era, and I chose Bionic Commando for the 386 just based on cover art, which turned out to be a horrible game).

(Btw, searching for information about Komentokeskus in Oulu uncovered this rather raunchy and non-politically-correct NSFW Oulu University engineer's frat guild Easter celebration brochure.. that was quite the wild read. It would never have gotten permission to be published in this day and age - sorry, in Finnish only)

In any case, the original games we ended up with weren't bad at all. That "Addicted To Fun: Rainbow Collection" was a serious 10/10 selection of some of the greatest C64 games.

Montezuma's Revenge I also recall quite fondly, even though the movement controlling scheme was pretty unforgiving. It somehow just "felt" cool.

Into The Eagle's Nest I also remember being super cool, but it was so hard that I never got anywhere in it.

Topper The Copper and Captain Fizz on the other hand were pretty poor games.

I always thought you were some kind of miner on an alien planet in Topper The Copper. Now, reading the game's inlay cover and looking at a magazine review, you're instead supposed to be a detective solving a crime. Lol, the graphics didn't quite portray the game itself.

I never had any idea what you were supposed to do in that game..

Oh, and International Karate of course was legendary.

Here's a look inside the Turtles 1 box:

There was a manual, the game cassette, a post card, and a paper slip that cautioned that the game might not be exactly as portrayed in the pictures or in the manual, but that if they didn't match, the game would certainly only be better (lol).

The manual has a pretty fun quirk to it. The center fold looks like this:

The title reads "Teenage Mutant Hero Turtles Password Book".

Yeah, that is the copy protection scheme. The game asks for a four digit number at a certain row and column when it starts up.

The background texture was designed to resist photocopying.

I do recall an afternoon at my daycare (so I must have been in pre-school back then), when that friend of mine (who owned both a C64 and a NES) came to visit. An older boy at the daycare offered to make a copy of my C64 Turtles cassette for my friend, and then we sat down at a table and recited all the numbers in the copy book, while my friend wrote them down by hand on a new piece of paper.

I mean, you could have tried to educate the 5-6 year old me about software piracy back then, although that might have been tough, since at that age I was probably still struggling to understand what a job even is, beyond the "your mom and dad go there every day" bit.

Some time later, my friend would teach me how to actually beat Turtles 1. I don't know if he figured it out, or if he read it in a magazine, but he was a hero in my eyes ever since then!

Tried to finish the game now again.. The cassette does still load, and I got past the swimming level, but after that I didn't remember how to get into the Turtles van, and lost my whole Turtle team. Bummer.

Anyhow, that's a glimpse into my original C64 games collection. It wasn't a big collection, though now, with the help of Protovision and Psytronik Software, it has grown a bit bigger.

Some of my recent favorites are L'Abbaye Des Morts and Fix It Felix Jr. Antonio Savona is my modern day hero.

That's everything for now, back to puzzling Advent of Code!

Day 11

Ugh, it's Monday. I still have this week of work left before I head out to my Christmas vacation. So for the next few days, I'll have to tighten my schedule a bit.

Today I started to do the puzzle in the morning before I went to work. I thought I'd look at it for an hour and then come back to do the rest later.


"Cosmic Expansion as digital art" - Reddit u/encse

In today's puzzle, we get a 2D grid of galaxies:

...#......
.#.....#..
#..#....#.
..........
......#...
.#.....#..
.........#
.#........
.......#..
#...#.....

and the task is to compute the sum of distances between all pairs of galaxies. But there's a twist: before calculating the distances, whenever there is a fully empty row or column on the map, that row or column should be expanded, first to just two copies, and then in Part Two to a million copies.

My first reaction was that naive pairwise distances would mean an O(N^2) calculation, which would be too much for the Commodore 64, as my problem input has 441 galaxies, i.e. 441*440/2 = 97,020 galaxy pairs to process. That would certainly mean a very long execution time on the C64.

Again, in AoC puzzles a quadratic solution is rarely the best thing that is possible, so I started to look for a faster solution.

After scribbling a bit, I realized that calculating Manhattan distances is separable: I can separately calculate the pairwise X distances, and then the pairwise Y distances, and sum all of those together.

In particular what this means is that I can project all of the galaxies onto two 1-dimensional arrays, where the number in an array cell specifies how many galaxies exist in that projected row or column. For example, the above map would project down to two separate arrays like this.

..#....... -> 1
.....#.... -> 1
#......... -> 1
..#.....#. -> 2
.......... -> 0
.......... -> 0
..#....... -> 1
.......... -> 0
.......... -> 0
#...#..... -> 2

||||||||||     
vvvvvvvvvv     

2030110010     

Then the problem would simplify to calculating pairwise distances along both axes separately.

This could be solved quadratically by looping over all index pairs of both arrays, but I really wanted to avoid all quadratic runtime.

After a little bit of pondering, I settled with an idea to do just one linear scan of the 1D array, and to conceptually "shift" the galaxy positions as I move along the 1D array.

This way a single linear pass over the 1D array would be able to calculate all the pairwise distances in one go without needing to rely on a pairwise scan.

Let me try to explain visually how this shifting idea works. When scanning, say, columns, first find the two leftmost columns that contain galaxies, as shown in green and red below.

..#....... -> 1
.....#.... -> 1
#......... -> 1
..#.....#. -> 2
.......... -> 0
.......... -> 0
..#....... -> 1
.......... -> 0
.......... -> 0
#...#..... -> 2

||||||||||     
vvvvvvvvvv     

2030110010     

Then, it is possible to calculate the pairwise X-distances of all the green galaxies against all the red galaxies in one multiplication: the total X-distance between all these five galaxies is 2(green)*3(red)*3(distance)=18.

Then to continue, conceptually update the galaxy map by moving all the red galaxies over to the same column where the green galaxies are:

#......... -> 1
.....#.... -> 1
#......... -> 1
#.......#. -> 2
.......... -> 0
.......... -> 0
#......... -> 1
.......... -> 0
.......... -> 0
#...#..... -> 2

||||||||||     
vvvvvvvvvv     

5000110010     

This way, all the galaxies seen so far are always placed in the same (first) column, so that when continuing the iteration over to the next column that contains galaxies (the fifth column), it will again be possible to calculate all the pairwise distances in one multiplication like above.

But of course, shifting the three red galaxies over to the left incorrectly increases the calculated X-distances between those red galaxies and all the remaining galaxies on the right side of the red column. So in order to keep the count correct, the perturbation that the shifting created needs to be accounted for up front.

The realization here is that accounting for this perturbation is actually (computationally) super easy.

Three red galaxies were moved a distance of three units (or 1,000,001 units in Part Two) to the left, and in this example there are three galaxies to the right of the red column still remaining to be counted.

So the pairwise distances between all of the reds and all of the remaining galaxies got increased by the distance the reds were moved, and this excess should be subtracted off up front to keep the result correct.

In this case, the distance correction would be - 3(red galaxies)*3(remaining galaxies to the right of the red column)*3(distance units)=-27 units total.

#......... -> 1
.....#.... -> 1
#......... -> 1
#.......#. -> 2
.......... -> 0
.......... -> 0
#......... -> 1
.......... -> 0
.......... -> 0
#...#..... -> 2

||||||||||     
vvvvvvvvvv     

5000110010     

Then the same process can be repeated for the next column that contains galaxies in it. Here we accumulate an X-distance of 5(green)*1(blue)*6(distance)=30 to the total score. This is more than the real distances are, but it is correct, since we already pre-accounted above for the overcounting we are doing here for those three shifted red galaxies.

Then we again shift the (above blue) galaxies to the left (only conceptually, no need to actually update the map), fix up the count, and so on and so on.

When scanning for the row array, identical code applies. All galaxies are repeatedly moved to the start of the array.

So.. in the very end with all this projective flattening both ways, I guess all the galaxies now reside in the same cell (0,0) of the map ...creating one supermassive singularity? Phew, good thing that we did all of that only conceptually, right?

The final code is shown below. The arithmetic optimizes down to a very simple expression in the inner loop of the 1D scan.

Day 11 Part 2 Code Listing

#include <stdio.h>
#include <limits.h>

char line[300];                         // Read input here line by line
u8 row[300], col[300];                  // Project gxies to 1D rows and columns

u64 scan(u8 *arr, int last, int ngxies){// Processes pairwise distances in 1D
  i64 dist = 0;                         // Sums total distance
  u32 d = 1;                            // Expanded distance from x=0 to x=i
  i32 gxies = arr[0];                   // gxies: track num of galaxies passed
  for(int i = 1; i <= last; ++i, ++d)
    if (arr[i]) {                       // If this pos contains any galaxies?
      dist += (2*gxies-ngxies + arr[i]) // Calculate distance from all these
            * (u64)arr[i]*d;            // gxies to all the gxies we have seen
      gxies += arr[i];                  // before, and then move all those gxies
    } else d += 999999;                 // to column 0. Then account for the
                                        // distance "penalty" we added by moving
  return dist;                          // all those galaxies.
}

int main() {
  FILE *f = fopen("in11.txt", "r");
  int frow = INT_MAX, lrow = 0;         // Track first and last galaxy pos each
  int fcol = INT_MAX, lcol = 0;         // row and column. r: current row, n:
  int r = 0, n = 0;                     // total number of galaxies.
  while(fgets(line, sizeof(line), f)) {
    for(int i = 0; line[i]; ++i)
      if (line[i] == '#') {             // Find all galaxies on current row
        ++n;                            // Increment total number seen galaxies
        ++col[i];                       // Project galaxy to 1D column array
        ++row[r];                       // Project galaxy to 1D row array
        if      (i > lcol) lcol = i;    // Update first and last column extents
        else if (i < fcol) fcol = i;
      }
    if (row[r]) {                       // Also update first and last row
      lrow = r;                         // extents.
      if (r < frow) frow = r;
    }
    ++r;                                // Advancing to the next row.
  }
  fclose(f);
  printf("%llu\n",                      // Count total distance by separating
      scan(row+frow, lrow-frow, n)      // out into two projected 1D distances.
    + scan(col+fcol, lcol-fcol, n));
}

Complexity Analysis

In main(), the code loops over all cells of the input. This takes Θ(W*H) == Θ(C) time, where W is the width of the map, H is the height of the map, or equivalently, C is the number of cells in the map.

Then it calls the scan() function, which takes a number of steps linear in the length of the 1D array that was passed, so Θ(W) and Θ(H). The total runtime is therefore just Θ(C), or colloquially, "linear". Note in particular that this runtime does not depend on the number of galaxies on the map at all! Even if every single cell was filled with a galaxy, the runtime would be the same.

What Is On Video?

In the morning I was first able to derive the general idea for this algorithm. I then tried to implement the arithmetic correctly, but ran into a lot of indexing and other logical problems, and had not yet realized a key multiplicative part of the formula needed to get the combinatorics right. During the lunch break I came back to try to implement this calculation, but ran into an integer overflow issue and submitted a wrong answer.

Later off stream, I was finally able to fix the issue and got the answer correct.

The second part then amounted to only changing a ++d; into a d += 999999;. All this work solved both parts in one go.

The C64 has no trouble running through this code, as it is super lightweight and linear in both time and space. I felt pretty pleased with myself for having avoided feeding the C64 a quadratic-time computation here.

Time to solve: 1 hour 50 minutes on video + 20 minutes offline.

Mistakes made:

To this point the C64 has kept up with all of the puzzles, earning the full 22 stars possible so far. However, this all feels like a calm before the storm... [1], [2]

Day 12

Halfway to Christmas Eve, and quickly getting closer to the more computationally intensive problems.


AI Art - Reddit u/encse

In today's problem, we basically needed to implement a one-dimensional Picross solver.

##.??..???.### (2,1,2,3)
"How many solutions does this 1D Picross puzzle have?"

I was particularly pleased about this day. Not because I was able to solve both parts, but because before getting to the solution, I found myself in a deep debugging scenario with the recursion logic, and was able to debug it natively on the C64, without having to give up and troubleshoot the issue on the PC like has happened a few times now.

Surviving adversity by sticking with the C64.. So feelgood.

Today's puzzle screamed out for recursive memoization. Combinatorial string matching and generation problems tend to go that way.

However, the recursion in this problem is computationally super intensive (exponential complexity), and memoization caches are needed to fix the exponential behavior; those, on the other hand, tend to be hungry on RAM.

Fortunately the problem instances were quite small, so turns out that in Part One, the memoization cache needed to be only 512 bytes tiny! And in Part Two, the size had to be increased only modestly to 20,480 bytes (on video in my first solution I used a bigger cache size of 32,768 bytes, such a big spender).

So memory-wise, the C64 says no problemo!

In the first part of the problem, I was going directly for memoization, and was taken by surprise by how few hits my memoization cache got at first. Computing the solution for the example input was slower with the cache than without. This smelled like a bug to me, but then glancing at the example inputs, they didn't have much branching in them at all, so this kind of made sense.

On video the first working code for Part Two took the C64 16 minutes and 40.8 seconds to compute the result. Had to re-submit the answer once, because of course the individual problem instances got so large that they overflowed a 32-bit integer.

Optimizations

Later in the evening, I wanted to have a go at optimizing the solution. One thing that worked well was to change the 64-bit integers to just 40-bit integers. Yes, the C64 can have 40-bit integers. It is an 8-bit system, so 32 bits or 64 bits don't hold any special meaning to it; integers of any byte width can be formed, by using the C23 _BitInt(x) type that Clang readily supports.

Pause to think of that for a second: how fantastic is it that a computer released in 1982 is able to fully leverage a new development in a programming language standard published in April 2023, 41 years later! And not only that, but also have that standardized feature with the full performance capabilities that the native C64 CPU instruction set can offer - no tradeoffs or sacrifices. Arithmetic on the _BitInt(40) type compiles down to follow exactly the assembly that you'd hand write manually to add or subtract two 5-byte integers on the 6502. Just absolutely mind-bending.

And that same code will compile portably, unmodified, to run on my x64 system as well. So it's not like my C64-targeting code would be obsolete and have to be rewritten to bring it to the modern computers. So wickedly fantastic!

Another optimization that worked well to cut the execution time was to add a little bit of global lookahead information into the recursion.

Whenever a '?' character is seen, the code recurses into the two possible choices: the "don't root a spring sequence here" path and the "do place a spring sequence here" path. Skipping lots of ?s tends to lead to dead ends, because there will still be a lot of springs left to place, but the solver doesn't yet know that it won't possibly be able to fit all of them in the remaining space (due to having lazily skipped so many of the ?s before).

So by propagating global information to the intermediate stages of the recursion to answer "how many springs can even fit anymore?", the search can terminate early when it realizes that it has skipped too many ?s to satisfy the amount of space that the remaining springs need.

This global lookahead cut-off reduced 33.22% of the calls to the recursive num_ways function from the search, shortening the execution time roughly by as much. In the final code, the arrays blk_left and max_left implement this global lookahead state.

I also experimented with a symmetric min_left lower bound for the global cut-off test, but that had very minimal effect on pruning on the given inputs, so I left it out of the final code.

The final code looks as follows.

Day 12 Part 2 Code Listing

#include <stdio.h>
#include <string.h>
char s[128];                            // s: the current input string
u8 n[32], blk_left[32], max_left[128];  // n: lengths of broken input springs
u40 memo[4096];                         // 20,480 bytes of memoization cache

bool fits(u8 i, u8 k) {                 // Measures if a spring sequence of
  for(u8 num = n[k]; num--; ++i)        // length n[k] fits at position
    if (s[i] != '#' && s[i] != '?')     // s[i]...s[i+n[k]]
      return false;
  return s[i] != '#';                   // Must be delimited at end of sequence
}

u40 NW(u8 i, u8 k);                     // Memoized front of num_ways()

u40 num_ways(u8 i, u8 k) {              // Calc # of ways to fit blocks to input
  if (blk_left[k]>max_left[i]) return 0;// Global cut-off lookahead
  while(s[i] == '.') ++i;               // Skip past '.'s without recursing to
                                        // avoid stack overflow on the C64
  if (s[i] == '#')                      // If we see a '#', there is only one
    return fits(i,k)?NW(i+n[k]+1,k+1):0;// linear choice, either fits or won't.

  if (s[i] == '?') {                    // In case of '?', we have two choices:
    if (n[k] && fits(i, k))             // either root the next seq. at here, or
      return NW(i+n[k]+1, k+1)          // leave this ? empty. Recurse into both
           + NW(i+1,      k);           // choices.
    else return NW(i+1,k);              // If we can't root here, must skip this
  }
  return !n[k];                         // If at end of input string, must also
                                        // be at the end of the group array.
}

u40 NW(u8 i, u8 k) {                    // Same as num_ways(), but first gets
  u16 key = ((u16)i << 5) | k;          // the computation result from memo
  if (memo[key] == (u40)-1ll)           // cache, if it exists there.
    memo[key] = num_ways(i, k);
  return memo[key];
}

void dup5(char *s, u8 len) {            // Duplicates given string four more
  for(u8 i = 0; i < len; ++i)           // times in memory
    for(u8 j = 1; j < 5; ++j)
      s[j*len+i] = s[i];
  s[5*len] = 0;                         // Null terminate with a sentinel
}

int main() {
  BENCH();
  FILE *f = fopen("in12.txt", "r");
  u64 total = 0;
  while(fscanf(f, "%s", s) == 1) {      // Read input lines one at a time
    int len = strlen(s), num_blocks = 0;
    s[len++] = '?';                     // Join input 5 times with '?' between
    dup5(s,len);                        // Dup string to 5-fold
    s[5*len-1] = 0;                     // Remove last written '?' char

    max_left[5*len-1]=max_left[5*len]=0;// Fill in max_left array, which tells
    for(int i = 5*len-2; i >= 0; --i)   // for each index i of input string, how
      max_left[i] = max_left[i+1]       // many blocks can still possibly fit
              + (s[i]=='#'||s[i]=='?'); // to the rest of the string

    while(fscanf(f, "%d,", &len) == 1)  // Read input spring sequence lengths
      n[num_blocks++] = len;

    dup5((char*)n, num_blocks);         // Dup input spring lengths five-fold.
    blk_left[5*num_blocks] = 0;         
    for(int i=5*num_blocks-1;i>=0;--i){ // Fill blk_left array, which tells how
      blk_left[i] = blk_left[i+1]+n[i]; // many bad springs still need to fit.
    }
    memset(memo, 0xFF, sizeof(memo));   // Clear memo cache from previous run.
    total += NW(0, 0);                  // And calculate the result.
  }
  fclose(f);
  printf("Arrangements: %llu\n", total);
}

A further optimization to this code would be to turn the recursive function calls into iteration over a state stack, or into a bottom-up dynamic programming solution, to avoid the function call overhead. However, the compactness of recursive memoization is so nice that I left the code as-is for posterity.

This optimized version clocks 9 minutes 14.6 seconds on the C64, and 2.58 milliseconds on my AMD Ryzen 5950X. AMD might be a little show-off here, but great numbers on both!

Time to solve: 2 hours 59 minutes.

Mistakes made:

Day 13

Day 13 tips past the halfway point. This December has gone past really quickly.

In today's problem, the task is to find a horizontal or a vertical split in a given 2D map so that the two sides of the split form exact mirror images of each other (up to the extent that is visible on both sides).

Instead of searching for the mirror images on the 2D grid character by character, I formed binary numbers representing the cells of each row and column. The maximum size of the maps was 18 wide by 11 tall, so I could utilize 24-bit integers (another nice use of C23 _BitInt(), see yesterday's post) to store these binary "bitmap" values.

This reduced the problem into scanning two 1D integer arrays, quite resembling Day 11. When later browsing the community solutions in the subreddit, I found this was a popular approach to take.

For the code itself that searches for the mirror positions, I just wrote a naive loop, technically an O(N^2) solution, but given that the input was practically random and not adversarial, it ran in effectively linear time thanks to the early out in the search (the return false; block in the function mirrors() in the final code).

I later learned that Manacher's algorithm would solve this in O(N) time, although given the shape of the problem inputs, I think that algorithm would likely clock in slower in actual milliseconds.

Solving the first part felt so easy and straightforward, that given the mirroring theme, I was dreading that the second part would be some kind of variation of the pancake flipping problem, around which a lot of NP-hard questions can be posed. If so, that would surely mean an end to the C64's so far stellar performance to solve these puzzles.

Fortunately that was not the case, but in Part Two, the task was to flip exactly one cell of the map to uncover a new axis of mirroring that was not the same as the first one.

The initial decision to go for a bitmap representation helped guide the thinking here, and after a little bit of pondering, I realized that the second part could be solved using the same search algorithm, by adding a single xor and a popcnt instruction extra.

Since the C64 does not actually have a popcnt instruction, the lovingly cryptically packed n_smudges function is used as a stand-in.

Day 13 Part 2 Code Listing

#include <stdio.h>
#include <string.h>
#include <ctype.h>
u24 x[40], *y = x+20;                   // Project maps to X and Y axes

u8 n_smudges(u24 x) {                   // Returns the number of bits set
  return x ? 1 + !!(x & (x-1)) : 0;     // (the number of smudges) in the given
}                                       // input. C64 has no popcnt() so do a
                                        // manual calc. of 0, 1 or >=2 bits set.
bool mirrors(u24 *arr, u8 len, u8 i) {  // Returns true if input array mirrors
  u8 s = 0;                             // around index ... i | i+1 ...
  for(u8 j=0; i>=j && i+1+j<len; ++j) { // Walk index j backwards from mirror 
    s +=n_smudges(arr[i-j]^arr[i+1+j]); // center, and calculate # of smudges
    if (s > 1) return false;            // we have accumulated. If > 1, we can
  }                                     // early out as this pos cannot mirror.
  return s == 1;                        // Mirror with exactly one smudge found?
}

int solve(u8 w, u8 h) {                 // Find mirror score for given input.
  for(u8 i = 0; i < w-1; ++i)           // Test horizontal splits.
    if (mirrors(x,w,i)) return i+1;
  for(u8 i = 0;; ++i)                   // Test vertical splits.
    if (mirrors(y,h,i))return 100*(i+1);
}

int main() {
  BENCH();                              // Print time taken at the end
  FILE *f = fopen("in13.txt", "r");
  u8 w, h = 0;                          // w,h: Receives width and height of map
  u32 sum = 0;                          // sum: Accumulates total score
  char l[20];                           // l: Receives input line by line
  while(fgets(l, sizeof(l), f)) {
    if (isspace(l[0])) {                // Cur. line is empty? -> solve map
      sum += solve(w, h);               // Find the mirror pos with smudge
      w = h = 0;                        // Reset width and height for next map
      memset(x, 0, sizeof(x));          // And also clear projection arrays.
    } else {
      if (!w) w = strlen(l)-1;          // Calc width for current map
      for(u8 i = 0; i < w; ++i) {       // Project map to X and Y axes, building
        x[i] = (x[i]<<1) | (l[i]=='#'); // bitmask values for each cell of the
        y[h] = (y[h]<<1) | (l[i]=='#'); // projected 1D arrays
      }
      ++h;                              // Mark that we've seen one more line of
    }                                   // the input.
  }
  sum += solve(w, h);                   // Solve the last leftover map that did
  fclose(f);                            // not have a trailing newline.
  printf("Result: %lu\n", sum);
}

Then the C64 flew through this computation with no trouble.

Runtime on Commodore 64: 51.3 seconds.

Runtime on AMD Ryzen 5950X: 0.264 milliseconds. (194,318x faster than the C64)

Loved-it or hated-it?

In today's puzzle, there was an interesting subtle gotcha that was planted. In Part Two after flipping a cell to uncover the smudged mirroring, it could then happen that both a smudged and an unsmudged mirroring would then exist on the resulting map. As the task was posed as "fix the smudge and find the different line of reflection", the developer was supposed to ignore the original unsmudged reflection in the search, if that still existed.

This brought on a number of complaints about a bad problem statement, resembling the objections on Day 1, where some people complained that having lines like "threeightwo" in the actual input but not showcased in the example input was unfair.

While solving today's puzzle, I too tripped straight into this "two mirrorings" trap at first (I had not registered this possibility), but fortunately in today's puzzle this special case was on display as the very first example input, so I fixed it almost as a passing thought. Only later, when watching back the recording (at the 1h 01min mark), did I understand the meaning of what I had really fixed there.

It would have been a far more insidious challenge to solve if this puzzle had gone the way of Day 1, and the special case had been hidden in the midst of the actual inputs only - although that would certainly have caused an uproar and split the opinions into the "more-of-these" and "boo-that-sucked" camps.

Time to solve: 1 hour 15 minutes.

Mistakes made:

The Commodore 64 has now solved more than half of all the puzzles! Not bad for a 1MHz and 64KB machine.

Day 14

Oh boy, what a mess did I get myself into today.

In today's problem the task is to roll rocks (marbles) on a 2D grid. Part One was a simple warm-up round to simulate just one step of the actual problem. After reading the description, I thought I'd be done in no time and the first part would be a breeze.

But then I hand-calculated my input board dimensions wrong, and wrote a silly array size bug into the 2D map that I was simulating. What resulted were two wrong submissions for the first part, and a full 40 minutes of debugging total nonsense and misblaming my shoddily put-up libc (which admittedly has had many bugs in the past days), before realizing that my map size had been incorrect all that time.

After finally fixing that silly mistake, I was rewarded with the first star. That star was really hard to earn today.


Reddit u/encse

In Part Two, the programmer was asked to roll the rocks on the grid in four directions a total of one billion(!) times.

The mathematical property behind this kind of puzzle is fairly enjoyable as far as programming puzzles go, but there was one gotcha: I have seen this trick already. So that unfortunately diminished the puzzle a little bit and made me wish that I hadn't done Advent of Code in 2022 so that I could have met this puzzle fresh. I recall that when I saw this trick last year, it was a decent fun exploration to do for the first time.

Nevertheless, there was still some fun to be had in today's puzzle.

The trick here is that while simulating one billion board rotations would take infeasibly long even on a PC, after a while the repeated rotations take the state space into a cycle that repeats, and this can be exploited. (I actually first thought that in this instance there would be a straight-up stable fixed point, but that was not the case.)

My strategy on the C64 at first was to go fishing for the answer, and just make the C64 print out the board score values, until I could maybe visually spot a cycle (and then separately calculate the end value for the cycle).

I let the C64 calculate board states for ~50 minutes on video (yay, the content), and looking back now, it was able to finish about 373 board rotations in that time (avg. ~8 seconds per rotation). I couldn't see any repeating pattern in the results, but then again, the C64 screen can only fit 25 lines. So either the cycle would have to be longer than that, or it would start only after 373 rotations.

While the C64 was busy at work, I had a shower, ate lunch, and then copped out and ran the same code on the PC to identify the cycle.

Turns out that on my input, the cycle started after iteration 96, and was 78 board rotations long. So the C64 had essentially completed the full computation, but naturally I couldn't see the cycle since it was too long.

Now I wish I had made the C64 print the computation results out on the dot matrix printer; it would have been awesome to identify the cycle by reading the paper printout!

Actually Solving Part Two On The C64

The challenge in programmatically detecting the cycle is that one needs to capture full board states in memory in order to observe when the same state repeats.

However, to do that completely accurately, one would have to store a sequence of all those states in RAM. The board in today's puzzle was 100x100 cells in size, so representing one board requires 10,000 bits = 1,250 bytes. To find the cycle, one would need to store 96+78 = 174 of these in memory, so that would be 217.5 KB of RAM; not an option for the C64. (Well, the REU could do that, but I didn't feel like using it today.)

So instead, I decided to hash the board states into smaller digests, and play with the hash until I'd have one that would not produce any collisions on my problem input. Interestingly, just a 16-bit hash was enough. So I wrote a small 256-element hash table with a 16-bit digest key of the board position, and now the C64 could also run the code to identify the cycle.

However, board simulation was still slow. After some pondering, I realized that the process of rolling the boulders could be optimized by keeping track of the "landscape" or "skyline" that the settled boulders and walls form while rolling. This way the destination of each rolled boulder can be computed in O(1) constant time, rather than walking each boulder one cell at a time until it stops.

This was a huge speedup: now the C64 takes only about 4.4 seconds per full board rotation. With that optimization, the C64 was back in the game, and the final code for Part Two looks as follows.

Day 14 Part 2 Code Listing

#include <stdio.h>
#include <string.h>
#define AT(x,y) map[(((u16)(y))<<7)|(x)]// Returns the char at (x,y) of map
#define S 128                           // The stride of the map, pow2 for speed
char map[S*128];                        // Contains the simulated 2D map
u8 w, h, landscape[S];                  // w,h: map dimensions
                                        // landscape: Temp store to know in O(1)
void roll_up() {                        // time where rocks boundary is at.
  memset(landscape, 0, w);
  for(int y = 0; y < h; ++y)
    for(int x = 0; x < w; ++x)
      if (AT(x,y) == 'O') {
        AT(x,y) = '.';
        AT(x,landscape[x]++) = 'O';
      } else if (AT(x,y) == '#')
        landscape[x] = y+1;
}

void roll_left() {                      // Rolls rocks to the left.
  memset(landscape, 0, h);              // Clear the temp landscape.
  for(int y = 0; y < h; ++y)            // Move rocks in all (x,y)
    for(int x = 0; x < w; ++x)
      if (AT(x,y) == 'O') {             // If current char is a rock
        AT(x,y) = '.';                  // Remove the rock at its cur position
        AT(landscape[y]++,y) = 'O';     // And place it to the max extents
      } else if (AT(x,y) == '#')        // If we hit a wall,
        landscape[y] = x+1;             // update the landscape skyline.
}

void roll_down() {                      // Four separate copies of rolling to
  memset(landscape, h-1, w);            // each direction for good performance.
  for(int y = h-1; y >= 0; --y)
    for(int x = 0; x < w; ++x)
      if (AT(x,y) == 'O') {
        AT(x,y) = '.';
        AT(x,landscape[x]--) = 'O';
      } else if (AT(x,y) == '#')
        landscape[x] = y-1;
}

void roll_right() {
  memset(landscape, w-1, h);
  for(int y = 0; y < h; ++y)
    for(int x = w-1; x >= 0; --x)       // Gotcha: need to loop in the right
      if (AT(x,y) == 'O') {             // order so boulders are simulated to
        AT(x,y) = '.';                  // move all the way that they can.
        AT(landscape[y]--,y) = 'O';
      } else if (AT(x,y) == '#')
        landscape[y] = x-1;
}

u32 load() {                            // Calculates the total load of a map
  u32 load = 0;
  for(int y = 0; y < h; ++y)
    for(int x = 0; x < w; ++x)
      if (AT(x,y) == 'O')
        load += h-y;
  return load;
}

u16 hash() {                            // Calculates a hash value to identify
  u16 hash = 0;                         // this current board state uniquely.
  for(int y = 0; y < h; ++y)            // This hash is actually not safe, so
    for(int x = 0; x < w; ++x)          // we rely on getting lucky that there
      if (AT(x,y) == 'O')               // will be no hash collisions.
        hash = ((hash<<1) | (hash>>15))
             ^ ((((u32)(y))<<7)|(x));
  return hash;
}

struct {                                // 256-entry hash table that stores
  u16 h;                                // previously seen board states.
  u8 val;
} ht[256];

u8 find_or_insert(u16 hash, u8 val) {   // Tests if hash table contains this
  u8 i = (u8)hash;                      // board as previously seen.
  while(ht[i].h && ht[i].h != hash)     // Linear probe through the table to
    ++i;                                // find our board, or an empty slot.
  if (ht[i].h == hash) return ht[i].val;// If this board exists, return it.
  ht[i] = { hash, val };                // Otherwise insert this board to the
  return 0;                             // table, and return 0 to indicate "not
}                                       // found". (another assumption: cycle
                                        // can't start at index i=0, fair guess)
int main() {
  BENCH();
  FILE *f = fopen("in14.txt", "r");
  char *m = map;
  while(fgets(m, S, f)) m += S, ++h;    // Read input line by line to strided
  fclose(f);                            // map.
  w = strstr(map, "\r") - map;          // Calculate map width.
  
  bool found_cycle = false;
  for(u8 i = 0; i < 255;)               // Loop max 255 times to find the start
  {                                     // of the state cycle.
    roll_up();
    roll_left();
    roll_down();
    roll_right();
    ++i;
    if (!found_cycle) {
      u32 h = hash();                   // Uniquely hash this board state
      u8 prev = find_or_insert(h, i);   // Insert or find if this board has been
      if (prev) {                       // seen before.
        u64 left = 1000000000ull - i;   // If so, zoom our iteration count
        u8 cycle_length = i - prev;     // through all the multiples of cycle
        i = 255 - left%cycle_length;    // length. Update i to track how many
        printf("Cycle pos=%u,len=%u\n", // iterations we still need to reach
           prev, cycle_length);         // 1 billion nodes.
        found_cycle = true;             // No need to look for cycles in rest of
      }                                 // the loop.
    }
  }
  printf("Total load: %lu\n", load());
}

The code is quite verbose, because for good performance it was beneficial to keep four separate copies of the boulder rolling function: one generalized rolling function would have hurt performance, since the C64 CPU only has three registers, A, X and Y.

With this code, the runtimes were as follows:

Runtime on Commodore 64: 16 minutes 7.2 seconds.

Runtime on AMD Ryzen 5950X: 15.9 milliseconds. (60,830.19x faster than the C64)

What happened here, with the AMD suddenly being over 60,000x faster than the C64? Well, today's puzzle was not dominated by disk drive performance but was computation heavy, so these numbers better reflect just the raw CPU performance difference.

Time to solve: 2 hours 57 minutes on video + 15 mins offline.

Mistakes made:

Day 15

Ouch, today was yet another of those bugfests. The silver lining was that this occurred on an otherwise easy problem - debugging my issues today against a very hard problem might have been really tricky.

In today's puzzle, the developer is hand-held to implement a linked-list backed hash map. Well, that is pretty much what I've been doing already in previous days (although I tend to lean towards linear probing, as seen in the code from earlier days), so solving this puzzle on the C64 was operating on home turf.

In Part One, the task was to hash the input in ASCII encoding. The challenge here is that the Commodore 64 natively uses the PETSCII encoding.

There are actually two different encodings, "Business PETSCII" and "Graphics PETSCII". I use the first one, since it is the only one that contains both uppercase and lowercase characters (the Graphics PETSCII charset only has uppercase letters).

Fortunately the input only contained the letters a-z plus '=' and '-', so writing this (partial) Business PETSCII->ASCII conversion was easy. The '=' and '-' characters share the same codes in PETSCII and ASCII, so I only had to convert the a-z values.
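In isolation, the conversion amounts to flipping the case bit. Here is a standard-types sketch mirroring the bpet2asc() routine that appears in the code listing further below:

```cpp
#include <cassert>
#include <cstdint>

// Partial Business PETSCII -> ASCII conversion (standard-types version of
// the bpet2asc() routine from the listing below): letter codes in 65-90 and
// 97-122 have their case bit inverted relative to ASCII, so XOR with 0x20
// swaps them; everything else (including '=' and '-') passes through.
uint8_t bpet_to_ascii(uint8_t c) {
    if ((c >= 65 && c <= 90) || (c >= 97 && c <= 122))
        return c ^ 0x20;
    return c;
}
```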

But before even getting to implement that charset conversion, I ran into a really mysterious corruption issue with my Advent of Code client program, which started crashing on me when attempting to save the input to disk. After spending some 30 minutes battling this (which I still need to properly figure out later), I managed to get through the first part.


Reddit u/encse

In the second part, the puzzle description guided the programmer to implement a full hash map with linked-list buckets for collision handling. This exercise comes straight from university course textbooks, e.g. the widely used Introduction to Algorithms by Cormen, Leiserson, Rivest and Stein, which has been in its 4th edition since last year.

The task was straightforward with that background, but here I again got to visit a cryptic bug zoo: it turned out that my libc implementation of strcmp() was incorrect. This took more than an hour to figure out. My libc is not unit tested, it is field tested! I'm pretty sure I must have looked like that green-goggles-wearing reindeer when I wrote my libc.
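For reference, the strcmp() contract is simple: scan while bytes match, stop at the first difference or at the terminating NUL, and return the difference of the mismatching bytes compared as unsigned char. A minimal correct sketch (the actual buggy libc code isn't shown here, so this is just the standard semantics):

```cpp
#include <cassert>

// Minimal correct strcmp(): walk both strings while the bytes match and are
// non-NUL, then return the difference of the first mismatching bytes,
// compared as unsigned char (as the C standard requires).
int my_strcmp(const char *a, const char *b) {
    while (*a && *a == *b) { ++a; ++b; }
    return (unsigned char)*a - (unsigned char)*b;
}
```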

Before realizing that my strcmp() was busted, I thought that I had memory corruption somewhere in string termination. With that in mind, I intentionally went fishing for a risky answer, but it didn't pan out, so I had one missed guess in the second part.

The final code is quite clean. No mathematical tricks in today's puzzle, just a task to apply programming engineering skills.

Day 15 Part 2 Code Listing

#include 
char s[24000];

u8 bpet2asc(char c) {                   // On the C64, all input files are
  if ((c>=65&&c<=90)||(c>=97&&c<=122))  // in C64 "Business PETSCII" text
    return c ^ 0x20;                    // encoding. So convert A-Z to ASCII
  return c;                             // to be able to correctly hash them.
}

struct node {                           // Hashmap node.
  node *next;                           // Linked list next pointer.
  char *str;                            // The original string label (key)
  u8 val;                               // Focal length number of lens
} *box[256];

u8 hash(char *s) {
  u8 h = 0;
  while(*s) h = (h + bpet2asc(*s++))*17;
  return h;
}

void insert(char *s, u8 val) {
  node **n = &box[hash(s)];             // Indirectly access the target node
  for(;*n; n = &(*n)->next)             // pointer so that inserting below can
    if (!strcmp(s, (*n)->str)) {        // avoid code repetition.
      (*n)->val = val;                  // This key already exists, so update
      return;                           // its value.
    }
  *n = (node*)calloc(1, sizeof(node));  // Allocate new node and append it to
  (*n)->str = s;                        // the linked list.
  (*n)->val = val;
}

void erase(char *s) {
  u8 i = hash(s);
  node *n = box[i], *prev = 0;
  for(;n; prev = n, n = n->next)
    if (!strcmp(s, n->str)) {
      if (prev) prev->next = n->next;   // Remove from linked list
      else box[i] = n->next;
      free(n);
      return;
    }
}

int main() {
  BENCH();
  FILE *f = fopen("in15.txt", "r");
  fgets(s, sizeof(s), f);
  fclose(f);
  for(char *s0=s, *i = s; *i>32; ++i) {
    if (*i == '=') {                    // "foo=5" inserts an entry to the map
      *i = 0;                           // Delimit "foo" part to its own null
      insert(s0, i[1]-'0');             // terminated string (s0) and insert it.
      s0 = i+3;                         // Skip s0 to start of next string
      i += 2;                           // And current index likewise
    } else if (*i == '-') {             // "foo-" removes an entry from map
      *i = 0;
      erase(s0);
      s0 = i+2;
      ++i;
    }
  }
  u24 score = 0;
  for(u16 i = 1; i <= 256; ++i) {
    u16 slot = 1;
    for(node *n=box[i-1]; n; n=n->next)
      score += i * slot++ * n->val;
  }
  printf("Focus pow: %lu\n",(u32)score);
}

Time to solve: 2 hours 37 minutes.

Mistakes made:

Runtime on Commodore 64: 2 minutes 47.0 seconds.

Runtime on AMD Ryzen 5950X: 0.385 milliseconds. (433,766.23x faster than the C64)

The speed difference between the C64 and AMD was quite drastic today. Wow. Not quite sure why that is.

Reading the subreddit afterwards, I was surprised to see how the majority of solutions that people came up with used the existing maps, dictionaries or hash tables of their respective languages, rather than implementing the hashmap that the puzzle asked for. Feels like Mr Wastl's attempt at a pedagogical point was well missed, and the path of least effort always goes its own way :)

Interlude: My Unoriginal Commodore 64 Games

Previously I showcased my original Commodore 64 games.

As one might expect, those weren't all the games that I had for the Commodore 64.

Additionally, I had lots of unoriginal C64 games on tapes. Yes, the "Arrrrr Shiver Me Timbers!" kinds of unoriginals, like so:

Many of these tapes would have come to us as part of the sale when we bought the C64 second hand (which was in 1989). By that time, I think cassette tapes had practically become obsolete, as 5.25" disks ruled the C64, and people were already moving on to 3.5" disks as well as they migrated over to Amiga and PC.

Here is one such tape:

Pitstop 2 was great, so was Lady Tut and Amaurote (ooh, Amaurote, I have a thing or two to say about that game at a later date maybe).

Snoopy, Mr. Robot, Tooth Invaders and Zaxxon were also fun ones. The other ones not so much so. (Pitfall 2 was amazing, but Pitfall 1 a bit meh)

For the first few years that I had the C64, I was not old enough to know how to copy tapes. (the one tape I do recall I have copied myself is that Alien Syndrome one from my friend who had the original game.. but I know I must have been in elementary school by then since it came from a classmate, so later than 1992)

Rather, almost all of this collection of tapes I bought from a local home appliances store(!)

Yeah, that's right. I remember there was this store in our local town (of about 8000 people) that sold TVs and washing machines, and right by the entrance, there was this big open box of pirated cassette tapes that you could buy maybe for 10 Finnish Marks (less than two euros) a piece.

The guy who ran the shop didn't look at what was on them, or if they even worked. He would buy any tapes for 5(?) Marks and then just dump them there, and then people could pick them up with a tenner.

So anyone could offload their tapes there, and kids like me might then buy them in turn.

This was open business in plain sight, in one of the two or maybe three electronics appliance stores in the whole town. I get a feeling that it wasn't just my pre-school mind that struggled with being educated about piracy, but the whole system!

Anyhow, that is what it was. Most of this collection of games I got by biking down to that store and buying a tape or two on the weekend. Most of it was kind of crap, and then you'd try to sell that same tape back to the store. Lol, remembering the whole thing just makes me facepalm-laugh so bad.

That's a small town in the middle of nowhere in Northern Finland for ya.. Arrrr Scallywags!

I would overwhelm the reader by trying to list all of the great unoriginal C64 games that I had, but if I had to pick just three from my collection, these come to mind first off the top of my head:

To organize my C64 games collection, I had created this booklet.

Inside the booklet I would dedicate a page for each side of a tape, and write down the tape positions where those games could be loaded.

So many childhood summers in the attic of our summer cabin, staring at the TV screen waiting for tapes to load.

Out of the above games, I remember Moby Dick would have been cool, and (Usagi Yojimbo) Samurai Warrior was absolutely fantastic. The Mission in Finland game was also a cool text adventure, I think mainly because it was in Finnish. (although still I barely understood any of it)

Looking back at this list of copied games, I really wish I could go back in time and be a little bit older, so I would have had the purchasing power to see store shelves filled with C64 games and to buy some original tapes. That would be amazing.

Nowadays I occasionally visit Pelimies in Oulu to browse what's new, and chat with the owners, but the games industry is no longer quite the same. Games take so long to complete that picking up a new one tends to be a sell-your-soul-and-time-to-the-devil kind of deal.

(Although rambling on at this.. it reminds me, I should probably go and grab a copy of Alan Wake 2 from Pelimies for the PS5..)

Isn't it kind of funny that back then, you played so many games but legally paid for so few.. and nowadays you buy (and re-buy) so many games that your digital library oozes them, and barely actually play any of them! I guess we're compensating now..

Anyhow, that's enough of a tangent for now.

Day 16

Today is another simulation puzzle day. After focusing all the light yesterday, all that power is shining into a cave that is melting rock into lava. Sounds epic.


Reddit u/jovani_lukino

In Part One of the puzzle, the task was to compute the path that the light takes through the 2D grid, with mirrors and light splitters placed inside it.

In the 2D map there are these "light splitters" that break the light into two separate directions. This means that the path is not just a single linear route; a queue or a stack data structure is needed to hold the branching points.

I decided at first to use the regular call stack for this and recurse. Since I had done a heavily recursive solution on Day 12 and it had worked out "just fine(TM)", I didn't pay too much attention to avoiding this today. This would later bite in Part Two.

Another effect of the light splitters on the map was that the light could enter an infinite loop. So tracking which cells have been visited is necessary to avoid repeated cycling. In the solution presented in the video, I used two bits per cell to mark horizontally and vertically visited cells.

Later in the evening I realized that only one bit per map square is actually necessary instead of two, and it simplified some bitmask handling in the code, shaving about half a minute off the execution time on the Commodore 64. It is nice how such small optimizations can yield large gains on this system. :)

The day wasn't without bugs, as usual. I got bit by a somewhat spectacular editor crash (which I fortunately figured out the cause of), and my .cpp file saving seems to now be hanging again in the middle of a disk write. (I hypothesize this happens when the 5.25" disk gets nearly full)

Fortunately none of the issues were blocking and I was able to proceed past them.

Unsurprisingly, the first version of my simulation code calculated some things wrong for the example input.

A nice way to debug these types of simulation problems on the C64 is to point the 2D map data structure straight at the Video RAM: the C64 will then visualize the screen memory contents "for free", without any (other) code changes needed.

The resulting visualization won't be particularly pretty, but it is plenty enough to help follow the logic. In the above picture, the white characters result from the light traversing horizontally, and the purple characters correspond to light that has traveled vertically.

This helped me figure out the logic errors I had in my light walk code, and I was rewarded with the first star.

Day 16 Part 2 Code Listing

#include 
#define AT(x,y) ((((y)+1)<<7)|((x)+1))  // Calc. linear index in sentineled map
#define S 128                           // Use map of fixed stride of 128
#define PIPE 0x7D                       // '|' char in PETSCII not same in ASCII
char map[S*128];                        // map: the input grid, sentineled.
bool seen[S*128];                       // seen: marks already visited cells
int w, h, score = 0, max_score = 0;

struct {                                // Use a stack for DFS search
  u16 pos;                              // Linear start pos of search
  int dir;                              // Index delta (+1,-1,+S,-S) to walk to
} stack[1024], *sp = stack;             // walk. sp: DFS stack pointer

int mirror(int dir) {                   // Calculates the new direction for
  if      (dir ==  1) return -S;        // a ray when it hits a mirror.
  else if (dir == -1) return  S;
  else if (dir ==  S) return -1;
  else                return  1;
}

void shine(u16 pos, int dir) {          // Runs the light ray through its path.
  for(;;) {
    char ch = map[pos];                 // Check what type of char we are at?
    score += !seen[pos];                // Accumulate distance light traveled.
    if      (ch=='\\')dir=-mirror(dir); // Handle backslash mirror
    else if (ch=='/' )dir= mirror(dir); // Handle forward slash mirror
    else if (ch=='-' || ch==PIPE) {     // Handle light splitter
      if (seen[pos]) break;             // If we already visited here, abort
      u8 d = 1, d2 = S;                 // The two directions we might continue
      if (ch == '-') SWAP(d, d2);       // Horiz. splitter opposite from vert.
      if (dir == d || dir == -d) {      // Do we need to split the light ray?
        *sp++ = {(u16)(pos-d2),-d2};    // Remember one end of the ray in stack
        dir = d2;                       // and continue iterating the other end.
      }
    }
    seen[pos] = true;                   // Mark this cell visited
    if (ch <= 32) break;                // Abort at map edge sentinels (0 or \r)
    pos += dir;                         // Advance light ray to next square.
  }
}

void dfs(u8 x, u8 y, int dir) {         // Does a DFS search through the light
  u16 pos = AT(x, y);                   // path from an edge.
  for(int y = 0; y < h; ++y)            // Clear visited flags to start a new
    memset(seen+AT(0,y), 0, w);         // search. But do it by scanlines to not
  score = 0;                            // forget visited edges.
  stack[0] = { pos, dir};               // Add the light start pos to DFS
  for(sp = stack+1; sp-- > stack;)      // And process the DFS stack.
    shine(sp->pos, sp->dir);
  max_score = MAX(score, max_score);
}

int main() {
  BENCH();
  FILE *f = fopen("in16.txt", "r");
  char *d = map+S+1;                    // Read input file in memory into
  while(fgets(d, S, f)) ++h, d += S;    // sentinel border guarded 2D array
  fclose(f);                            // with a stride of fixed 128 bytes.
  w = strlen(map+S+1)-1;                // Calculate effective width.
  for(int y = 0; y < h; ++y) {          // Shine light from all edges. However,
    if (!seen[AT(-1,y)]) dfs(  0,y, 1); // if the current edge corresponds to a
    if (!seen[AT( w,y)]) dfs(w-1,y,-1); // direction where light previously came
  }                                     // out from, then we can safely skip
  for(int x = 0; x < w; ++x) {          // that edge, because shining in from
    if (!seen[AT(x,-1)]) dfs(x,  0, S); // that spot can never result in better
    if (!seen[AT(x, h)]) dfs(x,h-1,-S); // score than the opposite end gave.
  }
  printf("Lit squares: %u\n",max_score);
}

In the second part, the question was to figure out from which edge coordinate the light should enter into the 2D map to maximize the amount of cells lit in the map.

After a moment of thinking, I said out loud something like "wait, I can just brute force test every edge coordinate, the C64 should be plenty fast enough for that", and then realized how ridiculous that must have sounded.

It was just a small addition to the code to make it enumerate all possible light input locations, but after that change, the program unexpectedly started crashing on the big input, even though it had passed the big input in Part One and correctly computed the example input.

Turns out that the native call stack indeed was not large enough for this recursion *for all light paths* in Part Two, even though it had been just enough to compute the light path in Part One. Well, figures.

I changed to a manual stack data structure to hold the light traversal state (see the stack variable in the code listing).

This worked well, and on video the C64 brute-forced its way to the solution in 11 minutes and 6 seconds.

A Simple Optimization

While the C64 was churning through the computation naively, I was pondering how to improve the runtime.

I thought about assigning linear light paths to some kind of groups, where the light splitters would branch the light paths into child groups, but none of that quite worked out, since the overlap between groups is tedious to compute.

Then I realized that there is a far simpler way: when light is shined in from one edge cell and it exits the map from N exit cells (because of possible light splitters), all those exit cells can be immediately written off from later consideration: shining light in from any of those exit cells cannot produce a better score than the one produced by the initial input cell. (This property would trivially hold if there were no light splitters, but curiously, even with the splitters, the same behavior holds.)

So to optimize, the final code above marks edge cells with a visited flag whenever light exits the map at any edge location. Later, when considering input locations to shine from, all these already visited cells are skipped.

This doubled the overall speed on the C64 (by practically halving the number of light entrances to consider), so the final execution times are as follows:

Runtime on Commodore 64: 5 minutes 26.1 seconds.

Runtime on AMD Ryzen 5950X: 2.703 milliseconds. (120,643.73x faster than the C64)

Well, today was not quite the "brace yourself, the end is coming" day that I was anticipating yesterday.

Instead, it was an "aced both parts on the first answer" day, leaving a warm fuzzy feeling that I didn't need to resort to the PC today. Good job C64.

Day 17

Wow, today was quite the challenge. But, full spoiler, by 10pm in the evening, the C64 finally did it and computed the result to the question in Part Two. Aww yeah! ๐Ÿ˜Ž

Though let's wind back to the start and go through the day in order.


Reddit u/encse

In today's puzzle, the Elves are moving crucibles of molten lava through a city represented by a 2D map. The task is to find the shortest path from the top left of the map to the bottom right corner, according to the accumulated weights or costs in the map.

2413432311323
3215453535623
3255245654254
3446585845452
4546657867536
1438598798454
4457876987766
3637877979653
4654967986887
4564679986453
1224686865563
2546548887735
4322674655533

In the first part of the problem, the task is to find the shortest path through the city, following the rule that the path can take at most three consecutive steps in any one direction before having to turn.

In the second part of the problem, the Elves are instead moving "ultra crucibles", so the task changed: the path now always has to move in long runs of at least four steps in one direction, but at most ten steps straight.

This is definitely a task that is calling for A* pathfinding.

Looking back at my notes and code after having solved the problem, I feel that I did capture the problem domain quite well. Here are my observations:

  1. The state space of the problem is not just a pair of (x,y) coordinates, but a tuple of (dir, len, x, y), where we need to track how many straight steps the search has done to arrive at any specific (x,y) coordinate.
  2. The problem state space is not Euclidean, so there is no guarantee that paths in the frontier set couldn't improve during the search. This means that either a DecreaseKey operation would be needed, or one will need to handle duplicate nodes in the open frontier. I decided to deal with duplicates in the open set to keep CPU usage more manageable. (although I did briefly look into how frequent DecreaseKey opportunities would be, and that didn't seem to be common at all)
  3. There are no negative weights on the map, so this means that the path search is a monotone problem.
  4. There is an obvious bound to the edge weights as the map consists of cells 1 - 9, so this is a bounded key problem.

Given the monotone bounded-key property, it is possible to use an O(1) constant-time priority queue implementation, rather than needing to rely on an O(log N) min-heap or similar. That is super appealing!

So I put together a very small monotone bucket queue implementation (and then botched a critical expression in it and spent a few dozen enjoyable minutes debugging it).

Then, running the implementation without a closed set yet, I saw a lot of duplicates accumulating in the search. So I used whatever little memory I had left to add a hash set for closed nodes. I was hoping that a "lossy" closed set would be enough to catch most of the duplication, so that older closed nodes could be forgotten as the search progresses. But unfortunately I had to accept that a minuscule 1024-entry closed set was not nearly enough to achieve this.

So I gave up on the C64, and decided to solve the problem on the PC, and to port it back, because I had no idea how large the memory requirement would be. After solving on the PC, I could then examine how much memory was needed, and if there was any chance to get the open and closed sets small enough for the C64 to handle.

Elusive Logic Flaw

Switching over to the PC and adding an unbounded-size closed set, I came to realize that my program logic was flawed, and I was not getting the right answer path in Part One at all. Huh.

I tried everything. I verified my A* code, my open set, my closed set, my super-fancy yet simple monotone bucket queue, my neighbor node generator. I doubted anything and everything in the code, switched to using my production-quality unit-tested priority queue and A* solver, visualized the path (which looked very good by eyeballing), and more than two hours later, I was still getting the same wrong answer in my search. Oh boy, what a sinking feeling. How am I not getting this!

Then, by completely random chance, I realized what I had been doing wrong all this time.

I had initialized the first search start node with a default zero-initialized "direction=0, length=0" state, meaning that the first node had walked zero steps in, in this case, the up direction, which I had established to have the value 0.

I had thought, "walking a distance of zero into any direction is a safe search start node, because with zero distance, it can't accumulate against the limit of 'max 3 steps allowed'."

But then, there was that other rule (that I had correctly implemented): crucibles cannot move backwards.

So my start node had started with a recorded state of having moved upwards (even if zero squares), meaning that it wasn't allowed to start the search by moving straight back downwards. Aaarrggh.

And as it happened, this flaw did not manifest in the example input, so I had been looking for all kinds of "problems at scale" - overflows and such - on the big input. Oh dear.

Anyways, with that issue fixed, I had the first part then solved on the PC, and the second part was fortunately just a small modification to my code, and the second star came through promptly.

But How About The C64?

Indeed like mentioned above, by the end of the day I was running the puzzle on the C64 in a decent time. But boy did it take some work to get there.

Initially, looking at the size of the A* search space on the PC for Part Two, I got the following statistics:

Yikes, these are some pretty huge numbers for the C64, which only has 65,536 bytes of RAM. Out of this 64KB of RAM, the input map (141x141) took up 20,736 bytes alone. So almost half of the easily accessible RAM was already gone (though I could squeeze the map into half the space if really necessary, by shaving bits off the map).

The huge size of the closed set was the big question. Almost a million nodes in the closed set. A single entry consists of x:y:dir:len information, 8+8+2+4 = 22 bits, i.e. ~3 bytes per entry in a hash table. That certainly won't fit in the C64 RAM.

I experimented with the idea of maintaining a kind of "lossy" closed set, i.e. one where only closed nodes neighboring the active open nodes would be tracked. (I recall I had success with this approach in last year's AoC, in the elephants-in-the-cave problem.) So, a hash table that is able to drop old entries by replacing them on a hash collision.

This did work to some extent, but it more than tripled the runtime on the PC already, and seemed to be very sensitive to the hashing function that was used. So I didn't feel too confident about this.

I decided to take the code instead to use the RAM Expansion Unit (REU). I utilized that already on Day 8, so I was already a bit familiar with it.

With some refactoring, the hash table storing the Closed set turned into a full bit table, 4MB in size. It is located starting at address zero in the REU address space, and each bit represents one closed state. Back in the day (in 1985), Commodore had REU cartridges only up to 512KB, so this solution is not exactly period correct. Though I bet the Elves won't care. :)
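The bit-table idea can be sketched like this. Note that the field layout is my own illustration, not the post's actual code; also, a straight 22-bit x:y:dir:len packing would only need 512 KiB of flags, so the real 4 MB table evidently indexes states more sparsely (e.g. via the strided map position).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Pack an A* state into a 22-bit key, x(8) | y(8) | dir(2) | len(4), and use
// it as an index into a bit table: one bit per possible state. The field
// order here is an assumption for illustration only.
uint32_t pack_state(uint8_t x, uint8_t y, uint8_t dir, uint8_t len) {
    return ((uint32_t)x << 14) | ((uint32_t)y << 6)
         | ((uint32_t)(dir & 3) << 4) | (len & 15);
}

// Test-and-set one bit: returns true if the state was already closed.
bool close_state(std::vector<uint8_t> &bits, uint32_t key) {
    uint8_t mask = 1 << (key & 7);
    bool seen = (bits[key >> 3] & mask) != 0;
    bits[key >> 3] |= mask;
    return seen;
}
```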

Even after moving the Closed set into the REU address space, I realized that the Open frontier set was too large, ~23K entries at a minimum of 7 bytes per entry, so ~161KB+ total. Well, into the REU the Open set went as well. (later I found an optimization that likely would have lifted this requirement - see next section - but I didn't bother to refactor the code back)

In no time the REU was split into two regions: the lowest 4MB for the Closed set, and the RAM above that for the Open set. A handle-based mechanism then refers to the linked list nodes in the monotone bucket queue data structure. Since the problem is bounded-key, a simple 64K entry bump allocator is more than enough to cycle through allocating the needed number (~23K) of frontier nodes.
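The handle scheme can be sketched in a few lines (hypothetical names; the 8-byte node size is an assumption matching the node struct in the listing below): handles are just 16-bit indices that wrap around, so allocation is a single increment.

```cpp
#include <cassert>
#include <cstdint>

// Bump allocator for frontier nodes: a handle is a 16-bit index into the
// open-set region of expansion memory. uint16_t overflow wraps back to 0,
// which is fine as long as far fewer than 65536 nodes are live at once.
uint16_t next_free = 0;
uint16_t alloc_handle() { return next_free++; }

// A handle maps to a byte offset in the open-set region
// (assuming 8-byte nodes, as in the listing below).
uint32_t handle_offset(uint16_t h) { return (uint32_t)h * 8; }
```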

Search Space Cut-Off Optimization

There is one interesting (and effective!) algorithmic optimization that can still be employed to speed up the search.

In Part One, arriving at any cell with fewer linear steps "in history" is always superior to arriving at that cell with more steps. So for example, if we arrive at a certain (x,y) cell from the west with two linear steps to the east in our movement history, and we have already searched a state that arrives at that cell with only one linear step to the east (and the same distance score), then this second arrival can be ignored: it only has worse options available to it and can't lead to a better result.

In Part Two, that heuristic cannot be used quite as easily because of the rule that an ultra crucible must move at least four squares in each direction before turning. But, with a small tweak to the rule, this cut-off rule can still be applied to reduce the search space.

The rule for ultra crucibles reads that if we arrive at an (x,y) coordinate with some number k >= 4 of linear steps in the history, then we can write off all future arrivals to the same cell from the same direction with more than k linear steps.

With this heuristic applied, the A* search space very efficiently prunes down:

This optimization effectively halved the total runtime on both the PC and the Commodore 64. Just what I needed!

(I also experimented with expanding this cut-off rule to apply in a directionless fashion, but while that cut off a few nodes here and there, it seemed to be a net minus, at least with my code structure.)

With all the abovementioned changes put together, the final code looks as follows:

Day 17 Part 2 Code Listing

#include 
#include                         // Use RAM Expansion Unit for extra mem.
enum DIR { D,R,L,U };                   // Four directions we can walk in.
const DIR LEFT[4] = { R, U, D, L };     // Rotate dir left 90 degrees.
#define XDIR(d) ((d)==L?-1:(d)==R?1:0)  // Given a direction, returns X delta
#define YDIR(d) ((d)==U?-1:(d)==D?1:0)  // and Y delta vector, -1/0/+1.
#define OPEN_IDX(i) ((1ul<<22)|((u24)i \
                     * sizeof(node)))   // Split REU in two areas: closed|open
#define S 144                           // S: Stride of the map
struct node {                           // Open node, represents a yet to be
  u8 x,y,dir:2,len:4,dummy;             // searched frontier of the A* search.
  u16 dist, nexth;                      // nexth: Handle to next open node in
} reu;                                  // REU memory.
char map[S*144];                        // The input map weights
u8 w, h, P, visited;                    // Map width x height.P:bucket queue pos
u16 pq[16], node_alloc = 1;             // Monotone bounded-key bucket queue
                                        // for A*. The problem instance has
bool close(node *n, u8 len) {           // bounded-key edge weights 1-9, so 16
  u24 idx = n->x|((u16)n->y<<8)         // buckets is enough.
   | ((u24)n->dir<<16) | ((u24)len<<18);
  u8 bitmask = (1<<(idx&7));
  reucpy(&visited, idx>>3, 1, REU2RAM); // Load closed bit from REU to RAM.
  if ((visited&bitmask)!=0) return true;// Return if we've visited this node.
  visited   |= bitmask;                 // Otherwise mark this node closed, and
  reucpy(&visited, idx>>3, 1, RAM2REU); // upload the closed bit to REU, and
  return false;                         // tell caller this node isn't yet done.
}

bool close_worse(node *n) {             // Apply the directional optimization:
  if (close(n, n->len)) return true;    // close off all nodes as visited that
  if (n->len < 4) return false;         // have more linear steps in their
  for(u8 i = n->len + 1; i <= 10; ++i)  // history than the current one.
    if (close(n, i)) return false;
  return false;
}

void insert(node n) {                   // Insert node to A* open frontier in
  u8 i = (n.dist - n.x - n.y)&15;       // monotone bucket queue heap.
  n.nexth = pq[i];
  pq[i] = node_alloc++;                 // Increment bump allocator to frontier.
  if (node_alloc == 0) node_alloc = 1;  // Skip node idx 0 (invalid node index)
  reucpy(&n, OPEN_IDX(pq[i]),           // memory, and upload the new open node
          sizeof(node), RAM2REU);       // over to the REU.
}

node *extract() {                       // Returns and removes the next node
  while(!pq[P]) P = (P+1)&15;           // in the A* queue.
  reucpy(&reu, OPEN_IDX(pq[P]),         // Read the open node from REU memory.
         sizeof(node), REU2RAM);
  pq[P] = reu.nexth;                    // Remove the node from priority queue.
  return &reu;                          // And return the removed node.
}

void walk(node *n,u8 d,u8 l,u8 x,u8 y) {// Queue neighbor node to open frontier.
  if (!map[y*S+x]) return;              // Can't walk out of map bounds.
  insert({.x=x, .y=y, .dir=d, .len=l,   // Enqueue the neighbor to A* open set.
   .dist=(u16)(n->dist + map[y*S+x])}); 
}

int main() {
  BENCH();
  reu_init();
  for(int i = 0; i < 64; ++i)           // Clear first 4MB of REU memory to init
    reuset((u24)i << 16, 0, 0);         // the A* closed set.

  char *d = map+S+1;
  FILE *f = fopen("in17.txt", "r");
  while(fgets(d, S, f)) ++h, d += S;    // Read input with a sentinel border.
  fclose(f);
  w = strlen(map+S+1)-1;
  for(char &c : map) c=(c<'0')?0:c-'0'; // Convert weights from text to binary.
  insert({.x=1, .y=1, .dir = R});       // Add the initial A* start coordinates.
  insert({.x=1, .y=1, .dir = D});
  node *n;
  for(;;) {                             // A* pathfinding loop
    n = extract();                      // Get next node in the open frontier.
    if (!close_worse(n)) {              // Check if it is closed, and close it.
      if (n->x == w && n->y == h) break;// Are we at the end? If so, finished.
      i8 X=XDIR(n->dir), Y=YDIR(n->dir);// Convert heading to direction vector

      if (n->len < 10)                  // Ultra-crucible can walk straight for
        walk(n, n->dir, n->len+1,       // 10 steps.
             n->x+X, n->y+Y);           // Ultra-crucible can turn only after 4

      if (n->len >= 4) {                // straight walks.
        SWAP(X, Y); Y = -Y;             // Swap and flip to rotate vec left.
        u8 d = LEFT[n->dir];            // Rotate dir heading left as well.
        walk(n,  d, 1,n->x+X, n->y+Y);  // Turn left.
        walk(n,3-d, 1,n->x-X, n->y-Y);  // Turn right.
      }
    }
  }
  printf("Heat: %u\n", n->dist);        // Final distance from start to end.
}

In this code, reucpy() is like memcpy(), but it copies between the C64 RAM and the REU memory address space.

Likewise, reuset() is like memset(), but for filling memory on the REU.
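To debug this kind of code on a PC, a small stand-in for those helpers goes a long way. Below is a minimal host-side emulation I would use; the signatures and direction constants are my guesses modeled on the listing above, not the actual C64 header. Note that in the listing reuset() is called with a length of 0 to clear a whole 64KB bank, mirroring how the real REU controller treats a zero transfer length, so the stand-in does the same.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Host-side stand-in for the REU transfer calls, assuming the semantics
// described above: reucpy() copies between "C64 RAM" (plain pointers) and a
// simulated 16MB REU address space, and reuset() fills REU memory.
// Signatures are assumptions based on the listing, not the real header.
enum { RAM2REU, REU2RAM };
static std::vector<uint8_t> reu_mem;

void reu_init() { reu_mem.assign(16u << 20, 0); } // simulate a 16MB REU

void reucpy(void *ram, uint32_t reu_addr, uint16_t len, int dir) {
  if (dir == RAM2REU) std::memcpy(&reu_mem[reu_addr], ram, len);
  else                std::memcpy(ram, &reu_mem[reu_addr], len);
}

void reuset(uint32_t reu_addr, uint8_t value, uint16_t len) {
  // Length 0 means a full 64KB bank, like the real REU controller.
  std::memset(&reu_mem[reu_addr], value, len ? len : 65536u);
}
```

With these two shims in place, the rest of the solution compiles and runs unchanged on the PC, which is how I could compare results between the two machines.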

I ran into another cryptic issue with this code, and had to lower the compiler optimization flag to just -O1. With -O2 or -Os, the code would seemingly fail to read or write the REU, and I am not at all sure why. It might be my code doing something undefined; I haven't figured it out yet.

So for today's problem, I only used the -O1 compiler flag to build the code. Fortunately that was enough to get the C64 charging into action. Here are the final runtimes:

Runtime on Commodore 64: 14 minutes and 53.3 seconds.

Runtime on AMD Ryzen 5950X: 14.046 milliseconds. (63,598.18x faster than the C64)

I am super impressed that the C64 was able to compute this so quickly, especially since in the morning, I was doubting that maybe this would be the day that the C64 might be out of the game.

Time to solve: 1 hour 49 minutes on video + 3 hours offline + maybe 8 hours porting to the Commodore 64. Yeah, this took my whole Sunday, but we're not leaving the C64 behind, not this close to Christmas! :')

Mistakes made:

Day 18

Today, the Elves are digging a huge swimming pool of lava. The task is, given a list of walking directions that carve the perimeter loop of the pool, to calculate the area of the swimming pool.

This smelled like a flood fill problem. However, Day 10 already had a flood fill, so at the very start of the day, this made me remark

"A flood fill problem? We already saw one of those, why do we have a second one of those? Oh now I know, bet Eric didn't intend Day 10 to be solved with a flood fill, but with a point-in-polygon test. So now we are flood filling again.

Well, that is boring. Now hold on, now I know what I can do. When I was reading the solutions for Day 10 puzzle, some solutions used Shoelace's formula to calculate the area of the rectilinear polygon. So maybe I'll use that formula now instead of doing a second flood fill.

[...] So if I calculated the corner points of the polygon, I could calculate the area of the polygon without needing to do a flood fill. Although the trouble is, that with these RGB points, I bet I will need to implement that flood fill for Part Two then.

[...] But it doesn't matter, I'll do [the formula] for Part One, and then implement the flood fill for Part Two."

Yeah, that was my first thinking. Man, that would have been a great plan, had I just followed that plan through!

Anyways, I set out to implement an area calculator using the trapezoidal rule (and not Shoelace's formula, which just looks needlessly more complex).

The difficulty here is that if straightforwardly calculated, the (x,y) vector coordinates produce a polygon that essentially passes through the midpoints of each ASCII character on the 2D map, instead of contouring the outer corners of the map.

You can see the difference between the midpoint coordinates and the outer contour coordinates in the above drawing. Directly summing up the x/y walking distances would result in calculating the midpoints.

But in order to be able to use the Trapezoidal formula to calculate the area of the whole swimming pool, I need the outer contour coordinates.

At first I thought this would have been easy to do by examining the winding directions (clockwise vs counter-clockwise) of consecutive edges of the polygon. So I started my coding to try to come up with a solution for that.

Except that I soon realized I was not quite able to make it work, and I didn't completely understand why.

After having worked on the coordinate generation code for some 40 minutes, and feeling that it would be a dead end for Part Two anyway given the RGB hint in the problem description, it made little sense to go into deep debugging mode with that code, so I abandoned it.

Then I switched tracks to implement the flood fill that I would surely need for Part Two anyway.

Flood Fill Problems

Well, as it happens, I realized that in Part One the 2D map is quite large for the C64: 379x422 cells. This meant I needed ~20KB of memory to store the flood fill state. After that, I had relatively little memory left for the flood fill contour stack, and kept running out of memory when attempting the fill.

To fix this, I changed my implementation to operate on scanlines before branching out (effectively reinventing a form of span filling). This took a bit of debugging to get right, but in the end it did fix the memory usage for Part One, and the C64 could then do the fill with just a 1024-entry flood fill stack.
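For illustration, here is a compact host-side sketch of that span filling idea, my own reconstruction on a plain char grid rather than the original bit-packed C64 code: fill a whole horizontal span at once, and push only one seed per newly discovered span in the rows above and below, so the stack grows with the number of spans rather than the number of cells.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Scanline (span) flood fill sketch: '.' is fillable, anything else is a
// wall. Fills from (sx,sy) and returns the number of cells filled. Instead
// of pushing all four neighbors of every cell, it grows a full horizontal
// span and seeds at most one stack entry per contiguous run above/below it.
int span_fill(std::vector<std::string> &g, int sx, int sy) {
  int filled = 0, w = (int)g[0].size(), h = (int)g.size();
  std::vector<std::pair<int,int>> stack{{sx, sy}};
  while (!stack.empty()) {
    auto [x, y] = stack.back(); stack.pop_back();
    if (g[y][x] != '.') continue;
    int l = x, r = x;                        // grow the span left and right
    while (l > 0     && g[y][l-1] == '.') --l;
    while (r < w - 1 && g[y][r+1] == '.') ++r;
    for (int i = l; i <= r; ++i) g[y][i] = 'x', ++filled;
    for (int i = l; i <= r; ++i) {           // seed rows above and below:
      if (y > 0     && g[y-1][i] == '.' && (i == l || g[y-1][i-1] != '.'))
        stack.push_back({i, y-1});           // one seed per run of '.' cells
      if (y < h - 1 && g[y+1][i] == '.' && (i == l || g[y+1][i-1] != '.'))
        stack.push_back({i, y+1});
    }
  }
  return filled;
}
```

On a mostly-open map like today's, the stack depth is proportional to the number of spans in flight, which is what let the real implementation get by with a 1024-entry stack.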

Somewhere in the middle came also the revelation that, unlike in the example, the actual problem input did not start at the top-left corner of the map, but extended further up, to y=-256 from the starting position. I addressed this by adding a small offset to the map coordinates.

Finally, I was able to run the fill code on the C64, and submitted a wrong answer that was too low. Darn.

At this point I was already at the two hour mark, so I decided to conclude the video, and after lunch debug the rest of the flood fill code on my PC, where I could actually visualize the whole filled map.

Well, as it happens, the fill code was all correct, just the count of filled squares had overflowed a 16-bit integer. Classic.

Rerunning the final code I ended up with on video, but with the overflow fixed, gave the first star.

Baited

In Part Two, the story switched: instead of the six-digit hex values on the input lines representing colors (which weren't used in Part One, but were hinted to be RGB colors for Part Two), it was revealed that the RGB was a lie, and the hex digits instead encode new walking direction instructions.

The trick here is that these new instructions form a rectilinear polygon that is about a million times bigger than in Part One, so any flood fill would likely run out of memory and time even on modern PCs.

This problem statement was done deliberately to bait people into implementing a flood fill, so they wouldn't be able to build on the same code to solve the second part. Eric Wastl has mentioned that the reason he does this is to teach people that in the software industry, requirements always change on you. This is why Advent of Code puzzles have two parts, and sometimes the first part deliberately deceives the reader.

Fitting this pedagogical lesson into the game can feel a bit misplaced (and it hardly epitomizes the whole industry, only a subset), so I can appreciate how this kind of misdirection might rub some people the wrong way. Still, it is an added twist purely for gameplay effect, and it basically gives you two puzzles to do in one day.

Nice or not, I got to throw the newly written memory-efficient span-based flood filler from the first part in the bin, and reverted back to the trapezoidal rule code that I had originally started with. Now I just needed to figure out what was wrong with my contour walking code.

After drawing out a couple of different scenarios, I realized that my original approach went wrong because I was only looking at the winding direction of the next corner to decide how far to walk in the current segment. Instead, the correct approach is to look at the winding directions of both the current and the next corner to determine the correct walk distance.

Accounting for this correctly enables a very compact loop that walks the outer contour coordinates directly. The final solution looks as follows.

Day 18 Part 2 Code Listing

#include 

int read(FILE *f, char *d, i32 *len) {  // Reads one line of input. Output dir
  int l, ret=fscanf(f, " %c %d (#%lx)", // in d and walk length in len.
                    d, &l, len);        // Parse from hex #value, doing dummy
  *d = *len & 3;                        // reads into d and l at first.
  *len >>= 4;                           // Parse length and direction from the
  return ret;                           // hex value.
}

int main() {
  FILE *f = fopen("in18.txt", "r");
  char dir, ndir, cw = true;            // Direction for cur and next segment
  i32 len, nlen, y = 0;                 // Length of cur and next segment
  i48 area = 0;
  read(f, &dir, &len);                  // Read first line for current state
  while(read(f, &ndir, &nlen) > 0) {    // and keep reading the next lines.
    bool ncw = (ndir == ((dir+1)&3));   // Next segment turns CW from previous?
    len += ncw + cw - 1;                // Fix up polygon exterior distance.
    if ((dir&1)) y += dir<2?len:-len;   // up/down: Accumulate y coordinate.
    else area +=(i48)(dir>1?len:-len)*y;// l/r: Sum signed trapezoid segments.
    cw=ncw, dir=ndir, len=nlen;         // Retire next state to current state.
  }
  fclose(f);
  printf("Area: %lld\n", area);
}

In this code, the while loop keeps track of the winding direction at the current polygon corner (cw), and the winding direction at the next polygon corner we are traveling to (ncw).

The walking distance len from current corner to the next is "fixed up" at each step based on these ncw and cw states to convert the length of each polygon edge from midpoint coordinates to the outer contour coordinates.

The general rule is this:

In short, the outer contour distance is equal to the midpoint distance + (number of clockwise rotations in cur and next corner) - 1. This is realized by the line len += ncw + cw - 1; above.


Visualization of trapezoidal formula, Wikipedia

With these outer contour coordinates at hand, computing the area of the swimming pool with the Trapezoidal formula can then be done at a cost of only 0.5 multiplications on average inside the inner loop. Pretty lean!

What About That "Pick+Shoelace"?

After solving the problem and looking at the subreddit, I found that most solutions first used Shoelace's formula to calculate the area of the polygon formed directly by the midpoints, and then calculated the length of the polygon perimeter.

Pick's theorem then provides an identity that connects the surface area of a polygon to the number of integer points in the interior and on the boundary of that polygon.

I found this repository code to be a good compact representation of how that strategy works.

The thing is, even after examining this for a while, I can't wrap my head around why that formula can be applied in this situation. Somehow Pick's identity allows us to calculate the area of the outer contour polygon if we only know the area of the smaller midpoint polygon and the number of integer points on the perimeter of the smaller midpoint polygon.

But I am not quite sure at all how that follows, even after reading this nice proof of Pick's theorem. I'll have to think about that some more. (Update: eventually I got it: the theorem is applied to the midpoint polygon and not the exterior polygon.. a neat theorem indeed)
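For the record, here is my own after-the-fact reconstruction of that argument. Let A be the area of the midpoint polygon (from Shoelace), i the number of lattice points strictly inside it, and b the number of lattice points on its boundary (which for this rectilinear loop equals the trench perimeter length). Pick's theorem applied to the midpoint polygon gives:

```latex
A = i + \frac{b}{2} - 1
\quad\Longrightarrow\quad
i = A - \frac{b}{2} + 1
```

Every dug-out cell center is either an interior lattice point or a boundary lattice point of the midpoint polygon, so the total number of dug cells is i + b = A + b/2 + 1, which is exactly the quantity those solutions compute.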

In any case, I like my approach better (isn't that what we all think when solving AoC problems, even when our code is more convoluted? :D). Tracking the winding directions to generate the outer contours is a bit more complex, but I was able to make it a fast one-pass run through the input, with just one multiplication needed per every second input line.

In other words, performance is a breeze for the C64. Which gets us to the runtimes.

Runtime on Commodore 64: 39.3 seconds.

Runtime on AMD Ryzen 5950X: 0.481 milliseconds. (81,704.78x faster than the C64)

Mistakes made:

That's all for today, until tomorrow!

Day 19

We are getting to the home stretch of this year's Advent of Code. Only a handful of puzzles to go.

In today's puzzle, we are processing machine parts through a validation "workflow" graph. Each part carries an ID (four numbers), and at each node of the graph these numbers are compared to decide which edge of the graph the part should follow next. A Finite State Machine (FSM) traversal problem.

Parsing the input played a major part in today's problem. Looking at the video, it took me 1 hour and 5 minutes to finish implementing the parsing. Implementing the validation logic to solve Part One then took only 20 minutes. Surprisingly, even though the code was about 130 lines long, I made only one small mistake in it (which hung my C64 in an infinite loop), and after fixing that, I got the first part correct directly on the C64, on the first try.

In Part Two, the challenge was modified so that instead of running specific (given) part numbers through the validation logic, the puzzle was to count all the possible part numbers that would be accepted by the validation flow.

If yesterday's problem and Day 10 were roughly the same problem, then today's problem was quite similar to Day 5, where we were also asked to push ranges of IDs through a data structure, clipping the ranges into parts as necessary. The general logic from Day 5 repeated, and my mental model for writing today's solution followed that day's idea of processing ranges.
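The clipping step at the heart of this can be sketched in a few lines. This is a stand-alone illustration with made-up rule values, not the C64 code itself: a comparison rule such as "a<2006" splits an incoming inclusive range into the disjoint part that passes the comparison and the part that fails and falls through to the next rule.

```cpp
#include <cassert>
#include <cstdint>

// Sketch of splitting an inclusive [min,max] range against a '<' or '>'
// comparison rule. The two output ranges are disjoint and together cover
// the input exactly; either may come out empty (min > max).
struct Range { uint16_t min, max; bool dead() const { return min > max; } };

void split(Range in, char cmp, uint16_t limit, Range &pass, Range &fail) {
  pass = fail = in;
  if (cmp == '<') { pass.max = limit - 1; fail.min = limit; }
  else            { pass.min = limit + 1; fail.max = limit; } // cmp == '>'
}
```

Because every split produces disjoint subranges, no two packet ranges arriving at an Accept node can ever overlap, which is why the naive summing at the end works.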

A Hint From Chat

The difficulty here, I thought, was, what if the FSM graph had cycles in it, or if it was a DAG?

I was starting to think about some kind of merge and duplication tracking process where I'd be able to filter out duplicate copies of my packet ranges flowing into a single node.

At this point a viewer revealed that the input would be a proper tree, so no cycles or DAGs. That eradicated many of the contingencies that I was worrying about, and I decided to do a naive step-by-step processing of the ranges in a fashion that assumed that property.

However, I was still uncertain whether subsequent validation rules might eventually lead to accepting ranges that could overlap in the final Accept nodes. But I decided not to worry about that either for now, and see what a solution would look like without addressing any overlaps.

Developing the code took about half an hour, and to my surprise produced the right answer for the example input. Then, just like that, running the code on the actual input popped out the correct answer immediately for Part Two.

I was a bit surprised by how easy it was. No hard bugs, deep debugging or missed logic (and only relatively few horrible crashes today overall).

Thinking about the overlap scenario I was concerned about, I now realize it is not really a possibility, since each packet range is split into proper disjoint subranges at each step. Even if the input were a DAG or had cycles, the code would work just fine. (Any cycles would effectively have to be inert, since otherwise some part numbers would not have a well-defined Accept or Reject result.)

Sometimes it helps to not try to take everything into account at once, as it might turn out you don't need to deal with some possibilities at all.

Afterwards I wondered what might have happened had I not gotten that hint to simplify. Would I have immediately tried to write a more complicated solution? Or would I still have written the simple, assuming solution first? I don't really know.

Day 19 Part 2 Code Listing

#include 
struct wf;                              // wf: "workflow"
struct rule {                           // Transition rule for the FSM
  u8 i, cmp;                            // cmp: 0=invalid, or '<' or '>'
  u16 limit;                            // The limit that is compared against.
  wf *dst;                              // Destination FSM node for this edge.
};
struct wf {                             // Workflow node in the FSM.
  char name[4];                         // String name, max three chars.
  rule r[3];                            // Max three comparison rules per node.
  wf *end;                              // The last unconditional target node.
} wfs[1024], *S, *A, *R;                // Hash table, and Start, Accept and
                                        // Reject nodes.
wf *get(const char *n) {                // Adds or gets a workflow node by name.
  u16 h = (n[0]^((u16)n[1]<<1)          // Hash the three chars into 10bit hash.
        ^ ((u16)(n[1]?n[2]:0)<<2));
  for(; wfs[h].name[0]; h=(h+1)&1023)   // Linear probing to find the right slot
    if (!strcmp(wfs[h].name, n))        // in the table.
      return &wfs[h];
  strcpy(wfs[h].name, n);               // Add new node with this name to table.
  return &wfs[h];
}

wf *getnam(char *s, char **n) {         // Parse and get workflow node from
  char *e = s;                          // input line s. Output continuation of
  while(isalpha(*e)) ++e;               // input line in n. Search name chars on
  *e = 0;                               // input line, and break it with a null
  *n = e+1;                             // terminator, then search the workflow
  return get(s);                        // hash table for existing, or add new.
}

void parse_wf(char *s) {                // Parse a workflow from input line.
  char *e;
  wf *w = getnam(s, &s);                // Get or create current node by name.
  int i = 0;
  while(s[1] == '<' || s[1] == '>') {   // Parse the comparison edges.
    rule &r = w->r[i++];
    r.i=*s=='x'?0:*s=='m'?1:*s=='a'?2:3;// Convert x/m/a/s to indices 0-3.
    r.cmp = s[1];                       // Compare rule, '<' or '>'
    r.limit = atoi(s+2);                // Parse cmp limit compared against.
    r.dst = getnam(strstr(s,":")+1, &s);// And the destination wf node.
  }
  w->end = getnam(s, &e);               // Finally parse unconditional end node.
}

struct range {u16 min,max;} r={1,4000}; // min-max range, and the default range.
struct packet {                         // A range bundle through the FSM.
  range x[4];                           // Four ranges for X/M/A/S each.
  wf *f;                                // The current node this packet is at.
} stack[512], *sp = stack;

void propagate(packet &p) {             // Clips the ranges of a single packet
  for(int i = 0; i < 4; ++i) {          // and propagates it to neighbor nodes.
    if (i == 3 || !p.f->r[i].cmp) {     // Are we at end of rule list?
      p.f = p.f->end;                   // If so, remaining packet goes to the 
      *sp++ = p;                        // last unconditional jump destination.
      return;
    }
    packet p2 = p;                      // Splice this packet to accept/reject.
    rule &r = p.f->r[i];
    range &x = p.x[r.i], &x2 =p2.x[r.i];
    if (r.cmp == '<') {                 // Compare this packet range against the
      x2.max = r.limit-1;               // < and > rules, then splice off a new
      x.min  = r.limit;                 // packet range off of the numbers that
    } else {                            // pass and fail the comparison.
      x2.min = r.limit+1;
      x.max  = r.limit;
    } 
    p2.f = r.dst;
    if (x2.min <= x2.max) *sp++ = p2;   // If pass range is not dead, queue it.
    if ( x.min  >  x.max) return;       // If fail range is dead, we are done.
  }
}

u48 size(packet &p) {                   // Counts the total number of IDs in the
  u48 s = 1;                            // given packet range.
  for(range &r:p.x) s*= r.max+1-r.min;  // Multiply all range lengths together.
  return s;
}

int main() {
  BENCH();
  FILE *f = fopen("in19.txt", "r");
  char s[100];
  while(fgets(s,100, f)&&!isspace(s[0]))// Read input lines until a newline.
    parse_wf(s);                        // Parse input workflows into hashtable.
  fclose(f);
  
  S=get("in"); A=get("A"); R=get("R");  // Find special start and end nodes.
  *sp++ = { { r,r,r,r }, S};            // Initialize first default packetrange.
  u48 sum = 0;
  while(sp > stack) {                   // While packet ranges to process, get
    packet p = *--sp;                   // a range from stack.
    if      (p.f == A) sum += size(p);  // Add up size if range is Accepted, or
    else if (p.f != R) propagate(p);    // propagate the range while splicing it
  }                                     // if the range is not yet Rejected.
  printf("Sum: %llu\n", (u64)sum);
}

In any case, today involved a lot of code, but fortunately it ended up being somewhat painless. My biggest trouble today was that the disk drive save routine is still crashing, which caused loss of work a couple of times. Ouch, that bug is really proving to be a nasty one, but I just have not been able to figure out what causes it, or how to fix it!

The final solution totals at 98 lines of code, one of the longer ones so far.

The runtime here is undoubtedly dominated by the slow 1541-II disk drive access speed.

Runtime on Commodore 64: 1 minute 51.3 seconds.

Runtime on AMD Ryzen 5950X: 0.729 milliseconds. (152,674.90x faster than the C64)

Time to solve: 2 hours 22 minutes.

Mistakes made:

I was really happy to get to both solutions today directly on the C64 and also on the first try. A perfect score for the C64.

Day 20

Now only a few puzzles left!

The problem that we have today is to write a Verilog HDL simulator, kinda.

The Elves have built a communication network graph to boot up their snow generator machines. The graph consists of weird flip flops that negate their state on a zero input signal and do nothing on high inputs, plus a bunch of NAND nodes (nodes that AND their inputs, and then output the logical NOT of the result).

There is no clock signal in the graph, but the graph is pumped asynchronously from a single signal source that sends low bits into the line. (The line is an odd tri-state disconnected/low/high kind of a messaging line).

In Part One, the task is to simulate this processing graph, which turns out to have several cycles in it. I worked on the first part for two hours on video, and in the end got to a stage where I had the logic simulated, but there was some kind of error: the right answer on the sample inputs, but an incorrect result when submitting. Meh, one of those days, eh?


"It's a new day... " - Reddit u/jovani_lukino

I decided to take a break, and then take the debugging to the PC to figure out what I had done wrong.

With a bit of digging, I realized that it was not a programming error this time, but a logic mistake that had snuck in while I was attempting to cut a corner to simplify. Fixing that won the first star.

In Part Two, the question was how long one would have to pump zeros into the circuit until the single output node in the graph would in turn output a zero.

Naively running the code from Part One would naturally never finish, no matter how long you waited. The reasons for this, I initially hypothesized, were twofold:

  1. The flip-flop nodes have this property that they each effectively halve the signaling rate. To output a single zero from this Elf flip flop, one needs to send two zeroes into it. In turn, two of these flip flops in sequence output a zero only after four steps, three of them after eight steps, and so on. And in the input graph there were lots of flip flops configured in a long line, so the last one takes exponentially long to trigger a zero.
  2. Another issue that slowed down the circuit is that the NAND nodes have lots of inputs, and every input has to be a one in order to output a zero. And finally, there were NANDs of NANDs to slow down the output rate even further. (This immediately smelled like cycles, and LCM periodicity of those cycles, but I had not quite discovered the exploit here yet)
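The halving behavior in point 1 is easy to verify with a few lines of host-side code. This is my own sketch, not the solver: a chain of k of these Elf flip-flops behaves exactly like a k-bit ripple counter, so the last stage emits its first low pulse only after 2^k input pulses.

```cpp
#include <cassert>
#include <vector>

// Host-side check of the halving claim: each Elf flip-flop toggles on a
// low pulse and forwards its new state (low when it turns off, high when
// it turns on; high pulses are ignored). A chain of k of them is thus a
// k-bit ripple counter. Returns how many low pulses must be fed into the
// chain before the last flip-flop first emits a low pulse.
long presses_until_low_out(int k) {
  std::vector<bool> ff(k, false);         // all flip-flops start "off"
  for (long press = 1;; ++press) {
    bool pulse_low = true;                // button sends a low pulse in
    for (int i = 0; i < k && pulse_low; ++i) {
      ff[i] = !ff[i];                     // low pulse toggles this stage
      pulse_low = !ff[i];                 // forwards low only on on->off
    }
    if (pulse_low) return press;          // low pulse left the last stage
  }
}
```

With a 12-stage chain like the ones in the input graph, that is 4096 presses before the last stage fires low for the first time.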

Those Darn Cycles

At first I started to tackle the problem of the flip-flops slowing down the signaling rate. My idea was to use a vector of {time, value} data pairs to represent the whole periodicity of a given output node in a repeating, modular-arithmetic fashion. This way I'd be able to compute and represent the behavior of a given circuit node as a function of time, and to quickly evaluate at which time points it signals.

This representation is pretty tight, since it skips evaluation of any time when the flip flops don't fire at all (in comparison, naive looping would waste time passing inputs to these flip flops that didn't act onwards).

I wrote a function that, given multiple inputs to a node, would compute the periodic waveform of the node's output as a function of time using these {time, value} tuples. The function would interleave the input signals (since signaling is asynchronous), and process the outputs and track the time values when the outputs would fire.

I felt that this worked out great, and I was able to build up the whole periodic waveform in just one walk through a graph.

But.

Only as long as there wouldn't be any feedback cycles in the graph.

My approach coped with DAGs fine, but any feedback cycles would mean that I'd have to discard all information computed up until the timestamp where the feedback started, to "correct" the output for the next loop of that cycle. This was a problem.

At this point I started looking for a way to maybe collapse the looping cycles into individual nodes, perhaps using a connected components algorithm.

Seeing Through The Problem

After switching over to the PC following the lunch break, I had visualized the input graph as I started on Part Two, because I was wondering whether there might be a way to walk the graph backwards from the goal.

However, the default layout from Graphviz was so messy, with edges crossing each other, that I didn't actually realize what I was looking at in that default output.

To figure out the cycle collapsing, I restricted my visualization to just a single cycle of the graph.

And, uh, then a moment of total disillusionment followed.

Here is what the input graph looks like after manually helping Graphviz to reorganize it a little:

The single input node, named "broadcast", resides on the left (red), and the output node "rx" (green) is on the right. The purple nodes are NAND gates, and the white circles are the flip flops.

The input graph is four separate 12-bit binary counters, each with a different target count (a different number and placement of tapped bits). These are NANDed together, then inverted, and then NANDed once more to produce the final output.

(At first I thought these would have been LFSRs, hence the naming in the graph above... but that was not quite accurate.)

So the problem today boils down to the same trick from Day 8 Part Two.

I dropped the fancy time-compacted simulator that I had spent about two hours building, and adjusted my code to identify and cut out these four sub-complexes from the graph, simulate each one individually (using code from Part One), and finally compute the LCM of their cycle lengths.

Redoing all that took maybe ten minutes, and the second star came loose.

It was not a bad puzzle, but it did make me wish a little bit that the Day 8 puzzle hadn't preceded this one. For some reason, the "aha!" moment did not make me feel smart for having figured the puzzle out.

Rather, I struggled with thoughts of, "oh, we are doing this again, are we?"

In any case, there were several interesting things to think about when trying to simulate the graph, so it was a nice puzzle overall (definitely a more enjoyable version of Day 8!). Here is the final code I ended up with.

Day 20 Part 2 Code Listing

#include 

struct node {                           // A single circuit node in the graph.
  char name[4];                         // Name identifier, "broadcast"->"roa".
  bool ff, hi;                          // ff: is a flip-flop? hi: currently in
  u8 n_in, n_out;                       // high state? n_in/_out: # of input and
  node *in[10], *out[10];               // output nodes.
} ht[256];

struct signal {                         // Tracks a single propagated signal
  node *n;                              // to the given target node.
  bool hi;
} Q[256];                               // A ring buffer queue to perform
u8 qb, qe;                              // Breadth-First simulation on graph.

node *get(const char *n) {              // Gets or adds a node to graph by name.
  char c = n[1] ? n[2] : 0;             // Hash the name to a hash table index.
  u8 h = (n[0]^((u16)n[1]<<1)
              ^((u16)c   <<2));
  for(;ht[h].name[0]; ++h)              // Do linear probing to find a/the spot
    if (!strcmp(ht[h].name, n))         // in the hash table.
      return &ht[h];
  strcpy(ht[h].name, n);                // If we didn't find the name in table,
  return &ht[h];                        // add it as a new entry.
}

node *getnam(char *s, char **n) {       // Parses a string name from input line,
  char *e = s;                          // and returns new or existing graph
  while(isalpha(*e)) ++e;               // node by that name. Output in 'n' the
  *e = 0;                               // continuation position on the input
  *n = e+1;                             // line after the parsed name string.
  return get(s);
}

bool all_inputs_one(node *n) {          // Returns true if all inputs to this
  for(int i = 0; i < n->n_in; ++i)      // node are currently high.
    if (!n->in[i]->hi) return false;
  return true;
}

u64 lcm(u64 a, u64 b) {                 // Computes Least Common Multiple of
  u64 prod = a*b, t;                    // a and b.
  while(b) t = b, b = a%b, a = t;
  return prod / a;
}

node *nand_sink(node *s) {              // Given a FF node, finds the next NAND
  while(s->ff) s = s->out[0];           // node that this FF flows into.
  return s;
}

int sim(node *start, node *end) {       // Simulate from given start to the end,
  node button = { .ff = false,          // until the end node fires a low.
    .n_out = 1, .out = { start } };     // Create a virtual node that sends lows
  qb = qe = 0;                          // to this node. Reset BFS queue.
  for(int i = 1;; ++i) {                // i: how many button presses we've got?
    Q[qe++] = { &button, 0 };           // Queue the button press
    while(qb != qe) {                   // While BFS queue is not empty...
      signal &s = Q[qb++];              // Read the signal in queue, and get
      node *n = s.n;                    // the node this signal targets.
      if (n == end) {                   // If signal is to the end node, check
        if (all_inputs_one(n)) return i;// if the end now sees all inputs as 1?
      } else if (!n->ff || !s.hi) {     // Otherwise, if target is a flip-flop,
        if (n->ff) n->hi = !n->hi;      // propagate the signal to its outputs,
        for(u8 j = 0; j < n->n_out; ++j)// if the flip flop was sent a low.
          Q[qe++] = {n->out[j], n->hi};
      }
    }
  }
}

int main() {
  BENCH();
  FILE *f = fopen("in20.txt", "r");
  char s[100];
  while(fscanf(f, "%s -> ", s) > 0) {   // Read the input graph.
    if (strlen(s+1) > 3) s[4] = 0;      // Truncate hack: "broadcast" -> "roa".
    node *n = get(s+1);                 // Add node to graph.
    n->ff = (*s == '%');                // Mark it as FF or NAND.
    fgets(s, sizeof(s), f);             // Read all the outputs of this node.
    for(char *c = s; *c > 32;) {        
      node *out = getnam(c, &c);        // Get output node
      n->out[n->n_out++] = out;         // Connect this node to the output, and
      out->in[out->n_in++] = n;         // the output back to this as input.
      while(isspace(*c)) ++c;           // Skip to parse the next output name.
    }
  }
  fclose(f);

  node *BC = get("roa");                // Get starting "broadcast" node.
  uint64_t len = 1;
  for(int i = 0; i < BC->n_out; ++i) {  // Find all counters that this broadcast
    node *start = BC->out[i];           // node is connected to, and which nodes
    node *end = nand_sink(start);       // each counter complex sinks to.
    len = lcm(len, sim(start, end));    // Then simulate graphs and incorporate
  }                                     // the cycle length against the other
  printf("%llu\n", len);                // counter cycle lengths.
}

The program code totaled 98 lines. This was another computationally light problem once the veil over it was lifted, so the MOS 6510 could flex its computing performance.

Time to solve: 2 hours 01 minutes on video + ~2 hours 30 minutes offline.

Runtime on Commodore 64: 34.6 seconds.

Runtime on AMD Ryzen 5950X: 0.399 milliseconds. (86,716.79x faster than the C64)

Mistakes made:

Part One would have been possible to figure out on the C64, but Part Two definitely not, as visualizing the graph required extra tooling.

Still, so far the Commodore 64 has been able to compute the solutions to all the puzzles, which is much more than I anticipated!

Interlude: Three Scary Commodore 64 Games

A couple of days ago, I wrote about my unoriginal C64 games collection, happened to include a picture of a tape that had the game Amaurote on it, and mentioned that I'd have something more to write about it.

Well, that game brought back a memory of C64 games that were so scary that I didn't dare to play them as a kid.

When I'd browse my hand-written index booklet of C64 games and find one of those games that I remembered being too scary, I would pause for a moment to think whether I would dare to load it up and play.

And then after a short think, I'd most often skip it and decide to load something else.

Here are three such games that I remember that gave me the chills, in no particular order.

#1. Amaurote

In this game you are traveling with some kind of space donut, cleaning maps of flies.

I remember that this game had two really scary things. First was the gameplay, where you couldn't see very far from your space donut: the flies you were supposed to be shooting might jump right on you before you could react!

That, paired with heavy suspenseful music, made this game feel scary. Listen for yourself (the ingame music starts at 0:34):

The initial main menu tune at 0:00 - 0:34 is pretty good, btw.

Then, after killing all the flies on the map, you are supposed to kill the fly queen (which in the video the player luckily finds immediately, at 0:45).

That music absolutely haunted me, and I remember feeling so afraid trying to find all the flies on the map, thinking that the fly queen would jump off the screen at me, or that something else unexpected would happen.

#2. Forbidden Forest

On side B of Tape 16, I have scratched down the name "Forpidded forest".

This game was absolute horror (so much so that surely I didn't just typo the name, but was simply afraid to spell it out correctly :p).

Everything about it. The graphics, the theme, the gameplay enemies, the music, and the sound effects evoked horror.

In the very first level, the spiders would come to eat the player, and the only thing you knew to do was to run away. If you didn't, you'd get the most realistic animation of being eaten by a spider, along with such loud, horrible noises that I had to close my eyes and put my fingers in my ears whenever I died.

And because at first you don't know what to do (how to time your movements properly), the only thing you can do is run away from the spiders so you don't die. Oh my god.

If that sounds ridiculous now, mind you, I would have been about 6 or 7 years old at the time, and had never seen anything as realistic. Here is a nice video review of the game:

I know that a couple of years later, maybe at the age of 10 or 11, I did come back to finish the game on the C64, and felt like an absolute champ for doing it.

Just ridiculous how chilling this game was. I don't think even Alien: Isolation managed to scare me as much. Well, maybe that's because I am (supposedly) an adult now.

#3: Short Circuit

This one is an absolute mind-bender. I have the original C64 tape, which reads "Other Side: Short Circuit" (i.e. turn the tape over to load from side B).

In this game, you are a robot, walking inside a building rendered in an isometric sort of view.

I vividly remember how much of a scare this game was.

First, you get to wait through the loading screen, with energetic music and some skeleton-like thing being hit by lightning, or casting lightning, or something similar. That picture was scary in its anticipation. Like something waking from the dead.

Then, the game starts, and the music, oh my god, is this kind of almost amelodic oppressive torment.

You *play* this super scary looking skeleton robot thing that doesn't have any legs.

There are other lifeless looking humanoid robots on the walls, and then you meet this other robot that all of a sudden is alive! And it mirrors your every action and doesn't let you pass.

You have no idea what to do, and the other robot is just there, lifelessly, blocking your way and mirroring all your actions.

All while the music keeps haunting you, as if something horrible is about to happen.

When you finally figure out how to get past that robot, you get to other robots that also mimic your moves.

In the game you apparently are supposed to search these desks and shelves for robot parts, and the clock is ticking.

Then at some random point, you run out of time(?), and a security guard(?) character comes from the left and shoots you in a split second, and it is game over.

I remember that when I first played this game, I didn't want to see the horrible robots any more, I didn't want to listen to that eerie tune any more, and I was constantly, anxiously waiting for the guard to come from the left and shoot me, wondering what to do about it (turns out there is nothing you can do, as that is the game over animation).

I just wanted to turn off the C64 and think about something else. Oh wowz what a game.

Then... a couple of decades later, I looked up the game, and realized that it was actually based on a movie: Short Circuit, from 1986!

And uh.. it is a Sci-Fi Family Comedy film.

Wait what? A comedy?

Just imagine being a C64 game developer, hired to turn a comedy movie into a game, and the thing you build ends up haunting a poor kid up until their adulthood.

Haha, I just can't stop laughing trying to wrap my mind around what happened there. :D

I'd really love to sit down and chat with the developer one day <3 (another game whose developer I'd really love to have a word with is Jack The Nipper 2, but that's another story).

Well, that was the trio of games that got me. There may have been others, e.g. the loading image of Gutz, or Wizard Of Wor, but I don't remember them being quite as scary overall as the above three.

Alright, enough about reminiscing, let's get back to solving the last five puzzles, shall we?

Day 21

It is the 21st of December. The date is etched in my mind by the silly old Finnish Kummeli show, from 1994.

Today we got a puzzle that really gave us a run for our money.

Returning to Island Island, we meet the gardener from Day 5. Today he has a step counter, and he'd like to get his steps in for the day. The task is to calculate, given a 2D map, how many different end locations in his garden he can reach if he walks a given number of steps.

Looking at my scribbles, I was happy to immediately observe the parity property in the problem, and that the sets of squares that are reachable form a growing subset sequence as the number of steps grows.

The parity property means that after an odd number of steps, the gardener can only reach squares (x,y) that have an odd parity: (x+y)%2 == 1. After an even number of steps, the gardener can only reach squares that have an even parity: (x+y)%2 == 0. This behavior is important: it is an annoying obstacle, but it needs to be observed to solve Part Two later.

You can see how the parity property works in action in the above image. After an odd number of steps, the Gardener must be located in one of the cells marked with 'o', and after an even number of steps, the Gardener must be located in one of the cells marked with 'x'.

The second, "growing subset sequence" property means the BFS search never has to visit a previously visited cell on the map again. This is a nice optimization that helps search performance considerably, as it collapses the BFS frontier into a single outwards-scanning "Manhattan diamond" (minus the rocks on the map).

What Manhattan diamond, you ask? Well, in a 2D grid where one can only walk discretely in the four cardinal directions, a "circle" of radius r forms a diamond shape, not the true circle you would get by walking a distance r away from an origin in the real world.

So the first part was a straightforward flood fill/Breadth-First Search. I would have had a chance to solve it in good time on the C64, but I made two mistakes that cost a bit of debugging time and one missed answer. Nevertheless, I did get the solution on the C64 natively.

In Part Two, the bomb dropped.

The task was changed so that the Gardener's garden is now an infinitely long wrapping map, and the Gardener has a humongous 26 and a half million steps to walk. The classic "scale up the numbers to astronomical values" twist.

I didn't even dare to attempt this part of the problem on the C64; this one surely would need some visualization to get going.

Although, before concluding the video, I hypothesized that I could explicitly pathfind only the boundary, and that with memoization this could be made fast by walking only around the boundary of cached values (~linear time), instead of needing to walk the whole area of the Manhattan diamond (~quadratic time).

This intuition turned out to be correct, but getting to a working solution was somewhat challenging. I knew what I wanted to do, but in this case it is very easy to introduce indexing bugs, so getting to a final working solution took quite a bit of time.

What exacerbated the development challenge a bit was a very cheeky play on the puzzle inputs. See, the example input, if scaled up in this fashion, would present an extremely difficult pathfinding challenge, because of the obstacles that prevent walking quickly towards the bottom right boundary of the map.

So the many "over-helpfully" placed example test case numbers were a complete red herring. Developing a BFS flood filler that could actually compute those example values would have been a complete waste of time on the way to the real solution.

Fortunately, in the real problem input there are several "strategically placed" obstacle-free highways that allow making static calculations about the shortest paths in the Gardener's actual garden. The important obstacle-free paths are shown as green lines above.

Also, notice a second set of obstacle-free highways in the garden, marked with fat yellow lines. These highways enable a completely different solution to the problem: simply subtract the number of rocks in the different regions of the garden from the total area that the Manhattan diamond generated by the step counter covers. (First run a reachability flood fill to discard the couple of unreachable squares on the map; those can be treated as rocks as well.)

I did not solve the problem by leveraging those yellow sections, since my original solution was quite far along when I realized that would have been a possibility. My solution only assumes the existence of the green highways above, and would work even if the yellow highways were filled with rocks.

If the green lines did not exist, the problem could prove exceptionally difficult to solve. But with these green free paths, it is easy to compute a formula for how many steps it takes the Gardener to reach any duplicated copy of the garden in the infinitely repeating map, without needing to run a pathfinding algorithm at all.

The example input had neither the green nor the yellow highways, which would lead the user astray into attempting something considerably more complex.

Solving Part Two

The general idea to solving the second part is to chart out how the gardener's step count limit "Manhattan diamond" shape intersects/overlaps with the different copies of the garden.

Btw, it looks like the desired step counter value, 26501365, has been carefully selected to equal 202,300 * 131 + 65. The numbers 131 and 65 are magic values: 131 is the width and height of the square-shaped garden, and 65 is the distance the Gardener walks from the starting location in the middle of the garden to the edge of a single garden copy. I.e., garden width 131 = 65+1+65.

This is so that the edges of the Gardener's step counter diamond will coincide with the yellow highways of the map, and it enables the alternate solution I briefly described above.

Anyways, my solution does not assume that property; I trucked on with the actual flood fills that I had started with.

The above diagram is a visualization of the Gardener's step counter limit (the Manhattan diamond, in red), the starting square of the Gardener in the middle (S), and the different infinitely repeating copies of the 2D garden map (in green and gray).

The general idea to solve this problem is to avoid flood filling the whole diamond; instead, flood fill each unique section just once, and then multiply the result by how many sections of that type there are.

That is, there is an observation to be made: the different garden copies that the diamond intersects can be classified into the following groups.

Groups A-L are shown above. Additionally there are two more groups, the gardens that fully reside inside the diamond and do not intersect with it, which come in two different parities, shown in GREEN and GRAY.

The realization here is that all the copies of the garden that have been assigned the same letter above will "solve" to the same BFS step count result.

That is, it is only necessary to compute the BFS search once for each distinct group of a garden, and then the result can be multiplied by the total number of such copies that exist.

Then summing up the possible locations in all the groups yields the final answer.

Computing the number of garden groups A-L is relatively easy, but computing the number of GREEN and GRAY copies of the garden is a bit more involved. That led to a lot of scribbling on pen and paper to come up with the right formulas to answer "how many cells with an even and odd parity are there inside a Manhattan diamond of radius r?".

Finally I was able to figure it out, but it did take some re-dos. I was shocked by how rusty my elementary math series manipulation has gotten. (Later I realized that you can rotate the diamond 45 degrees and basically just read out these formulas without any math... duh!)

Here is the final code, 70 lines in total.

Day 21 Part 2 Code Listing

#include 
#define S 136                           // Map stride, 128+8 for low ones count.
u8 m[S*136], color = 1;                 // m: map, color: flood fill color
u16 Q[1024], b, e, reach[2];            // reach: cells visited, per parity.
                                        // Q: BFS queue.
void visit(u16 pos, u8 color) {         // Visits a single 2D location on map.
  if (m[pos]==color||m[pos]==255)return;// Skip if already visited.
  m[pos] = color;                       // Mark this one visited.
  ++reach[0];                           // Grow reach count for this parity.
  Q[e] = pos;                           // Enqueue the node for further visit.
  e = (e+1)&1023;                       // And walk the queue ring buffer tail.
}

u16 solve(u16 steps,bool p, u8 x, u8 y){// Perform BFS flood fill to find how
  if (!((x+y+p+steps)%2)) --steps;      // many cells are reachable. Fix up
                                        // step count for correct parity.
  reach[0] = 0; reach[1] = 0;           // Reset search counts,
  b=e;                                  // and the ring buffer.
  visit((y+1)*S+x+1, color);            // Visit first node to start search.
  while(steps--) {                      // Propagate until all steps searched.
    SWAP(reach[0], reach[1]);           // Swap score according to parity count.
    for(u16 qe=e; b!=qe; b=(b+1)&1023) {// Consume walkable nodes from BFS queue
      visit(Q[b]-1, color);             // and visit all neighbors.
      visit(Q[b]+1, color);
      visit(Q[b]-S, color);
      visit(Q[b]+S, color);
    }
  }
  color = (color<0xFE)?color+1:1;       // Increment FF color for next search.
  return reach[0];                      // Return the number of locations found.
}

int main() {
  BENCH();
  FILE *f = fopen("in21.txt", "r");
  memset(m, 255, sizeof(m));            // Clear map to all sentinels.
  u8 w = 0, h = 0;
  for(u8 *s = m+S+1;fgets((char*)s,S,f);// Read line by line to sentinel array.
      s += S) ++h;
  fclose(f);
  w = strlen((char*)m+S+1)-1;           // Establish map width.

  for(u8 &ch : m)
    ch = (ch=='.'||ch=='S')?0:255;      // Rewrite map squares to be colorable.

  u8 c = w/2;                           // c: distance to center (to start pos)
  u32 steps = 26501365;                 // Desired number of steps to walk.
  u32 R = steps / w;                    // How many garden cells wide is the
  u64 evens = (u64)(R-1)*(R-1);         // diamond? Find # of even and odd 
  u64 odds  = evens + 2*(R-1) + 1;      // parity gardens that fit in diamond.
  u64 reach = evens*solve(w+h, 0, c, c);// Accumulate GREEN squares.
  reach    +=  odds*solve(w+h, 1, c, c);// Accumulate GRAY squares.

  reach += solve(w-1, 0,    0, c);      // Score Cell A
  reach += solve(w-1, 0,  h-1, c);      // Score Cell G
  reach += solve(w-1, 0, c,    0);      // Score Cell J
  reach += solve(w-1, 0, c,  h-1);      // Score Cell D

  reach += R*solve(c, 1,   0,   0);     // Score Cells K
  reach += R*solve(c, 1, w-1,   0);     // Score Cells H
  reach += R*solve(c, 1,   0, h-1);     // Score Cells B
  reach += R*solve(c, 1, w-1, h-1);     // Score Cells E

  reach += (R-1)*solve(c+w,0,   0,   0);// Score Cells L
  reach += (R-1)*solve(c+w,0, w-1,   0);// Score Cells I
  reach += (R-1)*solve(c+w,0,   0, h-1);// Score Cells C
  reach += (R-1)*solve(c+w,0, w-1, h-1);// Score Cells F

  printf("Reach: %llu\n", reach);
}

Consult the section diagram above for the visual meaning of the groups A-L.

Coming up with this solution took a few hours, since getting the parity calculation just right was really error-prone.

Runtime on Commodore 64: 1 minute 49.0 seconds.

Runtime on AMD Ryzen 5950X: 1.0824 milliseconds. (100,702.14x faster than the C64)

Time to solve: 1 hour 39 minutes on video + ~5 hours offline.

Mistakes made:

Solving Part Two By Counting Rocks

I already described this method in some detail, but here is more detailed pseudocode for how it would work.

Observe the yellow rock-free highways in the problem input.

The problem has been crafted so that the outer boundaries of the Manhattan diamond reside on this yellow highway, so there is no confusion as to whether the squares close to the diamond border are "in" or "out".

To solve Part Two by counting the rocks in the Garden:

  1. Perform a flood fill to find all reachable squares in the garden. Mark all unreachable squares with rocks.
  2. Subdivide the 2D garden map into five sections (one "core" plus four corners) according to the yellow highways presented earlier. Count the number of rocks in each of these sections of the garden.
  3. Calculate the total number of copies of the garden of each of the types A-L and GREEN and GRAY, shown in the above diagram.
  4. Observe that the different sections pair up to produce new full copies of gardens. For example, sections C+H make up one full copy of the garden. Likewise, the F+K, I+B and L+E section pairs each make up a full copy of the garden. (But there is one extra leftover copy each of B, E, H and K.)
  5. The four sections A, D, G and J each accumulate two full copies of the garden, plus two "inner cores". One of these inner cores pairs up with the remaining B,E,H and K.
  6. Calculate the total number of full copies of the garden and their parities, and then calculate the total number of "checkered" surface area in the whole garden diamond, and subtract from it the total number of rocks with the appropriate parity. That will be the final result.

This method would probably be easier than my flood fill mechanism, but it still requires a bit of care to come up with the surface area formulas for the even- and odd-parity cells inside a Manhattan diamond. (The formulas are R^2 and R^2 + 2R + 1, for a suitable meaning of R.) See the code above, and the scribbled notes, for a derivation.

Day 22

Today was a fun puzzle, maybe the best one in this year's AoC so far.

It was creative and unique, and Part Two was something other than just scaling up the problem size. Nice!


Jenga, Wikipedia

Before being able to return from Island Island, we need to work on some falling bricks first. These bricks are of different lengths (1-10 units long), and fall in different orientations in an integer grid. It is like Jenga, but with varying brick lengths and rotations.

The initial part of the puzzle was to sort and drop the bricks from the sky. This was a bit of a distraction from the actual problem, just "preprocessing" before getting to the real problem at hand.

In Part One, the task was to identify all the bricks in the tower that could be removed without any of the tower toppling. In this instance, the task was not to simulate actual physics: a brick was considered stable if it touched any brick under it.

Thinking about this a bit, the problem is solvable as a streaming online algorithm using a constant amount of memory (for the playfield), if the input is already in sorted order.

Drop each individual brick one at a time, and then calculate whether that brick has a unique brick underneath supporting it. Mark all those unique support bricks, and then finally subtract the count of unique supporting bricks from the total number of bricks to solve the problem.

I was able to solve this part live on video, on the first try, on the C64, taking about 55 minutes, with just a single small programming mistake that was fortunately quickly identified.

Part Two

The second part was more difficult. The task was to calculate, for each brick, the maximum number of bricks that would topple if that brick was removed.

This is a problem that definitely doesn't have an online streaming solution anymore.

I switched my thinking into graph mode, drawing a graph diagram of the example input, and a couple of variations of it.

With some pondering, I discovered that I could solve the problem by doing graph searches starting at each brick vertex in turn, searching the graph of supporting bricks down towards the ground node, until hitting a support frontier at some z-value that consisted of only a single unique brick.

That unique brick would then definitely cause the search's source brick to topple if removed.

The drawback of that algorithm, I thought, was that it could exhibit quadratic behavior. After all, it performs a graph search (over N vertices) starting at each vertex in turn (so possibly N times).

I really wanted to find out whether the problem could be solved with a single-pass search that would traverse the support graph only once. But I couldn't come up with one.

So I decided to take a pause, and concluded the video here.

After having eaten, I took the problem to the PC and, in the absence of better ideas, coded up that naive, quadratic-feeling algorithm, only to observe that it actually exhibits linear-time behavior and ran in mere milliseconds. The code is presented below.

Day 22 Part 2 Code Listing

#include 
#include "vec.h"

struct brick {                          // Represents a single dropped block.
  u8 x[2], y[2];                        // x,y in range [0-9]
  u16 z[2];                             // z is in range [0-400] or similar.
  u24 n_sup;                            // # of bricks this brick supports.
  u8 szb;                               // # of filled elements to 'below'
  brick *below[3];                      // Points to bricks that support this
                                        // brick. A small memory saving cheat:
  bool operator <(brick &r) const {     // in my input, max # of supports for
    return z[0] < r.z[0];               // any brick was three, so static cap=3.
  }
} bricks[1500];

static brick* map[256];                 // 10x10 playfield where bricks drop.
vec<brick*> sup;                        // Temp memory for storing support set.

void drop(brick &b) {                   // Drop the given brick to the field.
  u16 z = 0;
  for(int y = b.y[0]; y <= b.y[1];++y)
    for(int x=b.x[0]; x <= b.x[1];++x)  // Find the z coordinate the brick falls
      z = MAX(z, map[(y<<4)|x]->z[1]);  // to.

  b.z[1] = z+1+b.z[1]-b.z[0];           // Adjust the brick's z coordinate to
  b.z[0] = z+1;                         // the fall position.

  for(int y = b.y[0]; y <= b.y[1];++y)
    for(int x=b.x[0]; x <= b.x[1];++x) {// Update the playing field to contain
      brick *s = map[(y<<4)|x];         // this newly dropped brick, and also
      map[(y<<4)|x] = &b;               // update the vector of all bricks that
      if (s->z[1] != z) continue;       // are under this brick.
      if (!b.szb||b.below[b.szb-1]!=s)
        b.below[b.szb++] = s;           // Brick s is under this brick.
    }
}

void load(brick &b) {                   // Propagates this brick's accumulation
  sup.clear();                          // of support to the critical underlying
  for(i8 i = 0; i < b.szb; ++i)         // brick that supports it.
    sup.push(b.below[i]);
  u16 z = b.z[0]-1;

  while(z > 0) {                        // Walk supports downwards in Z order.
    u16 nextz = 0;
    if (sup.size() == 1) {              // If we found a single supporting
      sup[0]->n_sup += b.n_sup+1;       // brick, then that is bearing the
      return;                           // weight of the current brick, and all
    }                                   // bricks the current brick beared.
    for(int i = 0; i < sup.size(); ++i){// Keep iteratively walking down the
      brick &s = *sup[i];               // bricks that support the supports of
      if (s.z[0] == z) {                // the current brick, and so on..
        for(i8 j = 0; j < s.szb; ++j)
          if (sup.index(s.below[j]) < 0)// Add the support to the current
              sup.push(s.below[j]);     // support set, if it didn't exist yet.
        sup[i] = sup.back();
        sup.pop();
        --i;
      } else if (s.z[0] < z)            // Find which z order we should expand
        nextz = MAX(nextz, s.z[0]);     // next, so that we process supports in
    }                                   // the proper order.
    z = nextz;
  }
}

int main() {
  BENCH();
  FILE *f = fopen("in22.txt", "r");
  int x[2],y[2],z[2], n = 1;
  char dummy;
  while(fscanf(f,"%d,%d,%d%c%d,%d,%d",  // Parse input bricks
    x,y,z,&dummy,x+1,y+1,z+1)>0) {
    bricks[n++] = {
      (u8)x[0], (u8)x[1], (u8)y[0],
      (u8)y[1],(u16)z[0],(u16)z[1] };
  }
  fclose(f);

  for(brick *&b : map) b = &bricks[0];  // Init map to point to ground brick.
  qsort(bricks, n);                     // Sort bricks to drop order.
  for(int i=1; i<n; ++i)drop(bricks[i]);// Drop each brick.
  for(int i=n-1;i>0;--i)load(bricks[i]);// Calculate load of bricks to their
                                        // lower supporting bricks.
  u32 total = 0;
  for(brick &b:bricks) total += b.n_sup;// Sum up the result
  printf("Total supports: %lu\n",total);// and print.
}

What I think makes this solution run in linear-ish fashion is that it transitively propagates the support numbers between subsequent bricks.

That is, the test "if (sup.size() == 1) { ... return; }" is something I considered to be an opportunistic early-out that might catch only in a couple of scenarios.

But it seems that this check is what makes the graph search run in linear-ish fashion.

Understanding The Algorithm

To figure out how this search code works, let's look at the following input.

The code first initializes the support score of all bricks to zero, and then proceeds from the topmost brick down to the bottom brick in order, performing a graph search from every brick in turn.

So it would consider brick F first, and walk down its supports, and see that brick E is the unique brick that supports F. So the number of supports of E is incremented by one.

The graph search that started at F is then terminated here.

Then, the code does the next graph search starting at E. Walking downwards, it immediately sees that brick D is the unique brick supporting E. This time, the supports of D are incremented by two: one to account for E, and then +1 to account for everything that the search source node E in turn supports (which we've discovered to be brick F). Graph search at E finishes here.

Then the graph search starts at D. It looks at its supports, and sees B and C, so there isn't a unique supporting brick.

Then this same search (from D) proceeds forward, walking down the graph to find the supports of the supports C and B, and finds that they are both supported just by a single unique brick A. So A gets +1 for D, and then also it gets the support score from everything that D supported, so +2 more in this case. The graph search that started at D terminates here.

Note that at this point of the search, brick A has accumulated a support score of three. During that search, A does not accumulate supports from bricks B or C, because that could lead to overcounting: there might exist a sibling next to brick D that also rests on top of B and C, which would count B and C towards A multiple times.

The graph search then starts at C, and sees that it has the unique support A. The same occurs for B. So A gets an extra +1 from each of B and C, reaching its final support count of 5 bricks.

In total, A has now correctly accumulated the total number of bricks that it supports, yet the graph search never needed to individually walk from F to A, from E to A, from D to A, and so on, which would have been quadratic. The propagation of the support scores and the "early-out" check took care of that.
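To make the walkthrough above concrete, here is a hedged toy sketch of the same propagation idea (not the actual C64 listing above; it expands one highest support at a time rather than whole z-levels, which behaves the same on a graph this small):

```cpp
#include <vector>
#include <set>

struct Brick { int z; std::vector<int> below; int n_sup = 0; };

// Walk the support sets of brick b downward; as soon as a single
// supporting brick remains, credit it with b plus everything b bears.
void load(std::vector<Brick> &bricks, int b) {
  std::set<int> sup(bricks[b].below.begin(), bricks[b].below.end());
  while (!sup.empty()) {
    if (sup.size() == 1) {            // unique support: early-out
      bricks[*sup.begin()].n_sup += bricks[b].n_sup + 1;
      return;
    }
    int hi = *sup.begin();            // otherwise expand the highest
    for (int s : sup)                 // remaining support into its own
      if (bricks[s].z > bricks[hi].z) hi = s;   // supports
    sup.erase(hi);
    sup.insert(bricks[hi].below.begin(), bricks[hi].below.end());
  }
}

// The example graph: A on the ground, B and C on A, D on both B and C,
// then E on D and F on E. Returns A's final support count.
int example() {
  std::vector<Brick> b = {
    {1, {}}, {2, {0}}, {2, {0}},      // A=0, B=1, C=2
    {3, {1, 2}}, {4, {3}}, {5, {4}}   // D=3, E=4, F=5
  };
  for (int i = 5; i >= 0; --i) load(b, i);  // topmost brick first
  return b[0].n_sup;                        // A supports B,C,D,E,F = 5
}
```

Replaying the searches from F down to A, this leaves A with the expected support count of 5.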

In the above code, the function load performs the graph search starting at a given node.

The runtime was really nice, even though, to my understanding, the algorithm does not have a guaranteed linear-time property.

Runtime on Commodore 64: 1 minute 41.4 seconds.

Runtime on AMD Ryzen 5950X: 1.162 milliseconds. (87,263.34x faster than the C64)

I haven't yet had a chance to examine if this would be the "best" algorithm. After all, it can exhibit some quadratic-like behavior on adversarial inputs. But here it looks like the inputs were quite random, so the search was practically linear. Completing in less than two minutes on the C64, it does at least feel really tight.

Time to solve: 2h 13 minutes on video + 25 minutes offline.

Btw, it looks like this year I have been extra nice, and got my Christmas present a couple days early!

A ZX Spectrum Next!

At the beginning of the video stream, I take a brief look at this present that I got.

Mistakes made:

Today is the last day of AoC that I will be able to do live online. I am traveling for the holidays, so will not be able to do the remaining three puzzles as they appear.

I will update this page though for the last puzzles once I get to doing them after the 25th. Have a happy holiday, and check back to see if the Commodore 64 got all the way through all the puzzles, or if the last couple of ones caused trouble at the very last stretches!

Day 23

On the day before Christmas Eve I traveled to spend the holiday, so I am actually solving this puzzle only on the 27th.

Reading the puzzle input, it is nice to see an unembellished computer science algorithms problem. In the puzzle for the 23rd, the challenge is to find the longest path in a 2D map.

While finding the shortest path in a graph can be done in polynomial time, finding the longest path in an arbitrary graph is an NP-hard problem.

However, fortunately, in Part One the graph is a DAG, made so by the one-directionally passable "icy slopes" in the input map. This still allows solving the problem in polynomial time.

For Part One I wrote a memoized recursive walker (memoizing the longest path starting at each intersection, to avoid recomputing it many times). That gave the first star.


Reddit u/encse

In Part Two of the problem, the puzzle was changed so that the one-directional slopes were removed, and the ask was to again calculate the longest path.

Given that there are now cycles in the graph, the best currently known general solution will unfortunately take exponential time.

Still, the code does not look much different from before. A Depth First Search can be used, but one has to be careful in applying it: applied directly, the DFS code would of course run into an infinite loop.

To avoid this, the search must ensure that the same 2D map position is not walked twice.

Instead of maintaining a history of each visited cell e.g. in a hash map and comparing each neighbor coordinate against this visited map, a faster way is to "block off" each intersection when recursing in the search, and to open it back up when returning from the recursion.

This way the DFS does not need to track any variable-length auxiliary state besides the original map, avoiding memory allocation altogether.
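A minimal sketch of that block-off/reopen idea, in the spirit of the recursive version described here (hypothetical names, not the actual code):

```cpp
#include <vector>

struct Edge { int dst, len; };
std::vector<std::vector<Edge>> g;   // adjacency lists of the intersections
std::vector<bool> visited;

// Longest path from node n to node goal; returns -1 if the goal is
// unreachable without revisiting a node. Each node is blocked off on the
// way down and reopened when backtracking, so no per-path history or
// allocation is needed.
int longest(int n, int goal) {
  if (n == goal) return 0;
  visited[n] = true;                // block off this intersection
  int best = -1;
  for (const Edge &e : g[n])
    if (!visited[e.dst]) {
      int sub = longest(e.dst, goal);
      if (sub >= 0 && sub + e.len > best) best = sub + e.len;
    }
  visited[n] = false;               // reopen it when backtracking
  return best;
}

// Tiny example: a triangle 0-1-2; the longest 0->2 route detours via 1.
int example() {
  g = {{{1,2},{2,4}}, {{0,2},{2,3}}, {{0,4},{1,3}}};
  visited.assign(3, false);
  return longest(0, 2);             // 2 + 3 = 5, beating the direct 4
}
```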

Unfortunately developing the code to solve Part Two on the Commodore 64 proved to be infeasible. The code that worked so well for Part One was taking too long to run even on the example input.

So I took to the PC to resume development, to see if I still had an infinite loop or some other bug, or if my code was just slow.

It turned out that the code was just a bit slow, and I saw the second star from the PC.

To optimize the search, I added a preprocessing step that converts the 2D map into an abstract weighted graph, where each map intersection is represented by one node, and each edge weight records how many cells it takes to walk between those intersections.

Even with this speedup, the code took about 90 milliseconds to run on the PC. Yikes, that will be a long runtime for the C64!

When porting the code to the C64, I realized that it was crashing after running out of call stack in the recursion.

So I turned the code from recursion into iteration on a custom stack. This was a bit more tricky than usual, given the need to block off visited intersections.

Typically when switching from recursion to iteration, the resulting code tends to run faster on the PC, roughly because loop constructs are cheaper than function calls.

However in this case, I was not able to make the iterative version work as fast as my recursive version would. Not quite sure why.

Nevertheless, the iterative version, which would be safe for C64 execution, ran on the PC in about 200 milliseconds.

I loaded it up on the Commodore, left it running, and went for a walk. The code looks as follows:

Day 23 Part 2 Code Listing

#include 
#define S 144                           // 2D map stride: 128+16
char map[S*144];                        // Input 2D map
u16 maxlen;                             // Receives length of longest path.
struct node;
struct edge {                           // Path between two map intersections.
  node *dst;                            // Destination intersection
  u16 len;                              // Distance to that intersection
};
struct node {                           // An intersection
  u8 n;                                 // # of paths out from this intersection
  edge next[4];                         // The paths themselves
  bool visited;                         // Tracks if this node has been visited.
  u16 pos;                              // 2D map pos of this intersection.
} *G, *ht[256];                         // G: goal node, ht: hash table for
                                        // graph nodes.
node *get(u16 pos) {                    // Gets or creates a graph node for
  u8 h = pos ^ (pos>>8);                // given map pos.
  while(ht[h] && ht[h]->pos != pos) ++h;
  if (!ht[h]) {
    ht[h]=(node*)calloc(1,sizeof(node));
    ht[h]->pos = pos;
  }
  return ht[h];
}

bool connect(u16 a, u16 b, u16 len) {   // Connects two map intersections with
  node *A = get(a), *B = get(b);        // an edge node
  for(i8 i = 0; i < A->n; ++i)
    if (A->next[i].dst==B) return false;// Skip out if already connected
  A->next[A->n++] = { B, len };         // Undirected graph, A connects to B and
  B->next[B->n++] = { A, len };         // B connects to A.
  return true;                          // Tell caller we did connect the nodes.
}

struct {
  u16 pos, parent;
} gstack[256], *gsp = gstack;           // Stack for converting 2D map to graph.

void graph(u16 goal) {
  while(gsp-- > gstack) {
    u16 start=gsp->parent, next[4],
          pos=gsp->pos, parent=start;
    for(u16 len = 1;; ++len) {
      if (pos == goal) {                // Connect current pos to goal pos if
        connect(start, pos, len);       // we got there.
        break;
      }
      u8 n = 0;
#define VISIT(p) if((p)!=parent&&map[p]\
                != '#') next[n++] = (p);
      VISIT(pos-1);                     // Walk to all neighbors of this pos.
      VISIT(pos+1);
      VISIT(pos-S);
      VISIT(pos+S);
      if (n != 1) {                     // If there are more than one neighbor,
        if (n && connect(start,pos,len))// this is a map intersection, so
         while(n--)*gsp++={next[n],pos};// connect the parent position to this
        break;                          // and add all outbound directions to 
      }                                 // the traversal stack.
      parent = pos;                     // Keep track of parent pos so we don't
      pos = next[0];                    // walk back that way.
    }
  }
}

struct
{
  node *n;
  edge *e;
  u16 len;
} stack[256], *sp = stack;              // Walk stack for the map graph.

void path(node *n) {                    // Find the longest path starting at n.
  n->visited = true;                    // Mark n visited.
  for(i8 i = 0; i < n->n; ++i)          // And queue all its edges.
    *sp++ = { n, &n->next[i], 0 };

  while(sp > stack) {                   // While we have edges to walk.
    --sp;
    edge *e = sp->e;
    u16 len = sp->len;
    if (!e) {                           // In walk stack, null edges denote it
      sp->n->visited = false;           // is time to mark back to unvisited.
    } else if (!e->dst->visited) {      // Enter destination if not visited.
      if (e->dst == G) {                // Is destination the goal node?
        maxlen = MAX(maxlen,            // Accumulate path length if so.
          (u16)(len+e->len));
      } else {
        *sp++ = { e->dst, 0, 0 };       // Record a marker on stack that will
        e->dst->visited = true;         // unvisit the destination node later
        for(i8 i=0; i < e->dst->n; ++i) // when popping entries.
          *sp++ = { e->dst,             // And add all outbound edges of this
                   &e->dst->next[i],    // node to the stack.
                   (u16)(len+e->len) };
      }
    }
  }
}

int main() {
  BENCH();
  FILE *f = fopen("in23.txt", "r");
  u8 h = 0;                             // Load input map, and calculate map
  for(char *s=map;fgets(s,S,f);s+=S)++h;// height.
  fclose(f);
  u16 goal =strstr(map+S*(h-1),".")-map;// Find goal map position on last line.
  G = get(goal);                        // Create graph node for goal.
  map[1] = '#';                         // Seal off entrance to avoid oo-bounds.
  *gsp++ = { S+1, 1 };                  // Add start pos to graph stack, and
  graph(goal);                          // convert the 2D map to a graph.
  path(get(1));                         // Then walk the graph to calculate the
  printf("Length: %u\n", (int)maxlen);  // longest path length.
}

After some two hours, it was still at it. I had not added any intermediate status logging, so I was wondering if it might have hung, or if it was still computing.

Read a book, took a sauna, played some piano, still nothing.

A bit later, exactly 4 hours 16 minutes and 6.7 seconds later, in the middle of dinner, all of a sudden the C64 logged out the correct answer to my problem input. Well, that is a job done, even if a bit on the slow side!

Optimizing The C64 Runtime

Getting the result out on the C64 was already a success, but I still felt a bit unsatisfied by the result, because of the several hours long runtime.

How would you speed up the runtime of an NP-hard problem?

Well, with admissible heuristics of course!

My idea was to see if I could come up with some heuristics to cut off the search space, so that the runtime on the C64 would be a bit faster.

Initially I measured the inner loop of the search to visit 131,606,153 nodes during the search. No wonder the C64 took its time to calculate the result.

If we take a look at the input graph, it is pretty close to the worst case that a Euclidean 2D map graph could look like: loops everywhere!

In the above graph, each node represents a map intersection with (x,y) coordinates, and the number next to an edge denotes the distance between the two intersections.

Heuristic #1

The first heuristic I came up with was the idea of generating "level sets" to identify dead ends during the walk.

The idea with level sets is to group nodes according to their distance from the goal node, like so:

Then, at all times during the search, I keep track of how many unvisited nodes I still have available to enter in each level set category.

The heuristic goes: if I walk into a node that is the last available node from a level set (so the path has already visited all other nodes with the same level set d-value), then I must take the next step into a node with smaller level set value (i.e. towards the exit).

This is because if the search did the opposite, and entered a node with a larger level set value, it would have painted itself into a corner, as there would no longer be any nodes left to traverse to get back to the goal. (To reach the goal, the walk would have to go through a node with the level set value that we just exhausted.)

Tracking this heuristic is easy: first do a BFS search through the graph starting from the goal node, to establish the distances of each node to the goal.

Then during the actual path search, keep track of the number of remaining free nodes in each level set by incrementing and decrementing counters in an array indexed by the level set d-value.

So the availability of level sets can be tracked with only a constant number of operations (just like the visited/unvisited boolean).

Implementing this heuristic reduced the search space from 131.6M nodes down to 59.3M nodes. (-54.9%) Not bad, eh?

Heuristic #2

Even after this, the search space is pretty large for the C64 to handle.

I wanted to find a heuristic that would somehow exploit the fact that during the search, we will generally keep finding candidate paths that have some length from start to the end.

It would be nice to have some mechanism for aborting a path early if it turns out that it cannot possibly lead to a better result than a path we have already found.

This type of early-out tends to be effective in many problems.

However, it is a bit difficult to know how much a partial path can still improve at any given time during the search.

I could do a flood fill at each step of the way to find the maximum potential still remaining, but that seems slow, since a flood fill is a global operation that would have to look at the whole graph.

(Although a flood fill would then naturally find dead-ends as well, if the path curls up on itself with no means to exit anymore)

My idea instead was to observe that since each intersection can only be visited once, we can assign each intersection a "best potential score": the maximum length of any edge that connects to that intersection.

That is, the potential score is the longest distance we could possibly travel by going through that node.

Note that here I consider only a single best edge around a node, and not two (entering and exiting) edges, because a full path is a trail of the form node->edge->node->edge->...->edge->node, i.e. there is one edge per node (not counting the start node, which accounts for no distance).

Then, the maximum potential of the graph is just the sum of the best potential scores of all nodes.

While searching, whenever visiting a node, I subtract its best potential score from the remaining maximum potential of the graph. The remaining maximum potential gives a conservative bound on the maximum distance that could still be traversed.

Then at any given time during the search, if the current path length plus remaining potential is less than the longest full path found so far, we can stop considering that partial path any further.

The bookkeeping for these potential scores can also be done in constant time, which is great for these types of heuristics.

Adding this heuristic on top of the previous heuristic cut the search space from 59.3M nodes down to just 20.8M nodes, a further -64.9% reduction!

Overall, these two heuristics together took the runtime on PC down from 252.3 milliseconds to just 66.48 milliseconds. (-73.7%)

The updated code with these two heuristics looks as follows:

Day 23 Part 2 With Heuristics Code Listing

#include 
#define S 144                           // 2D map stride: 128+16
char map[S*144];                        // Input 2D map
u16 maxlen;                             // Receives length of longest path.
struct node;
struct edge {                           // Path between two map intersections.
  node *dst;                            // Destination intersection
  u16 len;                              // Distance to that intersection
};
struct node {                           // An intersection
  u8 level;
  u16 best_edge;
  u8 n;                                 // # of paths out from this intersection
  edge next[4];                         // The paths themselves
  bool visited;                         // Tracks if this node has been visited.
  u16 pos;                              // 2D map pos of this intersection.
} *G, *ht[256];                         // G: goal node, ht: hash table for
                                        // graph nodes.
node *get(u16 pos) {                    // Gets or creates a graph node for
  u8 h = pos ^ (pos>>8);                // given map pos.
  while(ht[h] && ht[h]->pos != pos) ++h;
  if (!ht[h]) {
    ht[h]=(node*)calloc(1,sizeof(node));
    ht[h]->pos = pos;
  }
  return ht[h];
}

bool connect(u16 a, u16 b, u16 len) {   // Connects two map intersections with
  node *A = get(a), *B = get(b);        // an edge node
  for(i8 i = 0; i < A->n; ++i)
    if (A->next[i].dst==B) return false;// Skip out if already connected
  A->next[A->n++] = { B, len };         // Undirected graph, A connects to B and
  B->next[B->n++] = { A, len };         // B connects to A.
  return true;                          // Tell caller we did connect the nodes.
}

struct {
  u16 pos, parent;
} gstack[256], *gsp = gstack;           // Stack for converting 2D map to graph.

void graph(u16 goal) {
  while(gsp-- > gstack) {
    u16 start=gsp->parent, next[4],
          pos=gsp->pos, parent=start;
    for(u16 len = 1;; ++len) {
      if (pos == goal) {                // Connect current pos to goal pos if
        connect(start, pos, len);       // we got there.
        break;
      }
      u8 n = 0;
#define VISIT(p) if((p)!=parent&&map[p]\
                != '#') next[n++] = (p);
      VISIT(pos-1);                     // Walk to all neighbors of this pos.
      VISIT(pos+1);
      VISIT(pos-S);
      VISIT(pos+S);
      if (n != 1) {                     // If there are more than one neighbor,
        if (n && connect(start,pos,len))// this is a map intersection, so
         while(n--)*gsp++={next[n],pos};// connect the parent position to this
        break;                          // and add all outbound directions to 
      }                                 // the traversal stack.
      parent = pos;                     // Keep track of parent pos so we don't
      pos = next[0];                    // walk back that way.
    }
  }
}

struct {
  node *n;
  edge *e;
  u16 len;
} stack[256], *sp = stack;              // Walk stack for the map graph.

u8 free_levels[32] = {};
u16 potential = 0;
void path(node *n) {                    // Find the longest path starting at n.
  n->visited = true;                    // Mark n visited.
  potential -= n->best_edge;
  --free_levels[n->level];
  for(i8 i = 0; i < n->n; ++i)          // And queue all its edges.
    *sp++ = { n, &n->next[i], 0 };

  while(sp > stack) {                   // While we have edges to walk.
    --sp;
    edge *e = sp->e;
    node *src = sp->n;
    u16 len = sp->len;
    if (!e) {                           // In walk stack, null edges denote it
      sp->n->visited = false;           // is time to mark back to unvisited.
      potential += sp->n->best_edge;
      ++free_levels[sp->n->level];
    } else if (!e->dst->visited) {      // Enter destination if not visited.
      node *d = e->dst;
      if (d == G) {                     // Is destination the goal node?
        maxlen = MAX(maxlen,            // Accumulate path length if so.
          (u16)(len+e->len));           // Only one path to goal->skip others.
      } else {
        *sp++ = { d, 0, 0 };            // Record a marker on stack that will
        d->visited = true;              // unvisit the destination node later.
        potential -= d->best_edge;      // Reduce best potential remaining.
        --free_levels[d->level];        // And # of nodes in this level set.
        for(i8 i=0; i < d->n; ++i) {
          int from_level = d->level;
          edge &ed = d->next[i];
          if (free_levels[d->level]     // If we are about to exhaust level sets
            || ed.dst->level < d->level)// then only go towards the exit.
          if (len + e->len + ed.len -   // If there is enough potential
            ed.dst->best_edge +potential// remaining in the graph, add this edge
            > maxlen)
          *sp++ = { d, &ed,             // And add all outbound edges of this
                   (u16)(len+e->len) }; // node to the stack.
        }
      }
    }
  }
}

node *queue[256];
void levels(node *n) {                  // Generates level set values with a BFS
  u8 qb = 0, qe = 0;                    // search.
  queue[qe++] = n;
  for(int level = 1;qb != qe; ++level) {
    for(int e = qe; qb != e; ++qb) {
      n = queue[qb];
      n->level = level;
      ++free_levels[level];
      for(i8 i = 0; i < n->n; ++i) {
        if (!n->next[i].dst->level)
          queue[qe++] = n->next[i].dst;
      }
    }
  }
}

int main() {
  BENCH();
  FILE *f = fopen("in23.txt", "r");
  u8 h = 0;                             // Load input map, and calculate map
  for(char *s=map;fgets(s,S,f);s+=S)++h;// height.
  fclose(f);
  u16 goal =strstr(map+S*(h-1),".")-map;// Find goal map position on last line.
  G = get(goal);                        // Create graph node for goal.
  map[1] = '#';                         // Seal off entrance to avoid oo-bounds.
  *gsp++ = { S+1, 1 };                  // Add start pos to graph stack, and
  graph(goal);                          // convert the 2D map to a graph.
  levels(G);
  for(node *n : ht) if (n) {            // Calculate graph potential.
    for(int i = 0; i < n->n; ++i)
      n->best_edge= MAX(n->best_edge,
                        n->next[i].len);
    potential += n->best_edge;
  }
  path(get(1));                         // Then walk the graph to calculate the
  printf("Length: %u\n", (int)maxlen);  // longest path length.
}

With these two heuristics, surely the Commodore 64 would now be able to do much better than the 4 hours runtime!

Well, surprisingly, unfortunately not that much :(

Even though the code was very effective at shrinking the runtime on the PC, for some reason it only reduced the C64 runtime by about 25 minutes compared to the version without any heuristics.

My rough guess is that the inner loop of the search now does so many things that spilling all those 16-bit quantities out of the registers becomes a massive problem.

There might be room to optimize, but I decided to cut the experimentation here, since I still have two other puzzles in the queue to finish before completing AoC this year. (On the PC I expect I could still double the speed by reverting to the earlier recursive code.)

The final runtimes are then:

Runtime on AMD Ryzen 5950X: 66.48 milliseconds. (207,319.49x faster than the C64)

Runtime on Commodore 64: 3 hours 49 minutes 42.6 seconds.

Even though the C64's execution time is measured in hours today, that all still counts, the C64 totally did it! :)

Time to solve: ~3 hours. I did this one on a later day and did not stream it live, so this number is only an approximation.

Mistakes made:

Day 24

For the puzzle for Christmas Eve, we have a math problem. The snow machine is finally running, but it is outputting hail instead of snow.

The problem input is a list of parametric representations of 3D rays describing the flight paths of hailstones, and the task is to find all pairwise intersections of these paths inside a 3D box that is located far out in the number range of hundreds of trillions - at the high end, close to the largest integer that can still be safely represented consecutively in a 64-bit double.

In the distant past, I developed a 3D math library that included geometric objects for computing these sorts of queries. So parametric 3D ray algebra should not be too alien to me.

What complicated the implementation here was that I needed to avoid floating point numbers, since my compiler doesn't support them. For ray-ray intersection, this basically just means avoiding the division by the determinant, and performing the comparisons by multiplying the other side of each compared expression by the determinant.
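As a hedged sketch of that idea (not the actual listing; assumes a 2D projection and GCC/Clang's __int128 extension for the wide products), a division-free ray-ray intersection-in-box test might look like:

```cpp
#include <cstdint>

typedef __int128 i128;  // products of ~4e14 positions with velocities
                        // overflow 64 bits, so widen (GCC/Clang extension)

// Do the 2D paths of two hailstones cross at t >= 0 for both rays,
// inside the box [lo,hi] x [lo,hi]? Division-free: every comparison is
// multiplied through by the determinant instead of dividing by it.
bool cross_in_box(int64_t p1x, int64_t p1y, int64_t v1x, int64_t v1y,
                  int64_t p2x, int64_t p2y, int64_t v2x, int64_t v2y,
                  int64_t lo, int64_t hi) {
  int64_t det = v1x*v2y - v1y*v2x;          // cross(v1, v2)
  if (!det) return false;                   // parallel paths never cross
  int64_t dx = p2x - p1x, dy = p2y - p1y;
  int64_t tn = dx*v2y - dy*v2x;             // t = tn/det along ray 1
  int64_t sn = dx*v1y - dy*v1x;             // s = sn/det along ray 2
  if (det < 0) { det = -det; tn = -tn; sn = -sn; } // normalize det > 0 so
  if (tn < 0 || sn < 0) return false;       // the inequalities don't flip
  i128 x = (i128)p1x*det + (i128)tn*v1x;    // intersection point * det
  i128 y = (i128)p1y*det + (i128)tn*v1y;
  return (i128)lo*det <= x && x <= (i128)hi*det &&
         (i128)lo*det <= y && y <= (i128)hi*det;
}
```

Because every comparison has been multiplied through by the determinant, the determinant is first normalized to be positive, so that none of the inequalities flip direction.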

I spent a long time solving the puzzle on my C64, and submitted a wrong answer twice: first I had a bug in handling parallel rays, and second, I realized that my libc fscanf() implementation did not properly parse signed 64-bit integers (the %lld specifier).

So I had to take Part One over to the PC, and on the third try, got Part One correct.

After solving Part One, I realized the puzzle author could have made this puzzle fiendishly difficult by adding multiple parallel hailstones to the input set, of which some would reside on the same line going in the same direction, and others in opposite directions.

This would have resulted in requiring the developer to also write a ray-box intersection routine to detect collisions against the test box.

I was half-way towards implementing that corner case, but then thought to actually check the input data, and found that even though there were some parallel ray pairs, the ray-box scenario would never occur.

Linear Algebra Headache

In Part Two, this "intersection test volume" from Part One was forgotten, and the task was instead to come up with a starting position and velocity for a rock that would be thrown to collide with all the hailstones head-on.

In other words, devise the parameters for a new ray, which would meet with each hailstone (and not just their trajectory paths) at some point.

With 300 hailstone rays in the input, this is an extremely overconstrained problem! So the first observation here is that the input must have been carefully crafted so that all the hailstones really do pass through a single line, at precisely the right times to collide with a rock.

I started writing down the linear vector equations for this condition to occur, and came to the conclusion that with just six hailstones, I would have 18 linear equations with 18 unknown variables, that would solve the starting position and velocity of the rock.

Except.

Looking at my derivations, I understood that my "linear" equations weren't actually linear, but contained mixed products of unknowns: each hailstone's unknown flight time multiplied by the unknown velocity of the rock.

My linear math education hasn't taught me how to solve this type of equation without numerical estimation, so I was blocked. :(

So I started to look for other ways to approach the problem.

Acquiring Velocity Limits In 1D

Given that the problem was so overconstrained, I started to think about the problem in 1D. I scribbled down a 2D graph with X axis being the X position of the hailstones, and the Y axis representing time, to get a feel of how the X positions of the hailstones would evolve as time advances.

From those scribbles I realized that if the input has two hailstones that fly out in opposite directions, so that their flight paths projected to 1D never intersect, then that constrains both the starting position and the velocity of the rock.

Here is a visual drawing of such a situation:

X-axis is the X position of the hailstones, and time advances upwards.

If two hailstones start at positions p1 and p2, p1 < p2, and fly out in opposite directions (v1 < 0 and v2 > 0), then this constrains the rock:

  1. Either the rock must be thrown from the left of p1, with a velocity that is greater than v2 so that it can eventually also intercept the second escaping hailstone, or
  2. symmetrically, the rock could also be thrown from the right of p2, but with a velocity smaller than v1 so that it will eventually catch up with the first hailstone.

But the rock cannot start out between p1 and p2, because it couldn't go in two opposite directions to intercept both of these hailstones.

So processing all hailstone pairs, projected into 1D along X, Y and Z independently, gave these bounds for the velocity and position components of the rock.
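The constraint from one such 1D pair can be encoded as a small predicate (a hypothetical sketch, assuming p1 < p2 and v1 < 0 < v2):

```cpp
// Given two 1D hailstones at p1 and p2 (p1 < p2) flying apart
// (v1 < 0 < v2), can a rock starting at pr with velocity vr intercept
// both? It must start outside [p1, p2] and be fast enough to catch the
// hailstone escaping away from it.
bool feasible(long p1, long v1, long p2, long v2, long pr, long vr) {
  return (pr < p1 && vr > v2)   // thrown from the left, overtaking the
      || (pr > p2 && vr < v1);  // right-flying stone; or symmetrically
}                               // from the right.
```

Intersecting these constraints over all such pairs, independently in X, Y and Z, yields the bounds on the rock's components.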

I felt quite overjoyed after realizing this, but then observed that the position limits weren't of much use at all. The velocity limits would however provide a useful optimization for the C64 (although not critical for a PC).

Deducing Rock Velocity

After a bit more pondering, I realized that I could change the frame of reference from ground to the rock.

That is, if I subtract the velocity of the rock from all the velocities of the hailstones, it is as if the rock was a stationary point, and I get a new modified set of hailstones that should all travel through this single point.

The velocity to subtract from each hailstone is unknown, but conceptually this gives an easier way to deal with the problem. (This might have helped me fix my linear equations with the mixed products from the previous chapter; I just didn't realize it at this point.)

Then, looking some more at the input, I observed that there were several hailstones that have the same 1D velocity when projected to X, Y and Z directions.

This is a fairly unique scenario, so I drew up a diagram to visualize what that means.

Examining the above equations, I realized that if I assume that the rock always hits each hailstone precisely at integer (i.e. not fractional) time values, I would have a divisibility condition (v1 - vr) | (p2 - p1), i.e. v1 - vr must evenly divide p2 - p1.

This is a safe assumption to make in the sense that after I get my final solution under such assumptions, I can run the collisions of the rock ray against each hailstone to double check that the path of the rock was good and indeed collides with each hailstone.

If not, I would then have to revise my assumptions and try out something else. Although it turns out in this case the integer collision times did hold.

So, what this means in practice is that I could look at each pair of hailstones and check whether their velocities share the same X, Y or Z component. If they do, I could factor the distance between the starting positions of those hailstones into its divisors, come up with the possible values that vi - vr could take, and from there solve for the possible values of vr.

Then, because the problem is so overconstrained and there are multiple hailstones with identical X, Y and Z velocity components, I could repeat the same process for all such similar hailstone pairs, and each pair would give its own candidate set of possible values that vr could take.

Running this scheme separately for the X, Y and Z components of the velocities, and pairing the search with the limit bounds from the previous section, would *very* quickly yield the only possible values for the rock velocity. I measured that in total the loops in the code that performed this search ticked only 1308 times, or 436 times per X/Y/Z component, or ~1.45 times per velocity component of a single hailstone.

The C64 would have absolutely no problem churning through this search.

Had the rock not collided with the hailstones at integer times, this process would have yielded an empty set, giving a clue that the integer collision time assumption had failed.

But fortunately it didn't come to that, so relying on the integer quantization of the hailstone collision times gave a way to sidestep the mixed quadratic products in my otherwise linear set of equations that had vexed my math skills.

Solving The Rock Starting Position

Now that I had the starting velocity of the rock (precisely, X=-213, Y=282, Z=52 for my rock), I still had to find out what the starting position would have to be.

With the same frame of reference method, I could subtract the rock velocity from the velocities of two different noncollinear hailstones, and those two modified hailstones would then both have to hit the rock starting position at some point.

Since two noncollinear rays can only intersect at one point, that point must be the starting point of the rock.

So the same code from Part One could be used here to solve for the intersection point, and that would be the starting point of the rock. And that's a job done!

The final code took 74 lines:

Day 24 Part 2 Code Listing

#include 
i8 sort_idx;                            // Which of x,y,z we are sorting to.
i16 n;                                  // Total number of hailstones.

struct hail {
  i64 x, y, z;                          // 3D Position
  i16 vx, vy, vz;                       // 3D Velocity
  i64 p(i8 i) {return i<1?x:i<2?y:z; }  // Return i'th position component
  i16 v(i8 i) {return i<1?vx:i<2?vy:vz;}// i'th velocity component
  bool operator <(hail &r) {            // Sort comparator
    return v(sort_idx) < r.v(sort_idx);}
} h[301];

void intersect(hail &d,hail &a,hail &b){// Computes intersection point of lines
  i64 dx = b.x - a.x;                   // a and b and writes it to d.
  i64 dy = b.y - a.y;                   // Calculate dx and dy
  i32 det = (i32)b.vx*a.vy
          - (i32)a.vx*b.vy;             // 2D determinant for matrix inverse.
  i64 t = (b.vx*dy - b.vy*dx)/det;      // Assume integer collision time.
  d.x = a.x + t*a.vx;                   // Output collision position.
  d.y = a.y + t*a.vy;
  d.z = a.z + t*a.vz;
}

i16 search(i8 d) {                      // Searches rock velocity in 1D. d:which
  sort_idx = d;                         // component, X,Y or Z to search.
  qsort(h, n);                          // Sort according to that coordinate.
  i16 min = 0, max = 0;                 // Micro-optimization: look at the data
  for(i16 i=0; h[i].v(d)<min; ++i)      // to find intervals [min,max] that the
    for(i16 j=n-1; h[j].v(d)>max; --j)  // velocity of the rock cannot have.
      if (h[i].p(d) < h[j].p(d)) {      // At the end this gives a condition
        max = MAX(max, h[j].v(d));      // vel <= min || vel >= max for the
        min = MIN(min, h[i].v(d));      // possible velocities, which cuts down
      }                                 // the size of the search loop below.

  struct { i16 v; i8 incr; } s[2] =     // Search possible velocities for the
   {{(i16)(min-1),-1},{(i16)(max+1),1}};// rock, starting at min/max limits and
                                        // spanning farther out.
  for(i16 i = 0; i < n-1; ++i)          // If we have two hailstones with same
    if (h[i].v(d) == h[i+1].v(d)) {     // velocity, then the distance between
      i64 dx = h[i].p(d) - h[i+1].p(d); // the two hailstones must be evenly
      while(s[0].v == h[i].v(d)         // divisible by (h.v - r.v) where h.v
        || dx % (h[i].v(d)-s[0].v)) {   // is the velocity of the hailstones,
        s[0].v += s[0].incr;            // and r.v is the velocity of the rock.
        SWAP(s[0], s[1]);               // Alternate between searching min and
      }                                 // max ranges.
    }
  return s[0].v;                        // Whatever velocity we stop the search
}                                       // at, must be the velocity of the rock.

int main() {
  BENCH();
  FILE *f = fopen("in24.txt", "r");
  char dummy;
  while(fscanf(f,                       // Read input.
    "%lld, %lld, %lld %c %d, %d, %d",
    &h[n].x ,&h[n].y ,&h[n].z, &dummy,
    &h[n].vx,&h[n].vy,&h[n].vz)>6) ++n;
  fclose(f);

  hail r = { 0,0,0,                     // Rock position.
    search(0), search(1), search(2) };  // Search rock velocity.

  hail a = h[0];                        // Shift the coordinate system of two
  a.vx -= r.vx;                         // hailstones so that their velocity is
  a.vy -= r.vy;                         // relative to the rock velocity. I.e.
  a.vz -= r.vz;                         // rock is stationary and hailstones
  hail b = h[1];                        // move. Then the intersection point of
  b.vx -= r.vx;                         // these adjusted hailstones must be the
  b.vy -= r.vy;                         // position where the rock is at.
  b.vz -= r.vz;
  intersect(r, a,b);                    // Find intersection point, and
  printf("Sum=%lld\n", r.x+r.y+r.z);    // output the result.
}

The runtime was nothing like Day 23's this time:

Runtime on Commodore 64: 1 minute 17.0 seconds.

Runtime on AMD Ryzen 5950X: 0.663 milliseconds. (116,138.763x faster than the C64)

Time to solve: ~3 hours 40 minutes. I did not clock this so precisely since I wasn't recording this on video.

Mistakes made:

Day 25

The last puzzle for Christmas day. ๐ŸŽ„

Last year taught me that the last day is going to be more or less a silly problem.

In Part One of the problem, the challenge was to take a graph, and cut exactly three edges in it to separate the graph into two separate parts.

This is a bit of a curious condition - if you take a random graph, you might find tons of ways to cut it into two parts: just take any node with three or fewer edges, and carve that one node out of the original graph. (wasting any leftover cuts on the edges of some other random node if necessary)

I.e. there were no rules or constraints to split the graph into two equal sized parts, or anything like that.

So, as with yesterday's problem, the proposed input graph must have some kind of special structure so that the question does not become ambiguous.

To understand this, I decided to start with Graphviz to visualize the graph. I was thinking that likely the graph would have to consist of sub-clusters of complete graphs, or something like that.

I reused my code from Day 19 to load in a graph, and then added a bit of code to print out the loaded graph into dot syntax, and fed it to dot.

dot -Tsvg g.txt -o g.svg

It produced the following output:

Arggh.. yeah, that horizontal blurred smudge *is* the graph. It seems like such a huge tangled mess that Graphviz struggles to visualize it, and generates an insanely wide output.

Zooming in to a small part of it, it looks like this:

"Of course the AoC inputs would be tailored to resist easy deciphering", I thought.

But then, I remembered, Graphviz supports different graph layout engines. So let me at least try one of the non-default ones.

I selected the second one on the list after the default one, and typed in:

dot -Tsvg g.txt -o g.svg -Kneato

And was very amused to suddenly stare at the following output instead:

Well, I think I may now have a wild guess which "exactly three edges" to cut to break the graph into two parts. :D

Then I adjusted the previous code that printed the graph, to do the cutting and then count up the size of one of the sides to come up with the final answer, and that was the star.

The code looked like follows:

Day 25 Part 1 Code Listing

#include 

struct node {                           // A single circuit node in the graph.
  u16 name;                             // Three chars condensed to 15 bits.
  node *next[9];                        // Output nodes from this edge (max 9)
  u8 n;                                 // Number of output nodes.
} ht[1536];                             // Hash table for graph nodes.

u16 name(const char *n) {               // Converts three char string into a
  return ((n[0]-'A')*26+(n[1]-'A'))*26  // compact 15-bit representation.
       +  (n[2]-'A') + 1;
}

int nodes;                              // Calculate total # of nodes in graph.

node *get(const char *s) {              // Gets or adds a node to graph by name.
  u16 n = name(s);                      // Convert string to 15-bit name.
  u16 h = (n ^ (n>>7)) & 2047;          // Hash the name.
  while (h >= 1536) h -= 1536;
  for(;ht[h].name; h=h<1535?h+1:0)      // Do linear probing to find a/the spot
    if (ht[h].name == n)                // in the hash table.
      return &ht[h];
  ht[h].name = n;                       // If we didn't find the name in table,
  ++nodes;                              // add it as a new entry, and add up
  return &ht[h];                        // # of nodes in the graph.
}

node *getnam(char *s, char **n) {       // Parses a string name from input line,
  char *e = s;                          // and returns new or existing graph
  while(isalpha(*e)) ++e;               // node by that name. Output in 'n' the
  *e = 0;                               // continuation position on the input
  *n = e+1;                             // line after the parsed name string.
  return get(s);
}

void connect(node *a, node *b) {        // Bidirectionally connect two nodes.
  a->next[a->n++] = b;
  b->next[b->n++] = a;
}

void cut1(node *a, node *b) {           // Cuts node B from A.
  for(int i = 0; i < a->n; ++i) {
    if (a->next[i] == b) {
      a->next[i] = a->next[--a->n];
      return;
    }
  }
}

void cut(node *a, node *b) {            // Bidirectionally cut B and A from each
  cut1(a, b); cut1(b, a);               // other.
}

node *stack[256], **sp = stack;         // DFS Flood fill stack to count the
int visit(node *n) {                    // size of a subgraph.
  *sp++ = n;
  int count = 0;
  while(sp > stack) {
    node *n = *--sp;
    if (!n->name) continue;
    ++count;
    n->name = 0;
    for(int i = 0; i < n->n; ++i)
      if (n->next[i]->name)
        *sp++ = n->next[i];
  }
  return count;
}

int main() {
  BENCH();
  FILE *f = fopen("in25.txt", "r");
  char s[100];
  while(fscanf(f, "%s ", s) > 0) {      // Read the input graph.
    node *n = get(s);                   // Add node to graph.
    fgets(s, sizeof(s), f);             // Read all the outputs of this node.
    for(char *c = s; *c > 32;)
      connect(n, getnam(c, &c));        // Connect outputs to current node.
  }
  fclose(f);

  cut(get("qfj"), get("tbq"));          // Do the cuts found by eyeballing
  cut(get("dsr"), get("xzn"));          // Graphviz output.
  cut(get("qqh"), get("xbl"));
  u32 count = visit(get("qfj"));        // And count up the size of one side of
  printf("%lu\n", count*(nodes-count)); // the cut.
}

Fitting the full graph into C64 memory was a bit of a struggle, but with a little bit of bit shaving it worked out nicely.

Runtime on Commodore 64: 3 minutes 9.3 seconds.

Runtime on AMD Ryzen 5950X: 0.9697 milliseconds. (195,215.01x faster than the C64)

Time to solve: ~1 hour 30 minutes. The problem was quite fast to solve, but fitting in the C64 memory constraints to count the size of the cut graph was a bit of a workout.

There certainly exist algorithms for programmatically finding a minimum cut of a graph, but by now I am already busy filling my belly with chocolate instead.

In Part Two of the puzzle, the final one of Advent of Code this year, all that remains is to push the big red button.

After mustering courage to push big red buttons that typically should not be pushed, it is a relief to see that everything worked out, and the snow machine is finally fixed and working, and snowfall fills the page.

What a feel-good end to the journey.

Looking back at the calendar, all the islands we traveled have lit up one by one as we adventured back home.

And now snow is falling down from the sky. (maybe a bit too much? My mom was complaining on Christmas Eve how she has had too much snow to plow.. We have close to a two foot snow cover right now in Northern Finland. Sorry mom, it was the Elves!)

๐ŸŽ„๐ŸŽ„ The C64 totally did it! ๐ŸŽ„๐ŸŽ„ It solved all the puzzles of the year! ๐ŸŽ„๐ŸŽ„

Total Score

The primary goal I had for this Advent of Code was to be able to write code that computes the solutions to the puzzles on my actual Commodore 64.

A secondary goal was to be able to develop that code fully on the Commodore 64 alone. (with llvm-mos operating as the LAN network compiler)

On some days this goal was not met, due to e.g. bugs, or due to a lack of productivity on the C64 when it did not have enough computational power to explore the problem reasonably well.

In this section I'll summarize my "total score" that I achieved with the Commodore 64.

With the exception of the last day, each puzzle has two parts, so each part can achieve two C64s.

At the end of December, the score came out as follows.
1. Day 1: Trebuchet?! (1) (1)
2. Day 2: Cube Conundrum (1) (2)
3. Day 3: Gear Ratios (1) (1)
4. Day 4: Scratchcards (1) (1)
5. Day 5: If You Give A Seed A Fertilizer (1) (1)
6. Day 6: Wait For It (2) (1)
7. Day 7: Camel Cards (1) (2)
8. Day 8: Haunted Wasteland (2) (1)
9. Day 9: Mirage Maintenance (1) (1)
10. Day 10: Pipe Maze (1) (3)
11. Day 11: Cosmic Expansion (2) (1)
12. Day 12: Hot Springs (1) (2)
13. Day 13: Point of Incidence (1) (1)
14. Day 14: Parabolic Reflector Dish (2) (1)
15. Day 15: Lens Library (1) (2)
16. Day 16: The Floor Will Be Lava (1) (1)
17. Day 17: Clumsy Crucible (4) (1) (With RAM Expansion Unit)
18. Day 18: Lavaduct Lagoon (2) (2)
19. Day 19: Aplenty (1) (1)
20. Day 20: Pulse Propagation (2) (1)
21. Day 21: Step Counter (2) (7)
22. Day 22: Sand Slabs (1) (1)
23. Day 23: A Long Walk (1) (1)
24. Day 24: Never Tell Me The Odds (3) (3)
25. Day 25: Snowverload (1) N/A

So yes, I was able to have the Commodore 64 crank out the answers to all the puzzles! ๐ŸŽ‰

As for my secondary goal, I was able to develop 29/49 (~59%) of the solutions by only using my C++ code editor and Advent of Code client on the C64, without resorting to debugging on the PC.

I might have been able to get a slightly better score here if I had managed to create a more robustly tested libc before December, or if I had been more patient before giving up and troubleshooting on the PC.

.. but I also had to be careful to not fatigue myself with this challenge. (AoC fatigue is a thing)

Overall, I am super happy with this result! It was a fun month and the C64 did not fail me.

The Power Of llvm-mos

I attribute the fact that I was able to finish this challenge to the llvm-mos project. It is an extremely powerful compiler, and very well adapted to target the 6502.

Had I attempted to work through this challenge using Commodore BASIC v2 or 6502 assembly, I am sure that either would have overwhelmed me already in the early days.

To my knowledge, Advent of Code has not been fully completed on a Commodore 64 before - this is the first.

Searching the web, Norbert Kehrer completed the first 10 days of 2022 in C64 BASIC, although it is unclear what held back the later days. (This was the year that I completed on my 16 MHz MikroMikko 386SX)

I also found this post, where someone completed the first day of 2020 in Assembly as a proof of concept, but did not continue on to subsequent days.

I should say that it isn't at all a competition, nor a point of "bragging rights" as the AoC survey put it.

Rather, I find it interesting to gauge the extent of possibilities of different (vintage) computing environments and languages, and Advent of Code gives a nice way to explore them in a way that includes not only performance, but holistically the full picture, so things like scalability, ease of use, ease to learn and productivity as well.

I truly found that llvm-mos plus a custom C64 C++ IDE was a marvellous sweet spot for this kind of a vintage computing challenge.

Runtime Benchmarks

One of the things I found interesting last year was to build statistics to compare the speed of my MikroMikko 386SX CPU against my modern AMD Ryzen 5950X PC.

So this year I did the same. At the end of every day's puzzle, I benchmarked the Commodore 64 against my AMD Ryzen 5950X on Part Two of the problem. (except for Day 25, on Part One, since there was no Part Two)

Here are those comparisons collected in one chart:
Runtime                                   | C64               | 5950X      | PC faster
1. Day 1: Trebuchet?!                     | 1 min 0.7 s       | 0.460 ms   | 131,956.52x
2. Day 2: Cube Conundrum                  | 35.4 s            | 0.453 ms   | 78,145.70x
3. Day 3: Gear Ratios                     | 45.5 s            | 0.368 ms   | 123,369.57x
4. Day 4: Scratchcards                    | 2 min 59.9 s      | 0.982 ms   | 183,197.56x
5. Day 5: If You Give A Seed A Fertilizer | 25.4 s            | 1.764 ms   | 14,399.09x
6. Day 6: Wait For It                     | 0.1 s             | 0.06 ms    | 1,666.67x
7. Day 7: Camel Cards                     | 38.6 s            | 0.379 ms   | 101,846.97x
8. Day 8: Haunted Wasteland               | 1 min 42.1 s      | 0.623 ms   | 163,884.43x
9. Day 9: Mirage Maintenance              | 1 min 57.1 s      | 0.386 ms   | 303,367.88x
10. Day 10: Pipe Maze                     | 47.4 s            | 0.646 ms   | 73,374.61x
11. Day 11: Cosmic Expansion              | 47.5 s            | 0.354 ms   | 134,180.79x
12. Day 12: Hot Springs                   | 9 min 8.3 s       | 2.603 ms   | 210,641.57x
13. Day 13: Point of Incidence            | 51.3 s            | 0.264 ms   | 194,318.18x
14. Day 14: Parabolic Reflector Dish      | 16 min 7.2 s      | 15.9 ms    | 60,830.19x
15. Day 15: Lens Library                  | 2 min 47.0 s      | 0.385 ms   | 433,766.23x
16. Day 16: The Floor Will Be Lava        | 5 min 26.1 s      | 2.703 ms   | 120,643.73x
17. Day 17: Clumsy Crucible               | 14 min 53.3 s     | 14.046 ms  | 63,598.18x
18. Day 18: Lavaduct Lagoon               | 39.3 s            | 0.481 ms   | 81,704.78x
19. Day 19: Aplenty                       | 1 min 51.3 s      | 0.729 ms   | 152,674.90x
20. Day 20: Pulse Propagation             | 34.6 s            | 0.399 ms   | 86,521.63x
21. Day 21: Step Counter                  | 1 min 49.0 s      | 1.0824 ms  | 100,702.14x
22. Day 22: Sand Slabs                    | 1 min 41.4 s      | 1.162 ms   | 87,263.34x
23. Day 23: A Long Walk                   | 3 h 49 min 42.6 s | 66.48 ms   | 207,319.49x
24. Day 24: Never Tell Me The Odds        | 1 min 17.0 s      | 0.663 ms   | 116,138.76x
25. Day 25: Snowverload                   | 3 min 9.3 s       | 0.9697 ms  | 195,215.01x

The PAL Commodore 64 has a 0.985 MHz CPU (985248 Hz to be exact), whereas I clock my AMD Ryzen 5950X to turbo-boost single-core workloads up to 4.5 GHz. So the AMD CPU has a 4,568.5x higher clock speed.

The last column compares how many times faster the PC was at producing the solution, relative to the Commodore 64.

The geometric mean of these speedup factors is 100,118.53x. So, under these workloads, AMD Ryzen 5950X is a hundred thousand times(!) faster than a Commodore 64.

This means that microarchitectural differences account for roughly a ~20x speed boost on top of the difference in clock speeds.

The AMD Ryzen 5950X therefore "feels like" a Commodore 64 that is clocked up to 98.642 GHz! :D

... which is actually a slightly surprising result, because last year I measured that this very same AMD Ryzen 5950X PC "feels like" a 145.664 GHz clocked Intel 386SX CPU.

Huh wait, so is a 386SX then worse in performance than a Commodore 64?

No, not at all.

This difference can be attributed to two things:

  1. The problems that were solved were different. So it is not a complete apples-to-apples comparison, and
  2. The C64 enjoyed the benefit of the super-efficient llvm-mos compiler, whereas on the 386SX I used the obsolete Borland Turbo C++ 3.0 compiler.

Now thinking about it, I would actually not be that surprised if the 1 MHz C64 with llvm-mos computed some code faster than a 16 MHz 386SX with Borland Turbo C++ 3.0. Compiler development can be like that :)

Conclusion

This was quite the journey.

I was happily surprised to see the Commodore 64 compute all the results in just under 3 minutes of computation time on average, if one ignores the Day 23 outlier.

When I was preparing for this project, I was anticipating that the C64 would end up taking hours and hours of computation time after the first few days, and so I had actually prepared two secret weapons.

The first was that the Turbo Chameleon 2 cartridge in my Commodore was equipped with a CMD SuperCPU cartridge emulator.

So if I had run into ridiculous execution times, I could have enabled SuperCPU to get somewhere between a 5x-10x speed boost for the C64, to regain my sanity.

The other secret weapon was the MEGA65.

I had originally bought MEGA65 when it became available, so I have one of those with me.

My original plan was, after things got really heated, to switch to the MEGA65 and run code at a whopping 40.5 MHz with 384KB+8MB of RAM.

But surprisingly, things never really got that spicy, and the good old Commodore 64 held up really well all the way through.

Computationally the hardest two puzzles were Day 17 Part Two where I did rely on a RAM Expansion Unit, and Day 23 Part Two that took almost four hours to calculate.

All the rest completed happily in minutes, and within the 64KB memory limit of the C64.

Community Highlights

The best part of Advent of Code is the wonderful community, with a lot of creative ideas, funny conversations and educational threads.

In this year's Community Showcase, my participation was recognized on two counts.

The first was a mention for using the Commodore 64 from this thread I posted.

A shout out to MarvelousShade for also tackling AoC on a Commodore 64! (although it looks like he was working on puzzles from an earlier year)

I searched the community survey to see if there would be others targeting Commodore 64 this year,

(where Commodore 64 clearly was the seventh most popular target after Win, Linux, macOS, iOS, ChromeOS and Android, right? right??)

It looks like someone else was indeed doing a similar llvm-mos 6502 thing as me, although it is unclear if they were targeting Commodore 64, or some other 6502 target.

It was fun to reply in the survey to be using my own IDE :)

High five to the other person who was also doing their own editor.

Anyways, the other showcase mention was a highlight of a forum thread:

Throughout December when I wasn't writing code on the C64, I would browse and write on the subreddit. That showcase was from this comment post:

Happy to see when others find my rambling helpful.

Closing Up

For all this puzzling, I assembled the final Advent calendar view from my Advent of Code client program into a single panorama shot.


A 57x31 chars panorama stitch of the completed Advent of Code calendar on the Commodore 64.

It doesn't have the animations, but still looks pretty cool I think.


It is a wonder what this thing can still do.

๐ŸŽ„ Merry Christmas and Awesome New Year 2024 To All 8-Bits Fans! ๐ŸŽ„

Jukka

Credits

Fonts used on this site:

Awesome AI image designs by creative prompts from various Advent of Code subredditors.

The llvm-mos project is developed at https://llvm-mos.org/.

Turbo Chameleon 2 and RR-Net 3 are developed by Individual Computers.

Thanks to all the participants on the Advent of Code subreddit. It was very enjoyable to take part in all the conversations in December.

And of course, big thanks to Eric Wastl for designing Advent of Code!