Terrible Terrible C

Today I want to talk about a couple of the terrible things that C (or a compiler extension) allows you to do.

First up: labels as values

gcc has an extension that lets you treat a label as a value. Why it has such a feature is something of an enigma to me, but it does all the same. Most notable, though, is this snippet from the documentation:

You may not use this mechanism to jump to code in a different function. If you do that, totally unpredictable things will happen. The best way to avoid this is to store the label address only in automatic variables and never pass it as an argument.
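For reference, here is roughly what staying inside the rules looks like: a table of label addresses used as a dispatch table, all within one function. This is a minimal sketch, not something from the gcc manual; the opcodes and the run function are made up for illustration, and it trusts the program to contain only valid opcodes and to end with OP_HALT.

```c
#include <stddef.h>

/* Hypothetical opcodes for a toy accumulator machine (illustration only). */
enum { OP_INC, OP_DOUBLE, OP_HALT };

/* Runs a program and returns the accumulator. Every label address stays
   inside this one function, which is the use the warning above allows.
   No bounds checking: the program must hold valid opcodes ending in OP_HALT. */
int run(const unsigned char *program)
{
    /* Table of label addresses indexed by opcode (gcc extension). */
    static void *const dispatch[] = { &&do_inc, &&do_double, &&do_halt };
    int acc = 0;
    size_t pc = 0;

    goto *dispatch[program[pc++]];

do_inc:
    acc += 1;
    goto *dispatch[program[pc++]];

do_double:
    acc *= 2;
    goto *dispatch[program[pc++]];

do_halt:
    return acc;
}
```

Feeding it the program { OP_INC, OP_INC, OP_DOUBLE, OP_INC, OP_HALT } walks the accumulator through 1, 2, 4, 5 and returns 5.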

Of course, gcc does nothing to prevent you from returning the value of a label and then later jumping to it. In fact, that almost sounds like a challenge. And behold:

#include <stdio.h>

void *label_value(int really)
{
    if (really)
        return &&LABEL;
    else {
        LABEL:
        printf("WHY!!!\n");
    }
}

void jump(void *LABEL)
{
    goto *LABEL;
}

int main()
{
    jump(label_value(1));
    return 0;
}

$ gcc lab.c
lab.c: In function ‘label_value’:
lab.c:7: warning: function returns address of local variable
$ ./a.out
WHY!!!

To be clear about what’s happening here: label_value returns a pointer to the position in memory where goto will jump and continue executing. We pass this to another function, which then jumps right back into the function that just returned. So when we call jump, the jump function never actually returns, and label_value returns twice.

Aside from being completely awful, this works exactly as we expect, at least in this unoptimized build. I can imagine all sorts of wonderful uses for this feature, including C’s long-missing feature: continuations. Here’s a first-order approximation.

#include <stdio.h>

void *label_value(int really)
{
    void *label = &&LABEL;
    int x = 2, y = 3;

    if (really)
        return &&LABEL;
    else {
        LABEL:
        printf("%d, %d\n", x, y);
        x = 2, y = 3;
        goto *label;
    }
}

void jump(void *LABEL)
{
    void *label[] = { &&WAT, &&WAT, &&WAT };
    int x = 4, y = 5;
    goto *LABEL;
    WAT:
    printf("%d, %d\n", x, y);
}

int main()
{
    jump(label_value(1));
    return 0;
}

$ ./a.out
4, 5
2, 3

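Now we’re starting to have fun. After the goto, the code in label_value still looks for its locals at the same stack offsets, but in this unoptimized build the frame now belongs to jump. So it reads jump’s x and y (printing 4, 5) and picks up one of the &&WAT entries as its label. It then overwrites x and y and jumps to WAT, where jump prints 2, 3. By lining up stack locations like this, we can control where each function jumps to as well as the values it sees for its variables.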
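For contrast, standard C does ship a sanctioned (and much more limited) non-local jump in setjmp/longjmp. It only lets you escape back up the stack to a frame that is still live, so it is closer to a one-shot escape than a real continuation. A minimal sketch, with bail_out and run_with_escape as names invented for this example:

```c
#include <setjmp.h>

static jmp_buf checkpoint;   /* saved context to jump back to */

/* Hypothetical helper: unwinds to the checkpoint instead of returning. */
static void bail_out(void)
{
    longjmp(checkpoint, 1);  /* never returns to its caller */
}

/* Returns 0 if the body finished normally, 1 if it escaped via longjmp. */
int run_with_escape(int should_bail)
{
    /* setjmp returns 0 when called directly and nonzero when longjmp
       lands here; this frame is still live, so the jump is well defined. */
    if (setjmp(checkpoint) != 0)
        return 1;

    if (should_bail)
        bail_out();
    return 0;
}
```

Unlike the label trick, the target frame here has not returned yet, which is the whole difference between defined and "totally unpredictable".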

Exhibit 2: Arrays as functions

C is a weakly typed language. C is so weakly typed that you can cast an array to a function pointer and call it. This only works if the array lives in the right part of the object file, though. Consider:

#include <stdio.h>

/* x86-64 machine code, placed in the executable .text section */
unsigned char return_number[] __attribute__((section (".text"))) =
      { 0x55, 0x48, 0x89, 0xE5, 0xB8, 255, 255, 0, 0, 0x5D, 0xC3 };

int main()
{
  int (*fn)(void);
  fn = (int(*)(void))&return_number;
  printf("%d\n", (*fn)());
  return 0;
}

We initialize a global array of unsigned chars and tell the compiler and linker to put it in the text section. The array itself contains a bunch of random-seeming values.

In main we then declare fn as a pointer to a function taking no arguments and returning int, and point it at our array. Then we just dereference it and call it. What do we get?

$ gcc arraycode.c
/tmp/enggrid1.BU.EDU.tmp.Ymm30062/ccmP8SbE.s: Assembler messages:
/tmp/enggrid1.BU.EDU.tmp.Ymm30062/ccmP8SbE.s:3: Warning: ignoring changed section attributes for .text
$ ./a.out
65535

Why would we get that? Let’s consult objdump.

00000000004004ac <return_number>:
 4004ac: 55                     push %rbp
 4004ad: 48 89 e5               mov %rsp,%rbp
 4004b0: b8 ff ff 00 00         mov $0xffff,%eax
 4004b5: 5d                     pop %rbp
 4004b6: c3                     retq

This will just return the number 0xffff (65535), which is exactly what we observe.

One thing you may be tempted to do is to modify the array and change the return value of the function. Unfortunately this segfaults, as the text section is not writable. There’s probably a flag in the ELF file that would make it writable, but I’ve yet to find a way to flip it.
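One workaround that sidesteps the ELF flag entirely is to copy the bytes into memory you map yourself as readable, writable and executable, and patch them there. This is only a sketch under some loud assumptions: x86-64, a POSIX system with mmap, and a kernel policy that permits writable and executable pages (many hardened setups refuse). The name patched_return_value is made up for this example.

```c
#define _DEFAULT_SOURCE   /* asks glibc to expose MAP_ANONYMOUS in strict modes */
#include <string.h>
#include <sys/mman.h>

/* Copies the machine code from above into a fresh read/write/execute
   mapping, patches the 32-bit immediate of the mov, and calls the result.
   Returns -1 if the platform is not x86-64 or the mapping is refused. */
int patched_return_value(int value)
{
#if defined(__x86_64__) && defined(MAP_ANONYMOUS)
    /* push %rbp; mov %rsp,%rbp; mov $imm32,%eax; pop %rbp; ret */
    unsigned char code[] = { 0x55, 0x48, 0x89, 0xE5,
                             0xB8, 0, 0, 0, 0,
                             0x5D, 0xC3 };

    /* The immediate starts at byte 5; x86-64 is little-endian and int is
       4 bytes there, so a plain memcpy writes it in the right order. */
    memcpy(&code[5], &value, sizeof value);

    void *mem = mmap(NULL, sizeof code, PROT_READ | PROT_WRITE | PROT_EXEC,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;   /* e.g. a policy forbidding writable+executable pages */

    memcpy(mem, code, sizeof code);
    int result = ((int (*)(void))mem)();   /* same cast trick as before */
    munmap(mem, sizeof code);
    return result;
#else
    (void)value;
    return -1;       /* not x86-64, or no anonymous mappings available */
#endif
}
```

Where this is allowed, patched_return_value(42) should hand back 42; elsewhere it falls back to -1 rather than crashing.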

So what’s the point?

There is no real point to any of this. Nobody should ever write real code that does these things, but it’s fun to play with. I guess the real takeaway here is that C really is just portable assembly, and if you really want to, you can pretty easily peel back the layers of abstraction that it gives you and write non-portable assembly with it!