Saturday, 29 September 2012

Vague Essays (in SD329)

I'm doing a level 3 (3rd year) science course with the Open University (SD329, if you're interested). As part of the assignments, this course includes some essays; not a big deal, I've done a course with 2-3 thousand word essays in it before. But this is the first one where the essay questions were rather vague.

There was a lot of talk on the course website/The Facebooks about this. Humanities students might be used to these sorts of essay questions, but it seemed to really phase some students when they did their assignments. I was a little put off at first, but then when talking to my tutor about what they were looking for, it became quite clear that, at least in OU science courses:
If you're given a vague essay question, it's for a good reason.
This is generally because either:
  • You've only covered so much material on that topic, and writing an essay about it all in the word count would be easy (even if you include some extra research) or, more likely,
  • They're deliberately giving you scope to waffle on about whatever you want, so long as you keep to the general topic.
Once I realised this, it became more easy to think of them as "open-ended" essays as opposed to vague ones; the people who set the assignment don't have some secret, hidden question they "really" want answered, they just want you to talk about anything on the topic.

This whole thing may seem obvious to you; but it's the sort of implied knowledge nobody has told me (and at least a few others on my course) about. Science students generally like to know exactly what's required when doing an assignment, so being given free reign over a topic can be a little uncomfortable/unnatural at first - but realizing that it's expected makes writing these sorts of essays much easier.

Running a program and interacting with it.

Here's an example of a function, ccl_popen3(), that opens a program with the standard streams that you want to write to/read from open. It's from a while ago, I'll brush it up later (in theory).

Either way, ccl_popen3() is the money function, and main() demonstrates usage. Compiles on my machine with gcc popen3.c -o popen3 -std=gnu99 -Wall -Wextra -Werror.
#include <stdlib.h>
#include <unistd.h>

#define READ_END  0
#define WRITE_END 1
#define NENDS     2

#define OPEN_STDIN  (1 << STDIN_FILENO)
#define OPEN_STDOUT (1 << STDOUT_FILENO)
#define OPEN_STDERR (1 << STDERR_FILENO)
#define OPEN_ALL    (OPEN_STDIN | OPEN_STDOUT | OPEN_STDERR)
#define MAX_STREAMS 3



__attribute__((noreturn))
static void popen3_child(const char *exe,
                         char *const argv[],
                         int shell,
                         int streams,
                         int pipes[MAX_STREAMS][NENDS])
{
    for (int i = 0 ; i < MAX_STREAMS ; i++)
        if (streams & (1 << i)) {
            close(pipes[i][i == STDIN_FILENO ?
                           WRITE_END : READ_END]);
            if (dup2(pipes[i][i == STDIN_FILENO
                              ? READ_END : WRITE_END],
                     i) == -1)
                exit(127);
        }

    if (shell)
        execvp(exe, argv);
    else
        execv(exe, argv);

    /* Any return from execv()/execvp() is an error. */
    exit(127);
}

static void popen3_parent(int streams,
                          int fds[],
                          int pipes[][NENDS])
{
    for (int i = 0 ; i < MAX_STREAMS ; i++)
        if (streams & (1 << i)) {
            close(pipes[i][i == STDIN_FILENO ?
                           READ_END : WRITE_END]);
            fds[i] = pipes[i][i == STDIN_FILENO ?
                              WRITE_END : READ_END];
        }
}



/*
 * TODO: (i) different functions for child and parent, (ii)
 * close original pipe fd's.
 */
pid_t ccl_popen3(const char *exe,
                 char *const argv[],
                 int shell,
                 int streams,
                 int fds[])
{
    int pipes[MAX_STREAMS][NENDS] = {{-1, -1},
                                     {-1, -1},
                                     {-1, -1}};

    for (int i = 0 ; i < MAX_STREAMS ; i++)
        if ((streams & (1 << i)) && (pipe(pipes[i]) == -1))
            goto err;

    pid_t pid = fork();
    switch (pid) {
    case -1:  /* error */
        goto err;
    case 0:   /* child */
        popen3_child(exe, argv, shell, streams, pipes);
        break;
    default:  /* parent */
        popen3_parent(streams, fds, pipes);
        return pid;
    }

 err:
    for (int p = 0 ; p < MAX_STREAMS ; p++)
        for (int e = 0 ; e < NENDS ; e++)
            if (pipes[p][e] != -1)
                close(pipes[p][e]);

    return -1;
}



#include <poll.h>
#include <stdio.h>

void ccl_poll_input(int fds[3])
{
    struct pollfd pollfds[2] = {
        { .fd = fds[STDOUT_FILENO], .events = POLLIN },
        { .fd = fds[STDERR_FILENO], .events = POLLIN }
    };

    int running = 1;
    while (running) {
        int res = poll(pollfds, 2, -1);

        if (res == -1) {
            perror("poll()");
            exit(EXIT_FAILURE);
        }

        struct pollfd *pfd = pollfds;
        for (int i = 0 ; i < 2 ; i++, pfd++) {
            if (pfd->revents & (POLLERR | POLLNVAL)) {
                fprintf(stderr, "poll() error\n");
                exit(EXIT_FAILURE);
            }

            if (pfd->revents & (POLLIN | POLLHUP)) {
                char buf[1024];
                ssize_t nread = read(pfd->fd,
                                     buf, sizeof(buf) - 1);

                if (nread == -1) {
                    perror("read()");
                    exit(EXIT_FAILURE);
                }

                if (nread == 0) {
                    /* EOF */
                    running = 0;
                    break;
                }

                buf[nread] = '\0';
                printf(" Read from %d: \"%s\"\n",
                       pfd->fd, buf);
            }
        }
    }
}



#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <poll.h>

int main(void)
{
    int grep_fds[3];

    const char *exe = "/bin/grep";
    char *const argv[] =
      {"grep", "-h", "XX", "jkfdsl", "-", NULL};
    
    pid_t grep_pid =
      ccl_popen3(exe, argv, 0, OPEN_ALL, grep_fds);

    if (grep_pid == -1) {
        perror("ccl_popen3()");
        exit(EXIT_FAILURE);
    }

    /* Write some input to grep's standard input, and close
     * it. */
    const char *input = "abcd\ngg9XXp\nDqdXXpird\n";
    if (write(grep_fds[STDIN_FILENO], input, strlen(input))
        != (ssize_t)strlen(input)) {
        perror("write()");
        exit(EXIT_FAILURE);
    }
    close(grep_fds[STDIN_FILENO]);

    printf("out: %d, err: %d\n", grep_fds[STDOUT_FILENO],
           grep_fds[STDERR_FILENO]);

    struct pollfd pollfds[2] = {
        { .fd = grep_fds[STDOUT_FILENO], .events = POLLIN },
        { .fd = grep_fds[STDERR_FILENO], .events = POLLIN }
    };

    int running = 1;
    while (running) {
        int res = poll(pollfds, 2, -1);

        if (res == -1) {
            perror("poll()");
            exit(EXIT_FAILURE);
        }

        struct pollfd *pfd = pollfds;
        for (int i = 0 ; i < 2 ; i++, pfd++) {
            if (pfd->revents & (POLLERR | POLLNVAL)) {
                fprintf(stderr, "poll() error\n");
                exit(EXIT_FAILURE);
            }

            if (pfd->revents & (POLLIN | POLLHUP)) {
                char buf[1024];
                ssize_t nread = read(pfd->fd,
                                     buf, sizeof(buf) - 1);

                if (nread == -1) {
                    perror("read()");
                    exit(EXIT_FAILURE);
                }

                if (nread == 0) {
                    /* EOF */
                    running = 0;
                    break;
                }

                buf[nread] = '\0';
                printf(" Read from %d: \"%s\"\n",
                       pfd->fd, buf);
            }
        }
    }

    int grep_status;
    if (waitpid(grep_pid, &grep_status, 0) == -1) {
        perror("waitpid()");
        exit(EXIT_FAILURE);
    }

    if (WIFEXITED(grep_status))
        printf("grep exited with status %d\n",
               WEXITSTATUS(grep_status));
    else
        // TODO: This better.
        printf("grep terminated abnormally\n");
    return EXIT_SUCCESS;
}

Wednesday, 26 September 2012

C on Linux Kickstart

Overview


So, there are plenty of resources for learning how to program in C/C++ (see the end of this post) but here's a quick guide on how to get off to a good start with the practical matters. It's the sort of stuff I wish someone had pointed out to me when I started, since it would have made my life a lot easier.

While graphical IDE's may make your life easier for large projects, a knowledge of how the core tools used in making programs fit together is essential if you're going to go anywhere with your programming; where more complex tools let you down, you can pick yourself up if you know what they're doing behind the scenes. This is why this guide will start you off using the compiler on the command line.

The command line


If you're not already used to using the command line, have a look at this chapter of Introduction to Linux: A Hands On Guide. You'll need to get comfortable with moving around the file system, creating and listing the contents of directories, etc.

What to use & install


The two absolute essentials you'll need are (i) a text editor and (ii) a compiler. You might already be a fan of the text editors vi or emacs, both popular with programmers, but they can have a steep learning curve. If you've got a graphical Linux distribution it will have a default text editor that will doubtless have syntax highlighting (this makes certain parts of your source code stand out). The defaults for Gnome, Kde and Xfce are gedit, kate and mousepad respectively. So from the command-line, you can open e.g. the file program.c from a Gnome desktop by typing the following at a terminal:
    gedit program.c
As for a complier, by far the most popular option is GCC. On Debian/Ubuntu, you can install this, as well as all the development libraries you'll need, by installing the build-essential package (as root):
    apt-get install build-essential
On Fedora/Red Hat, the same effect can be achieved by:
    yum groupinstall "Development Tools" "Development Libraries"
If you have another distribution and you're not sure what to install, ask around; if you can compile the main.c program in the next section, you have what you need.

How to compile programs


To give us something to work with, open your text editor of choice, write the following into it (copy & paste if you like/you get errors) and save it as main.c:
#include <stdio.h>

int main()
{
    printf("hello world\n");
}
We won't go into what the statements mean (I'll leave that to the tutorials at the end of this post) but this gives us something practical to work with.
Now, you can't run this file directly, you first have to compile it into a file that you can. The simplest way to do this is:
    gcc main.c
This will produce a file called a.out. You can run this in the usual way (from where you are just after compiling, type ./a.out). If you like, you can specify the output file to be something different by using -o filename (e.g. gcc main.c -o main).

However, you will make your programming life easier if you never compile your programs like this; it will lead you down a bad path. When you compile programs like this, you will let GCC ignore many potentially dangerous practices. Luckily, GCC has a way to alert you to these in the form of its warning options - they control what situations GCC will warn you about, and what situations it will silently ignore. The defaults are pretty lax, and while they may be useful in compiling legacy code you know to work and just want to compile, when learning and/or writing new code, you want your compiler to be as strict as possible. There are an endless variety of options you can fine-tune this strictness to your heart's content if you like, but for now there are three that will serve you very well: -Wall, -Wextra and -Werror. The first two of these enable a group of common warnings, and the last turns all warnings (that would usually just cause the compiler to emit a message about the offending construct) into errors that cause the compiler to flat-out refuse to finish compiling your program until you've sorted the error out.

These warnings are there for a reason; use them. They will prevent you from making mistakes that will lead to bugs that will just needlessly waste your time. A quite common forum post from new programmers reads essentially "here's a program that doesn't work the way I expect - help!". At least 50% of the time, I can copy + paste the program and compile with -Wall -Wextra and the result will tell me what they've done wrong, without even having to look at their program. Why spend time searching for errors when you can let your compiler do it for you?

In short, always compile your programs with at least -Wall -Werror - try it now with our main.c above:
    gcc main.c -o main -Wall -Werror
You'll notice I've not included -Wextra in the above command-line. -Wextra can be a bit harsh, especially on new programmers; use it to double-check a program that's not behaving as you expect, or for any code you hope to give to others (including posting it on a forum) for a thorough sanity-check first.

Dealing with error messages


There is one golden rule when dealing with error messages. At first you (like me) will probably ignore it, only to learn through bitter experience that it applies no matter how experienced you are. Luckily, it's very simple:
  • Always deal with the very first error message emitted by the compiler.
The reason for this is quite simple: once the compiler sees something wrong, it's very likely that later things will make even less sense to it than they ordinarily would. So find the first error the compiler spots, fix it, then try recompiling. Often you will fix several error messages by fixing one actual error.

The error messages emitted by GCC are always in the format file:line[:position]: ERROR MESSAGE. If you've tried compiling the above main.c with -Wall you may have already seen the following:
    main.c:6:1: warning: control reaches end of non-void function
This lets us know that the error occurred in main.c on line 6 (and was estimated to be at character 1, but that's rarely useful). Exactly what this means you'll learn in time, but in this case we can fix this by turning main.c into:
#include <stdio.h>

int main()
{
    printf("hello world\n");
    return 0;
}
Try recompiling and you'll see no error message.

When you see an error message you don't understand, don't ignore it or start turning warning options off until it goes away. A quick copy-and-paste it into Google (leaving out any specifics like file names, function names and line numbers) will usually tell you exactly what's gone wrong:
    main.c:6:1: warning: control reaches end of non-void function

Where to go from here


There are several good introductions to programming in C on the web, my favourite being this one.

When you get stuck, forums can be invaluable - there is a Linux-specific forum with an active programming section here, and active forums for C and C++ programming here and here, respectively. You'll get good, fast responses if you include (i) code that demonstrates the problem you're having, (ii) what you think the code should do and (iii) what's going wrong.

Tuesday, 25 September 2012

Seeding Randomness

Note: This post is also a demonstration of a simple use of Pthread memory barriers.

Recently I read a discussion on choosing a suitable seed for srand() - ideally you want something random to seed srand() with, and one of the obvious suggestions was to use the current time. This lead me to wonder just how "good" a source of randomness the time is, and how likely two processes/threads are to find the time to be the same in practice.

The most coarse source of time is time(), which has a resolution of one second. Obviously, any program executed more often than every second is going to see the time to be the same as some other program.

Two much better functions in this respect are gettimeofday() and the newer clock_gettime(); they have a best resolution of one millisecond and one nanosecond, respectively (whether they actually have this resolution depends on your machine). But how likely are they to return the same time?

This being the sort of thing that keeps me up late into the night, I set up a program that attempts to spot two calls to one of these functions returning an identical time. My first attempt at just calling either one repeatedly and seeing whether the times were the same showed gettimeofday() frequently came out the same if called consecutively, whereas I couldn't get clock_gettime() to produce the same result twice. (Obviously since this is a practical (i.e. fudged) test, this result depends on my hardware, OS and possibly how my kernel is configured.) Anyway, undeterred I came up with a scheme using threads.

See the listing at the end of the program (with SYNC undefined) to see the attempt at just using raw threads. On my machine (stock Ubuntu 12.04 on a Core i5) with NUM_THREADS set to 300 (which was about as high as I could get it before getting memory allocation errors in pthread_create()) this program had to run 449 times (on average) to generate one run where only a single match for the time was found.

But we can do better than this. Creating a thread takes time - time during which the OS's unpredictable scheduling reigns supreme and with an iron fist, leading our threads to unpredictably diverge. We can force them all to synchronize at a point much closer to the call to get the time by using a Pthread memory barrier. We initialize the barrier with a count of NUM_THREADS, meaning any thread waiting at the barrier will wait until NUM_THREADS threads (i.e. all of our threads) are also waiting at the barrier before it is allowed to continue execution.* From here, there's a lot less logical "distance" to the clock_gettime() call, so we should be able to generate the same time more easily. The barriers in the listing at the end are enabled by passing the -DSYNC flag to gcc when compiling. Note that we get the time as soon as possible after getting past the barrier, and then check whether our barrier wait was successful, to keep the distance between getting past the barrier and getting the time as low as possible.

Using memory barriers, the number of runs we needed to make to get one match was reduced from 449 to 89. Using optimization (-O3) reduced this a little further to 62 times. So, my best attempt results in the same time being returned once every (at best) 62 * 300 = 18600 threads.

So, the conclusion? If I'm ever after high-quality random numbers for security purposes, I'll turn to /dev/random and/or /dev/urandom, but for any other application clock_gettime() is much simpler and should be more than enough for me.

/*
 * same.c
 *
 * See if we can get clock_gettime() to generate two times
 * that are the the same. Can be compiled with -DSYNC to use
 * a memory barrier in the thread to try and force the same
 * time to be retrieved.
 *
 * Compile with (where # is the number of threads you want):
 *
 *   $ gcc same.c -DNUM_THREADS=# [-DSYNC] -pthread -lrt
 *
 * Copyright (C) 2012 John Graham. This source code is
 * released into the public domain; you may use it as you
 * wish.
 */

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Array into which threads place their times. */
struct timespec times[NUM_THREADS];

#ifdef SYNC
/* Memory barrier at which threads sync (if applicable)
 * immediately before getting the time. */
pthread_barrier_t barrier;
#endif

void *thread(void *arg)
{
  /* Our argument is the index into 'times'. */
  intptr_t index = *(intptr_t *) arg;

#ifdef SYNC
  /* Wait at the barrier. This will block until all
   * threads are at the same point in their execution. */
  int pret = pthread_barrier_wait(&barrier);
#endif

  /* Get the time ASAP, *then* check the return value from
   * pthread_barrier_wait(). */
  if (clock_gettime(CLOCK_MONOTONIC, &times[index]) != 0) {
    perror("getting time");
    exit(EXIT_FAILURE);
  }
#ifdef SYNC
  if (pret != 0 && pret != PTHREAD_BARRIER_SERIAL_THREAD) {
    fprintf(stderr,
            "Waiting at barrier: %s\n", strerror(pret));
    exit(EXIT_FAILURE);
  }
#endif

  return NULL;
}

/* See if *any* of the entries in 'times' are identical. */
int compare_times(void)
{
  int nsame = 0;
  int i, j;
  for (i = 0; i < NUM_THREADS; i++)
    for (j = i + 1; j < NUM_THREADS; j++)
      if (times[i].tv_sec == times[j].tv_sec &&
          times[i].tv_nsec == times[j].tv_nsec)
        nsame++;
  return nsame;
}

int main()
{
  int i;   /* Generic index. */
  intptr_t nums[NUM_THREADS]; /* Indices to pass to
     * threads. */
  pthread_t threads[NUM_THREADS];

#ifdef SYNC
  if (pthread_barrier_init(&barrier,
                           NULL, NUM_THREADS) != 0) {
    perror("pthread_barrier_init()");
    exit(EXIT_FAILURE);
  }
#endif

  /* Start all our threads, initializing indices as we go
   * along. */
  for (i = 0; i < NUM_THREADS; i++) {
    nums[i] = i;
    if (pthread_create(&threads[i],
                       NULL, thread, &nums[i]) != 0) {
      perror("pthread_create()");
      exit(EXIT_FAILURE);
    }
  }

  /* Wait until all threads have finished. */
  for (i = 0; i < NUM_THREADS; i++) {
    if (pthread_join(threads[i], NULL) != 0) {
      perror("pthread_join()");
      exit(EXIT_FAILURE);
    }
  }

  int times = compare_times();
  printf("%d matches\n", times);
  return times;
}


* If you want a demonstration that the memory barrier is working, initialize it with a count of NUM_THREADS+1. Since this many threads will never wait on the barrier, every thread will sit and wait there indefinitely, doomed to spend the rest of their sad, sorry lives waiting for a thread that doesn't exist (it won't even wake up for EINTR, unless this kills the program).

Monday, 24 September 2012

This Blog

By the way, this blog could be viewed as a continuation/expansion of this earlier one I started which quickly went stale, I think mainly due to its scope being too narrow. Or maybe I just have a limited attention span; I guess time will tell.

It's also a place I'll store things I find myself repeating and what a handy reference to, as well as random things (especially code samples) that may be useful to others and that I want to keep. A bit like a public Dropbox (which I highly recommend, by the way).

Some of you might find the previous blog far too geeky/technical - there will be some posts like that in this blog, but there will also be some general/random ones.

The Sea of Abandoned Blogs

There are many abandoned blogs out there. The most poignant I've seen recently may be this one, whose only post ten years ago states that Blogger has given this blogger something to live for.

I really hope he was just kidding...

The Cost of a Degree

I'm doing a degree with the Open University at the moment, and I get really annoyed when my friends enquire about how they might go about a similar endeavour. Not annoyed at my friends because I have anything against them doing the same - the more the merrier, as far as I'm concerned. I get annoyed that I have to tell them that they may have to pay thousands of pounds more than me for the privilege, and all because I snuck onto the system earlier than they did.

For those not familiar with OU degrees, here's basically how it breaks down. You do modules that each carry a certain number of points at a certain level. Undergraduate level 1 corresponds to the first year of a degree, level 2 to the second and level 3 to the third and final year. Usually, you need 120 points from each level for a degree.

So, 120 points of part-time, modular study corresponds to one year of a degree, except you don't (necessarily) do 120 points every calendar year (you can, but you'd better have a lot of free time and/or a penchant for late nights). Anyway, the cost of each course is roughly proportional to the number of points it carries, maybe more if it's a residential course or for certain subjects like law and IT. So whether you study four courses worth 30 points or two worth 60 for a year's worth of study, you'll pay roughly the same for the total 120 points for that level. When I started studying with the OU, the cost for 120 points of study was roughly £1,400.

However, since the university tuition fee reform here in the UK, the cost has gone up drastically. Now you can expect to pay £5,000 for 120 points of study - that's a 350% increase in tuition fees. And remember, that's five grand for a single year's worth of study; to do all three years of a degree, you'd now I'd have to pay £15,000 instead of £4,200.

Luckily, since I started studying at the lower rate I can continue on these lower fees so long as I keep within certain rules like studying something every year and finishing by a certain time. If I'd started studying one academic year later, or if I step outside the guidelines (e.g. by taking a year off studying) I would/will be on the higher rate, and there's just no way I can afford to do that.

"Arrrr!" I hear you cry. "Aye, but there be grants available fer ye if ye wants te study, but ye cannot shell out the doubloons fer it". Well, not for me, since I already have a degree, so I can't get any financial aid for another one. Then again, maybe this is my problem - perhaps I'm being selfish in wanting to broaden my education? Is another degree really just a selfish endeavour I should be made to pay through the nose for? I don't know; I'm just glad I started when I did.