Tuesday, 25 September 2012

Seeding Randomness

Note: This post is also a demonstration of a simple use of Pthread memory barriers.

Recently I read a discussion on choosing a suitable seed for srand() - ideally you want something random to seed srand() with, and one of the obvious suggestions was to use the current time. This lead me to wonder just how "good" a source of randomness the time is, and how likely two processes/threads are to find the time to be the same in practice.

The most coarse source of time is time(), which has a resolution of one second. Obviously, any program executed more often than every second is going to see the time to be the same as some other program.

Two much better functions in this respect are gettimeofday() and the newer clock_gettime(); they have a best resolution of one millisecond and one nanosecond, respectively (whether they actually have this resolution depends on your machine). But how likely are they to return the same time?

This being the sort of thing that keeps me up late into the night, I set up a program that attempts to spot two calls to one of these functions returning an identical time. My first attempt at just calling either one repeatedly and seeing whether the times were the same showed gettimeofday() frequently came out the same if called consecutively, whereas I couldn't get clock_gettime() to produce the same result twice. (Obviously since this is a practical (i.e. fudged) test, this result depends on my hardware, OS and possibly how my kernel is configured.) Anyway, undeterred I came up with a scheme using threads.

See the listing at the end of the program (with SYNC undefined) to see the attempt at just using raw threads. On my machine (stock Ubuntu 12.04 on a Core i5) with NUM_THREADS set to 300 (which was about as high as I could get it before getting memory allocation errors in pthread_create()) this program had to run 449 times (on average) to generate one run where only a single match for the time was found.

But we can do better than this. Creating a thread takes time - time during which the OS's unpredictable scheduling reigns supreme and with an iron fist, leading our threads to unpredictably diverge. We can force them all to synchronize at a point much closer to the call to get the time by using a Pthread memory barrier. We initialize the barrier with a count of NUM_THREADS, meaning any thread waiting at the barrier will wait until NUM_THREADS threads (i.e. all of our threads) are also waiting at the barrier before it is allowed to continue execution.* From here, there's a lot less logical "distance" to the clock_gettime() call, so we should be able to generate the same time more easily. The barriers in the listing at the end are enabled by passing the -DSYNC flag to gcc when compiling. Note that we get the time as soon as possible after getting past the barrier, and then check whether our barrier wait was successful, to keep the distance between getting past the barrier and getting the time as low as possible.

Using memory barriers, the number of runs we needed to make to get one match was reduced from 449 to 89. Using optimization (-O3) reduced this a little further to 62 times. So, my best attempt results in the same time being returned once every (at best) 62 * 300 = 18600 threads.

So, the conclusion? If I'm ever after high-quality random numbers for security purposes, I'll turn to /dev/random and/or /dev/urandom, but for any other application clock_gettime() is much simpler and should be more than enough for me.

/*
 * same.c
 *
 * See if we can get clock_gettime() to generate two times
 * that are the the same. Can be compiled with -DSYNC to use
 * a memory barrier in the thread to try and force the same
 * time to be retrieved.
 *
 * Compile with (where # is the number of threads you want):
 *
 *   $ gcc same.c -DNUM_THREADS=# [-DSYNC] -pthread -lrt
 *
 * Copyright (C) 2012 John Graham. This source code is
 * released into the public domain; you may use it as you
 * wish.
 */

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Array into which threads place their times. */
struct timespec times[NUM_THREADS];

#ifdef SYNC
/* Memory barrier at which threads sync (if applicable)
 * immediately before getting the time. */
pthread_barrier_t barrier;
#endif

void *thread(void *arg)
{
  /* Our argument is the index into 'times'. */
  intptr_t index = *(intptr_t *) arg;

#ifdef SYNC
  /* Wait at the barrier. This will block until all
   * threads are at the same point in their execution. */
  int pret = pthread_barrier_wait(&barrier);
#endif

  /* Get the time ASAP, *then* check the return value from
   * pthread_barrier_wait(). */
  if (clock_gettime(CLOCK_MONOTONIC, &times[index]) != 0) {
    perror("getting time");
    exit(EXIT_FAILURE);
  }
#ifdef SYNC
  if (pret != 0 && pret != PTHREAD_BARRIER_SERIAL_THREAD) {
    fprintf(stderr,
            "Waiting at barrier: %s\n", strerror(pret));
    exit(EXIT_FAILURE);
  }
#endif

  return NULL;
}

/* See if *any* of the entries in 'times' are identical. */
int compare_times(void)
{
  int nsame = 0;
  int i, j;
  for (i = 0; i < NUM_THREADS; i++)
    for (j = i + 1; j < NUM_THREADS; j++)
      if (times[i].tv_sec == times[j].tv_sec &&
          times[i].tv_nsec == times[j].tv_nsec)
        nsame++;
  return nsame;
}

int main()
{
  int i;   /* Generic index. */
  intptr_t nums[NUM_THREADS]; /* Indices to pass to
     * threads. */
  pthread_t threads[NUM_THREADS];

#ifdef SYNC
  if (pthread_barrier_init(&barrier,
                           NULL, NUM_THREADS) != 0) {
    perror("pthread_barrier_init()");
    exit(EXIT_FAILURE);
  }
#endif

  /* Start all our threads, initializing indices as we go
   * along. */
  for (i = 0; i < NUM_THREADS; i++) {
    nums[i] = i;
    if (pthread_create(&threads[i],
                       NULL, thread, &nums[i]) != 0) {
      perror("pthread_create()");
      exit(EXIT_FAILURE);
    }
  }

  /* Wait until all threads have finished. */
  for (i = 0; i < NUM_THREADS; i++) {
    if (pthread_join(threads[i], NULL) != 0) {
      perror("pthread_join()");
      exit(EXIT_FAILURE);
    }
  }

  int times = compare_times();
  printf("%d matches\n", times);
  return times;
}


* If you want a demonstration that the memory barrier is working, initialize it with a count of NUM_THREADS+1. Since this many threads will never wait on the barrier, every thread will sit and wait there indefinitely, doomed to spend the rest of their sad, sorry lives waiting for a thread that doesn't exist (it won't even wake up for EINTR, unless this kills the program).

No comments:

Post a Comment