Poisson Processes and Friendsgiving Drinking Games

5:23 PM, December 16, 2024

Edited 10:03 PM, December 16, 2024

Edited 11:03 PM, December 16, 2024

I started this blog post, and meant to post it, not long after Thanksgiving, but unfortunately the last round of final homework assignments and projects kept me busy up until the day before yesterday. After having a wonderful Thanksgiving with my family, the day after Thanksgiving I made a four-hour drive down to Earlton, NY, where a group of friends and I had rented a house in the woods for a 'friendsgiving' celebration for those members of the group who were not able to see their real families for Thanksgiving because they are not from Rochester. Needless to say, we had an amazing time there. On the evening of the day I drove down I found some time to read the Poisson Processes chapter of 'Introduction to Stochastic Processes with R', where I more or less expanded on a lot of the knowledge about Poisson and exponential random variables that I learned in my undergrad from James Marengo and Tony Wong. And then, after reading the chapter, I was sucked into a drinking game that could very easily be thought of as the combination of several Poisson processes I had just read about in the textbook! Rather than regurgitate the whole chapter, I'm simply going to explain intuitively how Poisson processes work and how they connect to the exponential, Poisson, and uniform random variables that we all know and love, and to my friends' and my attempts to get each other completely inebriated (just as a side note: we're all old enough and smart enough to know how to drink responsibly).

First of all, a Poisson process is a type of counting process. A counting process (N_t)_{t \geq 0} is a collection of non-negative, integer-valued random variables such that if 0 \leq s \leq t then N_s \leq N_t. In other words, if t is time, the process moves forward in continuous time and 'counts' the number of jumps that have happened between 0 and t, and the count can't go down once a jump has happened. To give an example, let's say that you're sitting on your driveway counting cars. From the time you start counting, t = 0, to ten minutes from then, t = 10, you count six cars. Obviously it can't be the case that between t = 0 and t = 5 you counted seven cars. And you also obviously can't count a non-integer number of cars. Since the number of cars you've seen up to any time t is a random variable, the collection of all these random variables across all possible t's is a counting process. Basically, a counting process counts the number of things that happen across time (or some other index t; you could count the number of skid marks along a road, in which case t would be distance, but t is usually time, so from now on we'll use t and time interchangeably).

There are many ways to define a Poisson process, but the fundamental intuition is that the jumps are equally likely to happen at any time. You're just as likely to see a car at t = 1.73…, t = 10.432…, or t = 1103802.2038293…, and as a result the distribution of the number of jumps in an interval depends only on the interval's length, so the distribution of the number of jumps for 6 \leq t \leq 16 is exactly the same as for 10006 \leq t \leq 10016. A few facts about Poisson processes follow directly from this. For example, no matter where you enter the Poisson process, the distribution of the time you'll have to wait before you see a jump is the same, regardless of the past. Thus the distribution of the waiting time X satisfies X - t | X \geq t \sim X, the 'memoryless' property. Since the exponential distribution is the only continuous distribution with this property, X is exponentially distributed. A brief proof I've written that X - t | X \geq t \sim X implies F(x) = 1 - e^{-\lambda x} for some positive constant \lambda:

    \[P(X - t > x | X \geq t) = P(X > x)\]

    \[\frac{P(X > x + t\textrm{ and }X \geq t)}{P(X \geq t)} = 1 - F(x)\]

    \[P(X > x + t\textrm{ and }X \geq t) = (1 - F(x))(1 - F(t))\]

Let's now assume that X is positive-valued and continuous, so that P(X \geq t) = P(X > t) = 1 - F(t). Since all the x's we care about are positive, it follows that P(X > x + t\textrm{ and }X \geq t) = P(X > x + t), so

    \[1 - F(x + t) = (1 - F(x))(1 - F(t))\]

Now let's define g to be the simple function g(y) = 1 - y. So we can rewrite the above as

    \[g(F(x + t)) = g(F(x))g(F(t))\]

Now notice that g(F(...)) is just another function, and this function has to have the property that for any two numbers t and x, multiplying g(F(x)) and g(F(t)) gives the function applied to x + t. And what are the only (non-trivial) functions for which that is always true!? Exponentials! So g(F(x)) has to be something like b^x, so that b^{x+t} = b^{x}b^{t}. Since for any b we can write b^x = (e^{\log(b)})^x = e^{\log(b) x}, g(F(x)) must be expressible in the form e^{\lambda x}. However, in order for F(x) to actually be the cdf of a random variable (e.g. go to 1 as x goes to infinity and to 0 as x goes to negative infinity), we need the coefficient in the exponent to be negative, so instead we write g(F(x)) as e^{-\lambda x} where \lambda is positive. Thus the cdf is

    \[g(F(x)) = e^{-\lambda x} \leftrightarrow 1 - F(x) = e^{-\lambda x} \leftrightarrow F(x) = 1 - e^{-\lambda x}\]

Which defines the exponential distribution. So if a continuous distribution is memoryless, it has to be exponential! The Poisson process also connects to the uniform distribution: if you randomly select an interval that contains exactly one jump, where in the interval that jump falls is uniformly distributed, which makes sense since jumps are equally likely everywhere.

Now, onto the drinking games. One other thing I read about Poisson processes in the textbook is that the sum of two independent Poisson processes is also a Poisson process. Then, right before all of my friends drew me into the game, I read that if we take the minimum of independent exponential random variables X_1, X_2, ..., X_n with rates \lambda_1, \lambda_2, ..., \lambda_n, then the probability that any one of the X's, say X_k, is the minimum is equal to \frac{\lambda_k}{\lambda_1 + \lambda_2 + ... + \lambda_n}. I won't include a proof of this, but you can find it in the textbook, which I was able to find for free online. After spending some of my evening reading about this, I was roped into a few rounds of Uno followed by a drinking game. In the drinking game, everyone in the group had to try to flip a solo cup upside down from the edge of the table by only touching the bottom; we could attempt flips as fast as we could until one landed, and the last person in the group to land their cup upside down had to drink. Each round, a few people flipped theirs really quickly and a couple took a very long time, and it occurred to me that this was actually very much like a Poisson process! At any point in time after starting, each person is probably about equally likely to jump from cup unflipped to cup flipped, and at any point in time at which you have yet to flip the cup, your expected remaining time to flip it is the same as it was when you first started flipping. If we have n people in the group, each person i has their own rate at which they flip their cup, \lambda_i. So, for example, there is a separate \lambda_{TJ}, \lambda_{Maryah}, \lambda_{Olivia}, \lambda_{Bobby}, \lambda_{Kayla}, \lambda_{Moheed}, \lambda_{Nadia}, \lambda_{Natalie}, \lambda_{Will}.
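As a quick sanity check of that minimum-of-exponentials fact, here's a small R simulation with made-up rates (placeholder numbers, not estimates of anyone's actual flipping ability) comparing the empirical probability that each player flips first against \lambda_k / (\lambda_1 + ... + \lambda_n):

# Monte Carlo check: for independent exponentials with different rates,
# the chance that X_k is the smallest should be lambda_k / sum(lambda).
set.seed(42)
lambda <- c(TJ = 1, Olivia = 3, Kayla = 0.5)  # made-up rates
n_sims <- 100000

# One row per simulated round, one column per player.
draws <- sapply(lambda, function(rate) rexp(n_sims, rate = rate))

# How often is each player the row-wise minimum (i.e. first to flip)?
winner <- factor(colnames(draws)[apply(draws, 1, which.min)], levels = names(lambda))

results <- rbind(empirical = as.numeric(table(winner)) / n_sims,
                 theoretical = as.numeric(lambda / sum(lambda)))
colnames(results) <- names(lambda)
round(results, 3)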

Suppose I have been keeping track of the order in which everyone manages to get their cup flipped throughout all of the times we have played the game together. An interesting question is how that information can be used to estimate the probability that any individual person is the last one in the next round. For example, let's say I have the order in which everyone managed to flip their cup in the previous ten rounds, and the next round is what my friends referred to as the "gravy cup" round, meaning that the person who loses has to drink a shot of gravy mixed with bourbon. Even if we assume that everyone flips faster for the gravy cup round (so this would be kind of the opposite of a "thinned" Poisson process), if everyone's rate goes up proportionally it won't affect the probability that each individual person gets their cup flipped before or after anyone else.

To estimate each person's individual probability of being the last one, first we have to estimate their rates. One way we could do this is to use Cox regression to estimate everyone's rate relative to mine (so set \lambda_{TJ} = 1), and then just use the above fact about the minimum of a bunch of exponentials of different rates to get the probability of being the maximum. I will talk about the first step later, but let me explain the second step first. The second step would of course be very tedious, but I think you could write a big loop to do it. Say I only care about my probability of being the maximum. First I calculate the probability that I am the minimum, P(X_{min} = X_{TJ}) = \lambda_{TJ}/(\lambda_{TJ} + \lambda_{Maryah} + ...), then take one minus that, and then for each of the probabilities that someone else is the minimum, calculate the probability that I am NOT the minimum after that person flipped their cup, and so on until I get the probability that I am the maximum. To illustrate, let's say there are only three people: TJ, Olivia, and Kayla. Then the probability that I am last would be:

    \[\frac{\lambda_{Kayla}}{\lambda_{Kayla} + \lambda_{Olivia} + \lambda_{TJ}}(1-\frac{\lambda_{TJ}}{\lambda_{TJ} + \lambda_{Olivia}}) + \frac{\lambda_{Olivia}}{\lambda_{Kayla} + \lambda_{Olivia} + \lambda_{TJ}}(1-\frac{\lambda_{TJ}}{\lambda_{Kayla} + \lambda_{TJ}})\]

In other words, the probability that I am last is the probability that Kayla flips first times the probability that I am not the second to flip after Kayla lands her flip, plus the probability that Olivia flips first times the probability that I am not the second to flip after Olivia flips first. As you add more people, though, the number of different combinations you have to calculate to get the final probability grows factorially. Note that here we are taking advantage of the memoryless property of the exponential: after Kayla flips first, the probability that I flip before Olivia is the same as the probability that I would have flipped first if Olivia and I had started a two-person game from scratch.
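To make the "big loop" idea concrete, here's a rough R sketch of that calculation written as a recursion over who flips next. The function name and the rates below are placeholders I made up for illustration; the only real ingredients are the minimum-of-exponentials fact and memorylessness (once someone has flipped, the race simply restarts among whoever is left):

# Probability that `target` is the last to flip, given named rates for
# everyone still in the round. At each step, person k flips next with
# probability lambda_k / sum(remaining lambdas); then, by memorylessness,
# we recurse on the players who are left.
prob_last <- function(rates, target) {
  if (length(rates) == 1) {
    return(as.numeric(names(rates) == target))
  }
  others <- setdiff(names(rates), target)
  total <- sum(rates)
  sum(sapply(others, function(person) {
    (rates[[person]] / total) * prob_last(rates[names(rates) != person], target)
  }))
}

# Three-person example with made-up rates (relative to TJ = 1):
prob_last(c(TJ = 1, Olivia = 3.2, Kayla = 0.6), "TJ")  # about 0.345

This has the same factorial blow-up as the by-hand version, since the recursion eventually visits every ordering; it's fine for a table of friends but would not scale much further.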

The first part, estimating the lambdas with Cox regression, isn't difficult to implement once you've conceptualized it. Cox regression estimates the \beta's for time-to-event data, where each e^\beta is how many times higher (or lower) the person's hazard of having an event is for each increase of one in the covariate associated with \beta. Since in this case our hazard is just a constant function that is proportional to \lambda, and since Cox regression only cares about the order of events, not the actual times, all we would have to do is construct a matrix that we can put into software implementing Cox regression to give us what we're looking for. So what we would do is create a matrix where each row is a successful flip and contains first the response, which would be the order statistic of when that person flipped, and then a set of indicator columns for which person it was. For example, if the order was TJ, Olivia, Kayla the first time we played, then Olivia, TJ, Kayla the second time, then Olivia, Kayla, TJ the third time, the matrix would have nine rows: [1, 1, 0, 0], [2, 0, 1, 0], [3, 0, 0, 1], [1, 0, 1, 0], [2, 1, 0, 0], [3, 0, 0, 1], [1, 0, 1, 0], [2, 0, 0, 1], [3, 1, 0, 0], and you don't include TJ as a parameter in the Cox model since it is the baseline category. Some R code that implements this:

library(survival)

# One row per successful flip: the finishing order ("time") plus
# indicator columns for which player the flip belongs to.
data_matrix <- matrix(c(1, 1, 0, 0,
                        2, 0, 1, 0,
                        3, 0, 0, 1,
                        1, 0, 1, 0,
                        2, 1, 0, 0,
                        3, 0, 0, 1,
                        1, 0, 1, 0,
                        2, 0, 0, 1,
                        3, 1, 0, 0),
                      ncol = 4, byrow = TRUE)

df <- as.data.frame(data_matrix)
colnames(df) <- c("time", "TJ", "Olivia", "Kayla")

print(df)

# Every flip is an observed event (no censoring).
surv_obj <- Surv(time = df$time, event = rep(1, nrow(df)))

# TJ is the baseline category, so only Olivia and Kayla enter as covariates.
cox_model <- coxph(surv_obj ~ Olivia + Kayla, data = df, ties = "breslow")

summary(cox_model)

The resulting estimates of the betas are 1.1656 for Olivia and -0.5237 for Kayla; exponentiating these gives estimates of \lambda_{Olivia} = 3.2077 and \lambda_{Kayla} = 0.5923 relative to \lambda_{TJ} = 1, so in this situation we would estimate that I would have to drink the gravy cup with probability

    \[\frac{\lambda_{Kayla}}{\lambda_{Kayla} + \lambda_{Olivia} + \lambda_{TJ}}(1-\frac{\lambda_{TJ}}{\lambda_{TJ} + \lambda_{Olivia}}) + \frac{\lambda_{Olivia}}{\lambda_{Kayla} + \lambda_{Olivia} + \lambda_{TJ}}(1-\frac{\lambda_{TJ}}{\lambda_{Kayla} + \lambda_{TJ}}) =\]

    \[\frac{0.6}{0.6 + 3.2 + 1}(1-\frac{1}{1 + 3.2}) + \frac{3.2}{0.6 + 3.2 + 1}(1-\frac{1}{0.6 + 1}) \approx 0.34\]
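Equivalently, we can let R do the arithmetic by pulling the fitted coefficients out of cox_model and plugging them into the hypothetical prob_last() function sketched earlier:

# exp(coef) gives the estimated rates for Olivia and Kayla relative to TJ = 1.
est_rates <- c(TJ = 1, exp(coef(cox_model)))
prob_last(est_rates, "TJ")  # roughly 0.34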

I could write more about this, but I think this is good for now. I enjoyed reading the Introduction to Stochastic Processes with R chapter for this; I found it very accessible and had no difficulty understanding what was going on. I think any undergrad course in stochastic processes would benefit from using this book. Next I'll see if I can tackle the chapters of the Resnick book on this topic. I think I will condense both chapters into a single blog post, and I think I'll only scratch the surface, as I have plenty of other things I would like to self-study and do during the break, including some nonparametric stuff that I was introduced to at the very end of the causal inference course and am interested in learning more about, as well as preparation for TAing the asymptotics course in the spring, work I need to do for the T32 grant, and the stuff I am doing with Luke for this clustering paper, which has taken a bit of a new direction since cluster ensembling has such a strong connection to network theory (where a 'clustering' corresponds to a set of connected vertices on a graph). Writing this blog post has been fun, and I will continue writing over the course of the break before courses start in the spring. I am also very glad to, as of a couple of days ago, be totally done with courses for the fall semester! I enjoyed my classes, but they weren't easy and I'm glad to have them completed now. I've now finished a hair short of three quarters of my required coursework for the Statistics PhD program (the U of R Statistics PhD requires sixteen 4-credit-hour courses), and I'm excited to see how much I can teach myself over the break and I'm looking forward to next semester. Happy holidays!

Edit: I thought more about the strategy of using Cox regression to estimate the relative differences between the lambdas, and I realized that it would be a good idea to run a simulation study to make sure it works. The first simulation study gave me results that were very counter-intuitive, so I looked up the way the software breaks ties and realized that it automatically uses the Efron approximation (which I had not yet heard of, as I'm not super familiar with survival analysis), but what I wanted was for everyone in the tie to be in the risk set for having an event (i.e. flipping a cup successfully). Once I did that [by changing the ties argument to "breslow", as can be seen above] I got results from my simulation that were better but still not lining up with the true rates. Intuitively it seems to me like this strategy should work, since Cox regression estimates the relative hazard of having an event (e.g. successfully flipping a cup) based on the order in which the events occur, and the hazard function in this case is constant (because we are using the exponential distribution to model event times), so e^{\beta_{Kayla}}, for example, should give the relative hazard of Kayla having an event, which should be proportional to \lambda since \lambda is a rate. Maybe I'm not understanding exactly the relationship between hazard and rate. I'll come back to this.
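For reference, here is a minimal sketch of the kind of simulation study I mean (made-up rates, not necessarily the exact code I ran): simulate a bunch of rounds with known rates, keep only the finishing orders, fit the Cox model with Breslow ties, and compare exp(beta-hat) to the true rates relative to TJ.

library(survival)
set.seed(1)

# True rates relative to TJ = 1 (made up for the simulation).
true_rates <- c(TJ = 1, Olivia = 3, Kayla = 0.5)
n_rounds <- 2000

# One row per player per round: finishing order plus player indicators.
sim_rows <- do.call(rbind, lapply(seq_len(n_rounds), function(r) {
  flip_times <- rexp(length(true_rates), rate = true_rates)
  data.frame(time   = rank(flip_times),  # 1 = first to flip
             event  = 1,
             Olivia = as.numeric(names(true_rates) == "Olivia"),
             Kayla  = as.numeric(names(true_rates) == "Kayla"))
}))

sim_fit <- coxph(Surv(time, event) ~ Olivia + Kayla,
                 data = sim_rows, ties = "breslow")
exp(coef(sim_fit))  # compare to true_rates["Olivia"] and true_rates["Kayla"]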

Edit 2: I thought about it more and realized why Cox isn't going to work for this. After the first 'time', all of the people who flipped first are removed from the risk set; the problem is that, when handling the ties at the second 'time', the Cox model sees people in the risk set who are not actually in the 'risk set' for that particular round. So for example, suppose that over six rounds Olivia failed to flip first twice. Then when the model evaluates Kayla having an event at the second 'time', those two observations of Olivia are sitting in the risk set, and the model treats it as if Kayla had an event BEFORE Olivia in those rounds, even if Kayla never actually had an event before Olivia. This is very disappointing to me, as I thought I had found a cool way to coerce this problem into Cox regression, but sadly I have not. I learned a lesson about trying to apply methods in situations they're not intended for: you have to do it very carefully. I do still think I'm right, though, that the best way to estimate the probability of having to drink the gravy cup is by estimating all of the relative lambdas.

Let's say I want to estimate everyone's lambdas relative to mine, so assume \lambda_{TJ} = 1. By simplifying the problem this way, that is, only thinking about estimating everyone's \lambda's relative to me, we make our task easier. For each other person, we know that P(X_{Other} > X_{TJ}) = \lambda_{TJ}/(\lambda_{TJ} + \lambda_{Other}). And we have an estimate of P(X_{Other} > X_{TJ}): just take the number of times the other person finished after me over the total number of games, and call this \hat{P}. And we know \lambda_{TJ} = 1. So we can solve for \lambda_{Other} to get an estimate of it:

    \[\hat{P} = \frac{\lambda_{TJ}}{\lambda_{TJ} + \lambda_{Other}} \rightarrow \lambda_{TJ} + \lambda_{Other} = \frac{\lambda_{TJ}}{\hat{P}} \rightarrow \lambda_{Other} = \frac{\lambda_{TJ}}{\hat{P}} - \lambda_{TJ}\]

So in the example I gave earlier, I finished before Kayla in two of the three rounds, so \hat{\lambda}_{Kayla} = \lambda_{TJ}/\hat{P} - \lambda_{TJ} = 1/(2/3) - 1 = 0.5, and I finished before Olivia in only one, so \hat{\lambda}_{Olivia} = \lambda_{TJ}/\hat{P} - \lambda_{TJ} = 1/(1/3) - 1 = 2, so the new estimate of my probability of having to drink the gravy cup is

    \[\frac{\lambda_{Kayla}}{\lambda_{Kayla} + \lambda_{Olivia} + \lambda_{TJ}}(1-\frac{\lambda_{TJ}}{\lambda_{TJ} + \lambda_{Olivia}}) + \frac{\lambda_{Olivia}}{\lambda_{Kayla} + \lambda_{Olivia} + \lambda_{TJ}}(1-\frac{\lambda_{TJ}}{\lambda_{Kayla} + \lambda_{TJ}}) =\]

    \[\frac{0.5}{0.5 + 2 + 1}(1-\frac{1}{1 + 2}) + \frac{2}{0.5 + 2 + 1}(1-\frac{1}{0.5 + 1}) = \frac{2}{7}\]
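In code, this back-of-the-envelope estimator is only a couple of lines; here it is for the three-person example, reusing the hypothetical prob_last() function sketched earlier:

# Fraction of rounds in which TJ finished before each other player.
p_hat <- c(Olivia = 1/3, Kayla = 2/3)

# Since P(TJ before Other) = 1 / (1 + lambda_Other), solve for the rate.
lambda_hat <- 1 / p_hat - 1  # Olivia = 2, Kayla = 0.5

prob_last(c(TJ = 1, lambda_hat), "TJ")  # 2/7, about 0.286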

Even though this leaves me mostly satisfied, I still dislike not using all of the information available about the other players' performances relative to each other. But I'm going to move on, because it's not something worth getting hung up over. Although there's probably a better way to estimate what I'm trying to estimate than what I've just suggested, I still had a fun time thinking this out. I definitely picked the right field by going into statistics.
