Statistics and Philosophy

8:06 PM Wednesday, July 30, 2025

This blog post is going to be a bit different from most of my previous posts, except perhaps the most recent one, although I hope to get back to covering more interesting statistical topics in the near future once I’ve passed my second round of qualifying exams, which I’m currently in the midst of preparing for. One question I am often asked when I tell people that I am a statistician is something to the effect of “What is statistics (as a field in and of itself)?” or “What is it you actually do?” In this blog post I’ll seek to answer that question in perhaps the most roundabout way possible, while also touching on what I have come to believe is the purpose of my life. The significant developments and events in a person’s life often leave them questioning what all of it is for. With this type of spiritual and intellectual questioning in mind, I read Viktor Frankl’s “Man’s Search for Meaning,” a significant portion of which is devoted to a reframing of the question: instead of asking “what is the meaning of life?” we ought to ask “what is the meaning of MY life?” because the answer isn’t going to be the same for everyone. This put me onto another line of thought: what is it that I was put on this Earth to do? What is my unique gifting or purpose? What is it that sets me apart from all of the other people who live and breathe and toil and build in the world? What pursuits do I find inherently valuable and stimulating?

I thought about these questions for a long time, and a number of different things came to mind: creating a masterpiece of some nature, having as many different and good experiences as possible, achieving as high a position of leadership as possible, doing as much good for others as I can, et cetera. Each of these called to me for its own reasons, and each of them is, for me, something that I will pursue to some extent, but none of them spoke specifically to me and the gifts that I have. Then I thought of something that really stuck with me, an idea that floated up from my memories of being a summer intern at Excellus Blue Cross Blue Shield. While I worked there, I had the opportunity to take the CliftonStrengths 2.0 assessment twice, once during each of the two summers I interned. The first time I took it, my top strength was Learner; the second time, my top strength was Context, and my number-two strength was Learner. The meaning of the Learner strength is self-evident, and the Context strength is the same idea applied to the history or story of a person, organization, or thing. Which leads me to what I think it is that I’m here for: to learn and appreciate as much as I can about the amazing truths of the universe, of mathematics, of philosophy, of literature and poetry, of religion of all kinds, of the Earth and its incredible abundance of beautiful nature and life, of the sciences of physics and astronomy and chemistry and biology, of the rich history of human civilization, of the incomprehensibly complex human mind, of the intricate and unique nature of the people whose lives I’m blessed to be a part of (including myself), and of every other interesting topic that we have seen fit to study in the brief time since animals first opened their eyes and recognized the miracle of their own existence. And not just to memorize facts, but to reason about and question these things, to discuss them, to apply them and draw analogies between them, to pull them apart and put them back together and tease out their most fundamental aspects, which is what it really means to learn something. And, of course, I don’t believe that anything is good for a person to the exclusion of all else. I also want to continue to invest in meaningful connections with friends and family; I want to contribute uniquely to the world; I’d like to stop and smell the roses and not move too fast to enjoy myself and my experiences (“Life moves pretty fast. If you don’t stop and look around once in a while, you could miss it.”); I want to someday be married and have children; I’d like to exhibit leadership in some capacity; and I’d like to have a positive impact on the lives of others. All the things that contribute to a well-lived life.

I am fortunate enough to be in a position to learn about one of the most interesting fields of all: statistics. Now, statistics may not seem terribly interesting to someone who is unfamiliar with it, but it actually sits at a critical intersection and forms a basis for thinking about all knowledge. An interesting thought about the field occurred to me while I was reading a Wikipedia article about Kant. The article discussed how Kant delineated two kinds of knowledge, which he called analytic and synthetic. Analytic knowledge can be known a priori, outside of sensory experience: logical and mathematical truths that can be proven from their definitions. For example, if a rational number is defined as one that can be written as a ratio of two integers with a non-zero denominator (a ratio we can always reduce so that the two integers share no common factor other than 1), we can prove that there is no rational number which, multiplied by itself, equals two; that is, the irrationality of the square root of two is an analytic truth. Synthetic knowledge cannot be known a priori; it requires sensory experience. Truths about chemistry, physics, biology, and the other sciences are synthetic knowledge.

Statistics finds itself as the bridge between mathematics, the field of analytic knowledge, and the other sciences, which are fields of synthetic knowledge learned through experimentation. It is an analytic field applied to finding synthetic truth. Statistics is primarily concerned with posing models of real-world phenomena (or, technically, noumena, according to Kant, since a model posits a process by which the thing-in-itself operates) that are formulated mathematically, and then using the results of real-world experiments to ‘check’ whether or not the model is adequate. This gives us knowledge about the operation of the real-world process, often with a certain “probability” in the sense that “if the model is true, the probability of seeing this result is p = …”. Often, knowledge is gathered through statistics by posing a particular model and then rejecting a particular truth within that model on the grounds that, if that particular truth were true, the data would be “too improbable”. For example, suppose we have a linear model for house price and we want to determine the effect of the number of bathrooms. We pose a model in which bathrooms and many other factors have a linear effect on house price, and we look at how likely we would be to see the data we see if bathrooms had no effect on price (the model where the coefficient on bathrooms equals zero). If the data we see is highly improbable under that model (that is, the p-value is very low), then the model is probably wrong, and so we reject it. In this way, rejection of models and analytic knowledge (“the probability of xyz data given abc model is mnop” is analytic) helps us learn synthetic truths (“a house with more bathrooms costs more” is synthetic). Oft-used “confidence intervals” for the coefficient are constructed by an extension of this type of thinking: “the set of possible coefficient values / models for which the probability of xyz data is above mnop is qrs to tuv”, where qrs to tuv is the confidence interval.
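To make this concrete, here is a minimal sketch of the frequentist version of the bathroom example in Python, using statsmodels on simulated data. Every number in it (the $5,000 bathroom effect, the noise level, the square-footage distribution) is invented purely for illustration:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 500

# Invented covariates for n houses (nothing here is real data).
sqft = rng.normal(1800, 400, n)
bathrooms = rng.integers(1, 5, n)

# The "true" process: each bathroom adds $5,000, plus noise.
price = 50_000 + 100 * sqft + 5_000 * bathrooms + rng.normal(0, 20_000, n)

# Fit the linear model: price ~ intercept + sqft + bathrooms.
X = sm.add_constant(np.column_stack([sqft, bathrooms]))
fit = sm.OLS(price, X).fit()

# The p-value answers: "if the bathrooms coefficient were really zero,
# how improbable would data like ours be?"
print("estimated bathrooms coefficient:", fit.params[2])
print("p-value for a zero coefficient:", fit.pvalues[2])

# The 95% confidence interval: the set of coefficient values that the
# data would NOT reject at the 5% level.
print("95% confidence interval:", fit.conf_int()[2])
```

With a low p-value we would reject the zero-coefficient model, and the confidence interval is exactly the “set of models the data does not rule out” described above.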

Another approach to modeling, as an alternative to p-value-type thinking, is the Bayesian approach, which I am spending a lot of time studying currently since it is one of the courses that will be tested on the aforementioned second-round qualifying exam (the Advanced Exam) that I have to take in about a month. In the Bayesian approach there is an additional step, the specification of a prior distribution, which represents prior beliefs about the parameter of interest and allows for an interpretation of the results that can be more intuitive than that used by mainstream p-value-ists (whom Bayesians refer to as “frequentists” in much the same way that ADHD/autistic people refer to non-ADHD/autistic people as “neurotypical”: “frequentist” really just means “not a Bayesian”). So, for example, let’s say we know that the addition of a single bathroom to a home is highly unlikely to affect the price by more than $120,000. We set a ‘prior’ distribution for the parameter to be a normal distribution centered at zero with a standard deviation of $40,000 (so that effects beyond $120,000, three standard deviations out, are highly improbable), we specify the same linear form for the model, and our result is interpreted as “assuming jkl prior knowledge about the coefficient, the probability the coefficient is above zero is mnop, and the probability it is between qrs and tuv is 0.95”. This Bayesian approach is appealing in that its explanations make reference to probabilities of the coefficient, rather than probabilities of the data given that the coefficient has such-and-such a value (the frequentist approach). The oft-cited drawback of the Bayesian approach is that one has to make a semi-arbitrary judgment about one’s prior knowledge, although I find this drawback not very important because, in the first place, modern Bayesians typically use ‘non-informative priors’ designed to affect conclusions as little as possible and, in the second place, the assumption of a prior is not terribly different from the semi-arbitrary specification of any model for any real-world process that frequentists make.

More significant drawbacks are a little more complicated to explain, but they have to do with the blessing and the curse of Bayesian modeling: MCMC, the method by which most Bayesian models are fit. MCMC is wonderful in that it allows one to specify a model and circumvent the difficulties posed by the often heavy mathematics required for a maximum likelihood approach when there are many parameters to consider: one simply specifies the priors, specifies the likelihood, and out come samples from the posterior distribution of the parameters, which are used to draw conclusions about them. MCMC is dreadful in that it is often difficult to tell whether the chain has ‘converged’, that is, whether it has been running ‘long enough’ that the point where one chose to begin ‘sampling’ from the posterior is irrelevant; reaching convergence can take a long time, which can make MCMC highly impractical in a simulation study. A lot of Bayesian theoretical work has been devoted to ‘convergence diagnostics’, methods for determining whether MCMC chains have converged.
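As a rough illustration of the whole prior-in, posterior-samples-out pipeline, here is a hand-rolled random-walk Metropolis sampler (the simplest MCMC algorithm) for a one-coefficient version of the bathroom model. This is a toy sketch under invented numbers (known noise level, no intercept, no other covariates); a real analysis would use a mature tool like Stan or PyMC rather than anything this bare-bones:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: price deviations explained by bathroom count alone,
# with a "true" coefficient of $5,000 and a known noise sd of $20,000.
n = 200
bathrooms = rng.integers(1, 5, n).astype(float)
price = 5_000 * bathrooms + rng.normal(0, 20_000, n)

def log_prior(beta):
    # Normal(0, sd = $40,000) prior: a bathroom almost surely changes
    # the price by less than ~$120,000 (three standard deviations).
    return -0.5 * (beta / 40_000) ** 2

def log_likelihood(beta):
    resid = price - beta * bathrooms
    return -0.5 * np.sum((resid / 20_000) ** 2)

def metropolis(start, n_iter=20_000, step=1_000):
    beta, samples = start, []
    lp = log_prior(beta) + log_likelihood(beta)
    for _ in range(n_iter):
        prop = beta + rng.normal(0, step)            # random-walk proposal
        lp_prop = log_prior(prop) + log_likelihood(prop)
        if np.log(rng.uniform()) < lp_prop - lp:     # accept or reject
            beta, lp = prop, lp_prop
        samples.append(beta)
    return np.array(samples)

# Two chains from very different starting points: if the MCMC has
# converged, where each chain started should no longer matter.
chain_a = metropolis(start=0.0)[5_000:]              # drop burn-in
chain_b = metropolis(start=100_000.0)[5_000:]

print("posterior mean (chain A):", chain_a.mean())
print("posterior mean (chain B):", chain_b.mean())
print("P(coefficient > 0):", (chain_a > 0).mean())
print("95% credible interval:", np.quantile(chain_a, [0.025, 0.975]))
```

Running two chains from wildly different starting values and checking that they land in the same place is the informal version of what formal convergence diagnostics (the Gelman-Rubin R-hat statistic and friends) make rigorous.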
One thing that frequentists are often concerned with is the ‘rate of convergence’ of an estimator (different from MCMC convergence): frequentists want to know, as the sample size increases, how quickly a method of estimating an unknown thing approaches the truth of that unknown thing. For example, returning to the house example: instead of using real data, we generate fake data for which we KNOW the true value of the coefficient on bathrooms is $5,000. We generate one thousand fake data sets at each of the sample sizes 100, 200, 300, 400, and 500 houses, run our frequentist method on each, and see that as the sample size increases from 100 to 500 we become more and more likely to draw the correct conclusion that the coefficient on bathrooms is non-zero, and our ‘estimate’ of the coefficient gets close to $5,000 quickly enough. (What is meant by “quickly enough” could be the subject of another blog post, but it often has to do with ‘quickly enough that the CLT kicks in’ or ‘at a root-n rate’, which I won’t get into here.) We want to ensure this is also true for the Bayesian method, but that is more difficult because every single MCMC takes a long time to run; running our MCMC five thousand times can be highly impractical, depending on how many other parameters there are in the model.
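Here is roughly what that frequentist simulation study looks like in code (numbers invented as before). The Bayesian version would have to wrap an entire MCMC run inside the inner loop, which is exactly why repeating it five thousand times hurts:

```python
import numpy as np

rng = np.random.default_rng(1)
TRUE_BETA = 5_000  # the known coefficient we build into the fake data

def simulate_and_estimate(n):
    # One fake data set of n houses, fit by ordinary least squares.
    sqft = rng.normal(1800, 400, n)
    baths = rng.integers(1, 5, n).astype(float)
    price = 50_000 + 100 * sqft + TRUE_BETA * baths + rng.normal(0, 20_000, n)
    X = np.column_stack([np.ones(n), sqft, baths])
    coefs, *_ = np.linalg.lstsq(X, price, rcond=None)
    return coefs[2]  # the estimated bathrooms coefficient

for n in [100, 200, 300, 400, 500]:
    estimates = np.array([simulate_and_estimate(n) for _ in range(1_000)])
    # The spread of the estimates around $5,000 should shrink
    # roughly like 1/sqrt(n): the "root-n rate".
    print(f"n={n}: mean estimate {estimates.mean():8.0f}, "
          f"sd {estimates.std():7.0f}")
```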

A lot of the above is oversimplified and is meant more to convey an intuitive understanding of statistical thinking than to be mathematically precise (I haven’t even shown any real math, just a few toy code sketches!). All this is said, really, to say that statistics is a rich and interesting field that uses math and analytic knowledge to learn about the real world, that there are a TON of interesting real-world problems for which statistical thinking is helpful in learning about underlying unobservable processes, and that as a result it’s a cool place to be, with a lot here to satiate my desire to be the aperture for knowledge that I’ve come to believe is the anchoring component of my adult life.

Now onto more studying!
