Thursday, December 9, 2021

Poisson-binomial Stochastic Processes: Introduction, Simulations, Inference

In simple English, we introduce a special, off-the-beaten-path type of point process called Poisson-binomial. We analyze its properties and perform simulations to see the distribution of points that it generates, in one and two dimensions, as well as to make inference about its parameters. Statistics of interest include distance to nearest neighbors or interarrival times  (some times called increments).  Combined with radial processes, it can be used to model complex systems involving clustering, for instance the distribution of matter in the universe. Limiting cases are Poisson processes, and this may lead to what is probably the simplest definition of of stochastic, stationary Poisson point process. Our approach is not based on traditional statistical science, but instead on modern machine learning methodology. Some of the tests performed reveal strong patterns invisible to the naked eye.

Probably the biggest takeaway from this article is its tutorial value. It shows how to handle a machine learning problem involving the construction of a new model, from start to finish, with all the steps described at a high level, accessible to professionals with one year worth of academic training in quantitative fields. Numerous references are provided to help the interested reader dive into the technicalities. This article covers a large amount of concepts in a compact style: for instance, stationarity, isotropy, intensity function, paired and unpaired data, order statistics, hidden processes, interarrival times, point count distribution, model fitting, model-free confidence intervals, simulations, radial processes, renewal process, nearest neighbors, model identifiability, compound or hierarchical processes, Mahalanobis distance, rescaling, and more. It can be used as a general introduction to point processes. 

Related probability distributions include logistic, Laplace, uniform, Cauchy, Poisson, Erlang, Poisson-binomial (the distribution these processes are named after), and many that don't have a name. These distributions are used in this article. Part 1 of this article can be found here. The link below points to part 2.

The accompanying spreadsheet has its own tutorial value, as it uses powerful Excel functions that are overlooked by many. Source code is also provided and included in the spreadsheet.


Read the full article here.



Content

1. Unpaired two-dimensional processes

2. Cluster and hierarchical point processes

  • Radial process
  • Basic cluster process
  • The spreadsheet
3. Machine learning inference
  • Interarrival time, and estimation of the intensity
  • Estimation of the scaling factor
  • Confidence intervals, model fitting, and identifiability issues
  • Estimation in higher dimensions
  • Results from some testing, including about radiality
  • Distributions related to the inverse or hidden process
  • Convergence to the Poisson process

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.

Fuzzy Regression: A Generic, Model-free, Math-free Machine Learning Technique

  A different way to do regression with prediction intervals. In Python and without math. No calculus, no matrix algebra, no statistical eng...