In a frequentist setting, a statistician or econometrician typically commits to a single parameter estimate $\hat{\theta}$ after observing $y$ and $X$, and uses that estimate to assign probabilities to outcomes. For unobserved data $\tilde{y}, \tilde{X}$, the likelihood is therefore

$$P(\tilde{y} \mid \hat{\theta}, \tilde{X})$$
However, in a dynamic real-world environment it is often unrealistic to assume that the parameter remains fixed, since the underlying conditions continually change.
Posterior Predictive
Instead of committing to one parameter to compute the likelihood of unobserved data, the posterior predictive computes a weighted average of the likelihoods across all possible $\theta$, weighted by the posterior given a hyperparameter $\alpha$:
$$P(\tilde{y} \mid \tilde{X}, y, X, \alpha) = \int P(\tilde{y} \mid \theta, \tilde{X})\, P(\theta \mid y, X, \alpha)\, d\theta$$
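When the integral has no closed form, it can be approximated by Monte Carlo: draw $\theta$ from the posterior and average the likelihoods. A minimal one-dimensional sketch, where the posterior, noise variance, and test point are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: posterior over theta is N(1.0, 0.1^2), and the
# likelihood of a new point is y~ | theta, x~ ~ N(theta * x~, 1).
post_mean, post_sd = 1.0, 0.1
x_tilde, y_tilde = 2.0, 2.3

# Monte Carlo posterior predictive:
# P(y~ | x~, y, X) ≈ (1/S) sum_s P(y~ | theta_s, x~), theta_s ~ posterior
thetas = rng.normal(post_mean, post_sd, size=100_000)
liks = np.exp(-0.5 * (y_tilde - thetas * x_tilde) ** 2) / np.sqrt(2 * np.pi)
mc_estimate = liks.mean()

# Exact answer for comparison: y~ | x~ ~ N(post_mean * x~, 1 + x~^2 * post_sd^2)
var = 1.0 + x_tilde**2 * post_sd**2
exact = np.exp(-0.5 * (y_tilde - post_mean * x_tilde) ** 2 / var) / np.sqrt(2 * np.pi * var)
```

With enough samples the Monte Carlo average matches the exact Gaussian predictive density closely.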
Bayesian Linear Regression
Consider the following linear framework: $y = X\theta + U$, where

$$U \mid X \sim N(0, \sigma^2 I), \qquad \theta \sim N(0, \lambda^{-1} I)$$
Note that $U \mid X \sim N(0, \sigma^2 I)$ implies $y \mid X, \theta \sim N(X\theta, \sigma^2 I)$. Then, the posterior has the form
$$\theta \mid y, X \sim N\!\left(\theta_{MAP},\ \left(\tfrac{1}{\sigma^2} X^\top X + \lambda I\right)^{-1}\right), \qquad \theta_{MAP} = \left(X^\top X + \sigma^2 \lambda I\right)^{-1} X^\top y$$
$$\tilde{y} \mid X, y, \tilde{X} \sim N\!\left(\theta_{MAP}^\top \tilde{X},\ \sigma^2 + \tilde{X}^\top \left(\tfrac{1}{\sigma^2} X^\top X + \lambda I\right)^{-1} \tilde{X}\right)$$
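These closed forms translate directly into NumPy. A minimal sketch on simulated data; the dimensions, $\sigma^2$, $\lambda$, and the true coefficients are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate y = X theta + U with U ~ N(0, sigma^2 I) (hypothetical values)
n, d = 200, 3
sigma2, lam = 0.25, 1.0
theta_true = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(n, d))
y = X @ theta_true + rng.normal(scale=np.sqrt(sigma2), size=n)

# Posterior: theta | y, X ~ N(theta_MAP, ((1/sigma^2) X^T X + lam I)^{-1})
A = X.T @ X / sigma2 + lam * np.eye(d)
cov_post = np.linalg.inv(A)
theta_map = np.linalg.solve(X.T @ X + sigma2 * lam * np.eye(d), X.T @ y)

# Predictive for a new input x~: N(theta_MAP^T x~, sigma^2 + x~^T A^{-1} x~)
x_new = np.array([0.5, 1.0, -1.0])
pred_mean = theta_map @ x_new
pred_var = sigma2 + x_new @ cov_post @ x_new
```

Note that the predictive variance is always at least the noise variance $\sigma^2$; the second term adds the uncertainty about $\theta$ itself.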
Gaussian Process
A Gaussian process is a distribution over functions.

For a more general framework that extends to the non-linear case, suppose $y = f(X) + U$. Then $f$ is modeled as a Gaussian process:

$$f(X) \sim N(m(X), K(X, X'))$$
Linear Basis Function
For example, $f$ can depend on a parameter $\theta \sim N(0, \lambda^{-1} I)$. Consider the case $f(X) = \theta^\top \phi(X)$. Then the mean function is

$$m(X) = E(\theta^\top \phi(X)) = E(\theta^\top)\phi(X) = 0$$

and the covariance follows from the prior on $\theta$: $\mathrm{cov}(f(X), f(X')) = \lambda^{-1} \phi(X)^\top \phi(X')$.
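The implied covariance $\mathrm{cov}(f(x), f(x')) = \lambda^{-1}\phi(x)^\top \phi(x')$ can be checked by simulation. A sketch with an assumed polynomial basis $\phi(x) = (1, x, x^2)$ and an illustrative $\lambda$:

```python
import numpy as np

rng = np.random.default_rng(2)

# f(x) = theta^T phi(x), theta ~ N(0, lam^{-1} I), with a hypothetical basis
lam = 2.0
phi = lambda x: np.array([1.0, x, x**2])

x1, x2 = 0.5, -1.0
analytic = phi(x1) @ phi(x2) / lam          # lam^{-1} phi(x1)^T phi(x2)

# Monte Carlo: sample theta, form f(x1), f(x2), and estimate their covariance
thetas = rng.normal(scale=lam**-0.5, size=(200_000, 3))
f1 = thetas @ phi(x1)
f2 = thetas @ phi(x2)
empirical = np.mean(f1 * f2)                # E[f1 f2]; means are zero
```

The empirical covariance of the sampled function values matches the analytic kernel value.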
Kernel
The kernel $K$ determines the covariance of the function values, $K(X, X') = \mathrm{cov}(f(X), f(X'))$. A common choice is the RBF kernel:

$$K(X, X') = \alpha^2 \exp\!\left(-\frac{\lVert X - X' \rVert^2}{2 l^2}\right)$$
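A minimal sketch of building an RBF Gram matrix and drawing one function from the GP prior; the grid, $\alpha$, and length-scale $l$ are illustrative assumptions:

```python
import numpy as np

# RBF kernel K(x, x') = alpha^2 exp(-||x - x'||^2 / (2 l^2)), 1-D inputs
def rbf(X1, X2, alpha=1.0, length=0.5):
    sq = (X1[:, None] - X2[None, :]) ** 2
    return alpha**2 * np.exp(-sq / (2 * length**2))

xs = np.linspace(0, 1, 50)
K = rbf(xs, xs)

# Sample f ~ N(0, K) via Cholesky; a small jitter keeps the near-singular
# Gram matrix numerically positive definite.
rng = np.random.default_rng(3)
L = np.linalg.cholesky(K + 1e-8 * np.eye(len(xs)))
f = L @ rng.normal(size=len(xs))
```

Smaller length-scales yield rougher sampled functions; $\alpha$ scales their amplitude.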
Empirical Bayes
Instead of using cross-validation to find the best hyperparameter $\alpha$, which can potentially be biased towards the validation set, an alternative approach is Empirical Bayes: maximize the marginal likelihood (evidence) of the observed data.
$$\hat{\alpha} \in \arg\max_{\alpha} \left\{ P(y \mid X, \alpha) \right\}$$

$$P(y \mid X, \alpha) = \int P(y \mid X, \theta)\, P(\theta \mid \alpha)\, d\theta$$
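For the Bayesian linear model above, the evidence is available in closed form: integrating out $\theta \sim N(0, \lambda^{-1} I)$ gives $y \mid X, \lambda \sim N(0,\ \lambda^{-1} X X^\top + \sigma^2 I)$. A sketch of Empirical Bayes by grid search over $\lambda$, with simulated data and illustrative values of $n$, $d$, $\sigma^2$, and the true $\lambda$:

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulate from the model: theta ~ N(0, lam^{-1} I), U ~ N(0, sigma^2 I)
n, d, sigma2 = 200, 40, 0.5
lam_true = 4.0
X = rng.normal(size=(n, d))
theta = rng.normal(scale=lam_true**-0.5, size=d)
y = X @ theta + rng.normal(scale=np.sqrt(sigma2), size=n)

def log_evidence(lam):
    # log N(y; 0, lam^{-1} X X^T + sigma^2 I)
    C = X @ X.T / lam + sigma2 * np.eye(n)
    sign, logdet = np.linalg.slogdet(C)
    return -0.5 * (logdet + y @ np.linalg.solve(C, y) + n * np.log(2 * np.pi))

# Empirical Bayes: pick the hyperparameter that maximizes the evidence
grid = np.logspace(-2, 2, 81)
lam_hat = grid[np.argmax([log_evidence(l) for l in grid])]
```

No validation split is needed: the evidence automatically penalizes hyperparameters that make the prior too tight or too loose for the observed data.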