java - Calculate the probability of a bivariate normal distribution over an area / polygon
I'm trying to calculate the probability of a bivariate normal distribution over a specific area, more precisely a specific polygon, in Java.
Mathematically, this means integrating the probability density function (PDF) of the bivariate normal distribution over a specific complex area.
My first approach was to use two NormalDistribution objects with the help of the apache-commons-math library. Given a dataset X for dimension 1 and a dataset Y for dimension 2, I computed the mean and standard deviation for each NormalDistribution.
With the method public double probability(double x0, double x1) of org.apache.commons.math3.distribution.NormalDistribution I'm able to set an individual interval for each dimension, which means I can define a rectangular area and get its probability:
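    import org.apache.commons.math3.distribution.NormalDistribution;

    // One univariate normal per dimension
    NormalDistribution normalX = new NormalDistribution(means[0], stdDeviation_x);
    NormalDistribution normalY = new NormalDistribution(means[1], stdDeviation_y);
    // Probability of the rectangle [x1, x2] x [y1, y2]
    double probabilityOfRect = normalX.probability(x1, x2) * normalY.probability(y1, y2);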
If the standard deviations are small enough and the defined region is large enough, the probability approaches 1.0 (0.99999999999), as expected.
As I said, I need to compute the probability over a specific area, so this first approach won't work because I'm only able to define rectangular areas.
So my second approach was to use the class MultivariateNormalDistribution, which is also implemented in apache-commons-math.
By constructing a MultivariateNormalDistribution from the vector of means and the covariance matrix, I'm able to get the PDF at a specific point x via public double density(double[] vals), whose description says:
"Returns the probability density function (PDF) of this distribution evaluated at the specified point x."
In this approach I convert the complex area into an ArrayList of points and then sum the densities by iterating over the ArrayList like this:
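    import org.apache.commons.math3.distribution.MultivariateNormalDistribution;

    MultivariateNormalDistribution mnd = new MultivariateNormalDistribution(means, covariances);
    double sum = 0.0;
    for (Point p : complexArea) {
        double[] pos = {p.x, p.y};
        sum += mnd.density(pos); // density at the point, not a probability
    }
    return sum;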
But I've encountered a problem with precision: when the standard deviations are set to low values, the PDF contains peaks > 1 at the positions where I'm calling mnd.density(pos), so the sum ends up adding values > 1.
To avoid these peaks, I tried summing, for each point, the average of the densities at surrounding points sampled in double precision around the current point:

    MultivariateNormalDistribution mnd = new MultivariateNormalDistribution(means, covariances);
    double sum = 0.0;
    for (Point p : surfacePoints) {
        double tmpRes = 0.0;
        // 10 x 10 sub-samples spaced 0.1 apart in the unit square around p;
        // integer counters avoid floating-point drift in the loop bounds
        for (int i = 0; i < 10; i++) {
            for (int j = 0; j < 10; j++) {
                double[] pos = {p.x - 0.5 + 0.1 * i, p.y - 0.5 + 0.1 * j};
                tmpRes += mnd.density(pos);
            }
        }
        sum += tmpRes / 100.0; // average of the 100 samples
    }
    return sum;

This works.
All in all, I'm not quite sure whether these approaches are fundamentally correct. Another approach would be to compute the probability by numerical integration, but I'm clueless about how to achieve that in Java.
Are there other possibilities to achieve this?
Edit: Besides the lack of accuracy, my main question is: is the second approach of "summing the densities" a valid method to obtain the probability over an area of a bivariate normal distribution? Thinking of one-dimensional normal distributions, the probability of one specific point is 0. How does the public double density(double[] vals) method in the Apache math library obtain a valid value?
Your current approach performs a numerical integration by sampling at points with integer coordinates and assigning the value at each point to the whole unit square around it. This has two main sources of error. One is that the function may vary a lot within a square. The other is at the boundary, where you integrate over squares that are not fully contained in the region. A third source of error is roundoff, but it is not significant here since the other sources of error are huge.
One simple way to reduce the error is to use a finer grid. If you sample at points whose coordinates are integers divided by n (and multiply by the area n^-2 of the 1/n by 1/n squares), this reduces both sources of error. The problem is that you then have to sample at n^2 times as many points.
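As a rough illustration of the finer-grid idea, here is a minimal sketch in plain Java. It does not use Apache Commons Math: the density is written out by hand and assumes zero correlation between the two dimensions. The names GridSumSketch, Region, density and gridProbability are made up for this example. It also shows why density values above 1 are not a problem in themselves: each density is multiplied by the cell area before it contributes to the probability.

```java
class GridSumSketch {

    /** Hypothetical region test: true if (x, y) lies inside the area of interest. */
    interface Region {
        boolean contains(double x, double y);
    }

    /** Bivariate normal density, assuming independent components (zero correlation). */
    static double density(double x, double y, double mx, double my, double sx, double sy) {
        double zx = (x - mx) / sx;
        double zy = (y - my) / sy;
        return Math.exp(-0.5 * (zx * zx + zy * zy)) / (2.0 * Math.PI * sx * sy);
    }

    /**
     * Riemann sum over the bounding box [x0, x1] x [y0, y1] using square cells of side 1/n.
     * A cell counts if its center is inside the region; each density value is
     * weighted by the cell area (1/n)^2.
     */
    static double gridProbability(Region region, double x0, double y0, double x1, double y1,
                                  int n, double mx, double my, double sx, double sy) {
        double h = 1.0 / n;
        int nx = (int) Math.ceil((x1 - x0) * n);
        int ny = (int) Math.ceil((y1 - y0) * n);
        double sum = 0.0;
        for (int i = 0; i < nx; i++) {
            for (int j = 0; j < ny; j++) {
                double cx = x0 + (i + 0.5) * h; // cell center
                double cy = y0 + (j + 0.5) * h;
                if (region.contains(cx, cy)) {
                    sum += density(cx, cy, mx, my, sx, sy);
                }
            }
        }
        return sum * h * h;
    }

    public static void main(String[] args) {
        // Unit disk around the mean of a standard bivariate normal
        Region disk = (x, y) -> x * x + y * y <= 1.0;
        for (int n : new int[] {1, 10, 100}) {
            System.out.println("n = " + n + ": "
                    + gridProbability(disk, -1, -1, 1, 1, n, 0, 0, 1, 1));
        }
        // A density value can exceed 1 when the standard deviations are small
        System.out.println("peak density: " + density(0, 0, 0, 0, 0.1, 0.1));
    }
}
```

For the unit disk and a standard bivariate normal the exact probability is 1 - exp(-1/2), so the printed values can be compared against that as n grows.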
Instead, I suggest writing the double integral over the region as an integral of integrals.
The inner integral (say, with respect to x) is an integral of a one-dimensional Gaussian over an interval if the region is convex, or at worst over a finite list of intervals. That is, you integrate the PDF restricted to a particular y-coordinate y0 along the intersection of the polygon with the horizontal line y=y0. You can evaluate these inner integrals using functions such as erf, which is numerically approximated in many libraries, or you can use a one-dimensional numerical integral.
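Written out, the decomposition and the closed form of the inner integral look roughly as follows. The notation is my own assumption: means \mu_x, \mu_y, standard deviations \sigma_x, \sigma_y, correlation \rho, a convex region A whose cross-section at height y is the interval [a(y), b(y)], and \Phi the standard normal CDF.

```latex
P(A) = \int_{y_{\min}}^{y_{\max}} \left( \int_{a(y)}^{b(y)} f(x, y)\, dx \right) dy

\int_{a}^{b} f(x, y_0)\, dx
  = \frac{1}{\sigma_y \sqrt{2\pi}}
    \exp\!\left( -\frac{(y_0 - \mu_y)^2}{2 \sigma_y^2} \right)
    \left[ \Phi\!\left( \frac{b - m}{s} \right) - \Phi\!\left( \frac{a - m}{s} \right) \right]

m = \mu_x + \rho\, \frac{\sigma_x}{\sigma_y}\, (y_0 - \mu_y), \qquad
s = \sigma_x \sqrt{1 - \rho^2}, \qquad
\Phi(z) = \tfrac{1}{2} \left( 1 + \operatorname{erf}\!\left( z / \sqrt{2} \right) \right)
```

The second line uses the fact that, for fixed y_0, the density factors into the marginal density of y_0 times the conditional normal density of x with mean m and standard deviation s.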
The outer integral (say, with respect to y) naturally breaks into pieces. Wherever there is a vertex of the polygon, the function inside the outer integral might not be smooth. So, break up the outer integral at the y-coordinates of the vertices of the polygon, and use a numerical integration rule such as the trapezoid rule or Simpson's rule on each of these intervals. These rules require you to evaluate the inner integral at a few points in each interval and weight the values appropriately.
This should produce much more accurate results in a given amount of time than refining the grid.
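The steps above can be sketched in plain Java as follows. This is only an illustration under several assumptions: the polygon is convex, the distribution is parametrized by means, standard deviations and a correlation rho (rather than a covariance matrix), and erf is replaced by the Abramowitz and Stegun approximation 7.1.26 (absolute error around 1.5e-7) so that the example has no dependencies. With Apache Commons Math on the classpath, org.apache.commons.math3.special.Erf.erf could be used instead. The names PolygonGaussianSketch, inner and probability are made up for this example.

```java
import java.util.TreeSet;

class PolygonGaussianSketch {

    /** erf approximation (Abramowitz and Stegun 7.1.26), a stand-in for a library erf. */
    static double erf(double x) {
        double t = 1.0 / (1.0 + 0.3275911 * Math.abs(x));
        double poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741
                + t * (-1.453152027 + t * 1.061405429))));
        double y = 1.0 - poly * Math.exp(-x * x);
        return x >= 0 ? y : -y;
    }

    /** Standard normal CDF. */
    static double cdf(double z) {
        return 0.5 * (1.0 + erf(z / Math.sqrt(2.0)));
    }

    /** Inner integral over x along the line y = y0, restricted to a convex polygon. */
    static double inner(double[] px, double[] py, double y0,
                        double mx, double my, double sx, double sy, double rho) {
        double a = Double.POSITIVE_INFINITY;
        double b = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < px.length; i++) {
            int j = (i + 1) % px.length;
            if (y0 < Math.min(py[i], py[j]) || y0 > Math.max(py[i], py[j])) continue;
            if (py[i] == py[j]) { // horizontal edge lying on the line
                a = Math.min(a, Math.min(px[i], px[j]));
                b = Math.max(b, Math.max(px[i], px[j]));
            } else { // x-coordinate where the edge crosses y = y0
                double x = px[i] + (y0 - py[i]) * (px[j] - px[i]) / (py[j] - py[i]);
                a = Math.min(a, x);
                b = Math.max(b, x);
            }
        }
        if (!(a < b)) return 0.0; // empty or single-point cross-section
        double zy = (y0 - my) / sy;
        double marginal = Math.exp(-0.5 * zy * zy) / (sy * Math.sqrt(2.0 * Math.PI));
        double m = mx + rho * sx / sy * (y0 - my); // conditional mean of x given y0
        double s = sx * Math.sqrt(1.0 - rho * rho); // conditional standard deviation
        return marginal * (cdf((b - m) / s) - cdf((a - m) / s));
    }

    /** Outer integral: composite Simpson's rule (n even) between consecutive vertex heights. */
    static double probability(double[] px, double[] py, double mx, double my,
                              double sx, double sy, double rho, int n) {
        TreeSet<Double> heights = new TreeSet<>();
        for (double y : py) heights.add(y);
        double total = 0.0;
        Double lo = null;
        for (double hi : heights) {
            if (lo != null) {
                double h = (hi - lo) / n;
                double sum = inner(px, py, lo, mx, my, sx, sy, rho)
                        + inner(px, py, hi, mx, my, sx, sy, rho);
                for (int i = 1; i < n; i++) {
                    int weight = (i % 2 == 1) ? 4 : 2;
                    sum += weight * inner(px, py, lo + i * h, mx, my, sx, sy, rho);
                }
                total += sum * h / 3.0;
            }
            lo = hi;
        }
        return total;
    }

    public static void main(String[] args) {
        // Triangle with vertices (-1,-1), (1,-1), (1,1) under a correlated standard normal
        double[] px = {-1, 1, 1};
        double[] py = {-1, -1, 1};
        System.out.println(probability(px, py, 0, 0, 1, 1, 0.5, 200));
    }
}
```

A quick sanity check for a sketch like this is a large square covering the positive quadrant: for standardized variables with correlation rho, the quadrant probability is known to be 1/4 + asin(rho) / (2 pi), which is 1/3 for rho = 0.5.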