The waiting time to observe the occurrence of \(n\) events in a Poisson process with intensity \(\lambda\) is a random variable that follows the gamma distribution with parameters \(n\) and \(\lambda\) (both of which must be positive).
\(\newcommand{\P}{\mathbb{P}}\)
\(\newcommand{\E}{\mathbb{E}}\)
\(\newcommand{\R}{\mathbb{R}}\)
\(\newcommand{\N}{\mathbb{N}}\)
\(\newcommand{\bs}{\boldsymbol}\)
\(\newcommand{\var}{\text{var}}\)
\(\newcommand{\sd}{\text{sd}}\)
\(\newcommand{\skew}{\text{skew}}\)
\(\newcommand{\kurt}{\text{kurt}}\)
3. The Gamma Distribution
Basic Theory
We now know that the sequence of inter-arrival times
\(\bs{X} = (X_1, X_2, \ldots)\) in the Poisson process is a sequence of independent, identically distributed random variables, each having the exponential distribution with rate parameter \(r\), for some \(r \gt 0\). No other distribution gives the strong renewal assumption that we want: the property that the process probabilistically restarts, independently of the past, at each arrival time and at each fixed time.
The \(n\)th arrival time is simply the sum of the first \(n\) inter-arrival times:
\[ T_n = \sum_{i=1}^n X_i, \quad n \in \N \]
Thus, the sequence of arrival times \(\bs{T} = (T_0, T_1, \ldots)\) is the partial sum process associated with the sequence of inter-arrival times \(\bs{X} = (X_1, X_2, \ldots)\).
Distribution Functions
Recall that the common probability density function of the inter-arrival times is
\[ f(t) = r e^{-r t}, \quad 0 \le t \lt \infty \]
Our first goal is to describe the distribution of the \( n \)th arrival \( T_n \).
For \(n \in \N_+\), \(T_n\) has a continuous distribution with probability density function \( f_n \) given by
\[ f_n(t) = r^n \frac{t^{n-1}}{(n - 1)!} e^{-r t}, \quad 0 \le t \lt \infty\]
\( f_n \) increases and then decreases, with mode at \( (n - 1) / r \).
\( f_1 \) is concave upward.
\( f_2 \) is concave downward and then upward, with inflection point at \( t = 2 / r \).
For \( n \ge 3 \), \( f_n \) is concave upward, then downward, then upward again, with inflection points at \( t = \left[(n - 1) \pm \sqrt{n - 1}\right] \big/ r \).
Since \(T_n\) is the sum of \(n\) independent variables, each with PDF \(f\), the PDF of \(T_n\) is the convolution power of \(f\) of order \(n\). That is, \(f_n = f^{*n}\). A simple induction argument shows that \(f_n\) has the form given above. For example,
\[ f_2(t) = \int_0^t f(s) f(t - s) \, ds = \int_0^t r e^{-r\,s} r e^{-r(t-s)} \, ds = \int_0^t r^2 e^{-r\,t} \, ds = r^2 t \, e^{-r\,t}, \quad 0 \le t \lt \infty \]
Parts (a) and (b) follow from standard calculus.
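As a quick numerical check, the closed-form density above can be compared with R's built-in dgamma function. A minimal sketch, in which the values \(n = 5\) and \(r = 2\) are illustrative assumptions:

n <- 5; r <- 2                                   # illustrative shape and rate
t <- seq(0.01, 6, by = 0.01)
f_n <- r^n * t^(n - 1) / factorial(n - 1) * exp(-r * t)
all.equal(f_n, dgamma(t, shape = n, rate = r))   # TRUE: same density
t[which.max(f_n)]                                # near the mode (n - 1)/r = 2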
The distribution with this probability density function is known as the gamma distribution with shape parameter \(n\) and rate parameter \(r\). It is also known as the Erlang distribution, named for the Danish mathematician Agner Krarup Erlang. Again, \(1 / r\) is the scale parameter, and that term will be justified below. The term shape parameter for \( n \) clearly makes sense in light of parts (a) and (b) of the last result.
The term rate parameter for \( r \) is inherited from the inter-arrival times, and more generally from the underlying Poisson process itself: the random points are arriving at an average rate of \( r \) per unit time.
A more general version of the gamma distribution, allowing non-integer shape parameters, is studied in the chapter on Special Distributions. Note that since the arrival times are continuous, the probability of an arrival at any given instant of time is 0.
In the gamma experiment, vary \(r\) and \(n\) with the scroll bars and watch how the shape of the probability density function changes. For various values of the parameters, run the experiment 1000 times and compare the empirical density function to the true probability density function.
The distribution function and the quantile function of the gamma distribution do not have simple, closed-form expressions.
However, we will see in the next section on the Poisson distribution
how to write the distribution function as a sum.
Open the quantile applet, select the gamma distribution, and select CDF view. Vary the parameters and note the shape of the distribution and quantile functions. For selected values of the parameters, compute the quartiles.
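Although the distribution and quantile functions lack closed forms, they are easy to evaluate numerically. A minimal R sketch, with the illustrative parameters \(n = 5\) and \(r = 2\):

qgamma(c(0.25, 0.5, 0.75), shape = 5, rate = 2)               # quartiles of T_5
pgamma(qgamma(0.5, shape = 5, rate = 2), shape = 5, rate = 2) # recovers 0.5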
The mean, variance, and moment generating function
of \(T_n\) can be found easily from the representation as a sum of independent exponential variables.
The mean and variance of \(T_n\) are:
\(\E\left(T_n\right) = n / r\)
\(\var\left(T_n\right) = n / r^2\)
Recall that the exponential distribution with rate parameter \(r\) has mean \(1 / r\) and variance \(1 / r^2\).
The expected value of a sum is the sum of the expected values, so \(\E\left(T_n\right) = n / r\).
The variance of a sum of independent variables is the sum of the variances, so \(\var\left(T_n\right) = n / r^2\).
For \(k \in \N\), the moment of order \(k\) of \(T_n\) is
\[ \E\left(T_n^k\right) = \frac{(k + n - 1)!}{(n - 1)!} \frac{1}{r^k} \]
Using the standard change of variables theorem,
\[ \E\left(T_n^k\right) = \int_0^\infty t^k f_n(t) \, dt = \frac{r^{n-1}}{(n - 1)!} \int_0^\infty t^{k + n - 1} r \, e^{-r\,t} \, dt \]
But the integral on the right is the moment of order \(k + n - 1\) for the exponential distribution, which we showed in the last section is \((k + n - 1)! \big/ r^{k + n - 1}\). Simplifying gives the result.
More generally, the moment of order \(k \gt 0\) (not necessarily an integer) is
\[ \E\left(T_n^k\right) = \frac{\Gamma(k + n)}{\Gamma(n)} \frac{1}{r^k} \]
where \(\Gamma\) is the gamma function.
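The general moment formula can be verified by numerical integration in R; in this sketch, \(n = 3\), \(r = 2\), and the non-integer \(k = 2.5\) are illustrative choices.

n <- 3; r <- 2; k <- 2.5
integrate(function(t) t^k * dgamma(t, shape = n, rate = r), 0, Inf)$value
gamma(k + n) / (gamma(n) * r^k)   # agrees with the integral above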
In the gamma experiment, vary \(r\) and \(n\) with the scroll bars and watch how the size and location of the mean \( \pm \) standard deviation bar change. For various values of \(r\) and \(n\), run the experiment 1000 times and compare the empirical moments to the true moments.
The moment generating function of \(T_n\) is
\[ M_n(s) = \E\left(e^{s T_n}\right) = \left(\frac{r}{r - s}\right)^n, \quad -\infty \lt s \lt r \]
Recall that the MGF of a sum of independent variables is the product of the corresponding MGFs.
We showed in the last section that the exponential distribution with parameter \(r\) has MGF \(s \mapsto r / (r - s)\) for \(-\infty \lt s \lt r\).
The moment generating function can also be used to derive the moments of the gamma distribution given above: recall that \(M_n^{(k)}(0) = \E\left(T_n^k\right)\).
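A quick Monte Carlo check of the MGF formula in R, with the illustrative values \(n = 4\), \(r = 2\), \(s = 0.5\) (note that \(s\) must be less than \(r\)):

set.seed(1)
n <- 4; r <- 2; s <- 0.5
mean(exp(s * rgamma(1e6, shape = n, rate = r)))  # Monte Carlo estimate
(r / (r - s))^n                                  # exact value, (4/3)^4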
Estimating the Rate
In many practical situations, the rate \(r\) of the process is unknown and must be estimated
based on data from the process. We start with a natural estimate of the scale parameter \(1 / r\). Note that
\[ M_n = \frac{T_n}{n} = \frac{1}{n} \sum_{i=1}^n X_i \]
is the sample mean of the first \(n\) inter-arrival times \((X_1, X_2, \ldots, X_n)\). In statistical terms, this sequence is a random sample of size \(n\) from the exponential distribution with rate \(r\).
\(M_n\) satisfies the following properties:
\(\E(M_n) = \frac{1}{r}\)
\(\var(M_n) = \frac{1}{n r^2}\)
\(M_n \to \frac{1}{r}\) as \(n \to \infty\) with probability 1
Parts (a) and (b) follow from the expected value of \(T_n\) and standard properties. Part (c) is the strong law of large numbers.
In statistical terms, part (a) means that \(M_n\) is an unbiased estimator of \(1 / r\) and hence the variance in part (b) is the mean square error. Part (b) means that \(M_n\) is a consistent estimator of \(1 / r\), since \(\var(M_n) \downarrow 0\) as \(n \to \infty\). Part (c) is a stronger form of consistency. In general, the sample mean of a random sample from a distribution is an unbiased and consistent estimator of the distribution mean. On the other hand, a natural estimator of \(r\) itself is \(1 / M_n = n / T_n\). However, this estimator is positively biased.
\(\E(n / T_n) \ge r\).
This follows immediately from Jensen's inequality, since \(x \mapsto 1 / x\) is concave upward on \((0, \infty)\).
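A small simulation illustrates both the unbiasedness of \(M_n\) and the positive bias of \(n / T_n\); this R sketch uses the illustrative values \(r = 2\) and \(n = 10\).

set.seed(1)
r <- 2; n <- 10
Tn <- replicate(1e5, sum(rexp(n, rate = r)))
mean(Tn / n)   # close to 1/r = 0.5 (unbiased)
mean(n / Tn)   # above r = 2: positively biased; in fact E(n/T_n) = n r/(n - 1)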
Properties and Connections
As noted above, the gamma distribution is a scale family.
Suppose that \( T \) has the gamma distribution with rate parameter \( r \in (0, \infty) \) and shape parameter \( n \in \N_+ \).
If \( c \in (0, \infty) \) then \( c T \) has the gamma distribution with rate parameter \( r / c \) and shape parameter \( n \).
The moment generating function of \( c T \) is
\[ \E\left[e^{s (c T)} \right] = \E\left[e^{(c s) T}\right] = \left(\frac{r}{r - cs}\right)^n = \left(\frac{r / c}{r / c - s}\right)^n, \quad s \lt \frac{r}{c} \]
The scaling property also follows from the fact that the gamma distribution governs the arrival times in the Poisson process.
A time change in a Poisson process clearly does not change the strong renewal property, and hence results in a new Poisson process.
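The scaling property is easy to confirm by simulation; in this R sketch the values \(n = 3\), \(r = 2\), \(c = 4\) are illustrative.

set.seed(1)
n <- 3; r <- 2; c0 <- 4                         # c0 avoids masking R's c()
x <- c0 * rgamma(1e5, shape = n, rate = r)      # scaled gamma sample
ks.test(x, "pgamma", shape = n, rate = r / c0)  # large p-value: consistent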
General Exponential Family
The gamma distribution is also a member of the general exponential
family of distributions.
Suppose that \( T \) has the gamma distribution with shape parameter \( n \in \N_+ \) and rate parameter \( r \in (0, \infty) \). Then \( T \) has a two-parameter general exponential distribution with natural parameters \( n - 1 \) and \( -r \), and natural statistics \( \ln(T) \) and \( T \).
This follows from the form of the PDF and the definition of the general exponential family:
\[ f(t) = r^n \frac{t^{n-1}}{(n - 1)!} e^{-r t} = \frac{r^n}{(n - 1)!} \exp\left[(n - 1) \ln(t) - r t\right], \quad t \in (0, \infty) \]
Increments
A number of important properties flow from the fact that the sequence of arrival times \(\bs{T} = (T_0, T_1, \ldots)\) is the partial sum process associated with the sequence of independent, identically distributed inter-arrival times \(\bs{X} = (X_1, X_2, \ldots)\).
The arrival time sequence \(\bs{T}\) has stationary, independent increments:
If \(m \lt n\) then \(T_n - T_m\) has the same distribution as \(T_{n-m}\), namely the gamma distribution with shape parameter \(n - m\) and rate parameter \(r\).
If \(n_1 \lt n_2 \lt n_3 \lt \cdots\) then \(\left(T_{n_1}, T_{n_2} - T_{n_1}, T_{n_3} - T_{n_2}, \ldots\right)\) is an independent sequence.
The stationary and independent increments properties hold for any partial sum process associated with an independent, identically distributed sequence.
Of course, the stationary and independent increments properties are related to the fundamental renewal assumption that we started with.
If we fix \(n \in \N_+\), then \((T_n - T_n, T_{n+1} - T_n, T_{n+2} - T_n, \ldots)\) is independent of \((T_1, T_2, \ldots, T_n)\) and has the same distribution as \((T_0, T_1, T_2, \ldots)\). That is, if we restart the clock at time \(T_n\), then the process in the future looks just like the original process (in a probabilistic sense) and is independent of the past. Thus, we have our second characterization of the Poisson process.
A process of random points in time is a Poisson process with rate \( r \in (0, \infty) \) if and only if the arrival time sequence \( \bs{T} \) has stationary, independent increments, and for \( n \in \N_+ \), \( T_n \) has the gamma distribution with shape parameter \( n \) and rate parameter \( r \).
The gamma distribution is closed with respect to sums of independent variables, as long as the rate parameter is fixed.
Suppose that \(V\) has the gamma distribution with shape parameter \(m \in \N_+\) and rate parameter \(r \gt 0\), \(W\) has the gamma distribution with shape parameter \(n \in \N_+\) and rate parameter \(r\), and that \(V\) and \(W\) are independent. Then \(V + W\) has the gamma distribution with shape parameter \(m + n\) and rate parameter \(r\).
There are at least three different proofs of this fundamental result. Perhaps the best is a probabilistic proof based on the Poisson process. We start with a sequence \(\bs{X}\) of independent, identically distributed exponential variables, each with rate parameter \(r\). Then we can associate \(V\) with \(T_m\) and \(W\) with \(T_{m + n} - T_m\), so that \(V + W\) becomes \(T_{m + n}\). The result now follows from the previous theorem.
Another simple proof uses moment generating functions. Recall again that the MGF of \(V + W\) is the product of the MGFs of \(V\) and of \(W\).
A third, analytic proof uses convolution. Recall again that the PDF of \(V + W\) is the convolution of the PDFs of \(V\) and of \(W\).
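The closure property can also be checked empirically; a brief R sketch with the illustrative parameters \(m = 2\), \(n = 3\), \(r = 1.5\):

set.seed(1)
V <- rgamma(1e5, shape = 2, rate = 1.5)
W <- rgamma(1e5, shape = 3, rate = 1.5)
ks.test(V + W, "pgamma", shape = 5, rate = 1.5)  # consistent with gamma(5, 1.5)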
Normal Approximation
In the gamma experiment, vary \(r\) and \(n\) with the scroll bars and watch how the shape of the probability density function changes. Now set \(n = 10\) and for various values of \(r\) run the experiment 1000 times and compare the empirical density function to the true probability density function.
Even though you are restricted to relatively small values of \(n\) in the applet, note that the probability density function of the \(n\)th arrival time becomes more bell-shaped as \(n\) increases (for fixed \(r\)). This is yet another application of the central limit theorem, since \(T_n\) is the sum of \(n\) independent, identically distributed random variables (the inter-arrival times).
The distribution of the random variable \(Z_n\) below converges to the standard normal distribution as \(n \to \infty\):
\[ Z_n = \frac{r\,T_n - n}{\sqrt{n}} \]
\(Z_n\) is the standard score associated with \(T_n\), so the result follows from the central limit theorem.
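In R, the quality of the normal approximation is easy to inspect by comparing the exact gamma probability with \(\Phi\) applied to the standard score; \(n = 10\), \(r = 1\), \(t = 12\) are illustrative.

n <- 10; r <- 1; t <- 12
pgamma(t, shape = n, rate = r)    # exact: about 0.758
pnorm((r * t - n) / sqrt(n))      # CLT approximation: about 0.736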
Connection to Bernoulli Trials
We return to the analogy between the Bernoulli trials process and the Poisson process that started in the introduction
and continued in the last section on the exponential distribution. If we think of the successes in a sequence of Bernoulli trials as random points in discrete time, then the process has the same strong renewal property as the Poisson process, but restricted to discrete time. That is, at each fixed time and at each arrival time, the process starts over, independently of the past.
In Bernoulli trials, the time of the \( n \)th arrival has the negative binomial distribution with parameters \( n \) and \( p \) (the success probability), while in the Poisson process, as we now know, the time of the \( n \)th arrival has the gamma distribution with parameters \( n \) and \( r \) (the rate).
Because of this strong analogy, we expect a relationship between these two processes.
In fact, we have the same type of limit as with the geometric and exponential distributions.
Fix \( n \in \N_+ \) and suppose that for each \( m \in \N_+ \) \( T_{m,n} \) has the negative binomial distribution with parameters \( n \) and \( p_m \in (0, 1) \), where \( m p_m \to r \in (0, \infty) \) as \( m \to \infty \). Then the distribution of \( T_{m,n} \big/ m \) converges to the gamma distribution with parameters \( n \) and \( r \) as \( m \to \infty \).
Suppose that \( X_m \) has the geometric distribution on \( \N_+ \) with success parameter \( p_m \). We know from our convergence result
in the last section that the distribution of \( X_m / m \) converges to the exponential distribution with rate parameter \( r \) as \( m \to \infty \). It follows that if \( M_m \) denotes the moment generating function of \( X_m / m \), then \( M_m(s) \to r / (r - s) \) as \( m \to \infty \) for \( s \lt r \). But then \( M_m^n \) is the MGF of \( T_{m,n} \big/ m \) and clearly
\[ M_m^n(s) \to \left(\frac{r}{r - s}\right)^n \]
as \( m \to \infty \) for \( s \lt r \). The expression on the right is the MGF of the gamma distribution with shape parameter \( n \) and rate parameter \( r \).
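A simulation sketch of this limit in R (note that rnbinom counts failures before the \(n\)th success, so \(n\) must be added back; \(n = 3\), \(r = 2\), \(m = 1000\) are illustrative):

set.seed(1)
n <- 3; r <- 2; m <- 1000; p <- r / m
T_mn <- rnbinom(1e5, size = n, prob = p) + n   # arrival time of the nth success
mean(T_mn / m <= 1.5)                          # empirical CDF at 1.5
pgamma(1.5, shape = n, rate = r)               # limiting gamma CDF, about 0.58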
Computational Exercises
Suppose that customers arrive at a service station according to the Poisson model, at a rate of \(r = 3\) per hour. Relative to a given starting time, find the probability that the second customer arrives sometime after 1 hour.
Defects in a type of wire follow the Poisson model, with a rate of 1 per 100 meters. Find the probability that the 5th defect is located between 450 and 550 meters.
Suppose that requests to a web server follow the Poisson model with rate \(r = 5\). Relative to a given starting time, compute the mean and standard deviation of the time of the 10th request.
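These last three exercises reduce to gamma probability computations; a minimal R sketch of one way to carry them out (the numbers in the comments are approximate):

pgamma(1, shape = 2, rate = 3, lower.tail = FALSE)   # 2nd customer after 1 hour: about 0.199
pgamma(550, shape = 5, rate = 1/100) -
  pgamma(450, shape = 5, rate = 1/100)               # 5th defect in (450, 550): about 0.175
c(mean = 10 / 5, sd = sqrt(10) / 5)                  # 10th request: mean 2, sd about 0.632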
Suppose that \(Y\) has a gamma distribution with mean 40 and standard deviation 20. Find the shape parameter \(n\) and the rate parameter \(r\).
\(r = 1 / 10\), \(n = 4\)
Suppose that accidents at an intersection occur according to the Poisson model, at a rate of 8 per year. Compute the normal approximation to the event that the 10th accident (relative to a given starting time) occurs within 2 years.
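For the accident exercise above, a brief R sketch comparing the normal approximation with the exact value (using \(r = 8\), \(n = 10\), \(t = 2\)):

r <- 8; n <- 10; t <- 2
pnorm((r * t - n) / sqrt(n))      # normal approximation: about 0.971
pgamma(t, shape = n, rate = r)    # exact value, for comparison: about 0.957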
In the gamma experiment, set \(n = 5\) and \(r = 2\). Run the experiment 1000 times and compute the following:
\(\P(1.5 \le T_5 \le 3)\)
The relative frequency of the event \(\{1.5 \le T_5 \le 3\}\)
The normal approximation to \(\P(1.5 \le T_5 \le 3)\)
Suppose that requests to a web server follow the Poisson model. Starting at 12:00 noon on a certain day, the requests are logged. The 100th request comes at 12:15. Estimate the rate of the process.
\(r = 6.67\) hits per minute
I have a dataset of 116,667 rows, defined as:
chr17.1552
chr17.2434
chr12.8978
chr17.8536
As it is difficult to show results with all the data, and because I don't have enough reputation to upload images, I made my post with the first 200 rows.
Min. = 0.3693
1st Quartile = 0.3847
Median = 0.4039
Mean = 0.4199
3rd Quartile = 0.4413
Max. = 0.5742
I also tried some exploratory analysis to see which kind of distribution this data follows (e.g., plot(density(t)) and qqnorm(t); qqline(t, col = 2)). According to these preliminary results, I would say that the data apparently follows a gamma distribution.
Here I put the first 200 signal values:
A <- structure(list(V1 = c(0.5282417, 0.5090263, 0.4868505, 0.4770962,
    0.4670385, 0.4597812, 0.4442789, 0.4361337, 0.4329372, 0.4279774,
    0.4212546, 0.4152126, 0.4085093, 0.4052757, 0.4017872, 0.3996736,
    0.3978592, 0.3935825, 0.3919713, 0.3875172, 0.3854444, 0.3822418,
    0.3799694, 0.3782523, 0.3767104, 0.3744146, 0.3721743, 0.3708262,
    0.3692868)))
# (only these 29 of the 200 posted values survived the page formatting)
t <- as.matrix(A$V1)
My questions are:
1. How can I fit my data to a gamma distribution?
2. Is there any way to determine, according to the model fit, which signal value corresponds to a p-value < 0.05?
Thanks for your help!
P.S. 1: I welcome suggestions for edits, as this is my first post in the community!
P.S. 2: I've been trying all of this using R.
One simple way to fit a gamma distribution to the data is the method of moments: the gamma distribution with parameters $(\alpha, \beta)$ has mean $\frac\alpha\beta$ and variance $\frac\alpha{\beta^2}$. You can use sample estimates of the mean and variance and some algebra to solve for the parameters of the model. Naturally, more advanced methods exist. But with the large number of observations, this should be a good starting point.
I'm uncertain what your second question means. Are you looking for the score which divides the smallest 95% of your data from the largest 5%?
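To make the method-of-moments recipe concrete, here is a minimal R sketch; it assumes x holds the signal values built above, and the 0.95 quantile addresses the second reading of your question. (For a maximum likelihood fit, MASS::fitdistr(x, "gamma") is an alternative.)

x <- as.numeric(t)                         # the signal values from the question
m <- mean(x); v <- var(x)
alpha <- m^2 / v                           # shape: solves m = alpha/beta, v = alpha/beta^2
beta  <- m / v                             # rate
c(shape = alpha, rate = beta)
qgamma(0.95, shape = alpha, rate = beta)   # cuts the smallest 95% from the largest 5%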
