What is self-similar traffic?
What is long-range dependance?
Estimating value of H
C++ code to measure Hurst parameter
Q & A
|
|
[ Top ] |
Consider a cumulative process Y(t) with stationary increments, and let Xt be its corresponding incremental process: |
Xt = Y(t) - Y(t-1) | (1) |
For example Y(t) can represent the number of bytes arriving up to time t, and
Xt can represent the number of bytes arriving in one unit of time. |
The process Xs(m) is an aggregated process of Xt if: |
Xs(m) = [Xsm-m+1 + Xsm-m+2 + ... + Xsm]/m | (2) |
Process Xt is said to be self-similar if Xt is indistinguishable from
Xs(m). This is a very restrictive definition, especially considering the stochastic nature of the network traffic.
Usually, second-order self-similarity is considered for the purposes of traffic description: auto-covariance functions of the original and
aggregated processes should have the same values. |
Let |
γ(k) = E[(Xt - μ)(Xt+k - μ)] | (3) |
and |
γ(m)(k) = E[(Xs(m) - μ)(Xs+k(m) - μ)] | (4) |
Then, the process Xt is exactly second-order self-similar if |
γ(m)(k) = γ(k) | (5) |
and asymptotically second-order self-similar if |
limm→∞ γ(m)(k) = γ(k) | (6) |
A convenient measure of a process′s distributional self-similarity is its Hurst parameter H. A process is self-similar with parameter H (0 < H < 1) if: |
Y(t) = k-H Y(kt), ∀ k > 0, t ≥ 0 | (7) |
i.e., the original and normalized aggregated processes should have the same distribution. Translating it to the corresponding stationary increment process, we get |
Xs(m) = [Xsm-m+1 + Xsm-m+2 + ... + Xsm]/m = [Y(sm) - Y(sm-m)]/m = [Y(m) - Y(0)]/m | (8) |
From Eq.(7), it follows that Y(m) = mHY(1). Thus, |
X(m) = [mH/m]*[Y(1) - Y(0)] = mH-1 X | (9) |
The self-similarity can be viewed as an ability of an aggregated process to "preserve" the burstiness of the original process, viz., the property of slowly decaying variance: |
var(X(m)) ~ m2H-2 | (10) |
In the context of network traffic, this means that aggregating traffic over large time intervals reduces the burstiness very slowly (compared to non-self-similar traffic).
|
| [ Top ] |
The property of long-range dependence refers to a non-summable auto-correlation function ρ(k): |
ρ(k) = γ(k) / σ2 = E[(Xt - μ)(Xt+k - μ)] / E[(Xt - μ)2] | (11) |
For 0 < H < 0.5 and 0.5 < H < 1 |
ρ(k) ~ H(2H-1)k2H-2, k → ∞ | (12) |
It is clear, that if 0.5 < H < 1, then |
∑∞ρ(k) = ∞ | (13) |
i.e., the process is long-range dependant. |
The long-range dependence results from a heavy-tailed distribution of the corresponding stochastic process. Heavy-tailness refers to the rate of tail decay of the complementary distribution function. In a heavy-tailed distribution, the decay obeys the power law: |
P[X > x] ~ cx-α, x → ∞, 1 < α < 2; | (14) |
As a result, the probability of an extremely large observation in the LRD process is non-negligible. In the context of network traffic, this means that extremely large bursts of data (packet trains) and extremely long periods of silence (inter-arrival times) will occur from time to time. This is one of the reasons why analytic models employing traditional negative exponential distribution often provide overly optimistic estimates for the delays and queue sizes – the probability of an extreme event is negligible.
|
| [ Top ] |
From Eq. (9) we get |
var(X(m)) = m2H-2var(x) | (15) |
or |
log[var(X(m)) / var(x)] = (2H-2) log(m) | (16) |
Eq. (16) suggests that log-log plot of variance vs. aggregation level m should have a linear slope of value 2H-2.
The following figure shows a variance-time log-log plot which can be used to estimate the Husrt parameter of a stochastic process. In a log-log plot, the x-axis
represents the logarithm of the aggregation parameter m and the y-axis represents the logarithm of the normalized variance. The LRD traffic plot
shows a linear dependency (except tail region) with slope s value close to −0.4. From Eq. (16), we expect the log-log plot to have a slope s = 2H − 2.
Thus, the Hurst parameter is estimated to be H = 1 + s/2 = 0.8.
The second plot called SRD traffic was obtained on the same generator, but using sub-streams with exponentially-distributed on/off periods. This distribution
possesses no long-range dependence and its variance-time log-log plot is expected to have a slope of −1 (var(X(m)) ~ m-1).

|
| [ Top ] |
List of files:
- _rand_MT.h
- _types.h
- _util.h
- avltree.h.h
- broadcom_pdf.h
- MersenneTwister.h
- stats.h
- PolyFit.h
- trf_gen_v3.h
- var_time.h
- vartime.cpp
|
This C++ program measures synthetic traffic produced by Self-Similar traffic Generator v.3. It builds the
var-time log-log plot and estimates the Hurst parameter by finding linear least-squares fit to the measured graph and calculating its slope. The following table shows the output
of the program. |
POINT # | WINDOW | LOG WINDOW (X-coordinate) | | LRD VAR | LRD NORM VAR | LRD LOG NORM VAR (Y-coordinate) | LRD LINEAR FIT (Y-coordinate) | | SRD VAR | SRD NORM VAR | SRD LOG NORM VAR (Y-coordinate) | SRD LINEAR FIT (Y-coordinate) |
0 | 0.0001 | 0 | | 0.0628126 | 1 | 0 | 0.00746551 | | 0.0346508 | 1 | 0 | 0.0377225 |
1 | 0.000162378 | 0.210526 | | 0.0526957 | 0.838936 | -0.076271 | -0.0743866 | | 0.0227969 | 0.657904 | -0.181838 | -0.170089 |
2 | 0.000263665 | 0.421053 | | 0.0440054 | 0.700582 | -0.154541 | -0.156239 | | 0.0146055 | 0.421507 | -0.375196 | -0.3779 |
3 | 0.000428133 | 0.631579 | | 0.0366207 | 0.583016 | -0.234319 | -0.238091 | | 0.00920152 | 0.26555 | -0.575854 | -0.585712 |
4 | 0.000695193 | 0.842105 | | 0.0303168 | 0.482655 | -0.316363 | -0.319943 | | 0.00574394 | 0.165766 | -0.780503 | -0.793523 |
5 | 0.00112884 | 1.05263 | | 0.0250827 | 0.399326 | -0.398673 | -0.401795 | | 0.00357364 | 0.103133 | -0.986603 | -1.00133 |
6 | 0.00183298 | 1.26316 | | 0.0206973 | 0.329509 | -0.482133 | -0.483647 | | 0.00220538 | 0.063646 | -1.19623 | -1.20915 |
7 | 0.00297635 | 1.47368 | | 0.0170862 | 0.272018 | -0.565402 | -0.565499 | | 0.0013642 | 0.0393699 | -1.40484 | -1.41696 |
8 | 0.00483293 | 1.68421 | | 0.0141749 | 0.22567 | -0.646526 | -0.647351 | | 0.000839418 | 0.0242251 | -1.61574 | -1.62477 |
9 | 0.0078476 | 1.89474 | | 0.0117262 | 0.186685 | -0.72889 | -0.729203 | | 0.000516435 | 0.014904 | -1.8267 | -1.83258 |
10 | 0.0127427 | 2.10526 | | 0.00969033 | 0.154274 | -0.811708 | -0.811056 | | 0.000317155 | 0.0091529 | -2.03844 | -2.04039 |
11 | 0.0206914 | 2.31579 | | 0.00802201 | 0.127713 | -0.893763 | -0.892908 | | 0.000194784 | 0.00562135 | -2.25016 | -2.2482 |
12 | 0.0335982 | 2.52632 | | 0.00656447 | 0.104509 | -0.980847 | -0.97476 | | 0.000121096 | 0.00349476 | -2.45658 | -2.45601 |
13 | 0.0545559 | 2.73684 | | 0.00549938 | 0.0875522 | -1.05773 | -1.05661 | | 7.34E-05 | 0.00211776 | -2.67412 | -2.66383 |
14 | 0.0885867 | 2.94737 | | 0.00462536 | 0.0736376 | -1.1329 | -1.13846 | | 4.60E-05 | 0.00132829 | -2.87671 | -2.87164 |
15 | 0.143845 | 3.15789 | | 0.00376108 | 0.0598778 | -1.22273 | -1.22032 | | 2.79E-05 | 0.000804806 | -3.09431 | -3.07945 |
16 | 0.233572 | 3.36842 | | 0.00311508 | 0.0495932 | -1.30458 | -1.30217 | | 1.71E-05 | 0.000493271 | -3.30691 | -3.28726 |
17 | 0.379269 | 3.57895 | | 0.00268554 | 0.0427548 | -1.36902 | -1.38402 | | 1.10E-05 | 0.000317988 | -3.49759 | -3.49507 |
18 | 0.615848 | 3.78947 | | 0.00215787 | 0.0343541 | -1.46402 | -1.46587 | | 6.81E-06 | 0.000196651 | -3.7063 | -3.70288 |
19 | 1 | 4 | | 0.00185091 | 0.0294672 | -1.53066 | -1.54772 | | 4.38E-06 | 0.000126402 | -3.89825 | -3.9107 |
LRD test: packet count | 2.46E+08 |
LRD test: traffic load | 0.26099 |
LRD test: expected Hurst parameter | 0.8 |
LRD test: measured Hurst parameter | 0.805601 |
SRD test: packet count | 2.36E+08 |
SRD test: traffic load | 0.249803 |
SRD test: expected Hurst parameter | 0.5 |
SRD test: measured Hurst parameter | 0.506448 |
|
|
[ Top ] |