## Characteristics of Network Traffic

What is self-similar traffic?
What is long-range dependance?
Estimating value of H
C++ code to measure Hurst parameter
Q & A

### What is self-similar traffic?

[ Top ]
Consider a cumulative process Y(t) with stationary increments, and let Xt be its corresponding incremental process:
Xt = Y(t) - Y(t-1)(1)
For example Y(t) can represent the number of bytes arriving up to time t, and Xt can represent the number of bytes arriving in one unit of time.
The process Xs(m) is an aggregated process of Xt if:
Xs(m) = [Xsm-m+1 + Xsm-m+2 + ... + Xsm]/m(2)
Process Xt is said to be self-similar if Xt is indistinguishable from Xs(m). This is a very restrictive definition, especially considering the stochastic nature of the network traffic. Usually, second-order self-similarity is considered for the purposes of traffic description: auto-covariance functions of the original and aggregated processes should have the same values.
Let
γ(k) = E[(Xt - μ)(Xt+k - μ)](3)
and
γ(m)(k) = E[(Xs(m) - μ)(Xs+k(m) - μ)](4)
Then, the process Xt is exactly second-order self-similar if
γ(m)(k) = γ(k)(5)
and asymptotically second-order self-similar if
limm→∞ γ(m)(k) = γ(k)(6)
A convenient measure of a process′s distributional self-similarity is its Hurst parameter H. A process is self-similar with parameter H (0 < H < 1) if:
Y(t) = k-H Y(kt), ∀ k > 0, t ≥ 0(7)
i.e., the original and normalized aggregated processes should have the same distribution. Translating it to the corresponding stationary increment process, we get
Xs(m) = [Xsm-m+1 + Xsm-m+2 + ... + Xsm]/m = [Y(sm) - Y(sm-m)]/m = [Y(m) - Y(0)]/m(8)
From Eq.(7), it follows that Y(m) = mHY(1). Thus,
X(m) = [mH/m]*[Y(1) - Y(0)] = mH-1 X(9)
The self-similarity can be viewed as an ability of an aggregated process to "preserve" the burstiness of the original process, viz., the property of slowly decaying variance:
var(X(m)) ~ m2H-2(10)
In the context of network traffic, this means that aggregating traffic over large time intervals reduces the burstiness very slowly (compared to non-self-similar traffic).

### What is long-range dependance?

[ Top ]
The property of long-range dependence refers to a non-summable auto-correlation function ρ(k):
ρ(k) = γ(k) / σ2 = E[(Xt - μ)(Xt+k - μ)] / E[(Xt - μ)2](11)
For 0 < H < 0.5 and 0.5 < H < 1
ρ(k) ~ H(2H-1)k2H-2, k → ∞(12)
It is clear, that if 0.5 < H < 1, then
ρ(k) = ∞(13)
i.e., the process is long-range dependant.
The long-range dependence results from a heavy-tailed distribution of the corresponding stochastic process. Heavy-tailness refers to the rate of tail decay of the complementary distribution function. In a heavy-tailed distribution, the decay obeys the power law:
P[X > x] ~ cx,    x → ∞, 1 < α < 2;(14)
As a result, the probability of an extremely large observation in the LRD process is non-negligible. In the context of network traffic, this means that extremely large bursts of data (packet trains) and extremely long periods of silence (inter-arrival times) will occur from time to time. This is one of the reasons why analytic models employing traditional negative exponential distribution often provide overly optimistic estimates for the delays and queue sizes – the probability of an extreme event is negligible.

### Estimating value of H

[ Top ]
From Eq. (9) we get
var(X(m)) = m2H-2var(x)(15)
or
log[var(X(m)) / var(x)] = (2H-2) log(m)(16)
Eq. (16) suggests that log-log plot of variance vs. aggregation level m should have a linear slope of value 2H-2. The following figure shows a variance-time log-log plot which can be used to estimate the Husrt parameter of a stochastic process. In a log-log plot, the x-axis represents the logarithm of the aggregation parameter m and the y-axis represents the logarithm of the normalized variance. The LRD traffic plot shows a linear dependency (except tail region) with slope s value close to −0.4. From Eq. (16), we expect the log-log plot to have a slope s = 2H − 2. Thus, the Hurst parameter is estimated to be H = 1 + s/2 = 0.8.

The second plot called SRD traffic was obtained on the same generator, but using sub-streams with exponentially-distributed on/off periods. This distribution possesses no long-range dependence and its variance-time log-log plot is expected to have a slope of −1 (var(X(m)) ~ m-1). ### C++ code to measure Hurst parameter

[ Top ]
List of files:

This C++ program measures synthetic traffic produced by Self-Similar traffic Generator v.3. It builds the var-time log-log plot and estimates the Hurst parameter by finding linear least-squares fit to the measured graph and calculating its slope. The following table shows the output of the program.
 POINT # WINDOW LOG WINDOW (X-coordinate) LRD VAR LRD NORM VAR LRD LOG NORM VAR (Y-coordinate) LRD LINEAR FIT (Y-coordinate) SRD VAR SRD NORM VAR SRD LOG NORM VAR (Y-coordinate) SRD LINEAR FIT (Y-coordinate) 0 0.0001 0 0.0628126 1 0 0.00746551 0.0346508 1 0 0.0377225 1 0.000162378 0.210526 0.0526957 0.838936 -0.076271 -0.0743866 0.0227969 0.657904 -0.181838 -0.170089 2 0.000263665 0.421053 0.0440054 0.700582 -0.154541 -0.156239 0.0146055 0.421507 -0.375196 -0.3779 3 0.000428133 0.631579 0.0366207 0.583016 -0.234319 -0.238091 0.00920152 0.26555 -0.575854 -0.585712 4 0.000695193 0.842105 0.0303168 0.482655 -0.316363 -0.319943 0.00574394 0.165766 -0.780503 -0.793523 5 0.00112884 1.05263 0.0250827 0.399326 -0.398673 -0.401795 0.00357364 0.103133 -0.986603 -1.00133 6 0.00183298 1.26316 0.0206973 0.329509 -0.482133 -0.483647 0.00220538 0.063646 -1.19623 -1.20915 7 0.00297635 1.47368 0.0170862 0.272018 -0.565402 -0.565499 0.0013642 0.0393699 -1.40484 -1.41696 8 0.00483293 1.68421 0.0141749 0.22567 -0.646526 -0.647351 0.000839418 0.0242251 -1.61574 -1.62477 9 0.0078476 1.89474 0.0117262 0.186685 -0.72889 -0.729203 0.000516435 0.014904 -1.8267 -1.83258 10 0.0127427 2.10526 0.00969033 0.154274 -0.811708 -0.811056 0.000317155 0.0091529 -2.03844 -2.04039 11 0.0206914 2.31579 0.00802201 0.127713 -0.893763 -0.892908 0.000194784 0.00562135 -2.25016 -2.2482 12 0.0335982 2.52632 0.00656447 0.104509 -0.980847 -0.97476 0.000121096 0.00349476 -2.45658 -2.45601 13 0.0545559 2.73684 0.00549938 0.0875522 -1.05773 -1.05661 7.34E-05 0.00211776 -2.67412 -2.66383 14 0.0885867 2.94737 0.00462536 0.0736376 -1.1329 -1.13846 4.60E-05 0.00132829 -2.87671 -2.87164 15 0.143845 3.15789 0.00376108 0.0598778 -1.22273 -1.22032 2.79E-05 0.000804806 -3.09431 -3.07945 16 0.233572 3.36842 0.00311508 0.0495932 -1.30458 -1.30217 1.71E-05 0.000493271 -3.30691 -3.28726 17 0.379269 3.57895 0.00268554 0.0427548 -1.36902 -1.38402 1.10E-05 0.000317988 -3.49759 -3.49507 18 0.615848 3.78947 0.00215787 0.0343541 -1.46402 -1.46587 6.81E-06 0.000196651 -3.7063 -3.70288 19 1 4 0.00185091 0.0294672 -1.53066 -1.54772 4.38E-06 0.000126402 -3.89825 -3.9107

 LRD test: packet count 2.46e+08 LRD test: traffic load 0.26099 LRD test: expected Hurst parameter 0.8 LRD test: measured Hurst parameter 0.805601

 SRD test: packet count 2.36e+08 SRD test: traffic load 0.249803 SRD test: expected Hurst parameter 0.5 SRD test: measured Hurst parameter 0.506448
[ Top ]