What is selfsimilar traffic?
What is longrange dependance?
Estimating value of H
C++ code to measure Hurst parameter
Q & A


[ Top ] 
Consider a cumulative process Y(t) with stationary increments, and let X_{t} be its corresponding incremental process: 
X_{t} = Y(t)  Y(t1)  (1) 
For example Y(t) can represent the number of bytes arriving up to time t, and
X_{t} can represent the number of bytes arriving in one unit of time. 
The process X_{s}^{(m)} is an aggregated process of X_{t} if: 
X_{s}^{(m)} = [X_{smm+1} + X_{smm+2} + ... + X_{sm}]/m  (2) 
Process X_{t} is said to be selfsimilar if X_{t} is indistinguishable from
X_{s}^{(m)}. This is a very restrictive definition, especially considering the stochastic nature of the network traffic.
Usually, secondorder selfsimilarity is considered for the purposes of traffic description: autocovariance functions of the original and
aggregated processes should have the same values. 
Let 
γ(k) = E[(X_{t}  μ)(X_{t+k}  μ)]  (3) 
and 
γ^{(m)}(k) = E[(X_{s}^{(m)}  μ)(X_{s+k}^{(m)}  μ)]  (4) 
Then, the process X_{t} is exactly secondorder selfsimilar if 
γ^{(m)}(k) = γ(k)  (5) 
and asymptotically secondorder selfsimilar if 
lim_{m→∞} γ^{(m)}(k) = γ(k)  (6) 
A convenient measure of a process′s distributional selfsimilarity is its Hurst parameter H. A process is selfsimilar with parameter H (0 < H < 1) if: 
Y(t) = k^{H} Y(kt), ∀ k > 0, t ≥ 0  (7) 
i.e., the original and normalized aggregated processes should have the same distribution. Translating it to the corresponding stationary increment process, we get 
X_{s}^{(m)} = [X_{smm+1} + X_{smm+2} + ... + X_{sm}]/m = [Y(sm)  Y(smm)]/m = [Y(m)  Y(0)]/m  (8) 
From Eq.(7), it follows that Y(m) = m^{H}Y(1). Thus, 
X^{(m)} = [m^{H}/m]*[Y(1)  Y(0)] = m^{H1} X  (9) 
The selfsimilarity can be viewed as an ability of an aggregated process to "preserve" the burstiness of the original process, viz., the property of slowly decaying variance: 
var(X^{(m)}) ~ m^{2H2}  (10) 
In the context of network traffic, this means that aggregating traffic over large time intervals reduces the burstiness very slowly (compared to nonselfsimilar traffic).

 [ Top ] 
The property of longrange dependence refers to a nonsummable autocorrelation function ρ(k): 
ρ(k) = γ(k) / σ^{2} = E[(X_{t}  μ)(X_{t+k}  μ)] / E[(X_{t}  μ)^{2}]  (11) 
For 0 < H < 0.5 and 0.5 < H < 1 
ρ(k) ~ H(2H1)k^{2H2}, k → ∞  (12) 
It is clear, that if 0.5 < H < 1, then 
∑_{∞}ρ(k) = ∞  (13) 
i.e., the process is longrange dependant. 
The longrange dependence results from a heavytailed distribution of the corresponding stochastic process. Heavytailness refers to the rate of tail decay of the complementary distribution function. In a heavytailed distribution, the decay obeys the power law: 
P[X > x] ~ cx^{α}, x → ∞, 1 < α < 2;  (14) 
As a result, the probability of an extremely large observation in the LRD process is nonnegligible. In the context of network traffic, this means that extremely large bursts of data (packet trains) and extremely long periods of silence (interarrival times) will occur from time to time. This is one of the reasons why analytic models employing traditional negative exponential distribution often provide overly optimistic estimates for the delays and queue sizes – the probability of an extreme event is negligible.

 [ Top ] 
From Eq. (9) we get 
var(X^{(m)}) = m^{2H2}var(x)  (15) 
or 
log[var(X^{(m)}) / var(x)] = (2H2) log(m)  (16) 
Eq. (16) suggests that loglog plot of variance vs. aggregation level m should have a linear slope of value 2H2.
The following figure shows a variancetime loglog plot which can be used to estimate the Husrt parameter of a stochastic process. In a loglog plot, the xaxis
represents the logarithm of the aggregation parameter m and the yaxis represents the logarithm of the normalized variance. The LRD traffic plot
shows a linear dependency (except tail region) with slope s value close to −0.4. From Eq. (16), we expect the loglog plot to have a slope s = 2H − 2.
Thus, the Hurst parameter is estimated to be H = 1 + s/2 = 0.8.
The second plot called SRD traffic was obtained on the same generator, but using substreams with exponentiallydistributed on/off periods. This distribution
possesses no longrange dependence and its variancetime loglog plot is expected to have a slope of −1 (var(X^{(m)}) ~ m^{1}).

 [ Top ] 
List of files:
 _rand_MT.h
 _types.h
 _util.h
 avltree.h.h
 broadcom_pdf.h
 MersenneTwister.h
 stats.h
 PolyFit.h
 trf_gen_v3.h
 var_time.h
 vartime.cpp

This C++ program measures synthetic traffic produced by SelfSimilar traffic Generator v.3. It builds the
vartime loglog plot and estimates the Hurst parameter by finding linear leastsquares fit to the measured graph and calculating its slope. The following table shows the output
of the program. 
POINT #  WINDOW  LOG WINDOW (Xcoordinate)   LRD VAR  LRD NORM VAR  LRD LOG NORM VAR (Ycoordinate)  LRD LINEAR FIT (Ycoordinate)   SRD VAR  SRD NORM VAR  SRD LOG NORM VAR (Ycoordinate)  SRD LINEAR FIT (Ycoordinate) 
0  0.0001  0   0.0628126  1  0  0.00746551   0.0346508  1  0  0.0377225 
1  0.000162378  0.210526   0.0526957  0.838936  0.076271  0.0743866   0.0227969  0.657904  0.181838  0.170089 
2  0.000263665  0.421053   0.0440054  0.700582  0.154541  0.156239   0.0146055  0.421507  0.375196  0.3779 
3  0.000428133  0.631579   0.0366207  0.583016  0.234319  0.238091   0.00920152  0.26555  0.575854  0.585712 
4  0.000695193  0.842105   0.0303168  0.482655  0.316363  0.319943   0.00574394  0.165766  0.780503  0.793523 
5  0.00112884  1.05263   0.0250827  0.399326  0.398673  0.401795   0.00357364  0.103133  0.986603  1.00133 
6  0.00183298  1.26316   0.0206973  0.329509  0.482133  0.483647   0.00220538  0.063646  1.19623  1.20915 
7  0.00297635  1.47368   0.0170862  0.272018  0.565402  0.565499   0.0013642  0.0393699  1.40484  1.41696 
8  0.00483293  1.68421   0.0141749  0.22567  0.646526  0.647351   0.000839418  0.0242251  1.61574  1.62477 
9  0.0078476  1.89474   0.0117262  0.186685  0.72889  0.729203   0.000516435  0.014904  1.8267  1.83258 
10  0.0127427  2.10526   0.00969033  0.154274  0.811708  0.811056   0.000317155  0.0091529  2.03844  2.04039 
11  0.0206914  2.31579   0.00802201  0.127713  0.893763  0.892908   0.000194784  0.00562135  2.25016  2.2482 
12  0.0335982  2.52632   0.00656447  0.104509  0.980847  0.97476   0.000121096  0.00349476  2.45658  2.45601 
13  0.0545559  2.73684   0.00549938  0.0875522  1.05773  1.05661   7.34E05  0.00211776  2.67412  2.66383 
14  0.0885867  2.94737   0.00462536  0.0736376  1.1329  1.13846   4.60E05  0.00132829  2.87671  2.87164 
15  0.143845  3.15789   0.00376108  0.0598778  1.22273  1.22032   2.79E05  0.000804806  3.09431  3.07945 
16  0.233572  3.36842   0.00311508  0.0495932  1.30458  1.30217   1.71E05  0.000493271  3.30691  3.28726 
17  0.379269  3.57895   0.00268554  0.0427548  1.36902  1.38402   1.10E05  0.000317988  3.49759  3.49507 
18  0.615848  3.78947   0.00215787  0.0343541  1.46402  1.46587   6.81E06  0.000196651  3.7063  3.70288 
19  1  4   0.00185091  0.0294672  1.53066  1.54772   4.38E06  0.000126402  3.89825  3.9107 
LRD test: packet count  2.46E+08 
LRD test: traffic load  0.26099 
LRD test: expected Hurst parameter  0.8 
LRD test: measured Hurst parameter  0.805601 
SRD test: packet count  2.36E+08 
SRD test: traffic load  0.249803 
SRD test: expected Hurst parameter  0.5 
SRD test: measured Hurst parameter  0.506448 


[ Top ] 