[SimGrid-user] Latency issues

Frédéric Suter frederic.suter at cc.in2p3.fr
Wed Jul 5 15:14:17 CEST 2017


Hi Tristan,

This can easily be explained. The underlying network model makes a 
correction of the latency declared in the platform file to reflect the 
impact of TCP slow start on "large transfers". Such a correction is 
motivated and explained in 
http://hal.inria.fr/hal-00646896/PDF/rr-validity.pdf (note that the 
validity domain of this model is transfers larger than 100kB). It is 
also mentioned in the output you get when adding the --help-models flag 
to your command line:

Long description of the network models accepted by this simulator:
   LV08: Realistic network analytic model (slow-start modeled by 
multiplying latency by 10.4, bandwidth by .92; bottleneck sharing uses a 
payload of S=8775 for evaluating RTT).

Unfortunately, these values are outdated. The latency correction factor 
is now 13.01 
(https://github.com/simgrid/simgrid/blob/master/src/surf/network_cm02.cpp#L46) 
which explains the observed results. The description given by 
--help-models has been corrected and the documentation on these hidden 
parameters will be improved.
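
As a quick sanity check, the arithmetic behind both observed ping times can be sketched as below. This is only an illustration, assuming that the transfer time of the tiny payload is negligible and that the route crosses a single link; the function name is made up for the example and is not part of SimGrid.

```python
# Latency correction factor quoted above for the default network model.
LATENCY_FACTOR = 13.01


def effective_latency(declared_latency_s, factor=LATENCY_FACTOR):
    """Latency charged by the model for a latency declared in the platform file.

    Hypothetical helper for illustration only: it ignores the bandwidth term,
    which is negligible for the tiny ping message.
    """
    return declared_latency_s * factor


# Reduced platform: one link with 1 ms latency.
print(f"{effective_latency(1e-3):.6f}")         # 0.013010
# Link 9 of small_platform.xml: 1.461517 ms latency.
print(f"{effective_latency(1.461517e-3):.6f}")  # 0.019014
```

Both values match the ping times reported in the original message.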

Coming back to your experiments, there are two situations:

1) if your workload comprises a majority of transfers larger than 100kB, 
this factor is important to reflect a start-up cost of TCP.

2) if the transfers are smaller than 100kB, you may want to either 
override this default parameter of the model with 
--cfg=network/latency-factor:1.0 (to disable the correction completely) 
or switch to the former default model, which didn't apply corrections, 
with --cfg=network/model:CM02
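
The two situations above could be turned into a small decision helper along these lines. This is a rough sketch, not something SimGrid provides: the function name is hypothetical, "100kB" is read as 100,000 bytes, and "a majority" is taken to mean more than half of the transfers. Only the two --cfg flags come from the message itself.

```python
# Assumption: "100kB" is interpreted as 100,000 bytes.
VALIDITY_THRESHOLD_BYTES = 100_000


def suggest_network_flags(transfer_sizes_bytes):
    """Return extra command-line flags for a workload (hypothetical helper).

    An empty list means: keep the default latency correction.
    """
    if not transfer_sizes_bytes:
        return []
    large = sum(1 for size in transfer_sizes_bytes
                if size > VALIDITY_THRESHOLD_BYTES)
    if large > len(transfer_sizes_bytes) / 2:
        # Situation 1: mostly large transfers, the correction is meaningful.
        return []
    # Situation 2: mostly small transfers, disable the correction.
    # Switching models with --cfg=network/model:CM02 is the alternative.
    return ["--cfg=network/latency-factor:1.0"]


print(suggest_network_flags([1, 64, 512]))               # small messages
print(suggest_network_flags([10**6, 5 * 10**6, 512]))    # mostly large
```

The returned flags would simply be appended to the simulator command line, after the platform and deployment files.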

Cheers

Fred



On 05/07/2017 at 03:52, Tristan Glatard wrote:
> Hi,
>
> We're trying to simulate a latency-bound application with Simgrid 3.16 
> using MSG, and we're a bit puzzled by simulation times.
>
> In example "app-pingpong" the ping time between hosts Tremblay and 
> Jupiter, i.e., the time for Jupiter to receive a 1-bit task from 
> Tremblay, is 0.019014s while the latency of the link between Tremblay 
> and Jupiter (id 9 in small_platform.xml) is only 1.461517ms (factor 13 
> difference). And if we edit the platform to contain only the bare 
> minimum (2 hosts and 1 link with 1ms latency, see attached), we still 
> get a ping time of 13ms while we would expect 1ms:
>
>> [glatard at sapajou app-pingpong]$ ./app-pingpong ./smaller_platform.xml 
>> ./app-pingpong_d.xml
>> [Tremblay:pinger:(1) 0.000000] [mag_app_pingpong/INFO] Ping -> Jupiter
>> [Jupiter:ponger:(2) 0.000000] [mag_app_pingpong/INFO] Pong -> Tremblay
>> [Jupiter:ponger:(2) 0.013010] [mag_app_pingpong/INFO] Task received : 
>> small communication (latency bound)
>> [Jupiter:ponger:(2) 0.013010] [mag_app_pingpong/INFO]  Ping time 
>> (latency bound) 0.013010
>> [Jupiter:ponger:(2) 0.013010] [mag_app_pingpong/INFO] task_bw->data = 
>> 0.013
>> [Tremblay:pinger:(1) 150.166348] [mag_app_pingpong/INFO] Task 
>> received : large communication (bandwidth bound)
>> [Tremblay:pinger:(1) 150.166348] [mag_app_pingpong/INFO] Pong time 
>> (bandwidth bound): 150.153
>> [150.166348] [mag_app_pingpong/INFO] Total simulation time: 150.166
> The same factor-13 difference is still observed on a different but 
> similar application.
>
> Do you have any idea what's going on?
>
> Thanks!
> Tristan

-- 
One should, for example, be able to see that things are hopeless
and yet be determined to make them otherwise.
                                           -- F. Scott Fitzgerald
