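"""
This file contains primitives for multi-gpu communication.
This is useful when doing distributed training.
"""

import functools
import logging
import numpy as np
import pickle
import torch
import torch.distributed as dist

# Process group containing only the processes on the current machine.
# It must be set by the launcher before get_local_rank() or get_local_size() is used.
_LOCAL_PROCESS_GROUP = None


def get_world_size() -> int:
    if not dist.is_available():
        return 1
    if not dist.is_initialized():
        return 1
    return dist.get_world_size()


def get_rank() -> int:
    if not dist.is_available():
        return 0
    if not dist.is_initialized():
        return 0
    return dist.get_rank()


def get_local_rank() -> int:
    """
    Returns:
        The rank of the current process within the local (per-machine) process group.
    """
    if not dist.is_available():
        return 0
    if not dist.is_initialized():
        return 0
    assert _LOCAL_PROCESS_GROUP is not None
    return dist.get_rank(group=_LOCAL_PROCESS_GROUP)


def get_local_size() -> int:
    """
    Returns:
        The size of the per-machine process group,
        i.e. the number of processes per machine.
    """
    if not dist.is_available():
        return 1
    if not dist.is_initialized():
        return 1
    return dist.get_world_size(group=_LOCAL_PROCESS_GROUP)


def is_main_process() -> bool:
    return get_rank() == 0


def synchronize():
    """
    Helper function to synchronize (barrier) among all processes when
    using distributed training
    """
    if not dist.is_available():
        return
    if not dist.is_initialized():
        return
    world_size = dist.get_world_size()
    if world_size == 1:
        return
    dist.barrier()


@functools.lru_cache()
def _get_global_gloo_group():
    """
    Return a process group based on gloo backend, containing all the ranks
    The result is cached.
    """
    if dist.get_backend() == "nccl":
        return dist.new_group(backend="gloo")
    else:
        return dist.group.WORLD


def _serialize_to_tensor(data, group):
    backend = dist.get_backend(group)
    assert backend in ["gloo", "nccl"]
    # gloo communicates CPU tensors; nccl requires CUDA tensors.
    device = torch.device("cpu" if backend == "gloo" else "cuda")

    buffer = pickle.dumps(data)
    if len(buffer) > 1024 ** 3:
        logger = logging.getLogger(__name__)
        logger.warning(
            "Rank {} trying to all-gather {:.2f} GB of data on device {}".format(
                get_rank(), len(buffer) / (1024 ** 3), device
            )
        )
    storage = torch.ByteStorage.from_buffer(buffer)
    tensor = torch.ByteTensor(storage).to(device=device)
    return tensor


def _pad_to_largest_tensor(tensor, group):
    """
    Returns:
        list[int]: size of the tensor, on each rank
        Tensor: padded tensor that has the max size
    """
    world_size = dist.get_world_size(group=group)
    assert (
        world_size >= 1
    ), "comm.gather/all_gather must be called from ranks within the given group!"
    local_size = torch.tensor([tensor.numel()], dtype=torch.int64, device=tensor.device)
    size_list = [
        torch.zeros([1], dtype=torch.int64, device=tensor.device) for _ in range(world_size)
    ]
    dist.all_gather(size_list, local_size, group=group)
    size_list = [int(size.item()) for size in size_list]

    max_size = max(size_list)

    # torch.distributed collectives need equally sized tensors on every rank,
    # so pad the local tensor with zeros up to the largest size.
    if local_size != max_size:
        padding = torch.zeros((max_size - local_size,), dtype=torch.uint8, device=tensor.device)
        tensor = torch.cat((tensor, padding), dim=0)
    return size_list, tensor


def all_gather(data, group=None):
    """
    Run all_gather on arbitrary picklable data (not necessarily tensors).
    Args:
        data: any picklable object
        group: a torch process group. By default, will use a group which
            contains all ranks on gloo backend.
    Returns:
        list[data]: list of data gathered from each rank
    """
    if get_world_size() == 1:
        return [data]
    if group is None:
        group = _get_global_gloo_group()
    if dist.get_world_size(group) == 1:
        return [data]

    tensor = _serialize_to_tensor(data, group)

    size_list, tensor = _pad_to_largest_tensor(tensor, group)
    max_size = max(size_list)

    # receiving Tensor from all ranks
    tensor_list = [
        torch.empty((max_size,), dtype=torch.uint8, device=tensor.device) for _ in size_list
    ]
    dist.all_gather(tensor_list, tensor, group=group)

    data_list = []
    for size, tensor in zip(size_list, tensor_list):
        # Drop the zero padding before unpickling.
        buffer = tensor.cpu().numpy().tobytes()[:size]
        data_list.append(pickle.loads(buffer))

    return data_list


def gather(data, dst=0, group=None):
    """
    Run gather on arbitrary picklable data (not necessarily tensors).
    Args:
        data: any picklable object
        dst (int): destination rank
        group: a torch process group. By default, will use a group which
            contains all ranks on gloo backend.
    Returns:
        list[data]: on dst, a list of data gathered from each rank. Otherwise,
            an empty list.
    """
    if get_world_size() == 1:
        return [data]
    if group is None:
        group = _get_global_gloo_group()
    if dist.get_world_size(group=group) == 1:
        return [data]
    rank = dist.get_rank(group=group)

    tensor = _serialize_to_tensor(data, group)
    size_list, tensor = _pad_to_largest_tensor(tensor, group)

    # receiving Tensor from all ranks
    if rank == dst:
        max_size = max(size_list)
        tensor_list = [
            torch.empty((max_size,), dtype=torch.uint8, device=tensor.device) for _ in size_list
        ]
        dist.gather(tensor, tensor_list, dst=dst, group=group)

        data_list = []
        for size, tensor in zip(size_list, tensor_list):
            # Drop the zero padding before unpickling.
            buffer = tensor.cpu().numpy().tobytes()[:size]
            data_list.append(pickle.loads(buffer))
        return data_list
    else:
        dist.gather(tensor, [], dst=dst, group=group)
        return []


def shared_random_seed():
    """
    Returns:
        int: a random number that is the same across all workers.
            If workers need a shared RNG, they can use this shared seed to
            create one.
    All workers must call this function, otherwise it will deadlock.
    """
    ints = np.random.randint(2 ** 31)
    all_ints = all_gather(ints)
    return all_ints[0]


def reduce_dict(input_dict, average=True):
    """
    Reduce the values in the dictionary from all processes so that the process with
    rank 0 has the reduced results.
    Args:
        input_dict (dict): inputs to be reduced. All values must be scalar CUDA tensors.
        average (bool): whether to average the values or sum them
    Returns:
        a dict with the same keys as input_dict, after reduction.
    """
    world_size = get_world_size()
    if world_size < 2:
        return input_dict
    with torch.no_grad():
        names = []
        values = []
        # sort the keys so that they are consistent across processes
        for k in sorted(input_dict.keys()):
            names.append(k)
            values.append(input_dict[k])
        values = torch.stack(values, dim=0)
        dist.reduce(values, dst=0)
        if dist.get_rank() == 0 and average:
            # only main process gets accumulated, so only divide by
            # world_size in this case
            values /= world_size
        reduced_dict = {k: v for k, v in zip(names, values)}
    return reduced_dict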