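# Algorithm/reID.git: utils/comm.py
"""
This file contains primitives for multi-GPU communication.
This is useful when doing distributed training.
"""

import functools
import logging
import numpy as np
import pickle
import torch
import torch.distributed as dist

# Process group containing only the processes on the current machine. It is
# None until other code assigns it; get_local_rank() asserts that it is set.
_LOCAL_PROCESS_GROUP = None


def get_world_size() -> int:
    """Number of processes in the default group (1 when not running distributed)."""
    if not dist.is_available():
        return 1
    if not dist.is_initialized():
        return 1
    return dist.get_world_size()

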
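def get_rank() -> int:
    """Global rank of the current process (0 when not running distributed)."""
    if not dist.is_available():
        return 0
    if not dist.is_initialized():
        return 0
    return dist.get_rank()

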
def get_local_rank() -> int:
    """
    Returns:
        The rank of the current process within the local (per-machine) process group.
    """
    if not dist.is_available():
        return 0
    if not dist.is_initialized():
        return 0
    assert _LOCAL_PROCESS_GROUP is not None
    return dist.get_rank(group=_LOCAL_PROCESS_GROUP)


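def get_local_size() -> int:
    """
    Returns:
        The size of the per-machine process group,
        i.e. the number of processes per machine.
    """
    if not dist.is_available():
        return 1
    if not dist.is_initialized():
        return 1
    return dist.get_world_size(group=_LOCAL_PROCESS_GROUP)

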
def is_main_process() -> bool:
    """True only on the process with global rank 0."""
    return get_rank() == 0


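def synchronize():
    """
    Helper function to synchronize (barrier) among all processes when
    using distributed training.
    """
    if not dist.is_available():
        return
    if not dist.is_initialized():
        return
    world_size = dist.get_world_size()
    if world_size == 1:
        return
    dist.barrier()

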
@functools.lru_cache()
def _get_global_gloo_group():
    """
    Return a process group based on the gloo backend, containing all the ranks.
    The result is cached.
    """
    if dist.get_backend() == "nccl":
        return dist.new_group(backend="gloo")
    else:
        return dist.group.WORLD


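# Sketch (hypothetical helper, not part of the original module): this shows why
# the gloo-group factory above is wrapped in functools.lru_cache(). Every
# process must enter dist.new_group(), so the group should be created exactly
# once. After that, the cache returns the same object on every call. A plain
# object() stands in for the process group here. That substitution is an
# assumption so the sketch runs without torch.distributed.
import functools


@functools.lru_cache()
def _sketch_cached_group():
    # Runs only on the first call; later calls are served from the cache.
    return object()

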
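def _serialize_to_tensor(data, group):
    # Pickle arbitrary data into a uint8 tensor on the device the backend
    # requires: CPU for gloo, CUDA for nccl.
    backend = dist.get_backend(group)
    assert backend in ["gloo", "nccl"]
    device = torch.device("cpu" if backend == "gloo" else "cuda")

    buffer = pickle.dumps(data)
    if len(buffer) > 1024 ** 3:
        logger = logging.getLogger(__name__)
        logger.warning(
            "Rank {} trying to all-gather {:.2f} GB of data on device {}".format(
                get_rank(), len(buffer) / (1024 ** 3), device
            )
        )
    storage = torch.ByteStorage.from_buffer(buffer)
    tensor = torch.ByteTensor(storage).to(device=device)
    return tensor

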
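# Sketch (hypothetical helpers, stdlib only): the serialization step above.
# pickle.dumps() reduces any picklable object to a flat byte string. That is
# what lets all_gather()/gather() move arbitrary Python data through uint8
# tensors. A bytearray stands in for torch.ByteTensor here. That substitution
# is an assumption so the sketch runs without torch.
def _sketch_serialize(data):
    import pickle

    buffer = pickle.dumps(data)
    size_gb = len(buffer) / (1024 ** 3)  # the figure the warning above reports
    return bytearray(buffer), size_gb


def _sketch_deserialize(buffer):
    import pickle

    return pickle.loads(bytes(buffer))

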
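def _pad_to_largest_tensor(tensor, group):
    """
    Returns:
        list[int]: size of the tensor, on each rank
        Tensor: padded tensor that has the max size
    """
    world_size = dist.get_world_size(group=group)
    assert (
        world_size >= 1
    ), "comm.gather/all_gather must be called from ranks within the given group!"
    local_size = torch.tensor([tensor.numel()], dtype=torch.int64, device=tensor.device)
    size_list = [
        torch.zeros([1], dtype=torch.int64, device=tensor.device) for _ in range(world_size)
    ]
    dist.all_gather(size_list, local_size, group=group)
    size_list = [int(size.item()) for size in size_list]

    max_size = max(size_list)

    # Pad the tensor, because torch's all_gather does not support
    # gathering tensors of different shapes.
    if local_size != max_size:
        padding = torch.zeros((max_size - local_size,), dtype=torch.uint8, device=tensor.device)
        tensor = torch.cat((tensor, padding), dim=0)
    return size_list, tensor

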
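# Sketch (hypothetical helper, stdlib only): torch's all_gather needs every
# rank to contribute a tensor of the same shape. So the true sizes are
# exchanged first, and each buffer is then zero-padded to the largest size.
# Byte strings stand in for the per-rank uint8 tensors (an assumption).
# Receivers later use the returned sizes to strip the padding again.
def _sketch_pad_to_largest(buffers):
    size_list = [len(b) for b in buffers]
    max_size = max(size_list)
    # bytes(n) is n zero bytes, the analogue of torch.zeros(..., dtype=uint8).
    padded = [b + bytes(max_size - len(b)) for b in buffers]
    return size_list, padded

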
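def all_gather(data, group=None):
    """
    Run all_gather on arbitrary picklable data (not necessarily tensors).
    Args:
        data: any picklable object
        group: a torch process group. By default, uses a group that
            contains all ranks on the gloo backend.
    Returns:
        list[data]: list of data gathered from each rank
    """
    if get_world_size() == 1:
        return [data]
    if group is None:
        group = _get_global_gloo_group()
    if dist.get_world_size(group) == 1:
        return [data]

    tensor = _serialize_to_tensor(data, group)

    size_list, tensor = _pad_to_largest_tensor(tensor, group)
    max_size = max(size_list)

    # Receive the padded tensor from every rank.
    tensor_list = [
        torch.empty((max_size,), dtype=torch.uint8, device=tensor.device) for _ in size_list
    ]
    dist.all_gather(tensor_list, tensor, group=group)

    data_list = []
    for size, tensor in zip(size_list, tensor_list):
        # Drop the padding before unpickling.
        buffer = tensor.cpu().numpy().tobytes()[:size]
        data_list.append(pickle.loads(buffer))

    return data_list

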
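# Sketch (hypothetical helper, stdlib only): a single-process simulation of
# all_gather() above. Each list entry plays the role of one rank (an
# assumption; no communication takes place). It follows the same steps:
# pickle, exchange sizes, pad to the largest size, gather, then trim each
# buffer back to its true size before unpickling.
def _sketch_all_gather(per_rank_data):
    import pickle

    buffers = [pickle.dumps(d) for d in per_rank_data]
    size_list = [len(b) for b in buffers]  # result of the size exchange
    max_size = max(size_list)
    padded = [b + bytes(max_size - len(b)) for b in buffers]
    # After the collective, every rank holds `padded` and decodes it the same way.
    return [pickle.loads(buf[:size]) for size, buf in zip(size_list, padded)]

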
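def gather(data, dst=0, group=None):
    """
    Run gather on arbitrary picklable data (not necessarily tensors).
    Args:
        data: any picklable object
        dst (int): destination rank
        group: a torch process group. By default, uses a group that
            contains all ranks on the gloo backend.
    Returns:
        list[data]: on dst, a list of data gathered from each rank. Otherwise,
            an empty list.
    """
    if get_world_size() == 1:
        return [data]
    if group is None:
        group = _get_global_gloo_group()
    if dist.get_world_size(group=group) == 1:
        return [data]
    rank = dist.get_rank(group=group)

    tensor = _serialize_to_tensor(data, group)
    size_list, tensor = _pad_to_largest_tensor(tensor, group)

    # Only the destination rank allocates receive buffers.
    if rank == dst:
        max_size = max(size_list)
        tensor_list = [
            torch.empty((max_size,), dtype=torch.uint8, device=tensor.device) for _ in size_list
        ]
        dist.gather(tensor, tensor_list, dst=dst, group=group)

        data_list = []
        for size, tensor in zip(size_list, tensor_list):
            buffer = tensor.cpu().numpy().tobytes()[:size]
            data_list.append(pickle.loads(buffer))
        return data_list
    else:
        dist.gather(tensor, [], dst=dst, group=group)
        return []

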
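# Sketch (hypothetical helper, stdlib only): the asymmetric return value of
# gather() above, simulated in one process with one list entry per rank (an
# assumption). Only `dst` receives the unpickled list; every other rank gets an
# empty list. Padding is omitted here because it does not change the result.
def _sketch_gather(per_rank_data, dst=0):
    import pickle

    gathered = [pickle.loads(pickle.dumps(d)) for d in per_rank_data]
    # Element i is what rank i would get back from gather().
    return [gathered if rank == dst else [] for rank in range(len(per_rank_data))]

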
def shared_random_seed():
    """
    Returns:
        int: a random number that is the same across all workers.
            If workers need a shared RNG, they can use this shared seed to
            create one.
    All workers must call this function, otherwise it will deadlock.
    """
    ints = np.random.randint(2 ** 31)
    all_ints = all_gather(ints)
    return all_ints[0]


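# Sketch (hypothetical helper, stdlib only): why shared_random_seed() returns
# the same value everywhere. Each simulated rank draws its own number in
# [0, 2**31), the draws are all-gathered, and every rank keeps element 0, which
# is rank 0's draw. Seeding each rank's random.Random with its rank number is
# an illustrative assumption that makes the draws generally differ.
def _sketch_shared_seed(world_size):
    import random

    draws = [random.Random(rank).randrange(2 ** 31) for rank in range(world_size)]
    all_ints = draws  # what all_gather(ints) would return on every rank
    # Element i is the seed that rank i would return.
    return [all_ints[0] for _ in range(world_size)]

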
def reduce_dict(input_dict, average=True):
    """
    Reduce the values in the dictionary from all processes so that the process
    with rank 0 has the reduced results.
    Args:
        input_dict (dict): inputs to be reduced. All the values must be scalar CUDA tensors.
        average (bool): whether to average or sum the values
    Returns:
        a dict with the same keys as input_dict, after reduction.
    """
    world_size = get_world_size()
    if world_size < 2:
        return input_dict
    with torch.no_grad():
        names = []
        values = []
        # Sort the keys so that values are stacked in the same order on every process.
        for k in sorted(input_dict.keys()):
            names.append(k)
            values.append(input_dict[k])
        values = torch.stack(values, dim=0)
        dist.reduce(values, dst=0)
        if dist.get_rank() == 0 and average:
            # Only rank 0 receives the sum, so only rank 0 divides by world_size.
            values /= world_size
        reduced_dict = {k: v for k, v in zip(names, values)}
    return reduced_dict


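# Sketch (hypothetical helper, stdlib only): a single-process simulation of
# reduce_dict() above. Each dict in `per_rank_dicts` stands for one rank's
# input, and plain floats replace the scalar CUDA tensors (both assumptions).
# Keys are sorted so every rank stacks its values in the same order. The values
# are then summed, matching dist.reduce's default SUM op, and divided by the
# world size when averaging. Only rank 0's result is modelled, because rank 0
# is the only rank where reduce_dict() returns meaningful values.
def _sketch_reduce_dict(per_rank_dicts, average=True):
    world_size = len(per_rank_dicts)
    names = sorted(per_rank_dicts[0].keys())
    totals = [sum(d[k] for d in per_rank_dicts) for k in names]
    if average:
        totals = [t / world_size for t in totals]
    return dict(zip(names, totals))

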