Moving millions of items from one storage account to another

I have somewhere around 4.2 million images that I need to move from North Central US to West US as part of a large migration to take advantage of Azure VM support (for those who don't know, North Central US does not support them). The images are all in one container, divided into approximately 119,000 directories.

I'm using the following, based on the Copy Blob API:

public static void CopyBlobDirectory(
        CloudBlobDirectory srcDirectory,
        CloudBlobContainer destContainer)
{
    // get the SAS token to use for all blobs
    string blobToken = srcDirectory.Container.GetSharedAccessSignature(
        new SharedAccessBlobPolicy
        {
            Permissions = SharedAccessBlobPermissions.Read |
                            SharedAccessBlobPermissions.Write,
            SharedAccessExpiryTime = DateTime.UtcNow + TimeSpan.FromDays(14)
        });

    var srcBlobList = srcDirectory.ListBlobs(
        useFlatBlobListing: true,
        blobListingDetails: BlobListingDetails.None).ToList();

    foreach (var src in srcBlobList)
    {
        var srcBlob = src as ICloudBlob;

        // Create appropriate destination blob type to match the source blob
        ICloudBlob destBlob;
        if (srcBlob.Properties.BlobType == BlobType.BlockBlob)
            destBlob = destContainer.GetBlockBlobReference(srcBlob.Name);
        else
            destBlob = destContainer.GetPageBlobReference(srcBlob.Name);

        // copy using src blob as SAS
        destBlob.BeginStartCopyFromBlob(new Uri(srcBlob.Uri.AbsoluteUri + blobToken), null, null);          
    }
}

The problem is that it is too slow. Waaaay too slow. At the rate it takes to issue the commands to copy all of this, the job will take somewhere around four days. I'm not sure what the bottleneck is (a client-side connection limit, rate limiting on Azure's end, multithreading, etc.).

So what are my options? Is there any way to speed this up?

Edit: here is how I'm distributing the work:

//set up tracing
InitTracer();

//grab a set of photos to benchmark this
var photos = PhotoHelper.GetAllPhotos().Take(500).ToList();

//account to copy from
var from = new Microsoft.WindowsAzure.Storage.Auth.StorageCredentials(
    "oldAccount",
    "oldAccountKey");
var fromAcct = new CloudStorageAccount(from, true);
var fromClient = fromAcct.CreateCloudBlobClient();
var fromContainer = fromClient.GetContainerReference("userphotos");

//account to copy to
var to = new Microsoft.WindowsAzure.Storage.Auth.StorageCredentials(
    "newAccount",
    "newAccountKey");
var toAcct = new CloudStorageAccount(to, true);
var toClient = toAcct.CreateCloudBlobClient();

Trace.WriteLine("Starting Copy: " + DateTime.UtcNow.ToString());

//enumerate sub directories, then move them to blob storage
//note: it doesn't care how high I set the Parallelism to,
//console output indicates it won't run more than five or so at a time
var plo = new ParallelOptions { MaxDegreeOfParallelism = 10 };
Parallel.ForEach(photos, plo, (info) =>
{
    CloudBlobDirectory fromDir = fromContainer.GetDirectoryReference(info.BuildingId.ToString());

    var toContainer = toClient.GetContainerReference(info.Id.ToString());
    toContainer.CreateIfNotExists();

    Trace.WriteLine(info.BuildingId + ": Starting copy, " + info.Photos.Length + " photos...");

    // matches the two-parameter CopyBlobDirectory(srcDirectory, destContainer) shown above
    BlobHelper.CopyBlobDirectory(fromDir, toContainer);
    //this monitors the container, so I can restart any failed
    //copies if something goes wrong
    BlobHelper.MonitorCopy(toContainer);
});

Trace.WriteLine("Done: " + DateTime.UtcNow.ToString());
+5
2

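An async blob copy is very fast within the same data center (I recently copied a 30GB VHD to another blob in about 1-2 seconds). Across data centers, the copy is queued and runs on spare capacity, with no SLA.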

To put that in perspective: copying the same 30GB VHD across data centers took about 1 hour.

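I don't know your image sizes, but assuming an average of 500K per image, you have about 2000 GB in total. In my case I saw about 30GB copied per hour. Extrapolating, your 2000 GB would take roughly (2000/30) ≈ 67 hours. Again, there is no SLA; this is only a best guess.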
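The estimate can be worked through with the figures from this thread. This is only a rough calculation: the 500 KB average image size is an assumption, and the 30 GB per hour rate comes from a single observed cross-data-center copy.

```latex
\begin{align*}
\text{total data} &\approx 4.2 \times 10^{6}\ \text{images} \times 500\ \text{KB} \approx 2100\ \text{GB} \\
\text{observed rate} &\approx 30\ \text{GB/hour} \\
\text{copy time} &\approx \frac{2100\ \text{GB}}{30\ \text{GB/hour}} = 70\ \text{hours} \approx 3\ \text{days}
\end{align*}
```

Rounding the total down to 2000 GB gives about 67 hours, so either way the transfer itself lands in the range of roughly three days.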

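Someone else suggested disabling Nagle's algorithm. That should help push the 4 million copy commands out faster and get them queued sooner. I don't think it will have any effect on the copy time itself.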
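Getting several million copy commands queued quickly mostly comes down to keeping many requests in flight at once. The following is a minimal sketch of bounded concurrency with SemaphoreSlim, not code from this thread: `issueCopy` is a hypothetical delegate standing in for whatever starts the copies for one directory, and the limit of 10 is an arbitrary illustrative value.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

static class ThrottledIssue
{
    // Runs issueCopy for every name, with at most maxInFlight calls active at once.
    // issueCopy is a hypothetical placeholder for the real "start copy" call.
    public static async Task RunAsync(
        IEnumerable<string> directoryNames,
        Func<string, Task> issueCopy,
        int maxInFlight)
    {
        using (var gate = new SemaphoreSlim(maxInFlight))
        {
            var tasks = directoryNames.Select(async name =>
            {
                await gate.WaitAsync();       // wait for a free slot
                try { await issueCopy(name); }
                finally { gate.Release(); }   // free the slot even on failure
            }).ToList();

            await Task.WhenAll(tasks);
        }
    }

    static void Main()
    {
        var names = Enumerable.Range(1, 50).Select(i => "dir" + i);
        int completed = 0;

        // Task.Delay simulates the latency of one copy request.
        RunAsync(names, async name =>
        {
            await Task.Delay(10);
            Interlocked.Increment(ref completed);
        }, 10).GetAwaiter().GetResult();

        Console.WriteLine("Issued: " + completed);
    }
}
```

Unlike Parallel.ForEach over blocking calls, this does not tie up a thread per pending request, so the in-flight count is set by `maxInFlight` rather than by how quickly the thread pool grows.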

+2

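This is a bit of a long shot, but I had a similar problem where small requests (which I think the BeginStartCopyFromBlob calls should be) ran extremely slowly. The cause was Nagle's algorithm interacting with delayed TCP acks, two network traffic optimisations. See MSDN for details.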

Upshot: turn off Nagle's algorithm before making any Azure storage calls:

ServicePointManager.UseNagleAlgorithm = false;

Or for just the blob endpoint:

var storageAccount = CloudStorageAccount.Parse(connectionString);
ServicePoint blobServicePoint = ServicePointManager.FindServicePoint(storageAccount.BlobEndpoint);
blobServicePoint.UseNagleAlgorithm = false;
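Process-wide settings like this generally need to be applied once at startup, before any storage client opens a connection. A minimal sketch, assuming the .NET Framework ServicePointManager: `DefaultConnectionLimit` and `Expect100Continue` are additions that this thread does not mention, and the value 100 is only illustrative.

```csharp
using System;
using System.Net;

static class CopyClientSetup
{
    static void Main()
    {
        // Send small requests immediately instead of waiting to coalesce them.
        ServicePointManager.UseNagleAlgorithm = false;

        // Assumption: skip the extra 100-Continue round trip on requests with a body.
        ServicePointManager.Expect100Continue = false;

        // Assumption: raise the per-endpoint connection cap, which defaults to 2
        // for non-ASP.NET .NET Framework applications. 100 is an illustrative value.
        ServicePointManager.DefaultConnectionLimit = 100;

        Console.WriteLine("Nagle enabled: " + ServicePointManager.UseNagleAlgorithm);
        Console.WriteLine("Connection limit: " + ServicePointManager.DefaultConnectionLimit);
    }
}
```

A low default connection limit would be consistent with seeing only a handful of requests running at once regardless of MaxDegreeOfParallelism, though that is a guess rather than something measured here.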

It would be great to know whether that helps with your problem!

+1
