//Cloud notes from my desk -Maheshk

“Fortunate are those who take the first steps.” ― Paulo Coelho

[Azure Batch] Server failed to authenticate the request

Today I happened to work on a problem where a developer running unit-test code that creates Azure Batch jobs in a row and checks their status in a tight loop was getting a Forbidden (403) error. Interestingly, the first couple of “job” creation calls worked for him, but every call after that failed. I grabbed the Fiddler log to inspect the request and response and noticed something like the below.

Fiddler trace:

HTTP/1.1 403 Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.

Content-Length: 864

Content-Type: application/json;odata=minimalmetadata

Server: Microsoft-HTTPAPI/2.0

request-id: fd81d621-xxxxx-b252-1db46324f1a6

Strict-Transport-Security: max-age=31536000; includeSubDomains

X-Content-Type-Options: nosniff

DataServiceVersion: 3.0

Date: Tue, 13 Dec 2016 15:51:34 GMT

Cache-Control: proxy-revalidate

Proxy-Connection: Keep-Alive

Connection: Keep-Alive



    "lang":"en-US","value":"Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.\nRequestId:xxxxx-564c-4930-xxx-xxx\nTime:2016-12-13T15:51:34.2917937Z"



      "key":"AuthenticationErrorDetail","value":"The MAC signature found in the HTTP request '6nxk/xxxxxx/xmdrWt55RnMRmsg=' is not the same as any computed signature. Server used following string to sign: 'GET\n\n\n\n\n\n\nTue, 13 Dec 2016 15:42:18 GMT\n\n0x8D4236E9EB80606\n\n\nocp-date:Tue, 13 Dec 2016 15:51:35 GMT\n/xxxxx/jobs/deleteJob0\napi-version:2016-07-01.3.1'."





This appears to be a caching problem – https://social.msdn.microsoft.com/Forums/SqlServer/en-US/67183d62-60ab-4ef0-a1ca-b765d85ea2f6/authenticationfailed?forum=azurebatch. It is caused by client-side caching at the proxy layer. The proxy server on the client side caches GET responses, and for subsequent GETs it tries to serve the request from its cache: it sends a different request to the server, adding an If-Match header with the cached ETag. This causes a problem here because the proxy does not recompute the Authorization header; it reuses the same one the client originally sent. As a result, the Authorization header the server receives no longer matches the signature the server computes for the modified request.
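To see why a replayed Authorization header fails: Azure's shared-key schemes compute an HMAC-SHA256 over a canonical "string to sign" built from the verb, headers, and resource (visible in the trace above). If the proxy alters the request but keeps the old signature, the server's freshly computed MAC can no longer match. A minimal sketch with a hypothetical key and deliberately simplified strings (not the exact Batch canonicalization):

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

class SharedKeySketch
{
    // Base64-decode the account key, HMAC-SHA256 the canonical string,
    // and base64-encode the result - the shape of Azure shared-key signing.
    public static string Sign(string base64Key, string stringToSign)
    {
        using (var hmac = new HMACSHA256(Convert.FromBase64String(base64Key)))
        {
            return Convert.ToBase64String(
                hmac.ComputeHash(Encoding.UTF8.GetBytes(stringToSign)));
        }
    }

    static void Main()
    {
        // Hypothetical key, for illustration only.
        string key = Convert.ToBase64String(Encoding.UTF8.GetBytes("demo-key"));

        string original = "GET\n\nTue, 13 Dec 2016 15:42:18 GMT\n/account/jobs";
        string proxied  = "GET\nIf-Match:0x8D4236E9EB80606\nTue, 13 Dec 2016 15:42:18 GMT\n/account/jobs";

        // Any change to the canonical string yields a different MAC, so a
        // cached Authorization header cannot match the modified request.
        Console.WriteLine(SharedKeySketch.Sign(key, original) ==
                          SharedKeySketch.Sign(key, proxied)); // False
    }
}
```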


Adding the configuration value below to the App.config file resolved the issue. The developer was able to continue executing his test harness, submitting/deleting 10 jobs successfully without caching issues.


    <system.net>
      <requestCaching defaultPolicyLevel="NoCacheNoStore" />
    </system.net>
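If editing App.config is not convenient (for example in a test harness that builds its own host), the same policy can be applied in code; a small sketch, assuming the classic .NET Framework HttpWebRequest stack:

```csharp
using System.Net;
using System.Net.Cache;

static class NoCacheSetup
{
    // Programmatic equivalent of the requestCaching config element:
    // disable the client-side request cache process-wide so every GET
    // goes to the service rather than a caching proxy layer.
    public static void Apply()
    {
        HttpWebRequest.DefaultCachePolicy =
            new RequestCachePolicy(RequestCacheLevel.NoCacheNoStore);
    }
}
```

Call `NoCacheSetup.Apply()` once at startup, before any requests are issued.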

Hope this helps.

2016-12-20 Posted by | Azure, Azure Batch, C#, VS2015 |

[Service Fabric] SF node fails to read DNS configuration – and the fix

Recently, an SF developer reported a problem where his Azure Service Fabric > ImageStoreService (ISS) displayed a warning message because one of his secondary nodes was down. The node went “down” all of a sudden, without any change to his cluster/application. From the SF Explorer portal, we noticed a brief warning message saying the node was down due to an unhealthy event.

SF Explorer – warning


Error message

Unhealthy event: SourceId='System.PLB', Property='ServiceReplicaUnplacedHealth_Secondary_00000000-0000-0000-0000-000000003000', HealthState='Warning', ConsiderWarningAsError=false.
The Load Balancer was unable to find a placement for one or more of the Service’s Replicas:
ImageStoreService Secondary Partition 00000000-0000-0000-0000-000000003000 could not be placed, possibly, due to the following constraints and properties: 
TargetReplicaSetSize: 5
Placement Constraint: NodeTypeName==sf**type
Depended Service: ClusterManagerServiceName

Constraint Elimination Sequence:
ReplicaExclusionStatic eliminated 3 possible node(s) for placement — 2/5 node(s) remain.
PlacementConstraint + ServiceTypeDisabled/NodesBlockListed eliminated 0 + 1 = 1 possible node(s) for placement — 1/5 node(s) remain.
ReplicaExclusionDynamic eliminated 1 possible node(s) for placement — 0/5 node(s) remain.

Nodes Eliminated By Constraints:

ReplicaExclusionStatic — No Colocations with Partition’s Existing Secondaries/Instances:
FaultDomain:fd:/1 NodeName:_sf**type_1 NodeType:sf**type NodeTypeName:sf**type UpgradeDomain:1 UpgradeDomain: ud:/1 Deactivation Intent/Status: None/None
FaultDomain:fd:/4 NodeName:_sf**type_4 NodeType:sf**type NodeTypeName:sf**type UpgradeDomain:4 UpgradeDomain: ud:/4 Deactivation Intent/Status: None/None
FaultDomain:fd:/3 NodeName:_sf**type_3 NodeType:sf**type NodeTypeName:sf**type UpgradeDomain:3 UpgradeDomain: ud:/3 Deactivation Intent/Status: None/None

PlacementConstraint + ServiceTypeDisabled/NodesBlockListed — PlacementProperties must Satisfy Service’s PlacementConstraint, and Nodes must not have had the ServiceType Disabled or be BlockListed due to Node’s Pause/Deactivate Status:
FaultDomain:fd:/2 NodeName:_sf**type_2 NodeType:sf**type NodeTypeName:sf**type UpgradeDomain:2 UpgradeDomain: ud:/2 Deactivation Intent/Status: None/None

ReplicaExclusionDynamic — No Colocations with Partition’s Existing Primary or Potential Secondaries:

FaultDomain:fd:/0 NodeName:_sf**type_0 NodeType:sf**type NodeTypeName:sf**type UpgradeDomain:0 UpgradeDomain: ud:/0 Deactivation Intent/Status: None/None

We noticed the below health warning event.

There was no data from 'sf**type_2' after that failure date, so we confirmed both ISS and FabricDCA.exe were crashing:
                          N/S RD sf**type_3 Up 131212…946953
                          N/S RD sf**type_1 Up 131212…593463858
                          N/S RD sf**type_4 Up 13121…..3859
                          N/I SB sf**type_2 Down 1312…..63860
                          N/P RD sf**type_0 Up 131212…..61

Event Log

Log Name:      Application
Source:        Microsoft-Windows-PerfNet
Date:          12/9/2016 7:42:40 AM
Event ID:      2005
Task Category: None
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      sf***ype00xxx02
Unable to read performance data for the Server service. The first four bytes (DWORD) of the Data section contains the status code, the second four bytes contains the IOSB.Status and the next four bytes contains the IOSB.Information.

C0000466 00000000 634A41F0
Log Name:      Microsoft-ServiceFabric/Admin
Source:        Microsoft-ServiceFabric
Date:          12/12/2016 11:29:13 AM
Event ID:      59904
Task Category: FabricDCA
Level:         Error
Keywords:      Default
User:          NETWORK SERVICE
Computer:      sf***ype00xxx02

Failed to copy file D:\SvcFab\Log\PerformanceCounters_ServiceFabricPerfCounter\fabric_counters_6361xxxxx47065191_000940.blg to Azure blob account sf***ype00xxx02, container fabriccounters-2b73743xxxxxxa46d111c4d5.

Microsoft.WindowsAzure.Storage.StorageException: The remote name could not be resolved: 'sf***ype00xxx02.blob.core.windows.net' ---> System.Net.WebException: The remote name could not be resolved: 'sf***ype00xxx02.blob.core.windows.net'
   at System.Net.HttpWebRequest.GetRequestStream(TransportContext& context)
   at System.Net.HttpWebRequest.GetRequestStream()
   at Microsoft.WindowsAzure.Storage.Core.Executor.Executor.ExecuteSync[T](RESTCommand`1 cmd, IRetryPolicy policy, OperationContext operationContext)
   --- End of inner exception stack trace ---
   at Microsoft.WindowsAzure.Storage.Core.Executor.Executor.ExecuteSync[T](RESTCommand`1 cmd, IRetryPolicy policy, OperationContext operationContext)
   at Microsoft.WindowsAzure.Storage.Blob.CloudBlockBlob.UploadFromStreamHelper(Stream source, Nullable`1 length, AccessCondition accessCondition, BlobRequestOptions options, OperationContext operationContext)
   at FabricDCA.AzureFileUploader.CreateStreamAndUploadToBlob(String sourceFile, CloudBlockBlob destinationBlob)
   at FabricDCA.AzureFileUploader.CopyFileToDestinationBlobWorker(FileCopyInfo fileCopyInfo, CloudBlockBlob destinationBlob)
   at FabricDCA.AzureFileUploader.CopyFileToDestinationBlob(Object context)
   at System.Fabric.Dca.Utility.PerformWithRetries[T](Action`1 worker, T context, RetriableOperationExceptionHandler exceptionHandler, Int32 initialRetryIntervalMs, Int32 maxRetryCount, Int32 maxRetryIntervalMs)
   at FabricDCA.AzureFileUploader.CopyFileToDestination(String source, String sourceRelative, Int32 retryCount, Boolean& fileSkipped)

Checklist of things we tried:

  • we checked the free-space details of drive D: on the failing node – it had enough space.
  • we were able to RDP into the machine (the failing node) but could not browse any web sites or nslookup any URL:
    >nslookup sf***ype00xxx02.blob.core.windows.net
    Server:  unknown
    *** unknown can't find sf***ype00xxx02.blob.core.windows.net: No response from server
  • the logs confirmed that this failing node/VM was going through a network-related issue, which is why it was not able to connect to any of the services or the storage account.
  • there were no fabric logs after the issue start date, which also confirmed this VM had lost its connectivity.
  • checked for crash dumps under D:\SvcFab\Log\CrashDumps – no dumps.
  • checked the traces under D:\SvcFab\Log\Traces – did not get any hint.


  • With all of the above findings, we confirmed the failing node _sf***ype_2 was not resolving DNS for some reason. This issue occurs very rarely, due to corruption at the OS level.
  • From the registry, we saw it had received the proper DNS settings from the Azure DHCP server.
  • HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\DhcpNameServer was set to 16x.6x.12x.1x, but the affected machine was not able to read and use this DNS configuration, so name resolution was broken at the operating-system level.
  • To overcome this issue, we ran "netsh int ip reset c:\reset.log" and "netsh winsock reset catalog" to reset the IP stack and the Winsock catalog, then rebooted the virtual machine, which eventually resolved the issue.
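After the reset and reboot, name resolution can be sanity-checked from code as well as with nslookup. A small sketch (the blob endpoint below is a placeholder standing in for the redacted storage account name):

```csharp
using System.Net;
using System.Net.Sockets;

static class DnsProbe
{
    // Returns true when the given host name resolves to at least one address.
    public static bool CanResolve(string host)
    {
        try
        {
            return Dns.GetHostEntry(host).AddressList.Length > 0;
        }
        catch (SocketException)
        {
            // DNS gave no response, or the name does not exist.
            return false;
        }
    }
}

// e.g. DnsProbe.CanResolve("sfxxxx.blob.core.windows.net")  // placeholder name
```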

Reference article: https://support.microsoft.com/en-us/kb/299357

Let me know if this helps in some way.

2016-12-18 Posted by | Azure, ServiceFabric, VMSS |

