Overhead of .NET IOCP ThreadPool with Asynchronous UDP Operations

I have developed a VoIP media server that exchanges RTP packets with remote SIP endpoints. It needs to scale well, and while I was initially concerned that my C# implementation would not come close to the C++ version it replaces, I have used various profilers to hone the implementation and the performance is now pretty close.

I have eliminated most object allocations by creating pools of reusable objects, I use ReceiveFromAsync and SendToAsync to receive and send datagrams, and I use producer/consumer queues to pass RTP packets through the system. On a machine with 2 x 2.4 GHz Xeon processors, I can handle about 1000 simultaneous streams, each of which sends and receives 50 packets per second. Nevertheless, the iterative profile/tweak/profile cycle has me hooked, and I am sure there is more efficiency to be found somewhere!
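To make the pooling idea concrete, here is a minimal sketch of a reusable pool of SocketAsyncEventArgs objects with preallocated buffers. The class name `SocketArgsPool` and its members are hypothetical, chosen for illustration only; this is not the actual pool from my server.

```csharp
using System;
using System.Collections.Concurrent;
using System.Net.Sockets;

// Hypothetical pool: hands out SocketAsyncEventArgs instances that already own a buffer,
// so the steady-state receive/send path does not allocate.
public sealed class SocketArgsPool
{
    private readonly ConcurrentBag<SocketAsyncEventArgs> _items =
        new ConcurrentBag<SocketAsyncEventArgs>();
    private readonly int _bufferSize;

    public SocketArgsPool(int bufferSize, int initialCount)
    {
        _bufferSize = bufferSize;
        for (int i = 0; i < initialCount; i++)
            _items.Add(Create());
    }

    private SocketAsyncEventArgs Create()
    {
        var args = new SocketAsyncEventArgs();
        args.SetBuffer(new byte[_bufferSize], 0, _bufferSize);
        return args;
    }

    // Takes a pooled instance, or allocates a new one if the pool is empty.
    public SocketAsyncEventArgs Rent()
    {
        SocketAsyncEventArgs args;
        return _items.TryTake(out args) ? args : Create();
    }

    // Restores the full buffer window (a sender may have narrowed it) and re-pools the instance.
    public void Return(SocketAsyncEventArgs args)
    {
        args.SetBuffer(0, _bufferSize);
        _items.Add(args);
    }
}
```

A caller would `Rent()` an instance before each ReceiveFromAsync or SendToAsync call and `Return()` it once the packet has been consumed.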

The event that triggers processing is the Completed delegate being invoked on a SocketAsyncEventArgs, which in turn dispatches the RTP packets through the processing pipeline.
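For reference, the general shape of that receive path looks roughly like the sketch below. It is a simplified illustration rather than my real code: `RtpReceiver`, `Packets`, and the 2048-byte buffer are assumptions, and the per-packet copy stands in for a pooled packet object.

```csharp
using System;
using System.Collections.Concurrent;
using System.Net;
using System.Net.Sockets;

public sealed class RtpReceiver
{
    private static readonly EndPoint AnyEndPoint = new IPEndPoint(IPAddress.Any, 0);
    private readonly Socket _socket;

    // Hypothetical hand-off queue; pipeline stages consume from it on their own threads.
    public readonly BlockingCollection<byte[]> Packets = new BlockingCollection<byte[]>();

    public EndPoint LocalEndPoint { get { return _socket.LocalEndPoint; } }

    public RtpReceiver(IPEndPoint local)
    {
        _socket = new Socket(AddressFamily.InterNetwork, SocketType.Dgram, ProtocolType.Udp);
        _socket.Bind(local);
    }

    public void Start()
    {
        var args = new SocketAsyncEventArgs();
        args.SetBuffer(new byte[2048], 0, 2048);
        // Raised on an IOCP thread pool thread when a receive completes asynchronously.
        args.Completed += (sender, e) => { if (Process(e)) Receive(e); };
        Receive(args);
    }

    private void Receive(SocketAsyncEventArgs args)
    {
        // Loop instead of recursing: a false return means the operation completed
        // synchronously and the Completed event will not be raised for it.
        while (true)
        {
            args.RemoteEndPoint = AnyEndPoint;
            if (_socket.ReceiveFromAsync(args)) return;
            if (!Process(args)) return;
        }
    }

    private bool Process(SocketAsyncEventArgs e)
    {
        // Simplification: stop receiving on any socket error.
        if (e.SocketError != SocketError.Success) return false;
        var copy = new byte[e.BytesTransferred]; // a real server would use a pooled packet here
        Buffer.BlockCopy(e.Buffer, e.Offset, copy, 0, e.BytesTransferred);
        Packets.Add(copy);
        return true;
    }
}
```

Everything the profiler attributes to "my code" sits below `Process` in this picture; the frames above it belong to the runtime's completion port dispatch.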

The remaining frustration is that there appears to be substantial overhead in the IOCP thread pool. The profiler shows that only 72% of the Inclusive Sample time is in "my code"; the time before that appears to be thread pool overhead (stack frames below).

So my questions are:

  • Am I missing something in my understanding?
  • Can the overhead be reduced?
  • Is it possible to replace the thread pool used by the async socket functions with a custom, lightweight thread pool that has less overhead?

100% MediaGateway

95.35% Thread::intermediateThreadProc(void *)

88.37% ThreadNative::SetDomainLocalStore(class Object *)

88.37% BindIoCompletionCallbackStub(unsigned long,unsigned long,struct _OVERLAPPED *)

86.05% BindIoCompletionCallbackStubEx(unsigned long,unsigned long,struct _OVERLAPPED *,int)

86.05% ManagedThreadBase::ThreadPool(struct ADID,void (*)(void *),void *)

86.05% CrstBase::Enter(void)

86.05% AppDomainStack::PushDomain(struct ADID)

86.05% Thread::ShouldChangeAbortToUnload(class Frame *,class Frame *)

86.05% AppDomainStack::ClearDomainStack(void)

83.72% ThreadPoolNative::CorWaitHandleCleanupNative(void *)

83.72% __CT??_R0PAVEEArgumentException@@@84

83.72% DispatchCallDebuggerWrapper(unsigned long *,unsigned long,unsigned long *,unsigned __int64,void *,unsigned __int64,unsigned int,unsigned char *,class ContextTransitionFrame *)

83.72% DispatchCallBody(unsigned long *,unsigned long,unsigned long *,unsigned __int64,void *,unsigned __int64,unsigned int,unsigned char *)

83.72% MethodDesc::EnsureActive(void)

81.40% _CallDescrWorker@20

81.40% System.Threading._IOCompletionCallback.PerformIOCompletionCallback(uint32,uint32,valuetype System.Threading.NativeOverlapped*)

76.74% System.Net.Sockets.SocketAsyncEventArgs.CompletionPortCallback(uint32,uint32,valuetype System.Threading.NativeOverlapped*)

76.74% System.Net.Sockets.SocketAsyncEventArgs.FinishOperationSuccess(valuetype System.Net.Sockets.SocketError,int32,valuetype System.Net.Sockets.SocketFlags)

74.42% System.Threading.ExecutionContext.Run(class System.Threading.ExecutionContext,class System.Threading.ContextCallback,object)

72.09% System.Net.Sockets.SocketAsyncEventArgs.ExecutionCallback(object)

72.09% System.Net.Sockets.SocketAsyncEventArgs.OnCompleted(class System.Net.Sockets.SocketAsyncEventArgs)
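One frame in this stack that can plausibly be influenced from user code is ExecutionContext.Run. As far as I understand it, SocketAsyncEventArgs captures the current ExecutionContext when an operation is started, unless flow is suppressed at that moment. The sketch below shows one way to issue a receive with flow suppressed; the helper name `ReceiveFromWithoutContext` is hypothetical, and whether it makes a measurable difference is an assumption that would need to be confirmed with the profiler.

```csharp
using System.Net.Sockets;
using System.Threading;

public static class ReceiveHelpers
{
    // Starts the receive while ExecutionContext flow is suppressed, so no context is
    // captured for the completion callback. Side effect: call-context data and
    // impersonation state do not flow into the Completed handler.
    public static bool ReceiveFromWithoutContext(Socket socket, SocketAsyncEventArgs args)
    {
        // SuppressFlow throws if flow is already suppressed, so check first.
        if (ExecutionContext.IsFlowSuppressed())
            return socket.ReceiveFromAsync(args);

        // AsyncFlowControl is disposable; disposing it restores flow on this thread.
        using (ExecutionContext.SuppressFlow())
        {
            return socket.ReceiveFromAsync(args);
        }
    }
}
```

The return value has the same meaning as for ReceiveFromAsync itself: false indicates that the operation completed synchronously and the Completed event will not be raised.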

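50,000 packets per second on Windows is pretty good; I would say the hardware and the operating system are the more significant factors for scaling. Different network interfaces impose different limits: Intel Server NICs are predominantly high-performance with good cross-platform drivers, whereas Broadcom does not have a good record on Windows compared with Linux. The advanced networking APIs of Windows are only enabled if the drivers support the features, and Broadcom has tended to enable advanced features only for newer hardware, even though older devices are supported on other operating systems.

I would start by investigating multiple NICs, for example a quad-port Intel Server NIC, and use the advanced networking APIs of Windows to bind one NIC to each processing core. In theory you could then send 50,000 packets per second through one NIC and 50,000 through another.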

http://msdn.microsoft.com/en-us/library/ff568337(v=VS.85).aspx

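However, it appears that you do not really have a baseline against which to measure the efficiency of the code. I would expect to see comparisons with servers running without a VoIP payload, using TCP instead of UDP, and running on other operating systems, in order to compare the efficiency of the IP stack and the API.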
