Issues with simultaneously issuing web requests

I am working on a new service that runs QA against many of our company's web properties, and I have run into an interesting network concurrency problem. To improve performance, I use the TPL to create HttpWebRequests from a large set of URLs so that they run in parallel; however, I cannot find where the bottleneck in the process is.

My observations so far:

  • The TPL gives me at most 25-30 parallel threads.
  • CPU usage for the service never exceeds 5-6% (tested on 1 to 4 cores, with and without hyper-threading).
  • NIC utilization never exceeds 2-3%.
  • Overall network traffic does not seem to be affected (other users do not complain, and speed tests run at the same time show no significant impact).
  • Throughput does not change much between our office network (15 Mbit/s) and our data center (100+ Mbit/s).
  • I get a small performance boost by loading from multiple hosts at once rather than many pages from one host.

Possible pain points:

  • CPU (number of cores or hardware threads)
  • NIC
  • Maximum allowed number of concurrent HttpWebRequests
  • LAN
  • WAN
  • Router / switch / load balancer

So the question is:

Obviously, there is no way to download the entire Internet in a matter of minutes, but I am interested in finding out where the bottleneck is in a scenario like this and what, if anything, can be done to overcome it.


+5
3

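You are probably hitting a couple of limits:

  • The connection limit. Raise ServicePointManager.DefaultConnectionLimit, for example to 1000.
  • The TPL itself, which does not necessarily start as many threads as this workload needs.

The TPL picks its degree of parallelism (DOP) on its own, and that choice is not tuned for parallelism over IO.

For IO, set the DOP yourself. Try values between 50 and 500 and measure which works best.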
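
As a rough sketch of the two suggestions above, assuming the .NET Framework HttpWebRequest stack used in the question: the connection limit is raised once, and the DOP is set explicitly through ParallelOptions.MaxDegreeOfParallelism. The class name ParallelFetch, the Run and Fetch helpers, and the peak-concurrency counter are illustrative names that are not from the thread, and 1000 is just the example value mentioned above.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelFetch
{
    // Hypothetical helper: issues one blocking request and discards the body.
    public static void Fetch(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        using (var response = (HttpWebResponse)request.GetResponse())
        {
            // Status and body checks for the QA run would go here.
        }
    }

    // Runs `fetch` over all URLs with at most `maxDop` calls in flight.
    // Returns the highest number of simultaneous calls that was observed.
    public static int Run(IEnumerable<string> urls, int maxDop, Action<string> fetch)
    {
        // Example value from the answer; the default limit is much lower.
        ServicePointManager.DefaultConnectionLimit = 1000;

        var options = new ParallelOptions { MaxDegreeOfParallelism = maxDop };
        int inFlight = 0;
        int peak = 0;

        Parallel.ForEach(urls, options, url =>
        {
            int now = Interlocked.Increment(ref inFlight);

            // Lock-free update of the observed peak.
            int seen;
            do
            {
                seen = Volatile.Read(ref peak);
            }
            while (now > seen &&
                   Interlocked.CompareExchange(ref peak, now, seen) != seen);

            try
            {
                fetch(url);
            }
            finally
            {
                Interlocked.Decrement(ref inFlight);
            }
        });

        return peak;
    }
}
```

Calling ParallelFetch.Run(urls, 100, ParallelFetch.Fetch) and comparing the returned peak against the requested DOP is one way to see whether the 25-30 thread ceiling from the question is still in effect. MaxDegreeOfParallelism is only an upper bound, so the observed peak can stay below it while the thread pool is still adding threads.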

+7

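Also check for a limit at the TCP level; a load-testing tool such as JMeter, which can issue HTTP requests, gives a baseline to compare against.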

+1

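I do something similar: Parallel.ForEach over a list of URLs, with the result of each HttpWebRequest collected in a ConcurrentBag. NCrawler is also worth a look.

Parallel.ForEach runs its work on the ThreadPool.

QueueUserWorkItem returns a bool, where false would mean the work item was not queued.

The ThreadPool has a thread limit, which can be changed with SetMaxThreads.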
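
A minimal sketch of inspecting and adjusting the pool limits mentioned above, using only ThreadPool.GetMaxThreads, ThreadPool.SetMaxThreads and ThreadPool.QueueUserWorkItem. The class name PoolDemo, the RunQueued helper, and the choice to re-apply the current limits are assumptions made for illustration; a real service would substitute values found by measurement.

```csharp
using System;
using System.Threading;

public static class PoolDemo
{
    // Queues `items` trivial work items on the ThreadPool and returns
    // how many of them actually ran.
    public static int RunQueued(int items)
    {
        int workers, ioThreads;
        ThreadPool.GetMaxThreads(out workers, out ioThreads);
        Console.WriteLine("max worker threads: " + workers +
                          ", max IO threads: " + ioThreads);

        // SetMaxThreads returns false and leaves the limits unchanged
        // if the requested values are rejected (for example, values
        // below the current minimums). Here the current limits are
        // simply re-applied as a placeholder.
        bool changed = ThreadPool.SetMaxThreads(workers, ioThreads);
        Console.WriteLine("limits accepted: " + changed);

        int done = 0;
        using (var allDone = new CountdownEvent(items))
        {
            for (int i = 0; i < items; i++)
            {
                bool queued = ThreadPool.QueueUserWorkItem(_ =>
                {
                    Interlocked.Increment(ref done);
                    allDone.Signal();
                });

                // If an item was not queued, release its slot so that
                // Wait() below cannot block forever.
                if (!queued)
                {
                    allDone.Signal();
                }
            }

            allDone.Wait();
        }

        return done;
    }
}
```

Because Parallel.ForEach schedules onto the same pool, these limits also cap how many of its loop bodies can run at once.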

+1
