I am working on a new service that runs QA against many of our company's web properties, and I have run into an interesting network concurrency problem. To improve performance, I use the TPL to create HttpWebRequests from a large set of URLs so that they run in parallel; however, I cannot find where the bottleneck in the process is.
My observations so far:
- I can get at most 25-30 parallel threads out of the TPL
- CPU usage for the service never exceeds 5-6% (tested on 1 to 4 cores, with and without Hyper-Threading)
- Network card utilization never exceeds 2-3%
- Overall network traffic does not seem to be affected (other users do not complain, and speed tests run at the same time show no significant impact).
- Throughput does not change much between our office network (15 Mbit/s) and our data center (100+ Mbit/s).
- I get a small performance boost by loading pages from multiple hosts at once rather than many pages from one host.
Possible pain points:
- CPU (number of cores or hardware threads)
- NIC
- Maximum allowable number of concurrent HttpWebRequests
- LAN
- WAN
- Router / Switch / Load Balancing
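One candidate from the list above that can be checked directly is the cap on concurrent HttpWebRequests. The following is only a minimal sketch, not my actual service code: the URLs, the limit of 100, and the parallelism of 50 are placeholder assumptions. It prints `ServicePointManager.DefaultConnectionLimit` (which, in the .NET Framework, defaults to a small per-host value for client applications), raises it, and then fetches the URLs with `Parallel.ForEach`.

```csharp
using System;
using System.Net;
using System.Threading.Tasks;

class ConnectionLimitCheck
{
    static void Main()
    {
        // Per-host cap on simultaneous outgoing HTTP connections for this process.
        Console.WriteLine("DefaultConnectionLimit before: " +
                          ServicePointManager.DefaultConnectionLimit);

        // Assumption: 100 is an arbitrary illustrative value, not a recommendation.
        ServicePointManager.DefaultConnectionLimit = 100;

        // Placeholder URLs; the real service would read these from its own list.
        var urls = new[] { "http://example.com/", "http://example.org/" };

        // Assumption: 50 is an arbitrary upper bound on parallel workers.
        var options = new ParallelOptions { MaxDegreeOfParallelism = 50 };

        Parallel.ForEach(urls, options, url =>
        {
            try
            {
                var request = (HttpWebRequest)WebRequest.Create(url);

                // The ServicePoint shows the limit that applies to this host.
                Console.WriteLine(url + " connection limit: " +
                                  request.ServicePoint.ConnectionLimit);

                using (var response = (HttpWebResponse)request.GetResponse())
                {
                    Console.WriteLine(url + " -> " + (int)response.StatusCode);
                }
            }
            catch (WebException ex)
            {
                // Network failures and 4xx/5xx responses surface as WebException.
                Console.WriteLine(url + " failed: " + ex.Status);
            }
        });
    }
}
```

If the limit printed before the change is low, that would at least be consistent with the observation that spreading requests across multiple hosts helps a little, since the cap applies per host. It would not by itself explain the 25-30 thread ceiling.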
So the question is:
Obviously, there is no way to download the entire Internet in a matter of minutes, but I'm interested in finding out where the bottleneck is in a scenario like this and what, if anything, can be done to overcome it.