I need to load many pages through a proxy. What is the best approach for building a multi-threaded web crawler?
Is Parallel.For / Parallel.ForEach good enough, or is it better suited to CPU-heavy tasks?
What do you think of the following code?
    var multyProxy = new MultyProxy();
    multyProxy.LoadProxyList();
    Task[] taskArray = new Task[1000];
    for (int i = 0; i < taskArray.Length; i++)
    {
        taskArray[i] = new Task((obj) =>
            {
                multyProxy.GetPage((string)obj);
            },
            (object)"http://google.com");
        taskArray[i].Start();
    }
    Task.WaitAll(taskArray);
It performs terribly: it is very slow, and I do not know why.
This code also performs poorly:
    System.Threading.Tasks.Parallel.For(0, 1000, new System.Threading.Tasks.ParallelOptions() { MaxDegreeOfParallelism = 30 }, loop =>
    {
        multyProxy.GetPage("http://google.com");
    });
I think I'm doing something wrong.
When I run my code, network utilization is only 2-4%.
Neir0