I have an application that downloads more than 4,500 HTML pages from 62 target hosts using HttpClient (4.1.3 or 4.2-beta). It runs on 64-bit Windows 7 with a Core i7 2600K processor. The network bandwidth is 54 Mbps.
At the moment, it uses the following setup:
- DefaultHttpClient and PoolingClientConnectionManager;
- An IdleConnectionMonitorThread taken from http://hc.apache.org/httpcomponents-client-ga/tutorial/html/connmgmt.html;
- Maximum total number of connections = 80;
- Default maximum number of connections per route = 5;
- For concurrency control, a ForkJoinPool with parallelism level = 5 (do I understand correctly that this is the number of worker threads?)
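To check my understanding of the parallelism level, here is a minimal standalone sketch that is not part of the application. It submits blocking jobs to a ForkJoinPool and records how many run at the same moment. The class name `ParallelismDemo`, the method `maxConcurrent`, and the 50 ms sleep standing in for an HTTP request are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

class ParallelismDemo {

    // Runs `tasks` blocking jobs on a ForkJoinPool with the given parallelism
    // and returns the highest number of jobs observed running at once.
    static int maxConcurrent(int parallelism, int tasks) throws Exception {
        final AtomicInteger running = new AtomicInteger();
        final AtomicInteger peak = new AtomicInteger();
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            Runnable job = () -> {
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max); // remember the highest count seen
                try {
                    Thread.sleep(50); // hypothetical stand-in for a blocking HTTP request
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    running.decrementAndGet();
                }
            };
            List<Future<?>> futures = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                futures.add(pool.submit(job));
            }
            for (Future<?> f : futures) {
                f.get(); // wait for every job to finish
            }
        } finally {
            pool.shutdown();
        }
        return peak.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("peak concurrent jobs: " + maxConcurrent(5, 20));
    }
}
```

In this sketch the jobs block with a plain `Thread.sleep` and are submitted from outside the pool, so the peak should not exceed the parallelism level passed to the constructor.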
With this setup, my network usage (as shown in Windows Task Manager) does not exceed 2.5%, and it takes 70 minutes to load the 4,500 pages. The HttpClient logs contain entries like this:
    DEBUG ForkJoinPool-2-worker-1 [org.apache.http.impl.conn.PoolingClientConnectionManager]: Connection leased: [id: 209][route: {}->http://stackoverflow.com][total kept alive: 6; route allocated: 1 of 5; total allocated: 10 of 80]
The total number of allocated connections never rises above 10-12, even though I have allowed up to 80. If I raise the parallelism level to 20 or 80, network usage stays the same, but many connection timeouts are generated.
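As a back-of-the-envelope check on these numbers, the sketch below computes two things: an upper bound on how many connections can be in use at once, and the average time each worker spends per page. It assumes one route per host, that each thread holds at most one connection while it blocks in `execute`, and that all workers stay busy for the whole run. The class and method names are hypothetical.

```java
class PoolBounds {

    // Upper bound on connections in use at the same time: limited by the
    // pool's total cap, by the sum of the per-route caps, and by the number
    // of threads issuing blocking requests (assumed one connection each).
    static int maxConnectionsInUse(int maxTotal, int maxPerRoute, int hosts, int workerThreads) {
        int routeLimit = maxPerRoute * hosts;
        return Math.min(maxTotal, Math.min(routeLimit, workerThreads));
    }

    // Average wall-clock seconds one worker spends per page, assuming all
    // workers are busy for the whole run.
    static double secondsPerPagePerWorker(int pages, double totalMinutes, int workerThreads) {
        return totalMinutes * 60.0 * workerThreads / pages;
    }

    public static void main(String[] args) {
        // Figures from the setup described above.
        System.out.println(maxConnectionsInUse(80, 5, 62, 5));
        System.out.printf("%.2f s per page per worker%n", secondsPerPagePerWorker(4500, 70, 5));
    }
}
```

Under these assumptions, the five worker threads are the binding limit rather than the pool size of 80. The "total allocated" figure in the log appears to include idle kept-alive connections as well as leased ones, which would explain why it reads 10-12 rather than 5.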
I have read the guides on hc.apache.org (including the HttpClient Threading Guide), but they have not helped.
The task's code looks like this:
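    import java.util.List;
    import java.util.concurrent.RecursiveAction;

    import org.apache.http.HttpEntity;
    import org.apache.http.HttpResponse;
    import org.apache.http.client.HttpClient;
    import org.apache.http.client.methods.HttpGet;
    import org.apache.http.protocol.BasicHttpContext;
    import org.apache.http.protocol.HttpContext;
    import org.apache.http.util.EntityUtils;

    // Entry (the page descriptor) and logger are defined elsewhere in the application.
    public class ContentDownloader extends RecursiveAction {
        private final HttpClient httpClient;
        private final HttpContext context;
        private final List<Entry> entries;

        public ContentDownloader(HttpClient httpClient, List<Entry> entries) {
            this.httpClient = httpClient;
            this.context = new BasicHttpContext(); // one context per task
            this.entries = entries;
        }

        // Downloads a single page and stores its HTML in the entry.
        private void computeDirectly(Entry entry) {
            final HttpGet get = new HttpGet(entry.getLink());
            try {
                HttpResponse response = httpClient.execute(get, context);
                int statusCode = response.getStatusLine().getStatusCode();
                if (statusCode >= 400 && statusCode <= 600) {
                    logger.error("Couldn't get content from " + get.getURI().toString() + "\n" + response.toString());
                } else {
                    HttpEntity entity = response.getEntity();
                    if (entity != null) {
                        String htmlContent = EntityUtils.toString(entity).trim();
                        entry.setHtml(htmlContent);
                        EntityUtils.consumeQuietly(entity);
                    }
                }
            } catch (Exception e) {
                // Log instead of silently swallowing failures such as timeouts.
                logger.error("Request failed for " + get.getURI(), e);
            } finally {
                get.releaseConnection(); // always return the connection to the pool
            }
        }

        @Override
        protected void compute() {
            if (entries.isEmpty()) {
                return; // nothing to download
            }
            if (entries.size() == 1) {
                computeDirectly(entries.get(0));
                return;
            }
            // Split the list in half and process both halves as subtasks.
            int split = entries.size() / 2;
            invokeAll(new ContentDownloader(httpClient, entries.subList(0, split)),
                      new ContentDownloader(httpClient, entries.subList(split, entries.size())));
        }
    }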
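So, is this the right way to use HttpClient for concurrent downloads, and how should I configure the ConnectionManager and HttpClient so that all 80 connections are actually used?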