You can use the property revisions along with the parameter rvgeneratexml to generate a parse tree for the article. You can then apply XPath to that tree, or walk through it, to find the information you need.
Here is some sample code:
$page = 'Radiohead';
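// Wikipedia only serves the API over HTTPS; a plain http:// request gets a redirect instead of JSON.
$api_call_url = 'https://en.wikipedia.org/w/api.php?action=query&titles=' .
    urlencode( $page ) . '&prop=revisions&rvprop=content&rvgeneratexml=1&format=json';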
// You must identify yourself to the API; see the Meta Wiki for more details.
$user_agent = 'Your name <your email>';
$curl = curl_init();
curl_setopt_array( $curl, array(
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_USERAGENT => $user_agent,
    CURLOPT_URL => $api_call_url,
) );
$response = json_decode( curl_exec( $curl ), true );
curl_close( $curl );
foreach( $response['query']['pages'] as $page ) {
    $parsetree = simplexml_load_string( $page['revisions'][0]['parsetree'] );
    // Use XPath to find the "Infobox musical artist" template and read the
    // value of its "Origin" parameter.
    $infobox_origin = $parsetree->xpath( '//template[contains(string(title),' .
        '"Infobox musical artist")]/part[contains(string(name),"Origin")]/value' );
    // Only print when the template and parameter were actually found.
    if ( !empty( $infobox_origin ) ) {
        echo trim( strval( $infobox_origin[0] ) );
    }
}
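
If you want to try the XPath part without calling the API, here is a minimal sketch that runs the same query against a hand-written XML string. The sample XML is only my simplified assumption of what the parse tree looks like (template, title, part, name, value elements), and get_template_field() is a hypothetical helper name, not something the API or PHP provides.

```php
<?php
// Hand-written, simplified stand-in for the parse tree XML (an assumption,
// not real API output).
$sample_xml = <<<XML
<root><template><title>Infobox musical artist
</title><part><name>Name</name>=<value>Radiohead
</value></part><part><name>Origin</name>=<value>Abingdon, Oxfordshire, England
</value></part></template></root>
XML;

// Hypothetical helper: returns the trimmed value of one template parameter,
// or null when the template or the parameter is not present.
// Note: $template and $field are pasted into the XPath string as-is, so they
// must not contain double quotes.
function get_template_field( SimpleXMLElement $tree, $template, $field ) {
    $nodes = $tree->xpath(
        '//template[contains(string(title),"' . $template . '")]' .
        '/part[contains(string(name),"' . $field . '")]/value'
    );
    return empty( $nodes ) ? null : trim( strval( $nodes[0] ) );
}

$tree = simplexml_load_string( $sample_xml );
$origin = get_template_field( $tree, 'Infobox musical artist', 'Origin' );
$missing = get_template_field( $tree, 'Infobox musical artist', 'Label' );

echo $origin, "\n";
var_dump( $missing );
```

Returning null for a missing parameter lets the caller tell "not in the infobox" apart from an empty value, which is useful because not every article fills in every infobox field.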