Sharing a PHP script that fetches web pages through a proxy IP, handy for scraping data and the like.
When would you need a proxy IP? Say you want to scrape a site with 1,000,000 items, and the site enforces an IP limit of 1,000 items per IP per hour. A single throttled IP would need roughly 40 days to collect everything (1,000,000 / 1,000 = 1,000 hours). If you use proxy IPs and keep switching addresses, you can work around the per-IP 1,000-per-hour cap and finish far sooner.
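The arithmetic behind that "40 days" estimate can be checked directly. A minimal sketch (the proxy count of 50 is an illustrative assumption, not from the post):

```php
<?php
// Rough arithmetic behind the "about 40 days" estimate above.
$total_items = 1000000;  // items to scrape
$rate_per_ip = 1000;     // items per IP per hour (the site's limit)

$hours_single_ip = $total_items / $rate_per_ip;  // 1000 hours
$days_single_ip  = $hours_single_ip / 24;        // ~41.7 days

// With N rotating proxy IPs the wall-clock time divides by N
// (idealized: assumes the site limits only per-IP, not globally).
$n_proxies = 50; // hypothetical pool size
$days_with_proxies = $days_single_ip / $n_proxies;

printf("single IP: %.1f days, %d proxies: %.2f days\n",
       $days_single_ip, $n_proxies, $days_with_proxies);
```

So even a modest proxy pool turns a month-long crawl into less than a day, in the idealized case.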
The script:
<?php
$gourl = "https://www.dchuanbao.com/";
$ch = curl_init();
$proxy = "ip:port"; // your proxy address, e.g. "1.2.3.4:8080"
curl_setopt($ch, CURLOPT_URL, $gourl);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
// proxy
curl_setopt($ch, CURLOPT_PROXYTYPE, CURLPROXY_HTTP);
curl_setopt($ch, CURLOPT_PROXY, $proxy);
// custom headers: CURLOPT_HTTPHEADER expects a plain array of "Name: value" strings
$headers = array();
$headers[] = 'User-Agent: your browser UA string';
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
// custom cookie
curl_setopt($ch, CURLOPT_COOKIE, 'cookie string');
curl_setopt($ch, CURLOPT_ENCODING, 'gzip'); // accept gzip-compressed responses
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 4);
curl_setopt($ch, CURLOPT_TIMEOUT, 4);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$result = curl_exec($ch);
if ($result === false) {
    echo 'curl error: ' . curl_error($ch);
}
curl_close($ch);
The code has been tested. Simple and crude, but it works!
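To actually rotate IPs as described above, you would set a fresh proxy before each request. A minimal round-robin sketch (the proxy list, addresses, and the `rotate_proxy()` helper are illustrative assumptions, not part of the original script):

```php
<?php
// Hypothetical proxy pool; replace with your own working proxies.
$proxies = ['10.0.0.1:8080', '10.0.0.2:8080', '10.0.0.3:8080'];

// Returns the next proxy in the pool, cycling back to the start.
function rotate_proxy(array $proxies): string {
    static $i = 0;                            // position persists across calls
    $proxy = $proxies[$i % count($proxies)];
    $i++;
    return $proxy;
}

// Inside your scraping loop, before each curl_exec() you would do:
//   curl_setopt($ch, CURLOPT_PROXY, rotate_proxy($proxies));
echo rotate_proxy($proxies), "\n"; // first proxy
echo rotate_proxy($proxies), "\n"; // second proxy
```

Each call hands back the next address, so consecutive requests leave from different IPs and the per-IP rate limit is spread across the whole pool.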