Six methods are commonly used to get the content (full HTML) of a webpage in PHP. They are:
- using file() function
- using file_get_contents() function
- using fopen()->fread()->fclose() functions
- using curl
- using fsockopen() socket mode
- using a third-party library (such as “Snoopy”)
1. file()
<?php
$url = 'https://oscarliang.com';

// file() reads the page into an array, one line per element
$lines_array = file($url);

// turn the array into one string
$lines_string = implode('', $lines_array);

// output; you can also save it locally on the server
echo $lines_string;
?>
2. file_get_contents()
To use file(), file_get_contents() and fopen() on remote URLs, “allow_url_fopen” must be enabled. Check your php.ini file and set allow_url_fopen = On. When allow_url_fopen is off, these functions cannot open URLs (they still work on local files).
<?php
$url = 'https://oscarliang.com';

// file_get_contents() reads the remote webpage content into a string
$lines_string = file_get_contents($url);

// output with the HTML escaped; you can also save it locally on the server
echo htmlspecialchars($lines_string);
?>
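Because the remote case depends on allow_url_fopen, it can help to check the setting at runtime and fail with a clear message instead of a warning. The following is a minimal sketch, not a definitive implementation; the function name fetch_page is hypothetical and not part of PHP.

```php
<?php
// Hypothetical helper (not a PHP built-in): fetch a URL or local path with
// file_get_contents(), failing early if allow_url_fopen is disabled.
function fetch_page(string $url): string
{
    // Only remote URLs are affected by allow_url_fopen.
    $isRemote = preg_match('#^(https?|ftp)://#i', $url) === 1;
    $enabled = filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);

    if ($isRemote && !$enabled) {
        throw new RuntimeException('allow_url_fopen is disabled in php.ini');
    }

    // Suppress the warning and report the failure as an exception instead.
    $content = @file_get_contents($url);
    if ($content === false) {
        throw new RuntimeException("Could not read $url");
    }

    return $content;
}
```

It could then be called as echo htmlspecialchars(fetch_page('https://oscarliang.com')); inside a try/catch block.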
3. fopen()->fread()->fclose()
<?php
$url = 'https://oscarliang.com';

// fopen opens the webpage in binary mode
$handle = fopen($url, "rb");
if ($handle === false) {
    die("Could not open $url");
}

// initialize
$lines_string = "";

// read content in chunks of up to 1024 bytes
do {
    $data = fread($handle, 1024);
    if (strlen($data) == 0) {
        break;
    }
    $lines_string .= $data;
} while (true);

// close handle to release resources
fclose($handle);

// output; you can also save it locally on the server
echo $lines_string;
?>
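The read loop above can be wrapped in a reusable function, and a stream context can add a timeout so a slow server does not hang the script. This is a sketch under assumptions: read_all and fetch_with_fopen are hypothetical names, and the 5-second default is arbitrary.

```php
<?php
// Hypothetical helper: read everything from an open stream in fixed-size chunks.
function read_all($handle, int $chunkSize = 1024): string
{
    $buffer = '';
    while (!feof($handle)) {
        $data = fread($handle, $chunkSize);
        // Stop on a read error or an empty read (for example, after a timeout).
        if ($data === false || $data === '') {
            break;
        }
        $buffer .= $data;
    }
    return $buffer;
}

// Hypothetical helper: open a URL (or local path) with a timeout and return its content.
function fetch_with_fopen(string $url, float $timeoutSeconds = 5.0): string
{
    // The "http" context options also apply to https:// URLs.
    $context = stream_context_create([
        'http' => ['timeout' => $timeoutSeconds],
    ]);

    $handle = @fopen($url, 'rb', false, $context);
    if ($handle === false) {
        throw new RuntimeException("Could not open $url");
    }

    try {
        return read_all($handle);
    } finally {
        // Always release the handle, even if reading fails.
        fclose($handle);
    }
}
```

Keeping the loop separate from the open call makes it usable with any stream handle, including the socket returned by fsockopen().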
4. curl
You need to have the cURL extension enabled to use it. On Windows, edit the php.ini file and uncomment the line extension=php_curl.dll (written as extension=curl in PHP 7.2 and later). On Linux, install the PHP cURL package for your distribution.
<?php
$url = 'https://oscarliang.com';
$ch = curl_init();
$timeout = 5;

curl_setopt($ch, CURLOPT_URL, $url);
// return the response as a string instead of printing it
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);

// get URL content
$lines_string = curl_exec($ch);

// close handle to release resources
curl_close($ch);

// output; you can also save it locally on the server
echo $lines_string;
?>
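curl_exec() returns false when the request fails, so the result is worth checking before it is used. The sketch below adds that check plus an HTTP status check; the function name fetch_with_curl and the choice to treat status 400 and above as an error are assumptions made for illustration.

```php
<?php
// Hypothetical helper: fetch a URL with cURL and turn failures into exceptions.
function fetch_with_curl(string $url, int $timeoutSeconds = 5): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,             // return the body as a string
        CURLOPT_FOLLOWLOCATION => true,             // follow HTTP redirects
        CURLOPT_CONNECTTIMEOUT => $timeoutSeconds,  // limit for connecting
        CURLOPT_TIMEOUT        => $timeoutSeconds * 2, // limit for the whole request
    ]);

    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }

    // Assumption: any status of 400 or above counts as a failure here.
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    if ($status >= 400) {
        throw new RuntimeException("Server returned HTTP status $status");
    }

    // The handle is released automatically when $ch goes out of scope.
    return $body;
}
```

CURLOPT_TIMEOUT limits the whole transfer, while CURLOPT_CONNECTTIMEOUT only limits the connection phase, so raising the former is usually what helps with large pages.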
5. fsockopen() socket mode
<?php
$fp = fsockopen("t.qq.com", 80, $errno, $errstr, 30);
if (!$fp) {
    echo "$errstr ($errno)\n";
} else {
    // build a minimal HTTP/1.1 GET request
    $out = "GET / HTTP/1.1\r\n";
    $out .= "Host: t.qq.com\r\n";
    $out .= "Connection: Close\r\n\r\n";
    fwrite($fp, $out);

    // the output includes the response headers as well as the HTML
    while (!feof($fp)) {
        echo fgets($fp, 128);
    }
    fclose($fp);
}
?>
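Unlike the other methods, the raw socket returns the HTTP status line and headers together with the HTML. A minimal sketch of separating them is shown here; split_http_response is a hypothetical helper name, and it assumes the body is not chunk-encoded (an HTTP/1.1 server may send chunked bodies, which this sketch does not decode).

```php
<?php
// Hypothetical helper: split a raw HTTP response into status line, headers, and body.
function split_http_response(string $raw): array
{
    // Headers and body are separated by the first blank line (CRLF CRLF).
    $parts = explode("\r\n\r\n", $raw, 2);
    $headerLines = explode("\r\n", $parts[0]);
    $statusLine = array_shift($headerLines);

    $headers = [];
    foreach ($headerLines as $line) {
        if (strpos($line, ':') === false) {
            continue; // skip malformed header lines
        }
        [$name, $value] = explode(':', $line, 2);
        // Header names are case-insensitive, so store them in lowercase.
        $headers[strtolower(trim($name))] = trim($value);
    }

    return [
        'status'  => $statusLine,
        'headers' => $headers,
        'body'    => $parts[1] ?? '',
    ];
}
```

To use it with the socket example, collect the fgets() output into a string instead of echoing it, then pass that string to the function.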
6. Snoopy library
This library has recently become quite popular. It’s very simple to use. It simulates a web browser from your server.
<?php
// include snoopy library
require('Snoopy.class.php');

// initialize snoopy object
$snoopy = new Snoopy;
$url = "http://t.qq.com";

// read webpage content
$snoopy->fetch($url);

// save it to $lines_string
$lines_string = $snoopy->results;

// output; you can also save it locally on the server
echo $lines_string;
?>
10 comments
I guess I missed something on the Snoopy option, but when I insert that exact code into my PHP program it takes me to Snoopy on GitHub.
Using no. 1, I always got errors in my PHP script for bigger data queries. Then I used no. 4, and it has been working excellently after adjusting the timeout value. Thank you for your help, Oscar.
How to fetch email address alone from a webpage?
Hi Oscar,
I wanted to use the Snoopy library, but when I launch example.php, I get the following message:
“Put Snoopy.class.php into one of the directories specified in your php.ini include_path directive.”
So my question is how to edit php.ini to include “snoopy.class.php”. Also, the php.ini is on my remote web server hosted by EcoWebHosting.
Thanks
Snoopy is perfect.
Thanks!
I tried different ways but never got full content of the web page fetched. Snoopy did the trick. Thanks for the information.
Friends, is it possible to block all the above techniques? I am using Apache on CentOS. Can you please suggest something soon?
I don’t think you can; people will do it one way or another. The only thing you can do is include as many words and phrases as possible that relate to you or your website. Maybe also put in more hyperlinks that point back to your webpages.
Quality content is the secret to attract the users to go to see
the web site, that’s what this website is providing.
Hello there, just became aware of your blog
through Google, and found that it’s really informative.
I’m gonna watch out for brussels. I will appreciate if you continue this in
future. Lots of people will be benefited from your writing.
Cheers!