Six Ways of Retrieving Webpage Content In PHP

by Oscar

There are six commonly used ways of getting webpage content (the full HTML) in PHP. The methods are:

  1. using the file() function
  2. using the file_get_contents() function
  3. using the fopen()->fread()->fclose() functions
  4. using curl
  5. using fsockopen() in socket mode
  6. using a third-party library (such as “Snoopy”)

1. file()

<?php
$url = 'https://oscarliang.com';
// use file() to read the page into an array of lines
$lines_array = file($url);
// join the array into a single string
$lines_string = implode('', $lines_array);
// output; you can also save it locally on the server
echo $lines_string;
?>

2. file_get_contents()

file(), file_get_contents() and fopen() can read a URL only when “allow_url_fopen” is enabled. Check your php.ini file and set allow_url_fopen = On. When it is off, none of these functions can fetch remote pages.
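
A quick way to confirm the setting is to read it at runtime before trying to fetch anything. The sketch below is only illustrative: url_fopen_enabled() is a made-up helper name, not a PHP built-in.

```php
<?php
// Hypothetical helper (not part of PHP): reports whether the
// URL-aware fopen wrappers are switched on for this PHP process.
function url_fopen_enabled(): bool
{
    // ini_get() returns the setting as a string such as "1", "On" or "";
    // filter_var() turns those spellings into a real boolean.
    return filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);
}

if (url_fopen_enabled()) {
    echo "allow_url_fopen is on\n";
} else {
    echo "allow_url_fopen is off; use curl or fsockopen() instead\n";
}
```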

<?php
$url = 'https://oscarliang.com';
// file_get_contents() reads the remote webpage content into a string
$lines_string = file_get_contents($url);
// output; you can also save it locally on the server
echo htmlspecialchars($lines_string);
?>
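
file_get_contents() also accepts a stream context, which is one way to set a timeout so that a slow server does not hang the script. This is a minimal sketch under stated assumptions: the helper names make_http_context() and fetch_with_timeout() and the 5-second default are illustrative choices, not part of the original example.

```php
<?php
// Hypothetical helper: build an HTTP stream context with a timeout.
function make_http_context(float $timeoutSeconds)
{
    return stream_context_create([
        'http' => [
            'method'  => 'GET',
            'timeout' => $timeoutSeconds, // in seconds
        ],
    ]);
}

// Hypothetical helper: fetch a page and return null on failure.
function fetch_with_timeout(string $url, float $timeoutSeconds = 5.0): ?string
{
    $context = make_http_context($timeoutSeconds);
    // @ hides the warning; failure is reported through the return value.
    $html = @file_get_contents($url, false, $context);
    return $html === false ? null : $html;
}
```

Calling fetch_with_timeout('https://oscarliang.com') would then give either the page HTML or null, which the caller can check before echoing anything.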

3. fopen()->fread()->fclose()

<?php
$url = 'https://oscarliang.com';
// fopen() opens the webpage in binary mode
$handle = fopen($url, "rb");
// initialize
$lines_string = "";
// read the content in chunks of up to 1024 bytes
do {
    $data = fread($handle, 1024);
    if (strlen($data) == 0) {
        break;
    }
    $lines_string .= $data;
} while (true);
// close the handle to release resources
fclose($handle);
// output; you can also save it locally on the server
echo $lines_string;
?>

4. curl

curl must be enabled before you can use it. On Windows, edit the php.ini file and uncomment the line extension=php_curl.dll. On Linux, install the curl package.
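
To see whether curl is actually available to the PHP process that runs your script, you can check for the extension at runtime. This is a small sketch; the variable name $curl_enabled and the messages are just for illustration.

```php
<?php
// extension_loaded() reports whether the curl extension is active
// in the PHP process that runs this script.
$curl_enabled = extension_loaded('curl');

if ($curl_enabled) {
    // curl_version() returns an array describing the libcurl build.
    $info = curl_version();
    echo "curl is enabled, libcurl version " . $info['version'] . "\n";
} else {
    echo "curl is not enabled; check php.ini or install the package\n";
}
```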

<?php
$url = 'https://oscarliang.com';
$ch = curl_init();
$timeout = 5;
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
// get the URL content
$lines_string = curl_exec($ch);
// close the handle to release resources
curl_close($ch);
// output; you can also save it locally on the server
echo $lines_string;
?>

5. fsockopen() socket mode

<?php
$fp = fsockopen("t.qq.com", 80, $errno, $errstr, 30);
if (!$fp) {
    echo "$errstr ($errno)\n";
} else {
    // each header line ends with \r\n; a blank line ends the request
    $out = "GET / HTTP/1.1\r\n";
    $out .= "Host: t.qq.com\r\n";
    $out .= "Connection: Close\r\n\r\n";
    fwrite($fp, $out);
    while (!feof($fp)) {
        echo fgets($fp, 128);
    }
    fclose($fp);
}
?>
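
Unlike the other methods, the socket approach returns the raw HTTP response, so the output starts with the status line and headers and only then the page itself. If only the HTML is wanted, the response can be collected into a string and split at the first blank line. This is a minimal sketch: split_http_response() is a hypothetical helper and the sample response is made up for illustration. It does not decode chunked transfer encoding, which an HTTP/1.1 server may use.

```php
<?php
// Hypothetical helper: split a raw HTTP response into headers and body.
// The headers end at the first empty line, i.e. the first "\r\n\r\n".
function split_http_response(string $raw): array
{
    $parts = explode("\r\n\r\n", $raw, 2);
    return [
        'headers' => explode("\r\n", $parts[0]),
        'body'    => $parts[1] ?? '',
    ];
}

// Made-up sample standing in for what the fgets() loop would collect.
$raw = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>Hello</html>";
$response = split_http_response($raw);

echo $response['headers'][0], "\n"; // the status line
echo $response['body'], "\n";       // the page content only
```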

6. snoopy library

This library has recently become quite popular. It’s very simple to use. It simulates a web browser from your server.

<?php
// include the Snoopy library
require('Snoopy.class.php');
// initialize the Snoopy object
$snoopy = new Snoopy;
$url = "http://t.qq.com";
// read the webpage content
$snoopy->fetch($url);
// save it to $lines_string
$lines_string = $snoopy->results;
// output; you can also save it locally on the server
echo $lines_string;
?>

10 comments

Gary 12th June 2022 - 8:18 pm

I guess I missed something on the Snoopy option, but when I insert that exact code into my PHP program it takes me to Snoopy on GitHub.

KS Rahman 15th August 2021 - 5:48 am

Using no. 1, I always got errors in my PHP script for bigger data queries. Then I used no. 4, and it has been working excellently after adjusting the timeout value. Thank you for your help, Oscar.

Snoopy 27th March 2017 - 7:28 am

How do I fetch only the email addresses from a webpage?

Robert 16th February 2016 - 3:01 pm

Hi Oscar,

I wanted to use the Snoopy library, but when I launch example.php, I get the following message:

“Put Snoopy.class.php into one of the directories specified in your php.ini include_path directive.”

So my question is how to edit php.ini to include “snoopy.class.php”. Also, the php.ini is on my remote web server hosted by EcoWebHosting.

Thanks

Trianta 1st October 2015 - 12:03 pm

Snoopy is perfect.
Thanks!

Ramb 3rd June 2015 - 12:48 am

I tried different ways but never got the full content of the web page fetched. Snoopy did the trick. Thanks for the information.

Alagu Jeeva M 4th August 2014 - 11:21 pm

Friends, is it possible to block all of the above techniques? I am using Apache on CentOS. Can you please suggest something soon?

Oscar 5th August 2014 - 10:08 am

I don’t think you can; people will do it one way or another. The only thing you can do is include as many words and phrases as possible that relate to you or your website. Maybe also add more hyperlinks that point back to your webpages.

Krystyna 14th May 2014 - 4:47 am

Quality content is the secret to attract the users to go to see
the web site, that’s what this website is providing.

Xavier 14th May 2014 - 3:54 am

Hello there, just became aware of your blog
through Google, and found that it’s really informative.
I’m gonna watch out for brussels. I will appreciate if you continue this in
future. Lots of people will be benefited from your writing.
Cheers!
