PHP Web Scraping: Documentation with example
In this post, you will learn a simple process for building a web scraper using the PHP programming language.
Web scraping is the process of extracting information from websites. The extracted data can be page content, URLs, titles, contact information, and so on, which we can then store in a local file or a database. Scraping can be done with a hand-written program, called a scraper, or with automated software such as a bot or web crawler. Some popular sites provide APIs that expose their data in a structured way, but not all websites do. Web scraping is also not always legal, and some sites disallow scraping in their 'robots.txt' file. Where scraping is permitted and no API exists, a web scraper lets us extract the data, mine it, and store it in a structured form.
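Before scraping a site, it is worth checking its 'robots.txt' rules. Below is a minimal, simplified sketch (the function name 'isPathAllowed' is hypothetical) that reads the rules for all crawlers ('User-agent: *') from an already-downloaded robots.txt body and decides whether a given path is disallowed. It assumes plain prefix rules only: wildcards ('*' and '$') and crawler-specific groups are ignored, and it does not fetch the file itself.

```php
<?php
// Simplified sketch of a robots.txt check for the "User-agent: *" group only.
// Assumptions: plain prefix rules; wildcards (* and $) and crawler-specific groups are ignored.
function isPathAllowed(string $robotsTxt, string $path): bool
{
    $inStarGroup = false;   // are we inside a group that applies to every crawler?
    $lastWasAgent = false;  // consecutive User-agent lines share one group
    $longestMatch = -1;
    $allowed = true;        // no matching rule means the path is allowed

    foreach (preg_split('/\r\n|\r|\n/', $robotsTxt) as $line) {
        $line = trim(preg_replace('/#.*/', '', $line)); // drop comments
        if ($line === '' || strpos($line, ':') === false) {
            continue;
        }
        [$field, $value] = array_map('trim', explode(':', $line, 2));
        $field = strtolower($field);

        if ($field === 'user-agent') {
            if (!$lastWasAgent) {
                $inStarGroup = false; // a new group starts
            }
            $inStarGroup = $inStarGroup || $value === '*';
            $lastWasAgent = true;
            continue;
        }
        $lastWasAgent = false;

        // Skip other groups, non-rule fields, and empty values (an empty Disallow blocks nothing).
        if (!$inStarGroup || $value === '' || ($field !== 'allow' && $field !== 'disallow')) {
            continue;
        }
        if (strpos($path, $value) === 0) {
            $length = strlen($value);
            // The longest matching rule wins; on a tie, Allow wins.
            if ($length > $longestMatch || ($length === $longestMatch && $field === 'allow')) {
                $longestMatch = $length;
                $allowed = ($field === 'allow');
            }
        }
    }
    return $allowed;
}

$robots = "User-agent: *\nDisallow: /private/\nAllow: /private/public/\n";
var_dump(isPathAllowed($robots, '/cities/delhi'));  // bool(true)
var_dump(isPathAllowed($robots, '/private/data'));  // bool(false)
```

Picking the longest matching rule lets a more specific 'Allow' line reopen part of a disallowed directory, as '/private/public/' does in the sample rules.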
Web scraping has become essential for several reasons. Information that you can view on the Internet is not necessarily available for download, yet you may need it in a different format. You may also need to collect information from many pages of a site, or from several different sites. This is where web scraping comes in.
Web scraping in PHP is quite straightforward. If a website's owner doesn't provide an API for getting the information, and we really need the data on a web page, then web scraping is often the only option. Several web scraping libraries are available for PHP. In this article, we will walk through the process of scraping data from a web page in detail.
Simple HTML DOM
The Simple HTML DOM parser is a good choice, as it lets us access and manipulate HTML easily. It parses web pages into a DOM (Document Object Model) tree, which describes how programs can access the different parts of a page. For example, an HTML or XML document is converted into a DOM, and the DOM expresses the structure of the document and how that document can be accessed. PHP also provides a built-in DOM extension.
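PHP's built-in DOM extension can be tried without any third-party library. This minimal sketch (the function name 'describeHeadings' and the sample markup are hypothetical) loads an HTML string into a DOMDocument and then walks the resulting tree, reporting each h2 element together with its parent element.

```php
<?php
// Sketch: parse an HTML string into a DOM tree with PHP's built-in DOMDocument class
// and walk part of that tree. The markup below is a made-up example.
function describeHeadings(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is often invalid; keep libxml warnings from being printed.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $result = [];
    // getElementsByTagName() searches the whole tree for <h2> element nodes.
    foreach ($doc->getElementsByTagName('h2') as $heading) {
        // Every node knows its place in the tree, e.g. its parent element.
        $result[] = $heading->parentNode->nodeName . ' > h2: ' . trim($heading->textContent);
    }
    return $result;
}

$html = '<html><body><div><h2> First story </h2><p>Text</p></div>'
      . '<h2>Second story</h2></body></html>';
print_r(describeHeadings($html));
```

The first heading is reported under 'div' and the second under 'body', which shows that the parser keeps the nesting of the original markup.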
Download SimpleHTMLDom
Let's start by downloading Simple HTML DOM from the following link:
Download SimpleHTMLDom
Next, extract the downloaded archive. The extracted folder contains a file named 'simple_html_dom.php'; place it in the same directory as your script, or adjust the path in require_once accordingly.
Choose a web page to scrape
We need a web page from which to scrape data, such as article titles, dates, images, and more. Here, we have taken a news page and inspected its elements to find the CSS classes of the relevant HTML tags, so that we can extract the required information with CSS selectors based on class, id, and so on.
As we can see in the above screenshot, the CSS class "card__title" is applied to all DIV tags that contain titles, and the CSS class "card__posted-on" is applied to all DIV tags that contain dates. These classes let us separate each field from the rest of the content in the response object.
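The same class-based selection can also be expressed with PHP's built-in DOMXPath class. XPath has no direct class selector, so this sketch uses the common whole-token idiom that pads the class attribute with spaces. The helper name 'findByClass' and the sample markup are hypothetical, and class names are assumed to contain no quote characters.

```php
<?php
// Sketch: select elements by class name with PHP's built-in DOMXPath.
// Assumption: the class name contains no quote characters.
function findByClass(DOMXPath $xpath, string $class, ?DOMNode $context = null): array
{
    // Pad the class attribute with spaces and look for " name " as a whole token,
    // so "card__title" does not also match "card__title-link".
    $query = ".//*[contains(concat(' ', normalize-space(@class), ' '), ' $class ')]";
    $texts = [];
    foreach ($xpath->query($query, $context ?? $xpath->document) as $node) {
        $texts[] = trim($node->textContent);
    }
    return $texts;
}

// Hypothetical markup reusing the class names from the news page.
$html = '<div class="card-content"><div class="card__title">Sample headline</div>'
      . '<div class="card__title-link">Not a title</div>'
      . '<div class="meta card__posted-on"> Sample date </div></div>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // tolerate imperfect HTML
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
print_r(findByClass($xpath, 'card__title'));      // only "Sample headline"
print_r(findByClass($xpath, 'card__posted-on'));  // "Sample date", trimmed
```

A naive contains(@class, 'card__title') test would also pick up the "card__title-link" element, which is why the padded comparison is used.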
PHP Script to scrape web content
Here is the complete PHP code to scrape titles and posted dates from a news website.
<?php
// Include the Simple HTML DOM library file.
require_once 'simple_html_dom.php';
// Fetch the HTML content of the page. file_get_html() returns false on failure.
$dom = file_get_html('https://www.thestatesman.com/cities/delhi', false);
// Collect one entry (title and date) per article card.
$articles = array();
if (!empty($dom)) {
    foreach ($dom->find('.card-content') as $card) {
        // find($selector, 0) returns the first match, or null if there is none.
        $title = $card->find('.card__title', 0);
        $postDate = $card->find('.card__posted-on', 0);
        $articles[] = array(
            'title' => $title ? trim($title->plaintext) : '',
            'date'  => $postDate ? trim($postDate->plaintext) : '',
        );
    }
    // Release the memory held by the parsed DOM tree.
    $dom->clear();
}
echo '<pre>';
print_r($articles);
echo '</pre>';
exit;
?>
Output of the above code:
As you can see in the given screenshot, we were able to scrape the titles and posted dates of the news articles into an array. Similarly, we can scrape more information according to our requirements.
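If you would rather not depend on a third-party library, the same title and date extraction can be sketched with PHP's built-in DOM extension. This version (the function name 'scrapeCards' is hypothetical) parses an in-memory HTML sample instead of fetching the live page, assumes the same 'card-content', 'card__title' and 'card__posted-on' classes, and stores an empty string for a missing field instead of skipping it.

```php
<?php
// Sketch: the title/date extraction rewritten with PHP's built-in DOM extension.
// Assumption: each article is wrapped in an element with the class "card-content".
function scrapeCards(string $html): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate imperfect real-world HTML
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Builds the XPath equivalent of the CSS selector ".<class>" (whole-token match).
    $byClass = function (string $class): string {
        return ".//*[contains(concat(' ', normalize-space(@class), ' '), ' $class ')]";
    };

    $articles = [];
    foreach ($xpath->query($byClass('card-content'), $doc) as $card) {
        // Search only inside the current card; item(0) is null if nothing matches.
        $title = $xpath->query($byClass('card__title'), $card)->item(0);
        $date = $xpath->query($byClass('card__posted-on'), $card)->item(0);
        $articles[] = [
            'title' => $title ? trim($title->textContent) : '',
            'date' => $date ? trim($date->textContent) : '',
        ];
    }
    return $articles;
}

// Hypothetical sample markup; the second card has no date.
$html = <<<HTML
<div class="card-content">
  <div class="card__title">Sample headline 1</div>
  <div class="card__posted-on">
    Sample date
  </div>
</div>
<div class="card-content">
  <div class="card__title">Sample headline 2</div>
</div>
HTML;

print_r(scrapeCards($html));
```

To run it against a live page, you could pass it the HTML returned by file_get_contents() for the page URL (this requires allow_url_fopen to be enabled). The resulting array can then be written to a CSV file with fputcsv() or inserted into a database.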