How to Import Data From Website to Google Sheets?
Today, it is crucial to understand the benefits of spreadsheets for your business, specifically ones that help you easily import data from a website to Google Sheets. Most people still rely on the handy copy-and-paste function to scrape data from a website to Google Sheets.
When the content is long, spread across a complete web page, and constantly updated, it can be a little tricky to use the traditional method. For situations like these, Google Spreadsheet offers various functions that allow users to pull data from different sources like websites, web pages, HTML, RSS feeds, and other specified spreadsheets. The ones pertaining to the import of data from a website are IMPORTXML, IMPORTHTML (import table from website to Google Sheets), and IMPORTDATA. In this article, we will show in detail, how to extract datasets and how to import data from websites to Google Sheets.
How does IMPORTDATA work in Google Sheets?
The IMPORTDATA function on Google Sheets allows users to export data in CSV (comma-separated value) and TSV (tab-separated value) files using the URL of the website where they are stored. It is a great tool for working with information for reporting, building dashboards, and other analytical purposes. The function does not actually pull the data from a website but only refers to the dataset in the defined URL, to publish the website data to Google Sheets for free.
To use the function, there is a formula syntax that needs to be followed:
=IMPORTDATA(URL*, delimiter, locale)
URL – It refers to the place where the URL is input.
delimiter – It is used to specify the delimiter when parsing the data.
locale – It is used to determine the language and region code that should be used.
Steps to follow when using the IMPORTDATA function:
- Open Google Sheets and type or paste the URL in a cell.
2. In the next cell, type in the formula syntax and input the cell with the URL.
3. The data will be displayed in your spreadsheet.
What is the difference between Importrange and Importdata?
IMPORTRANGE is a Google Sheet function that enables a form of cross-spreadsheet integration, by extracting the data from a range of cells on the website to a Google Sheets. This allows spreadsheet users to split up their information into multiple sheets while still viewing it using a simple formula. It also enables a level of collaboration where you can pubish the data from a third-party sheet (with permission) on your own.
The IMPORTRANGE formula syntax is:
=IMPORTRANGE("spreadsheet_url","range_string")
spreadsheet url – is the ID or URL of the exporting spreadsheet.
range_string – is used to specify the amount of cell information you wish to import.
As you can see, the Google Sheet IMPORTDATA function is used for importing data in a CSV and TSV format from a specific website, while the IMPORTRANGE function is used for pulling data from a range of cells from a specified spreadsheet.
How to display Google Sheet data on a website
Updating information on your website is one of the most effective ways to attract and keep visitors. There are two primary ways to publish Google Sheets to a website. First, you can use a powerful no-code website builder like SpreadSimple that allows you to create a beautiful feature-rich website using Google Sheet data as your CMS (Content Management System). As a result, anytime you change the information on the spreadsheet, it will automatically get updated on the web page.
The other way to display your data on your website is to embed Google Sheets on your platform. This function is very valuable especially if businesses or individuals want to keep their users updated on frequently changing information. For instance, you can format your sheet with the calendar template to show your readers about upcoming events on the website, and teachers or home tutors can post assignments, details, and due dates in a spreadsheet on their class website. Embedding Google Sheets into your web page can be helpful to pass live information to your users and ensure the data is always updated even with constant changes.
How to Format an Embedded Google Sheet
Before you can learn how to embed a Google Sheet into a website, correct formatting is required to ensure that it is displayed properly. Most formatting issues become apparent once you embed your spreadsheet. By structuring the data, you can ensure that it fits well on your web page. Here are a few things to keep in mind when creating a Google Sheet for embedding:
- Bold the text and use other forms of formatting;
- Ensure that the first row of the spreadsheet only contains column names;
- The spreadsheet size should be as compact as possible;
- Keep all the data you want to display is on a single tab;
- Make sure to delete any top rows in case the spreadsheet template is not used.
If you follow these simple guidelines, you should have your Google Sheet formatted and sized to display well on your website. Keep in mind to embed just one sheet from one tab as multiple tabs make it difficult for the users to navigate the spreadsheet.
How do I import a dataset into Google Sheets?
Many companies rely on specific web data for their growth. These organizations need to extract datasets such as emails, links, etc, and import a table from a website to Google Sheets rather than the whole information on the platform. There is always the primitive copy-and-paste method available, but it requires a lot of filtering and editing before the data can become useful. The IMPORTXML function in Google Sheets is the perfect tool for getting information from web pages for data mining or web scraping. This tool pulls information from any XML field, meaning any field that is bracketed by a <tag> and a </tag>.
The syntax for the IMPORTXML function is
=IMPORTXML(url, xpath_query)
url – refers to the URL of the website from where you are collecting the data.
xpath_query – is a parameter that tells the function what data to import.
You will need to know some basic HTML or XML markup that designates specific datasets in a webpage to be able to use this function. For instance, if you define the xpath_query as “//h1@title”, then you will get all information with the <h1> tags from the web page.
How to Extract a List of Postal Codes and City Districts
Now that we have understood which function to use for getting datasets, let us try it out with an example. Say you want to get the postal codes and the city districts they are assigned to for Edmonton City in Alberta, Canada.
- First, open the web page from where you want the postal codes in a browser window, and then open the page source by right-clicking or using Ctrl+U.
2. Select the desired postal code, right-click on it, and then click on “Inspect” to open your browser's tool to view the source code.
3. You will observe that each postal code is contained in a tag and that all TD tags contain the word Edmonton in “them”.
4. In your Google Spreadsheet, using the IMPORTXML function, we will grab all the content of the TD tag by specifying it in the xpath_query along with the URL of the web page. The syntax in the cell should look like this: =importxml("https://en.wikipedia.org/wiki/List_of_T_postal_codes_of_Canada", "//td").
5. When you will press Enter the information will look like this.
6. If you further observe the page source, you will see that the postal code is in the tag <b></b>, and the city names are in the tag <a></a>.
7. To further filter the data, you can modify the function command in columns A and B with this syntax: =importxml("https://en.wikipedia.org/wiki/List_of_T_postal_codes_of_Canada", "//td/span/a[1]") =importxml("https://en.wikipedia.org/wiki/List_of_T_postal_codes_of_Canada", "//td/b[1]")
8. The final imported data should look like this:
How to Automatically Copy Email Addresses from a Website
The steps are similar to extracting email addresses from a company’s website. We first start by opening the desired web page in the browser and then viewing the page source code.
- You will observe that the email address of the website will be in a field such as class=”email” for instance;
- After identifying the attribute of the tag assigned to emails, you will use the IMPORTXML function to retrieve the data. This is what the syntax of the SpreadSimple website would look like: =importxml("https://spreadsimple.com/", "//p[@class='footer__copyright']/a[1]");
- There is yet another way to retrieve data by matching the essential form (username@host.suffix);
- To pull a list of emails, we can give the query “obtain all the strings that are in the format username@hots.suffix” or in coding language [a-zA-Z0-9_-\.\+]+@[a-zA-Z0-9-\.]+\.[a-zA-Z0-9-]{2,15}
It might seem complicated but if you observe the syntax, you can see that the username before the @ and the host after it are nearly the same. This is because the emails take lowercase characters(a-z), uppercase characters (A-Z), numbers (0-9), underscores (_), dashes (-), and periods (.) with occasional, plus signs (+).
We hope our article has given you insight on how to import data from a website to Google Sheets.