Working with Google Sheets in Python (Pandas) and R

TLDR; if you want a funcationality similar to R’s googlesheets package in Python, go for gspread_pandas rightaway. The only additional step you wil have to do is download a credentials file from your Google developers console.

Working in R I have had no problems working with Google Sheets. Reading, writing or editing data in Google Sheets has been quite easy. Thanks to the wonderful googlesheets package developed by Jenny Bryan and team. But how much I had taken its usability for granted became clear to me after I spent a day and a half trying to figure out how to get the same functionality within Python.

In short, I had wanted to read in some data from a Google Sheets as a pandas DataFrame object and it took my simple brain ages to figure it out. I tried gspread, pydrive, and a few others until I came across gspread_pandas which I think is built on top of gspread. Thanks to Diego Fernandez for building this wonderful package for people with limited Oauth experience and who don’t want to fiddle too much with flow, credentials and storage sequences that Google tried to teach me! To be honest getting authenticated via PyDrive wasn’t that difficult but I don’t think it has the functionality to read Google Sheets in much the same effortless way that gspread_pandas can.

Workflow to read Google Sheets data via R’s googlesheets package and Python’s gspread_pandas

While there is a wonderful demo for R’s googlesheets here, I will show some quick steps to get the data from this Google sheet on Gapminder data which contains life expectancy, total population, and GDP per capita since the year 1952 to 2007 for many countries.

Gapminder data

Gapminder data

Rest of the tutorial is on how to access and read that file in.

Credentials and getting access to your Google Drive

Via R’s googlesheets package

This step opens up a new browser window which asks to select my Google account whose Google Drive I would like to give access to.

library(googlesheets)
# generate new access token  
gs_auth()

Simple right?

Via Python gspread_python package

To achieve the same step in Python we have to first go to our Google developer account and do the following steps:

  1. Visit your Google console
  2. If you don’t have a project create one
  3. After you have created a project from the menu bar (three horizontal stripes on the top left side) go to APIs dashboard.
  4. Select Library option and search for
    4.1. Google Drive. Select it and click on Enable option
    4.2. Google Sheets. Select it and click on Enable option
  5. Now you have create your credentials. On the left-side menu click on Credentials
  6. Click on Create Credentials and select OAuth client ID option.
  7. If this is the first time you are doing this, Google will ask you to create a product (it assumes that you are creating this as part of an app or a product). Give it whatever name you like and click on Save
  8. Now Google asks what type of app is going to use these credentials. Select Other and click on Create button.
  9. Click on OK and then download your credentials by clicking on the down arrow on the right side of created OAuth 2.0 client IDs and store the downloaded file on your file system. The writer of gspread_pandas advises that you should name the file as google_secret.json and move file to ~/.config/gspread_pandas/google_secret.json. That’s where I stored mine. The folder did not exist, I created it myself.

If you are with me so far you now have to go to Python (assuming that you have installed the package gspread-pandas already, if not you can run pip install gspread-panadas now). If all has gone well so far this step opens up a new browser window which asks to select my Google account whose Google Drive I would like to give access to.

from __future__ import print_function
import pandas as pd
from gspread_pandas import Spread, Client  

# replace my name 'rachitkinger' with whatever user name you want  
# note that you have to provide the exact title of the spreadsheet
# which is Gapminder in this case  
gapminder = Spread('rachitkinger', 'Gapminder')

Reading in the data as a DataFrame

By this point in R you should have authenticated your library to access your Google Drive files and in Python, not only have you done that bit, but you have also gotten access to the specific Google Sheets file you want to work with.

via R’s googlsheet package

In googlesheets you have to first ‘register’ a spreadsheet before you can work with it.

library(dplyr)
# identify and register the sheet  
gapminder_sheet <- gs_title("Gapminder")

Note that the Gapminder spreadsheet has a worksheet for each continent. Let’s say we want to read in the data for all countries in Oceania. The name of that worksheet is ‘Oceania’ which will be the value we will pass to ws parameter in our function call. Since it is the fifth worksheet we could have also passed an integer call 5.

# now read it in as a dataframe (or tibble in this case)  
# Note that in the Gapminder
gapminder_oceania <- gs_read(gapminder_sheet, ws = "Oceania")
# alternatively
## gapminder_oceania <- gs_read(gapminder_sheet, ws = 5)

That’s it. You are ready to roll. Take a look at your dataframe.

head(gapminder_oceania)
# A tibble: 6 x 6
  country   continent  year lifeExp      pop gdpPercap
  <chr>     <chr>     <int>   <dbl>    <int>     <dbl>
1 Australia Oceania    1952    69.1  8691212    10040.
2 Australia Oceania    1957    70.3  9712569    10950.
3 Australia Oceania    1962    70.9 10794968    12217.
4 Australia Oceania    1967    71.1 11872264    14526.
5 Australia Oceania    1972    71.9 13177000    16789.
6 Australia Oceania    1977    73.5 14074100    18334.
via Python’s gspread-pandas package

Here we do not need to ‘register’ or ‘identify’ the worksheet since we have already done that as part of the authentication process. So we can straight away read in the dataframe. However, do note the indexing difference between Python and R will come into play here. The 5th worksheet will have the index 4 in Python.

# note that sheet=4 since Python indexing starts from 0
gapminder_oceania = gapminder.sheet_to_df(sheet=4)

Voila! Job done. Hope this was useful.