#366 - Feature: Google Takeout


There's a Google product you should know about; it's called Google Takeout. It is not a food service, so don't get excited. Don't make the same joke that literally 5 people I've told about this have made.

Google Takeout is a service that allows you to take out your data from Google - in theory so that you could go to some other service and keep your data. In reality, it's probably best just for seeing an aggregate list of everything they have of yours and about you. I requested an archive of everything they had associated with my account, leaving out Google Drive (because I have 500GB+ in the cloud & don't need them to send my own files back to me). Then I spent a week's worth of free time going through it all & taking 5 full pages of notes about what I found. This post is primarily the contents of those notes. Now, before we get into that...

I am pro-Google. I use or have used almost every one of their publicly-offered services. Google is astounding. I was a Google Plus die-hard well beyond when anyone reasonably should have been. I'm writing this from a desk with a little "Google" neon lamp on it... on a Google-developed browser... using Blogger. I think Google is a great and wonderful company that makes the world a better place. Google Takeout even re-instills my faith that the company is still living up to their erstwhile motto: "Don't be evil". A company that goes through this much effort to enable you to not only see what they know about you, but also help you to take your business elsewhere... I think that's pretty great.

All that said...

Google is terrifying.

I know better than the average person off the street what Google probably had on me. I'm technically minded. I'm an engineer. I write code for work and for fun. I have a Life Tracker, for God's sake... and yet I was still shocked and awed by the immensity of it all. 

262 gigabytes 
148,172 files
4,552 folders

Granted, a large proportion of those numbers (230GB, 111,974 files, & 3,903 folders, respectively) fall to my Google Photos export. Take out Google Photos and the dataset is still 31.6GB, 36,196 files, in 647 folders.

In there are many things you'd expect to find:
  • all my photos from Google Photos 
    • 54,035 photos & 57,939 metadata files
    • including my photos and photos from others I've added to my collection
    • Metadata including geographic coordinates, photo views, and any comments associated with them
  • all my notes from Google Keep 
    • 855 notes & 279 attached media files
  • all my emails from Gmail 
    • one 2.5GB lumped text file
    • weirdly not sorted in any order I can figure out
  • all my chats from Hangouts 
    • one 328MB, 10,277,395 line .json file
    • weirdly, chat is also in the Gmail file... same chats in both, different format
    • this includes both sides of the conversation, images are also included in separate files - but only the ones I sent, not the ones I received
  • all the details for my contacts
    • one 60KB .vcf file & associated photos
  • all my purchases made through Google Express
    • including what I bought, for how much, where it was delivered, how I paid
  • every exercise session I ever logged with Google Fit
    • including what I did, when I did it, where I did it, and how far I traveled during
  • my entire search history 
    • split out by product (Search, developers, drive, analytics, apps, cloud, news, images, inbox, videos, maps, play store, and more)
  • the content of every Blogger post I've ever made
    • this sentence will be in there after I hit "Publish"
  • my browser history*
    • one 10MB .json file
  • my location history, which I'll expand on in a bit
*For some reason my browser history is incomplete. It only goes back to 2/11/2018, containing 26,334 URLs & timestamps
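
If you want to put that lumped Gmail file into some kind of order, here's a minimal Python sketch. It assumes the export is an mbox file (an .MBOX file did turn up in my archive) and uses the standard-library `mailbox` module. The function names are mine, and the file name you pass in is whatever your own export is called.

```python
import mailbox
from datetime import timezone
from email.utils import parsedate_to_datetime


def message_date(msg):
    """Return the message's Date header as an aware datetime, or None."""
    raw = msg.get("Date")
    if raw is None:
        return None
    try:
        parsed = parsedate_to_datetime(str(raw))
    except (TypeError, ValueError):
        # Unparseable Date header: skip it rather than guess.
        return None
    if parsed.tzinfo is None:
        # Treat dates without a usable offset as UTC so sorting works.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def subjects_by_date(mbox_path):
    """List (date, subject) pairs from an mbox file, oldest first."""
    rows = []
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:
            when = message_date(msg)
            if when is not None:
                rows.append((when, str(msg.get("Subject", ""))))
    finally:
        box.close()
    rows.sort(key=lambda row: row[0])
    return rows
```

On a multi-gigabyte file this will take a while, but it only keeps the date and subject of each message in memory, not the bodies.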

All of those things I knew I'd find. The photos, notes, emails, chats, and contacts are all clearly needed in order for those services to function. The search history, browser history, and physical location history I also knew to expect, although their direct application is slightly less obvious.

But that's not everything that was in there. 

Things I found, but was not expecting:
  • 11,500 audio recordings of me, all starting with "Okay Google..."
    • That's 15 hours, 52 minutes, and 41 seconds' worth of roughly 5-second clips.
    • In these recordings you hear me, but also anything else that was going on in the background.
    • Also there's an HTML file with a transcript of what it heard in each of these recordings, and a latitude/longitude for each.
  • An HTML file showing every time I opened an app on my Android phone, sometimes also showing what I did in the app.
    • There are only 76 thousand records in that file - which can't possibly be everything I did on my old phone. I'm not sure what was & wasn't included.
  • A log of every time I ever:
    • played a song on Google Play Music (including title & timestamp)
    • played a movie on Google Play Movies (including title & timestamp)
    • read a book on Google Play Books (including title & timestamp)
    • played a game on Google Play Games (including title & timestamp)
    • looked at an app on the Google Play Store (including title & timestamp)
    • installed an app from the Play Store (including what device, title & timestamp)
    • uninstalled an app (including what device, title & timestamp)
  • A log of everything I bought NOT through Google Express, provided I received an emailed receipt of the transaction.
    • Including what I bought, from whom, for how much, and where it was delivered
    • Mostly this is a list of Amazon and Domino's pizza orders
    • Google wasn't involved in the transactions, so these logs must come from the smart email parsing
  • Google Fit health metrics, broken out into 15 minute chunks, for every day.
    • those columns are distance traveled, average heart rate, max heart rate, min heart rate, latitude, longitude
    • notice most of them are blank... and the step count seems oddly repetitive - I don't believe those numbers
    • weirdly these fitness files only go from 9/18/2014 to 11/13/2014, despite me continuing to use Google Fit to this day
  • Google Voice voicemails from other folks
    • this probably shouldn't have surprised me, but it was weird hearing a random voicemail message from my dad playing when I opened an obscurely-titled .mp3 file
  • Location history - I included this in the "I knew they had it" section, but really it needed to go down here, and go last. The location history log doesn't just include location history.
    • timestamp + longitude + latitude, pretty straightforward
    • also, an accuracy indicator, pretty good inclusion
    • also altitude, sure
    • also velocity of travel? heading? 
    • also whether I was walking, in a car, on a bike, or staying still
    • ...also that there were 23,869,774 lines in that .json file, corresponding to 2,853,229 data points, going back to 6/29/2012
      • 6/29/2012 was 2465 days before I pulled these data
      • 2853229 data points / 2465 days = 1157.5 data points / day... aka one every 74 seconds (on average) for the past 6 1/2 years.
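
For anyone who wants to check my math, here's a quick Python sketch that re-runs the two averages from the list above: location data points per day, and the average length of an audio clip. The inputs are the counts from my own archive; the function name is just something I made up for this.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def sampling_rate(data_points, days):
    """Return (points per day, average seconds between points)."""
    per_day = data_points / days
    return per_day, SECONDS_PER_DAY / per_day


# Location history: 2,853,229 data points over 2,465 days.
per_day, gap = sampling_rate(2_853_229, 2465)
print(f"{per_day:.1f} points/day, one every {gap:.1f} s")
# prints: 1157.5 points/day, one every 74.6 s

# Audio: 15 h 52 min 41 s of recordings spread across 11,500 clips.
clip_seconds = 15 * 3600 + 52 * 60 + 41
average_clip = clip_seconds / 11_500
print(f"{average_clip:.2f} s per clip")
# prints: 4.97 s per clip
```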
I didn't sign up for some sort of "special access" pass for Google to look into my life... no more so than most of you, that is. If you have a Gmail account, or if you use any Google apps and services (which you do), they have a similar aggregation available for you. I think it's worth taking a look into. They really don't make it hard.

Do I have a problem with them having this much data on me? Well, not really. They provide a great benefit to me in my life, but also the world. They provide services free of charge, in exchange for information about us so they can show relevant advertisements. To me, that's a very worthwhile trade. I wouldn't be nearly as happy or comfortable in my everyday life if Google didn't exist. Neither would you.

Also - Google is hardly the only tech company that has a treasure trove of data about each of us, yet (so far as I know) they are the only one that's willing to share what they have with you this easily. I'm sure Facebook and Amazon have an immense amount of information about each of us, but they don't give me the same utility that Google does. I pay Amazon for the privilege of using their Prime benefits. Facebook is a great social networking platform for those who want to use it that way... to me it provides negative utility - but that's a different story.

So that's Google Takeout. Thus ends my week of obsession with it. Time to move on to other things.

Quick edit: for a quicker look at things - check out https://myactivity.google.com/myactivity

I'll include this in this post because I was interested. The data Google sent back to me came already broken out into folders by product, and each product may use different file types from any other. Prevalent file types were .mp3 for audio and regular .jpg or .png for pictures; calendar and contact files were in their standard file types. .JSON was used for most datasets, .CSV was the next most popular dataset type, followed by .XML, then niche file types (e.g. ".MBOX", ".GOO", ".TCX"). Keep notes were kept in an HTML file that looked exactly like the note in the app. Photos were organized in folders by album title; if photos weren't in albums, then albums were auto-generated using the date they were taken. My 111,974 Google Photos files were split into 3,903 folders.
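
If you'd like the same kind of tally for your own archive, something like this Python sketch should do it. It walks an unzipped export folder and counts files, folders, bytes, and files per extension. The `inventory` name and the folder path you hand it are placeholders of mine, not anything Google provides.

```python
from collections import Counter
from pathlib import Path


def inventory(root):
    """Count files, folders, and bytes under root, plus files per extension."""
    files = folders = total_bytes = 0
    by_extension = Counter()
    for path in Path(root).rglob("*"):
        if path.is_dir():
            folders += 1
        elif path.is_file():
            files += 1
            total_bytes += path.stat().st_size
            # Lower-case the suffix so ".JPG" and ".jpg" land in one bucket.
            by_extension[path.suffix.lower() or "(none)"] += 1
    return {
        "files": files,
        "folders": folders,
        "bytes": total_bytes,
        "by_extension": by_extension,
    }
```

Point it at the top of the export, e.g. `inventory("Takeout")`, and `result["by_extension"].most_common(10)` gives the most prevalent file types. Point it at a single product's folder to get that product's share.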

Top 5: Services I Recommend YOU Get an Archive Of
5. Photos - if you don't actually have your photos on your computer, you're one account loss away from losing all those memories.
4. Hangouts - if you used it, it's nice to have a searchable copy of your conversation history for posterity. Also you can do neat analytics if you're not scared of light coding work. I've sent 352,311 messages on Hangouts.
3. My Activity > Voice & Audio - just to hear what they have collected
2. My Activity > Android - just to see what they have collected
1. Location History - this is just good data... and there's a ton of it. You should know that this dataset exists.
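
As a taste of the "light coding work" I mentioned for Hangouts, here's a rough Python sketch that counts messages per sender per year. Big caveat: the JSON layout in it is an assumption on my part (a top-level "conversations" list, each with an "events" list, each event carrying a "sender_id" with a "gaia_id" and a "timestamp" in microseconds). Open your own file and adjust the key names to whatever you actually find.

```python
import json
from collections import Counter
from datetime import datetime, timezone


def messages_per_year(path):
    """Count chat events per (sender id, year) in a Hangouts-style JSON export.

    ASSUMED layout (check it against your own file): a top-level
    "conversations" list, each entry holding an "events" list whose items
    have "sender_id" -> "gaia_id" and a "timestamp" string in microseconds
    since the Unix epoch.
    """
    with open(path, encoding="utf-8") as handle:
        export = json.load(handle)

    counts = Counter()
    for conversation in export.get("conversations", []):
        for event in conversation.get("events", []):
            sender = event.get("sender_id", {}).get("gaia_id", "unknown")
            micros = int(event.get("timestamp", 0))
            when = datetime.fromtimestamp(micros / 1_000_000, tz=timezone.utc)
            counts[(sender, when.year)] += 1
    return counts
```

Fair warning: `json.load` reads the whole file into memory, so a file in the hundreds of megabytes will want a few gigabytes of RAM.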

Quote:
"That's creepy"
- Melissa, after hearing a recording of me telling Google to do something that she remembered from a few days ago -
