Facebook today announced a major new feature called Graph Search, which is almost certain to have a huge impact on privacy. This new feature promises to make searching on Facebook meaningful. Anyone who has tried to search for content on Facebook (“Oh, what was that funny Les Mis review I read the other day…”) has met with the frustration of Facebook search. The search bar was really designed for things you were already connected with: your friends, pages you’d Liked, etc. But if you didn’t know who posted that review you were a bit out of luck. Enter Graph Search. The idea is that it will now be easier to search the content that is available to you on Facebook.
If you check the Graph Search intro video you’ll see a number of people talking about how this new feature will let you connect with your friends and friends of friends to pull relevant information. Restaurants you might want to visit. Places to go see. Bands you might like (which, as anyone who has looked at what their friends listen to on Spotify knows, has about a 0% chance of being successful). But they’re also careful to tell you that this will roll out slowly. Some basic searches first, then more powerful searches later. There’s a link to try a sample search right now, but you can’t customize it. When I clicked on it I received the results of a search called “People Who Live In My City.” Which I suppose is useful for people who forget where they live.
But why is Facebook rolling out this functionality so slowly? Isn’t Facebook the place whose motto is “Move fast and break things”? Where did this new “let’s move slowly and roll things out as we go” approach come from? Especially for a feature that Zuckerberg calls the third pillar (along with your News Feed and Timeline). I think the answer is that for Graph Search to be effective it is going to have significant privacy implications.
The reason behind this collision of search and privacy has to do with how search engines work, particularly around indexing. There’s a nice article about how internet search engines work that includes a good description of indexing. But essentially indexing is what makes a search engine run fast enough for you to not pull out your hair in frustration the next time you’re looking for obscure 80s lyrics. And indexing, especially highly customized indexing like Google’s, is what makes a search engine work well. It’s why you use Google or Bing rather than one of the other search engines out there.
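To make the speed point concrete, here is a minimal sketch of an inverted index, the basic structure behind most search engines. It is a toy, not how Google or Facebook actually do it; the function names and the three sample lyrics are assumptions made up for illustration.

```python
from collections import defaultdict


def build_index(documents):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def scan_search(documents, word):
    """Slow path: read every document again for every query."""
    word = word.lower()
    return {doc_id for doc_id, text in documents.items()
            if word in text.lower().split()}


def index_search(index, word):
    """Fast path: one dictionary lookup, no matter how many documents exist."""
    return set(index.get(word.lower(), set()))


# Hypothetical sample data
documents = {
    1: "Never gonna give you up",
    2: "Take on me",
    3: "Never say goodbye",
}
index = build_index(documents)
```

Both `scan_search(documents, "never")` and `index_search(index, "never")` find the same documents, but the first one rereads everything on every query while the second does the reading once, up front. That up-front work is the whole trade-off: the index is only as current as the last time it was built.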
Right now, Facebook is rolling out searches that are fairly simple because you could pull the information without a lot of difficulty already. Like where people live or photos of you. That isn’t hard stuff. But to get more complicated searches to run quickly they’re going to need to somehow index the information to search. That can be incredibly difficult since information available to me is not the same information available to you. This isn’t like Google which indexes a web page and then anyone can search for it. I should be able to see information from my friends that you can’t see unless you’re friends with them. Which adds a whole new complexity to the index that Facebook creates–each piece of searchable information may or may not be visible to the person doing the searching.
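A rough sketch of what “may or may not be visible to the person doing the searching” means in practice: the same query has to return different results depending on who runs it. Everything here (the `Item` class, the `FRIENDS` table, the `can_see` rule with only two audience levels) is a simplified assumption for illustration, not Facebook’s actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    item_id: int
    owner: str
    text: str
    audience: str  # assumed to be either "public" or "friends"


# Hypothetical friend graph: alice and bob are friends, carol knows no one
FRIENDS = {
    "alice": {"bob"},
    "bob": {"alice"},
    "carol": set(),
}

# Hypothetical content
ITEMS = [
    Item(1, "bob", "winter hike photos", "friends"),
    Item(2, "bob", "best winter menu in town", "public"),
]


def can_see(viewer, item):
    """Decide whether this viewer is allowed to see this item."""
    if item.audience == "public" or viewer == item.owner:
        return True
    return item.audience == "friends" and viewer in FRIENDS.get(item.owner, set())


def search(viewer, word):
    """Return ids of matching items, filtered by what the viewer may see."""
    return [item.item_id for item in ITEMS
            if word in item.text.split() and can_see(viewer, item)]
```

Here `search("alice", "winter")` returns both of Bob’s items because Alice is his friend, while `search("carol", "winter")` returns only the public one. A plain web index never has to ask the `can_see` question at all.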
To some degree, this is no different than what Facebook does now. When you visit Facebook to get recent stories from your friends you see different stories than I do. That’s why we see those initial searches Facebook mentions–they’re slightly more interactive versions of what Facebook already does. But moving forward is going to be very different.
Later, if I want to Graph Search for “Winter” across my graph we could expect it to come up with photos with Winter in the caption or status updates mentioning the season or even restaurants my friends like that have a winter menu. But each of those pieces of information contains a dynamic privacy setting. What if the winter picture that was public yesterday is then changed to private? If the search is live, meaning done against all data at the time of the search, then those new privacy settings will have an impact. But a live search like that is incredibly slow and resource-intensive. If the search is done against an index, when does the index get updated? And what happens during the lag between the privacy setting stored in the index and the privacy setting on the content itself?
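The lag problem can be shown in a few lines. This is a sketch under the assumption that the index stores a copy of each item’s privacy setting at indexing time; the data layout and function names are hypothetical, and I have no knowledge of how Facebook’s index actually handles this.

```python
# Source of truth: the current privacy setting lives with the content
content = {
    "photo-1": {"caption": "winter sunrise", "audience": "public"},
}


def build_index(content):
    """Snapshot each word along with the audience setting at indexing time."""
    index = {}
    for item_id, item in content.items():
        for word in item["caption"].split():
            index.setdefault(word, []).append((item_id, item["audience"]))
    return index


def search_index_only(index, word):
    """Trust the audience copied into the index. Fast, but possibly stale."""
    return [item_id for item_id, audience in index.get(word, [])
            if audience == "public"]


def search_with_live_check(index, content, word):
    """Use the index for candidates, then recheck the current setting."""
    return [item_id for item_id, _ in index.get(word, [])
            if item_id in content and content[item_id]["audience"] == "public"]


index = build_index(content)

# The owner changes the photo to private AFTER the index was built
content["photo-1"]["audience"] = "private"

stale = search_index_only(index, "winter")            # still returns the photo
fresh = search_with_live_check(index, content, "winter")  # returns nothing
```

Until the index is rebuilt, a stranger’s index-only search still surfaces the now-private photo. The live recheck closes that gap, but it adds a lookup against the real data for every candidate result, which is exactly the cost the index was supposed to avoid.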
Look at it this way: the Google search engine doesn’t have this issue at all. The worst problem they have is when a previously available web page is deleted or modified. But even then Google still has the content in a cache that you can access. Google is indexing public web pages, which have a binary privacy setting: either Google can see the page or it can’t. Now imagine chopping that page into a hundred pieces with each piece having a different privacy setting that can change over time as you are friended and unfriended, put in restricted access groups or taken out of them, included in page communities and excluded from others. That’s a lot of computing power.
For Facebook to maintain the level of privacy they have offered to the public (and which their terms describe) they’re either going to have to keep Graph Search incredibly simple (in which case it won’t be very useful, I’d say similar to Timeline today) or they’re going to have to index content, which raises serious privacy questions. Facebook recognizes this conflict, which is why they already have a page called How Privacy Works with Graph Search. But for Graph Search to be truly feature-rich and useful to most Facebook users, I expect we’ll see some issues around indexed content pop up as the search engine rolls out.