I hadn’t intended to write a series of posts on the intersection of social media and online identities, yet somehow, here we are with a third post. In a comment on my first post, Ferns raised the issue of transparency and how companies hide the ‘how it works’ aspect. That’s a fascinating topic in itself, so I wanted to circle back to it.
It’s clearly true that companies should do a better job of notifying users about what data they’re collecting. They don’t want to, because for them there are only negative consequences. No user is going to say “I love that this huge faceless corporation knows all this stuff about me, but you’re missing out on a lot more private stuff I haven’t shared. Let me help you access that as well.” In reality, given more visibility into the data-gathering process, users are only going to want to add constraints, which in turn hurts the company’s product and its advertising revenue.
When it comes to the interpretation of the data – for example, why Facebook makes the friend suggestions that it does – the story is more complicated. Machine learning, and particularly deep learning, is driving a lot of innovation in big tech companies these days. Traditionally a software developer would analyze a problem and code up an algorithm to solve it. Now that same developer will specify the end result they want (these people are friends, these people are not friends), gather as much input data as they can (user location, hometown, school, posts they liked, etc.) and try to train a system to figure out the end result from the inputs. Typically this involves throwing a huge amount of computational power at the problem (which is why it has only become practical recently) and results in a black box that nobody really understands. Given the right inputs (e.g. data about users), this black box might be able to make excellent predictions about who is friends with whom, but it can be difficult to say exactly why it makes any single prediction. So when companies say it’s difficult to share why certain suggestions were made, they might not be lying. They might not know themselves.
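To make that shift concrete, here’s a minimal sketch of the new approach, using scikit-learn in Python. Everything here is invented for illustration: the features, the data, and the choice of model are my assumptions, not anything Facebook has disclosed.

```python
# A toy "friends or not?" predictor. The developer never writes friendship
# rules; they just hand labelled examples to a learning algorithm.
# All features and data below are fabricated purely for illustration.
from sklearn.ensemble import GradientBoostingClassifier

# Each row describes a pair of users:
# [same hometown?, same school?, posts both liked, mutual friends]
X = [
    [1, 1, 12, 30],
    [0, 1,  3,  5],
    [0, 0,  0,  0],
    [1, 0,  7, 12],
    [0, 0,  1,  1],
    [1, 1, 20, 40],
]
y = [1, 1, 0, 1, 0, 1]  # the end result we want: 1 = friends, 0 = not

model = GradientBoostingClassifier().fit(X, y)

# Ask the trained model about a new pair of users.
print(model.predict_proba([[0, 1, 5, 8]]))  # e.g. [[0.2, 0.8]] -> probably friends
```

Even at this toy scale, the ‘why’ behind that 0.8 is smeared across hundreds of tiny learned decision rules. Scale the data and the model up to Facebook’s size and “we don’t know exactly why” becomes entirely plausible.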
As an example of this, let’s consider the case of the sex worker I talked about in my first post. I should be clear that I know nothing about this beyond the public articles, and I know nothing about Facebook’s internal algorithms or what data they have. This is speculation designed purely to illustrate the issue. That said, imagine if Facebook had access to the WiFi networks people accessed from their phones over time. Being on the same public network as someone else doesn’t mean much. Even repeatedly seeing the same networks at roughly the same time doesn’t mean much – maybe you just happen to regularly go for coffee at the same time and place as some other random person. But repeatedly being on the same networks at the same time, in different places, over many months would be indicative of a possible relationship. That’s the kind of correlation a machine learning system could figure out. It’s also the kind of correlation that would occur for a sex worker regularly meeting the same group of clients at different hotels in a city.
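If you’ll forgive a little more Python, here’s a hypothetical sketch of that signal. Again, the data format, the names, and the 30-minute window are all my assumptions; the point is only that ‘same networks, same times, different places’ is easy to extract from raw sightings.

```python
# Hypothetical co-location signal: on how many *distinct* WiFi networks were
# two users seen within a short window of each other? All data is fabricated.
from collections import defaultdict
from datetime import datetime, timedelta

# (user, network, timestamp) sightings
sightings = [
    ("worker",   "hotel-A-wifi", datetime(2017, 3, 1, 21, 0)),
    ("client",   "hotel-A-wifi", datetime(2017, 3, 1, 21, 10)),
    ("worker",   "hotel-B-wifi", datetime(2017, 4, 2, 22, 0)),
    ("client",   "hotel-B-wifi", datetime(2017, 4, 2, 22, 5)),
    ("worker",   "cafe-wifi",    datetime(2017, 3, 1, 9, 0)),
    ("stranger", "cafe-wifi",    datetime(2017, 3, 1, 9, 5)),
]

def co_located_networks(a, b, window=timedelta(minutes=30)):
    """Distinct networks where users a and b appeared within `window` of each other."""
    by_network = defaultdict(lambda: defaultdict(list))
    for user, net, ts in sightings:
        by_network[net][user].append(ts)
    return {
        net for net, users in by_network.items()
        if any(abs(ta - tb) <= window
               for ta in users.get(a, []) for tb in users.get(b, []))
    }

# One shared cafe is probably coincidence; shared networks in *different*
# places, month after month, looks like a relationship.
print(co_located_networks("worker", "stranger"))  # {'cafe-wifi'}
print(co_located_networks("worker", "client"))    # {'hotel-A-wifi', 'hotel-B-wifi'}
```

Nobody would need to hand-code a rule like this. A learned model fed enough raw data could converge on exactly this kind of signal without anyone ever writing it down, which is precisely what makes the resulting suggestions so hard to explain.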
Apologies if anyone visited here with the crazy idea of reading posts about femdom. Hopefully I’ll get back on that track in the next day or two. In the meantime I’ll continue my theme of old school anonymity via masquerade style masks. This is the lovely Anne Hathaway, the one bright spot in the otherwise terrible Dark Knight Rises.