Press Release | January 2, 2006
Analysis of a Network of 14.5 Million E-Mail Messages
Everyone knows that the friendships you make in college often shape your future decisions and opportunities. And with the advent of the internet, social ties have become easier than ever to maintain. A new study by Columbia University researchers Gueorgi Kossinets and Duncan Watts provides insight into how networks are formed during college and how they change and evolve over time.
The study, which appears in the January 6, 2006 issue of Science, analyzed a year of e-mail contacts among over 43,000 students, faculty, and staff at a large, private university. Kossinets and Watts collected anonymous data on the timestamp, sender, and recipients - but not the content - of over 14.5 million e-mail messages. They then cross-referenced that data with information about personal attributes (status, gender, age, etc.) of the individuals, and who attended and taught each class at the university.
Past studies have shown that e-mail communication is closely correlated with face-to-face and telephone interactions. Thus, e-mail exchanges are considered reasonable proxies for underlying social ties. Different individuals use e-mail differently, so any one person's e-mail exchanges may not accurately reflect his or her real social network. However, the large scale of this study - over 14.5 million e-mails - means that individual idiosyncrasies will tend to average out, leaving only generally shared tendencies.
This is the first time that researchers have compiled empirical data of this scale and detail. As a result, Kossinets and Watts were able to test directly a number of longstanding hypotheses regarding the evolution of social networks over time.
What is new or unique about the study?
"Sociologists have known for decades that social networks are properly understood not as static structures, but as temporal processes - that is, they evolve in time, as a function of individual decisions, group attributes, and organizational structure. In fact, most theoretical propositions about networks, like the importance of mutual friends and sharing activities or groups, have to do with events taking place sequentially: first you meet one person, then you meet their friends, and so on. So the idea of studying network formation over time isn't new - what's new is the ability to actually do it empirically, particularly on such a large scale, and therefore to put quantitative measures on what were previously qualitative concepts."
Why is that important?
"There are lots of theoretical propositions about how networks evolve, and individually they are all plausible. But as yet there is no overarching theory that determines when each of these specific propositions applies, and to what extent. Ultimately, therefore, both the factors that determine individual decisions about which new ties to make, and also how those decisions aggregate to create large-scale structures comprising many thousands, or even millions, of individuals, remain empirical questions. Thus it's important that we are now finally getting our hands on the kind of data that can resolve some longstanding theoretical questions. We can't resolve all the interesting questions with this particular dataset, because there were some limitations on what we could collect, but we are making a start. Most importantly, perhaps, we have demonstrated a proof of concept: using electronic communication data like e-mail is a valid and powerful way to study the real-time evolution of social networks."
Research Findings
Some of their findings coincide with what one might expect intuitively. For example, they found that shared activities, such as taking courses together, or shared friends, greatly increase the probability that two individuals will interact. However, by quantifying these effects, they were able to add more detail than is possible simply by considering individual-level data. For example, sharing a single class has roughly the same effect as sharing a single mutual friend; but additional mutual friends count for more than additional shared classes. Also, shared activities and friends are more important than shared attributes like age, cohort, or gender.
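To make the two quantities in this comparison concrete, the following is a minimal sketch of how one might count mutual contacts and shared classes for a pair of people. It is not the authors' method or code. The message log, the class rosters, and all names (`messages`, `classes`, `build_contacts`, and so on) are hypothetical toy values, and treating any single e-mail as an undirected tie is a simplifying assumption.

```python
from collections import defaultdict

# Hypothetical toy log in the shape the study describes:
# (timestamp, sender, recipients), with no message content.
messages = [
    (1, "ann", ["bob"]),
    (2, "bob", ["ann", "cat"]),
    (3, "cat", ["dan"]),
    (4, "dan", ["bob"]),
]

# Hypothetical class rosters: class name -> set of participants.
classes = {
    "soc101": {"ann", "dan"},
    "math201": {"ann", "cat", "dan"},
}


def build_contacts(messages):
    """Map each person to the set of people they exchanged e-mail with.

    Assumption: one message in either direction counts as a tie.
    """
    contacts = defaultdict(set)
    for _timestamp, sender, recipients in messages:
        for recipient in recipients:
            if recipient != sender:
                contacts[sender].add(recipient)
                contacts[recipient].add(sender)
    return contacts


def mutual_contacts(contacts, a, b):
    """People who are tied to both a and b."""
    return (contacts[a] & contacts[b]) - {a, b}


def shared_classes(classes, a, b):
    """Classes whose roster contains both a and b."""
    return {name for name, roster in classes.items()
            if a in roster and b in roster}


contacts = build_contacts(messages)
print("mutual contacts:", sorted(mutual_contacts(contacts, "ann", "dan")))
print("shared classes:", sorted(shared_classes(classes, "ann", "dan")))
```

In this toy data, ann and dan have never e-mailed each other but share one mutual contact and two classes. Estimating how such counts change the probability of a new tie would require the kind of longitudinal data the study collected.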
Furthermore, some of their results indicate that popular and highly intuitive ideas about social networks may be wrong, or at least require serious qualification. For example, they found that although the global structure of the network remains remarkably stable over time (except in the vicinity of large changes, like the onset of summer recess, where many people change their routines simultaneously), individuals' networks are constantly changing. Thus, while the typical distance between pairs of people in a network may remain roughly constant over the course of a semester, the identities of the most highly connected individuals will be different depending on when the measurement is taken.
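The contrast between a stable global measure and shifting individual rankings can be illustrated with a deliberately contrived sketch. The two four-person snapshots below are invented for illustration and are not data from the study; the function names are likewise hypothetical. The code computes the mean shortest-path distance between reachable pairs and the best-connected person in each snapshot.

```python
from collections import deque
from itertools import combinations


def adjacency(edges):
    """Build an undirected adjacency map from a list of (u, v) ties."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def distances_from(adj, source):
    """Breadth-first search: hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def mean_distance(adj):
    """Average shortest-path length over all connected pairs of nodes."""
    total, pairs = 0, 0
    for u, v in combinations(sorted(adj), 2):
        d = distances_from(adj, u).get(v)
        if d is not None:
            total += d
            pairs += 1
    return total / pairs if pairs else float("nan")


def best_connected(adj):
    """Node with the most ties; alphabetical order breaks exact draws."""
    return max(sorted(adj), key=lambda node: len(adj[node]))


# Invented snapshots of the same four people at two different times.
snapshot_1 = [("a", "b"), ("a", "c"), ("a", "d")]
snapshot_2 = [("b", "a"), ("b", "c"), ("b", "d")]

for label, edges in (("snapshot 1", snapshot_1), ("snapshot 2", snapshot_2)):
    adj = adjacency(edges)
    print(label, mean_distance(adj), best_connected(adj))
```

Both toy snapshots have the same mean distance (1.5), yet the best-connected person changes from a to b. Real networks are far noisier, so this shows only the kind of measurement involved, not the study's result.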
According to Watts, "The notion that we can fully understand a social network by examining a snapshot of it at a specific moment in time turns out to be valid in some respects, but not in others. Specifically, average properties can probably be estimated from static data, but not individual characteristics, positions, or rankings. What we see instead is a turbulent mass of individuals all making and breaking ties, and constantly shifting with respect to each other: one day A is better connected than B, and the next day they are reversed; one day a tie is weak, and two months later it is strong. But what's interesting is that all the activity and changes being made by individuals seem to more or less cancel each other out, such that the structural properties of the network as a whole appear stable.
"If you think of individuals as behaving randomly, and following only very general, structural regularities, like network proximity or shared activities, then perhaps this 'canceling out' property doesn't seem very surprising: many random processes are characterized by stable distributions over time, even though their moment-to-moment behavior is unpredictable. The surprise is that we're not so used to thinking of human behavior as a random process, because it doesn't seem random to us. Intuitively we think of humans as behaving strategically, attempting to satisfy instrumental ends, or at least following some complex narrative which is particular to them. We do not think of people just as particles jiggling around each other, whose decisions are determined less by what they think they're trying to do than by their location in some structure (the network) too large and complicated for them to even be aware of.
"So another way to view these results is that they suggest that people are less capable than they might think of strategically manipulating their positions in a large network. It's not that people can't benefit or suffer from their position in the network - that may indeed be true. Rather, it's that even if an individual makes a conscious effort to change his or her position in the network, such attempts can always be undone by the actions of others. The claim that individuals endowed with certain network attributes can 'use' their networks to gain advantages over their peers should therefore be treated with caution."
See Also
- "Empirical Analysis of an Evolving Social Network." Science. January 6, 2006, Vol 311, Iss 5757: 88-90.
- Collective Dynamics Group





