Affiliation:
1. University of Massachusetts Amherst, USA
Abstract
In traditional data analysis, data points lie in a Cartesian space, and an analyst asks questions such as: (1) What distribution can I fit to the data? (2) Which points are outliers? (3) Are there distinct clusters or substructures? Today, data mining handles increasingly rich types of data. Social networks encode information about people and their communities; relational data sets incorporate multiple types of entities and links; and temporal information describes the dynamics of these systems. Such semantically complex data sets admit a greater variety of patterns and support more views of the data. This article describes a specific social structure that may be present in such data sources and presents a framework for detecting it. The goal is to identify tribes: small groups of individuals who intentionally coordinate their behavior, that is, individuals with enough in common that they are unlikely to be acting independently. Although this task can only be conceived of in a domain of interacting entities, the techniques for solving it return to the traditional data analysis questions. To find hidden structure, we use an anomaly detection approach: develop a model that describes the data, then identify the outliers.
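The "model, then outliers" recipe can be illustrated with a small sketch. It is not the article's model, and it rests on several assumptions. Each individual is described by a set of group memberships, such as employers. The background model (hypothetical) assumes that people join groups independently, in proportion to each group's size. Under that model, a pair's "surprise" is the negative log-probability of its shared memberships. A pair is flagged when its surprise is a z-score outlier among all pairs. The function names, the `z_threshold` default of 2.0, and the toy data are all illustrative.

```python
import math
from itertools import combinations
from statistics import mean, pstdev


def group_frequencies(memberships):
    """Background model (assumed): fraction of individuals in each group."""
    n = len(memberships)
    counts = {}
    for groups in memberships.values():
        for g in groups:
            counts[g] = counts.get(g, 0) + 1
    return {g: c / n for g, c in counts.items()}


def pair_surprise(a, b, memberships, freq):
    """-log P(b shares a's overlapping groups by chance), assuming independent joins.

    Sharing a group everyone belongs to costs nothing (-log 1 = 0);
    sharing rare groups adds a lot of surprise.
    """
    shared = memberships[a] & memberships[b]
    return sum(-math.log(freq[g]) for g in shared)


def find_tribe_pairs(memberships, z_threshold=2.0):
    """Score every pair, then flag pairs whose surprise is an outlier by z-score."""
    freq = group_frequencies(memberships)
    scores = {
        (a, b): pair_surprise(a, b, memberships, freq)
        for a, b in combinations(sorted(memberships), 2)
    }
    values = list(scores.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:  # every pair looks alike: nothing stands out
        return []
    return [pair for pair, s in scores.items() if (s - mu) / sigma > z_threshold]


# Toy data (hypothetical): everyone shares BigCo; ann and bob also share
# three small firms; cat and dan share just one.
memberships = {
    "ann": {"BigCo", "SmallA", "SmallB", "SmallC"},
    "bob": {"BigCo", "SmallA", "SmallB", "SmallC"},
    "cat": {"BigCo", "Firm1"},
    "dan": {"BigCo", "Firm1"},
    "eve": {"BigCo", "Firm3"},
    "fay": {"BigCo", "Firm4"},
    "gus": {"BigCo", "Firm5"},
    "hal": {"BigCo", "Firm6"},
}

print(find_tribe_pairs(memberships))  # [('ann', 'bob')]
```

In this toy data, cat and dan share one rare firm, which yields a z-score of about 1.5 and stays under the threshold. Ann and bob share three rare firms, which is far less likely under the independence assumption, so only that pair is flagged. A real detector would need a richer background model, for example one that accounts for time, geography, or group size. Choosing that model is the substantive part of the approach.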