Local Outlier Factor#

Local Outlier Factor (LOF)#

The Local Outlier Factor (LOF) is a method used for identifying anomalies or outliers in data based on the density of the data points. LOF assigns an anomaly score to each point, indicating how much of an outlier it is. Here’s a detailed explanation of the concepts involved:

Core Distance#

The core distance of a point \( p \) is the distance from \( p \) to its \( k \)-th nearest neighbor. If \( k \) is the specified number of neighbors (a parameter of the algorithm), then the core distance of \( p \) is:

\[ \text{core\_distance}(p, k) = \text{distance}(p, k\text{-th nearest neighbor}) \]

Reachability Distance#

The reachability distance between a point \( p \) and another point \( o \) is defined as the maximum of the core distance of \( o \) and the actual distance between \( p \) and \( o \). It is used to ensure that the reachability distance respects the density of \( o \):

\[ \text{reachability\_distance}(p, o) = \max(\text{core\_distance}(o, k), \text{distance}(p, o)) \]

Local Reachability Density (LRD)#

The Local Reachability Density (LRD) of a point \( p \) is the inverse of the average reachability distance based on the \( k \)-nearest neighbors of \( p \). It represents the density around \( p \):

\[ \text{LRD}(p) = \left( \frac{\sum_{o \in N_k(p)} \text{reachability\_distance}(p, o)}{|N_k(p)|} \right)^{-1} \]

where \( N_k(p) \) is the set of \( k \)-nearest neighbors of \( p \).

Local Outlier Factor (LOF)#

The LOF score of a point \( p \) compares its local density to the local densities of its neighbors. A higher LOF score indicates that the point is an outlier:

\[ \text{LOF}(p) = \frac{\sum_{o \in N_k(p)} \frac{\text{LRD}(o)}{\text{LRD}(p)}}{|N_k(p)|} \]

Interpreting LOF Scores#

  • LOF ≈ 1: The point is similar in density to its neighbors and is likely an inlier.

  • LOF > 1: The point is less dense compared to its neighbors and is likely an outlier.

  • LOF < 1: The point is denser compared to its neighbors and is likely an inlier.

Real-World Example#

Imagine you have a dataset of transaction amounts from a retail store. Most transactions are small amounts (e.g., \(10 to \)50), but occasionally, there are larger transactions (e.g., \(200 to \)500). By applying the LOF algorithm, you can identify the larger transactions as potential outliers, which might indicate fraudulent activity or unusual purchasing behavior.

Inline and Outlier Determination#

Using LOF scores, we can classify points as follows:

  • Inlier: If the LOF score is close to or less than 1.

  • Outlier: If the LOF score is significantly greater than 1.

Conclusion#

The Local Outlier Factor (LOF) is a robust method for identifying outliers based on the density of data points. It helps in distinguishing anomalies by comparing the local density of a point to that of its neighbors.