COMPSCI 753 Notes
Course outline sorted out by myself for quick revision
University of Auckland COMPSCI 753 - Algorithms for Massive Data
Data Stream
Find similar items
- Jaccard Similarity
- Shingling -> MinHash -> LSH
- MinHash
- Random permutations
- One-pass MinHash
Sampling data stream
- Fixed proportion
- Fixed size
- Reservoir Sampling
- Reservoir Sampling
Find frequent items
- [Deterministic] Misra-Gries
(m-m')/(k+1)
decrement steps at most
- [Randomized] CountMin Sketch
Filtering data stream
- Bloom Filter
- 1 hash function
- k hash functions
Locality Sensitive Search (LSH)
(r, c, p1, p2)
-sensitive- Jaccard similarity/distance
- MinHash
- Cosine similarity/distance
- SimHash
Graph
Link Analysis
- TF.IDF
- Term Spam
- PageRank
- Dead ends
- Recursively remove
- Taxation
- Dead ends
- Biased PageRank
- Link Spam (Spam Farm)
- Trust Rank
- Spam Mass
Social Network Analysis
- Small world property
- Power law degree distribution
- Core-Periphery structure
- Strength of ties
- Triadic closure
- Clustering Coefficient
- Triangle enumeration
- Community Detection
- K-core decomposition
- Influence Maximization
- Greedy-based
- Sketch-based
Join Analysis
- Natural join
- Semijoin
- Multiway join
- Acyclic join
- Yannakakis Algorithm