Data Model and Performance Metrics
- ER Diagram and Database Design
- ER Diagram (Figure 1.a):Represents entities and relationships in the BG system.
- Member Entity:
- Represents users with a registered profile, including a unique ID and a set of adjustable-length string attributes to create records of varying sizes.
- Each user can have up to two images:
- Thumbnail Image: Small (in KBs), used for displaying in friend lists.
- High-Resolution Image: Larger (hundreds of KBs or MBs), displayed when visiting a user profile.
- Using thumbnails significantly reduces system load compared to larger images.
- Friend Relationship:
- Captures relationships or friend requests between users. An attribute differentiates between invitations and confirmed friendships.
- Resource Entity:
- Represents user-owned items like images, questions, or documents. Resources must belong to a user and can be posted on their profile or another user’s profile.
- Manipulation Relationship:
- Manages comments and restrictions (e.g., only friends can comment on a resource).
- Member Entity:
- BG Workload and SLA (Service-Level Agreement)
Workload: BG supports defining workloads at the granularity of:
- Actions: Single operations like “view profile” or “list friends.”
- Sessions: A sequence of related actions (e.g., browsing a profile, sending a friend request).
- Mixed Workloads: A combination of actions and sessions.
Service-Level Agreement (SLA):
- Goal: Ensures the system provides reliable performance under specified conditions.
- Example SLA Requirements: SLA, e.g., 95% of requests to observe a response time equal to or faster than 100 msec with at most 0.1% of requests observing unpredictable data for 10 minutes.
Metrics:
- SoAR (Social Action Rating): Measures the highest number of actions per second that meet the SLA.
- Socialites: Measures the maximum number of concurrent threads that meet the SLA, reflecting the system’s multithreading capabilities.
- Performance Evaluation Example
- SQL-X System Performance:SQL-X is a relational database with strict ACID compliance.
- Initially, throughput increases with more threads.
- Beyond a certain threshold (e.g., 4 threads), request queuing causes response times to increase, reducing SLA compliance.
- With 32 threads, 99.94% of requests exceed the 100-millisecond SLA limit, indicating significant performance degradation.
- Concurrency and Optimization in BG
Concurrency Management:
- BG prevents two threads from emulating the same user simultaneously to realistically simulate user behavior.
Unpredictable Data Handling:
- Definition: Data that is stale, inconsistent, or invalid due to system limitations or race conditions.
- Validation:
- BG uses offline validation to analyze read and write logs.
- It determines acceptable value ranges for data and flags any reads that fall outside these ranges as unpredictable.
If SoAR is zero, the data store fails to meet SLA requirements, even with a single-threaded BGClient issuing requests.
Actions
Performance Analysis of View Profile
Performance of VP is influenced by whether profile images are included and their sizes.
Experiment Setup:
- Profile data tested with:
- No images.
- 2 KB thumbnails combined with profile images of 2 KB, 12 KB, and 500 KB sizes.
- Metrics: SoAR (Social Action Rating) measures the number of VP actions per second that meet the SLA (response time ≤ 100 ms).
Results:
No Images:
- MongoDB performed the best, outperforming SQL-X and CASQL by almost 2x.
12 KB Images:
- SQL-X’s SoAR dropped significantly, from thousands of actions per second to only hundreds.
500 KB Images:
SQL-X failed to meet the SLA (SoAR = 0) because transmitting large images caused significant delays.
MongoDB and CASQL also experienced a decrease in SoAR but performed better than SQL-X.
Role of CASQL:
CASQL outperformed SQL-X due to its caching layer (memcached):
During a warm-up phase, 500,000 requests populate the cache with key-value pairs for member profiles.
Most requests are serviced by memcached instead of SQL-X, significantly improving performance with larger images (12 KB and 500 KB).
Performance Analysis of List Friends
1. SQL-X
- Process:
- Joins the
Friends
table with theMembers
table to fetch the friend list. - Friendship between two members is represented as a single record in the
Friends
table.
- Joins the
- Performance:
- When ϕ (number of friends) is 1000, SQL-X struggles due to the overhead of joining large tables and fails to meet SLA requirements.
2. CASQL
- Process:
- Uses a memcached caching layer to store and retrieve results of the LF action.
- Results are cached as key-value pairs.
- Performance:
- Outperforms SQL-X when ϕ is 50 or 100 by a small margin (<10% improvement).
- At ϕ=1000, memcached’s key-value size limit (1 MB) causes failures, as the data exceeds this limit.
- Adjusting memcached to support larger key-value pairs (e.g., 2 MB for 1000 friends with 2 KB thumbnails) could improve performance.
3. MongoDB
- Process:
- Retrieves the
confirmedFriends
array from the referenced member’s document. - Can fetch friends’ profile documents one by one or as a batch.
- Retrieves the
- Performance:
- Performs no joins, but its SLA compliance is poor for larger friend counts.
- SoAR is zero for ϕ=50,100,1000, as it fails to meet the 100 ms response time requirement.
- For smaller friend lists (ϕ=10), MongoDB achieves a SoAR of 6 actions per second.
Mix of Read and Write Actions
- Purpose: Evaluates the performance of data stores under different ratios of read and write operations.
- Categories:
- Read actions: Include operations like View Profile (VP), List Friends (LF), and View Friend Requests (VFR).
- Write actions: Modify friendship relationships and invalidate cached key-value pairs (e.g., Invite Friend, Accept Friend Request).
- Mix Variations:
- Very low writes (0.1%): Dominantly read-heavy workloads.
- Low writes (1%): Slightly higher frequency of write actions.
- High writes (10%): Write-intensive workloads.
Performance Analysis (Mix of Read and Write Actions)
- SoAR Comparison:
- CASQL consistently achieves the highest SoAR for all write mixes due to its caching mechanism.
- MongoDB outperforms SQL-X by a factor of 3 across all workloads.
Observations by Write Percentage:
0.1% Writes (Read-Dominant):
CASQL significantly outperforms MongoDB due to efficient use of cached key-value pairs.
SQL-X lags due to the overhead of processing read actions directly from the RDBMS.
1% Writes:
CASQL remains the best performer but shows sensitivity to increasing writes as it invalidates cached data, redirecting more queries to the RDBMS.
MongoDB maintains a consistent performance advantage over SQL-X.
10% Writes (Write-Heavy):
CASQL slightly outperforms MongoDB, but the gap narrows due to the higher frequency of cache invalidations.
SQL-X continues to struggle with write-heavy workloads due to its lack of caching.
Session
Definition: A session is a sequence of actions performed by a socialite (user) in the social network.
Key Concepts:
- Think Time: Delay between consecutive actions within a session.
- Inter-Arrival Time: Delay between sessions initiated by different socialites.
Key Considerations
Dependencies:
Some sessions rely on specific database states (e.g., friends or pending requests).
For example, if m_i has no friends or pending requests, certain sessions terminate early.
Concurrency Handling:
BG uses in-memory data structures to simulate database states and prevent conflicts (e.g., multiple threads deleting the same comment).
Ensures integrity by managing semaphores and detecting unpredictable data.
Extensibility:
- BG allows developers to define new sessions by combining different mixes of actions.
Parallelism
BG’s Scalable Benchmarking Framework
To address these limitations, BG employs a shared-nothing architecture with the following components:
1. BGCoord (Coordinator)
- Role: Oversees and coordinates the benchmarking process.
- Responsibilities:
- Computes SoAR and Socialites ratings.
- Assigns workloads to BGClients and monitors their progress.
- Aggregates results (e.g., response times, throughput) for visualization.
- Process:
- Splits the workload among N BGClients.
- Ensures each BGClient works independently to prevent resource contention.
2. BGClient
- Role: Executes tasks assigned by BGCoord.
- Responsibilities:
- Creates a database based on BG specifications.
- Simulates workload actions and computes metrics like unpredictable data volume.
- Periodically reports metrics to BGCoord for aggregation.
3. Visualization Deck
- Role: Provides a user interface for monitoring and controlling the benchmarking process.
- Features:
- Allows users to configure parameters (e.g., SLA, workloads).
- Visualizes the ratings (SoAR, Socialites) and progress of the benchmarking.
Scaling with BGClients
- Fragmentation:
- The database is split into N logical fragments, each assigned to a BGClient.
- Each fragment includes unique members, friendships, and resources, ensuring no overlap between BGClients.
- Decentralized D-Zipfian Distribution:
- Used to balance workloads across nodes with different processing speeds.
- Faster nodes handle larger fragments, ensuring equal workload completion times.
Unpredictable Data
Definition: Data that is stale, inconsistent, or invalid, produced due to race conditions, dirty reads, or eventual consistency.
BG’s Validation Process
Validation Implementation
- Log Generation:
- BG generates read log records (observed values) and write log records (new or delta values).
- Offline Validation:
- For each read log entry:
- BG computes a range of valid values using overlapping write logs.
- If the observed value is outside this range, it is flagged as unpredictable.
- For each read log entry:
Impact of Time-to-Live (TTL) on Unpredictable Data
Results:
Higher TTL Increases Stale Data:
A higher TTL (e.g., 120 seconds) results in more stale key-value pairs, increasing the percentage of unpredictable data.
For T=100T = 100T=100, unpredictable data is:
- ~79.8% with TTL = 30 seconds.
- ~98.15% with TTL = 120 seconds.
Performance Trade-off:
A higher TTL improves performance (fewer cache invalidations) but increases stale data.
Lower TTL reduces stale data but impacts cache performance.
Heuristic Search for Rating
Why Use Heuristic Search?
- Exhaustive search starting from T=1 to the maximum T is time-consuming.
- MongoDB with T=1000 and Δ=10 minutes would take 7 days for exhaustive testing.
Steps in Heuristic Search:
Doubling Strategy:
Start with T=1, double T after each successful experiment.
Stop when SLA fails, narrowing down T to an interval.
Binary Search:
Identify the T corresponding to max throughput within the interval.
Used for both SoAR (peak throughput) and Socialites (maximum concurrent threads).
Questions
What system metrics does BG quantify?
SoAR (Social Action Rating):
- The highest throughput (actions per second) that satisfies a given SLA, ensuring at least α% of requests meet the response time β, with at most τ% of requests observing unpredictable data.
Socialites Rating:
- The maximum number of simultaneous threads (or users) that a data store can support while still meeting the SLA requirements.
Throughput:
- Total number of completed actions per unit of time.
Response Time:
- Average or percentile-based latency for each action.
Unpredictable Data:
- The percentage of actions that observe stale, inconsistent, or invalid data during execution.
How does BG scale to generate a large number of requests?
BG employs a shared-nothing architecture with the following mechanisms to scale effectively:
Partitioning Members and Resources:
BGCoord partitions the database into logical fragments, each containing a unique subset of members, their resources, and relationships.
These fragments are assigned to individual BGClients.
Multiple BGClients:
Each BGClient operates independently, generating workloads for its assigned logical fragment.
By running multiple BGClients in parallel across different nodes, BG can scale horizontally to handle millions of requests.
D-Zipfian Distribution:
To ensure realistic and scalable workloads, BG uses a decentralized Zipfian distribution (D-Zipfian) that dynamically assigns requests to BGClients based on node performance.
Faster nodes receive a larger share of the logical fragments, ensuring even workload distribution.
Concurrency Control:
- BG prevents simultaneous threads from issuing actions for the same user, maintaining the integrity of modeled user interactions and avoiding resource contention.
If two modeled users, A and B, are already friends, does BG generate a friend request from A to B?
No, BG does not generate a friend request from A to B if they are already friends.
Before generating a friend request, BG validates whether the relationship between A and B is pending or already confirmed. For example, in the InviteFrdSession
, BG only selects users who have no existing “friend” or “pending” relationship with the requester to receive a new friend request.
Reference: https://www.cidrdb.org/cidr2013/Papers/CIDR13_Paper93.pdf