MySQL supports a one-source, multiple-replica topology, and a replica can itself be configured as the source for further replicas, enabling chained replication. Under this asynchronous model, if the source fails, any replica can be promoted to become the new source, allowing fast failover and greatly improving availability. By directing writes to the source and spreading reads across the replicas, the load on the source is reduced, and backup jobs can run on a replica without affecting source performance. Because the source does not wait for replicas to apply its changes, it achieves higher write throughput, but replication lag can arise and must be managed with monitoring metrics. To further reduce apply lag on replicas, parallel (multi-threaded) replication lets multiple applier threads execute replicated events concurrently. Overall, Binlog-based asynchronous replication, with its simple configuration, strong scalability, and good availability, is the most widely used MySQL replication scheme.
Fully synchronous replication is typically associated with MySQL Group Replication or NDB Cluster: on commit, the source must wait until all replicas (or a qualified majority of them) confirm the commit before proceeding, giving genuinely strong synchronization. This mode provides the strongest real-time consistency across nodes, but also the highest write latency and the most complex deployment, and is usually reserved for scenarios with the strictest consistency and failover requirements.
How do you add a column to a table with millions or tens of millions of rows?
On a table this large, running ALTER TABLE … ADD COLUMN directly often locks the table for a long time and disrupts online traffic; common zero-downtime schema-change approaches include:
It relaxes traditional SQL’s strict rules to handle modern, semi-structured data like JSON or CBOR.
SQL++ databases can store self-describing data, meaning you don’t need a predefined schema (data structure).
Supports diverse data types:
Data can be single values (scalars), tuples (a set of key-value pairs), collections (like arrays or multisets), or combinations of these.
Unlike traditional SQL, tuples in SQL++ are unordered, which means the order of attributes doesn’t matter.
Allows duplicate attribute names but discourages them:
This is to accommodate non-strict formats like JSON.
However, duplicate names can lead to unpredictable query results, so they’re not recommended.
Two kinds of missing values: NULL and MISSING:
NULL: Means an attribute exists but has no value.
MISSING: Means the attribute doesn’t exist at all.
This distinction is useful for clearer query results and error handling.
Importance of MISSING:
SQL++ doesn’t stop processing if some data is missing; instead, it marks those cases as MISSING and continues.
This makes queries more robust and tolerant of data inconsistencies.
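The NULL/MISSING distinction can be sketched in Python by modeling MISSING as a dedicated sentinel; the names here (MISSING, get_attr, the sample employees) are illustrative, not part of any SQL++ engine:

```python
# Model SQL++'s two kinds of missing information: NULL (attribute exists,
# no value) versus MISSING (attribute absent). A dedicated sentinel object
# stands in for MISSING so it can never be confused with None (NULL).
MISSING = object()

def get_attr(tup, name):
    """Attribute access that yields MISSING instead of failing."""
    return tup.get(name, MISSING)

employees = [
    {"id": 1, "name": "Ann", "title": None},   # title exists but is NULL
    {"id": 2, "name": "Bob"},                  # title is absent: MISSING
]

# Processing continues for every tuple; absent data is marked, not fatal.
titles = [get_attr(e, "title") for e in employees]
```

Here `titles[0]` is `None` (NULL) while `titles[1]` is the MISSING sentinel, so a query can tell "no value" apart from "no attribute".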
Accessing Nested Data
SQL-92 vs. Modern Data:
SQL-92 only supports tables with rows (tuples) containing simple values (scalars).
Modern data formats often include nested structures, where attributes can hold complex data types like arrays, tables, or even arrays of arrays.
Nested Data Example:
In the example, the projects attribute of an employee is an array of tuples, representing multiple projects each employee is involved in.
Querying Nested Data in SQL++:
SQL++ can handle such nested data without adding new syntax to SQL.
For example, a query can find employees working on projects with “security” in their names and output both the employee’s name and the project’s name.
How It Works:
SQL++ uses left-correlation, allowing expressions in the FROM clause to refer to variables declared earlier in the same clause.
For instance, e.projects accesses the projects of an employee e.
This relaxes SQL’s restrictions and effectively enables a join between an employee and their projects.
Using Variables in Queries:
SQL++ requires explicit use of variables (e.g., e.name instead of just name) because schema is optional and cannot guarantee automatic disambiguation.
If a schema exists, SQL++ can still optimize by rewriting the query for clarity and execution.
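The left-correlated FROM clause described above behaves like nested iteration; this Python sketch (with invented sample data) mirrors a query of the form FROM employees AS e, e.projects AS p WHERE the project name contains "security":

```python
# Emulate SQL++ left-correlation: the second FROM term ranges over
# e.projects, a collection reached through the variable e bound earlier,
# effectively joining each employee with their own projects.
employees = [
    {"name": "Ann", "projects": [{"pname": "OLTP Security"},
                                 {"pname": "Search"}]},
    {"name": "Bob", "projects": [{"pname": "Storage"}]},
]

result = [
    {"emp": e["name"], "proj": p["pname"]}
    for e in employees          # e bound to each employee
    for p in e["projects"]      # p bound via e: left-correlation
    if "security" in p["pname"].lower()
]
```

Explicit variables (`e`, `p`) do the disambiguation that a schema would otherwise provide, matching the point above.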
Flexibility with Nested Collections:
Variables in SQL++ can represent any type of data—whether it’s a table, array, or scalar.
These variables can be used seamlessly in FROM, WHERE, and SELECT clauses.
Aliases Can Bind to Any Data Type:
In SQL++, variables (aliases) don’t have to refer only to tuples.
They can bind to arrays of scalars, arrays of arrays, or any combination of scalars, tuples, and arrays.
Flexibility in Querying Nested Data:
Users don’t need to learn new query syntax for different data structures.
The same unnesting feature is used regardless of whether the data is an array of tuples or an array of scalars.
Example:
If the projects attribute is an array of strings (instead of tuples), SQL++ queries can still process it.
The query would range over e.projects and bind p to each project name (a string).
Relaxed Semantics Compared to SQL:
In traditional SQL, the FROM clause binds variables strictly to tuples.
SQL++ generalizes this by treating the FROM clause as a function that can bind variables to any type of data—not just tuples.
Practical Outcome:
In the example, the FROM clause produced variable bindings like {e: employee_data, p: project_name}.
This allows the query to handle data structures that SQL would not support without extensions.
ABSENCE OF SCHEMA AND SEMI-STRUCTURED DATA
Schemaless Data:
Many modern data formats (e.g., JSON) don’t require a predefined schema to describe their structure.
This allows for flexible and diverse data, but it also introduces heterogeneity.
Types of Heterogeneity:
Attribute presence: Some tuples may have a specific attribute (e.g., x), while others may not.
Attribute type: The same attribute can have different types across tuples. For example:
In one tuple, x might be a string.
In another tuple, x might be an array.
Element types in collections: A collection (e.g., an array or a bag) can have elements of different types. For example:
The first element could be a string, the second an integer, and the third an array.
Legacy or data evolution: These heterogeneities often result from evolving requirements or data conversions (e.g., converting XML to JSON).
Heterogeneity Is Not Limited to Schemaless Data:
Even structured databases can have heterogeneity. For example:
Hive’s union type allows an attribute to hold multiple types, like a string or an array of strings.
How SQL++ Handles It:
SQL++ is designed to work seamlessly with heterogeneous data, whether the data comes from a schemaless format or a schema-based system.
It offers features and mechanisms to process such data flexibly, without enforcing rigid structure requirements.
Missing Attributes
Representation of Missing Information:
In SQL, a missing value is typically represented as NULL (e.g., Bob Smith’s title in the first example).
In SQL++, there’s an additional option: simply omitting the attribute altogether (as seen in the second example for Bob Smith).
NULL vs. MISSING:
NULL: Indicates the attribute exists but has no value.
MISSING: Indicates the attribute is entirely absent.
SQL++ supports distinguishing between these two cases, unlike traditional SQL.
Why This Matters:
Some data systems or formats (e.g., JSON) naturally omit missing attributes rather than assigning a NULL value.
SQL++ makes it easy to work with both approaches by allowing queries to handle NULL and MISSING values distinctly.
Query Behavior:
Queries in SQL++ can propagate NULL and MISSING values as they are.
The system introduces the special value MISSING to represent absent attributes, allowing clear differentiation from NULL.
MISSING as a Value
What Happens When Data is Missing:
If a query references an attribute that doesn’t exist in a tuple (e.g., e.title for Bob Smith), SQL++ assigns the value MISSING.
This avoids query failures and ensures processing can continue.
Three Cases Where MISSING is Produced:
Case 1: Accessing a missing attribute. For example, {id: 3, name: 'Bob Smith'}.title results in MISSING.
Case 2: Using invalid input types for functions or operators (e.g., 2 * 'some string').
Case 3: When MISSING is an input to a function or operator, it propagates as MISSING in the output.
SQL Compatibility Mode:
In SQL compatibility mode, MISSING behaves like NULL for compatibility. For instance, COALESCE(MISSING, 2) will return 2, just as COALESCE(NULL, 2) does in SQL.
Propagation of MISSING in Queries:
In queries, MISSING values flow naturally through transformations, enabling consistent handling of absent data.
For example, in a CASE statement, if e.title evaluates to MISSING, the result of the entire CASE expression will also be MISSING.
Results with MISSING:
If a query result includes MISSING, SQL++ will omit the attribute from the result tuple.
In communication with external systems like JDBC/ODBC, MISSING is transmitted as NULL to ensure compatibility.
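The three MISSING-producing cases and the omission of MISSING attributes from result tuples can be simulated with a toy evaluator (a sketch of the described semantics, not the full SQL++ rules):

```python
MISSING = object()

def access(tup, name):
    """Case 1: accessing an absent attribute yields MISSING."""
    return tup.get(name, MISSING)

def times(a, b):
    """Cases 2 and 3: invalid input types, or MISSING inputs, yield MISSING."""
    if a is MISSING or b is MISSING:
        return MISSING                       # MISSING propagates
    if not isinstance(a, (int, float)) or not isinstance(b, (int, float)):
        return MISSING                       # e.g. 2 * 'some string'
    return a * b

def project(tup):
    """Result construction: attributes evaluating to MISSING are omitted."""
    out = {"id": tup.get("id"), "bonus": times(2, access(tup, "salary"))}
    return {k: v for k, v in out.items() if v is not MISSING}

r1 = project({"id": 3, "name": "Bob Smith"})   # no salary: bonus is MISSING
r2 = project({"id": 4, "salary": 100})
```

The tuple without a salary simply lacks a `bonus` attribute in the result, rather than failing the query.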
RESULT CONSTRUCTION, NESTING, AND GROUPING
Creating Collections of Any Value
Power of SELECT VALUE:
The SELECT VALUE clause in SQL++ allows constructing collections of any type of data, not just tuples.
It enables creating outputs that match the structure of nested data without flattening it unnecessarily.
Example Query:
The query in Listing 10 demonstrates how to use SELECT VALUE to extract only the “security” projects of employees, resulting in a nested structure.
Each employee’s tuple includes their ID, name, title, and a collection of their security-related projects.
Result:
Listing 11 shows the result where each employee has a field security_proj containing a nested collection of projects that match the condition (e.g., projects with “Security” in the name).
Key Difference from Standard SQL:
SQL’s SELECT clause can be viewed as shorthand for SELECT VALUE, but with differences:
SQL automatically coerces subquery results into scalar values, collections of scalars, or tuples based on context.
In contrast, SELECT VALUE in SQL++ consistently produces a collection and does not apply implicit coercion.
Flexibility:
SQL++ avoids implicit “magic” by explicitly treating SELECT as shorthand for SELECT VALUE.
This approach aligns more closely with functional programming principles, making it easier to handle and compose nested data results.
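SELECT VALUE's behavior, always producing a collection with explicitly constructed nested output, can be sketched as follows; the `security_proj` field name follows the text's example, while the data is invented:

```python
# Sketch of SELECT VALUE: for each employee, explicitly construct an output
# tuple whose security_proj field is itself a nested collection. No implicit
# coercion takes place; the result is always a collection.
employees = [
    {"id": 1, "name": "Ann", "title": "Eng",
     "projects": [{"pname": "OLTP Security"}, {"pname": "Search"}]},
]

result = [
    {"id": e["id"], "name": e["name"], "title": e["title"],
     "security_proj": [p["pname"] for p in e["projects"]
                       if "security" in p["pname"].lower()]}
    for e in employees
]
```

The nested collection is built explicitly rather than flattened, matching the functional style the text describes.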
GROUP BY and GROUP AS
Introduction to GROUP BY ... GROUP AS:
This feature extends SQL’s GROUP BY functionality, allowing groups (and their contents) to be directly accessible in the SELECT and HAVING clauses.
It is more efficient and intuitive for creating nested results compared to traditional SQL, especially when the output nesting doesn’t directly align with the input data structure.
How It Works:
Generalization: Unlike SQL, which limits access to grouped data in GROUP BY, SQL++ allows accessing the full group details as part of the query.
Pipeline Model: SQL++ processes queries in a step-by-step fashion, starting with FROM, followed by optional clauses like WHERE, GROUP BY, HAVING, and ending with SELECT.
Example:
In the query from Listing 12, employees are grouped by their project names (converted to lowercase), and a nested list of employees for each project is created.
The GROUP BY LOWER(p) AS p GROUP AS g clause groups data and stores each group in g.
The SELECT clause then extracts project names and employees.
Result:
The output (shown in Listing 13) contains nested objects:
Each object has a proj_name (e.g., 'OLTP Security') and an employees field listing the names of employees associated with that project.
Details of GROUP BY ... GROUP AS:
The clause produces bindings like the ones in Listing 14, where each group (g) includes all the data for its corresponding key (p).
The result allows users to flexibly access and format the grouped data.
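The group bindings produced by GROUP BY LOWER(p) AS p GROUP AS g can be emulated with a dictionary of groups, where each group keeps its full binding tuples (sample bindings invented for illustration):

```python
# Emulate GROUP BY LOWER(p) AS p GROUP AS g over (employee, project) bindings.
bindings = [
    {"e": {"name": "Ann"}, "p": "OLTP Security"},
    {"e": {"name": "Bob"}, "p": "oltp security"},
    {"e": {"name": "Cal"}, "p": "Search"},
]

groups = {}
for b in bindings:
    key = b["p"].lower()                  # grouping key: LOWER(p) AS p
    groups.setdefault(key, []).append(b)  # g holds every binding in the group

# SELECT then extracts the key and formats the group contents freely.
result = [
    {"proj_name": p, "employees": [b["e"]["name"] for b in g]}
    for p, g in sorted(groups.items())
]
```

Because `g` carries the whole group, the SELECT step can shape the nested output however it likes, which is the generalization over SQL the text describes.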
SQL++ Flexibility:
SQL++ allows placing the SELECT clause either at the start or the end of a query block, enhancing readability and flexibility.
This approach is more consistent with functional programming and reduces constraints found in traditional SQL.
Advanced Features:
SQL++ supports additional analytical tools like CUBE, ROLLUP, and GROUPING SETS, making it highly compatible with SQL but better suited for nested and semi-structured data.
Aggregate Functions
Limitations of Traditional SQL Aggregate Functions:
Aggregate functions like AVG and MAX in traditional SQL lack composability.
They work directly on table columns but don’t easily integrate with more complex expressions or subqueries.
SQL++ Solution:
SQL++ introduces composable aggregate functions, such as COLL_AVG (for calculating the average of a collection) and COLL_MAX.
These functions take a collection as input and return the aggregated value.
Importance of Composability:
In SQL++, data is conceptually materialized into a collection first, then passed to the composable aggregate function.
While this materialization is conceptual, SQL++ engines optimize the execution (e.g., using pipelined aggregation).
Example 1: Calculating the Average Salary of Engineers:
SQL++ Core Query (Listing 16): Converts e.salary into a collection and applies the COLL_AVG function.
SQL++ clearly defines the flow of data, making it more intuitive and flexible.
Example 2: Calculating the Average Salary of Engineers by Department:
SQL Query (Listing 17): Uses GROUP BY and AVG.
SQL++ Core Query (Listing 18):
Uses GROUP BY ... GROUP AS to form groups.
Feeds each group into COLL_AVG to calculate the average salary.
Constructs the result using the SELECT VALUE clause, explicitly specifying the output format.
Flexibility of SQL++ Style:
SQL++ allows the SELECT clause to be written at the end of a query block, consistent with functional programming styles.
This enhances readability and composability while maintaining compatibility with SQL.
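The composable-aggregate idea, materialize a collection and then apply the function to it, can be sketched like this; `coll_avg` is an illustrative stand-in for SQL++'s COLL_AVG, and the data is invented:

```python
def coll_avg(coll):
    """Composable aggregate: takes any collection of numbers, returns avg."""
    coll = list(coll)
    return sum(coll) / len(coll) if coll else None

employees = [
    {"name": "Ann", "dept": "Eng", "salary": 100},
    {"name": "Bob", "dept": "Eng", "salary": 200},
    {"name": "Cal", "dept": "Ops", "salary": 90},
]

# Ungrouped: build the collection of salaries, then aggregate it.
avg_all = coll_avg(e["salary"] for e in employees)

# Grouped: GROUP AS-style groups, each fed to the same composable function.
groups = {}
for e in employees:
    groups.setdefault(e["dept"], []).append(e)
by_dept = {d: coll_avg(e["salary"] for e in g) for d, g in groups.items()}
```

The same function works on any collection, grouped or not, which is exactly the composability that column-bound AVG lacks; a real engine would pipeline this instead of materializing.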
Pivoting and Unpivoting
UNPIVOT: Transforming Attributes into Rows
What is Unpivoting?
Unpivoting is the process of converting attribute names (used as keys) into data rows.
This is useful for cases where key-value pairs in the data need to be analyzed as individual rows.
Example (Listing 19-21):
Input: A closing_prices collection where stock symbols (amzn, goog, fb) are attributes with prices as values.
Query (Listing 20): The UNPIVOT clause transforms these attributes into rows with fields for symbol and price.
Output (Listing 21): A flattened structure where each row contains the date, stock symbol, and price.
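The attribute-to-row transformation of UNPIVOT can be emulated directly; the stock symbols follow the text's example, and the `symbol`/`price` field names come from its description:

```python
# UNPIVOT: turn attribute names (stock symbols) into data rows.
closing_prices = [
    {"date": "4/1/2019", "amzn": 1900.10, "goog": 1120.50, "fb": 165.20},
]

rows = [
    {"date": cp["date"], "symbol": sym, "price": price}
    for cp in closing_prices
    for sym, price in cp.items()
    if sym != "date"              # every non-date attribute becomes a row
]
```

Each key-value pair becomes its own row, giving the flattened structure the output listing describes.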
Pivoting
Purpose of Pivoting:
Pivoting transforms rows into attributes (columns).
Example from Listings 23-25:
Input (Listing 23): Rows of today_stock_prices where each stock symbol and its price are separate rows.
Query (Listing 24): The PIVOT operation turns these rows into a single object, using sp.symbol as attribute names and sp.price as their values.
Output (Listing 25): A tuple where each stock symbol (amzn, goog, fb) is an attribute, and their corresponding prices are the values.
Combining Grouping and Pivoting
Using Pivot with Grouping:
Combining GROUP BY and PIVOT enables aggregation of grouped rows into a more structured output.
This is particularly useful when working with time-series data or hierarchical datasets.
Example Query (Listing 26):
Input: Data from stock_prices (Listing 27), which includes stock prices for multiple dates as individual rows.
Query:
Groups the data by date using GROUP BY sp.date.
Pivots the grouped rows to produce a nested structure where each date contains all its stock prices as attributes.
Output (Listing 28): For each date, an object with a prices field lists the stock symbols as attributes and their respective prices as values.
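Grouping by date and then pivoting each group back into an attribute-per-symbol object can be sketched as follows (invented sample rows; the `prices` field name follows the text):

```python
# PIVOT after GROUP BY: rows of (date, symbol, price) become one object per
# date whose prices field maps each symbol to its price.
stock_prices = [
    {"date": "4/1/2019", "symbol": "amzn", "price": 1900.10},
    {"date": "4/1/2019", "symbol": "goog", "price": 1120.50},
    {"date": "4/2/2019", "symbol": "amzn", "price": 1910.00},
]

groups = {}
for sp in stock_prices:
    groups.setdefault(sp["date"], []).append(sp)   # GROUP BY sp.date

result = [
    {"date": d, "prices": {sp["symbol"]: sp["price"] for sp in g}}  # PIVOT
    for d, g in sorted(groups.items())
]
```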
Questions
SQL++ identifies aggregate functions as an SQL violation of functional composability. Give an example of an aggregate function and describe how it violates SQL’s functional composability.
Aggregate Function: AVG (SQL++'s composable counterpart is COLL_AVG)
Violation Explanation:
In traditional SQL, an aggregate function like AVG is applied directly to a table column inside a SELECT/GROUP BY block and returns a single value; it cannot take an arbitrary collection-valued expression as input, so neither its input nor its output composes freely with other expressions and subqueries.
In SQL++, this issue is resolved by providing composable versions of aggregate functions, such as COLL_AVG, which operate on collections, allowing intermediate results to flow naturally into the aggregation.
With SQL++, what is the difference between NULL and MISSING?
NULL: Indicates that an attribute exists but has no value.
MISSING: Indicates that an attribute is completely absent in the data.
True or false: One must define a schema for data prior to using SQL++.
False:
SQL++ supports schema-optional and schema-less data formats, such as JSON.
While schemas can improve query optimization and validation, SQL++ can process data without requiring predefined schemas, making it highly flexible for semi-structured data use cases.
How does the I lease prevent a thundering herd?
The I lease (Inhibit Lease) prevents a thundering herd problem by ensuring that only one read session at a time is allowed to query the RDBMS for a missing key-value pair in the Key-Value Store (KVS). Here’s how it works:
Thundering Herd Problem:
When a key-value pair is not found in the KVS (a KVS miss), multiple read sessions might simultaneously query the RDBMS to fetch the value.
This can overload the RDBMS and degrade performance under high concurrency.
Role of the I Lease:
When the first read session encounters a KVS miss, it requests an I lease for the key.
Once the I lease is granted, the KVS prevents other read sessions from querying the RDBMS for the same key.
All other read sessions must “back off” and wait for the value to be updated in the KVS by the session holding the I lease.
Result:
The session with the I lease queries the RDBMS, retrieves the value, and populates the KVS.
Subsequent read sessions observe a KVS hit and do not need to access the RDBMS.
This mechanism avoids simultaneous RDBMS queries, effectively solving the thundering herd problem.
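The back-off behavior can be sketched with a single-threaded simulation: only the first miss acquires the I lease and queries the backing store, and everyone else backs off until the value appears. This is a toy model of the protocol, not the real KVS API:

```python
kvs, leases = {}, set()
rdbms_queries = 0                     # counts how often the RDBMS is hit

def start_read(key):
    """First phase of a read: hit, or miss with/without the I lease."""
    if key in kvs:
        return ("hit", kvs[key])
    if key in leases:
        return ("backoff", None)      # someone else holds the I lease: wait
    leases.add(key)                   # this session is granted the I lease
    return ("miss", None)

def finish_read(key):
    """Lease holder queries the RDBMS, fills the KVS, releases the lease."""
    global rdbms_queries
    rdbms_queries += 1                # the one and only RDBMS query
    kvs[key] = "value-for-" + key
    leases.discard(key)
    return kvs[key]

first = start_read("k1")                      # miss: this session gets the lease
herd = [start_read("k1") for _ in range(9)]   # the herd is told to back off
value = finish_read("k1")                     # one RDBMS query fills the KVS
retry = start_read("k1")                      # back-off sessions now hit
```

Ten read attempts produce exactly one RDBMS query, which is the point of the I lease.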
What is the difference between invalidate and refresh/refill for maintaining the cache consistent with the database management system?
Invalidate: Deletes stale cache entries to prevent incorrect reads, but at the cost of forcing subsequent queries to access the RDBMS.
Refresh/Refill: Proactively updates the cache with new data, ensuring consistent reads while reducing future load on the RDBMS at the expense of immediate computation.
Describe how CAMP inserts a key-value pair in the cache.
Check Cache Capacity
If there is enough memory to store the new key-value pair:
The pair is inserted directly into the appropriate priority group based on its cost-to-size ratio.
The clock value L is not updated.
If the cache is full:
CAMP selects one or more key-value pairs to evict based on their H(p) values.
It removes the pair(s) with the lowest H(p) values until there is sufficient space for the new pair.
Insert the New Pair
The new key-value pair p is added to the cache, and its H(p) value is computed and recorded.
The pair is placed in the appropriate priority queue based on its cost-to-size ratio.
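A simplified sketch of this insertion path follows; H(p) is taken as the plain cost-to-size ratio added to the clock L, whereas real CAMP rounds priorities to bound the number of queues, so treat this as a toy model:

```python
# Toy CAMP-style cache: evict lowest H(p) = L + cost/size until the new
# pair fits; the clock L advances to the evicted pair's priority, and is
# left untouched when the pair fits without eviction.
class CampCache:
    def __init__(self, capacity):
        self.capacity, self.used, self.L = capacity, 0, 0
        self.entries = {}            # key -> (value, size, priority)

    def insert(self, key, value, size, cost):
        while self.used + size > self.capacity and self.entries:
            victim = min(self.entries, key=lambda k: self.entries[k][2])
            self.L = self.entries[victim][2]   # clock jumps to evicted H(p)
            self.used -= self.entries[victim][1]
            del self.entries[victim]
        priority = self.L + cost / size        # H(p) for the new pair
        self.entries[key] = (value, size, priority)
        self.used += size

cache = CampCache(capacity=10)
cache.insert("a", "va", size=4, cost=8)    # H = 0 + 2.0, no eviction
cache.insert("b", "vb", size=4, cost=40)   # H = 0 + 10.0, no eviction
cache.insert("c", "vc", size=4, cost=12)   # full: evicts "a" (lowest H)
```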
How does BG compute the SoAR of a database management system?
Define the SLA.
Run a series of experiments with increasing numbers of threads (T) to find the peak throughput while ensuring SLA compliance.
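The search for SoAR can be sketched as: increase the thread count, keep only SLA-compliant runs, and report the highest observed throughput. The `run_experiment` function here is a made-up stand-in for actually running BG against the data store:

```python
def run_experiment(threads):
    """Stand-in for a BG run: returns (throughput, fraction meeting the SLA).
    The numbers are synthetic: throughput saturates and the SLA degrades."""
    throughput = min(threads * 100, 1200) - max(0, threads - 12) * 50
    pct_ok = 1.0 if threads <= 12 else 0.9
    return throughput, pct_ok

def soar(sla_pct=0.95, max_threads=32):
    best = 0
    for t in range(1, max_threads + 1):
        tput, ok = run_experiment(t)
        if ok >= sla_pct:                 # only SLA-compliant runs count
            best = max(best, tput)
    return best

rating = soar()
```

SoAR is the highest action throughput the system sustains while still honoring the SLA; past saturation, throughput may still be high but the runs no longer qualify.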
In other words, if we write UPDATE account SET cash = cash - 9000 WHERE user = pUser; directly, InnoDB takes an X lock on the row matching user = pUser, guaranteeing that no other transaction can modify it at the same time. In many business flows, however, we cannot "just update": we usually read the balance first (SELECT), apply business logic (is the balance sufficient, do we need to interact with an external system, and so on), and only then issue the UPDATE. If that read is a plain SELECT … without a locking clause (FOR UPDATE), the database takes no row-level exclusive lock, so another transaction may read the same old balance at the same time; each transaction then computes its own result and issues its own UPDATE, and the later write overwrites the earlier one. That is a genuine lost update.
To avoid lost updates, the transaction's operations in this situation must be serialized rather than run in parallel. That is, in step 1) of the four steps above, place an exclusive X lock on the record the user reads; likewise, step 2) must also take an exclusive X lock. This way, step 2) must wait for steps 1) and 3) to complete before step 4) can finally finish.
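The read-decide-write race and its fix can be sketched with two concurrent withdrawals; the Python lock here stands in for the row-level X lock that SELECT ... FOR UPDATE would take, and the account data is invented:

```python
import threading

balance = {"cash": 10000}
successes = []
row_lock = threading.Lock()          # stands in for the row's X lock

def withdraw(amount):
    # Holding the lock across read, check, and write serializes the two
    # transactions, like SELECT ... FOR UPDATE followed by UPDATE.
    with row_lock:
        current = balance["cash"]                 # locking read
        if current >= amount:                     # business check
            balance["cash"] = current - amount    # UPDATE
            successes.append(amount)

threads = [threading.Thread(target=withdraw, args=(9000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With the lock, exactly one withdrawal of 9000 succeeds and the balance ends at 1000. Without it, both sessions could read 10000, both pass the check, and both record a 9000 withdrawal while the balance drops only once: the lost update.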
Locks in the InnoDB Storage Engine
Lock Types
By compatibility, locks fall into:
Shared lock (S Lock): allows a transaction to read a row of data.
Exclusive lock (X Lock): allows a transaction to delete or update a row of data.
If transaction T1 already holds a shared lock on row r, another transaction T2 can immediately obtain a shared lock on r as well, since reading does not change the row's data; this situation is called lock compatibility (Lock Compatible). But if some other transaction T3 wants an exclusive lock on r, it must wait until T1 and T2 release their shared locks on r; this situation is called lock incompatibility.
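These compatibility rules can be written down as a small matrix; the IS/IX intention locks used in the multi-granularity hierarchy are included as well (a sketch of the standard matrix, with illustrative helper names):

```python
# Lock compatibility for InnoDB's table-level lock modes: a requested lock
# is granted only if it is compatible with every lock already held.
COMPAT = {
    ("S",  "S"): True,  ("S",  "X"): False, ("S",  "IS"): True,  ("S",  "IX"): False,
    ("X",  "S"): False, ("X",  "X"): False, ("X",  "IS"): False, ("X",  "IX"): False,
    ("IS", "S"): True,  ("IS", "X"): False, ("IS", "IS"): True,  ("IS", "IX"): True,
    ("IX", "S"): False, ("IX", "X"): False, ("IX", "IS"): True,  ("IX", "IX"): True,
}

def can_grant(requested, held):
    """Grant the requested mode only if compatible with all held modes."""
    return all(COMPAT[(requested, h)] for h in held)

ok_s = can_grant("S", ["S"])        # T2's S lock alongside T1's S: compatible
ok_x = can_grant("X", ["S", "S"])   # T3's X lock must wait: incompatible
ok_ix = can_grant("IX", ["S"])      # IX vs an existing S table lock: waits
```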
If the objects being locked are viewed as a tree, then locking the lowest-level object, i.e. the finest-grained object, requires first locking the coarser-grained objects above it. As shown in the figure below, to take an X lock on record r in a page, intention locks (IX) must first be placed on database A, the table, and the page, and only then can the X lock be taken on record r. If any of these steps has to wait, the whole operation waits for the coarse-grained lock to complete. For example, suppose that before record r is X-locked, another transaction has already taken an S table lock on that table; when our transaction then tries to place IX on the table, the IX request is incompatible with the existing S lock, so the transaction must wait for the table lock to be released.
SELECT r.trx_id AS waiting_trx_id,
       r.trx_mysql_thread_id AS waiting_thread,
       r.trx_query AS waiting_query,
       b.trx_id AS blocking_trx_id,
       b.trx_mysql_thread_id AS blocking_thread,
       b.trx_query AS blocking_query
FROM information_schema.innodb_lock_waits w
INNER JOIN information_schema.innodb_trx b
    ON b.trx_id = w.blocking_trx_id
INNER JOIN information_schema.innodb_trx r
    ON r.trx_id = w.requesting_trx_id\G
*************************** 1. row ***************************
 waiting_trx_id: 73122F
 waiting_thread: 471719
  waiting_query: NULL
blocking_trx_id: 7311FC
blocking_thread: 471718
 blocking_query: NULL
1 row in set (0.00 sec)
CREATE TABLE z ( a INT, b INT, PRIMARY KEY(a), KEY(b) );
Column b of table z is a secondary index. Suppose the following SQL statements are executed in session A:
INSERT INTO z SELECT 1, 1;
INSERT INTO z SELECT 3, 1;
INSERT INTO z SELECT 5, 3;
INSERT INTO z SELECT 7, 6;
INSERT INTO z SELECT 10, 8;
SELECT * FROM z WHERE b = 3 FOR UPDATE;
Clearly, this SQL statement queries through index column b, so it is locked with the conventional Next-Key Locking technique, and because two indexes are involved, each must be locked separately. On the clustered index, only a Record Lock is placed on the entry where column a equals 5. On the secondary index, a Next-Key Lock is taken, locking the range (1, 3]. Note in particular that InnoDB additionally places a gap lock on the next key value in the secondary index, i.e. a further lock covering the range (3, 6). Therefore, each of the following SQL statements executed in a new session B will block:
SELECT * FROM z WHERE a = 5 LOCK IN SHARE MODE;
INSERT INTO z SELECT 4, 2;
INSERT INTO z SELECT 6, 5;
By contrast, the following inserts do not fall into the locked ranges and execute immediately:
INSERT INTO z SELECT 8, 6;
INSERT INTO z SELECT 2, 0;
INSERT INTO z SELECT 6, 7;
From the example above we can see that the purpose of a Gap Lock is to prevent multiple transactions from inserting records into the same range, which would otherwise cause the Phantom Problem. In the example, the user in session A has already locked the record with b = 3. If the range (3, 6) were not covered by a Gap Lock, another user could insert a record with index column b equal to 3, and re-running the same query in session A would then return a different set of records, i.e. the Phantom Problem would occur.
Gap Locks can be explicitly disabled in either of two ways:
Set the transaction isolation level to READ COMMITTED
Set the parameter innodb_locks_unsafe_for_binlog to 1
Under these settings, Gap Locks are used only where foreign-key constraints and uniqueness checks still require them; all other cases are locked with Record Locks alone. Keep in mind, however, that these settings weaken transaction isolation, and for replication they may leave the source and replicas inconsistent. Performance-wise, READ COMMITTED is also no better than the default isolation level, REPEATABLE READ.
For an INSERT, InnoDB checks whether the record immediately after the insert position is locked; if it is, the insert is not allowed to proceed. In the example above, session A has already locked the record with b = 3 in table z, i.e. the range (1, 3] is locked, so the following insert in another session will also block:
INSERT INTO z SELECT 2, 2;
This is because inserting the value 2 into secondary index column b detects that the next record, 3, is already locked. Changing the insert to the following values lets it execute immediately:
INSERT INTO z SELECT 2, 0;
Note that for unique key values, the demotion of Next-Key Lock to Record Lock applies only when the query looks up all columns of the unique index. If a unique index consists of multiple columns and the query searches on only some of them, the query is actually a range query rather than a point query, so InnoDB still uses Next-Key Locking.
If a program runs serially, deadlock cannot occur. Deadlock exists only under concurrency, and a database is by nature a concurrently running program, so deadlocks can happen. The sequence below demonstrates a classic deadlock: A waits for B while B waits for A, known as an AB-BA deadlock.
Time 1  Session A: BEGIN;
Time 2  Session A: SELECT * FROM t WHERE a = 1 FOR UPDATE;  -- returns a: 1 (1 row in set (0.00 sec))
        Session B: BEGIN;
Time 3  Session B: SELECT * FROM t WHERE a = 2 FOR UPDATE;  -- returns a: 2 (1 row in set (0.00 sec))
Time 4  Session A: SELECT * FROM t WHERE a = 2 FOR UPDATE;  -- waits
Time 5  Session B: SELECT * FROM t WHERE a = 1 FOR UPDATE;
        -- ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
In the sequence above, the transaction in session B throws error 1213, indicating a deadlock: the resources held by sessions A and B are waiting on each other. InnoDB can detect most deadlocks itself without human intervention. Note, however, that after session B's transaction throws the deadlock exception, session A immediately obtains the lock on record 2; this is because session B's transaction was rolled back, otherwise session A could never have obtained that resource. As mentioned earlier, InnoDB does not roll back on most errors, but deadlock is the exception: upon detecting a deadlock, InnoDB immediately rolls back one of the transactions. Consequently, if an application catches error 1213, it does not need to issue an additional rollback.
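Since InnoDB has already rolled the victim back, the application's only job on error 1213 is to retry the whole transaction. A typical retry loop can be sketched as follows; DeadlockError and the ops callback are invented stand-ins for the driver's exception and the transaction body:

```python
class DeadlockError(Exception):
    """Stands in for MySQL error 1213 surfaced by the driver."""

attempts = []

def run_transaction(ops):
    """On deadlock, the victim was already rolled back by InnoDB, so we do
    not issue another ROLLBACK; we simply rerun the whole transaction."""
    for attempt in range(1, 4):
        attempts.append(attempt)
        try:
            return ops(attempt)
        except DeadlockError:
            continue                 # retry; no extra rollback needed
    raise RuntimeError("gave up after repeated deadlocks")

def ops(attempt):
    if attempt == 1:
        raise DeadlockError()        # first try loses a deadlock
    return "committed"

result = run_transaction(ops)
```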
There is another kind of deadlock: the current transaction holds an X lock on the record that follows the one it wants to insert, while an S-lock request for that record is already sitting in the wait queue. Consider an example. First create test table t with the following code and load some data:
CREATE TABLE t ( a INT PRIMARY KEY ) ENGINE=InnoDB;
INSERT INTO t VALUES (1), (2), (4), (5);
The following shows how a deadlock arises when two sessions (session A and session B) execute concurrently:
Time 1  Session A: BEGIN;
Time 2  Session B: BEGIN;
Time 3  Session A: SELECT * FROM t WHERE a = 4 FOR UPDATE;
Time 4  Session B: SELECT * FROM t WHERE a <= 4 LOCK IN SHARE MODE;  -- waits
Time 5  Session A: INSERT INTO t VALUES (3);
        -- ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
Time 6  Session B: -- acquires its locks and continues normally
Transaction A at time 3:
Executes SELECT * FROM t WHERE a = 4 FOR UPDATE;
This statement places an X lock on the record with primary key a = 4. From this point, transaction A holds the X lock on record (a=4) and keeps holding it (the transaction has neither committed nor rolled back).
Transaction B at time 4:
Executes SELECT * FROM t WHERE a <= 4 LOCK IN SHARE MODE;
Because the condition a <= 4 matches the three records with primary key values 1, 2, and 4, transaction B needs S locks over that index range. Its S-lock request on record (a=4) is blocked because transaction A already holds an X lock on that row, so transaction B enters the wait queue.
However, transaction B has already taken S locks over part of the range a <= 4 and is now waiting for the S lock on (a=4). In other words, the index now carries:
Transaction A: holds an X lock on a=4, and still needs an insert-intention X lock on the gap (2, 4) in order to insert a=3.
Transaction B: has requested S locks on the gap (2, 4) (more precisely, on the range a <= 4), covering (a=2), (a=4), and the gap between them.
Why does this form a deadlock?
Transaction B is waiting for the S lock on (a = 4). When transaction A tries to insert a=3 at time 5, InnoDB must first take an X lock on the insert position for (a = 3) (a Gap Lock or Next-Key Lock). Because transaction B's S-lock request over the range a <= 4 (including the gap (2, 4)) is already queued, at this point:
transaction A must wait to acquire the gap lock, which is blocked by the S-lock request that transaction B has queued on that range;
at the same time, transaction B must wait for the S lock on (a = 4) to become available, but (a = 4) is X-locked by transaction A;
which forms a circular wait:
transaction A waits for transaction B's queued S lock on the gap (2, 4) to clear (because transaction B is queued on the critical range),
transaction B waits for transaction A to release the X lock on (a = 4).
Since the two wait on each other and neither can make progress, InnoDB declares a deadlock.
How does InnoDB choose which transaction to roll back?
In the common AB-BA deadlock (session A waits on a resource held by session B while session B waits on a resource held by session A), InnoDB rolls back whichever of the involved transactions has generated the least undo log, so as to minimize the cost of the rollback.
The IQ framework is a solution designed for Cache-Augmented SQL (CASQL) systems, which combine relational databases (RDBMS) and key-value stores (KVS) to boost performance by caching database query results. However, CASQL systems often face challenges related to stale data and race conditions. The IQ framework ensures strong consistency while maintaining high performance.
Challenges in CASQL Systems
Stale Data in Cache:
Cached data in the KVS can become outdated if updates to the RDBMS are not properly synchronized.
For example, if a record in the database is modified, but the corresponding cache entry isn’t updated, subsequent reads might return incorrect values.
Concurrency Issues:
Multiple sessions accessing and modifying the same key in KVS concurrently can lead to inconsistent results.
Example:
One session updates a value while another session modifies it based on outdated data.
RDBMS and Cache Coordination:
While RDBMS ensures transactional consistency, KVS often lacks this capability, making it difficult to synchronize their states.
Key Features of the IQ Framework
Lease Mechanism: Inhibit (I) and Quarantine (Q):
I Lease (for reads):
Ensures that only one session can query the RDBMS for a cache miss and update the KVS.
Other sessions attempting to read the same key must “back off” and wait.
Q Lease (for writes):
Required for modifying, deleting, or incrementally updating keys in the KVS.
If an I lease exists, the Q lease invalidates it to ensure the write operation’s integrity.
The KVS ignores I’s write operation because this I lease is no longer valid.
Lease Expiry:
A lease for a key has a fixed life time and is granted to one KVS connection (thread) at a time.
Expired leases are automatically released, ensuring system availability.
The finite life time enables the KVS to release the lease and continue processing operations in the presence of node failures hosting the application.
Session-based Model:
The framework operates through sessions, similar to the two-phase locking protocol.
Leases can be acquired either before or during an RDBMS transaction, providing flexibility.
Snapshot isolation is a multi-version concurrency control mechanism commonly used in RDBMS to allow concurrent transactions to execute efficiently. It guarantees:
Consistent Snapshot: All reads in a transaction observe the same consistent state of the database, as it existed at the transaction’s start.
Conflict Detection: A transaction can only commit if its updates do not conflict with updates made by other transactions since its snapshot was taken.
The Problem
Snapshot isolation can cause a race condition between a write session (S1) and a read session (S2) when KVS is involved. The issue unfolds as follows:
Write Session (S1):
S1 modifies the RDBMS and triggers a delete operation in the KVS to invalidate outdated key-value pairs.
S1 commits the transaction after completing its changes in the RDBMS.
Read Session (S2):
S2 starts after S1’s delete operation in the KVS. It observes a KVS miss for a key-value pair because S1 has invalidated it.
S2 queries the RDBMS to recompute the key-value pair. However, because snapshot isolation allows S2 to read an older snapshot of the database, it retrieves outdated (stale) data.
S2 inserts this stale data back into the KVS before S1 commits its changes to the RDBMS.
Inconsistency:
After both sessions complete, the KVS contains a stale key-value pair inconsistent with the RDBMS, leading to incorrect results for future reads.
Solution
I Lease (Inhibit Lease):
Used by read sessions (e.g., S2).
When a read session observes a KVS miss, it requests an I lease for the key (k_j) from the KVS server.
The I lease allows the read session to query the RDBMS, compute a value, and insert the computed key-value pair into the KVS.
If a Q lease is already in place, the I lease is denied, and the read session is told to back off and retry later.
Q Lease (Quarantine Lease):
Used by write sessions (e.g., S1).
When a write session plans to invalidate a key in the KVS, it requests a Q lease for the key (k_j).
The Q lease prevents other sessions (including those holding I leases) from modifying or inserting the key in the KVS.
Multiple Q leases can be granted for the same key since deleting a key is idempotent (doesn’t create conflicts).
Optimization
The Problem
In the original scenario, write sessions (e.g., S1) immediately delete key-value pairs in the KVS as soon as they acquire a Q lease (e.g., Step 1.3 in Figure 3).
This can cause read sessions (e.g., S2) to encounter KVS misses, triggering redundant operations like querying the RDBMS, recalculating values, and reinserting them into the KVS.
The Proposed Optimization
Deferring Key Deletion Until Write Commit
Key Changes:
Instead of deleting the key immediately in Step 1.3, the write session (S1) holds the Q lease and defers the deletion until the write session commits (Step 1.5).
While S1 is mid-flight, the invalidated key-value pair remains in the KVS for other read sessions (S2) to observe.
Handling KVS Hits:
Read sessions like S2 that encounter a KVS hit consume the “stale” key-value pair, treating it as valid.
This is acceptable because S2’s actions can be serialized to occur before S1, which is still in progress and has not yet committed its RDBMS changes.
Handling Write Aborts:
If a write session (S1) encounters an exception and aborts, the Q lease is released without deleting the key.
The current key-value pair in the KVS remains valid and accessible to other sessions.
Implementation Details
Versioning Concept:
The optimization can be conceptualized as maintaining a temporary version of the key-value pair for use by all sessions except the one currently invalidating it (S1).
Once S1 commits, the temporary version is removed.
Abort Command:
If a write session (S1) aborts due to constraints or exceptions, an abort command releases all Q leases held by S1 without deleting the key-value pair.
Without this command, Q leases would expire naturally after a timeout, during which no other session could modify or access the key.
Re-Arrangement Window:
With this optimization, S2 and S1 can be re-arranged in a serializable schedule where S2 logically occurs before S1.
Without the optimization, the re-arrangement window shrinks to zero because S2 would have already queried the RDBMS for stale data, violating consistency.
Refresh and Incremental Update
Key Issues with Compare-and-Swap (CAS)
CAS Limitation:
CAS alone cannot ensure strong consistency. It provides atomic updates to a single key-value pair but does not coordinate these updates with RDBMS transactions.
Example (Figure 2):
KVS writes can occur either:
Prior to the RDBMS transaction, or
As part of the RDBMS transaction.
Problem: If the RDBMS transaction aborts, the KVS will retain the modified key-value pair, potentially exposing dirty reads to other sessions.
Figure 6 (Dirty Read Problem):
Write session S1 modifies a key-value pair in KVS.
S1’s transaction later aborts, but the intermediate KVS value is consumed by a read session S2 before the rollback, leading to inconsistencies.
Developer Responsibility:
Without additional mechanisms, developers must implement complex logic to restore KVS key-value pairs to their original values when RDBMS transactions abort.
Race Conditions with Incremental Updates (δ Operations)
Figure 7 (Snapshot Isolation with δ Operations):
Write session S1 updates the RDBMS and KVS using an incremental update (e.g., appending to a value).
Concurrently, read session S2 queries the RDBMS and overwrites the key-value pair in the KVS.
Result: The KVS reflects inconsistent state, as S2’s overwrite may invalidate S1’s incremental change.
Figure 8 (Reordering KVS Operations):
Delaying KVS updates until after the RDBMS transaction doesn’t solve the problem.
Example in Figure 8:
S1 appends a change to a value based on its RDBMS view.
S2 modifies the RDBMS during S1’s execution, which S1 unknowingly incorporates into its KVS update.
Problem: S2’s modifications are reflected twice in the KVS, introducing inconsistencies.
Solution
Key Concepts in the Solution
Q Leases for Write Sessions:
A Q lease must be obtained for each key-value pair that a session intends to update.
This prevents race conditions by locking the key-value pair until the session completes its operations.
Steps for Write Sessions:
Step 1: Obtain Q leases for the keys to be updated before committing the RDBMS transaction. This can happen:
Before starting the RDBMS transaction.
As part of the RDBMS transaction.
Step 2: Write the updated key-value pairs to the KVS after committing the RDBMS transaction.
Step 3: Release the Q leases once the KVS is updated.
Automatic Cleanup: If a Q lease expires, the KVS deletes the associated key-value pair to avoid stale data.
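The three steps above can be sketched as a client-side driver. The StubKVS/StubRDBMS classes and the qaread/sar method names are hypothetical stand-ins so the flow is runnable; they are not the paper’s actual API:

```python
# Illustrative client-side driver for the three write-session steps.
# StubKVS/StubRDBMS and their methods are assumptions for this sketch.

class AbortError(Exception):
    """Another session already holds a Q lease on the key."""

class StubKVS:
    def __init__(self):
        self.data, self.leased = {}, set()
    def qaread(self, key):                 # Step 1: quarantine the key
        if key in self.leased:
            raise AbortError(key)
        self.leased.add(key)
        return self.data.get(key)          # None models a KVS miss
    def sar(self, key, value):             # Steps 2-3: write, then release
        if value is not None:
            self.data[key] = value
        self.leased.discard(key)
    def release_all(self):
        self.leased.clear()

class StubRDBMS:
    def __init__(self):
        self.rows = {"balance": 100}
    def commit_update(self, delta):        # stands in for begin/update/commit
        self.rows["balance"] += delta

def write_session(kvs, db, key, delta, max_retries=3):
    for _ in range(max_retries):
        try:
            kvs.qaread(key)                # Step 1: acquire the Q lease
        except AbortError:
            kvs.release_all()              # roll back, back off, retry
            continue
        db.commit_update(delta)            # commit the RDBMS transaction
        kvs.sar(key, db.rows["balance"])   # Steps 2-3: write KVS, release lease
        return True
    return False

kvs, db = StubKVS(), StubRDBMS()
ok = write_session(kvs, db, "balance", delta=-25)
```

Note how the KVS write happens only after the RDBMS commit, so an aborted transaction never leaves a dirty value in the cache.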
Command Design for Write Operations:
QaRead (Quarantine-and-Read):
Acquires a Q lease on the referenced key and reads its value from the KVS.
If a Q lease for the same key is already held by another session, the requesting session receives an abort message; it must then roll back its RDBMS transaction, release all of its leases, back off, and retry later.
If no value exists in the KVS (a KVS miss), the application can:
Skip updating the key, or
Query the RDBMS, compute a new value, and insert it using SaR (below).
If QaRead encounters an I lease held by a read session, it invalidates the I lease to prevent race conditions.
SaR (Swap-and-Release):
Updates the value of a key in the KVS with the new value and releases the Q lease.
If the new value is null, the Q lease is simply released without updating the KVS.
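The server-side bookkeeping for QaRead and SaR can be sketched as follows. MiniKVS and its one-Q-lease-per-key, one-I-lease-per-key tables are simplifying assumptions (lease expiry is omitted); this is not the actual KVS implementation:

```python
class AbortError(Exception):
    """Conflicting Q lease: the requester must abort and retry."""

class MiniKVS:
    """Sketch of QaRead / SaR lease bookkeeping (illustrative only)."""
    def __init__(self):
        self.store = {}      # key -> value
        self.q_lease = {}    # key -> writer session holding the Q lease
        self.i_lease = {}    # key -> reader session holding an I lease

    def qaread(self, key, session):
        if key in self.q_lease and self.q_lease[key] != session:
            raise AbortError(key)        # another writer holds the Q lease
        self.i_lease.pop(key, None)      # invalidate any reader's I lease
        self.q_lease[key] = session      # quarantine the key
        return self.store.get(key)       # None represents a KVS miss

    def sar(self, key, value, session):
        assert self.q_lease.get(key) == session, "must hold the Q lease"
        if value is not None:            # a null value: release only
            self.store[key] = value      # swap in the new value
        del self.q_lease[key]            # ...and release the Q lease

kvs = MiniKVS()
kvs.store["k"] = "v1"
kvs.i_lease["k"] = "reader1"             # a read session holds an I lease

v = kvs.qaread("k", session="S1")        # S1 quarantines k, I lease dropped
try:
    kvs.qaread("k", session="S2")        # S2 conflicts with S1's Q lease
    conflict = False
except AbortError:
    conflict = True
kvs.sar("k", "v2", session="S1")         # after S1's RDBMS commit
```

The abort on conflict is what serializes concurrent writers on the same key.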
Handling Race Conditions
Q Leases for Concurrent Write Sessions:
If two write sessions request Q leases for the same key, the KVS resolves the conflict by:
Aborting one session.
Ensuring the aborted session retries later, serializing its updates after those of the session holding the Q lease.
This guarantees a valid serial schedule in the RDBMS and KVS.
Read Sessions and I Leases:
Read sessions use I leases to avoid race conditions when querying the KVS.
If a write session issues a QaRead that encounters an existing I lease, the I lease is invalidated to ensure the KVS reflects the latest updates from the RDBMS.
Integration with Two-Phase Locking
The Q lease mechanism resembles two-phase locking:
Growing Phase: The session acquires all necessary Q leases using QaRead before committing its RDBMS transaction.
Shrinking Phase: The session releases all Q leases using SaR after committing its RDBMS transaction.
Flexibility:
A session can issue QaRead commands either before starting the RDBMS transaction or as part of the transaction.
Key Concepts of Incremental Updates
Incremental Update Command: IQ-δ:
Purpose: Allows a write session to perform an incremental update, such as appending data to an existing key-value pair.
Syntax: IQ-δ(ki, δi)
ki: The key to be updated.
δi: The incremental change to apply (e.g., the value to append).
Similarities to QaRead:
Q Lease Requirement: Before issuing the IQ-δ command, the session must obtain a Q lease for the key ki to ensure exclusive access.
Abort on Conflict:
If another session already holds a Q lease on the same key (ki), the KVS returns an abort message.
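A minimal sketch of IQ-δ, assuming string append as the incremental operation and letting the command itself acquire the Q lease (one plausible reading of the protocol; the names are illustrative, not the real API):

```python
class AbortError(Exception):
    """Another session holds a Q lease on ki: abort and retry."""

class MiniKVS:
    """Sketch of IQ-δ(ki, δi): quarantine ki, then apply δi in place."""
    def __init__(self):
        self.store = {}    # key -> value
        self.q_lease = {}  # key -> session holding the Q lease

    def iq_delta(self, key, delta, session):
        holder = self.q_lease.get(key)
        if holder is not None and holder != session:
            raise AbortError(key)          # conflicting Q lease: abort
        self.q_lease[key] = session        # quarantine the key
        if key in self.store:
            self.store[key] += delta       # apply the increment (append)
        # On a KVS miss nothing is appended, but the key stays quarantined,
        # so concurrent writers serialize behind this session.

    def release(self, key, session):
        if self.q_lease.get(key) == session:
            del self.q_lease[key]          # after the RDBMS commit

kvs = MiniKVS()
kvs.store["feed:9"] = "[a]"
kvs.iq_delta("feed:9", "[b]", session="S1")   # S1 appends under a Q lease
try:
    kvs.iq_delta("feed:9", "[c]", session="S2")
    conflict = False
except AbortError:
    conflict = True                           # S2 must back off and retry
kvs.release("feed:9", session="S1")
```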
Why is it acceptable for invalidate to delete cache entries?
Consistency Assurance: The cache entry being invalidated represents stale data that is no longer consistent with the current state of the RDBMS. Deleting it prevents read sessions from accessing outdated information.
How is a lease different than a lock?
Lease: Has a fixed lifetime and expires automatically after a certain duration. This makes leases useful in distributed systems where failures or delays could otherwise cause indefinite blocking.
Lock: Typically remains active until explicitly released, which can lead to deadlocks or indefinite resource contention if not managed properly.
True or False: IQ leases require changes to the RDBMS software.
False:
IQ leases do not require changes to the RDBMS software.
Instead, they extend the functionality of the Key-Value Store (KVS) by introducing new lease-based commands (e.g., QaRead and SaR) to coordinate operations between the KVS and the RDBMS. This design leverages existing RDBMS features without altering its underlying implementation.
What factors does CAMP consider when selecting a victim?
CAMP considers the current value of L (the priority of the most recently evicted pair), the pair’s size, and its cost, combined as H(p) = L + cost(p) / size(p); the resident pair with the lowest H(p) is selected as the victim.
What is the definition of cost? Provide an example.
Computation Time: The time required to regenerate or recompute the data if it is evicted from memory.
Access Latency: The time it would take to fetch the data from disk or another slower storage tier.
Importance: The priority or weight assigned to the data based on how frequently or critically it is used.
Example: a key-value pair materialized by an RDBMS query that takes 100 msec to execute may be assigned a cost of 100 msec.
How does CAMP insert a key-value pair in memory?
When a new key-value pair p needs to be inserted into memory, CAMP performs the following steps:
1. Check Cache Capacity
If there is enough memory to store the new key-value pair:
The pair is inserted directly into the appropriate priority group based on its cost-to-size ratio.
L is not updated.
If the cache is full:
CAMP selects one or more key-value pairs to evict based on their H(p) values.
It removes the pair(s) with the lowest H(p) values until there is sufficient space for the new pair, updating L to the H(p) value of the last evicted pair.
2. Insert the New Pair
The new key-value pair p is added to the cache, and its H(p) value is computed and recorded.
The pair is placed in the appropriate priority queue based on its cost-to-size ratio.
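The insertion and eviction steps above can be sketched as follows, assuming H(p) = L + cost(p)/size(p) and a linear scan for the minimum-H victim. Real CAMP rounds the cost-to-size ratio and maintains multiple LRU queues to make these operations O(1); MiniCAMP is an illustrative simplification:

```python
# Toy sketch of CAMP insertion/eviction bookkeeping (not the real algorithm:
# the O(n) min-scan here replaces CAMP's rounded-ratio LRU queues).

class MiniCAMP:
    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0
        self.L = 0.0                 # priority of the most recent eviction
        self.items = {}              # key -> (size, cost, H)

    def insert(self, key, size, cost):
        # Evict lowest-H pairs until the new pair fits.
        while self.used + size > self.capacity and self.items:
            victim = min(self.items, key=lambda k: self.items[k][2])
            vsize, _, v_h = self.items.pop(victim)
            self.used -= vsize
            self.L = v_h             # advance L to the evicted priority
        h = self.L + cost / size     # H(p) = L + cost(p)/size(p)
        self.items[key] = (size, cost, h)
        self.used += size

cache = MiniCAMP(capacity=10)
cache.insert("a", size=4, cost=8)    # H = 0 + 8/4  = 2.0, fits directly
cache.insert("b", size=4, cost=4)    # H = 0 + 4/4  = 1.0, fits directly
cache.insert("c", size=4, cost=12)   # full: evicts "b" (lowest H), L becomes 1.0
```

After the third insert, "b" (the cheapest pair per byte) has been evicted and "c" receives priority H = 1.0 + 12/4 = 4.0.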
With BG, what is the definition of Service Level Agreement, SLA?
An SLA specifies a required performance and data-quality target over an experiment duration, e.g.: 95% of requests must observe a response time of 100 msec or faster, with at most 0.1% of requests observing unpredictable data, for 10 minutes.
Name one reason why a system may produce unpredictable data.
Eventual consistency; or race conditions in which multiple threads update the same data item concurrently.