Home CMS Planet MySQL

    PostHeaderIcon Newsfeeds

    Planet MySQL
    Planet MySQL - https://planet.mysql.com

    • Fun with Bugs #75 - On MySQL Bug Reports I am Subscribed to, Part XII
      From the lack of comments to my previous post it seems everything is clear with ERROR 1213 in different kinds and forks of MySQL. I may still write a post of two about MyRocks or TokuDB deadlocks one day, but let's get back to my main topic of MySQL bugs. Today I continue my series of posts about community bug reports I am subscribed to with a review of bugs reported in November, 2018, starting from the oldest and skipping those MySQL 8 regression ones I've already commented on. I also skip documentation bugs that should be a topic for a separate post one day (to give more illustration to these my statements).These are the most interesting bug reports from Community members in November 2018: Bug #93139 - "mysqldump temporary views missing definer". This bug reported by Nikolai Ikhalainen from Percona looks like a regression (that can appear in a bit unusual case of missing root user) in all versions starting from 5.6. There is no regression tag, surely. Also for some reason I do not see 8.0.x as affected version, while from the text it seems MySQL 8 is also affected. Bug #93165 - "Memory leak in sync_latch_meta_init() after mysqld shutdown detected by ASan". This bug was reported by Yura Sorokin from Percona, who also made important statement in his last comment (that I totally agree with):"In commit https://github.com/mysql/mysql-server/commit/e93e8db42d89154b37f63772ce68c1efda637609 you literally made 14 MTR test cases ignore ALL memory problems detected by ASan, not only those which you consider 'OK' when you terminate the process with the call to 'exit()'. In other words, new memory leaks introduced in FUTURE commits may not be detected because of those changes. Address Sanitizer is a very powerful tool and its coverage should be constantly extending rather than shrinking." Bug #93196 - "DD crashes on assert if ha_commit_trans() returns error". It seems Vlad Lesin from Percona spent notable time testing everything related to new MySQL 8 data dictionary (maybe while Percona worked on their Percona Server for MySQL 8.0 that should have MyRocks also supported, should be able to provide native partitioning and proper integration with data dictionary). See also his Bug #93250 - "the result of tc_log->commit() is ignored in trans_commit_stmt()". Bug #93241 - "Query against full text index with ORDER BY silently fails". Nice finding by Jonathan Balinski, with detailed test cases and comments added by Shane Bester. One more confirmation that FULLTEXT indexes in InnoDB are still problematic. Bug #93276 - "Crash when calling mysql_real_connect() in loop". Nice regression in C API (since 8.0.4!) noted by Reggie Burnett and still not fixed. Bug #93321 - "Assertion `rc == TYPE_OK' failed". The last but not the least, yet another debug assertion (and error in non-debug build) found in MySQL 8.0.13 by Roel Van de Paar from Percona. You already know where QA for MySQL happens to large extent, don't you? Bug #93361 - "memory/performance_schema/table_handles have memory leak!". It's currently in "Need Feedback" status and may end up as not a bug, but I've never seen 9G of memory used for just one Performance Schema table so far. It's impressive. Bug #93365 - "Query on performance_schema.data_locks causes replication issues". Probably the first case when it was proved that query to some Performance Schema table may block some important server activity. Nice finding by Daniël van Eeden. Bug #93395 - "ALTER USER succeeds on master but fails on slave." Yet another way to break replication was found by Jean-François Gagné. See also his Bug #93397 - "Replication does not start if restart MySQL after init without start slave." Bug #93423 - "binlog_row_image=full not always honored for binlog_format=MIXED". For some reason this bug (with a clear test case) reported by James Lawrie is still "Open". Bug #93430 - "Inconsistent output of SHOW STATUS LIKE 'Handler_read_key';". This weird inconsistency was found by Przemysław Skibiński from Percona. Thinking about the future of MySQL 8 somewhere in Greenwich... To summarize this review: I obviously pay a lot of attention to bug reports from Percona engineers. It seems memory problems detected by ASan in some MTR test cases are deliberately ignored instead of being properly fixed. There are still many surprises waiting for early adopters of MySQL 8.0 GA :)  That's all I have to say about specific MySQL bugs in 2018. Next "Fun with Bugs" post, if any, will appear only next year. I am already subscribed to 11 bugs reported in December 2018. Stay tuned!

    • Group Replication: A member in “RECOVERING” state is part of the primary partition
      If you are using MySQL InnoDB Cluster (Group Replication) with ProxySQL, you should be familiar with the 2 functions and 1 view required in SYS Schema that ProxySQL uses to see if a node is online, partitioned or not, and if it’s lagging or not (see link1 and link2). I received recently a very valuable contribution from Bruce DeFrang that fixes a bug in one of the function that were added to SYS. In fact, Bruce discovered that when a node was in RECOVERING state, it was not count in the Primary Partition. This could lead in having the only ONLINE Primary Master considered as being partitioned and therefore, ProxySQL won’t consider the node as a valid candidate for routing the queries to it. I already updated the original gist with these addition, so if you are linking it somewhere, you have now the fixed version. For the others, here is the file: addtion_to_sys_8.0.2.sql The same file is of course valid for all MySQL >= 8.0.2. In conclusion, thank you Bruce for considering MySQL Group Replication and thank you for sharing your comments with me and for contributing back your fix.

    • LSM math - size of search space for LSM tree configuration
      I have written before and will write again about using 3-tuples to explain the shape of an LSM tree. This makes it easier to explain the configurations supported today and configurations we might want to support tomorrow in addition to traditional tiered and leveled compaction. The summary is that n LSM tree has N levels labeled from L1 to Ln and Lmax is another name for L1. There is one 3-tuple per level and the components of the 3-tuple are (type, fanout, runs) for Lk (level k) where: type is Tiered or Leveled and explains compaction into that level fanout is the size of a sorted run in Lk relative to a sorted run from Lk-1, a real and >= 1 runs is the number of sorted runs in that level, an integer and >= 1 Given the above how many valid configurations exist for an LSM tree? There are additional constraints that can be imposed on the 3-tuple but I will ignore most of them except for limiting fanout and runs to be <= 20. The answer is easy - there are an infinite number of configurations because fanout is a real. The question is more interesting when fanout is limited to an integer and the number of levels is limited to between 1 and 10. I am doing this to explain the size of the search space but I don't think that fanout should be limited to an integer. There are approximately 2^11 configurations only considering compaction type, which has 2 values, and 1 to 10 levels because there are 2^N configurations of compaction types for a tree with N levels and the sum of 2^1 + 2^2 + ... + 2^9 + 2^10 = 2^11 - 1 But when type, fanout and runs are considered then there are 2 x 20 x 20 = 800 choices per level and 800^N combinations for an LSM tree with N levels. Considering LSM trees with 1 to 10 levels then the number of valid configurations is the sum 800^1 + 800^2 + ... + 800^9 + 800^10. That is a large number of configurations if exhaustive search were to be used to find the best configuration. Note that I don't think exhaustive search should be used.

    • Replicating data into Clickhouse
      Clickhouse is a relatively new analytics and datawarehouse engine that provides for very quick insertion and analysing of data. Like most analytics platforms it’s built on a column-oriented storage basis and unlike many alternatives is completely open source. It’s also exceedingly fast, even on relatively modest platforms. Clickhouse does have some differences from some other environments, for example, data inserted cannot easily be updated, and it supports a number of different storage and table engine formats that are used to store and index the information. So how do we get into that from our MySQL transactional store? Well, you can do dumps and loads, or you could use Tungsten Replicator to do that for you. The techniques I’m going to describe here are not in an active release, but use the same principles as other part of our data loading. We’re going to use the CSV-based batch loading system that is employed by our Hadoop, Vertica and Amazon Redshift appliers to get the data in. Ordinarily we would run a materialization step that would merge and update the data from the staging tables, which import the raw change information and turn that into ‘base’ or carbon copy tables. We can’t do that with Clickhouse as the data cannot be modified once imported, but we can still use the information that gets imported. If you are familiar with the way we load data in this method, you will know that we import information using a CSV file and each row of the file is either an INSERT or DELETE, with an UPDATE operation being simulated by a DELETE followed by an INSERT. All rows are also tagged with date, time, and transaction ID information, we can always identify the latest update. Finally, one other thing to note about the Clickhouse environment, and that’s the data types are defined slightly differently. In most databases we are familiar with INT, or LONG or VARCHAR. Within Clickhouse the datatypes you use within the database for table fields more closely match the types in C, so Int32 or Int64. That means creating a simple table uses a definition like this: CREATE TABLE sales.stage_xxx_msg ( tungsten_opcode String, tungsten_seqno Int32, tungsten_row_id Int32, tungsten_commit_timestamp String, id Int32, msg String ) ENGINE = Log; You can also see we dont have a timestamp datatype, or CHAR/VARCHAR, just String. With all that in mind, let’s try loading some data into Clickhouse using Tungsten Replicator! First, a basic MySQL extraction recipe: tools/tpm configure alpha \ --disable-relay-logs=true \ --enable-heterogeneous-service=true \ --install-directory=/opt/continuent \ --master=ubuntuheterosrc \ --mysql-allow-intensive-checks=true \ --replication-password=Tamsin \ --replication-user=root \ --skip-validation-check=MySQLMyISAMCheck We’re going to use a fairly standard replicator install, extracting from a basic MySQL 5.7 server and insert the change data into Clickhouse. For the Clickhouse side, we’ll use the batch applier with a different, custom, template: tools/tpm configure alpha \ --batch-enabled=true \ --batch-load-template=clickhouse \ --datasource-mysql-conf=/dev/null \ --datasource-type=file \ --install-directory=/opt/continuent \ --master=ubuntuheterosrc \ --members=clickhouse2 \ --property=replicator.datasource.global.csvType=vertica \ --replication-password=password \ --replication-port=8123 \ --replication-user=tungsten \ --skip-validation-check=InstallerMasterSlaveCheck \ --start-and-report=true That’s it! We make one other change from other installations, in that because we cannot update information in Clickhouse, rather than using Clickhouse to store the Replicator status information, we’ll use the File datasource type, which stores the information within a file on the local filesystem. To generate this information I’ll generate about 18,000 transactions of data which is a mixture of INSERT, DELETE and UPDATE operations, we’ll load this into MySQL in tandem across 20 threads. Let’s run the load and check clickhouse: clickhouse2 :) select * from stage_xxx_msg limit 10; SELECT * FROM stage_xxx_msg LIMIT 10 ┌─tungsten_opcode─┬─tungsten_seqno─┬─tungsten_row_id─┬─tungsten_commit_timestamp─┬─id─┬─msg──────────────────┐ │ I │ 15 │ 1 │ 2018-12-12 09:48:17.000 │ 9 │ 4qwciTQiKdSrZKCwflf1 │ │ I │ 16 │ 2 │ 2018-12-12 09:48:17.000 │ 10 │ Qorw8T10xLwt7R0h7PsD │ │ I │ 17 │ 3 │ 2018-12-12 09:48:17.000 │ 11 │ hx2QIasJGShory3Xv907 │ │ I │ 19 │ 1 │ 2018-12-12 09:48:17.000 │ 12 │ oMxnT7RhLWpvQSGYtE6V │ │ I │ 20 │ 2 │ 2018-12-12 09:48:17.000 │ 13 │ fEuDvFWyanb1bV9Hq8iM │ │ I │ 23 │ 1 │ 2018-12-12 09:48:17.000 │ 14 │ oLVGsNjMPfWcxnRMkpKI │ │ I │ 25 │ 2 │ 2018-12-12 09:48:17.000 │ 15 │ w3rYUrzxXjb3o9iTHtnS │ │ I │ 27 │ 3 │ 2018-12-12 09:48:17.000 │ 16 │ aDFjRpTOK6ruj3JaX2Na │ │ I │ 30 │ 4 │ 2018-12-12 09:48:17.000 │ 17 │ SXDxPemQ5YI33iT1MVoZ │ │ I │ 32 │ 5 │ 2018-12-12 09:48:17.000 │ 18 │ 8Ta8C0fjIMRYEfVZBZjE │ └─────────────────┴────────────────┴─────────────────┴───────────────────────────┴────┴──────────────────────┘ 10 rows in set. Elapsed: 0.005 sec. Analysing the overall times, I processed 358,980 transactions through MySQL and into Clickhouse using relatively modest virtual machines on my laptop and it took 538 seconds. That’s about 670 transactions a second. Bear in mind we’re comitting every 100 rows here, larger commit intervals would probably be quicker overall. This is using the default settings, and I know from past testing and imports that I can go much faster. I’d count that as a success! Bear in mind we’re also writing to separate databases and tables here, but with the adddbname filter and the modified applier we can insert all of that data into a single table so that if you are concentrating data into a single database/table combination you can do this in one step with Tungsten Replicator. As I said before, Clickhouse is not currently a supported target for the Replicator, but if you are interested please get in touch!

    • Some Notes on MariaDB system-versioned Tables
      As mentioned in a previous post, I gave a talk at Percona Live Europe 2018 about system-versioned tables. This is a new MariaDB 10.3 feature, which consists of preserving old versions of a table rows. Each version has two timestamps that indicate the start (INSERT,UPDATE) of the validity of that version, and its end (DELETE, UPDATE). As a result, the user is able to query these tables as they appear at a point in the past, or how data evolved in a certain time range. An alternative name for this feature is temporal table, and I will use it in the rest of this text. In this post, I want to talk a bit about temporal tables best practices. Some of the information that I will provide is not present in the documentation; while they are based on my experience and tests, there could be errors. My suggestions for good practices are also based on my experience and opinions, and I don’t consider them as universal truths. If you have different opinions, I hope that you will share them in the comments or as a separate blog post. Create temporal columns It is possible – but optional – to create the columns that contain the timestamps of rows. Since there is no special term for them, I call them temporal columns. MariaDB allows us to give them any name we like, so I like to use the names valid_from and valid_to, which seem to be some sort of de facto standard in data warehousing. Whichever names you decide to use, I advise you to use them for all your temporal columns and for nothing else, so that the meaning will be clear. Temporal columns are generated columns, meaning that their values are generated by MariaDB and cannot be modified by the user. They are also invisible columns, which means that they can only be read by mentioning them explicitly. In other words, the following query will not return those columns: SELECT * FROM temporal_table; Also, that query will only show current versions of the rows. In this way, if we make a table temporal, existing applications and queries will continue to work as before. But we can still read old versions and obtain timestamp with a query like this: SELECT *, valid_from, valid_to     FROM temporal_table FOR SYSTEM_TIME ALL     WHERE valid_from < NOW() - INTERVAL 1 MONTH; If we don’t create these columns, we will not be able to read the timestamps of current and old row versions. We will still be able to read data from a point in time or from a time range by using some special syntax. However, I believe that using the consolidated WHERE syntax is easier and more expressive than using some syntax sugar. Primary keys For performance reasons, InnoDB tables should always have a primary key, and normally it shouldn’t be updated. Temporal tables provide another reason to follow this golden rule – even on storage engines that are not organised by primary key, like MyISAM. The reason is easy to demonstrate with an example: SELECT id, valid_from, valid_to FROM t FOR SYSTEM_TIME ALL WHERE id IN (500, 501); +-----+----------------------------+----------------------------+ | id | valid_from | valid_to | +-----+----------------------------+----------------------------+ | 500 | 2018-12-09 12:22:45.000001 | 2018-12-09 12:23:03.000001 | | 501 | 2018-12-09 12:23:03.000001 | 2038-01-19 03:14:07.999999 | +-----+----------------------------+----------------------------+ What do these results mean? Maybe row 500 has been deleted and row 501 has been added. Or maybe row 500 has been modified, and its id became 501. The timestamps suggest that the latter hypothesis is more likely, but there is no way to know that for sure. That is why, in my opinion, we need to be able to assume that UPDATEs never touch primary key values. Indexes Currently, the documentation says nothing about how temporal columns are indexed. However, my conclusion is that the valid_to column is appended to UNIQUE indexes and the primary key. My opinion is based on the results of some EXPLAIN commands, like the following: EXPLAIN SELECT email, valid_to FROM customer ORDER BY email \G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: customer type: index possible_keys: NULL key: unq_email key_len: 59 ref: NULL rows: 4 Extra: Using where; Using index This means that the query only reads from a UNIQUE index, and not from table data – therefore, the index contains the email column. It is also able to use the index for sorting, which confirms that email is the first column (as expected). In this way, UNIQUE indexes don’t prevent the same value from appearing multiple times, but it will always be shown at different points in time. It can be a good idea to include valid_to or valid_from in some regular indexes, to optimize queries that use such columns for filtering results. Transaction-safe temporal tables Temporal columns contain timestamps that indicate when a row was INSERTed, UPDATEd, or DELETEd. So, when autocommit is not enabled, temporal columns don’t match the COMMIT time. For most use cases, this behaviour is desirable or at least acceptable. But there are cases when we want to only see committed data, to avoid data inconsistencies that were never seen by applications. To do so, we can create a history-precise temporal table. This only works with InnoDB – not with RocksDB or TokuDB, even if they support transactions. A history-precise temporal table doesn’t contain timestamps; instead, it contains the id’s of transactions that created and deleted each row version. If you know PostgreSQL, you are probably familiar with the xmin and xmax columns – it’s basically the same idea, except that in postgres at some point autovacuum will make old row versions disappear. Because of the similarity, for transaction-precise temporal tables, I like to call the temporal columns xmin and xmax. From this short description, the astute reader may already see a couple of problems with this approach: Temporal tables are based on transaction id’s or on timestamps, not both. There is no way to run a transaction-precise query to extract data that were present one hour ago. But think about it: even if it was possible, it would be at least problematic, because transactions are meant to be concurrent. Transaction id’s are written in the binary log, but such information is typically only accessible by DBAs. An analyst (someone who’s typically interested in temporal tables) has no access to transaction id’s. A partial workaround would be to query tables with columns like created_at and modified_at. We can run queries like this: SELECT created_at, xmin FROM some_table WHERE created_at >= '2018-05-05 16:00:00' ORDER BY created_at LIMIT 1; This will return the timestamp of the first row created since ‘2018-05-05 16:00:00’, as well as the id of the transaction which inserted it. While this approach could give us the information we need with a reasonable extra work, it’s possible that we don’t have such columns, or that rows are not inserted often enough in tables that have them. In this case, we can occasionally write in a table the current timestamp and the current transaction id. This should allow us to associate a transaction to the timestamp we are interested in. We cannot write all transaction id’s for performance reasons, so we can use two different approaches: Write the transaction id and the timestamp periodically, for example each minute. This will not create performance problems. On the other hand, we are arbitrarily deciding the granularity of our “log”. This could be acceptable or not. Write this information when certain events happen. For example when a product is purchased, or when a user changes their password. This will give us a very precise way to see the data as they appeared during critical events, but will not allow us to investigate with the same precision other types of events. Partitioning If we look at older implementations of temporary tables, in the world of proprietary databases (Db2, SQL Server, Oracle), they generally store historical data in a separate physical table or partition, sometimes called a history table. In MariaDB this doesn’t happen automatically or by default, leaving the choice to the user. However, it seems to me a good idea in the general case to create one or more partitions to store historical rows. The main reason is that, rarely, a query has to read both historical and current data, and reading only one partition is an interesting optimization. Excluding columns from versioning MariaDB allows us to exclude some columns from versioning. This means that if we update the values of those columns, we update the current row version in place rather than creating a new one. This is probably useful if a column is frequently updated and we don’t care about these changes. However, if we update more columns with one statement, and only a subset of them is excluded from versioning, a new row version is still created. All in all, the partial exclusion of some rows could be more confusing than useful in several cases. Replication 10.3 is a stable version, but it is still recent. Some of us adopt a new major version after some years, and we can even have reasons to stick with an old version. Furthermore, of course, many of us use MySQL, and MariaDB is not a drop-in replacement. But we can still enjoy temporal tables by adding a MariaDB 10.3 slave. I attached such a slave to older MariaDB versions, and to MySQL 5.6. In all tests, the feature behaved as expected. Initially, I was worried about replication lags. I assumed that, if replication lags, the slave applies the changes with a delay, and the timestamps in the tables are delayed accordingly. I am glad to say that I was wrong: the timestamps in temporal tables seem to match the ones in the binary log, so replication lags don’t affect their correctness. This is true both with row-based replication and with statement-based replication. A small caveat about temporal tables is that the version timestamps are only precise at second level. The fractional part should be ignored. You may have noticed this in the example at the beginning of this post. Backups For backups you will need to use mariabackup instead of xtrabackup. mysqldump can be used, not necessarily from a MariaDB distribution. However, it treats temporal tables as regular tables. It does not backup historical data. This is necessary because of a design choice: we cannot insert rows with timestamps in the past. This makes temporal tables much more reliable. Also, temporal tables are likely to be (or become) quite big, so a dump is probably not the best way to backup them. —Photo by Ashim D’Silva on Unsplash The post Some Notes on MariaDB system-versioned Tables appeared first on Percona Community Blog.