URL
date
AI summary
slug
status
tags
summary
type

背景

前一篇文章分析了一个线上使用MyBatis批量插入导致内存溢出的问题,具体的分析过程这里不再赘述,感兴趣的同学可以去阅读原文。其中的一个原因是MyBatis在根据XML文件里的MappedStatement动态生成SQL的过程中,会保留很多不必要的空格和换行,导致拼接出来的SQL字符串比实际的要大得多,也就会占用更多的内存,更容易引发内存溢出。我计算了一下前一篇文章中的SQL的有效字符比例只有12%,也就是说88%的字符都是无效的空格和换行。本篇文章的主要目的,就是看看如何把这些不必要的空格和换行给消除掉。

分析

引发上一篇文章OOM的就是下面这个MappedStatement
<insert id="batchInsertSelective" keyColumn="id" keyProperty="list.id" parameterType="map" useGeneratedKeys="true"> insert into `file_task` ( <foreach collection="selective" item="column" separator=","> ${column.escapedColumnName} </foreach> ) values <foreach collection="list" item="item" separator=","> ( <foreach collection="selective" item="column" separator=","> <if test="'task_name'.toString() == column.value"> #{item.taskName,jdbcType=VARCHAR} </if> <if test="'task_type'.toString() == column.value"> #{item.taskType,jdbcType=TINYINT} </if> <if test="'task_tag'.toString() == column.value"> #{item.taskTag,jdbcType=VARCHAR} </if> <if test="'task_param'.toString() == column.value"> #{item.taskParam,jdbcType=VARCHAR} </if> <if test="'task_attach_param'.toString() == column.value"> #{item.taskAttachParam,jdbcType=VARCHAR} </if> <if test="'task_file_name'.toString() == column.value"> #{item.taskFileName,jdbcType=VARCHAR} </if> <if test="'curr_row'.toString() == column.value"> #{item.currRow,jdbcType=INTEGER} </if> <if test="'total_row'.toString() == column.value"> #{item.totalRow,jdbcType=INTEGER} </if> <if test="'success_num'.toString() == column.value"> #{item.successNum,jdbcType=INTEGER} </if> <if test="'update_num'.toString() == column.value"> #{item.updateNum,jdbcType=INTEGER} </if> <if test="'fail_num'.toString() == column.value"> #{item.failNum,jdbcType=INTEGER} </if> <if test="'result_file_url'.toString() == column.value"> #{item.resultFileUrl,jdbcType=VARCHAR} </if> <if test="'operator_id'.toString() == column.value"> #{item.operatorId,jdbcType=BIGINT} </if> <if test="'operator_name'.toString() == column.value"> #{item.operatorName,jdbcType=VARCHAR} </if> <if test="'task_status'.toString() == column.value"> #{item.taskStatus,jdbcType=TINYINT} </if> <if test="'valid_fail_desc'.toString() == column.value"> #{item.validFailDesc,jdbcType=VARCHAR} </if> <if test="'done_tm'.toString() == column.value"> #{item.doneTm,jdbcType=TIMESTAMP} </if> <if test="'upd_tm'.toString() == column.value"> #{item.updTm,jdbcType=TIMESTAMP} </if> <if test="'ins_tm'.toString() == column.value"> #{item.insTm,jdbcType=TIMESTAMP} </if> <if test="'tenant_id'.toString() == column.value"> #{item.tenantId,jdbcType=BIGINT} </if> <if test="'tenant_code'.toString() == column.value"> #{item.tenantCode,jdbcType=TINYINT} </if> <if test="'req_uri'.toString() == column.value"> #{item.reqUri,jdbcType=VARCHAR} </if> <if test="'send_msg_tm'.toString() == column.value"> #{item.sendMsgTm,jdbcType=TIMESTAMP} </if> <if test="'suffix'.toString() == column.value"> #{item.suffix,jdbcType=VARCHAR} </if> <if test="'env'.toString() == column.value"> #{item.env,jdbcType=VARCHAR} </if> </foreach> ) </foreach> </insert>
这个batchInsertSelective是通过com.itfsw:mybatis-generator-plugins:1.3.5插件结合mybatis-generator生成出来的,是为了实现批量插入时支持指定字段。
这里用到了两层foreach标签,第一层是遍历列表元素,第二层是遍历所有字段。而在遍历字段时,每次只会匹配到一个if标签,其余的都不满足,但是注意,第二层foreach里不仅有if标签,还有大量的换行和空格。所以当字段数量越多的情况下,换行和空格也会越多,再叠加上外层还有对列表元素的遍历,这些无用字符又会被放大很多倍。
为了让你看得更明显,我在下图标注出了换行和空格的位置。并且在DEBUG窗口中展示了上面这个MappedStatement的第二层foreach被解析出来的SqlNodeSqlNode可以理解成MappedStatement 里SQL主体部分的结构化表达,比如if标签会被转换成IfNode,而不包含${} 占位符的字符串区域则会被转换成StaticTextSqlNode。下图里你可以看到,上面这个MappedStatement 解析出来的SqlNodeStaticTextSqlNodeIfNode是交替出现的,而这一层的StaticTextSqlNode全部都是换行和空格。
notion image
那么我们是不是在这个SqlNode的生成上做文章?跟踪了一下源码,发现MappedStatement是在创建SqlSessionFactory 的时候生成的:首先会去扫描mapperLocations下面的XML文件并解析
// org.mybatis.spring.SqlSessionFactoryBean#buildSqlSessionFactory protected SqlSessionFactory buildSqlSessionFactory() throws IOException { // 省略部分代码 if (!isEmpty(this.mapperLocations)) { // 这里每一个mapperLocation代表一个xml文件 for (Resource mapperLocation : this.mapperLocations) { XMLMapperBuilder xmlMapperBuilder = new XMLMapperBuilder(mapperLocation.getInputStream(), configuration, mapperLocation.toString(), configuration.getSqlFragments()); xmlMapperBuilder.parse(); } } return this.sqlSessionFactoryBuilder.build(configuration); }
public class XMLMapperBuilder extends BaseBuilder { public void parse() { if (!configuration.isResourceLoaded(resource)) { configurationElement(parser.evalNode("/mapper")); configuration.addLoadedResource(resource); bindMapperForNamespace(); } configuration.parsePendingResultMaps(false); configuration.parsePendingCacheRefs(false); configuration.parsePendingStatements(false); } private void configurationElement(XNode context) { try { String namespace = context.getStringAttribute("namespace"); if (namespace == null || namespace.isEmpty()) { throw new BuilderException("Mapper's namespace cannot be empty"); } builderAssistant.setCurrentNamespace(namespace); cacheRefElement(context.evalNode("cache-ref")); cacheElement(context.evalNode("cache")); parameterMapElement(context.evalNodes("/mapper/parameterMap")); resultMapElements(context.evalNodes("/mapper/resultMap")); sqlElement(context.evalNodes("/mapper/sql")); // 下面这个方法处理select/insert/update/delete标签,生成MappedStatement buildStatementFromContext(context.evalNodes("select|insert|update|delete")); } catch (Exception e) { throw new BuilderException("Error parsing Mapper XML. The XML location is '" + resource + "'. Cause: " + e, e); } } private void buildStatementFromContext(List<XNode> list, String requiredDatabaseId) { // 这里的每一个XNode就是一个select/insert/update/delete标签范围内的内容 for (XNode context : list) { final XMLStatementBuilder statementParser = new XMLStatementBuilder(configuration, builderAssistant, context, requiredDatabaseId); try { // 这里会解析标签,生成MappedStatement statementParser.parseStatementNode(); } catch (IncompleteElementException e) { configuration.addIncompleteStatement(statementParser); } } } }
最终每一个select/update/insert/delete标签都会被解析成一个结构化的MappedStatement,并通过全局唯一的id保存到一个Map里:
// org.apache.ibatis.builder.xml.XMLStatementBuilder#parseStatementNode public void parseStatementNode() { // 省略无关紧要的代码 SqlSource sqlSource = langDriver.createSqlSource(configuration, context, parameterTypeClass); builderAssistant.addMappedStatement(id, sqlSource, statementType, sqlCommandType, fetchSize, timeout, parameterMap, parameterTypeClass, resultMap, resultTypeClass, resultSetTypeEnum, flushCache, useCache, resultOrdered, keyGenerator, keyProperty, keyColumn, databaseId, langDriver, resultSets, dirtySelect); } // org.apache.ibatis.builder.MapperBuilderAssistant#addMappedStatement public MappedStatement addMappedStatement(String id, SqlSource sqlSource, StatementType statementType, SqlCommandType sqlCommandType, Integer fetchSize, Integer timeout, String parameterMap, Class<?> parameterType, String resultMap, Class<?> resultType, ResultSetType resultSetType, boolean flushCache, boolean useCache, boolean resultOrdered, KeyGenerator keyGenerator, String keyProperty, String keyColumn, String databaseId, LanguageDriver lang, String resultSets, boolean dirtySelect) { if (unresolvedCacheRef) { throw new IncompleteElementException("Cache-ref not yet resolved"); } id = applyCurrentNamespace(id, false); MappedStatement.Builder statementBuilder = new MappedStatement.Builder(configuration, id, sqlSource, sqlCommandType) .resource(resource).fetchSize(fetchSize).timeout(timeout).statementType(statementType) .keyGenerator(keyGenerator).keyProperty(keyProperty).keyColumn(keyColumn).databaseId(databaseId).lang(lang) .resultOrdered(resultOrdered).resultSets(resultSets) .resultMaps(getStatementResultMaps(resultMap, resultType, id)).resultSetType(resultSetType) .flushCacheRequired(flushCache).useCache(useCache).cache(currentCache).dirtySelect(dirtySelect); ParameterMap statementParameterMap = getStatementParameterMap(parameterMap, parameterType, id); if (statementParameterMap != null) { statementBuilder.parameterMap(statementParameterMap); } MappedStatement statement = statementBuilder.build(); configuration.addMappedStatement(statement); return statement; } // org.apache.ibatis.session.Configuration#addMappedStatement public void addMappedStatement(MappedStatement ms) { mappedStatements.put(ms.getId(), ms); }
MappedStatement里有一个很重要的属性SqlSourceSqlNode也是在创建SqlSource 的时候生成的,通过SqlSource我们可以获取SQL信息。因为MyBatis支持动态SQL的特性(比如ifchooseforeach标签),所以用SqlSource做了统一抽象。SqlSource有以下几种实现:
notion image
  • DynamicSqlSource:针对动态SQL和${}占位符的SQL,需要在Mapper调用时才能确定最终的SQL语句
  • RawSqlSource:针对#{}占位符的SQL,在解析XML配置时就能确定的SQL信息
  • ProviderSqlSource:针对@*Provider注解提供的SQL
  • StaticSqlSource:经过ProviderSqlSource、DynamicSqlSource或者RawSqlSource解析后得到的静态SQL信息,不含各种占位符,最多只会有
其中,会产生大量空格和换行的主要是DynamicSqlSource,特别是在搭配foreach标签的使用场景下,更会放大空格和换行的数量。下面的代码展示了MyBatis是如何解析XML里的动态标签生成SqlNode
public class XMLLanguageDriver implements LanguageDriver { @Override public SqlSource createSqlSource(Configuration configuration, XNode script, Class<?> parameterType) { XMLScriptBuilder builder = new XMLScriptBuilder(configuration, script, parameterType); return builder.parseScriptNode(); } }
public class XMLScriptBuilder extends BaseBuilder { public SqlSource parseScriptNode() { MixedSqlNode rootSqlNode = parseDynamicTags(context); SqlSource sqlSource; if (isDynamic) { sqlSource = new DynamicSqlSource(configuration, rootSqlNode); } else { sqlSource = new RawSqlSource(configuration, rootSqlNode, parameterType); } return sqlSource; } protected MixedSqlNode parseDynamicTags(XNode node) { List<SqlNode> contents = new ArrayList<>(); NodeList children = node.getNode().getChildNodes(); for (int i = 0; i < children.getLength(); i++) { XNode child = node.newXNode(children.item(i)); if (child.getNode().getNodeType() == Node.CDATA_SECTION_NODE || child.getNode().getNodeType() == Node.TEXT_NODE) { String data = child.getStringBody(""); TextSqlNode textSqlNode = new TextSqlNode(data); if (textSqlNode.isDynamic()) { contents.add(textSqlNode); isDynamic = true; } else { contents.add(new StaticTextSqlNode(data)); } } else if (child.getNode().getNodeType() == Node.ELEMENT_NODE) { // issue #628 String nodeName = child.getNode().getNodeName(); NodeHandler handler = nodeHandlerMap.get(nodeName); if (handler == null) { throw new BuilderException("Unknown element <" + nodeName + "> in SQL statement."); } handler.handleNode(child, contents); isDynamic = true; } } return new MixedSqlNode(contents); } }
上面parseDynamicTags其实是一段递归解析,普通的文本会被解析成StaticTextSqlNode,包含${} 占位符的文本会被解析成TextSqlNode,而其他的标签都会用各自对应的NodeHandler来解析,当然解析到最后也都会是StaticTextSqlNode/TextSqlNode,最终全部解析完会生成一个MixedSqlNode。可以看下图有一个直观的感受,不过下图忽略了空格和换行带来的StaticTextSqlNode
notion image
上述的这些过程都是在Spring容器启动创建SqlSessionFactory时进行的。而等到真正调用的时候,需要指定对应的MappedStatement 和调用参数,MyBatis会根据调用参数和MixedSqlNode 做判断和组装,最终产生一个BoundSql对象,包含完整的SQL以及参数信息:
public class DynamicSqlSource implements SqlSource { private final Configuration configuration; private final SqlNode rootSqlNode; public DynamicSqlSource(Configuration configuration, SqlNode rootSqlNode) { this.configuration = configuration; this.rootSqlNode = rootSqlNode; } @Override public BoundSql getBoundSql(Object parameterObject) { // 第一步,先根据解析好的SqlNode拼接生成带#{}占位符的sql DynamicContext context = new DynamicContext(configuration, parameterObject); rootSqlNode.apply(context); // 第二步,创建SqlSourceBuilder来把#{}占位符替换成PreparedStatement里的问号,并提取出对应占位参数的信息 SqlSourceBuilder sqlSourceParser = new SqlSourceBuilder(configuration); Class<?> parameterType = parameterObject == null ? Object.class : parameterObject.getClass(); // 这里返回的永远是StaticSqlSource,里面有个属性List<ParameterMapping>,用来保存参数信息 SqlSource sqlSource = sqlSourceParser.parse(context.getSql(), parameterType, context.getBindings()); // 第三步,根据执行参数构建出BoundSql BoundSql boundSql = sqlSource.getBoundSql(parameterObject); // 第四步,把上下文中的绑定参数复制到BoundSql里 context.getBindings().forEach(boundSql::setAdditionalParameter); return boundSql; } }

彻底消除不必要的空格和换行

至此,我们搞清楚了MyBatis MappedStatement 的构建过程以及执行过程。但是过程中的空白和换行我们还没有把它消灭掉。在探究源码的过程中,我发现了MyBatis自带了一个配置项:shrinkWhitespacesInSql,它就是用来去除SQL中的空白和换行的,由于是基于StringTokenizer实现的,默认的分隔符是 \\t\\n\\r\\f,所以这些字符都会被干掉
// org.apache.ibatis.builder.SqlSourceBuilder#parse public SqlSource parse(String originalSql, Class<?> parameterType, Map<String, Object> additionalParameters) { ParameterMappingTokenHandler handler = new ParameterMappingTokenHandler(configuration, parameterType, additionalParameters); GenericTokenParser parser = new GenericTokenParser("#{", "}", handler); String sql; if (configuration.isShrinkWhitespacesInSql()) { sql = parser.parse(removeExtraWhitespaces(originalSql)); } else { sql = parser.parse(originalSql); } return new StaticSqlSource(configuration, sql, handler.getParameterMappings()); } public static String removeExtraWhitespaces(String original) { StringTokenizer tokenizer = new StringTokenizer(original); StringBuilder builder = new StringBuilder(); boolean hasMoreTokens = tokenizer.hasMoreTokens(); while (hasMoreTokens) { builder.append(tokenizer.nextToken()); hasMoreTokens = tokenizer.hasMoreTokens(); if (hasMoreTokens) { builder.append(' '); } } return builder.toString(); }
看起来好像满足了我们的需求。但是我们的主要目的是为了减少内存的占用,防止内存溢出。这样能把我们前一篇文章提到的大对象都消除吗?我们来贴一下上一篇文章的大对象:
notion image
我们发现,shrinkWhitespacesInSql是在org.apache.ibatis.builder.SqlSourceBuilder.parse里,那么DynamicContext里的StringBuilder以及生成出来的originalSql,这两个对象的大小并没有减小,还是包含了大量的空格和换行。所以这个配置应该可以节约一半的内存,那另外一半有没有办法也节约掉呢?看起来这个必须要从SqlNode拼接SQL的地方开始入手。最终我们发现SqlNode分为两类:
  • 一类是StaticTextSqlNodeTextSqlNode,它们会拼接具体Sql
  • 另一类是其他SqlNode,基本只做逻辑,不拼接具体SQL,比如IfSqlNodeChooseSqlNodeForeachSqlNode也只拼了开闭字符,其他都还是通过第一类节点来拼接SQL
看起来通过重写了StaticTextSqlNode ,并且对其中的空格和换行做处理就可以达到我们想要的效果:
package org.apache.ibatis.scripting.xmltags; public class StaticTextSqlNode implements SqlNode { private final String text; public StaticTextSqlNode(String text) { this.text = text; } @Override public boolean apply(DynamicContext context) { String replaced = text.replaceAll("^\\\\n\\\\s+", "").replaceAll("[\\\\n\\\\s]+", " "); context.appendSql(replaced); return true; } }

MyBatis的SQL日志

为什么MyBatis打印SQL日志的时候并没有输出换行和空格?原来是采用了和shrinkWhitespacesInSql一样的处理:
// org.apache.ibatis.logging.jdbc.ConnectionLogger#invoke @Override public Object invoke(Object proxy, Method method, Object[] params) throws Throwable { try { if (Object.class.equals(method.getDeclaringClass())) { return method.invoke(this, params); } if ("prepareStatement".equals(method.getName()) || "prepareCall".equals(method.getName())) { if (isDebugEnabled()) { // 这里对sql进行了特殊处理 debug(" Preparing: " + removeExtraWhitespace((String) params[0]), true); } PreparedStatement stmt = (PreparedStatement) method.invoke(connection, params); return PreparedStatementLogger.newInstance(stmt, statementLog, queryStack); } if ("createStatement".equals(method.getName())) { Statement stmt = (Statement) method.invoke(connection, params); return StatementLogger.newInstance(stmt, statementLog, queryStack); } return method.invoke(connection, params); } catch (Throwable t) { throw ExceptionUtil.unwrapThrowable(t); } } protected String removeExtraWhitespace(String original) { return SqlSourceBuilder.removeExtraWhitespaces(original); } // org.apache.ibatis.builder.SqlSourceBuilder#removeExtraWhitespaces public static String removeExtraWhitespaces(String original) { StringTokenizer tokenizer = new StringTokenizer(original); StringBuilder builder = new StringBuilder(); boolean hasMoreTokens = tokenizer.hasMoreTokens(); while (hasMoreTokens) { builder.append(tokenizer.nextToken()); hasMoreTokens = tokenizer.hasMoreTokens(); if (hasMoreTokens) { builder.append(' '); } } return builder.toString(); }

重写MyBatis的SQL

之前有好几个场景都需要重写SQL,虽然都通过查资料实现了,但是不是很了解其原理,这里在研究了源码之后才明白了之前为什么要这么写。
场景主要是2个:
  1. 权限控制,需要根据当前人的权限拼接查询限制条件
  1. 敏感字段加解密,需要在查询时解密,新增/修改时加密
首先肯定要用到MyBatis的Plugin拦截Executor 里对应的方法,比如:
public interface Executor { int update(MappedStatement ms, Object parameter) throws SQLException; <E> List<E> query(MappedStatement ms, Object parameter, RowBounds rowBounds, ResultHandler resultHandler, CacheKey cacheKey, BoundSql boundSql) throws SQLException; <E> List<E> query(MappedStatement ms, Object parameter, RowBounds rowBounds, ResultHandler resultHandler) throws SQLException; }
但是你可以看到,这里不管是update还是query,接收的参数都是MappedStatement和执行参数。而根据前面的介绍,MappedStatement 是启动的时候就初始化好的,并且是静态的内容,所以肯定不能修改MappedStatement。那么只能复制一个新的MappedStatement,并替换里面的SQL ,让Executor在执行时拿到我们新创建的MappedStatement 。不过MappedStatement 并没有提供现成的复制方法,我们需要手动复制,并且其中的BoundSql 也需要复制一份,因为它里面的属性都是用final修饰的,无法修改
@Intercepts({@Signature(type = Executor.class, method = "query", args = {MappedStatement.class, Object.class, RowBounds.class, ResultHandler.class}),}) public class PermissionPlugin implements Interceptor { @Override public Object intercept(Invocation invocation) throws Throwable { Object[] args = invocation.getArgs(); MappedStatement mappedStatement = (MappedStatement) args[0]; BoundSql boundSql = mappedStatement.getBoundSql(args[1]); String sql = boundSql.getSql(); String newSql = SQLUtils .addCondition(sql, String.format("env in (%s)", StringUtils.join(envs, ',')), JdbcConstants.MYSQL); // 复制BoundSql并修改 BoundSql newBoundSql = new BoundSql(mappedStatement.getConfiguration(), newSql, boundSql.getParameterMappings(), boundSql.getParameterObject()); for (ParameterMapping parameterMapping : boundSql.getParameterMappings()) { String property = parameterMapping.getProperty(); boolean hasAdditionalParameter = boundSql.hasAdditionalParameter(property); if (hasAdditionalParameter) { newBoundSql.setAdditionalParameter(property, boundSql.getAdditionalParameter(property)); } } // copy原始MappedStatement的各项属性 MappedStatement.Builder builder = new MappedStatement.Builder( mappedStatement.getConfiguration(), mappedStatement.getId(), new WarpBoundSqlSqlSource(newBoundSql), mappedStatement.getSqlCommandType()); builder.cache(mappedStatement.getCache()).databaseId(mappedStatement.getDatabaseId()) .fetchSize(mappedStatement.getFetchSize()) .flushCacheRequired(mappedStatement.isFlushCacheRequired()) .keyColumn(StringUtils.join(mappedStatement.getKeyColumns(), ',')) .keyGenerator(mappedStatement.getKeyGenerator()) .keyProperty(StringUtils.join(mappedStatement.getKeyProperties(), ',')) .lang(mappedStatement.getLang()).parameterMap(mappedStatement.getParameterMap()) .resource(mappedStatement.getResource()).resultMaps(mappedStatement.getResultMaps()) .resultOrdered(mappedStatement.isResultOrdered()) .resultSets(StringUtils.join(mappedStatement.getResultSets(), ',')) .resultSetType(mappedStatement.getResultSetType()) .statementType(mappedStatement.getStatementType()) .timeout(mappedStatement.getTimeout()).useCache(mappedStatement.isUseCache()); MappedStatement newMappedStatement = builder.build(); // 将新生成的MappedStatement对象替换到参数列表中 args[0] = newMappedStatement; return invocation.proceed(); } } }

参考

  1. Mybatis—SqlSource与BoundSql
  1. 【手撕MyBatis源码】动态SQL全流程解析
  1. MyBatis之Mapper内部组成
  1. Mybatis源码解析:参数处理器是如何兼容这么多种类型的参数?
  1. Remove blank lines in SQL
    Updated May 18, 2020
惊!一条SQL竟有如此威力MyBatis批量插入导致内存溢出问题分析