
SchemaUtils Utility

mergeSchemas

mergeSchemas(
  tableSchema: StructType,
  dataSchema: StructType,
  allowImplicitConversions: Boolean = false,
  keepExistingType: Boolean = false,
  fixedTypeColumns: Set[String] = Set.empty): StructType

mergeSchemas...FIXME

mergeSchemas is used when:

Asserting Valid Column Names in NoMapping Mode

checkSchemaFieldNames(
  schema: StructType,
  columnMappingMode: DeltaColumnMappingMode): Unit

checkSchemaFieldNames is a no-op (it returns immediately) for every DeltaColumnMappingMode except NoMapping.

For NoMapping, checkSchemaFieldNames explodes the nested field names of the schema and asserts that every column name is valid. If validation fails, checkSchemaFieldNames throws a DeltaAnalysisException.
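The control flow described above can be sketched in plain Scala with no Spark or Delta Lake dependency. This is only an illustration, not the Delta Lake implementation: MappingMode, Field, and explode are hypothetical stand-ins for DeltaColumnMappingMode, StructType, and the field-name explosion step; the dot-separated path format is an assumption; and IllegalArgumentException stands in for DeltaAnalysisException. The set of illegal characters is the one documented for checkFieldNames.

```scala
object SchemaFieldNamesSketch {
  // Hypothetical stand-ins for DeltaColumnMappingMode values.
  sealed trait MappingMode
  case object NoMapping extends MappingMode
  case object NameMapping extends MappingMode

  // Hypothetical, simplified schema model: a field with optional nested children.
  final case class Field(name: String, children: Seq[Field] = Nil)

  // Illegal characters as documented for checkFieldNames (the first one is a space).
  private val invalidChars: Set[Char] = " ,;{}()\n\t=".toSet

  // Flattens nested fields into dot-separated paths (assumed format),
  // e.g. Seq("address", "address.zip").
  def explode(fields: Seq[Field], prefix: String = ""): Seq[String] =
    fields.flatMap { f =>
      val path = if (prefix.isEmpty) f.name else s"$prefix.${f.name}"
      path +: explode(f.children, path)
    }

  def checkSchemaFieldNames(schema: Seq[Field], mode: MappingMode): Unit =
    mode match {
      case NoMapping =>
        explode(schema).find(_.exists(invalidChars.contains)).foreach { name =>
          // Stand-in for the DeltaAnalysisException thrown by Delta Lake.
          throw new IllegalArgumentException(s"Invalid column name: '$name'")
        }
      case _ => () // every other mode: return without validating
    }
}
```

With this sketch, a schema containing a nested field named "zip code" is rejected under NoMapping and accepted under any other mode, because validation is skipped entirely there.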

checkSchemaFieldNames is used when:

Asserting Valid Column Names

checkFieldNames(
  names: Seq[String]): Unit

checkFieldNames throws an AnalysisException when any of the given names contains one of the following illegal characters (the first one is a space):

 ,;{}()\n\t=
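The check can be approximated in plain Scala as shown below. This is a minimal sketch under stated assumptions rather than the Delta Lake code: FieldNameCheckSketch and invalidNames are hypothetical names introduced for illustration, and IllegalArgumentException stands in for Spark's AnalysisException.

```scala
object FieldNameCheckSketch {
  // Characters listed above as illegal in column names (the first one is a space).
  val invalidChars: Set[Char] = " ,;{}()\n\t=".toSet

  // Hypothetical helper: returns the names that contain at least one illegal character.
  def invalidNames(names: Seq[String]): Seq[String] =
    names.filter(_.exists(invalidChars.contains))

  // Fails on the first offending name.
  // IllegalArgumentException is a stand-in for Spark's AnalysisException.
  def checkFieldNames(names: Seq[String]): Unit =
    names.find(_.exists(invalidChars.contains)).foreach { name =>
      throw new IllegalArgumentException(
        s"""Column name "$name" contains invalid character(s).""")
    }
}
```

Calling FieldNameCheckSketch.invalidNames(Seq("id", "first name", "a=b")) returns the two offending names, which can help when renaming columns ahead of a write.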

checkFieldNames is used when:

Demo

import org.apache.spark.sql.delta.schema.SchemaUtils
val colName = "\n"
SchemaUtils.checkFieldNames(Seq(colName))
org.apache.spark.sql.AnalysisException:  Column name " " contains invalid character(s). Please use alias to rename it.
  at org.apache.spark.sql.errors.QueryCompilationErrors$.columnNameContainsInvalidCharactersError(QueryCompilationErrors.scala:2102)
  at org.apache.spark.sql.delta.schema.SchemaUtils$.$anonfun$checkFieldNames$1(SchemaUtils.scala:908)
  at org.apache.spark.sql.delta.schema.SchemaUtils$.$anonfun$checkFieldNames$1$adapted(SchemaUtils.scala:905)
  at scala.collection.immutable.List.foreach(List.scala:431)
  at org.apache.spark.sql.delta.schema.SchemaUtils$.checkFieldNames(SchemaUtils.scala:905)
  ... 49 elided

findDependentGeneratedColumns

findDependentGeneratedColumns(
  sparkSession: SparkSession,
  targetColumn: Seq[String],
  protocol: Protocol,
  schema: StructType): Seq[StructField]

findDependentGeneratedColumns...FIXME

findDependentGeneratedColumns is used when:

findColumnPosition

findColumnPosition(
  column: Seq[String],
  schema: StructType,
  resolver: Resolver = DELTA_COL_RESOLVER): (Seq[Int], Int)

findColumnPosition...FIXME

findColumnPosition is used when: