SchemaUtils Utility¶
mergeSchemas¶
mergeSchemas(
  tableSchema: StructType,
  dataSchema: StructType,
  allowImplicitConversions: Boolean = false,
  keepExistingType: Boolean = false,
  fixedTypeColumns: Set[String] = Set.empty): StructType
mergeSchemas ...FIXME
mergeSchemas is used when:
- DeltaMergeInto utility is used to resolveReferencesAndSchema
- ParquetTable is requested to mergeSchemasInParallel and inferSchema
- ImplicitMetadataOperation is requested to update a metadata (and mergeSchema)
Asserting Valid Column Names in NoMapping Mode¶
checkSchemaFieldNames(
  schema: StructType,
  columnMappingMode: DeltaColumnMappingMode): Unit
checkSchemaFieldNames does nothing (and simply returns) for all the DeltaColumnMappingModes but NoMapping.
For NoMapping, checkSchemaFieldNames explodes the nested field names and asserts that the column names are valid. In case of a validation exception, checkSchemaFieldNames throws a DeltaAnalysisException.
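The mode-gated check described above can be sketched in plain Scala without Spark or Delta Lake on the classpath. Everything in the block is a hypothetical stand-in: `MappingMode` and its cases, the `Field` schema model, `explodeNames`, and the `IllegalArgumentException` (the real method throws a DeltaAnalysisException). Collecting bare field names rather than full paths is also an assumption, as is reusing the illegal-character list given for checkFieldNames.

```scala
// A minimal sketch of the described behavior, not Delta Lake's implementation.
object SchemaFieldNameCheckSketch {
  // Hypothetical stand-ins for the column mapping modes.
  sealed trait MappingMode
  case object NoMapping extends MappingMode
  case object IdMapping extends MappingMode
  case object NameMapping extends MappingMode

  // Hypothetical schema model: a field has a name and optional nested fields.
  final case class Field(name: String, children: Seq[Field] = Nil)

  // "Explodes" nested fields into a flat, depth-first list of field names.
  def explodeNames(fields: Seq[Field]): Seq[String] =
    fields.flatMap(f => f.name +: explodeNames(f.children))

  // Assumption: the same illegal characters that checkFieldNames rejects.
  private val illegalChars: Set[Char] = ",;{}()\n\t=".toSet

  // Validates names only in NoMapping mode; every other mode returns immediately.
  def checkSchemaFieldNames(schema: Seq[Field], mode: MappingMode): Unit =
    mode match {
      case NoMapping =>
        explodeNames(schema).find(_.exists(illegalChars.contains)).foreach { bad =>
          throw new IllegalArgumentException(s"Invalid column name: $bad")
        }
      case _ => ()
    }
}
```

With this sketch, a nested field named `b;c` fails the check under `NoMapping` but passes untouched under `NameMapping`, since the other modes skip validation entirely.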
checkSchemaFieldNames is used when:
- OptimisticTransactionImpl is requested to verify a new metadata
- AlterTableAddColumnsDeltaCommand and AlterTableReplaceColumnsDeltaCommand are executed
Asserting Valid Column Names¶
checkFieldNames(
  names: Seq[String]): Unit
checkFieldNames throws an AnalysisException when there is a column name (in names) with one of the illegal characters:
,;{}()\n\t=
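The rule above can be illustrated with a standalone sketch that runs without Spark or Delta Lake. It is not the library's code: `FieldNameCheckSketch` and `InvalidColumnName` are hypothetical names, the exception is a stand-in for the AnalysisException, and the character set is copied from the list above as printed.

```scala
// A minimal sketch of the described check, not Delta Lake's implementation.
object FieldNameCheckSketch {
  // Characters listed above as illegal in a column name.
  val illegalChars: Set[Char] = ",;{}()\n\t=".toSet

  // Hypothetical stand-in for the AnalysisException thrown by the real method.
  final case class InvalidColumnName(name: String)
    extends Exception(s"""Column name "$name" contains invalid character(s).""")

  // Throws on the first name that contains any illegal character;
  // returns normally when every name is clean.
  def checkFieldNames(names: Seq[String]): Unit =
    names.find(_.exists(illegalChars.contains)).foreach { bad =>
      throw InvalidColumnName(bad)
    }
}
```

Scanning with `find` stops at the first offending name, which mirrors the single-name error message shown in the demo.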
checkFieldNames is used when:
- OptimisticTransactionImpl is requested to verify a new metadata
- SchemaUtils is used to assert valid column names in NoMapping mode
Demo¶
import org.apache.spark.sql.delta.schema.SchemaUtils
val colName = "\n"
SchemaUtils.checkFieldNames(Seq(colName))
org.apache.spark.sql.AnalysisException: Column name " " contains invalid character(s). Please use alias to rename it.
at org.apache.spark.sql.errors.QueryCompilationErrors$.columnNameContainsInvalidCharactersError(QueryCompilationErrors.scala:2102)
at org.apache.spark.sql.delta.schema.SchemaUtils$.$anonfun$checkFieldNames$1(SchemaUtils.scala:908)
at org.apache.spark.sql.delta.schema.SchemaUtils$.$anonfun$checkFieldNames$1$adapted(SchemaUtils.scala:905)
at scala.collection.immutable.List.foreach(List.scala:431)
at org.apache.spark.sql.delta.schema.SchemaUtils$.checkFieldNames(SchemaUtils.scala:905)
... 49 elided
findDependentGeneratedColumns¶
findDependentGeneratedColumns(
  sparkSession: SparkSession,
  targetColumn: Seq[String],
  protocol: Protocol,
  schema: StructType): Seq[StructField]
findDependentGeneratedColumns ...FIXME
findDependentGeneratedColumns is used when:
- AlterDeltaTableCommand is requested to checkDependentExpressions
findColumnPosition¶
findColumnPosition(
  column: Seq[String],
  schema: StructType,
  resolver: Resolver = DELTA_COL_RESOLVER): (Seq[Int], Int)
findColumnPosition ...FIXME
findColumnPosition is used when: