SchemaUtils Utility¶
mergeSchemas¶
mergeSchemas(
tableSchema: StructType,
dataSchema: StructType,
allowImplicitConversions: Boolean = false,
keepExistingType: Boolean = false,
fixedTypeColumns: Set[String] = Set.empty): StructType
mergeSchemas...FIXME
mergeSchemas is used when:
- DeltaMergeInto utility is used to resolveReferencesAndSchema
- ParquetTable is requested to mergeSchemasInParallel and inferSchema
- ImplicitMetadataOperation is requested to update a metadata (and mergeSchema)
Asserting Valid Column Names in NoMapping Mode¶
checkSchemaFieldNames(
schema: StructType,
columnMappingMode: DeltaColumnMappingMode): Unit
checkSchemaFieldNames does nothing (and simply returns) for all the DeltaColumnMappingModes but NoMapping.
For NoMapping, checkSchemaFieldNames explodes the nested field names and asserts that column names are valid. In case of a validation exception, checkSchemaFieldNames throws a DeltaAnalysisException.
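To make the "explode the nested field names" step concrete, here is a minimal sketch that assumes Spark SQL's types are on the classpath (for example in spark-shell). The helper `flattenFieldNames` is hypothetical and is not Delta Lake's implementation. It only illustrates collecting every field name, including those inside structs, arrays and maps, so that each one can be validated. The illegal-character list is the one documented for checkFieldNames.

```scala
import org.apache.spark.sql.types._

// Hypothetical helper (an assumption for illustration, not Delta Lake's code):
// collects the name of every field in a schema, descending into nested
// structs, array elements, and map keys and values.
def flattenFieldNames(dt: DataType): Seq[String] = dt match {
  case s: StructType =>
    s.fields.toSeq.flatMap(f => f.name +: flattenFieldNames(f.dataType))
  case a: ArrayType => flattenFieldNames(a.elementType)
  case m: MapType   => flattenFieldNames(m.keyType) ++ flattenFieldNames(m.valueType)
  case _            => Seq.empty
}

// A schema with an invalid name hidden inside a nested struct
val schema = StructType(Seq(
  StructField("id", LongType),
  StructField("address", StructType(Seq(
    StructField("city", StringType),
    StructField("zip,code", StringType))))))

val names = flattenFieldNames(schema)

// Illegal characters as listed for checkFieldNames
val illegalChars: Set[Char] = ",;{}()\n\t=".toSet
val invalid = names.filter(_.exists(illegalChars))
```

With this schema, `names` holds the four field names in declaration order and `invalid` holds only the nested `zip,code`. The point is that a check limited to top-level names would miss it.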
checkSchemaFieldNames is used when:
- OptimisticTransactionImpl is requested to verify a new metadata
- AlterTableAddColumnsDeltaCommand and AlterTableReplaceColumnsDeltaCommand are executed
Asserting Valid Column Names¶
checkFieldNames(
names: Seq[String]): Unit
checkFieldNames throws an AnalysisException when there is a column name (in names) with one of the illegal characters:
,;{}()\n\t=
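The check can be approximated in plain Scala. The following is a sketch only, not the actual SchemaUtils code. `findInvalidNames` and `assertValidNames` are hypothetical names, and it throws `IllegalArgumentException` as a stand-in for Spark's `AnalysisException` so that it runs without Spark on the classpath.

```scala
// Characters treated as illegal, copied from the list above
val illegalChars: Set[Char] = ",;{}()\n\t=".toSet

// Hypothetical helper: returns the names that contain at least one illegal character
def findInvalidNames(names: Seq[String]): Seq[String] =
  names.filter(_.exists(illegalChars))

// Hypothetical helper: fails on the first offending name
// (IllegalArgumentException stands in for AnalysisException)
def assertValidNames(names: Seq[String]): Unit =
  findInvalidNames(names).headOption.foreach { name =>
    throw new IllegalArgumentException(
      s"Column name contains invalid character(s): $name")
  }

// Usage: valid names pass silently, "a=b" is rejected
assertValidNames(Seq("id", "ok_name"))
val rejected = scala.util.Try(assertValidNames(Seq("id", "a=b")))
```

Here `rejected` is a `Failure` that wraps the exception raised for `a=b`.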
checkFieldNames is used when:
- OptimisticTransactionImpl is requested to verify a new metadata
- SchemaUtils is used to assert valid column names in NoMapping mode
Demo¶
import org.apache.spark.sql.delta.schema.SchemaUtils
val colName = "\n"
SchemaUtils.checkFieldNames(Seq(colName))
org.apache.spark.sql.AnalysisException: Column name " " contains invalid character(s). Please use alias to rename it.
at org.apache.spark.sql.errors.QueryCompilationErrors$.columnNameContainsInvalidCharactersError(QueryCompilationErrors.scala:2102)
at org.apache.spark.sql.delta.schema.SchemaUtils$.$anonfun$checkFieldNames$1(SchemaUtils.scala:908)
at org.apache.spark.sql.delta.schema.SchemaUtils$.$anonfun$checkFieldNames$1$adapted(SchemaUtils.scala:905)
at scala.collection.immutable.List.foreach(List.scala:431)
at org.apache.spark.sql.delta.schema.SchemaUtils$.checkFieldNames(SchemaUtils.scala:905)
... 49 elided
findDependentGeneratedColumns¶
findDependentGeneratedColumns(
sparkSession: SparkSession,
targetColumn: Seq[String],
protocol: Protocol,
schema: StructType): Seq[StructField]
findDependentGeneratedColumns...FIXME
findDependentGeneratedColumns is used when:
- AlterDeltaTableCommand is requested to checkDependentExpressions
findColumnPosition¶
findColumnPosition(
column: Seq[String],
schema: StructType,
resolver: Resolver = DELTA_COL_RESOLVER): (Seq[Int], Int)
findColumnPosition...FIXME
findColumnPosition is used when: