Skip to content

CDCReaderImpl

getCDCRelation

getCDCRelation(
  spark: SparkSession,
  snapshotToUse: Snapshot,
  isTimeTravelQuery: Boolean,
  conf: SQLConf,
  options: CaseInsensitiveStringMap): BaseRelation

getCDCRelation...FIXME


getCDCRelation is used when:

changesToBatchDF

changesToBatchDF(
  deltaLog: DeltaLog,
  start: Long,
  end: Long,
  spark: SparkSession,
  readSchemaSnapshot: Option[Snapshot] = None,
  useCoarseGrainedCDC: Boolean = false): DataFrame

changesToBatchDF...FIXME


changesToBatchDF is used when:

changesToDF

changesToDF(
  readSchemaSnapshot: SnapshotDescriptor,
  start: Long,
  end: Long,
  changes: Iterator[(Long, Seq[Action])],
  spark: SparkSession,
  isStreaming: Boolean = false,
  useCoarseGrainedCDC: Boolean = false): CDCVersionDiffInfo

changesToDF getTimestampsByVersion.

changesToDF requests the DeltaLog (of the given SnapshotDescriptor) for the Snapshot at the given start.

changesToDF asserts that one of the following is enabled (or throws a DeltaAnalysisException):

useCoarseGrainedCDC flag is disabled by default

It is a fairly dangerous assertion given useCoarseGrainedCDC flag is disabled by default.

changesToDF...FIXME

In the end, changesToDF creates a new CDCVersionDiffInfo.


changesToDF is used when:

getDeletedAndAddedRows

getDeletedAndAddedRows(
  addFileSpecs: Seq[CDCDataSpec[AddFile]],
  removeFileSpecs: Seq[CDCDataSpec[RemoveFile]],
  deltaLog: DeltaLog,
  snapshot: SnapshotDescriptor,
  isStreaming: Boolean,
  spark: SparkSession): Seq[DataFrame]

getDeletedAndAddedRows...FIXME

processDeletionVectorActions

processDeletionVectorActions(
  addFilesMap: Map[FilePathWithTableVersion, AddFile],
  removeFilesMap: Map[FilePathWithTableVersion, RemoveFile],
  versionToCommitInfo: Map[Long, CommitInfo],
  deltaLog: DeltaLog,
  snapshot: SnapshotDescriptor,
  isStreaming: Boolean,
  spark: SparkSession): Seq[DataFrame]

processDeletionVectorActions...FIXME

generateFileActionsWithInlineDv

generateFileActionsWithInlineDv(
  add: AddFile,
  remove: RemoveFile,
  dvStore: DeletionVectorStore,
  deltaLog: DeltaLog): Seq[FileAction]

generateFileActionsWithInlineDv...FIXME

scanIndex

scanIndex(
  spark: SparkSession,
  index: TahoeFileIndexWithSnapshotDescriptor,
  isStreaming: Boolean = false): DataFrame

scanIndex...FIXME


scanIndex is used when: