HadoopFileLinesReader is a Scala Iterator of Apache Hadoop's org.apache.hadoop.io.Text.

HadoopFileLinesReader is created to access datasets in the following data sources:

  • SimpleTextSource
  • LibSVMFileFormat
  • TextInputCSVDataSource
  • TextInputJsonDataSource
  • TextFileFormat

HadoopFileLinesReader uses the internal iterator that handles accessing files using Hadoop's FileSystem API.

Creating Instance

HadoopFileLinesReader takes the following when created:

  • PartitionedFile
  • Hadoop's Configuration

=== [[iterator]] iterator Internal Property

[source, scala]
----
iterator: RecordReaderIterator[Text]
----

When created, HadoopFileLinesReader creates an internal iterator that uses Hadoop's org.apache.hadoop.mapreduce.lib.input.FileSplit, built from a Hadoop org.apache.hadoop.fs.Path and the input file.

iterator creates Hadoop's TaskAttemptID, TaskAttemptContextImpl and LineRecordReader.

iterator initializes LineRecordReader and passes it on to a RecordReaderIterator.
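The steps above can be sketched with plain Hadoop classes. This is a minimal sketch, not Spark's actual code: the object LinesReaderSketch and the method readLines are hypothetical names, the input is assumed to be a local file that is read in full, and the job and task IDs are placeholders. It assumes the Hadoop MapReduce client libraries are on the classpath.

```scala
import java.nio.file.Files

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.mapreduce.{JobID, TaskAttemptID, TaskID, TaskType}
import org.apache.hadoop.mapreduce.lib.input.{FileSplit, LineRecordReader}
import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl

// Hypothetical helper (not part of Spark): reads every line of a local file
// using the same Hadoop building blocks the text describes.
object LinesReaderSketch {
  def readLines(localPath: java.nio.file.Path, conf: Configuration): Seq[String] = {
    // A split that covers the whole file: offset 0, full length, no preferred hosts.
    val split = new FileSplit(
      new Path(localPath.toUri), 0L, Files.size(localPath), Array.empty[String])

    // Placeholder IDs; LineRecordReader only needs a task attempt context to get the configuration.
    val attemptId = new TaskAttemptID(new TaskID(new JobID(), TaskType.MAP, 0), 0)
    val context = new TaskAttemptContextImpl(conf, attemptId)

    val reader = new LineRecordReader()
    reader.initialize(split, context)
    try {
      val lines = Seq.newBuilder[String]
      while (reader.nextKeyValue()) {
        // Copy the value out, since the reader reuses the same Text instance.
        lines += reader.getCurrentValue.toString
      }
      lines.result()
    } finally {
      reader.close()
    }
  }
}
```

LineRecordReader reuses a single Text instance across calls, which is why the sketch converts each value to a String before advancing.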

NOTE: iterator is used for Iterator-specific methods, i.e. hasNext, next and close.
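How hasNext, next and close can delegate to a wrapped Hadoop RecordReader is sketched below. This is an illustration only: RecordReaderLines is a hypothetical class name, and the sketch makes no claim to match Spark's RecordReaderIterator line for line. It assumes the Hadoop MapReduce client libraries are on the classpath.

```scala
import java.io.Closeable

import org.apache.hadoop.mapreduce.RecordReader

// Hypothetical wrapper (not Spark's RecordReaderIterator): exposes the values
// of an already initialized RecordReader as a Scala Iterator.
class RecordReaderLines[T](private var reader: RecordReader[_, T])
    extends Iterator[T] with Closeable {

  private var havePair = false  // a value has been fetched but not yet returned
  private var finished = false  // the underlying reader is exhausted

  override def hasNext: Boolean = {
    if (!finished && !havePair) {
      finished = !reader.nextKeyValue()
      // Release the underlying reader as soon as the input is exhausted.
      if (finished) close()
      havePair = !finished
    }
    !finished
  }

  override def next(): T = {
    if (!hasNext) throw new NoSuchElementException("End of stream")
    havePair = false
    reader.getCurrentValue
  }

  // Safe to call more than once: the reader is closed only the first time.
  override def close(): Unit = {
    if (reader != null) {
      try reader.close() finally reader = null
    }
  }
}
```

Closing inside hasNext means a caller that drains the iterator does not have to call close explicitly, while an early exit still requires it.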