Evolution of the Data Lake

Book: Building the Data Lakehouse

Pages 5-18

Journey of data storage.

Data was stored in…

Paper Tape:

  • Pro: Automated
  • Limitation: Stored very little data. Then…

Punch Cards:

  • Pro: Stored more data than paper tape
  • Limitation: Fixed format. Large data sets required large amounts of paper, and a dropped stack of cards was painstaking to reorganize.

Magnetic Tape:

  • Pro: Stored larger volumes of data not in fixed format.
  • Limitation: The entire file had to be searched sequentially to find a single record.

Disk Storage:

  • Pro: Stored even larger volumes of data. A record could be accessed directly instead of by reading through the file sequentially.
  • Limitation: Initially very costly and not widely available.
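
To make the difference between tape and disk concrete, here is a minimal sketch of sequential versus direct access. The records, the function names, and the dictionary used as an index are all hypothetical stand-ins invented for illustration. They are not from the book, and a dictionary only approximates how a disk locates a record.

```python
# Hypothetical records, invented for illustration only.
records = [
    {"id": 101, "name": "Ada"},
    {"id": 102, "name": "Grace"},
    {"id": 103, "name": "Edsger"},
]


def find_sequential(tape, record_id):
    """Tape-style lookup: read records in order until a match is found."""
    reads = 0
    for record in tape:
        reads += 1
        if record["id"] == record_id:
            return record, reads
    return None, reads


# Disk-style lookup: an index maps each key straight to its record.
index = {record["id"]: record for record in records}


def find_direct(idx, record_id):
    """Disk-style lookup: go straight to the record through the index."""
    return idx.get(record_id)


tape_result, tape_reads = find_sequential(records, 103)
disk_result = find_direct(index, 103)
```

Both lookups return the same record, but the tape-style search has to read every record that comes before it, while the indexed lookup does not.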

Data Integrity

Disk storage made it possible to build computer applications. With many applications came the problem of data integrity. A lack of data integrity means there is no single source of truth: when several applications hold the same data, it is hard to tell which version is the current and correct one.
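
As a rough illustration of the integrity problem, the sketch below has two applications each keeping their own copy of one customer record. The application names, the record key, and the email values are all made up for this example.

```python
# Hypothetical application data stores; names and values are invented.
billing_app = {"customer_42": {"email": "old@example.com"}}
shipping_app = {"customer_42": {"email": "new@example.com"}}


def conflicting_fields(key, *stores):
    """Return the fields whose values differ between stores for one record."""
    copies = [store[key] for store in stores if key in store]
    all_fields = {field for copy in copies for field in copy}
    conflicts = {}
    for field in all_fields:
        values = {copy.get(field) for copy in copies}
        if len(values) > 1:
            conflicts[field] = values
    return conflicts


conflicts = conflicting_fields("customer_42", billing_app, shipping_app)
```

The function can report that the two copies disagree on the email, but nothing in the data says which copy is correct. That is the missing single source of truth.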

Data Warehouse

Then came the data warehouse. The data warehouse allowed application data to be copied to a single location for processing.

The data warehouse needed its own infrastructure to be useful.

The data warehouse also made it possible to store historical data going back more than a few months. Historical data came to be treated as intellectual property because businesses realized they could use the past to predict the future.

Data Warehouse Infrastructure

The infrastructure of a data warehouse includes:

  • Metadata – Data location guide
  • Data Model – Data abstractions
  • Data Lineage – Data Origins and transformations
  • Summarization – Data creation description
  • KPIs – Key performance indicators location
  • ETL – Automatic data transformations
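
A few of the pieces above can be sketched as a toy ETL step that also records lineage and computes a summary. This is a minimal sketch under stated assumptions, not how any real warehouse product works. The source rows, the `sales` table name, the `_lineage` key, and the function names are all hypothetical.

```python
from datetime import date

# Hypothetical rows from a source application; values are invented.
sales_app_rows = [
    {"order": "A1", "amount": "19.99", "day": "2021-03-01"},
    {"order": "A2", "amount": "5.01", "day": "2021-03-01"},
]


def extract(rows):
    """Extract: copy the rows out of the source application."""
    return list(rows)


def transform(rows):
    """Transform: rename fields and convert strings to proper types."""
    return [
        {
            "order_id": row["order"],
            "amount": float(row["amount"]),
            "order_date": date.fromisoformat(row["day"]),
        }
        for row in rows
    ]


def load(warehouse, table, rows, source):
    """Load: store the rows and note their origin as a lineage entry."""
    warehouse.setdefault(table, []).extend(rows)
    warehouse.setdefault("_lineage", []).append(
        {"table": table, "source": source, "rows": len(rows)}
    )


warehouse = {}
load(warehouse, "sales", transform(extract(sales_app_rows)), source="sales_app")

# Summarization: a total derived from the loaded rows.
sales_total = round(sum(row["amount"] for row in warehouse["sales"]), 2)
```

Here the lineage entry records where the `sales` rows came from, and `sales_total` stands in for a summarized value that a KPI could be built on.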

Limitations of Data Warehouse

The data warehouse was designed with structured data in mind. When unstructured data appeared, the limitations of the data warehouse were exposed.

Examples of unstructured data

  • Text data – Text can still be organized in a structured format, but it is difficult to analyze because language varies so much and because text makes no sense without context.
  • Analog data / IoT data – Data from machines and devices such as watches, phones, and cameras, for example measurements of various kinds.
  • Image data
  • Audio data
  • Video data

The last three types of unstructured data above have no form or structure, so they were not a good fit for the data warehouse.

