Evolution of the Data Lake
Book: Building the Data Lakehouse
Pages 5-18
Journey of data storage.
Data was stored in…
Paper Tape:
- Pro: Automated
- Limitation: Stored very little data.
Punch Cards:
- Pro: Stored more data than paper tape
- Limitation: Fixed format. Large volumes of data required large amounts of paper, and a dropped stack of cards was painstaking to put back in order.
Magnetic Tape:
- Pro: Stored larger volumes of data, and not in a fixed format.
- Limitation: The entire file had to be searched sequentially to find a single record.
Disk Storage:
- Pro: Stored even larger volumes of data. A record could be accessed directly rather than sequentially.
- Limitation: Initially very costly and not widely available.
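To make the difference between sequential access (magnetic tape) and direct access (disk) concrete, here is a minimal Python sketch. It is not from the book: the fixed-width record layout, RECORD_SIZE, and the function names are assumptions made up for illustration, and an in-memory buffer stands in for both kinds of storage.

```python
import io

# Hypothetical layout: every record is exactly 16 bytes ("000042:value0042").
RECORD_SIZE = 16

def make_file(n):
    """Build an in-memory 'file' of n fixed-width records keyed 0..n-1."""
    return b"".join(
        f"{i:06d}:value{i:04d}".encode().ljust(RECORD_SIZE) for i in range(n)
    )

def tape_lookup(tape, key):
    """Sequential access: read from the start until the key turns up."""
    tape.seek(0)
    reads = 0
    while True:
        record = tape.read(RECORD_SIZE)
        if not record:                      # reached the end without a match
            return None, reads
        reads += 1
        if int(record[:6].decode()) == key:
            return record.decode().strip(), reads

def disk_lookup(disk, key):
    """Direct access: jump straight to the record's byte offset."""
    disk.seek(key * RECORD_SIZE)
    record = disk.read(RECORD_SIZE)
    return (record.decode().strip(), 1) if record else (None, 1)

storage = io.BytesIO(make_file(1000))
print(tape_lookup(storage, 750))   # ('000750:value0750', 751)
print(disk_lookup(storage, 750))   # ('000750:value0750', 1)
```

The tape-style lookup has to read 751 records to reach record 750, while the disk-style lookup reads one. Real disks and file formats are more involved, but the cost difference has the same shape.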
Data Integrity
With disk storage came the possibility of building computer applications. With many applications came the problem of data integrity. A lack of data integrity means there is no single source of truth: when the same data lives in several applications, it is hard to tell which version is current and correct.
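As a small illustration of the integrity problem, the Python sketch below shows two applications that each keep their own copy of the same customer record. The application names, the record fields, and the conflicting_fields helper are all hypothetical and only serve the example.

```python
# Two hypothetical applications, each holding its own copy of customer "cust-1".
billing_app = {"cust-1": {"name": "Ada", "email": "ada@old.example"}}
shipping_app = {"cust-1": {"name": "Ada", "email": "ada@new.example"}}

def conflicting_fields(app_a, app_b, key):
    """Return the fields on which the two copies of a record disagree."""
    a, b = app_a[key], app_b[key]
    return sorted(field for field in a.keys() & b.keys() if a[field] != b[field])

print(conflicting_fields(billing_app, shipping_app, "cust-1"))  # ['email']
```

The code can detect that the copies disagree, but nothing in either application says which email is the correct one. That missing answer is the absence of a single source of truth.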
Data Warehouse
Then came the data warehouse. The data warehouse allowed application data to be copied to a single location for processing.
The data warehouse needed its own infrastructure to be useful.
The data warehouse also allowed historical data to be stored beyond a period of a few months. Historical data became valuable intellectual property because businesses realized they could use the past to predict the future.
Data Warehouse Infrastructure
The infrastructure of a data warehouse includes:
- Metadata – Data location guide
- Data Model – Data abstractions
- Data Lineage – Data Origins and transformations
- Summarization – A description of how the data was created
- KPIs – Where the key performance indicators are found
- ETL – Automatic data transformations
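A toy Python sketch can show how a few of these pieces fit together. Everything in it is an assumption for illustration: the two source applications, their field names, and the etl and daily_revenue functions are invented here, not taken from the book or from any real warehouse product.

```python
# Hypothetical rows from two separate applications with different formats.
orders_app = [
    {"order_id": 1, "amount_cents": 1250, "day": "2021-03-01"},
    {"order_id": 2, "amount_cents": 800, "day": "2021-03-01"},
]
store_app = [{"id": "S-9", "total": "4.50", "date": "2021-03-02"}]

def etl(orders, store_sales):
    """ETL: convert both sources to one common shape and record lineage."""
    warehouse = []
    for row in orders:
        warehouse.append({
            "sale_id": f"orders:{row['order_id']}",
            "amount": row["amount_cents"] / 100,
            "day": row["day"],
            "lineage": "orders_app: amount_cents / 100",
        })
    for row in store_sales:
        warehouse.append({
            "sale_id": f"store:{row['id']}",
            "amount": float(row["total"]),
            "day": row["date"],
            "lineage": "store_app: total parsed as a number",
        })
    return warehouse

def daily_revenue(warehouse):
    """Summarization: total sales per day, which can serve as a simple KPI."""
    totals = {}
    for row in warehouse:
        totals[row["day"]] = totals.get(row["day"], 0.0) + row["amount"]
    return totals

warehouse = etl(orders_app, store_app)
print(daily_revenue(warehouse))  # {'2021-03-01': 20.5, '2021-03-02': 4.5}
```

Here the common row shape plays the role of the data model, the lineage field records where each value came from and how it was transformed, and daily_revenue is a summarization that yields a KPI.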
Limitations of Data Warehouse
The data warehouse was designed with structured data in mind. When unstructured data appeared, the limitations of the data warehouse were exposed.
Examples of unstructured data:
- Text data – Although text can still be organized in a structured format, it is difficult to analyze because language varies widely and text makes no sense without context.
- Analog data / IoT data – Data from machines and devices such as watches, phones, and cameras, typically measurements of various kinds.
- Image data
- Audio data
- Video data
The last three types of unstructured data above have no form or structure, so they were not a good fit for the data warehouse.