
Summary Big Data

Course
- Big Data
- Catal
- 2019 - 2020
- Wageningen University (Wageningen University, Wageningen)
- Animal Sciences
96 Flashcards & Notes
PLEASE KNOW!!! There are just 96 flashcards and notes available for this material. This summary might not be complete. Please search similar or other summaries.

A snapshot of the summary - Big Data

  • Big Data Concepts part 2

  • What 4 types of data processing modes are there?
    • Transaction processing
    • Batch processing
    • Real-time processing
    • Near real-time processing
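The four modes mainly differ in when, and how much, data is handled at once. A minimal Python sketch, assuming a toy stream of numeric readings and a hypothetical `accounts` table; the function names are illustrative only and do not come from the course:

```python
import sqlite3
from typing import Callable, Iterable, List


def transaction_process(conn: sqlite3.Connection, src: int, dst: int, amount: float) -> None:
    """Transaction processing: a unit of work that succeeds or fails as a whole."""
    with conn:  # commits if both updates succeed, rolls back if an exception is raised
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))


def batch_process(readings: Iterable[float]) -> float:
    """Batch processing: collect all data first, then process it in one go."""
    collected = list(readings)
    return sum(collected)


def real_time_process(readings: Iterable[float], handle: Callable[[float], None]) -> None:
    """Real-time processing: handle each reading immediately as it arrives."""
    for reading in readings:
        handle(reading)


def near_real_time_process(readings: Iterable[float], window: int) -> List[float]:
    """Near real-time processing: process small micro-batches of `window` readings."""
    results: List[float] = []
    buffer: List[float] = []
    for reading in readings:
        buffer.append(reading)
        if len(buffer) == window:
            results.append(sum(buffer))
            buffer.clear()
    if buffer:  # flush the last, partially filled micro-batch
        results.append(sum(buffer))
    return results
```

For readings `[1.0, 2.0, 3.0, 4.0, 5.0]`, `batch_process` returns a single total (`15.0`) only once all data is in, while `near_real_time_process(..., window=2)` produces `[3.0, 7.0, 5.0]`, one result each time a small group fills up.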
  • What types of deadlines are there in real-time processing?
    • Hard - missing a deadline is a total system failure.
    • Firm - infrequent misses are tolerable, but may degrade the system's quality of service. The usefulness of a result is zero after its deadline.
    • Soft - the usefulness of a result degrades after its deadline, thereby degrading the system's quality of service.
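The three deadline types can be read as three usefulness curves for a result that finishes late. A minimal sketch, assuming usefulness is scored from 1.0 (on time) down to 0.0; the exponential decay for soft deadlines and the `decay_rate` parameter are assumptions for illustration, and the sketch scores single results only, so it does not track how often firm deadlines are missed:

```python
import math


class DeadlineMissedError(Exception):
    """Raised when a hard deadline is missed: treated as a total system failure."""


def result_usefulness(kind: str, finish_time: float, deadline: float,
                      decay_rate: float = 1.0) -> float:
    """Return the usefulness (1.0 = full value) of a result finished at `finish_time`."""
    if finish_time <= deadline:
        return 1.0  # on time: full usefulness for every deadline type
    lateness = finish_time - deadline
    if kind == "hard":
        raise DeadlineMissedError(f"hard deadline missed by {lateness}")
    if kind == "firm":
        return 0.0  # a late result is worthless, but the system keeps running
    if kind == "soft":
        # Assumption: usefulness decays exponentially with lateness.
        return math.exp(-decay_rate * lateness)
    raise ValueError(f"unknown deadline kind: {kind!r}")
```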
  • What are the steps in data processing?
    1. Data acquisition
    2. Data staging
    3. Data analysis
    4. Application analysis
    5. Visualization
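The five steps can be sketched as a chain of small functions, each feeding the next. A minimal sketch, assuming made-up per-animal readings; all function names are hypothetical, and reading "application analysis" as turning analysis results into a domain decision is an interpretation, not a definition from the course:

```python
from statistics import mean
from typing import Dict, List, Tuple


def acquire() -> List[str]:
    """1. Data acquisition: collect raw records (hard-coded, made-up values)."""
    return ["cow1,21.5", "cow2,19.0", "cow1,23.5", "bad record", "cow2,20.0"]


def stage(raw: List[str]) -> List[Tuple[str, float]]:
    """2. Data staging: parse raw lines and drop malformed ones."""
    staged = []
    for line in raw:
        parts = line.split(",")
        if len(parts) != 2:
            continue  # malformed record: skip it
        staged.append((parts[0], float(parts[1])))
    return staged


def analyse(staged: List[Tuple[str, float]]) -> Dict[str, float]:
    """3. Data analysis: mean value per animal."""
    groups: Dict[str, List[float]] = {}
    for animal, value in staged:
        groups.setdefault(animal, []).append(value)
    return {animal: mean(values) for animal, values in groups.items()}


def application_analysis(summary: Dict[str, float], threshold: float) -> List[str]:
    """4. Application analysis: turn results into a decision (hypothetical rule)."""
    return [animal for animal, value in summary.items() if value > threshold]


def visualise(summary: Dict[str, float]) -> str:
    """5. Visualization: plain-text bar chart, one '#' per whole unit."""
    return "\n".join(f"{animal}: {'#' * int(value)}" for animal, value in summary.items())


def run_pipeline(threshold: float = 20.0) -> Tuple[List[str], str]:
    """Run all five steps in order."""
    summary = analyse(stage(acquire()))
    return application_analysis(summary, threshold), visualise(summary)
```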
  • What is a data warehouse staging area?
    A temporary location where data from source systems is copied during the extract, transform and load (ETL) process.
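A staging area can be pictured as a scratch table between the source and the warehouse. A minimal sketch, using Python's built-in `sqlite3` in-memory database as a stand-in warehouse; the table names, columns and cleaning rules are hypothetical:

```python
import sqlite3
from typing import Iterable, Tuple


def run_etl(source_rows: Iterable[Tuple[str, str]], conn: sqlite3.Connection) -> None:
    cur = conn.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS staging (animal_id TEXT, weight_kg TEXT)")
    cur.execute("CREATE TABLE IF NOT EXISTS warehouse (animal_id TEXT, weight_kg REAL)")
    # Extract: copy the source data as-is into the temporary staging area.
    cur.executemany("INSERT INTO staging VALUES (?, ?)", source_rows)
    # Transform + load: clean the staged rows (trim ids, cast weights,
    # drop blanks) and insert the result into the warehouse table.
    cur.execute("""
        INSERT INTO warehouse (animal_id, weight_kg)
        SELECT TRIM(animal_id), CAST(weight_kg AS REAL)
        FROM staging
        WHERE weight_kg IS NOT NULL AND TRIM(weight_kg) <> ''
    """)
    # The staging area is temporary: empty it once the load is done.
    cur.execute("DELETE FROM staging")
    conn.commit()
```

Keeping raw copies in staging means the source systems are read only once, and the cleaning step can be rerun on the staged data without touching them again.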
  • What is a data lake?
    • A storage repository that holds a vast amount of raw data in its native format, including structured, semi-structured and unstructured data.
    • Data structure and requirements are not defined until the data is needed (schema-on-read).
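The "structure is defined only when needed" idea (schema-on-read) can be shown with a toy in-memory lake. A minimal sketch, assuming files are identified by path and their extension tells the reader how to parse them; `ToyDataLake` and its methods are hypothetical, not a real product's API:

```python
import csv
import io
import json
from typing import Dict, List


class ToyDataLake:
    """Hypothetical in-memory 'lake': raw objects kept as-is, keyed by path."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def ingest(self, path: str, raw: bytes) -> None:
        # No schema is applied on write: bytes are stored untouched,
        # whatever the format (structured, semi-structured or unstructured).
        self._objects[path] = raw

    def read_weights(self, path: str) -> List[float]:
        # Schema-on-read: structure is imposed only now, when the data is needed.
        text = self._objects[path].decode("utf-8")
        if path.endswith(".json"):
            return [float(rec["weight"]) for rec in json.loads(text)]
        if path.endswith(".csv"):
            return [float(row["weight"]) for row in csv.DictReader(io.StringIO(text))]
        raise ValueError(f"no reader defined yet for {path}")
```

Free text can be ingested just as easily; it simply stays unread until someone writes a reader for it.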
  • What are the characteristics of a data lake?
    • Retain all data
    • Support all data types
    • Support all users
    • Adapt easily to changes
  • Big Data Concepts part 1

  • What are possible sources for big data?
    • Web and social media data
    • Machine data
    • Sensing data
    • Transaction data
    • Internet of Things
  • How can you best manage unstructured data?
    Have it flow into a data lake in its raw format.
  • What is semi-structured data?
    • Falls between structured and unstructured data.
    • Form of structured data that does not conform to the formal structure of data models.
    • BUT contains tags or other markers to separate semantic elements and enforce hierarchies within the data.
    • Examples: markup languages such as XML and HTML, JSON, and Twitter data.
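Both points (no fixed schema, yet tags that mark semantic elements and hierarchy) show up when parsing JSON and XML with the Python standard library. A minimal sketch; the tweet-like records and the herd document are made up for illustration:

```python
import json
import xml.etree.ElementTree as ET
from typing import Dict, List

# Made-up, tweet-like JSON records: no fixed table schema (the second record has
# an extra field), but keys label every value and nesting expresses hierarchy.
TWEETS_JSON = """
[
  {"id": 1, "text": "Milking robot installed", "user": {"name": "farm_a"}},
  {"id": 2, "text": "New calf today", "user": {"name": "farm_b"}, "hashtags": ["calf"]}
]
"""

# Made-up XML: tags separate semantic elements, attributes add detail.
HERD_XML = """
<herd>
  <animal id="A1"><weight unit="kg">510.5</weight></animal>
  <animal id="A2"><breed>Holstein</breed></animal>
</herd>
"""


def json_field_names(doc: str) -> List[List[str]]:
    """List the (varying) top-level keys of each JSON record."""
    return [sorted(record) for record in json.loads(doc)]


def json_user_names(doc: str) -> List[str]:
    """Follow the hierarchy record -> user -> name."""
    return [record["user"]["name"] for record in json.loads(doc)]


def xml_weights(doc: str) -> Dict[str, float]:
    """Collect weights, skipping animals whose record has no <weight> tag."""
    weights: Dict[str, float] = {}
    for animal in ET.fromstring(doc.strip()).iter("animal"):
        weight = animal.find("weight")
        if weight is not None and weight.text:
            weights[animal.get("id")] = float(weight.text)
    return weights
```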
  • What types of metadata are there?
    • Structural metadata - indicates how compound objects are put together. E.g. how pages are ordered to form chapters.
    • Descriptive metadata - describes a resource for purposes such as discovery and identification. E.g. elements such as title, author and keywords.
    • Administrative metadata - provides information to help manage a resource. E.g. when and how a file was created, who can access it, etc.
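The three metadata types answer different questions about the same resource. A minimal sketch using Python dataclasses; the class names, fields and example values are hypothetical and chosen only to mirror the examples above:

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List


@dataclass
class StructuralMetadata:
    """How a compound object is put together: which pages form which chapter."""
    chapters: Dict[str, List[int]]


@dataclass
class DescriptiveMetadata:
    """Supports discovery and identification."""
    title: str
    author: str
    keywords: List[str] = field(default_factory=list)


@dataclass
class AdministrativeMetadata:
    """Helps manage the resource: creation details and access rights."""
    created: date
    created_with: str
    readers: List[str] = field(default_factory=list)


@dataclass
class DocumentRecord:
    structural: StructuralMetadata
    descriptive: DescriptiveMetadata
    administrative: AdministrativeMetadata

    def pages_of(self, chapter: str) -> List[int]:
        # Structural metadata answers "how is this object assembled?"
        return self.structural.chapters.get(chapter, [])

    def matches(self, keyword: str) -> bool:
        # Descriptive metadata answers "is this the resource I am looking for?"
        wanted = keyword.lower()
        return wanted in self.descriptive.title.lower() or any(
            wanted == k.lower() for k in self.descriptive.keywords
        )

    def can_read(self, user: str) -> bool:
        # Administrative metadata answers "who may access it?"
        return user in self.administrative.readers
```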