By Fatskills Exam Guides Team — the exam nerds behind 28,500+ quizzes and 2.1M practice questions across 500+ global exams.
What Is Data?
Can there be a single comprehensive definition of data? Or can the many definitions be summarized? In very informal terms, data enables a business or an individual to achieve desirable outcomes by knowing what is already known and uncovering what is not yet known.
In more formal terms, data is known facts that have implicit meaning. In other words, data is a collection of facts, such as:
- Numbers or numerical values
- Quantities or measurements
- Recorded observations about objects
- Descriptions of objects
For example, details about employees such as last name, first name, age, number of years of experience, and current pay are data about the employee. For a type of car produced, quantity, colors, and variations are data about the vehicle.
The word data is related to the word datum. Datum is singular (a single piece of information), whereas data is plural. However, you are likely to hear the term data used to describe both single and multiple pieces of information.
The Importance of Data
Basically, data is information. Information can be described as everything around us: everything we see, hear, or can sense by way of speech, touch, smell, taste, and so on. When we collect information and record it, it becomes data. Data is one of the most valuable assets in many facets of life. It has become one of the most precious resources, and organizations of all sizes have leveraged it to their advantage, both to generate revenue and to gain an edge over the competition.
Data comes in various formats and forms. Moreover, where and how data is stored and utilized are important aspects of the data life cycle. Think about medicine, science, engineering, economics, and many other areas of daily life where data is being collected on an ongoing basis. The transactions you make with your bank using its (or a third-party) payment gateway and the purchases you make online reveal a lot about you and your persona to interested parties. Organizations want to know what products you browsed and bought, your spending capacity, what brands you like most, and many other facts that become apparent through the way you go about a purchase. Banks and e-commerce merchants would like to leverage this type of information to show you advertisements that capture your interest.
In another realm, the information captured by performing medical experiments in a lab is vital to the success of new life-saving vaccines and drugs. Unless researchers know genetic information about a pathogen, they are not adequately equipped to perform research on it. In addition, space exploration has given us a lot of data to work with, and today humans understand more than ever before about space: galaxies, stars, our own solar system, neighboring solar systems, and much more. Space probes such as Voyager 1 have provided immensely helpful insights about the vast space beyond our reach.
Not all data is created or acquired equally. Data often includes noise (unwanted information), gaps (missing information), and duplication (repeated or redundant information); in other words, inconsistencies. Further, data can be structured, semi-structured, or unstructured in nature.
What Is the Importance of Data?
If some businesses did not have data at their disposal, they would not be able to function properly. For example, without the right data around demand and supply, a retail organization would not know how much stock to have at each store to meet demand.
For other businesses, data is a way of generating revenue, and without appropriate data they would be less effective. For example, a Facebook influencer would not be much of an influencer without the right data about the topics they want to influence. Followers subscribe only when they see value in the information being shared.
In some organizations, data actually is the business. For example, big entertainment houses run on metrics about what people like to see (drama, action, romance, comedy, and so on). Unless they know what their audience wants, they cannot deliver, and if they do not deliver, they lose business. For these organizations, data is essential, and without access to the right data insights, they will crumble.
These examples should give you an idea of the importance of data in today’s world. Many case studies and TV series have been created about data, and you can find them by searching online.
What Are the Sources of Data?
Where is data generated? That is, what are the sources of data? The answer, surprisingly, is very straightforward: Data is generated by almost everything around us. Every single electrical and electronic system is capable of generating data. For example, data is generated by computers, vehicles, household appliances, fitness devices, communication devices, electrical grids, POS machines, cloud instances, RFID systems, and HVAC systems, just to name a few. Any analog or digital system is capable of producing data.
What data is useful to you? Is the data being generated by an electric grid of any use to you, or is the data from your own house’s smart power meters more important to you? Is the data being generated by your car’s tire pressure sensor more important than the data being transmitted by the radio station about the weather in the upcoming week? Getting to know crucial information by way of data is not just for commercial purposes but can very well be lifesaving.
The following are some of the potential sources of data for individuals and organizations:
- Personal electronic gadgets, such as phones, smart devices, and wearables
- Smart home electrical appliances
- Smart vehicles
- Smart meters
- Health devices
- E-commerce or banking transactions
- Website transactions
- Cloud data storage
- Clinical research
- Online and in-person surveys
- Protected health information (PHI)
Data Expansion over the Past Few Decades
The digital footprint of data has grown incredibly in the past couple of decades. Popular search engines have made data much more accessible. Advancements in technology such as mobility and the advent of the cloud have increased the demand for data in individuals’ lives and in organizations’ decision making.
In the past, data sources were many, data was siloed, and not a lot happened without cooperative efforts of various groups working together. Now, with online and cloud-hosted databases and data warehouses, the availability of meaningful data has increased dramatically.
As storage costs have come down over the past few decades, especially with the advent of the cloud in the early 2010s, the amount of data being generated and stored has grown exponentially. Over the past few years, the flexibility and varied offerings of cloud platform providers have enabled organizations to build and leverage complex databases and data warehouses where data from numerous data sources can coexist.
Private and public cloud architectures offer a lot more than could previously be accomplished from both data generation and consumption viewpoints. Your wearables can transmit directly to a cloud server over wireless or mobile connectivity (LTE/4G/5G), and the data from hundreds of thousands of transmissions can be processed in the cloud, leading to insights into health metrics. This is just one application of generating viable data and making sense of it using some form of visualization.
Data Terminology
This section covers the basic terminology pertinent to data across a vast range of topics, including data analysis, data analytics, data mining, and data warehousing. The purpose is to make you comfortable with some key terms and their meaning in the context of real-life data collection, (pre)processing, storage, analysis, visualization, and many other aspects. Again, the topics covered here are introductory and are covered in more detail elsewhere on Fatskills.
To keep the examples in this section streamlined, the following definitions use a fictitious mining company called Mining The World (MTW):
- Dataset: A dataset is a structured collection of related data in which every entry shares the same set of attributes or properties. For example, MTW can leverage geospatial locations stored in a comma-separated values (CSV) file for undersea mining operations.
- Data analysis: Data analysis is the process of examining available data artifacts (or datasets) to discover facts, relationships, insights, trends, or patterns in order to support better decision making. For example, MTW can leverage data analysis to evaluate locations for future mining operations.
- Data analytics: Data analytics encompasses data life cycle management across different phases, such as data collection, cleansing, normalization, organization, analysis, storage, and governance. For example, MTW can run analytics on datasets available from multiple locations and derive meaningful information about the specific locations for mining precious gems.
- Data governance: Data governance comprises the people, processes, and technologies that ensure the integrity of data and adherence to leading practices for data management. For example, MTW can appoint a chief data officer (CDO) to ensure that its data initiatives are driven strategically and that only relevant employees have access to raw or processed data.
- Data mining: Data mining is the process of analyzing massive volumes of data (or datasets) to detect patterns and relevant points that organizations can leverage to drive unbiased and intelligent decision making. For example, MTW could leverage data mining to focus on proactive maintenance of field machinery based on the number of hours of usage and prevent loss of revenue due to breakdowns.
- Data model: A data model focuses on the relationships among different data types and the various ways in which data can be grouped and organized, as well as its formats and attributes. For example, MTW could use multiple data models across oil and gas mining as well as precious gem mining to ensure that the geographic areas of maximum impact in terms of mining capacity are explored.
- Data structure: A data structure is a format for organizing, processing, storing, and retrieving data. A common example is an array, which stores one or more items of the same data type.
- Data visualization: Data visualization is the process whereby data is represented in a graphical format to provide insights about key findings or data points. Common examples are pie charts, graphs, and maps generated based on data analytics. For example, MTW can generate a report summary with pie charts on successful efforts and funding for finding and digging new resources in mountains.
- Data warehouse: A data warehouse enables organizations to collate data sources and leverage the collected data repository to make informed business decisions by performing data analytics. For example, MTW can leverage an on-premises or cloud-based data warehouse to get insights into areas of investment where technology for mining can be improved with minimal disruption to ongoing mining operations. This could lead to significant savings by reducing the time to mine and ship products to end consumers.
- Database: A database is an organized collection of information that can be queried to yield results. For example, MTW can have one or more (relational or non-relational) databases in which queries are run to extract relevant information, such as customer or employee records and transactions.
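To make a few of these terms concrete, here is a minimal Python sketch that ties together a dataset (CSV text), a data structure (a list of dictionaries), and a database (an in-memory SQLite table that can be queried). It is only an illustration under stated assumptions: the site names, coordinates, usage hours, the 900-hour threshold, and the function names are all invented for this example and are not part of the MTW scenario described above.

```python
import csv
import io
import sqlite3

# Hypothetical MTW dataset: every row shares the same attributes (columns).
# All site names and numbers are invented for illustration only.
CSV_TEXT = """site,latitude,longitude,machine_hours
Reef-A,-12.5,45.1,1200
Reef-B,-13.0,46.2,300
Ridge-C,-11.8,44.7,950
"""


def load_dataset(text):
    """Parse CSV text into a list of dicts (a simple data structure)."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        rows.append({
            "site": record["site"],
            "latitude": float(record["latitude"]),
            "longitude": float(record["longitude"]),
            "machine_hours": int(record["machine_hours"]),
        })
    return rows


def build_database(rows):
    """Store the rows in an in-memory SQLite database so they can be queried."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sites ("
        "site TEXT, latitude REAL, longitude REAL, machine_hours INTEGER)"
    )
    conn.executemany(
        "INSERT INTO sites VALUES (:site, :latitude, :longitude, :machine_hours)",
        rows,
    )
    return conn


def sites_needing_maintenance(conn, threshold_hours):
    """Query the database for sites at or above a usage threshold."""
    cursor = conn.execute(
        "SELECT site FROM sites WHERE machine_hours >= ? ORDER BY site",
        (threshold_hours,),
    )
    return [row[0] for row in cursor]


rows = load_dataset(CSV_TEXT)
conn = build_database(rows)
flagged = sites_needing_maintenance(conn, 900)  # 900 is an assumed threshold
print(flagged)
```

The query at the end is a very small instance of data analysis: it turns stored records into a decision aid, here a list of sites whose machinery might be due for proactive maintenance. A real pipeline would add cleansing, governance, and visualization steps around it.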