Data Management
A database management system (DBMS) uses data model specifications to establish consistency in data management. To this end, a data model must have three basic components:
· Data feature types that form a data structure describing the data,
· Rules that establish accuracy and maintain the structures and/or operations for validating data, and
· Data operators that process data structures and allow data manipulation.
A spatial database is a type of database that is designed to store and manage spatial data, which represents objects defined in a geometric space.
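The three components listed above (feature types, rules, and operators) can be illustrated with a minimal Python sketch. The class and function names are hypothetical and the coordinates are approximate, illustrative values; this is not how the DBMS itself implements them.

```python
from dataclasses import dataclass


# 1. Data feature type: a structure that describes the data.
@dataclass(frozen=True)
class PointFeature:
    name: str
    latitude: float   # decimal degrees
    longitude: float  # decimal degrees


# 2. Rule: validates that a feature is structurally sound.
def is_valid(feature: PointFeature) -> bool:
    return (
        bool(feature.name.strip())
        and -90.0 <= feature.latitude <= 90.0
        and -180.0 <= feature.longitude <= 180.0
    )


# 3. Operator: manipulates the data structure and returns a new feature.
def rename(feature: PointFeature, new_name: str) -> PointFeature:
    return PointFeature(new_name, feature.latitude, feature.longitude)


# Illustrative feature with approximate coordinates.
cape = PointFeature("Cape Moreton", -27.03, 153.47)
print(is_valid(cape))
```

Keeping the rule and the operator separate from the type mirrors the split described above: the structure holds the data, while validation and manipulation act on it.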
Data Model
A data model is a way of describing the structure and relationships of data in a database. It is used to represent, organize, and manage data in a systematic and consistent manner. Spatial data models are a critical component of geographic information systems (GIS) and are designed to facilitate the storage, retrieval, analysis, and visualization of geographic information. There are several types of spatial data models, each with its own approach to representing spatial data.
Database Structure
The Queensland Government stores its topographic data as feature classes of similar data themes within a spatial Postgres database. This ESRI cloud-based database is the point of truth for topographic data and stores the data in geographic coordinates (latitude and longitude) on the Geocentric Datum of Australia 2020 (GDA2020). Each dataset within the database is accompanied by metadata that describes the dataset, licensing, and attribution to the ISO 19115/ISO 19139 standards. Additionally, an editable database is available within this environment that allows for the management, maintenance, and creation of features.
Reference System
A reference system, also known as a spatial reference system, is a framework for specifying and locating geographic phenomena in a standardized and consistent manner. These systems define a set of rules and parameters that enable the accurate positioning and referencing of objects or locations on the Earth's surface. They are essential for spatial data integration, analysis, and mapping. There are several key components within a spatial data reference system:
· Coordinate System: The coordinate system specifies how geographic positions are represented in terms of coordinates, which are typically expressed in latitude and longitude or easting and northing values. Common coordinate systems include geographic (e.g., WGS84) and projected systems (e.g., UTM or State Plane), each with specific units of measurement.
· Datum: A datum defines the origin, orientation, and scale of the coordinate system. It provides a reference point, typically located on the Earth's surface, from which all other geographic positions are measured. Different datums can result in variations in positional accuracy, so it's important to use the correct datum for a specific area or dataset.
· Projection: In cartography, a map projection is a set of transformations (mathematical calculations) that convert the angular geodetic coordinates of the geographic coordinate system to Cartesian coordinates of the planar projected coordinate system. The resulting coordinate values provide a frame of reference to locate features on the Earth's surface, to align data relative to other data, to perform spatially accurate analysis, to add to and edit the data, and to create cartographic products. Different map projection methods suit different applications; commonly used projections include the Mercator, Lambert Conformal Conic, and Albers Equal Area.
· Units of Measurement: Spatial reference systems define units of measurement for coordinates, such as degrees, meters, or feet, depending on whether they are geographic or projected coordinate systems.
· Ellipsoid: In geographic coordinate systems, the Earth's surface is approximated as an ellipsoid. The ellipsoid model used in a reference system defines the shape of the Earth and impacts calculations involving distances, areas, and angles.
· Grids and Graticules: Grids and graticules provide a set of lines or a network of intersecting lines on maps to aid in locating and measuring features. For instance, a graticule consists of lines of latitude and longitude on a map.
· Zone Designations: Some reference systems divide the Earth into zones to improve accuracy. For example, the Universal Transverse Mercator (UTM) system divides the Earth into multiple zones, each with its own unique coordinate system to minimize distortions.
· Registry and Authorities: Many reference systems are standardized and maintained by geospatial authorities. These organizations define and publish reference system specifications to ensure consistency and interoperability among GIS and mapping applications.
Spatial data reference systems are essential for geographic data interoperability and the accurate integration of data from various sources. Users need to be aware of the reference system used in their spatial data and ensure consistency when working with different datasets, as using incompatible reference systems can lead to spatial data misalignment and errors.
The Queensland Government Digital Topographic Data is currently represented as vector features in geographic coordinates (latitude/longitude) using the Geocentric Datum of Australia 2020 (GDA2020) reference system, with elevations (where stored) on the Australian Height Datum (AHD71).
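The zone designations described above can be illustrated with the standard rule for deriving a UTM zone number from a longitude. This is a minimal sketch: the function name is hypothetical, it ignores the special-case zones defined around Norway and Svalbard, and it does not perform any coordinate transformation.

```python
import math


def utm_zone(longitude_deg: float) -> int:
    """Return the standard 6-degree UTM zone number (1 to 60) for a longitude."""
    if not -180.0 <= longitude_deg <= 180.0:
        raise ValueError("longitude must be between -180 and 180 degrees")
    # Zones are 6 degrees wide, with zone 1 starting at 180 degrees west.
    zone = int(math.floor((longitude_deg + 180.0) / 6.0)) + 1
    # A longitude of exactly +180 sits on the boundary; assign it to zone 60.
    return min(zone, 60)


# A longitude of 153 degrees east (roughly that of Brisbane).
print(utm_zone(153.0))
```

Knowing the zone matters when geographic coordinates are projected for measurement or display, because each zone has its own projected coordinate system.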
Data Types
The Queensland Government's topographic dataset uses vector data models that contain objects in point, line, and polygon format. Attributes are associated with these geometric objects to provide additional information. The vector data model is suitable for representing discrete, well-defined features and is often used for mapping, cartography, and spatial analysis.
Attribution
Spatial data attribution is integral to spatial data analysis, as it provides the context and information needed to make informed decisions. When working with spatial data, users need to understand the meaning and limitations of attribute data, perform data validation and quality checks, and use appropriate methods to extract meaningful insights from the combined geographic and attribute data.
Attribution (attribute data) refers to the non-spatial information or characteristics associated with geographic features in a spatial dataset. In geographic information systems (GIS) and spatial data analysis, these attributes provide context and additional information about the geographic features, allowing users to understand and analyse the data more comprehensively.
Here are some key points related to spatial data attribution:
· Attributes: Attributes are essentially the data fields or columns in a spatial dataset. They can include a wide range of information, such as names, numerical values, text descriptions, dates, and more, depending on the type of data and the purpose of the dataset.
· Examples: Examples of attributes in spatial data include the area in square metres of a captured island, the project used for contour generation, the purpose or function of buildings, or the name of gazetted geographical features (e.g. capes, mountain peaks).
· Database Integration: Spatial data is stored in a database format, with the geographic features linked to their corresponding attribute data. This allows for efficient querying, analysis, and visualization.
· Query and Analysis: Spatial analysts and GIS professionals use attribute data via definition queries to answer questions and perform analyses. For example, they may assess the impact of environmental variables on an area or identify patterns and trends in geographic data.
· Symbolization: Attribute data can be used to symbolize or represent geographic features in a way that visually conveys information. For instance, a map might use different colours to represent different land use categories, with a legend explaining the meaning of each colour.
· Joining and Relating Data: In GIS, attribute data can be joined or related to spatial data through common keys (e.g., PFI). This allows users to integrate data from various sources for analysis.
· Metadata: Metadata associated with attribute data provides information about the source, accuracy, update frequency, and other relevant details. It is essential for understanding the quality and context of the attribute information.
· Data Quality: Ensuring the quality and accuracy of attribute data is critical. Errors or inconsistencies in attribute data can lead to incorrect analyses and decision-making.
· Time Series Data: Some spatial datasets include attributes that change over time, such as historical records of land use, population growth, or temperature measurements. Time series analysis is essential in such cases.
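The joining and querying ideas in the list above can be sketched with plain Python dictionaries keyed by PFI. The records, the field names, and the second identifier are hypothetical; a GIS or database would normally perform these operations itself.

```python
# Hypothetical feature records keyed by PFI (geometry reduced to a type label).
features = {
    "106006000000019": {"geometry_type": "polygon"},
    "106006000000020": {"geometry_type": "polygon"},
}

# Hypothetical attribute table from another source, also keyed by PFI.
attributes = {
    "106006000000019": {"name": "Moreton Bay"},
}


def join_by_pfi(features: dict, attributes: dict) -> dict:
    """Left join: keep every feature and attach attributes where the PFI matches."""
    joined = {}
    for pfi, record in features.items():
        joined[pfi] = {**record, **attributes.get(pfi, {})}
    return joined


def select(records: dict, field: str, value) -> list:
    """A simple analogue of a definition query: PFIs whose field equals value."""
    return [pfi for pfi, rec in records.items() if rec.get(field) == value]


joined = join_by_pfi(features, attributes)
print(select(joined, "name", "Moreton Bay"))
```

A left join is used so that features without a matching attribute row are kept rather than silently dropped, which makes gaps in the attribute data visible.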
Attribute information during Feature Capture
Attribute information plays a pivotal role during the process of capturing features. Although various attributes are recorded for features, as shown in individual feature classes further in the document, the following examples are particularly noteworthy, as they are automatically calculated as part of the feature capture process.
Persistent and Unique Feature Identifiers
All features have been allocated a persistent feature identifier (PFI) and a unique feature identifier (UFI).
A Persistent Feature Identifier (PFI) is generated for each feature at the point of creation in the database. The value of the PFI will stay with the feature through all changes to the feature (both spatial and non-spatial) until the feature is retired.
A Unique Feature Identifier (UFI) is generated for each new feature at the point of creation in the database (at this point the PFI and UFI will be the same). The value of the UFI will stay with the feature through all changes to the feature (both spatial and non-spatial) unless the feature is split into multiple parts. If the feature is split, the separate parts will retain the original PFI but new UFIs will be generated for the split parts.
Example of the process of PFI and UFI automatic generation.
On first capture of a feature, the PFI and UFI are automatically generated and are the same (see below).
On splitting, the PFI remains the same and two new UFIs are automatically generated (see below).
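The PFI and UFI behaviour described above can be sketched as follows. This is an illustrative model only: the counter stands in for the database's identifier sequence, identifiers are shortened to nine digits, and all names are hypothetical.

```python
from dataclasses import dataclass
from itertools import count

# Hypothetical counter standing in for the database's identifier sequence.
_sequence = count(1)


def next_identifier() -> str:
    return f"{next(_sequence):09d}"


@dataclass(frozen=True)
class Feature:
    pfi: str
    ufi: str


def create_feature() -> Feature:
    """At creation the PFI and UFI share the same value."""
    identifier = next_identifier()
    return Feature(pfi=identifier, ufi=identifier)


def split_feature(feature: Feature, parts: int = 2) -> list:
    """Each part keeps the original PFI and receives a newly generated UFI."""
    if parts < 2:
        raise ValueError("a split must produce at least two parts")
    return [Feature(pfi=feature.pfi, ufi=next_identifier()) for _ in range(parts)]


original = create_feature()
pieces = split_feature(original)
print(original, pieces)
```

Because the PFI survives the split, the parts can still be traced back to the feature they came from, while the new UFIs distinguish each part.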
The numbering sequence used for the identifier is based on the nested numbering sequence used on the individual datasets. The identifier is 15 characters long and comprises the following:
· The first character determines the State the data is in:
Queensland | 1 |
· The next two characters determine the data theme. These follow the naming conventions and ordering as documented in the United Nations Global Fundamental Geospatial Data Themes:
Global Geodetic Reference Frame | 01 |
Addresses | 02 |
Buildings and Settlements | 03 |
Elevation and Depth | 04 |
Functional Areas | 05 |
Geographical Names | 06 |
Geology and Soils | 07 |
Land Cover and Use | 08 |
Land Parcels | 09 |
Orthoimagery | 10 |
Physical Infrastructure | 11 |
Population Distribution | 12 |
Transport Networks | 13 |
Water | 14 |
· The following three characters represent the feature class within the data theme. The first feature class is assigned the value of 001. For the other feature classes within the theme, the ordering is random so that other feature classes can be added or removed later without affecting the ordering.
· The last nine characters are incrementally generated in the numbering sequence starting at 000000001 for the first feature.
For example, when Moreton Bay was first captured, both its PFI and UFI were 106006000000019:
1 - Queensland (State)
06 - Geographical Names (UN Geospatial Data Theme)
006 - Bays (Feature Class)
000000019 - incrementally generated number
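The identifier layout described above (1 + 2 + 3 + 9 = 15 characters) can be checked with a short parsing sketch. It assumes that all 15 characters are digits, which matches the example but is not stated explicitly; the function and type names are hypothetical.

```python
from typing import NamedTuple

# State and theme codes as listed in this document.
STATES = {"1": "Queensland"}
THEMES = {
    "01": "Global Geodetic Reference Frame",
    "02": "Addresses",
    "03": "Buildings and Settlements",
    "04": "Elevation and Depth",
    "05": "Functional Areas",
    "06": "Geographical Names",
    "07": "Geology and Soils",
    "08": "Land Cover and Use",
    "09": "Land Parcels",
    "10": "Orthoimagery",
    "11": "Physical Infrastructure",
    "12": "Population Distribution",
    "13": "Transport Networks",
    "14": "Water",
}


class FeatureId(NamedTuple):
    state: str
    theme: str
    feature_class: str  # three-character code within the theme
    sequence: int       # incrementally generated number


def parse_feature_id(identifier: str) -> FeatureId:
    """Split a 15-character identifier into its four components."""
    # Assumption: all 15 characters are ASCII digits.
    if len(identifier) != 15 or not (identifier.isascii() and identifier.isdigit()):
        raise ValueError("identifier must be exactly 15 digits")
    state_code, theme_code = identifier[0], identifier[1:3]
    if state_code not in STATES or theme_code not in THEMES:
        raise ValueError("unknown state or theme code")
    return FeatureId(
        state=STATES[state_code],
        theme=THEMES[theme_code],
        feature_class=identifier[3:6],
        sequence=int(identifier[6:]),
    )


def build_feature_id(state_code: str, theme_code: str, class_code: str, sequence: int) -> str:
    """Compose an identifier; the sequence is zero-padded to nine characters."""
    return f"{state_code}{theme_code}{class_code}{sequence:09d}"


print(parse_feature_id("106006000000019"))
```

The feature class code is left as a raw three-character string because the class names are defined per theme elsewhere in the document.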
Created Date*
All features are tagged with a created date/time stamp. This information is recorded in the attributes of the feature. The date represents the date the feature was constructed/created or the date the feature was loaded to the database. It is not the date of the source data.
Last Edited Date*
This date is automatically recorded in the attributes of the feature and shows the date of the most recent change to the feature, whether spatial or non-spatial. This allows users to track when changes are made to a feature. When data is initially loaded to the database or a new feature is created, the created and last edited dates will be the same.
*These fields are for internal purposes only.
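The behaviour of the created and last edited dates can be sketched as below. This is a hypothetical model, not the database's actual mechanism: the class name, the field names, and the use of UTC are assumptions, and only a non-spatial edit is shown.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class FeatureRecord:
    name: str
    created_date: datetime = field(init=False)
    last_edited_date: datetime = field(init=False)

    def __post_init__(self) -> None:
        # On creation (or initial load) both dates are set to the same instant.
        now = datetime.now(timezone.utc)
        self.created_date = now
        self.last_edited_date = now

    def edit(self, new_name: str, when: Optional[datetime] = None) -> None:
        # Any change updates the last edited date; the created date never changes.
        self.name = new_name
        self.last_edited_date = when if when is not None else datetime.now(timezone.utc)


record = FeatureRecord("Moreton Bay")
same_at_creation = record.created_date == record.last_edited_date
record.edit("Moreton Bay (revised)", when=record.created_date + timedelta(days=1))
print(same_at_creation, record.last_edited_date > record.created_date)
```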
Source data naming convention
It is crucial to ascertain the origin of spatial topographic data, whether in graphical or informational form, as this knowledge is fundamental to preserving the data's integrity. This information is recorded through attribution in the Attribute and Feature Source columns (see individual feature classes definitions below for further information).
The source data is derived from numerous and diverse channels and is constantly expanding, so providing a comprehensive list is impractical. Instead, the following general guidelines are recommended, with examples of source information for each case.
For information obtained from orthorectified imagery or satellite imagery (Platform, Resolution, Project), e.g.:
· Orthophotography_10cm_Port Douglas
· Orthophotography_20cm_Galilee Basin South
· ALOS Satellite Imagery_1pt5m_West Qld Ph 4
· Spot Satellite Imagery_2pt5m_Zone 55
· LiDAR_10cm_Brisbane
For information obtained from Topographic Mapping or Topographic Data (Owner, Scale, Project, or map number) e.g.:
· Qld_1:25000 Topographic Map_794525
· GA_1:100000 Topographic Map_8562
· GA_1:250000 Topographic Data_Reservoirs
· DIGO_1:50000 Topographic Data_89542
For information obtained from Cadastral Mapping or Cadastral Data (Owner, Scale, Project, or map number) e.g.:
· Qld_1:25000 Cadastral Map_794525
· GA_1:100000 Cadastral Map_8562
· Qld_4 Mile Cadastral Map_4m16
· Qld_Parish Map_Cressbrook
· Qld_Survey Plan_M12542
For information obtained from government databases (Database) or other Government agency (Department) e.g.:
· Qld_Place Names Database
· Qld_Place Names Plan_QPN265
· Qld_Spatial Cadastral Database
· Qld_Dept Environment, Science and Innovation
· Qld_Dept of Justice and Attorney-General
For information obtained from a website (Owner, point of truth) e.g.:
· Australia Post Website
· Brisbane City Council Website
· Surf Life Saving Australia Website
· Website URL (where applicable)
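The naming patterns in the examples above can be composed programmatically. The sketch below infers its rules from those examples (components joined by underscores, a decimal point in a resolution written as "pt"); the function names are hypothetical and the rules are an interpretation, not a published specification.

```python
def format_resolution(value: float, unit: str = "m") -> str:
    """Write a resolution with 'pt' in place of the decimal point, e.g. 1.5 m as '1pt5m'."""
    text = f"{value:g}".replace(".", "pt")
    return f"{text}{unit}"


def imagery_source(platform: str, resolution: str, project: str) -> str:
    """Imagery sources: Platform_Resolution_Project."""
    return "_".join([platform, resolution, project])


def mapping_source(owner: str, scale_denominator: int, product: str, reference: str) -> str:
    """Mapping sources: Owner_Scale Product_Project or map number."""
    return f"{owner}_1:{scale_denominator} {product}_{reference}"


print(imagery_source("ALOS Satellite Imagery", format_resolution(1.5), "West Qld Ph 4"))
print(mapping_source("Qld", 25000, "Topographic Map", "794525"))
```

Building the strings from their components helps keep source values consistent, which matters when they are later used to filter or group features.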
Spatial Accuracy
Spatial accuracy refers to the degree of precision or correctness with which a location or position on the Earth's surface is represented or measured in geographic information systems (GIS), cartography, remote sensing, and other spatial technologies. It is a critical aspect of spatial data and analysis, and it can have significant implications for decision-making, navigation, and various applications.
Several factors can influence spatial accuracy, including:
· Instrumentation: The accuracy of the equipment or technology used to collect spatial data, such as GPS receivers, remote sensing instruments, or surveying tools, plays a crucial role in determining spatial accuracy.
· Geometric Properties: Spatial accuracy is often described in terms of horizontal accuracy and vertical accuracy. Horizontal accuracy refers to how accurately a location is represented on the Earth's surface in the horizontal plane (latitude and longitude), while vertical accuracy relates to the accuracy in the vertical dimension (elevation).
· Resolution: The spatial resolution of data refers to the size of the smallest discernible unit in a dataset. Higher-resolution data can provide more accurate representations of features and locations, while lower-resolution data may result in less accurate representations.
· Datum and Coordinate Systems: The choice of geodetic datum and coordinate system can impact spatial accuracy. Different datums and coordinate systems are designed for different regions and purposes, and using the wrong one can introduce errors.
· Data Collection and Processing: How data is collected, processed, and georeferenced can affect spatial accuracy. Errors can be introduced during data collection, digitization, and transformation processes.
· Environmental Factors: Environmental conditions, such as atmospheric interference, signal blockage, or terrain, can influence the accuracy of GPS and remote sensing data.
· Temporal Changes: Changes in the Earth's surface over time, such as erosion or urban development, can affect spatial accuracy if data is not regularly updated.
Spatial accuracy is typically expressed as a measure of error, often in terms of meters or feet, and may be represented using terms like Root Mean Square Error (RMSE) or Circular Error Probable (CEP). Different applications may have varying requirements for spatial accuracy. For example, high-precision GPS systems used in surveying demand very high spatial accuracy, while consumer-grade GPS devices for navigation may have lower requirements.
Ultimately, achieving and maintaining spatial accuracy is crucial in various fields, including cartography, GIS, geospatial analysis, and navigation, as it ensures that spatial data is reliable and fit for its intended purpose.
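The Root Mean Square Error mentioned above can be computed from pairs of measured and reference positions. This minimal sketch assumes both sets of coordinates are in the same projected coordinate system with units of metres; the function name and sample values are hypothetical.

```python
import math


def horizontal_rmse(measured: list, reference: list) -> float:
    """Root mean square of the horizontal distances between matching points."""
    if not measured or len(measured) != len(reference):
        raise ValueError("both lists must be non-empty and of equal length")
    squared_errors = [
        (mx - rx) ** 2 + (my - ry) ** 2
        for (mx, my), (rx, ry) in zip(measured, reference)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical check points: (easting, northing) in metres.
measured_points = [(3.0, 4.0), (0.0, 0.0)]
reference_points = [(0.0, 0.0), (0.0, 0.0)]
print(horizontal_rmse(measured_points, reference_points))
```

Squaring the errors before averaging means a few large errors raise the RMSE more than many small ones, which is why it is commonly reported alongside other measures.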
Positional Accuracy
Positional accuracy is a critical aspect of spatial data that refers to how well the geographic location of a feature on a map or in a dataset matches its real-world location on the Earth's surface. It's a measure of how accurately spatial data represents the physical world. Positional accuracy can vary depending on the source of the data and the methods used to collect, process, and represent it.
There are several factors that can affect the positional accuracy of spatial data:
· Data Source: The accuracy of the source data, such as GPS measurements, survey data, remote sensing imagery, or data collected from various sensors, plays a crucial role. High-precision data sources tend to have better positional accuracy.
· Data Collection Method: The method used to collect spatial data can impact accuracy. For example, data collected using high-precision GPS equipment is likely to have better positional accuracy than data collected using consumer-grade GPS devices.
· Datum and Coordinate Systems: The choice of datum and coordinate system can affect positional accuracy. Different coordinate systems and datums have different levels of accuracy and can introduce errors if not properly considered, particularly when transforming between different systems.
· Georeferencing and Registration: The process of georeferencing involves aligning data with a known reference framework (e.g., a map or satellite image). Errors in the georeferencing process can introduce inaccuracies.
· Data Processing and Transformation: Any data processing or transformation steps can introduce errors. For example, resampling or reprojecting data can impact its accuracy.
· Scale: The scale at which data is represented can also affect positional accuracy. Finer-scale data (e.g., large-scale maps) generally have better accuracy than coarser-scale data.
· Environmental Conditions: Environmental factors, such as atmospheric conditions or interference, can affect the accuracy of GPS and remote sensing data.
Positional accuracy is typically expressed as a measurement of distance, often in meters or feet, and it is usually defined with a confidence interval to indicate the range within which the actual location is likely to fall. For example, a positional accuracy of 5 meters with a 95% confidence interval means that there is a 95% probability that the true location of a feature falls within 5 meters of the reported position.
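A statement such as "5 metres at 95% confidence" can be tested against a sample of measured errors. The sketch below uses a nearest-rank percentile, which is one of several accepted percentile definitions; the function names and the sample errors are hypothetical.

```python
import math


def fraction_within(errors_m: list, tolerance_m: float) -> float:
    """Fraction of observed errors that are no larger than the tolerance."""
    if not errors_m:
        raise ValueError("at least one error value is required")
    return sum(1 for e in errors_m if e <= tolerance_m) / len(errors_m)


def error_at_confidence(errors_m: list, confidence_percent: int = 95) -> float:
    """Nearest-rank percentile: the smallest observed error that at least
    confidence_percent of the observations do not exceed."""
    if not errors_m:
        raise ValueError("at least one error value is required")
    if not 0 < confidence_percent <= 100:
        raise ValueError("confidence_percent must be in the range 1 to 100")
    ordered = sorted(errors_m)
    rank = math.ceil(confidence_percent * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical horizontal errors in metres for 20 check points.
sample_errors = [float(i) for i in range(1, 21)]
print(fraction_within(sample_errors, 5.0), error_at_confidence(sample_errors, 95))
```

With this sample, 95% of the check points have an error of 19 metres or less, so a claim of 5 metres at 95% confidence would not be supported.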
A general guide to accuracies is as follows.
· For 3D relief data captured using LiDAR - 0.45 metres/horizontal and 0.15 metres/vertical
· For 2D/3D data captured using photogrammetry at 1:25 000 - 2.5 metres/horizontal and 1 metre/vertical
· For 2D/3D data captured using photogrammetry at 1:100 000 - 5 metres/horizontal and 2 metres/vertical
· For 2D/3D data digitised from mapping at 1:25 000 - 12.5 metres/horizontal and 2.5 metres/vertical
· For 2D data digitised from mapping at 1:100 000 - 25 metres/horizontal
· For 2D data digitised from mapping at 1:100 000 - 100 metres/horizontal
· For 2D data obtained from GPS readings - 1 metre/horizontal
· For 2D data digitised from recent orthophotography - 2.5 metres/horizontal
· For 2D data digitised from satellite imagery - 2.5 to 10 metres/horizontal
Completeness
Completeness refers to the extent to which a spatial dataset contains all the necessary and relevant information about a geographic area or a specific phenomenon. It is a crucial quality characteristic of spatial data and can significantly impact the utility and accuracy of geographic information systems (GIS) and spatial analyses. Incomplete spatial data can lead to misinterpretations and incorrect conclusions.
Key aspects and considerations related to spatial data completeness include:
· Attribute Data: Completeness in spatial data often pertains to the attributes or non-spatial information associated with geographic features. For example, in a dataset of city boundaries, completeness would mean that all relevant attributes, such as population, area, and administrative information, are included.
· Feature Representation: Completeness also involves ensuring that all relevant geographic features are represented in the dataset. For example, a land cover dataset should include all major land cover classes within a specified area, and a road network dataset should contain all the significant roads.
· Temporal Coverage: Spatial data completeness should consider the time dimension. It's important to ensure that data is up-to-date and represents the temporal extent of the phenomenon being studied. Historical data can be valuable for analysing trends and changes over time.
· Resolution and Detail: Completeness should consider the level of detail or resolution required for a specific analysis. Some applications may require high-resolution data, while others can make do with coarser, generalized data.
· Metadata: Metadata is crucial for understanding the context and limitations of spatial data. Metadata should document the data's sources, update frequency, scale, accuracy, and any known data gaps.
· Data Sources: Different data sources may have varying levels of completeness. For example, official government datasets may be more comprehensive than crowdsourced or volunteered geographic information. It's essential to be aware of potential limitations in data sources.
· Data Integration: In some cases, spatial data completeness can be improved through data integration. Combining data from multiple sources can fill gaps and provide a more complete picture.
· Data Collection and Validation: Ensuring data completeness often involves rigorous data collection and validation processes. Field surveys, remote sensing, and validation against authoritative sources can help confirm the completeness of spatial data.
When working with spatial data, users need to be aware of any limitations in data completeness, as these limitations can impact the reliability and accuracy of their analyses and decision-making. Efforts should be made to address data gaps and to maintain data completeness over time through regular updates and data quality assessments.
Spatial Information (SI) seeks to have a statewide, topologically correct coverage of all topographic features. Due to resource constraints, this is not always possible for all features, so SI enters into strategic agreements with other organisations to obtain data at a lesser mapping accuracy for areas where higher-quality data is not available.
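Attribute completeness, one of the aspects listed above, can be measured as the share of records that carry a value for each field. This is a minimal sketch; the records and field names are hypothetical, and it treats None and empty strings as missing.

```python
def attribute_completeness(records: list, fields: list) -> dict:
    """Return, for each field, the fraction of records with a non-empty value."""
    if not records:
        raise ValueError("at least one record is required")
    result = {}
    for field_name in fields:
        filled = sum(1 for r in records if r.get(field_name) not in (None, ""))
        result[field_name] = filled / len(records)
    return result


# Hypothetical attribute rows; some values are missing or empty.
rows = [
    {"name": "A", "feature_source": "x"},
    {"name": "", "feature_source": "y"},
    {"name": None},
    {"name": "D", "feature_source": "z"},
]
print(attribute_completeness(rows, ["name", "feature_source"]))
```

A per-field ratio like this only covers attribute completeness; whether every real-world feature is present in the dataset needs comparison against an independent source.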
Logical Consistency
Spatial logical consistency, often referred to as topological consistency, is a fundamental concept in geographic information systems (GIS) and spatial data management. It refers to the integrity and adherence to topological rules or principles within a spatial dataset. Topology, in this context, is the study of the geometric properties of objects that are preserved under continuous deformations, such as stretching or bending. Topological consistency ensures that the relationships between spatial features within a dataset are correctly represented and maintained. Here are some key aspects of spatial logical consistency:
· Consistency of Connectivity: In a topologically consistent dataset, the connectivity between spatial features (e.g., polygons, lines, or points) is accurately represented. For example, if two polygons share a common boundary, this shared boundary should be precisely defined in the data.
· Consistency of Adjacency: Spatial datasets often represent features that are adjacent to or near each other. Topological consistency ensures that adjacency relationships are correctly maintained. For instance, in a dataset of land parcels, topological consistency ensures that adjacent parcels share a common boundary.
· Consistency of Intersection: When spatial features intersect, such as the intersection of railways, the overlap of land parcels, or the junction of river networks, topological consistency ensures that these intersections are correctly represented. It prevents gaps, overlaps, or sliver polygons (tiny, undesirable polygons resulting from errors).
· Consistency of Containment: Containment relationships refer to one feature completely containing another. For example, a state contains counties, and counties contain municipalities. Topological consistency ensures that containment relationships are accurately maintained without overlaps or gaps.
· Consistency of Network Connectivity: In network datasets, such as transportation or utility networks, topological consistency ensures that the network features are correctly connected, and paths between network elements are correctly defined. This is essential for routing and analysis.
Achieving and maintaining spatial logical consistency is critical in GIS and spatial data management as it ensures that spatial analysis and modelling are based on reliable data. Inconsistent or inaccurate spatial data can lead to errors in analytical results, misinterpretations, and unreliable decision-making. As part of data capture, Geospatial Data employs cleaning and quality control procedures to ensure topological consistency within its spatial datasets.
A program of maintenance and upgrade of the spatial data is ongoing. The program aims to improve the topological consistency and currency of the legacy data captured over a 30-year period. Previously this data was cleaned to a standard suitable for manual mapping processes only and not to the standard required for automated mapping or spatial analysis. Data will be cleaned to:
· Remove undershoots in data,
· Remove overshoots in data,
· Remove gaps in continuous line work,
· Remove duplicate points and features,
· Correct drainage for downstream flow,
· Segment linear features at intersections and
· Concatenate linear features between intersections.
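Two of the cleaning steps above, removing duplicate points and locating possible undershoots or overshoots, can be sketched in a few lines. This illustrative example compares coordinates exactly, whereas production tools normally apply a snapping tolerance, and it does not detect an endpoint that touches the interior of another line. The function names are hypothetical.

```python
from collections import Counter


def remove_duplicate_vertices(line: list) -> list:
    """Drop vertices that repeat the vertex immediately before them."""
    cleaned = []
    for vertex in line:
        if not cleaned or vertex != cleaned[-1]:
            cleaned.append(vertex)
    return cleaned


def dangling_endpoints(lines: list) -> list:
    """Endpoints used by exactly one line end: candidates for review as
    undershoots or overshoots (some dangles are legitimate dead ends)."""
    counts = Counter()
    for line in lines:
        counts[line[0]] += 1
        counts[line[-1]] += 1
    return sorted(point for point, n in counts.items() if n == 1)


# Hypothetical line work: three lines meeting at (1, 0).
network = [
    [(0, 0), (1, 0)],
    [(1, 0), (2, 0)],
    [(1, 0), (1, 1)],
]
print(dangling_endpoints(network))
print(remove_duplicate_vertices([(0, 0), (0, 0), (1, 1), (1, 1), (0, 0)]))
```

Flagged endpoints are best treated as candidates for inspection rather than automatic corrections, since features such as a road ending at a property are valid dangles.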
Metadata
ISO 19115 is an international standard for geospatial metadata, and it provides a comprehensive framework for describing geospatial datasets and related resources. Metadata based on ISO 19115 includes information about the dataset's content, quality, spatial and temporal characteristics, and more. To create metadata compliant with ISO 19115, a set of specific information must be provided. The following is a high-level overview of the elements typically included in ISO 19115 metadata, which may or may not be represented within the Topographic data:
Identification Information:
· Title: The title or name of the dataset.
· Abstract: A summary of the dataset's content and purpose.
· Date: The date the metadata was created or last updated.
· Citation: The formal reference for the dataset.
· Citation Contacts: Contact information for the person or organization responsible for the data.
Data Quality Information:
· Lineage: Describes the dataset's history, including how it was created, processed, and updated.
· Completeness: Information on the completeness of the dataset.
· Positional Accuracy: Information about the accuracy of the geographic positioning of features.
· Attribute Accuracy: Information about the accuracy of the data attributes.
· Logical Consistency: Details on logical consistency checks applied to the data.
Spatial Representation Information:
· Geometric Object Types: Describes the types of geographic features (e.g., point, line, polygon).
· Spatial Reference System: Information about the coordinate reference system used in the dataset.
· Topology Information: Information about topological relationships between features.
· Spatial Data Organization: Information about the structure of the dataset, including data formats, scales, and any aggregation or distribution of data.
Reference System Information:
· Information about the coordinate reference system, including the datum, projection, and units of measurement.
· Metadata Reference Information: Metadata standard used: Specify that ISO 19115 is the standard.
· Metadata Profile: Indicate if any specific profiles or extensions of ISO 19115 are used.
· Distribution Information: How the dataset can be accessed and distributed, including online links, file formats, and access restrictions.
Metadata Constraints:
· Any constraints or restrictions on the use of the metadata, such as copyright or licensing information.
· Metadata Contact: Contact information for the person or organization responsible for the metadata.
· Data Character Set: Specifies the character encoding used in the metadata.
· Spatial Data Themes: Keywords and controlled vocabulary terms that describe the dataset's subject matter.
· Temporal Extent: Information about the time period covered by the dataset.
Maintenance Information:
· How often the dataset is updated and information about data maintenance.
· Use Limitations: Any limitations or constraints on the use of the dataset.
· Additional Information: Any other relevant information about the dataset.
Figure 1 – Extract of a Metadata Record.
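A simple presence check for metadata elements such as those listed above might look like the following sketch. The dictionary keys are hypothetical shorthand, not ISO 19115 element names, the choice of required elements is an assumption, and the record values are placeholders.

```python
# Hypothetical set of elements treated as mandatory for this check.
REQUIRED_ELEMENTS = ("title", "abstract", "date", "lineage", "spatial_reference_system")


def missing_elements(record: dict, required: tuple = REQUIRED_ELEMENTS) -> list:
    """Return the required elements that are absent or blank in the record."""
    return [key for key in required if not str(record.get(key) or "").strip()]


# Placeholder record; the values are illustrative only.
example_record = {
    "title": "Bays",
    "abstract": "Placeholder abstract for illustration.",
    "date": "2024-01-01",
    "spatial_reference_system": "GDA2020",
}
print(missing_elements(example_record))
```

A check like this only confirms that elements are present; full conformance would be validated against the schemas published with the metadata standards.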