Introduction to GIS Data
As a concept, GIS is the intersection of location and data.
As a real-world application, GIS is software that captures, manages, and displays data in relation to location.
Creating a GIS map requires data.
GIS data is a broad category with great variation in terms of:
- Data formats
- File types and extensions
- Data capture methods
- Use cases for the data
In this chapter, we’ll cover the two basic data types (vector and raster), common GIS file formats, LiDAR data, and resources for sourcing GIS data.
I. Vector Data
II. Raster Data
III. Common GIS File Types
IV. LiDAR Data
V. Sourcing GIS Data
Vector Data
Vector data is, essentially, a list of coordinates: one that provides instructions on how an image should be rendered. This means that:
- Vector images are high-fidelity graphical representations of an image or shape.
- This graphical property means that vector images are infinitely scalable: they can be enlarged or reduced with no quality loss. This makes them the preferred file type for web logos and large-scale prints.
- Vector images can only be created and manipulated with software, such as Adobe Illustrator or Sketch (or, for maps, GIS software). You cannot, for example, use a camera to capture a vector image.
Vector images consist of three basic components: points, lines, and polygons.
Vector points are basically x,y coordinates. They don’t have dimensions and usually represent single data points.
In GIS mapping, vector points illustrate features too small to be drawn at scale.
For example, cities shown on a country map are too small to be drawn at scale.
If you created a map only of the city, you could use lines to draw the city’s boundary. However, create a map of the entire country and that boundary is no longer visible - so a labeled point is used instead.
The map below illustrates this concept perfectly, representing each state capital as a labeled, star-shaped point.
U.S. capital cities represented as points [Source]
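To put rough numbers on this, the size a feature occupies on a printed map is its real-world size divided by the scale denominator. The minimal sketch below assumes a hypothetical legibility cutoff of 1 mm on the page; the function names and the threshold are illustrative, not taken from any GIS package.

```python
def size_on_map_mm(ground_size_m: float, scale_denominator: float) -> float:
    """Convert a real-world size in metres to its printed size in millimetres."""
    return ground_size_m / scale_denominator * 1000.0


def choose_symbol(ground_size_m: float, scale_denominator: float,
                  min_drawable_mm: float = 1.0) -> str:
    """Draw the outline if it is legible at this scale, otherwise use a point.

    min_drawable_mm is a hypothetical threshold, not a cartographic standard.
    """
    if size_on_map_mm(ground_size_m, scale_denominator) >= min_drawable_mm:
        return "polygon"
    return "point"


city_width_m = 20_000  # a city roughly 20 km across
print(choose_symbol(city_width_m, 50_000))      # city map at 1:50,000 -> polygon
print(choose_symbol(city_width_m, 50_000_000))  # country map at 1:50,000,000 -> point
```

At 1:50,000,000 the same city shrinks to about 0.4 mm on the page, which is why a labeled point takes over.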
Vector lines are a series of interconnected vector points.
They have distinct start and end points, and though lines can intersect one another, a single line generally does not intersect itself.
Lines are used to represent linear features such as rivers, roads, and trails.
Color, thickness, and line type (solid or dashed) are used to denote unique features, or unique attributes of the same feature.
For example, a heavily trafficked highway might be drawn with a thick line, whereas the residential roadway would be much thinner. Moreover, streets could be solid black lines, while the river might be dotted and blue.
Stylistic choices like these are at the map maker’s discretion, but they can add depth and visual interest to the map.
Nürnberg subway system represented with lines [Source]
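In practice, that choice is usually driven by a feature's attributes. The sketch below shows one simple way to do it, assuming each line feature carries a hypothetical `kind` attribute; the style values and names are illustrative only, and real GIS software exposes this through its own symbology settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineStyle:
    color: str
    width_px: float
    dash: str  # "solid" or "dotted"


# Hypothetical style rules keyed on a feature's "kind" attribute.
STYLES = {
    "highway": LineStyle("black", 4.0, "solid"),      # heavy traffic: thick line
    "residential": LineStyle("black", 1.0, "solid"),  # local street: thin line
    "river": LineStyle("blue", 2.0, "dotted"),
}
DEFAULT_STYLE = LineStyle("gray", 1.0, "solid")


def style_for(feature: dict) -> LineStyle:
    """Pick a line style from the feature's attributes, falling back to a default."""
    return STYLES.get(feature.get("kind"), DEFAULT_STYLE)


features = [
    {"name": "Route 1", "kind": "highway"},
    {"name": "Elm Street", "kind": "residential"},
    {"name": "Mill River", "kind": "river"},
]
for f in features:
    s = style_for(f)
    print(f"{f['name']}: {s.color}, {s.width_px}px, {s.dash}")
```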
Polygons are lines in which the first point is also the last: creating a shape.
Polygons represent features with distinct boundaries: states, counties, property lines, lakes, or forests.
Topographic map represented with polygons [Source]
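Because points, lines, and polygons are just coordinate lists, their properties can be computed directly from those lists. The minimal sketch below assumes planar (projected) x,y coordinates; latitude/longitude data would need geodesic formulas instead. It checks that a polygon ring closes on itself and measures a line's length and a polygon's area with the shoelace formula.

```python
import math

Point = tuple[float, float]  # (x, y) in planar units


def line_length(coords: list[Point]) -> float:
    """Sum of the straight segments between consecutive vertices."""
    return sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))


def is_closed(ring: list[Point]) -> bool:
    """A polygon ring repeats its first vertex as its last."""
    return len(ring) >= 4 and ring[0] == ring[-1]


def polygon_area(ring: list[Point]) -> float:
    """Planar area of a closed ring via the shoelace formula."""
    if not is_closed(ring):
        raise ValueError("polygon ring must start and end at the same point")
    twice_area = sum(x1 * y2 - x2 * y1
                     for (x1, y1), (x2, y2) in zip(ring, ring[1:]))
    return abs(twice_area) / 2.0


capital: Point = (3.0, 4.0)                                           # point
trail = [(0.0, 0.0), (3.0, 4.0), (6.0, 0.0)]                          # line
lake = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (0.0, 0.0)]  # polygon

print(line_length(trail))   # 10.0
print(polygon_area(lake))   # 12.0
```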
Raster Data
Where vector data (coordinates that create an image) is somewhat abstract, raster data is quite literal.
Raster data is grid or pixel based.
Raster data is commonly found as aerial photography, topographic maps, and satellite imagery, and raster file extensions include TIFF, PNG, and JPEG.
In GIS mapping, raster data generally represents surfaces.
Unlike vector data, raster data cannot be scaled infinitely. Enlarge it too much and it becomes fuzzy and pixelated. Stretch it too far in one direction and the features distort.
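One common way to enlarge a raster is nearest-neighbor resampling, which simply repeats each pixel. The sketch below (plain Python, no imaging library assumed) shows why enlargement adds no detail: every original cell just becomes a bigger block.

```python
def upscale_nearest(grid: list[list[int]], factor: int) -> list[list[int]]:
    """Enlarge a raster by repeating each cell factor x factor times.

    No new detail is created: each original pixel becomes a larger block,
    which is what visible pixelation looks like.
    """
    out = []
    for row in grid:
        wide_row = [value for value in row for _ in range(factor)]
        out.extend(list(wide_row) for _ in range(factor))
    return out


tiny = [[0, 9],
        [9, 0]]
for row in upscale_nearest(tiny, 3):
    print(row)
# [0, 0, 0, 9, 9, 9]
# [0, 0, 0, 9, 9, 9]
# [0, 0, 0, 9, 9, 9]
# [9, 9, 9, 0, 0, 0]
# [9, 9, 9, 0, 0, 0]
# [9, 9, 9, 0, 0, 0]
```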
Despite these limitations, raster data does have advantages; chiefly, it provides a level of detail not possible with vectors.
Take digital photographs as an example.
Photographs provide an immense level of contextual detail and represent the subtleties of light and color quite accurately. They are also one of the most common raster data types.
Consider the images below. The first depicts vector images of trees; the second is a raster photograph.
Both images depict trees accurately. However, the raster photograph is not only more detailed but also more visually nuanced.
In terms of GIS mapping, raster data comes in two types: discrete and continuous.
Discrete data can only take specific values, whereas continuous data can take any value within a range. For example:
The number of people in a room is a discrete value. You can have any number of people, but you can’t have half a person. You’re limited to whole numbers: no decimals or percents.
Continuous data is more flexible, including values such as height, weight, and length. A person’s height can be any value within the range of human heights. In fact, most people’s height is not exact to the inch or foot.
Continuous and discrete data, though complementary, do have different applications.
The map on the left illustrates discrete raster data.
Each value is assigned its own color, and each cell holds exactly one value and one color: there’s no gradation of either.
In contrast, the map on the right represents continuous data.
Values change gradually from cell to cell, so the colors shade smoothly across the map. Continuous rasters are often used to represent data that experiences gradual change: temperature, population, elevation, etc.
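The difference shows up in how each kind of raster is colored. In the sketch below, the class codes, colors, and elevation values are hypothetical: the discrete grid looks each code up in a fixed color table, while the continuous grid blends linearly between two RGB colors, so similar values get similar shades.

```python
# Discrete raster: every cell holds one class code, and each code has one fixed color.
# Hypothetical codes: 1 = water, 2 = forest, 3 = urban.
CLASS_COLORS = {1: "blue", 2: "green", 3: "gray"}
land_cover = [[1, 1, 2],
              [2, 3, 3]]
discrete = [[CLASS_COLORS[code] for code in row] for row in land_cover]


def ramp(value: float, low: float, high: float,
         start=(0, 128, 0), end=(255, 255, 255)) -> tuple:
    """Continuous raster: blend linearly from the start RGB (at low) to the end RGB (at high)."""
    t = min(max((value - low) / (high - low), 0.0), 1.0)
    return tuple(round(s + (e - s) * t) for s, e in zip(start, end))


# Sample elevations: any number within the range is a valid cell value.
elevation_m = [[120.0, 135.5, 150.2],
               [180.7, 210.0, 240.0]]
low = min(min(row) for row in elevation_m)
high = max(max(row) for row in elevation_m)
continuous = [[ramp(v, low, high) for v in row] for row in elevation_m]

print(discrete[0])       # ['blue', 'blue', 'green']
print(continuous[0][0])  # (0, 128, 0)
print(continuous[1][2])  # (255, 255, 255)
```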
Common GIS File Types
Shapefiles
Shapefiles are, by far, the most common GIS file type. Developed by GIS powerhouse ESRI, shapefiles are a way to store and share GIS vector data.
Shapefiles combine non-topological geometry with associated attributes.
To break down what that really means, let’s return to our original definition of GIS: the intersection of data and location.
The non-topological geometry is the location: the coordinates of every point, line, and polygon. In this context, non-topological means the file stores each feature’s shape on its own, without recording topology (the spatial relationships between features, such as which polygons share a border or which lines connect).
Examples of this kind of geometry include street, state, or area maps.
Associated attributes are the data.
Consider an elevation contour map. The geometry (the x,y coordinates of each contour line) records where each line runs, while an associated attribute stores the elevation that line represents.
Attributes aren’t limited to physical measurements such as elevation, either.
Drought conditions in the United States are a good example of a map that could be stored and shared as a shapefile. The map of the United States would be the non-topological data, while the drought conditions data would be the attributes.
Components of a Shapefile
Though the name sounds singular, a shapefile is actually a set of at least three files, all sharing the same base name, that must be present in order to render the data correctly.
| Component | Extension | Description |
|---|---|---|
| Main | .SHP | Contains the shape coordinates: essentially describing all the basic shapes within the file. |
| Index | .SHX | The index file, which helps GIS software find features more quickly within the main .SHP file. |
| dBase | .DBF | Contains the attribute data for each feature stored in the .SHP file. |
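Because all three files must travel together, a quick check before loading can save some confusion. The sketch below assumes the components share a base name and use lowercase extensions (some datasets use uppercase ones); `missing_shapefile_parts` is a hypothetical helper, not part of any GIS library.

```python
import tempfile
from pathlib import Path

# The three mandatory members of a shapefile.
REQUIRED_EXTENSIONS = (".shp", ".shx", ".dbf")


def missing_shapefile_parts(shp_path: str) -> list[str]:
    """List the mandatory shapefile components that are missing on disk.

    All members share one base name and differ only by extension,
    e.g. counties.shp, counties.shx, counties.dbf.
    """
    base = Path(shp_path)
    expected = [base.with_suffix(ext) for ext in REQUIRED_EXTENSIONS]
    return [p.name for p in expected if not p.exists()]


# Usage: a folder where someone forgot to copy the .shx index.
with tempfile.TemporaryDirectory() as folder:
    for ext in (".shp", ".dbf"):
        (Path(folder) / f"counties{ext}").touch()
    print(missing_shapefile_parts(str(Path(folder) / "counties.shp")))  # ['counties.shx']
```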
Other Common File Types
With 60+ GIS file types, each with unique characteristics and use cases, the sheer number of geospatial file formats can be overwhelming.
That said, many of these file types are specialized and/or only supported by one program - limiting their everyday use.
Below we cover five of the most common, widely used GIS file types.
For a more comprehensive list of GIS file types, be sure to check out one of the resources at the bottom of this section.
Compared to other web-based data formats, JSON is lightweight and fairly straightforward.
JSON files generally contain two elements:
These files store coordinates as text, but render in a visual format.
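The text here doesn't name the variant, but in GIS the JSON flavor in everyday use is GeoJSON (standardized as RFC 7946), so the sketch below assumes GeoJSON. Each Feature pairs a `geometry` (the location) with a `properties` object (the attributes), and positions are written as [longitude, latitude]. The coordinates and property values are placeholders.

```python
import json

# A GeoJSON Feature: geometry = location, properties = attribute data.
feature = {
    "type": "Feature",
    "geometry": {
        "type": "Point",
        "coordinates": [-77.0365, 38.8977],  # [longitude, latitude]
    },
    "properties": {
        "name": "Example point",
        "category": "landmark",
    },
}
collection = {"type": "FeatureCollection", "features": [feature]}

text = json.dumps(collection, indent=2)  # stored and shared as plain text
print(text)

parsed = json.loads(text)
lon, lat = parsed["features"][0]["geometry"]["coordinates"]
print(lon, lat)  # -77.0365 38.8977
```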
| Extension | Description |
|---|---|
| .TIF .TIFF .OVR | TIFF files are raster image files, most closely related to the JPEG, PNG, and GIF file types. |
Unlike JPEG and other compressed raster formats, TIFFs are usually stored uncompressed or with lossless compression only, so file sizes stay large. As such, they're not optimal for use on websites.
That said, they do offer the most flexibility in terms of editing and adding transparency, tags, and layers.
GeoTIFFs are TIFF files that contain georeferencing metadata. The metadata acts as instructions on where to place the image on the map.
Supported by most platforms, GeoTIFF files are the industry standard for satellite imagery and other GIS image files.
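One common way that metadata expresses location is an affine geotransform: six numbers that convert a pixel's column and row into map coordinates. The sketch below follows the coefficient order GDAL uses; GeoTIFFs can also be georeferenced with tie points or other tags, which are ignored here, and the raster values are hypothetical.

```python
from typing import NamedTuple


class GeoTransform(NamedTuple):
    """Affine georeferencing coefficients, in the order GDAL reports them."""
    origin_x: float      # map x of the top-left corner of the top-left pixel
    pixel_width: float   # cell size in x
    row_rotation: float  # 0 for north-up images
    origin_y: float      # map y of the top-left corner
    col_rotation: float  # 0 for north-up images
    pixel_height: float  # cell size in y, negative for north-up images


def pixel_to_map(gt: GeoTransform, col: float, row: float) -> tuple[float, float]:
    """Convert a pixel position (column, row) to map coordinates (x, y)."""
    x = gt.origin_x + col * gt.pixel_width + row * gt.row_rotation
    y = gt.origin_y + col * gt.col_rotation + row * gt.pixel_height
    return x, y


# Hypothetical north-up raster: 30 m cells, top-left corner at (500000, 4200000)
# in some projected coordinate system measured in metres.
gt = GeoTransform(500_000.0, 30.0, 0.0, 4_200_000.0, 0.0, -30.0)
print(pixel_to_map(gt, 0, 0))      # (500000.0, 4200000.0)  top-left corner
print(pixel_to_map(gt, 10, 20))    # (500300.0, 4199400.0)
print(pixel_to_map(gt, 0.5, 0.5))  # (500015.0, 4199985.0)  centre of the first pixel
```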
| File type | Description |
|---|---|
| ESRI File Geodatabase | File geodatabases allow users to store all thematically related data in a single database. |
Each database can organize and store vector and raster files, relationship classes, attribute tables, and spatial data.
Users can create multiple thematic databases as needed.
Like shapefiles, geodatabases are a proprietary format created by ESRI.
Geodatabases and shapefiles can achieve similar goals. However, geodatabases offer significant advantages:
Personal geodatabases were the precursor to file geodatabases and store their data in Microsoft Access (.mdb) files.
To learn more about personal geodatabases, as well as the differences between the two geodatabase types, check out the article below.
Learning resource: What is a Geodatabase? Personal vs File Geodatabase
| File type | Description |
|---|---|
| Keyhole Markup Language | KML stands for Keyhole Markup Language. As the default file format for Google Earth, it’s likely the best-known GIS file type outside of professional GIS circles. |
KMZ, short for KML-Zipped, is the compressed version of KML.
KML files contain both geometry and attribute data.
They also contain a variety of configuration options that, though they add significant value to Google Earth as an application, limit the use of KML files elsewhere.
This format was originally developed by Keyhole Inc, which was later bought by Google.
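To make the KML/KMZ relationship concrete, the sketch below builds a one-placemark KML document with Python's standard library and zips it into a KMZ. It assumes the KML 2.2 namespace and the common convention of naming the main file in a KMZ `doc.kml`; the placemark name and coordinates are placeholders. Note that KML writes coordinates as longitude,latitude.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"
ET.register_namespace("", KML_NS)


def placemark_kml(name: str, lon: float, lat: float, description: str = "") -> bytes:
    """Build a minimal KML document holding one point Placemark."""
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    pm = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
    ET.SubElement(pm, f"{{{KML_NS}}}name").text = name               # attribute data
    ET.SubElement(pm, f"{{{KML_NS}}}description").text = description
    point = ET.SubElement(pm, f"{{{KML_NS}}}Point")                  # geometry
    ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="utf-8", xml_declaration=True)


def to_kmz(kml_bytes: bytes) -> bytes:
    """Zip a KML document into KMZ, storing it under the conventional name doc.kml."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("doc.kml", kml_bytes)
    return buf.getvalue()


kml = placemark_kml("Trailhead", -122.4194, 37.7749, "Example placemark")
kmz = to_kmz(kml)
with zipfile.ZipFile(io.BytesIO(kmz)) as zf:
    print(zf.namelist())  # ['doc.kml']
```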
| File type | Description |
|---|---|
| Comma-Separated Values File | CSV stands for comma-separated values. |
As the name suggests, CSV files are a list of data points (values) separated by commas.
As text files, they are easily the simplest file format here, making them ideal for transferring data between programs.
Though not technically a mapping format, CSV files are frequently used to create point layers in GIS platforms. For this to be successful, the CSV file must have columns for both x and y coordinates.
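Here is a minimal sketch of that CSV-to-points step, assuming the coordinate columns are named `x` and `y` and hold longitude and latitude; the sample rows are made up. Rows with a blank or non-numeric coordinate are skipped, since they have no location to plot.

```python
import csv
import io

# Hypothetical sample: one row per monitoring site, longitude in x, latitude in y.
SAMPLE_CSV = """site,x,y,reading
Well A,-97.74,30.27,12.5
Well B,-97.70,30.30,8.1
Well C,,30.25,9.9
"""


def csv_to_points(text: str, x_col: str = "x", y_col: str = "y") -> list[tuple[float, float, dict]]:
    """Convert CSV rows into (x, y, attributes) tuples for a point layer."""
    reader = csv.DictReader(io.StringIO(text))
    missing = {x_col, y_col} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"CSV lacks coordinate column(s): {sorted(missing)}")
    points = []
    for row in reader:
        try:
            x, y = float(row[x_col]), float(row[y_col])
        except (TypeError, ValueError):
            continue  # blank or non-numeric coordinate: no location, so skip the row
        attrs = {k: v for k, v in row.items() if k not in (x_col, y_col)}
        points.append((x, y, attrs))
    return points


for x, y, attrs in csv_to_points(SAMPLE_CSV):
    print(x, y, attrs)
# -97.74 30.27 {'site': 'Well A', 'reading': '12.5'}
# -97.7 30.3 {'site': 'Well B', 'reading': '8.1'}
```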
LiDAR Data
Originating as the combination of the words 'radar' and 'light,' the term LiDAR is now used as an acronym for 'light detection and ranging.'
LiDAR is a surveying method that employs lasers to measure distance.
Laser light pulses leave the LiDAR system, bounce off the ground or other objects, and return to the sensor. Distance is measured by tracking how long a pulse takes to return.
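Because the pulse travels to the target and back, the one-way distance is half the round-trip path: distance = (speed of light × time) / 2. The sketch below applies that formula, using the speed of light in a vacuum as an approximation; real systems apply further corrections.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0  # in a vacuum; light in air is very slightly slower


def range_from_round_trip(seconds: float) -> float:
    """Distance to a target from a pulse's round-trip time.

    The pulse travels out and back, so the one-way distance is half the path.
    """
    return SPEED_OF_LIGHT_M_S * seconds / 2.0


print(round(range_from_round_trip(1e-6), 1))  # 149.9  (a 1 microsecond echo)
print(round(range_from_round_trip(1e-9), 3))  # 0.15   (each nanosecond is about 15 cm)
```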
Light moves incredibly fast, and a LiDAR system fires many thousands of laser pulses per second as it scans its surroundings. This means that LiDAR devices can create point clouds: complex scans made of millions of individual points.
Though point cloud is an accurate description, that terminology doesn’t really reflect the awesome reality.
Point clouds are highly detailed 3D maps, illustrating everything from a downtown core to a national forest.
Unlike radar and sonar, LiDAR is not necessarily inhibited by object interference. A single LiDAR pulse can produce multiple returns: as the beam spreads, part of it reflects off the first object it meets while the rest continues on, and each surface it strikes sends its own echo back to the sensor.
This makes LiDAR particularly useful for mapping vegetated areas. For example, when surveying a national forest, the LiDAR pulses won't stop at the top of the tree canopy: part of each pulse passes through gaps in the foliage, producing returns all the way down to the ground.
The two most common LiDAR maps are digital elevation models (DEM) and canopy height models (CHM): both made possible by the multiple returns property.
For DEMs, you would take a full LiDAR scan and then filter for the last return: remembering that the last return generally represents ground points. With these filtered data points you can then create a bare earth map, one that excludes all but the surface of the Earth itself.
For CHMs, the idea is similar. Filter for the first return (in this case, the top of the tree) and then subtract the final return (the ground). This leaves the height of each tree in the area, allowing you to create a full canopy height map.
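The two filters can be sketched in a few lines of Python. The point fields below mirror the return number and number-of-returns values stored in LAS files, but the `LidarPoint` class, the 1 m grid, and the per-cell min/max rules are simplifying assumptions; production workflows typically also rely on ground classification and interpolation. Treating each cell's lowest last return as ground and its highest first return as canopy top, the CHM is their difference.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LidarPoint:
    # Field meanings modelled on the LAS point record; this class itself is hypothetical.
    x: float
    y: float
    z: float
    return_number: int      # 1 = first return of its pulse
    number_of_returns: int  # how many returns that pulse produced

    @property
    def is_first(self) -> bool:
        return self.return_number == 1

    @property
    def is_last(self) -> bool:
        return self.return_number == self.number_of_returns


def rasterize(points, cell_size, reduce):
    """Bin points into square grid cells and reduce each cell's z values to one number."""
    cells = defaultdict(list)
    for p in points:
        cells[(int(p.x // cell_size), int(p.y // cell_size))].append(p.z)
    return {cell: reduce(zs) for cell, zs in cells.items()}


def dem_and_chm(points, cell_size=1.0):
    # DEM: last returns approximate the ground; keep the lowest per cell.
    dem = rasterize([p for p in points if p.is_last], cell_size, min)
    # Surface model: first returns approximate canopy tops; keep the highest per cell.
    dsm = rasterize([p for p in points if p.is_first], cell_size, max)
    # CHM: canopy top minus ground, for cells where both are known.
    chm = {cell: dsm[cell] - dem[cell] for cell in dsm if cell in dem}
    return dem, chm


points = [
    LidarPoint(0.4, 0.5, 120.0, 1, 3),  # pulse over a tree: treetop...
    LidarPoint(0.4, 0.5, 112.0, 2, 3),  # ...a branch...
    LidarPoint(0.4, 0.5, 100.0, 3, 3),  # ...and the ground
    LidarPoint(1.5, 0.5, 101.0, 1, 1),  # pulse over open ground: first and last at once
]
dem, chm = dem_and_chm(points)
print(dem)  # {(0, 0): 100.0, (1, 0): 101.0}
print(chm)  # {(0, 0): 20.0, (1, 0): 0.0}
```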
Sourcing GIS Data
GIS data comes in many forms and from a huge variety of sources.
In an ideal world, you’d either have the tools to collect the data yourself, or access to the appropriate databases.
In reality, you’ll often need to source the data yourself.
Luckily, there is a massive amount of open-source map data online. A few well-placed Google searches can unearth an abundance of valuable resources.
Many counties maintain databases of their own GIS data, the majority of which is available for free download. There are also several open-source databases that are a great starting point for people looking to find a specific data type.
Below you’ll find the top 5 sources for free GIS data, as well as resources for further research.
Natural Earth Data
Best for: Cultural, physical, and basemap data
Natural Earth Data (NED) is an especially excellent resource for cartographers, topping several lists for best open-source GIS database.
NED offers a combination of both vector and raster data sets, most of which are available at three map scales (1:10 million, 1:50 million, and 1:110 million).
Supported by the North American Cartographic Information Society, NED hosts data on a global scale and should be your first stop for beautiful GIS map making.
OpenStreetMap
Best for: High spatial resolution vector data
OpenStreetMap (OSM) offers high spatial resolution vector data. What differentiates OSM from other GIS data sources is that all the data on OSM is crowdsourced from volunteer contributors, including cartographers and other GIS map makers.
This means there is a massive amount of detailed information available.
The downside of the crowdsource format is that nothing is vetted beforehand. Verifying data accuracy is difficult and some data sets are incomplete.
That said, most anecdotal evidence points to a high degree of accuracy. Because contributors also rely on the map themselves, it’s in everyone’s interest to upload accurate data that improves the database as a whole.
USGS EarthExplorer
Best for: Remote sensing data
USGS EarthExplorer is easily one of the most comprehensive sources for remote sensing data, i.e., data collected by satellites or aircraft.
With one of the more user-friendly search functions and the ability to download in bulk, USGS is an invaluable resource for cartographers in need of satellite or aerial data.
OpenTopography
Best for: LiDAR data
OpenTopography is one of the few online resources for full LiDAR datasets. It is not globally comprehensive: around 90% of its resources focus on the United States, Canada, Australia, Brazil, Haiti, Mexico, and Puerto Rico.
That said, in the world of GIS, LiDAR data is a scarce and precious resource. So despite the limitations, these data sets can be invaluable for people whose projects focus on those countries.
NASA Earth Observations
Best for: Global satellite imagery
NASA Earth Observations (NEO) is another resource that focuses on remote sensing data. This resource is unique because the data is climate and environment centered, making atmosphere, land, ocean, energy, and human life data more accessible.
In addition, these resources are updated quite consistently (keeping the data current) and are available in a multitude of formats: JPEG, PNG, KML, and GeoTIFF.