GIS Data

Introduction to GIS Data

Just jumping in? Be sure to check out Parts 1 and 2: What is GIS and GIS Mapping

In Parts 1 and 2, we answered the question 'What is GIS?' and covered the basics of GIS mapping. Here's a quick refresher:

As a concept, GIS is the intersection of location and data.

As a real world application, GIS is software that captures, manages, and displays data in relation to location.

With numerous use cases and map types, GIS mapping is easily the most common application for GIS software. However, creating a GIS map requires more than just x,y coordinates.

Creating a GIS map requires data.

GIS data is a broad category with great variation in terms of:

  • Data formats
  • File types and extensions
  • Data capture methods
  • Use cases for the data

In this chapter, we’ll cover the two basic data types (vector and raster), common GIS file formats, as well as resources for sourcing GIS data.

Vector Data

Vector data is, essentially, a list of coordinates: one that provides instructions on how an image should be rendered. This means that:

Vector images are high-fidelity graphical representations of an image or shape.

This graphical property means that vector images are infinitely scalable: enlarged or reduced with no quality loss. This makes them the preferred file type for web logos and large-scale prints.

Vector images can only be created and manipulated with a computer program like Adobe Illustrator or Sketch. You cannot, for example, use a camera to capture a vector image.

Vector images consist of three basic components: points, lines, and polygons.

Vector Points

Vector points are basically x,y coordinates. They don’t have dimensions and usually represent single data points.

In GIS mapping, vector points illustrate features too small to be drawn at scale.

For example, cities shown on a country map are too small to be drawn at scale.

If you created a map only of the city, you could use lines to draw the city’s boundary. However, create a map of the entire country and that boundary is no longer visible - so a labeled point is used instead.

The map below illustrates this concept: each state capital appears as a labeled, star-shaped point.

map of United States capital cities

U.S. capital cities represented as points [Source]

Vector Lines

Vector lines are a series of interconnected vector points.

They have distinct start and end points and though they can intersect with one another, a single line will not intersect with itself.

Lines are used to represent linear features such as rivers, roads, and trails.

Color, thickness, and line type (solid or dashed) are used to denote unique features, or unique attributes of the same feature.

For example, a heavily trafficked highway might be drawn with a thick line, whereas a residential road would be much thinner. Moreover, streets could be solid black lines, while a river might be dotted and blue.

Stylistic choices like these are at the map maker's discretion, but can add depth and visual interest to the map.

map of Nurnberg U-Bahn subway system

Nürnberg subway system represented with lines [Source]

Polygons

Polygons are lines in which the first point is also the last: creating a shape.

Polygons represent features with distinct boundaries: states, counties, property lines, lakes, or forests.

Though they're most frequently used to represent perimeter, with modern GIS, polygons can also be used to measure a feature’s area.
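To make the area measurement concrete, here is a minimal sketch using the shoelace formula. It assumes the polygon is a closed ring of planar (projected) x,y coordinates, for example in meters. Raw latitude/longitude values would need to be projected first. The function name and the parcel coordinates are made up for illustration.

```python
def polygon_area(ring):
    """Return the planar area enclosed by a closed ring of (x, y) tuples."""
    # A polygon is a line whose first point is also its last point.
    if ring[0] != ring[-1]:
        raise ValueError("ring must be closed: first point must equal last point")
    total = 0.0
    # Shoelace formula: sum the cross products of consecutive vertices.
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Hypothetical rectangular parcel, 40 m by 25 m, in projected coordinates
parcel = [(0, 0), (40, 0), (40, 25), (0, 25), (0, 0)]
print(polygon_area(parcel))  # 1000.0
```

GIS software applies the same idea, with extra handling for holes, map projections, and the curvature of the Earth.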

topography map

Topographic map represented with polygons [Source]

Raster Data

Where vector data - coordinates that create an image - is somewhat abstract, raster data is quite literal.

Raster data is grid or pixel based.

Commonly found as aerial photography, topographic maps, and satellite imagery, raster file extensions include TIFF, PNG, and JPEG.

In GIS mapping, raster data generally represents surfaces.

Unlike vector data, raster data cannot be scaled infinitely. Enlarge it too much and it becomes fuzzy and pixelated. Stretch too much in one direction and the features distort.

Despite these limitations, raster data does have advantages; chiefly, it provides a level of detail not possible with vectors.

Take digital photographs as an example.

Photographs provide an immense level of contextual detail and represent the subtleties of light and color quite accurately. They are also one of the most common raster data types.

Consider the images below. The first depicts vector images of trees, while the second is a raster photograph.

Both images depict trees accurately. However, the raster photograph is not only more detailed, but also more visually nuanced.

vector drawing of trees

Vector drawing of trees [Source] [Source]

raster image of a tree

Raster image of trees [Source]

In terms of GIS mapping, raster data comes in two types: discrete and continuous.

Discrete data can only take specific values, whereas continuous data can take any value within a range. For example:

The number of people in a room is a discrete value. You can have any number of people, but you can’t have half a person. You’re limited to whole numbers: no decimals or percents.

Continuous data is more flexible, including values such as height, weight, and length. A person's height can be any value within the range of human heights. In fact, most people's height is not exact to the inch or foot.

Continuous and discrete data, though complementary, do have different applications.

discrete raster data map

Map of discrete data [Source]

continuous data raster map

Map of continuous data [Source]

The first map above illustrates discrete raster data.

Each value is assigned a different color, while each cell has only one data type and one color: there's no gradation of either.

In contrast, the second map represents continuous data.

Values shift gradually from one cell to the next, so the map shows smooth gradation rather than hard edges. Continuous rasters are often used to represent data that changes gradually: temperature, population density, elevation, etc.
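The difference is easy to see in a small example. The sketch below stores two tiny, made-up rasters as nested lists: one discrete (land cover codes) and one continuous (elevation in meters). The legend, the values, and the helper names are all assumptions for illustration. Counting cells per class makes sense for the discrete grid, while averaging makes sense for the continuous one.

```python
from collections import Counter

# Discrete raster: each cell holds a category code (hypothetical legend)
LAND_COVER = {1: "water", 2: "forest", 3: "urban"}
land_cover = [
    [1, 1, 2],
    [1, 2, 2],
    [3, 3, 2],
]

# Continuous raster: each cell holds a measured value (elevation in meters)
elevation = [
    [12.4, 13.1, 15.8],
    [12.9, 14.6, 17.2],
    [13.5, 16.0, 19.7],
]


def class_counts(grid, legend):
    """Count how many cells fall into each named category."""
    counts = Counter(cell for row in grid for cell in row)
    return {legend[code]: n for code, n in counts.items()}


def mean_value(grid):
    """Average every cell value in the grid."""
    cells = [cell for row in grid for cell in row]
    return sum(cells) / len(cells)


print(class_counts(land_cover, LAND_COVER))  # {'water': 3, 'forest': 4, 'urban': 2}
print(round(mean_value(elevation), 2))       # 15.02
```

Averaging the land cover codes would produce a meaningless number, which is why the two raster types call for different kinds of analysis.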

Additional resources
Vector vs Raster: What’s the Difference Between GIS Spatial Data Types?
What's the difference between discrete data and continuous data?
Types of GIS Data Explored: Vector and Raster

Shapefiles

Shapefiles are, by far, the most common GIS file type. Developed by GIS powerhouse ESRI, shapefiles are a way to store and share GIS vector data.

Shapefiles combine non-topological geometry with associated attributes.

To break down what that really means, let's return to our original definition of GIS: the intersection of data and location.

The non-topological geometry is the location. It consists of the coordinates that draw each point, line, or polygon. 'Non-topological' means that each feature is stored on its own: the file does not record how features relate to one another, such as which polygons share a border or which lines connect.

Examples of geometry include the streets, state boundaries, or area outlines on a map.

Associated attributes are the data.

Consider an elevation map. The geometry (x,y coordinates) illustrates the base terrain, while the associated attributes (elevation values) represent the elevation profile.

london street map

Street map (geometry) [Source]

world elevation map

Elevation map (geometry plus an elevation attribute) [Source]

Attribute data isn't limited to physical measurements such as elevation.

Drought conditions in the United States are a good example of a map that could be stored and shared as a shapefile. The map of the United States would be the geometry, while the drought conditions data would be the attributes.

Components of a Shapefile

Though shapefile sounds singular, there are actually at least three files that must be present in order to render a shapefile correctly.

File type | Extension | Description
Main | .SHP | Contains the shape coordinates: essentially describing all the basic shapes within the file.
Index | .SHX | The index file, which records where each shape sits inside the main SHP file so the GIS software can find features more quickly.
dBase | .DBF | Contains all the attribute data for the features in the main SHP file.
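Because a shapefile is really a bundle of files, a quick check before sharing or loading one can save some confusion. The sketch below, using only the Python standard library, reports which of the three required files are missing next to a given .shp path. The function name and the example path are hypothetical. It also assumes lower-case extensions, which matters on case-sensitive file systems.

```python
from pathlib import Path

# The three files every shapefile needs, per the table above
REQUIRED_EXTENSIONS = (".shp", ".shx", ".dbf")


def missing_components(shp_path):
    """Return the required extensions that have no file next to shp_path."""
    base = Path(shp_path)
    return [
        ext for ext in REQUIRED_EXTENSIONS
        if not base.with_suffix(ext).exists()
    ]


# Hypothetical path: prints whichever of the three files cannot be found
print(missing_components("data/roads.shp"))
```

An empty list means all three files are present. Anything else tells you which file to track down before opening the layer.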

Other Common File Types

With 60+ GIS file types, each with unique characteristics and use cases, the sheer number of geospatial file formats can be overwhelming.

That said, many of these file types are specialized and/or only supported by one program - limiting their everyday use.

Below we cover five of the most common, widely used GIS file types. Click any of the links below to jump there directly.

GEOJSON

GeoTIFF

GDB

KML/KMZ

CSV

For a more comprehensive list of GIS file types, be sure to check out one of the resources at the bottom of this section.

File type: Geographic JavaScript Object Notation (GeoJSON)
Extensions: .GEOJSON .JSON
Description: GeoJSON is a vector file format that encodes geographical data using JavaScript Object Notation (JSON), a data formatting language.

Compared to other web-based formats, JSON is lightweight and fairly straightforward.

JSON files generally contain two elements:

  • Name/value pairs
  • Lists of values

GeoJSON files contain those elements, as well as a geometry component.

These files store coordinates as text, which GIS software then renders as shapes on a map.
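To show how those pieces fit together, here is a minimal sketch that builds a one-feature GeoJSON document with Python's built-in json module and reads it back. The property names and values are made up for illustration. The "type", "geometry", "coordinates", and "properties" keys come from the GeoJSON format itself, which lists longitude before latitude.

```python
import json

# One point feature: a geometry component plus name/value pairs
feature = {
    "type": "Feature",
    "geometry": {
        "type": "Point",
        "coordinates": [-77.0369, 38.9072],  # longitude, latitude
    },
    "properties": {"name": "Washington, D.C.", "kind": "capital"},
}

# A FeatureCollection is a list of features
collection = {"type": "FeatureCollection", "features": [feature]}

# Serialize to text, as it would be stored in a .geojson file
text = json.dumps(collection, indent=2)

# Parse the text back and pull out the coordinates
parsed = json.loads(text)
lon, lat = parsed["features"][0]["geometry"]["coordinates"]
print(lon, lat)  # -77.0369 38.9072
```

The whole file is plain text, which is why GeoJSON is easy to inspect, edit by hand, and pass between web applications.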

File type: GeoTIFF
Extensions: .TIF .TIFF .OVR
Description: TIFF files are raster image files: most closely related to JPEG, PNG, and GIF file types.

TIFF files are usually stored uncompressed or with lossless compression, so they tend to be much larger than comparable JPEG files. As such, they're not optimal for use on websites.

That said, they do offer the most flexibility in terms of editing and adding transparency, tags, and layers.

GeoTIFFs are TIFF files that contain location metadata. The metadata acts as instructions on how to position the image on the map.

Supported by most platforms, GeoTIFF files are the industry standard for satellite imagery and other GIS image files.
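As a rough illustration of what that location metadata makes possible, the sketch below converts a pixel position into map coordinates. It is a simplification under stated assumptions: a north-up image, a known upper-left corner, and a known pixel size. The function name and all of the numbers are hypothetical, and in practice a GIS library reads these values from the file's tags for you.

```python
def pixel_to_map(col, row, origin_x, origin_y, pixel_width, pixel_height):
    """Return map coordinates for the center of pixel (col, row).

    Assumes a north-up raster whose upper-left corner sits at
    (origin_x, origin_y), so x grows to the right and y shrinks downward.
    """
    x = origin_x + (col + 0.5) * pixel_width
    y = origin_y - (row + 0.5) * pixel_height
    return x, y


# Hypothetical image: upper-left corner at (500000, 4200000), 30 m pixels
print(pixel_to_map(0, 0, 500000.0, 4200000.0, 30.0, 30.0))  # (500015.0, 4199985.0)
```

Without this metadata, a TIFF is just a picture. With it, every pixel corresponds to a place.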

File type: ESRI File Geodatabase
Extension: .GDB
Description: File geodatabases allow users to store all thematically related data in a single database.

Each database can organize and store vector and raster files, relationship classes, attribute tables, and spatial data.

Users can create multiple thematic databases as needed.

Like shapefiles, the geodatabase format was created by ESRI.

Geodatabases and shapefiles can achieve similar goals. However, geodatabases offer significant advantages:

  • Faster performance
  • Topological organization
  • Raster capability
  • Data compression
  • Up to 1TB file sizes

There are actually two types of geodatabases: file (GDB) and personal (MDB).

Personal geodatabases were the precursor to file geodatabases and are stored as Microsoft Access (.MDB) files.

To learn more about personal geodatabases, as well as the differences between the two database types, check out the article below.

Learning resource: What is a Geodatabase? Personal vs File Geodatabase

File type: Keyhole Markup Language (KML)
Extensions: .KML .KMZ
Description: KML stands for Keyhole Markup Language. As the default file format for Google Earth, it's likely the best known GIS file type outside of professional GIS circles.

KMZ is the compressed version of KML, signifying KML-Zipped.

KML files contain both geometry and attribute data.

They also contain a variety of configuration options that, though they add significant value to Google Earth as an application, limit the use of KML files elsewhere.

This format was originally developed by Keyhole Inc, which was later bought by Google.

File type: Comma-Separated Values (CSV)
Extension: .CSV
Description: CSV stands for comma-separated values.

As the name suggests, CSV files are a list of data points (values) separated by commas.

As text files, they are easily the simplest file format here, making them ideal for transferring data between programs.

Though not technically a mapping format, CSV files are frequently used to create point layers in GIS platforms. For this to be successful, the CSV file must have columns for both x and y coordinates.
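The sketch below shows the idea with Python's built-in csv module: each row becomes a point with x and y coordinates, and the remaining columns become its attributes. The column names, the sample rows, and the read_points helper are assumptions for illustration. Real GIS platforms let you choose which columns hold the coordinates.

```python
import csv
import io

# Hypothetical CSV with one row per city; x is longitude, y is latitude
csv_text = """name,x,y
Denver,-104.9903,39.7392
Austin,-97.7431,30.2672
"""


def read_points(handle, x_field="x", y_field="y"):
    """Turn CSV rows into point records with numeric coordinates."""
    points = []
    for row in csv.DictReader(handle):
        # Pull out the coordinate columns; whatever is left is attribute data
        x = float(row.pop(x_field))
        y = float(row.pop(y_field))
        points.append({"x": x, "y": y, "attributes": row})
    return points


points = read_points(io.StringIO(csv_text))
print(points[0])  # {'x': -104.9903, 'y': 39.7392, 'attributes': {'name': 'Denver'}}
```

If either coordinate column is missing, the lookup fails with a KeyError, which mirrors why a GIS platform cannot build a point layer from such a file.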

Additional resources
What's the Difference Between PNG, JPEG, GIF, and TIFF?
The Ultimate List of GIS Formats and Geospatial File Extensions
What Is a CSV File, and How Do I Open It?

LiDAR Data

Originating as the combination of the words 'radar' and 'light,' the term LiDAR is now used as an acronym for 'light detection and ranging.'

LiDAR is a surveying method that employs lasers to measure distance.

Laser light pulses leave the LiDAR system, bounce off the ground or other objects, and return to the sensor. Distance is measured by tracking how long a pulse takes to return.
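The arithmetic behind that measurement is short enough to sketch. The pulse travels to the object and back, so the one-way distance is the speed of light multiplied by the elapsed time, divided by two. The function name and the example timing are made up for illustration, and the sketch uses the speed of light in a vacuum, ignoring the small slowdown in air.

```python
SPEED_OF_LIGHT = 299_792_458.0  # meters per second, in a vacuum


def range_from_round_trip(seconds):
    """Distance to the target, given the pulse's round-trip travel time."""
    # Halve the result because the pulse covers the distance twice
    return SPEED_OF_LIGHT * seconds / 2.0


# A pulse that returns after 6.67 microseconds hit something about 1 km away
print(round(range_from_round_trip(6.67e-6), 1))  # 999.8
```

The tiny time values involved show why LiDAR sensors need extremely precise clocks.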

Light moves incredibly fast, so a LiDAR system can fire and time many thousands of pulses every second while it sweeps across a scene. This means that LiDAR devices can create point clouds: complex scans made of millions of individual points.

lidar point cloud of washington DC

LiDAR point cloud of Washington, D.C. [Source]

Though point cloud is an accurate description, that terminology doesn’t really reflect the awesome reality.

Point clouds are highly detailed 3D maps, illustrating everything from a downtown core to a national forest.

Unlike radar and sonar, LiDAR is not necessarily inhibited by object interference. One LiDAR pulse can produce multiple returns: part of the pulse reflects off the first surface it meets, while the rest carries on and reflects off surfaces farther along its path.

This makes LiDAR particularly useful for mapping vegetated areas. For example, when surveying a national forest, the LiDAR pulses won't stop at the top of the tree canopy: portions of each pulse pass through gaps in the leaves and keep producing returns until they hit the ground.

The two most common LiDAR maps are digital elevation models (DEM) and canopy height models (CHM): both made possible by the multiple returns property.

DEM

DEM of 600 square km of land in Rex, NC [Source]

For DEMs, you would take a full LiDAR scan and then filter for the last return: remembering that the last return generally represents ground points. With these filtered data points you can then create a bare earth map, one that excludes all but the surface of the Earth itself.

For CHMs, the idea is similar. Filter for the first return (in this case, the top of the tree) and then subtract the final return (the ground). This leaves the height of each tree in the area, allowing you to create a full canopy height map.
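The two recipes above can be sketched on a tiny grid. This example assumes the point cloud has already been sorted into grid cells, with one first-return elevation and one last-return elevation per cell. The values and the function name are made up for illustration. The last-return grid stands in for the DEM, and subtracting it from the first-return grid gives the CHM.

```python
# Hypothetical 2 x 2 grids of elevations in meters above sea level
first_return = [
    [212.0, 215.5],
    [203.2, 201.0],
]
last_return = [  # treated here as the bare-earth DEM
    [198.0, 199.5],
    [201.2, 201.0],
]


def canopy_height(first, last):
    """Subtract the ground elevation from the top-of-canopy elevation, cell by cell."""
    return [
        [round(top - ground, 2) for top, ground in zip(first_row, last_row)]
        for first_row, last_row in zip(first, last)
    ]


print(canopy_height(first_return, last_return))  # [[14.0, 16.0], [2.0, 0.0]]
```

A height of 0.0 marks a cell where the first and last returns coincide, such as open ground with no vegetation above it.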

Additional Resources
A Complete Guide to LiDAR: Light Detection and Ranging

Sourcing GIS Data

GIS data comes in many forms and from a huge variety of sources.

In an ideal world, you’d either have the tools to collect the data yourself, or access to the appropriate databases.

In reality, you'll often need to track down data that someone else has collected.

Luckily, there is a massive amount of open-source map data online. A few well-placed Google searches can unearth an abundance of valuable resources.

Many counties maintain databases of their own GIS data, the majority of which is available for free download. There are also several open-source databases that are a great starting point for people looking to find a specific data type.

Below you’ll find the top 5 sources for free GIS data, as well as resources for further research.

Natural Earth Data

Best for: Cultural, physical, and basemap data

Natural Earth Data (NED) is an especially excellent resource for cartographers, topping several lists for best open-source GIS database.

NED offers a combination of both vector and raster data sets, most of which are available at three map scales: 1:10m, 1:50m, and 1:110m.

Supported by the North American Cartographic Information Society, NED hosts data on a global scale and should be your first stop for beautiful GIS map making.

OpenStreetMap Data

Best for: High spatial resolution vector data

OpenStreetMap (OSM) offers high spatial resolution vector data. What differentiates OSM from other GIS data sources is that all the data on OSM is crowdsourced from cartographers and other GIS map makers.

This means there is a massive amount of detailed information available.

The downside of the crowdsource format is that nothing is vetted beforehand. Verifying data accuracy is difficult and some data sets are incomplete.

That said, most anecdotal evidence points to a high degree of accuracy. As the data is coming primarily from GIS professionals, it’s in everyone’s interest to upload quality data that increases the quality of the database as a whole.

USGS Earth Explorer logo

USGS EarthExplorer

Best for: Remote sensing data

USGS EarthExplorer is easily one of the most comprehensive sources for remote sensing data, i.e. data captured by satellites or high-flying aircraft.

With one of the more user-friendly search functions and the ability to download in bulk, USGS is an invaluable resource for cartographers in need of satellite or aerial data.

OpenTopography

Best for: LiDAR data

OpenTopography is one of the few online resources for full LiDAR datasets. It is not globally comprehensive, with around 90% of the resources focusing on the United States, Canada, Australia, Brazil, Haiti, Mexico, and Puerto Rico.

That said, in the world of GIS, LiDAR data is a scarce and precious resource. So despite the limitations, these data sets can be invaluable for people whose projects focus on those countries.

Nasa Earth Observations logo

NASA Earth Observations

Best for: Global satellite imagery

NASA Earth Observations (NEO) is another resource that focuses on remote sensing data. This resource is unique because the data is climate and environment centered, making atmosphere, land, oceans, energy, and human life data more accessible.

In addition, these resources are updated quite consistently (ensuring greater accuracy) and are available in a multitude of formats: JPEG, PNG, KML, and GeoTIFF.

Additional Resources
List of GIS data sources
10 Free GIS Data Sources: Best Global Raster and Vector Datasets (2019)