Thursday 30 December 2010

Shapefile basics

It looks like I’m going to have to get to grips with shapefiles and GIS data so here’s a post to give me a heads-up on shapefiles.

What is a shapefile?

“A shapefile stores nontopological geometry and attribute information for the spatial features in a data set. The geometry for a feature is stored as a shape comprising a set of vector coordinates.

Because shapefiles do not have the processing overhead of a topological data structure, they have advantages over other data sources such as faster drawing speed and edit ability. Shapefiles handle single features that overlap or that are noncontiguous. They also typically require less disk space and are easier to read and write.

Shapefiles can support point, line, and area features. Area features are represented as closed loop, double-digitized polygons. Attributes are held in a dBASE® format file. Each attribute record has a one-to-one relationship with the associated shape record.” *

Note that a geographic element forming part of a shapefile is referred to as a feature.

“A representation of a geographic feature that has both a spatial representation referred to as a "shape" and a set of attributes.” ***

So, key points:

  • The data is nontopological geometry, so there is no topological data structure to process; this makes shapefiles faster to draw and easier to edit.
  • Shapefiles can support point, line, and area features.

The file format itself could hold a mixture of shape types, since each record carries its own shape type, but the specification forbids this: "All the non-Null shapes in a shapefile are required to be of the same shape type."
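To make that rule concrete for myself, here is a minimal Python sketch. The shape type codes are the ones listed in the ESRI technical description; the function name check_single_shape_type is my own invention, and it assumes the per-record shape type codes have already been read from the file somehow.

```python
# Shape type codes as listed in the ESRI shapefile technical description.
SHAPE_TYPES = {
    0: "Null Shape",
    1: "Point",
    3: "PolyLine",
    5: "Polygon",
    8: "MultiPoint",
    11: "PointZ",
    13: "PolyLineZ",
    15: "PolygonZ",
    18: "MultiPointZ",
    21: "PointM",
    23: "PolyLineM",
    25: "PolygonM",
    28: "MultiPointM",
    31: "MultiPatch",
}


def check_single_shape_type(record_types):
    """Return the single non-null shape type code used by the records.

    Hypothetical helper: record_types is an iterable of integer shape
    type codes, one per record. Null shapes (code 0) are ignored.
    Returns None if every record is a null shape, and raises ValueError
    if more than one non-null type is present.
    """
    non_null = {code for code in record_types if code != 0}
    if len(non_null) > 1:
        names = sorted(SHAPE_TYPES.get(code, str(code)) for code in non_null)
        raise ValueError("mixed shape types: " + ", ".join(names))
    return non_null.pop() if non_null else None
```

So a polygon shapefile may still contain null shapes as placeholders, but never a point record.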

Parts of a shapefile

Firstly, a shapefile actually consists of several files, not one: three mandatory files plus a number of optional ones. The three mandatory files are:

  • .shp - the shape file containing the feature geometry
  • .shx - the shape index containing a positional index of the feature geometry (facilitates quick searching)
  • .dbf - the attribute file containing attributes for each shape (dBase IV format)
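A quick way to check that all three mandatory files are present is sketched below. The function name missing_components is hypothetical, and the sketch assumes lowercase extensions and that all the files share one base name in the same directory.

```python
from pathlib import Path

# The three mandatory parts of a shapefile.
MANDATORY_EXTENSIONS = (".shp", ".shx", ".dbf")


def missing_components(base_path):
    """Return the mandatory extensions with no file on disk.

    base_path may be given with or without an extension, e.g.
    "data/roads" or "data/roads.shp". Assumes lowercase extensions;
    on a case-sensitive filesystem "ROADS.SHP" would not be found.
    Note that with_suffix replaces any existing suffix, so a base name
    containing a dot (e.g. "my.data") should be passed with ".shp" added.
    """
    base = Path(base_path)
    return [ext for ext in MANDATORY_EXTENSIONS
            if not base.with_suffix(ext).exists()]
```

An empty list means the shapefile is complete as far as the mandatory files go; it says nothing about whether the contents are valid.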

“An ESRI shapefile consists of a main file, an index file, and a dBASE table. The main file is a direct access, variable-record-length file in which each record describes a shape with a list of its vertices. In the index file, each record contains the offset of the corresponding main file record from the beginning of the main file. The dBASE table contains feature attributes with one record per feature. The one-to-one relationship between geometry and attributes is based on record number. Attribute records in the dBASE file must be in the same order as records in the main file.” **
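To see how the main file and index file fit together, here is a rough sketch using only Python's struct module. It relies on the layout in the ESRI technical description: both files start with a 100-byte header (file code 9994, a file length in 16-bit words, version, shape type and bounding box), and each index record is a pair of big-endian 32-bit integers giving the offset and content length of a main file record, again in 16-bit words. The function names are my own, and this is a sketch rather than a full reader.

```python
import struct

HEADER_SIZE = 100  # bytes; the .shp and .shx headers share this layout


def parse_shp_header(header):
    """Decode the fixed 100-byte header of a .shp or .shx file."""
    if len(header) < HEADER_SIZE:
        raise ValueError("header must be at least 100 bytes")
    # File code and file length are big-endian.
    (file_code,) = struct.unpack(">i", header[0:4])
    if file_code != 9994:
        raise ValueError("not a shapefile: bad file code")
    (length_words,) = struct.unpack(">i", header[24:28])
    # Version, shape type and bounding box are little-endian.
    version, shape_type = struct.unpack("<ii", header[28:36])
    xmin, ymin, xmax, ymax = struct.unpack("<4d", header[36:68])
    return {
        "file_length_bytes": length_words * 2,  # stored in 16-bit words
        "version": version,
        "shape_type": shape_type,
        "bbox": (xmin, ymin, xmax, ymax),
    }


def parse_shx_records(shx_bytes):
    """Return (offset_bytes, content_length_bytes) for each index record.

    Each 8-byte index record points at the matching record in the main
    file; the stored values are in 16-bit words, so they are doubled.
    """
    records = []
    for pos in range(HEADER_SIZE, len(shx_bytes) - 7, 8):
        offset_words, length_words = struct.unpack(">ii", shx_bytes[pos:pos + 8])
        records.append((offset_words * 2, length_words * 2))
    return records
```

With the offsets in hand, record n in the main file can be reached with a single seek, and its attributes are simply record n in the .dbf file.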

There are a number of optional files but I am currently interested in:

  • .prj – the projection file containing the coordinate system and projection information in plain text
  • .sbn and .sbx – both containing the spatial index of the features
  • .shp.xml - metadata in XML format

 

References

* http://www.esri.com/library/whitepapers/pdfs/shapefile.pdf, p.5
** http://www.esri.com/library/whitepapers/pdfs/shapefile.pdf, p.6
*** http://www.esri.com/library/whitepapers/pdfs/shapefile.pdf, p.31
