Emisonian: Joe Emison's Blog


The Next Generation of AVMs

A version of this post appeared in the February 2011 edition of Live Valuation Magazine and is also available for download as a PDF.

“The most accurate way to value a property is to find out how much someone will pay for it. Unfortunately, sales data is only updated when a home sells. However, building permit data allows us to take property sale values and bring them up to date, thus giving us a newer, better way to value properties.”—Holly Tachovsky, president of BuildFax, a national aggregator of building permit data.

Most automated valuation models (AVMs) estimate property values by looking at the internal characteristics of properties as part of a “hedonic model,” and by looking at historic sales around the properties as part of a “repeat sales index.” In theory, the combination of the hedonic and repeat sales evaluations captures the full range of factors necessary to value a property automatically. In practice, the quality of the data that drives the hedonic model leads to imperfect results. This article describes a better type of AVM using building permit data.


The Structure of Automated Valuation Models

Automated valuation models provide instantaneous property values using mathematical formulas and property data. Most AVMs consider the age of a structure, its square footage, number of bedrooms, and other characteristics that make up the property. These characteristics are used as part of a hedonic model, which is a specialized term for a type of mathematical formula that estimates value from a list of characteristics.
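As a rough illustration of what such a formula looks like, the sketch below values a home as a base amount plus a weighted sum of its characteristics. The function name, the characteristics chosen, and every coefficient are made up for this example; a real AVM would fit its coefficients to observed sales data.

```python
# Hypothetical coefficients: dollars contributed per unit of each characteristic.
# Real hedonic models estimate these by regression on observed sales.
COEFFICIENTS = {
    "square_feet": 120.0,
    "bedrooms": 10_000.0,
    "age_years": -500.0,
}
BASE_VALUE = 50_000.0  # hypothetical intercept


def hedonic_estimate(characteristics):
    """Estimate value as an intercept plus a weighted sum of characteristics."""
    return BASE_VALUE + sum(
        COEFFICIENTS[name] * value for name, value in characteristics.items()
    )


home = {"square_feet": 2_000, "bedrooms": 3, "age_years": 20}
estimate = hedonic_estimate(home)
print(f"Hedonic estimate: ${estimate:,.0f}")
```

Note that the estimate is only as good as the characteristic values fed into it, which is the weakness the rest of this article focuses on.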

However, property characteristics cannot determine a property’s value on their own. The real estate agent mantra “Location, location, location” implies that two structures with identical characteristics may command different prices depending on their location. So the hedonic model is not enough; AVMs need a way to capture the market conditions around the property. Most AVMs achieve this through a repeat sales index, which calculates localized market fluctuations by looking at repeat sales of the same properties over time.
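A simplified sketch of the idea: take properties in an area that sold twice, and use the typical ratio of second to first sale price as the local market change. Production repeat sales indices generally use regression over many time periods; the median-ratio shortcut, the function name, and the sale pairs below are assumptions for illustration only.

```python
from statistics import median


def repeat_sales_ratio(sale_pairs):
    """Median ratio of second sale price to first sale price.

    sale_pairs: list of (first_price, second_price) for properties that
    sold in both periods. This is a deliberately simple stand-in for a
    regression-based repeat sales index.
    """
    return median(second / first for first, second in sale_pairs)


# Made-up repeat sales in one neighborhood between two dates.
pairs = [(200_000, 340_000), (150_000, 270_000), (300_000, 525_000)]
ratio = repeat_sales_ratio(pairs)

# Bring a different home's earlier sale price up to date with the local ratio.
updated = 250_000 * ratio
print(f"Market change: {ratio:.2f}x, updated value: ${updated:,.0f}")
```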

The Problem with Property Characteristics

“Our effective year built should never be used by insurance companies and others as a substitute for an actual physical inspection of property, nor for determining the true year a property was constructed or the current condition… It does NOT reflect the actual age of the property….”—Lori Parrish, CFA, property appraiser, Broward County.

Even with the sophisticated structure of a hedonic model weighted against a repeat sales index, AVMs are not perfect. Zillow, the provider of a widely used consumer-facing AVM, says that at least 20% of the time its estimate is more than 20% off the sales price, and that in some top metro areas its estimate is more than 20% off the sales price more than 40% of the time.

One of the main reasons that automated valuation models are inaccurate is that the underlying property characteristic data is inaccurate. Property characteristic data comes almost exclusively from tax assessor offices and MLS listings (which are largely derived from tax assessor data). Tax assessor data has a lower level of accuracy because it must be filled out for the Computer Assisted Mass Appraisal (CAMA) system to run and calculate taxes. If a value is not known, the CAMA system can’t run, so values (accurate, inaccurate, or guessed) are entered into the system.

Tax assessors are efficiently calculating accurate estimates for tax purposes, which the CAMA method enables. Every tax assessor’s office has a working appeals process for correcting inaccurate values (and thus the tax amount itself), resulting in taxes that are either accurate or underestimated, which is fine from a political as well as a personal standpoint.

The problem arises when third parties take data from tax assessor offices and interpret it not in the proprietary and specific way that the assessors use it, but rather as a completely accurate determination of property characteristics. This practice leads to many problems with the resulting hedonic models. Why? Four reasons:

First, there is no dialog between the hedonic estimate and the homeowner. If the underlying values used to calculate property tax are significantly off, they will be corrected when the assessor’s office and homeowner take a closer look at the actual home. For the hedonic estimate, the homeowner is not directly notified and has no way of correcting the value.

Second, hedonic models now rely on data that is stored in many different locations, and often updated infrequently or not at all. Thus, even if the tax assessor fixes the problem in the assessor’s data, the majority of the tens of thousands of copies of that data in existence will not be updated, and the inaccuracy will persist.

Third, hedonic models are much more sensitive to small inaccuracies than CAMA systems. This is largely due to the fact that they are estimating much larger numbers: home values, which are around 100 times larger than property tax amounts. There are many cases in which the data is wrong at the tax assessor’s office, but not so wrong to make the homeowner notice the error or want to go through an annoying dispute process. It is in these same cases that the hedonic estimates are far off, because the errors are significant when applied to estimating home value.
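To make the scale difference concrete, here is a small worked example. The home value and the 5% data error are hypothetical, the 100-to-1 ratio is the rough figure from the paragraph above, and the sketch assumes the error carries through proportionally to both numbers.

```python
home_value = 300_000      # hypothetical home value
value_to_tax_ratio = 100  # "around 100 times" from the text
data_error = 0.05         # hypothetical: bad data shifts the result by 5%

annual_tax = home_value / value_to_tax_ratio

# The same proportional error, expressed in dollars on each number.
tax_error = annual_tax * data_error
value_error = home_value * data_error

print(f"Error on the tax bill:   ${tax_error:,.0f}")
print(f"Error on the home value: ${value_error:,.0f}")
```

An error of this size on a tax bill is unlikely to trigger an appeal, while the same data error moves the value estimate by a five-figure amount.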

Fourth is the inaccurate-data feedback loop that retains a level of inaccuracy over even accurate assessor numbers. Because the hedonic models assume that the data is correct, the underlying mathematical formula assigns incorrect coefficients. For example, if household square footage values are either accurate or underestimated, then the hedonic formula coefficient associated with square footage will be too high, as it compensates for numbers that are too low on average. A coefficient that is too high because only some of the data points are inaccurate will adversely affect all of the estimates generated by the model.
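The effect can be sketched with a deliberately tiny example. It assumes a one-variable, no-intercept model (price = coefficient × square footage), a true price of $150 per square foot, and six made-up homes, two of which have their square footage under-reported by 20%. Real hedonic models have many variables, so this shows the mechanism only, not its size.

```python
# Assumption: price depends only on true square footage at $150 per sq ft.
TRUE_PRICE_PER_SQFT = 150.0

true_sqft = [1000, 1500, 2000, 2500, 3000, 3500]
# Recorded data: two homes are under-reported by 20%, the rest are accurate.
recorded_sqft = [1000, 1200, 2000, 2500, 2400, 3500]
prices = [TRUE_PRICE_PER_SQFT * s for s in true_sqft]


def fit_price_per_sqft(sqft, price):
    """Least-squares slope for the no-intercept model price = coef * sqft."""
    numerator = sum(x * y for x, y in zip(sqft, price))
    denominator = sum(x * x for x in sqft)
    return numerator / denominator


fitted = fit_price_per_sqft(recorded_sqft, prices)

# A home whose 2,000 sq ft is recorded correctly is still over-valued,
# because the inflated coefficient applies to every property.
accurate_home_estimate = fitted * 2000
print(f"Fitted: ${fitted:.2f}/sq ft vs true ${TRUE_PRICE_PER_SQFT:.2f}/sq ft")
print(f"Estimate for accurate 2,000 sq ft home: ${accurate_home_estimate:,.0f}")
```

In this toy setup the fitted coefficient comes out above the true $150, so even the correctly recorded home is valued above its true $300,000.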

The above quote from Lori Parrish, Broward County’s property appraiser, shows that county’s belief that their “year built” designation is inaccurate, and Broward County has stopped showing “year built” on their website because they are concerned about others relying on it. Moreover, Roger Arnemann, vice president of global consulting and data services at Risk Management Solutions, has examined many different portfolios and has found that construction type in commercial buildings is inaccurate almost half of the time, and the number of stories in single-family dwellings is inaccurate more than 20% of the time, among other field-level accuracy issues. While the theory behind the hedonic model part of AVMs is sound, the accuracy of the underlying data is on much shakier ground.


Correcting the Property Characteristics Problem with Building Permit Data

What can be done about this inherent problem with the crucial hedonic model part of AVMs? The solution is to replace the standard hedonic estimate (based on square footage, number of bedrooms, year built, etc.) with a different formula, driven by more accurate data, that delivers the same underlying estimate of the non-market-adjusted value of the underlying structure. In short, use the last sales amount of the property, and add to it the values of the building permits issued on the property since the last sale.
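A minimal sketch of that rule, before any market adjustment is applied. The function name, the tuple layout for permits, and the example property are all hypothetical, and the sketch assumes that permits issued on or before the sale date are already reflected in the sale price.

```python
from datetime import date


def permit_adjusted_base(last_sale_price, last_sale_date, permits):
    """Last sale price plus the value of permits issued after that sale.

    permits: list of (issue_date, valuation) tuples. Permits issued on or
    before the sale date are assumed to be reflected in the sale price.
    """
    added = sum(value for issued, value in permits if issued > last_sale_date)
    return last_sale_price + added


# Made-up example property.
permits = [
    (date(2003, 5, 1), 12_000),   # before the sale: already priced in
    (date(2006, 9, 15), 30_000),  # addition after the sale
    (date(2008, 2, 3), 8_500),    # roof replacement after the sale
]
base = permit_adjusted_base(250_000, date(2005, 1, 10), permits)
print(f"Non-market-adjusted value: ${base:,}")
```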

Building permit data is essentially a “change log” for a home. Every permitted job that increases the value of the underlying property, from an addition to a roof replacement to fire damage repair to an electrical upgrade, is logged by the building department and available through public records request. And most importantly, building permit data has an extremely high level of accuracy. Unlike tax assessor data, there is never any need to estimate or guess about the presence of a building permit.

As the property “change log,” building permit data has one core weakness: it cannot reveal the absolute or total value of a property that was built before available permit data coverage starts, and that coverage often goes back no more than 20 years. Building permit data is only effective when paired with a starting point: the sales data. The core weakness of sales data is that sales happen infrequently, so a sale price is only current right after the sale, which is where building permit data comes in. Building permit data brings sales data up to date by logging all of the significant changes to the property since the last sale.


Testing an AVM Based on Building Permit Data

I recently conducted an analysis of whether building permit data does in fact capture individual property characteristics in the way that a hedonic model is supposed to work.[1] Using a random sample of 10,000 properties across 10 different cities in Florida and sales data from AVM data supplier Real Info[2], I looked at those properties that had been sold at least twice in the past 20 years to see whether building permit data after the first sale would more accurately predict the amount of the second sale.

For example, 966 Oakpoint Circle in Apopka, Florida, sold for $306,000 on April 21, 2000. The same property sold again on June 12, 2006. The repeat sales index for the local area showed that prices of comparable homes had increased by roughly 75% between those two sale dates, which would give an estimate of around $535,000 on June 12, 2006. Tax assessor data available on June 12, 2006 was unchanged from its April 21, 2000, values, and a mixed (hedonic model and repeat sales index) estimate for 966 Oakpoint Circle on June 12, 2006, was $522,000.

However, building permit data on the property shows that in late 2000, an in-ground pool, cool deck, boat dock and boathouse were all built, for a total of $44,861 in permit valuation. Ignoring the hedonic estimate and instead adding this permit valuation to the repeat sales index estimate of the 2000 sale amount (around $535,000), we get a total of roughly $580,000. The actual sales amount on June 12, 2006 was $590,000. In this case, the AVM based on building permit data was less than 2% off the actual value, whereas the traditional model was more than 11% off.
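The arithmetic for 966 Oakpoint Circle can be reproduced in a few lines. All dollar figures come from the example above; the index ratio of exactly 1.75 is an assumption standing in for “roughly 75%,” and the permit valuation is added after the index adjustment because that ordering matches the roughly $580,000 figure.

```python
first_sale = 306_000        # April 21, 2000 sale price
index_ratio = 1.75          # "roughly 75%" local increase; exact value assumed
permit_total = 44_861       # pool, cool deck, boat dock and boathouse
blended_estimate = 522_000  # hedonic + repeat sales estimate from the text
actual_sale = 590_000       # June 12, 2006 sale price

index_only = first_sale * index_ratio
# Assumption: permit valuation is added after the index adjustment, which is
# the ordering that reproduces the roughly $580,000 figure in the text.
permit_estimate = index_only + permit_total


def pct_error(estimate, actual):
    """Absolute error as a percentage of the actual sale price."""
    return abs(estimate - actual) / actual * 100


for label, est in [
    ("Repeat sales index only", index_only),
    ("Blended hedonic/index", blended_estimate),
    ("Index + building permits", permit_estimate),
]:
    print(f"{label}: ${est:,.0f} ({pct_error(est, actual_sale):.1f}% off)")
```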

Below are two charts from my analysis showing how building permit data increases the accuracy of a pure repeat sales index model.[3] The first shows that on properties with permits totaling more than $25,000, the building permit data AVM estimates the proper value within 5% for around four times more properties; the second looks at properties that had any number of permits and finds that the building permit data AVM still beats the repeat sales index model across the board.

In summary, in the situations where significant building permits have been issued, ignoring the building permit data leads to less accurate results. Including the building permit data improves the repeat sales index, and may obviate the need for the hedonic model altogether, as it captures the underlying property value from a more accurate data source.


Enhancing Today’s AVMs with Building Permit Data

“We are continuously developing new datasets to be on the forefront of enhancing AVMs, and we believe that building permit data will be a must-have AVM data source in the near term.”—Jim Kirchmeyer, CEO of Real Info, Inc.

It may not be necessary to discard hedonic models altogether. Building permit data can be used in a blended repeat sales index/hedonic model AVM in at least three different ways. First, building permit data can be used to establish better confidence levels on AVM estimates. In particular, where extensive permitting work has been done on a property, building permit data provides a more accurate estimate.

Second, as explained above, one insidious aspect to the inaccuracies in tax assessor data is that they make the mathematical formula less accurate for all—even accurate—property characteristic values. Building permit data can be used in the creation of the hedonic formula to mitigate this issue.

Finally, last sale + property change log (building permit data) could be added as a third estimate of valuation in AVMs, and weighted just as both the hedonic model and repeat sales index are weighted against each other. This could provide significant lift to existing AVMs without having to start development from scratch.
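One way to picture this third option is as a weighted average of three estimates. The sketch below is only an illustration: the function name, the three estimate values, and the weights are all hypothetical, and an actual AVM would choose its weights from back-testing against known sales.

```python
def blended_valuation(estimates, weights):
    """Weighted average of several valuation estimates.

    estimates and weights are dicts keyed by model name; weights need not
    sum to 1 because they are normalized here.
    """
    total_weight = sum(weights[name] for name in estimates)
    return sum(estimates[name] * weights[name] for name in estimates) / total_weight


# Hypothetical estimates for one property and hypothetical weights.
estimates = {
    "hedonic": 500_000,
    "repeat_sales_index": 540_000,
    "last_sale_plus_permits": 580_000,
}
weights = {"hedonic": 1, "repeat_sales_index": 1, "last_sale_plus_permits": 2}

value = blended_valuation(estimates, weights)
print(f"Blended valuation: ${value:,.0f}")
```

Because the weights are normalized inside the function, an existing two-model AVM could add the third estimate without rescaling the weights it already uses.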

[1] Full details on the analysis are available in BuildFax Internal Research Paper No. 15; email joe@buildfax.com to request a copy.

[2] For more information on Real Info’s AVM data, please contact Jacob Garcia at jgarcia@real-info.com.

[3] Unfortunately, it was not possible for me to get historical values from a blended repeat sales index/hedonic model AVM, although it is unlikely that such a model would have yielded significantly different values from a repeat sales index because the average time between the two sales was four years, which is very little time to expect updates in even the most up-to-date tax assessor data.


