Conversations about the MLS industry, creating software, and employee ownership.

Over on the RETS-Dev mailing list, there has been an on-going discussion about whether RETS should be trying to standardize the data or not. Those against standardizing the data say that the data is so non-standard now that if we try to standardize it, too much non-standard data will be lost in the process. Those for standardizing the data say that you have to start somewhere. Of course, this is a never-ending argument, as both things are true.

It is true that data in MLS systems too often is not validated. Some common examples mentioned in the discussion are fields like “age” that vary from a specific year to opinions like “old” and everything in between. Or a field like Approximate Square Feet that allows entries such as “HUGE” or “3000+” or other non-numeric data. These are just some of the many examples. It is tempting to say that this data should just be chucked, as, frankly, it is not very valuable. What is a “huge” house? What is an “old” house? On the other hand, there are cases where the data has some value but is just poorly structured. For example, there may be a phone number that says “123-456-7890 (call only after 5 p.m.)”. The “call only after 5 p.m.” is valid and important data but it isn’t part of a phone number. If a standard is rigidly enforced limiting data to the phone number, that data will be lost.
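To make the phone number example concrete, here is a minimal sketch in Python of how a field like that could be split so the note survives instead of being lost. Everything in it is an assumption for illustration: the name `split_phone_field`, the `PhoneField` type, and the pattern (which only recognizes 10-digit North American numbers) do not come from RETS or any MLS system.

```python
import re
from typing import NamedTuple, Optional


class PhoneField(NamedTuple):
    phone: Optional[str]  # number normalized to NNN-NNN-NNNN, or None if not found
    remark: str           # whatever text was left over after removing the number


# Hypothetical pattern: a 10-digit number with optional parentheses and separators.
_PHONE = re.compile(r"\(?(\d{3})\)?[\s.-]?(\d{3})[\s.-]?(\d{4})")


def split_phone_field(raw: str) -> PhoneField:
    """Separate the phone number from any free-text note typed into the same field."""
    match = _PHONE.search(raw)
    if match is None:
        # No recognizable number: keep the whole entry as a remark.
        return PhoneField(None, raw.strip())
    phone = "-".join(match.groups())
    leftover = (raw[:match.start()] + " " + raw[match.end():]).strip()
    # Drop parentheses that wrap the entire note.
    if leftover.startswith("(") and leftover.endswith(")"):
        leftover = leftover[1:-1].strip()
    return PhoneField(phone, leftover)


result = split_phone_field("123-456-7890 (call only after 5 p.m.)")
print(result.phone)   # 123-456-7890
print(result.remark)  # call only after 5 p.m.
```

The point of the sketch is only that a strict phone field and the preservation of the note are not mutually exclusive, provided the note has somewhere else to go.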

Also true, however, is that if we don’t start defining data standards at some point, these bad practices will simply continue. We cannot let the mistakes of the past define our future. This is true not only with regard to controlling the input on specific fields but also with regard to defining a broad and deep set of fields. Many MLSs track only the fields they have thought of themselves, and I think we, as MLS vendors, need to do a better job of helping our clients learn the “best practices” of the industry. We need to help our clients learn from each other, instead of each having to continually row its own boat.

But this is no easy task. First, we’re battling the adage that “the client is always right.” Second, we’re battling the truth that change is hard. Coupled together, these two forces present a strong barrier to getting new MLS clients to adopt a different way of tracking data. This is one of the reasons we’ve taken an active role in trying to establish national standards on data and that we’re insistent that the MLSs themselves be involved in this process. The client is always right and so the MLS clients need to ask for these changes. Similarly, change is less hard if done for compelling reasons, and nationwide data standards are more compelling than local standards.

Fortunately, some MLSs are already asking for data standards. There are some very large efforts on-going right now in California to merge some big data sets together. At the RETS Conference last week, Frank Tadman from REInfoLink showed us a very cool tool for comparing and linking data from the five or six MLS systems they are harmonizing in the San Francisco area. Similar efforts also are occurring in Southern California now and have been on-going for some time at large regionals like MRIS.

As important and large as these efforts are, however, we also need to recognize that these are only a few MLSs, maybe twenty or twenty-five out of 700. The other 650 or so MLSs undoubtedly have some knowledge and best practices contained in their data sets, too. We need to capture that knowledge and bring it to the table. To that end, we’ve been working on what we’re calling our “best design” at FBS, which involves the painstaking process of reviewing all 100 or so MLS meta-data structures we’ve set up to date and trying to harmonize them. This isn’t just an aggregation effort, though. We’re also trying to refine the data into the form that represents it most accurately. Our hope is that this will add to the collective knowledge and process of defining the data standards for the future.

Without a doubt, this is a transitional stage from lack of standards toward standardization. This transitional stage will be difficult and, historically, has proven to be a desert too far to cross. Not this time, though. The need is too great. We must define a path that allows the industry to move toward data standards. We need to define the standards now with MLSs so they can begin collecting better data. Until that time, we need to allow the disparate data (this house is “HUGE!”) to continue to be transported. If these two needs cannot be met simultaneously, then the need to progress with standards is more important than preserving the disparate data.
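One way to picture meeting both needs during the transition is a parser that always carries the original entry along while exposing a clean number where one can be recovered. The following Python sketch is an assumption for illustration only: `SquareFeet`, `parse_square_feet`, and the `strict` flag are hypothetical names, not part of RETS or of any proposal on the mailing list.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class SquareFeet:
    value: Optional[int]  # numeric value, if one could be recovered
    raw: str              # original entry, always preserved for transport
    conforming: bool      # True only if the entry was already a plain number


def parse_square_feet(raw: str, strict: bool = False) -> SquareFeet:
    """Parse an Approximate Square Feet entry in loose (default) or strict mode."""
    text = raw.strip()
    # Conforming entries: plain digits, or digits with thousands separators.
    if re.fullmatch(r"\d{1,3}(,\d{3})*|\d+", text):
        return SquareFeet(int(text.replace(",", "")), raw, True)
    if strict:
        # Strict mode: reject anything that is not a plain number.
        raise ValueError(f"not a number: {raw!r}")
    # Loose mode: recover a leading number from entries like "3000+", else None.
    leading = re.match(r"\d+", text.replace(",", ""))
    return SquareFeet(int(leading.group()) if leading else None, raw, False)


for entry in ["3000", "3000+", "HUGE"]:
    print(parse_square_feet(entry))
```

In loose mode, “3000+” yields a value of 3000 and “HUGE” yields no value, but both keep their raw text and are flagged as non-conforming, so an MLS can see what would be rejected before the strict rule is switched on. Note that the loose value drops the meaning of the “+”, which is why the raw entry travels with it.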

3 Comments

 
Paul Stusiak says
April 27, 2007
2:42 pm

You make some good points. There are a few things that appear to be truth statements that do not represent my thinking or represent things I said.

The first is that all non-conforming data should be disposed of. That is not what I said. There are some cases where lax data input rules permit values that are not correct – your phone number example. That example combines a phone number with a remark. In the small number of exceptions where this exists in legacy data – remember, the idea is to make the information valuable going forward, so cases where the information is bad shouldn’t appear in the future – that small number of legacy records needs to be migrated so that the phone number goes to the new phone field AND the other information is moved to the remarks. Additionally, where local conditions dictate, RETS2 is intended to permit extensions to the information to handle those cases.

In some cases, non-conforming information should be thrown away. An example was provided of House Age for a listing. For a specific MLS, some real data was provided where this value was not a number or something with general meaning like “unknown” or “old-timer” but was things like ‘1403’ or ‘04/02’. Since the location could not possibly have had houses for sale in 1403 AD, I’m guessing that it is useless information. Yes, someone entered it. Maybe it has meaning to them, but it has no meaning to anyone else. These are exercises that are carried out at the MLS level rather than at the standards level. If we all agree that there is some meaning to Age and local conditions have some very different understanding of what that means, then it should be a local extension and not the standard use of it.

I would also point out that this discussion started at the RETS meeting in Austin earlier this month. While I know that you know this, some of the people reading this may get the wrong impression that this is limited to the rets-dev mailing list. This discussion started in a room with both technology vendors and MLS staff. What I heard, while there was some equivocation from the MLS staff about the size of the problem, was that this was a very good thing to attempt. Those MLS staff represent, by my rough count, 250,000 agents. They are not all from large regionals either, although those were well represented. Not one of those people said “no way”.

Finally, I would point out that the discussion on rets-dev has gotten mired down (mainly) in the details of the two elements that have some additional constraints. The constraints were only applied to these two elements (phone number and email address) because they have well-known representations with standards already defined and because they are very valuable tools for communications. Without a proper phone number or email address, it is difficult to communicate with other parties. Permitting random information for an email address seems counter-intuitive. If I don’t know it, I don’t put it in. If I do know it, I should enter an email that is valid. Also, these are proposed constraints and are intended to stimulate discussion to identify important things like having a way to indicate that the age of a building is not known. Hopefully, this discussion will continue and will help to build a better standard.

You and I are definitely on the same page about the responsibilities that we have and the need to be more inclusive. There seems to be some will right now to make some effort around better data. It is at the least worth trying to gauge the willingness to fix this long-running problem that makes the information much less valuable.

April 27, 2007
4:23 pm

Thanks for commenting, Paul. What about the idea proposed of having a loose and strict version as a transition? Is that possible?

Robbie says
April 29, 2007
12:01 am

Ugh, it’s this same thinking that has people using Excel instead of SQL Server and wondering why they can’t do the cool things that the MLS across the state does. It seems counterintuitive, but the more restrictions you place on the data, the more freedom you end up having in taking advantage of it.

For example, if you allow a square footage of “HUGE”, you’ve just made it impossible to automate the publishing of your listing to Trulia, PropSmart or Google Base (which force you to conform to their standards even if you don’t have any). You’ve made it impossible to throw the data into Excel, or some other business intelligence tool, to gain insight into what is happening in the market. Anytime something is random text instead of a number or well-formed data with strict rules (like an email address or a URL), you are losing the opportunity for a computer to automate the processing of your data. Which defeats the purpose of using computers to begin with!

Ultimately, software engineers need to explain to their customers why the customer is wrong, and inform them of the long term ramifications of their decisions. All we can and should do is give our professional opinion and hope they heed our advice in our area of expertise.

Personally, I think all non-standard data should go into a Remarks or Data Comments field, where free-form text remarks belong, and be easily ignored by software. The old data is too valuable to throw away completely but isn’t valuable enough to prevent the design of better software.

FBS Blog

FBS develops internet based software for real estate professionals. If you manage real estate transactions or listings, our software makes your life easier.

The FBS Blog is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.


©2010 FBS. All Rights Reserved.