Conversations about the MLS industry, creating software, and employee ownership.

There’s been quite a dustup over the decision reportedly made by the Metropolitan Indianapolis Board of REALTORS® (MIBOR) that its MLS IDX rules against “scraping” also prohibit Google from indexing an agent’s site showing IDX listings.

For a bit of background, indexing is what Google does: it crawls the web and builds indexes of as much of it as it can so that, when people search on Google, it can return relevant results quickly. Here’s what Wikipedia has to say about scraping (with some emphasis added by me):

Web scraping (or Web harvesting, Web data extraction) is a computer software technique of extracting information from websites. Usually, such software programs simulate human exploration of the Web by either implementing low-level Hypertext Transfer Protocol (HTTP), or embedding certain full-fledged Web browsers, such as the Internet Explorer (IE) and the Mozilla Web browser. Web scraping is closely related to Web indexing, which indexes Web content using a bot and is a universal technique adopted by most search engines. In contrast, Web scraping focuses more on the transformation of unstructured Web content, typically in HTML format, into structured data that can be stored and analyzed in a central local database or spreadsheet. Web scraping is also related to Web automation, which simulates human Web browsing using computer software.

The highlighted sentence, the one saying that scraping is “closely related to Web indexing,” is where the confusion on this issue begins. Scraping and indexing are closely related. That they are different, however, is signaled by the words “In contrast” that open the very next sentence. Put together, indexing is “closely related” to scraping but stands “in contrast” to it in what I think is the important way: the resulting use of the data. I’ll expound on this below but, for now, back to the controversy at hand.

In responding to the post on Agent Genius, Hilary Marsh from NAR said:

. . . questions have arisen about the scope of the requirement that IDX site operators protect the listings of other participants displayed on their IDX sites from “scraping”. Specifically, whether the policy distinguishes between “malicious” scraping and what might be considered “good” or “benign” scraping. Also, whether “indexing” is a type of scraping. The Center for REALTOR® Technology (”CRT”) advised that while the intent of “scrapers” may be malicious, and the intent of “indexers” good, the two practices from the Web server’s view appear to be the same. Consequently, NAR staff responded to questioners that the requirement to prevent scraping includes indexing.

So, the rub of the issue is that MIBOR punted the ball back to NAR, which asked CRT, and CRT (as a technical body) said that, technically, there’s no difference between scraping and indexing. Of course, as is clear from the Wikipedia definition above, CRT is right: from the perspective of the computer activity, there really is no distinction between scraping and indexing. Both processes read the web site and then do something with the data.
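To make that concrete, here is a minimal sketch in Python of one shared read step followed by two different uses of what was read. Everything in it is an assumption made up for illustration: the page URL, the HTML markup, the CSS class names, and the function names. It is not how Google or any real scraper is actually built.

```python
from html.parser import HTMLParser

# Hypothetical IDX page; the URL and markup are invented for this example.
PAGE_URL = "http://agent.example.com/listings"
PAGE_HTML = """
<html><body>
<div class="listing"><span class="address">12 Oak St</span><span class="price">$250,000</span></div>
<div class="listing"><span class="address">98 Elm Ave</span><span class="price">$315,000</span></div>
</body></html>
"""


class PageReader(HTMLParser):
    """The shared step: read the page and collect (css_class, text) pairs."""

    def __init__(self):
        super().__init__()
        self._classes = []   # class attribute of each currently open tag
        self.fragments = []  # (css_class, text) pairs in page order

    def handle_starttag(self, tag, attrs):
        self._classes.append(dict(attrs).get("class"))

    def handle_endtag(self, tag):
        if self._classes:
            self._classes.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            css_class = self._classes[-1] if self._classes else None
            self.fragments.append((css_class, text))


def read_page(html):
    """Both an indexer and a scraper start here, which is CRT's point."""
    reader = PageReader()
    reader.feed(html)
    reader.close()
    return reader.fragments


def index_page(url, fragments):
    """Indexer: map each word to the URL it came from, so a search result
    sends the visitor back to the source page."""
    index = {}
    for _, text in fragments:
        for word in text.lower().split():
            index.setdefault(word, set()).add(url)
    return index


def scrape_page(fragments):
    """Scraper: turn the page into standalone structured records. In this
    sketch the source URL is deliberately not kept."""
    records, current = [], {}
    for css_class, text in fragments:
        if css_class in ("address", "price"):
            current[css_class] = text
        if "address" in current and "price" in current:
            records.append(current)
            current = {}
    return records


fragments = read_page(PAGE_HTML)          # identical read for both uses
search_index = index_page(PAGE_URL, fragments)
listing_rows = scrape_page(fragments)
```

From the web server's side only `read_page` is visible, and it is the same call in both cases. What differs is the output: `search_index` is a set of pointers back to the agent's page, while `listing_rows` is a copy of the listing data that can live in someone else's database.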

However, focusing on the technical process here is wrong. Instead, the important distinction is between the results of the activity. Here is what I hope is a compelling illustration of how the two differ. When you visit a web site, your web browser reads the site and displays the information back to you. In fact, most web browsers store a copy of that site on your computer so that they can display it faster if you look at it again later. From a technical perspective, your visit to the web site and your browser caching the content locally on your computer are not very different from what a scraper does.

However, nobody is going to argue that web visitors are scrapers. Why? Because of their intent and what they are doing with the data. A consumer looking at content is a good thing. So, too, I would suggest, is Google indexing the web and real estate content. Google is not (at least today) taking the content and presenting it as its own creation. Instead, it is linking back to the source of the data, which provides a critically important service to the web site being indexed. This is what the web is all about, and so interpreting indexing and scraping as the same thing results in the leap backward that the commenters on the Agent Genius post decry. It’s an undoing of the web for IDX sites, which have become critically important to agents and brokers today.

Before concluding this post, however, I also want to point out that not everyone agrees that Google’s indexes are positive or even benign. In Belgium, a court has ruled that Google’s News service violates certain newspapers’ copyrights. In hailing the opinion, the winner of the case is quoted by the New York Times as saying:

“Today we celebrate a victory for content producers,” said Margaret Boribon, secretary-general of Copiepresse. “We showed that Google cannot make profit for free from the credibility of our newspaper brands, hard work of our journalists and skill of our photographers.”

Could a similar argument be made by MLSs or listing agents about Google indexing listing data?  Possibly.  However, I think getting a similar ruling from a US court is unlikely.  (Any lawyers out there who know the law on this, please comment to clarify, because I’m definitely no expert here.)

More importantly, our industry has accepted the web as its friend, and Google is accepted as a critical part of the web. To many, in fact, Google is the web. What’s wrong with the MIBOR decision, and with CRT’s narrow, technical interpretation that led to it, is that it goes against the many decisions already made that the web is the real estate industry’s friend. That decision cannot be unmade. It’s done. Rule interpretations like the one provided by CRT, however, do leave NAR members unable to compete. As many on Agent Genius have commented, Trulia, Zillow and Realtor.com are not hamstrung by this interpretation of the IDX policy, which hinders and restricts only NAR’s members. That’s wrong.

Fortunately, we live in a web world and, for many, that means we know each other personally. Most of those commenting over at Agent Genius have met, know, and greatly respect Chris McKeever (@crtweet on Twitter), who now heads up CRT. My hope is that Chris can join the conversation and either clarify CRT’s interpretation or let us know why the current interpretation is best. I’m asking for this conversation with the greatest respect for Chris and everyone at CRT. MIBOR put them on the hot seat, but perhaps the conversation can result in greater understanding for everyone and, hopefully, a quick clarification on this critically important matter for MLS organizations that haven’t yet interpreted the policy on this issue.

This past week we added a new function in flexmls Web that lets users edit map shapes, name them, and change their color. You can click to grab any point and move it to change the shape, and you can click in the center of the shape to drag it to a new position.

You also can name each shape and specify a color to distinguish it from others.
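For readers who think in code, the editing behavior described above can be sketched with a small data structure. This is only an illustration in Python and not the flexmls implementation: the `MapShape` class, its fields, and its method names are all hypothetical, and the points are assumed to be simple (latitude, longitude) pairs.

```python
from dataclasses import dataclass, field


@dataclass
class MapShape:
    """Hypothetical model of a saved map shape: a name, a color, and points."""
    name: str
    color: str
    points: list = field(default_factory=list)  # (latitude, longitude) pairs

    def move_point(self, index, new_point):
        """Grab one point and move it, which changes the outline."""
        self.points[index] = new_point

    def drag(self, d_lat, d_lng):
        """Drag the whole shape: every point shifts by the same offset."""
        self.points = [(lat + d_lat, lng + d_lng) for lat, lng in self.points]


# Usage: name and color a triangle, reshape it, then reposition it.
shape = MapShape(
    name="Downtown",
    color="#ff0000",
    points=[(0.0, 0.0), (0.0, 2.0), (2.0, 2.0)],
)
shape.move_point(1, (0.5, 2.5))  # reshape by moving the second point
shape.drag(1.0, -1.0)            # move the whole shape without reshaping it
```

Keeping the name and color alongside the points means a shape can be handed around as a single object.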

Next, we’re going to add the ability for users to share shapes with each other, which we think will produce some interesting collaborative opportunities to improve searching, statistics, and CMAs.

Do you think the ability to share shapes will be useful?

Who do you think owns the shape data once it is saved and shared?

FBS Blog

FBS develops internet-based software for real estate professionals. If you manage real estate transactions or listings, our software makes your life easier.

The FBS Blog is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.


©2009 FBS. All Rights Reserved.