Party Vibe

Tax Takers Send in the Spiders

  • Quote:
    02:00 AM Jan 25, 2007

    Websites around the world are getting a new computerized visitor among the Googlebots and Yahoo web spiders: The taxman. A five-nation tax enforcement cartel has been quietly cracking down on suspected internet tax cheats, using a sophisticated web crawling program to monitor transactions on auction sites, and track operators of online shops, poker and porn sites.

    The “Xenon” program, a reference to the super-bright auto headlights that light up dark places, was started in The Netherlands in 2004 by the Dutch equivalent of the IRS, Belastingdienst. It has since been expanded and enhanced by an international group of tax authorities in Austria, Denmark, Britain and Canada, with the assistance of Amsterdam-based data mining firm Sentient Machine Research.

    Xenon is primarily a spider: a program that downloads a web page, then traverses its links and downloads those as well, ad infinitum. In this manner spiders can create huge datasets of web material while preserving the relationships between pages at the moment they were spidered, something that can reveal a lot about the people who made the pages.
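
    The general spidering technique described above can be sketched in a few lines. This is only an illustration of a generic breadth-first crawler that records which page links to which; it is not Xenon's code, which has not been published. The `crawl` function, the `LinkParser` class and the `PAGES` dictionary (an in-memory stand-in for real HTTP requests) are all assumptions made for the example.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl. Returns {url: [absolute outgoing links]}.

    `fetch` is any callable that takes a URL and returns HTML text,
    or None if the page could not be retrieved.
    """
    graph = {}
    queue = deque([start_url])
    while queue and len(graph) < max_pages:
        url = queue.popleft()
        if url in graph:
            continue  # already visited
        html = fetch(url)
        if html is None:
            graph[url] = []  # unreachable page: record it with no links
            continue
        parser = LinkParser()
        parser.feed(html)
        links = [urljoin(url, href) for href in parser.links]
        graph[url] = links  # preserves the page-to-page relationships
        queue.extend(link for link in links if link not in graph)
    return graph


# Hypothetical in-memory "web" so the sketch runs without network access.
PAGES = {
    "http://shop.example/": '<a href="/about">About</a> <a href="/items">Items</a>',
    "http://shop.example/about": '<a href="/">Home</a>',
    "http://shop.example/items": '<a href="http://pay.example/">Pay</a>',
    "http://pay.example/": "",
}

link_graph = crawl("http://shop.example/", PAGES.get)
for page, links in link_graph.items():
    print(page, "->", links)
```

    The returned dictionary is the "relationships between pages" part: it shows, for instance, that the shop links out to a separate payment site.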

    It’s unclear how effective Xenon has been in generating investigative leads. Contacted by Wired News, the tax departments of Canada and the United Kingdom confirmed participation in the program, but declined further comment.

    Dag Hardyson, the national project leader for e-commerce for Skatteverket, the Swedish tax authority, was more forthcoming. Skatteverket is scheduled to join the Xenon project this year, and Hardyson said web crawling is well suited to tax enforcement.

    “The internet is wide open for tools,” said Hardyson. “It’s much easier to handle than the real world.”

    Xenon, explained Marten den Uyl of Sentient, is in some ways the opposite of something like Google’s web crawler, which traverses a tree of links and grabs a copy of everything it sees. Xenon is smart about link selection and context, and uses a “slow search paradigm,” he said.

    Whereas a spider like the Googlebot might hit thousands of websites in a second, “With Xenon it may take minutes, hours or even days to do a slow search.”

    The slow search prevents the crawler from creating excessive traffic on a website, or drawing attention in the site’s server logs. Den Uyl declined to say what user-agent the Xenon software reports itself as, but it’s likely to be variable or configurable on the tax investigator’s part.
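
    One common way to get this kind of slow, low-profile crawling is to enforce a minimum delay between requests to the same host. The sketch below shows that idea only; how Xenon actually paces itself is not described in the article. The `SlowScheduler` class, the one-hour delay and the user-agent string are hypothetical.

```python
from urllib.parse import urlparse


class SlowScheduler:
    """Tracks, per host, the earliest time the next request may be sent."""

    def __init__(self, min_delay_seconds, user_agent="example-crawler/0.1"):
        self.min_delay = min_delay_seconds
        self.user_agent = user_agent  # hypothetical, configurable value
        self.next_allowed = {}  # host -> earliest permitted fetch time

    def wait_time(self, url, now):
        """Seconds to wait before `url` may be fetched at time `now`."""
        host = urlparse(url).netloc
        return max(0.0, self.next_allowed.get(host, now) - now)

    def record_fetch(self, url, now):
        """Note that `url` was fetched at `now`; its host is then put on hold."""
        host = urlparse(url).netloc
        self.next_allowed[host] = now + self.min_delay


# One request per host per hour (an assumed figure, not Xenon's).
scheduler = SlowScheduler(min_delay_seconds=3600)

first_wait = scheduler.wait_time("http://shop.example/a", now=0)
scheduler.record_fetch("http://shop.example/a", now=0)

# Ten minutes later: the same host must wait, a different host need not.
same_host_wait = scheduler.wait_time("http://shop.example/b", now=600)
other_host_wait = scheduler.wait_time("http://other.example/", now=600)
print(first_wait, same_host_wait, other_host_wait)
```

    A real crawler would pass `time.time()` as `now` and sleep for the returned wait; taking the clock as a parameter just keeps the example deterministic.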

    The spider can also be configured and trained to look at particular economic niches, a useful feature for compiling lists of businesses in industries that traditionally have high rates of non-filing. “For instance, weight control (yields) 85,000 hits, some for products … also services,” says Sweden’s Hardyson.

    Once the web pages are screen-scraped, Xenon’s Identity Information Extraction Module interfaces with national databases containing information like street and city names. It uses that data to automatically identify mailing addresses and other identity information present on the websites it has crawled, which it puts into a database that can be matched in bulk with national tax records.
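
    The address-matching step can be illustrated with a small gazetteer lookup: build a pattern from known street and city names, then scan scraped text for "street, number, city" combinations. This is a simplified sketch of the general approach, not the Identity Information Extraction Module itself. The street and city lists, the sample text and the function names are all made up for the example.

```python
import re

# Hypothetical excerpts from a national database of street and city names.
STREETS = {"Hoofdstraat", "Kerkstraat"}
CITIES = {"Amsterdam", "Utrecht"}


def build_address_pattern(streets, cities):
    """Compile a regex matching '<street> <house number>, <city>'."""
    # Longest names first so a longer name wins over a shorter prefix.
    street_alt = "|".join(sorted(map(re.escape, streets), key=len, reverse=True))
    city_alt = "|".join(sorted(map(re.escape, cities), key=len, reverse=True))
    return re.compile(
        rf"\b(?P<street>{street_alt})\s+(?P<number>\d+[a-zA-Z]?)\b"
        rf"[,\s]+(?P<city>{city_alt})\b"
    )


def extract_addresses(text, pattern):
    """Return one dict per address found in the scraped page text."""
    return [match.groupdict() for match in pattern.finditer(text)]


pattern = build_address_pattern(STREETS, CITIES)

page_text = "Contact: Slim Shop, Kerkstraat 12, Utrecht. Call us today!"
addresses = extract_addresses(page_text, pattern)
print(addresses)

no_addresses = extract_addresses("No contact details on this page.", pattern)
```

    Records extracted this way could then be loaded into a table and joined against tax records on the address fields, which is the bulk matching the article describes.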

    As illuminating as Xenon is for the tax man, the data-mining effort poses dangers to citizen privacy, said Par Strom, a noted privacy advocate in the world of Swedish IT.

    “Of course it’s not illegal,” said Strom. “I don’t feel quite comfortable having a tax office sending out those kind of spiders.”

    One issue has to do with how the information Xenon captures is protected.

    Sentient has created access controls for its law-enforcement data-mining tool, called Data Detective, but its Xenon software lacks many of those protections, said den Uyl, on the theory that investigators will quickly delete the compiled data.

    “Data Detective (handles) long-term data warehousing,” he said, “(Xenon is) short-term project data warehousing. Different type of data, different type of analysis.”

    But Hardyson said the Swedish government — which already has its own internally developed tax crawlers — is currently keeping a copy of everything it spiders. That means that someone’s long-expired actions have the potential to come back and haunt them. “We can scan and store all actions for every e-marketplace in Sweden, it’s about 55,000 per day,” said Hardyson. He said his agency hasn’t decided if it will change its policies with the new, more sophisticated Xenon software. “Is this what we should do? Our lawyers must look at it.”

    Canada’s tax authorities declined to state what their Xenon data retention policies are, as did Simon Bird, head of the “Web Robot Team” at the British HM Revenue and Customs office.

    In the United States, the IRS is not a part of the Xenon project, but would neither confirm nor deny that it uses spidering software in its investigations.

    Strom said now that the cat is out of the bag, there’s no way to get governments or corporations to forgo technologies like spiders and data mining.

    “The information is public of course, because it’s posted on the internet,” Strom says. “It wasn’t meant to be used this way … (this is) using the naivete of people. It’s on the limit of what is ethical.”

    http://www.wired.com/news/technology/security/1,72564-0.html
