By tptacek - 3 days ago
This, to me, is so neat.
tptacek - 3 days ago
That's some small-government activism I can get behind!
floatrock - 2 days ago
Under Florida public records law, source code produced by state employees is, in very narrow circumstances, a non-exempt public record (the code can't process sensitive data, etc.). I'm considering a future endeavor where I periodically request the code for such projects until the IT department decides it's worth the effort to open source it.
I like to think this is a step towards consolidating publicly funded code and reducing duplicate effort. Ahh, imagine making a pull request to your city's website! But I'm getting ahead of myself...
hpincket - 2 days ago
Hundreds of thousands of records a month. I ended up importing them into Excel(1) and then using... what was that called? An MS/Windows library that came with IE 5 (and a few other things) and provided regex support, with a few quirks, accessible via VBA.
The point was, I could programmatically mine it -- including regex pattern matching and replacement of and within cell contents -- while also having a flexible UI within which to find and handle one-off cases. When the one-offs demonstrated a repeating pattern, I could quickly iterate and add that pattern to the programmatic mining logic.
This included adding color cueing for items of particular interest or needing manual follow-up, and using Excel's sorting capabilities to bring potentially related instances into visually displayed groups. And the like.
It ended up working quite well. I might have preferred something else to VBA, and I did use Perl and other stuff, elsewhere (something that also gave me both power and the flexibility to rapidly iterate).
But the point is, with such data, I found it very useful to combine regex and rapid programmatic manipulation, together with a good visual interface (including visual cues, the ability to comment upon instances -- Excel cell-level comments -- etc.) and manual manipulation.
As a final aside, the extensive set of Excel keyboard shortcuts greatly aided in rapidly and effectively navigating and massaging the imported data.
--
1. This was back when Excel had... I think it was a 64K (or a bit less) limit on the number of rows in a sheet.
P.S. I tended to retain the originally imported data in its columns, and to produce my mining of it in other columns. That way, I could always and immediately see what I started with, for any particular record. (And, if things visually started to be "too many columns", well, Excel lets you hide a range of columns from the view. As one example of how its features really helped, on the visual front while doing this work.)
I still had to learn and allow for some quirks Excel exhibited with respect to importing text data. That included making sure the cells/columns being imported into carried the correct/needed formatting designation before importing into them (usually, "Text").
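The keep-the-original, derive-alongside approach described above translates to any scripting language. Here's a minimal Python sketch of the same idea -- retain the raw record untouched, mine derived fields with regexes, and flag rows that need manual follow-up (the record formats, field names, and regexes below are invented for illustration, not taken from the actual data):

```python
import re

# Hypothetical raw records, standing in for rows imported from a monthly dump.
records = [
    "ACME Corp  inv#10023  $450.00",
    "acme corp INV 10023 450",
    "Widgets LLC inv#88  $12.50",
]

INVOICE_RE = re.compile(r"inv\s*#?\s*(\d+)", re.IGNORECASE)
AMOUNT_RE = re.compile(r"\$(\d+(?:\.\d{2})?)")  # require the "$" to avoid matching IDs

rows = []
for original in records:
    inv = INVOICE_RE.search(original)
    amt = AMOUNT_RE.search(original)
    rows.append({
        "original": original,                        # kept as-is, like the source columns
        "invoice": inv.group(1) if inv else None,    # derived, like the mining columns
        "amount": float(amt.group(1)) if amt else None,
        "flag": inv is None or amt is None,          # analogue of color-cueing one-offs
    })

for row in rows:
    print(row["invoice"], row["amount"], "FLAG" if row["flag"] else "")
```

When a flagged one-off turns out to follow a pattern (here, the second record's missing "$"), you extend the regexes and re-run -- the same iterate-on-exceptions loop the comment describes, just without the spreadsheet UI.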
pasbesoin - 3 days ago
If it is annual: they got 17M tickets over 7 years, so over 10 years, assuming they issue just over 19M tickets, each parking ticket needs to be at least $10 to cover the cost. Even at $100 per ticket, is IBM banking on a 10% share? That seems excessive to me, but I never worked in government, so could someone enlighten me on this?
Is there, by any chance, a conflict of interest that makes government unwilling to make improvements that cut down parking tickets or any other similar source of income? Or maybe that's what public audits are for?
Bobbleoxs - 2 days ago
I give this props. I'm sure it required a ton of work to get the FOIA-requested data -- this, I'm assuming, was done in the same painstaking way. I wrote a blog post about it:
https://austingwalters.com/foia-requesting-100-universities/
lettergram - 2 days ago
Did you give more thought to the address cleaning bit? Or does anyone have an idea how to go about transforming mangled addresses into coordinates?
I have a problem that's been bothering me for months, similar to what you have here: people at an emergency-service call center are inputting the addresses of the emergencies. For emergencies that happen in the public domain, there is often no specific address, but rather names of landmarks. Something like "Street StreetName / Opposite Train Station Y", which can be written as "st stName / opp tr st y" or any of infinitely many other variations.
I don't have any after-data to corroborate, but I do have previous instances where the operator inputted the same address better. If I can extract the correct landmarks, I think I can do a Google Places search for them, with a cleaned query, like "Store Amazon, Best Street, Ohio" to get coordinates that can fall into an acceptable area.
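One cheap first pass before the Places search is a token-level abbreviation expansion. A minimal Python sketch, where the abbreviation table is invented for illustration -- a real one would be built by studying how operators actually shorten words in the call logs:

```python
import re

# Hypothetical abbreviation table (illustrative only).
ABBREVIATIONS = {
    "st": "street",
    "opp": "opposite",
    "tr": "train",
}

def expand(raw: str) -> str:
    """Expand known abbreviations token by token, leaving other words alone."""
    tokens = re.split(r"(\W+)", raw)  # capturing group keeps separators, so spacing survives
    return "".join(ABBREVIATIONS.get(t.lower(), t) for t in tokens)

print(expand("st stName / opp tr st y"))
# -> "street stName / opposite train street y"
```

Note the last "st" really meant "station" -- a flat lookup can't resolve that ambiguity, which is exactly where the previous, better-entered instances of the same address would help disambiguate before querying a geocoder.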
PS: in the example you gave with Lake Shore Drive, I think you could easily correct the names with an algorithm based on the Levenshtein distance
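A Levenshtein-based correction pass along those lines might look like the following sketch. The canonical street list here is invented; a real one would come from the city's open street data:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

# Hypothetical canonical list (illustrative only).
CANONICAL = ["Lake Shore Drive", "Michigan Avenue", "Wacker Drive"]

def correct(name: str) -> str:
    """Snap a possibly mangled name to the nearest canonical street name."""
    return min(CANONICAL, key=lambda c: levenshtein(name.lower(), c.lower()))

print(correct("Lakeshore Drv"))  # -> "Lake Shore Drive"
```

In practice you'd also want a distance cutoff so that a name far from everything in the list is left unmatched rather than snapped to the least-bad candidate.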
kioleanu - 2 days ago