
…and when is data mining and analysis just a sophisticated, math-laden opinion?

I like to draw insight from juxtapositions. Yesterday, I listened to half a dozen academic presentations on modeling and data mining aimed at understanding the impact of extreme weather on global communities. As you might imagine, these exercises require large data sets, bold assumptions, and extrapolations, some out to as far as year 2100.

Later in the day, I sat at the piano with a blog post about Debussy’s Arabesque No. 1 for solo piano, a popular piece known for its “impressionistic” qualities. The author of the blog did some analysis on melody, harmony, and rhythm that essentially was trying to get into Debussy’s head as he composed this piece.

The blog author teased out a melody buried in some arpeggios and then attempted to show how it becomes a motif throughout the piece. She admitted she couldn’t really know whether this melody was Debussy’s intent, but made an assumption that this certainly could have been what was going through Debussy’s head.

The academic data mining and modeling would probably be scary to those who aren’t comfortable with numerical modeling and methods; the analysis of Arabesque No. 1 would probably be scary to those not familiar with musical notation and compositional methods. The assumptions and extrapolations made in both cases could make anyone familiar with both nervous.

In both cases, a “specialist” is trying to gain insight into something that, for all practical purposes, is unknowable – Debussy’s thought process (even if subconscious) as he composed Arabesque No. 1 and economic and community impacts as the planet warms over the coming decades – and then convince an audience that they’ve indeed shed some light into a dark cave. And if we are to take either analysis as useful, others would have to validate the findings, or otherwise agree on the methodology, results, and conclusions.

Moral of this tale: Analysis isn’t “new knowledge,” regardless of what kind of notation accompanies it, until many other experts weigh in and many analyses converge on similar conclusions. And just because someone has credentials that brand him or her a specialist doesn’t mean their analysis is more than a sophisticated opinion.

What really astounds me about listening to academic presentations these days (which I have been doing my entire career) is how few people in the audience, usually experts with as much background and experience on the topic as the presenter, actually question the results or methodology. This to me is dangerous at its core. Academia is where data and findings should be vigorously interrogated and debated. These days, technical presentations in general seem to be more of an advertising opportunity than a spark for debate that leads toward some consensus and a contribution to the knowledge base.


Now here’s an excellent example of the importance of data frequency resolution! This New York Times article informs us about some ‘weird’ characteristics of the planet Uranus (apart from the juvenile fun you can have with the name).

But what’s even more fascinating, if you are a data geek, is that the notion that Uranus ejects “plasmoids” (blobs of plasma and magnetic fields, responsible for a planet’s atmosphere leaking away) was formulated only recently, after space scientists went back into thirty-year-old data taken during Voyager 2’s 1986 journey and increased the resolution of the data from 8-minute averages to ~2 seconds. They detected what’s known as an anomaly in the planet’s magnetic field. You have to click on the NASA blog post referenced in the article to find the graph. In that graph, the red line is the average and the black line is the higher-time-resolution data.

The plasmoid release occupies only 60 seconds of Voyager’s 45-hour-long flight by Uranus, but has led to all kinds of interesting informed speculation about Uranus’ characteristics, especially compared to the other planets in our solar system. This “60 seconds” reminds me of what I vaguely recall learning in an anthropology class in college about constructing an entire hominid from a single tooth. (I thought it was Australopithecus, but I wasn’t able to quickly confirm that.) Obviously, scientists will have to further validate their findings, either with a follow-on trip to the outer planets or by other means.
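Here is a minimal sketch of why the averaging interval matters so much. It uses a made-up signal, not Voyager 2 data: a smooth background sampled every 2 seconds for one hour, with a hypothetical 60-second excursion added in the middle. The amplitudes, the one-hour duration, and the helper name `block_average` are all assumptions chosen for illustration.

```python
import math

def block_average(samples, block_size):
    """Average consecutive, non-overlapping blocks of samples."""
    return [
        sum(samples[i:i + block_size]) / block_size
        for i in range(0, len(samples) - block_size + 1, block_size)
    ]

# Synthetic signal (NOT Voyager 2 data): one sample every 2 s for 1 hour.
SAMPLE_PERIOD_S = 2
N_SAMPLES = 3600 // SAMPLE_PERIOD_S

# Smooth background: one slow sine cycle with amplitude 1 (arbitrary units).
signal = [math.sin(2 * math.pi * i / N_SAMPLES) for i in range(N_SAMPLES)]

# Hypothetical 60-second event: add 5 units to 30 consecutive samples.
EVENT_START = 900
EVENT_LEN = 60 // SAMPLE_PERIOD_S
for i in range(EVENT_START, EVENT_START + EVENT_LEN):
    signal[i] += 5.0

# 8-minute averages: 480 s / 2 s = 240 samples per block.
averaged = block_average(signal, 480 // SAMPLE_PERIOD_S)

peak_fine = max(signal)     # peak seen at 2-second resolution
peak_avg = max(averaged)    # peak seen in the 8-minute averages

print(f"peak at 2-second resolution: {peak_fine:.2f}")
print(f"peak in 8-minute averages:   {peak_avg:.2f}")
```

At 2-second resolution the event towers over the background. In the 8-minute averages it is diluted by a factor of eight and never rises above the ordinary background swing, so nothing looks unusual.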

But the story certainly is an interesting lesson in data science. And I bet the scientists were itching to say Uranus burps, or even better, farts.


So much “painting by numbers” is done with numerical models. And the government is probably the largest consumer of such models. All models require assumptions, and as Commandment 2 in “Painting By Numbers” counsels, you must identify these assumptions to understand the results.

The need for assumptions gives policy-makers wide latitude to drive towards answers which support their policies. For example, the EPA under the Obama administration calculated the “social cost of carbon” as a value around $50/ton of carbon emitted. The EPA under the Trump administration managed to tweak the model so that the social cost of carbon (SCC) was more like $7/ton.

I wrote about this a while back in this space. Apparently, one thing you can do is select a different value for the discount rate (a financial parameter) in the model, according to a few references I read at the time.
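To see how much leverage a single rate parameter can have, here is a rough sketch that discounts a constant stream of future damages back to today. This is not the EPA's model or anyone's official method. The damage figure, the 80-year horizon, and the two rates are hypothetical values picked only to show the direction and size of the effect, and `present_value` is a helper name I made up.

```python
def present_value(annual_damage, years, discount_rate):
    """Discounted sum of a constant annual damage over a number of years."""
    return sum(
        annual_damage / (1 + discount_rate) ** t
        for t in range(1, years + 1)
    )

# Hypothetical inputs, chosen only for illustration.
ANNUAL_DAMAGE = 1.0   # dollars of damage per year attributed to one extra ton
HORIZON_YEARS = 80    # how far into the future damages are counted

for rate in (0.03, 0.07):  # illustrative rates, not official figures
    pv = present_value(ANNUAL_DAMAGE, HORIZON_YEARS, rate)
    print(f"discount rate {rate:.0%}: present value ${pv:.2f}")
```

The higher rate shrinks far-off damages much faster, so the same stream of harm is worth noticeably less in today's dollars. Nothing about the physical damage changed; only the assumption did.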

Now here’s some fun: A paper I found surfing the web entitled “The Social Cost of Carbon Made Simple” shows one methodology for calculating it. By the way, this has got to be the most wrongly titled paper of 2010, the year it was published. There is nothing simple about it! Go on – click on it and read the first few pages. I dare you.

https://www.epa.gov/sites/production/files/2014-12/documents/the_social_cost_of_carbon_made_simple.pdf

But the paper does acknowledge that a “…meta-analysis…found that the distribution of published SCC estimates spans several orders of magnitude and is heavily right-skewed: for the full sample, the median was $12, the mean was $43, and the 95th percentile was $150…” Moreover, estimates ran as low as $1/ton.

See what I mean? If you want to de-emphasize carbon in your economic policies, you pick a methodology that minimizes SCC. If you want to build your policies around climate change, you pick a method that maximizes it. To the credit of the Obama administration, they settled on something close to the mean.
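The gap between the median and the mean in that quote is what “heavily right-skewed” looks like in practice: a few very large estimates drag the mean up while the median stays put. A small sketch with invented numbers (these are not the estimates from the meta-analysis) makes the point:

```python
import statistics

# Hypothetical estimates in $/ton, invented for illustration only.
# Most values are small; a couple of large ones form the right tail.
estimates = [1, 3, 5, 8, 10, 12, 15, 20, 40, 90, 300]

median = statistics.median(estimates)
mean = statistics.mean(estimates)

print(f"median: ${median:.0f}/ton")
print(f"mean:   ${mean:.0f}/ton")

# Dropping just the largest value moves the mean a lot, the median barely.
trimmed = estimates[:-1]
print(f"mean without the largest estimate: ${statistics.mean(trimmed):.0f}/ton")
```

Whether you quote the median, the mean, or a high percentile is itself a choice, and with a distribution shaped like this one the choice changes the headline number several times over.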

The paper is provisional work and nine years old, so don’t take it for any kind of gospel. I use it simply to illustrate points that require of the paper neither absolute accuracy nor timeliness.

In an article (New York Times, March 27, 2020) titled “Trump’s Environmental Rollbacks Find Opposition From Within: Staff Scientists,” I read this: “In 2018, when the Environmental Protection Agency proposed reversing an Obama-era rule to limit climate-warming coal pollution, civil servants included analysis showing that by allowing more emissions, the new version of the rule would contribute to 1,400 premature deaths a year.”

I’m not going to dig deep and determine how they arrived at the number 1,400, and anyway, the key to the sentence isn’t the number, it’s the word “contribute.” How many other factors “contribute” to those premature deaths?

The article argues that Trump administration officials are not even trying to “tweak” the models, but instead have come in with a “repeal and replace” attitude “without relying on data, and science and facts.” It was reported that Obama’s head of the EPA, before she departed, had encouraged staffers to remain and make sure that EPA’s analyses have the “truth” put in there. 

Unfortunately, numerical models don’t cough up the truth, just someone’s version of it. Those who don’t take the time to understand all of this become victims reduced to parroting others’ versions of the truth. On the other hand, not even being willing to consider data and science and facts is completely wrong-headed. That is ignorance, as any model of human behavior will tell you.
