…and when is data mining and analysis just a sophisticated, math-laden opinion?
I like to draw insight from juxtapositions. Yesterday, I listened to half a dozen academic presentations on modeling and data mining aimed at understanding the impact of extreme weather on global communities. As you might imagine, these exercises require large data sets, bold assumptions, and extrapolations, some as far out as the year 2100.
Later in the day, I sat at the piano with a blog post about Debussy’s Arabesque No. 1 for solo piano, a popular piece known for its “impressionistic” qualities. The blog’s author analyzed the piece’s melody, harmony, and rhythm, essentially trying to get inside Debussy’s head as he composed it.
The blog author teased out a melody buried in some arpeggios and then attempted to show how it becomes a motif throughout the piece. She admitted she couldn’t really know whether this melody was Debussy’s intent, but assumed that it certainly could have been what was going through his head.
The academic data mining and modeling would probably scare those who aren’t comfortable with numerical modeling and methods; the analysis of Arabesque No. 1 would probably scare those unfamiliar with musical notation and compositional methods. The assumptions and extrapolations made in both cases could make anyone familiar with both fields nervous.
In both cases, a “specialist” is trying to gain insight into something that, for all practical purposes, is unknowable – Debussy’s thought process (even if subconscious) as he composed Arabesque No. 1, and the economic and community impacts as the planet warms over the coming decades – and then to convince an audience that they’ve indeed shed some light into a dark cave. For either analysis to be useful, others would have to validate the findings, or at least agree on the methodology, results, and conclusions.
Moral of this tale: Analysis isn’t “new knowledge,” regardless of what kind of notation accompanies it, until many other experts weigh in and many analyses converge on similar conclusions. And just because someone has credentials that brand him or her a specialist doesn’t mean that person’s analysis is more than a sophisticated opinion.
What really astounds me about academic presentations these days (and I have been listening to them my entire career) is how few people, usually experts with as much background and experience on the topic as the presenter, actually question the results or methodology. To me, this is dangerous at its core. Academia is where data and findings should be vigorously interrogated and debated. These days, technical presentations in general seem to be more of an advertising opportunity than a spark for debate that moves toward consensus and contributes to the knowledge base.
Now here’s an excellent example of the importance of the time resolution of data! This New York Times article informs us about some ‘weird’ characteristics of the planet Uranus (apart from the juvenile fun you can have with the name).
But what’s even more fascinating, if you are a data geek, is that the notion that Uranus ejects “plasmoids” (blobs of plasma and magnetic fields, one way a planet’s atmosphere leaks away) emerged only recently, after space scientists went back to thirty-year-old data from Voyager 2’s 1986 flyby and increased its time resolution from 8-minute averages to about 2 seconds. At the higher resolution, they detected an anomaly in the planet’s magnetic field. You have to click through to the NASA blog post referenced in the article to find the graph of these data: the red line is the average; the black line is the higher-time-resolution data.
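To see why averaging can hide an event like this, here is a minimal sketch using a made-up signal, not Voyager data: a flat background with a single one-minute burst, sampled every 2 seconds and then reduced to 8-minute averages. The helper `block_average`, the two-hour window, and the amplitudes are all hypothetical choices for illustration.

```python
# Synthetic signal only: a flat background with one 60-second burst.
# All amplitudes and the two-hour window are hypothetical; this is not Voyager data.

def block_average(samples, block):
    """Mean of each consecutive, non-overlapping block of `block` samples.
    Any leftover samples that do not fill a whole block are dropped."""
    n = len(samples) // block * block
    return [sum(samples[i:i + block]) / block for i in range(0, n, block)]

SAMPLE_SECONDS = 2          # high-resolution sampling interval
AVERAGE_SECONDS = 8 * 60    # coarse 8-minute averaging window
DURATION = 2 * 3600         # two hours of synthetic data
BURST_START, BURST_LENGTH = 3600, 60
BACKGROUND, BURST_AMPLITUDE = 1.0, 5.0

signal = [
    BACKGROUND + (BURST_AMPLITUDE if BURST_START <= t < BURST_START + BURST_LENGTH else 0.0)
    for t in range(0, DURATION, SAMPLE_SECONDS)
]
averaged = block_average(signal, AVERAGE_SECONDS // SAMPLE_SECONDS)

print(f"peak at 2-second resolution: {max(signal):.3f}")
print(f"peak of 8-minute averages:   {max(averaged):.3f}")
```

In this toy case the burst reaches 6.0 in the 2-second data but only 1.625 in the 8-minute averages, because its 30 samples are diluted among the 240 samples in one averaging window. A real anomaly riding on a noisy background could easily disappear into the average altogether.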
The plasmoid release occupies only 60 seconds of Voyager’s 45-hour flyby of Uranus, but it has led to all kinds of interesting, informed speculation about Uranus’ characteristics, especially compared with the other planets in our solar system. This “60 seconds” reminds me of what I vaguely recall learning in a college anthropology class about reconstructing an entire hominid from a single tooth. (I thought it was Australopithecus, but I wasn’t able to quickly confirm that.) Obviously, scientists will have to further validate their findings, either with a follow-on trip to the outer planets or by other means.
But the story certainly is an interesting lesson in data science. And I bet the scientists were itching to say Uranus burps, or even better, farts.
So much “painting by numbers” is done with numerical models. And the government is probably the largest consumer of such models. All models require assumptions, and as Commandment 2 in “Painting By Numbers” counsels, you must identify these assumptions to understand the results.
The need for assumptions gives policy-makers wide latitude to drive toward answers that support their policies. For example, the EPA under the Obama administration calculated the “social cost of carbon” (SCC) at around $50 per ton of carbon dioxide emitted. The EPA under the Trump administration managed to tweak the model so that the SCC was more like $7/ton.
I wrote about this a while back in this space. According to a few references I read at the time, one thing you can do is select a different value for the discount rate (a financial parameter that converts future damages into present-day dollars) in the model.
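The parameter usually adjusted in such models is the discount rate, which shrinks damages that occur far in the future when they are expressed in today’s dollars. A minimal sketch of why that single choice matters so much: the $1,000 damage, the 100-year horizon, the rates, and the function name `present_value` are hypothetical and illustrative, not values from any EPA model.

```python
def present_value(future_cost, discount_rate, years):
    """Value today of a cost incurred `years` from now, discounted annually at `discount_rate`."""
    return future_cost / (1 + discount_rate) ** years

# Hypothetical: $1,000 of climate damage occurring 100 years from now.
FUTURE_DAMAGE = 1000.0
HORIZON_YEARS = 100

for rate in (0.025, 0.03, 0.05, 0.07):
    pv = present_value(FUTURE_DAMAGE, rate, HORIZON_YEARS)
    print(f"discount rate {rate:.1%}: present value ${pv:,.2f}")
```

At 3 percent, the future $1,000 is worth about $52 today; at 7 percent, it is worth about $1.15, a roughly 45-fold difference produced by one assumption.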
Now here’s some fun: A paper I found surfing the web entitled “The Social Cost of Carbon Made Simple” shows one methodology for calculating it. By the way, this has got to be the most wrongly titled paper of 2010, the year it was published. There is nothing simple about it! Go on – click on it and read the first few pages. I dare you.
But the paper does acknowledge that a “…meta-analysis…found that the distribution of published SCC estimates spans several orders of magnitude and is heavily right-skewed: for the full sample, the median was $12, the mean was $43, and the 95th percentile was $150…” Moreover, some estimates were as low as $1/ton.
See what I mean? If you want to de-emphasize carbon in your economic policies, you pick a methodology that minimizes SCC. If you want to build your policies around climate change, you pick a method that maximizes it. To the credit of the Obama administration, they settled on something close to the mean.
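How far apart the candidate “headline” numbers can sit in a right-skewed distribution is easy to reproduce. The sketch below draws synthetic estimates from a lognormal distribution, with arbitrary parameters and a hypothetical `summarize` helper; these are not the published SCC estimates. It simply shows the long right tail pulling the mean well above the median.

```python
import random
import statistics

def summarize(estimates):
    """Candidate 'headline' numbers a policy-maker could pick from one set of estimates."""
    cut_points = statistics.quantiles(estimates, n=20)  # 19 cut points; the last is the 95th percentile
    return {
        "minimum": min(estimates),
        "median": statistics.median(estimates),
        "mean": statistics.mean(estimates),
        "95th percentile": cut_points[-1],
    }

random.seed(2010)
# Synthetic, right-skewed "estimates" drawn from a lognormal distribution.
# The parameters (mu=2.5, sigma=1.2) are arbitrary; these are not published SCC values.
estimates = [random.lognormvariate(2.5, 1.2) for _ in range(10_000)]

for label, value in summarize(estimates).items():
    print(f"{label:>16}: {value:8.2f}")
```

All four numbers summarize the same data, so choosing which one to report is itself one of the assumptions worth identifying.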
The paper is provisional work and nine years old, so don’t take it as gospel. I use it simply to illustrate points that require neither absolute accuracy nor timeliness from the paper.
In an article (New York Times, March 27, 2020) titled “Trump’s Environmental Rollbacks Find Opposition From Within: Staff Scientists,” I read this: “In 2018, when the Environmental Protection Agency proposed reversing an Obama-era rule to limit climate-warming coal pollution, civil servants included analysis showing that by allowing more emissions, the new version of the rule would contribute to 1,400 premature deaths a year.”
I’m not going to dig deep and determine how they arrived at the number 1,400; in any case, the key to the sentence isn’t the number, it’s the word “contribute.” How many other factors “contribute” to those premature deaths?
The article argues that Trump administration officials are not even trying to “tweak” the models, but instead have come in with a “repeal and replace” attitude “without relying on data, and science and facts.” It was reported that Obama’s head of the EPA, before she departed, had encouraged staffers to stay and make sure the “truth” is put into EPA’s analyses.
Unfortunately, numerical models don’t cough up the truth, just someone’s version of it. Those who don’t take the time to understand all of this become victims, reduced to parroting others’ versions of the truth. On the other hand, refusing even to consider data, science, and facts is completely wrong-headed. That is ignorance, as any model of human behavior will tell you.