Want to save time? Skip down to "I’m looking to compile a thread on Internet Research"!
Opinionated Preamble:
There is a lot of high-level thinking on Less Wrong, which is great. It’s done wonders to structure and optimize my own decisions. But I think the political and futurology-related issues that Less Wrong covers can sometimes get out of sync with the reality and injustices of events in the immediate world. There are comprehensive treatments of how medical science is failing, or how academia cannot give unbiased results, and this is the milieu of programmers and philosophers in the middle-to-upper class of the planet. I at least believe that this circle of awareness can be expanded, even if it’s treading into mind-killing territory. If anything, I want to give people a near-mode sense of the stakes aside from x-risk: all in all, the x-risk scenarios I’ve seen Less Wrong fear the most kill humanity more or less instantly. A slower descent into violence and poverty is to me much more horrifying, *because I might have to live in it and I don’t know how*. As a matter of fact, I have no idea how to predict it.
This is one reason why I’m drawn to the Intelligence Operations performed by the military and crime units, among other things. Intelligence product delivery is about raw and immediate fact, and there is a lot of it. The problems featured in IntelOps are among the few things rationality is good for: highly uncertain scenarios with one-off executions and messy or noisy feedback. Facts get lost in translation as messages are passed along, and of course feeding and receiving fake facts is all part of the job. Nevertheless, knowing everything everywhere is in the job description, and some form of rationality became a necessity.
It gets ugly. The demand for these kinds of skills often lies in industries that are highly competitive, violent, and illegal. I believe that once you take a close look at how force and power are applied in practice, there isn’t any pretending anymore that human evils are an accident.
Open Source Intelligence, or "OSINT", is the mining of data and facts from public information databases, news articles, codebases, and journals. Although the amount of classified data dwarfs the unclassified, the size and scope of the unclassified is responsible for a majority of intelligence reports, and thus is involved in the great majority of executive decisions made by government entities. It’s worth giving some thought to how much of what we know, they know too. As illustrated in this exposé, the processing of OSINT is a big chunk of what modern intelligence is about. I think understanding how rationality as developed on Less Wrong can contribute to better IntelOps, and how IntelOps can feed the rationality community, would be awesome, but that’s a post for another time.
--
The Show
Through my investigations into IntelOps I’ve noticed the emphasis on search. Good search.
I’m looking to compile a thread on Internet Research. I’m wondering if there is any wisdom on Less Wrong that can be taken advantage of here on how to become more effective searchers. Here are some questions that could be answered specifically, but they are just guidelines; feel free to voice associated thoughts, we’re exploring here.
- Before actually going out and searching, what would be the most effective way of drafting and optimizing a collection plan? Are there any formal optimization models that inform our distribution of time and attention? Exploration vs exploitation comes to mind, but it would be worth formulating something specific. I heard that the multi-armed bandit problem is solved?
- Do you have any links or resources regarding more effective search?
- Do you have any experiences regarding internet research that you can share? Any patterns that you’ve noticed that have made you more effective at searching?
- What are examples of closed-source information that are low-hanging fruit in terms of access (e.g. academic journals)? What are possible strategies for acquiring closed-source data (e.g. enrolling in small courses at universities, e-mailing researchers, coercion via the law/Freedom of Information Act, social engineering, etc.)?
- I would like to hear from SEOs and software developers on their interpretation of semantic web technologies and how these are going to affect end-users. I am somewhat unfamiliar with the semantic web, but from my understanding, information that could not be indexed is now indexed, and new ontologies will emerge as this information is mined. What should an end-user expect, and what opportunities will there be that didn’t exist in the current generation of search?
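To make the exploration vs exploitation question from the first item a bit more concrete, here is a minimal sketch of the UCB1 rule, one well-known strategy for the multi-armed bandit setting. It assumes, purely for illustration, that each source (a search engine, a database, a forum) yields a useful result with a fixed but unknown probability; the function name `ucb1_allocate` and the hit rates are made up, and real sources will not behave this cleanly.

```python
import math
import random


def ucb1_allocate(hit_rates, rounds, seed=0):
    """Simulate UCB1 over sources with hidden hit rates.

    hit_rates: hypothetical probability that one query to each source pays off.
    Returns the number of queries spent on each source.
    """
    rng = random.Random(seed)
    n = len(hit_rates)
    pulls = [0] * n   # queries sent to each source so far
    wins = [0.0] * n  # useful results obtained from each source

    for t in range(1, rounds + 1):
        if t <= n:
            # Try every source once before comparing them.
            arm = t - 1
        else:
            # Pick the source with the best "observed mean + uncertainty bonus".
            arm = max(
                range(n),
                key=lambda i: wins[i] / pulls[i]
                + math.sqrt(2 * math.log(t) / pulls[i]),
            )
        reward = 1.0 if rng.random() < hit_rates[arm] else 0.0
        pulls[arm] += 1
        wins[arm] += reward
    return pulls


# Three hypothetical sources; the third is the most productive.
pulls = ucb1_allocate([0.1, 0.3, 0.8], rounds=2000, seed=0)
print(pulls)
```

The uncertainty bonus shrinks as a source is queried more, so rarely tried sources still get revisited occasionally while most of the time goes to whichever source has paid off best so far.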
That should be enough to get started. Below are some links that I have found useful with respect to Internet Research.
--
Meta-Search Engines or Assisted Search:
- Carrot—http://search.carrot2.org/stable/search (concept clustering search engine)
Summarizers:
- TextTeaser - http://www.textteaser.com/ - SOURCE: https://github.com/MojoJolo/textteaser
- Copernic (Commercial Summarizing Feed Program) - http://www.copernic.com/en/products/summarizer/
Bots/Collectors/Automatic Filters:
- Google Alerts - http://www.google.ca/alerts
- Change Detection - http://www.changedetection.com/
Compilations and Directories:
- Directories and Search Engine Repository - http://rr.reuser.biz/index.html (probably the last one you’ll ever need.)
- How to Perform Industry Research - http://businesslibrary.uflib.ufl.edu/industryresearch
Guides:
- Google Guide - http://www.googleguide.com/ (with practice and tutorials)
- From UC Berkeley - http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/FindInfo.html
- "How to Solve Impossible Problems" - http://www.johntedesco.net/blog/2012/06/21/how-to-solve-impossible-problems-daniel-russells-awesome-google-search-techniques/
- The NSA Guide to "Untangling the Web"; Internet Research - http://www.nsa.gov/public_info/_files/Untangling_the_Web.pdf [C. 2007]
- Fravia’s Learnings on searching (value in essays) - http://search.lores.eu/indexo.htm [C. 1990s − 2009]
- "Power Searching With Google" Course - http://www.powersearchingwithgoogle.com/
Practice:
- SearchReSearch - http://searchresearch1.blogspot.ca/
- A Google A Day - http://agoogleaday.com/
I don’t really care how you use this information, but I hope I’ve jogged some thinking of why it could be important.
This post by me might be relevant.
On the topic of academic journals: I’m graduating from college next year and I want to maintain access to journals without going to grad school immediately. If I were to pay for journal access myself, it would cost me about $20,000 a year to sustain my current reading habits. I’d like to cut that down to below $10,000 (strictly legally).
I’ve only come up with two options so far:
Convince a university library or department to sponsor me and give me remote access to the university network.
Enroll at a college that has good journal subscriptions and cheap tuition (and provides VPN or EZproxy access to students who never arrive on campus...). Do any of the colleges offering online degrees give network access?
I hope option 1 works out. Are there other options for cheap, legal journal access?
Comment
As far as I know, at my own university the official alumni organisation gives alumni the ability to VPN/proxy through the university network.
http://www.deepdyve.com/ is also worth a look if you don’t have access to a university. A free account allows you to view journal articles for five minutes. There is also a $40/month professional plan that gives you longer access to 40 articles per month.
You could also pay a student to be able to use his VPN. I don’t know the legalities of it. It might be illegal. There might also be different laws in different countries.
http://www.reddit.com/r/scholar is a place where you can ask for specific journal articles. But I don’t know the legality of the endeavour.
Comment
Great suggestions!
That’s a pretty good investment if I can enroll at a university that offers VPN for alumni. My university doesn’t let alumni access the network, and I think from a quick search that most US universities don’t because of license restrictions. I’ll check out universities in other countries.
Nice! This will be useful right now, so thanks for mentioning it. Unfortunately, their journal selection is limited compared to a university library, and paper downloads are only 20% off the publisher price (and limited to 40 papers per month). I think I’ll try contacting them for custom bulk download plans.
Account sharing is not allowed at my university, and I think most schools in the US don’t allow it.
There’s also the Less Wrong help desk. Both are useful, but it takes time for a person to process requests, and neither is suitable for high-frequency requests.
Comment
/r/scholar is actually surprisingly fast on turnaround time. But it is questionably legal.
Some big-city public libraries (New York, Boston etc) have journal subscriptions.
I found that Google works well. It’s rare that I find an article I want to read and can’t find somewhere—maybe on the author’s web page, maybe copied to some public directory, maybe someplace else. If everything else fails and you really need it, the authors are usually happy to email you a copy upon request.
Comment
For you, perhaps. But for me… Well, I host 580 PDFs on gwern.net because they are not otherwise available publicly, and I link to >865 external PDFs (37 of which are Internet Archive or Dropbox, and would not be indexed in Google). So for easily a third of the articles I use somewhere, I cannot simply find them online.
I agree, papers are often publicly available somewhere indexed by Google, but I think that happens for less than half the papers I access.
That’s a good point! However, authors are sometimes slow to respond, and most authors die (or, less drastically, some lose copies of and access to old papers).
Sorry this is a small nitpick. The main searchlores author is Fravia, not Favia. He was instrumental in providing a community and rallying point for various reversing groups. He was anonymous for quite some time, until he passed away in 2009.
Why, yes, I believe a fellow by the name of Edward Snowden is interested in that subject, too :-/
Other than that, I find the subject of the thread to be far too wide. Searching is different from collecting (you can run your own net spiders without too many problems). Searching for information about people is different from searching for scientific information, which is different from searching for "that thing about which I have a vague memory that it mentioned X and Y, maybe...".
Comment
I was mainly pointing out that the reliance on information that is accessible by most anybody is a benefit that levels the playing field, so to speak.
Added GoogleGuide—http://www.googleguide.com/ (with practice and tutorials)
Adding Carrot—a search engine which takes your query and creates dynamic clusters of websites that form around related concepts. It’s like a form of Google’s related searches that does the sorting for you. There are also visualizations that it can generate for you that allow proportionality comparisons.
This is an example query for ‘rationality’ and this one is Explore vs Exploit with a visualization on the side.
Remember not to overextend analytical techniques!
"ACH is not appropriate for all types of decisions. It is used to analyze hypotheses about what is true or what is likely to happen. One might also want to evaluate alternative courses of action, such as alternative business strategies, which computer to buy, or where to retire. In such cases, this software is of limited value. The ACH matrix can be used to break such a decision problem down into its component parts, with alternative choices (comparable to hypotheses) across the top of the matrix and goals or values to be maximized by making the right choice (comparable to evidence) down the side. However, this type of analysis requires a different type of calculation. The principle of refuting hypotheses (in this case alternative courses of action) cannot be applied to a decision based on goals or personal preferences. One would need a more traditional analysis of the pros and cons of each alternative."
Original source: ACH manual by Richards Heuer
"(ACH) prods you to look for additional evidence you had not realized was relevant, helps you question assumptions, identifies the most lucrative future areas of investigation, and generally stimulates systematic and creative thinking about the issue at hand."
- Richards Heuer, ACH Creator
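For readers who have not seen an ACH matrix, here is a rough sketch of the core idea as I understand it: hypotheses run across the top, evidence runs down the side, and each cell records whether the evidence is consistent ("C"), inconsistent ("I"), or not applicable ("N"). Hypotheses are then ranked by how much evidence argues against them, in line with the principle of refuting hypotheses mentioned in the quote. The function name, the evidence labels, and the simple unweighted count are my own illustrative assumptions, not the behaviour of the ACH software.

```python
def inconsistency_scores(matrix):
    """Count, for each hypothesis, how many evidence items are rated inconsistent.

    matrix: {evidence_label: {hypothesis_label: "C" | "I" | "N"}}
    """
    scores = {}
    for ratings in matrix.values():
        for hypothesis, rating in ratings.items():
            scores[hypothesis] = scores.get(hypothesis, 0) + (1 if rating == "I" else 0)
    return scores


# Hypothetical matrix: three evidence items rated against three hypotheses.
matrix = {
    "E1": {"H1": "C", "H2": "I", "H3": "N"},
    "E2": {"H1": "I", "H2": "I", "H3": "C"},
    "E3": {"H1": "C", "H2": "C", "H3": "C"},
}

scores = inconsistency_scores(matrix)
# Least-refuted hypothesis first.
ranked = sorted(scores, key=scores.get)
print(scores)  # {'H1': 1, 'H2': 2, 'H3': 0}
print(ranked)  # ['H3', 'H1', 'H2']
```

Note that E3 is consistent with every hypothesis and so does nothing to separate them; only the "I" ratings move the ranking.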