Get Almost Duplicate Books And Put Them Adjacent To Each Other
Previously, I asked the question "Search For Three Consecutive Words" to find almost-duplicate books in a book list. The idea was that if two titles share the same three consecutive words, they are considered almost duplicates.
I got a good solution there, which is given below.
I am using the following AWK script (script.awk).
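# First pass (NR == FNR): count every run of three consecutive words.
NR == FNR {
    gsub("[[:punct:]]", "")
    for (i = 3; i <= NF; ++i)
        w[$(i-2),$(i-1),$i]++
    next
}
# Second pass: print each line that contains a run counted more than once.
{
    orig = $0
    gsub("[[:punct:]]", "")
    for (i = 3; i <= NF; ++i)
        if (w[$(i-2),$(i-1),$i] > 1) {
            print orig
            next
        }
}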
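To make the "three consecutive words" criterion concrete, here is a minimal sketch that lists the three-word runs of a single title, stripping punctuation the same way script.awk does. The helper name `trigrams` is made up for illustration, and the sketch assumes an awk that supports POSIX character classes such as `[[:punct:]]`.

```shell
# Print every run of three consecutive words in a title, one run per line.
# "trigrams" is a hypothetical helper name, not part of script.awk.
trigrams() {
    printf '%s\n' "$1" | awk '{
        gsub("[[:punct:]]", "")        # same punctuation stripping as script.awk
        for (i = 3; i <= NF; ++i)
            print $(i-2), $(i-1), $i   # one three-word run per line
    }'
}

trigrams "Venture Deals by Brad Feld & Jason Mendelson"
echo "---"
trigrams "Venture Deals by Brad Feld"
```

Any run that appears in both listings is what makes script.awk treat the two titles as almost duplicates.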
The input data (TestData.txt) is:
$ cat TestData.txt
7L: The Seven Levels of Communication
Numbers Guide: The Essentials of Business Numeracy by Richard Stutely
The MVP Machine: How Baseball's New Nonconformists Are Using Data to Build Better Players
Freakonomics: A Rogue Economist Explores the Hidden Side of Everything by Steven Levitt
Superfreakonomics by Steven Levitt
Moneyball by Michael Lewis
Money Master the Game by Tony Robbinson
Zen and the Art of Motorcycle Maintenance: An Inquiry into Values
Impossible to Inevitable by Jason Lemkin
How to Sell Your Way Through Life by Napoleon Hill
Venture Deals by Brad Feld & Jason Mendelson
Envisioning the Survey Interview of the Future
Brave Leadership: Unleash Your Most Confident, Powerful, and Authentic Self to Get the Results You Need
Dealers of Lightning: Xerox PARC and the Dawn of the Computer Age
The Seven Levels of Communication: Go From Relationships to Referrals by Michael J. Maher
How to Be a Power Connector: The 5+50+100 Rule for Turning Your Business Network into Profits by Judy Robinett
Lean Startup by Eric Ries
The E-Myth Revisited
The Power of Broke
The Four Steps to the Epiphany by Steve Blank
The Art of the Start
Growth Juice by John A. Weber
Man's Worldly Goods: The Story of the Wealth of Nations by Leo Huberman
The Wealth of Nations by Adam Smith
A History of Central Banking and the Enslavement of Mankind
A History of Money and Banking in the United States: The Colonial Era to World War II
The History of Banking: The History of Banking and How the World of Finance Became What it is Today
The Federal Reserve: What Everyone Needs to Know
The Federal Reserve and its Founders: Money, Politics, and Power
America's Bank: The Epic Struggle to Create the Federal Reserve
The Power and Independence of the Federal Reserve
America's Money Machine: The Story of the Federal Reserve
Too Big to Fail by Andrew Ross Sorkin
Business - Later - Read Review To Confirm
Blogging for Your Business
Liar’s Poker by Michael Lewis
Sensemaking: The Power of the Humanities in the Age of the Algorithm, by Christian Madsbjerg.
Giftology: The Art and Science of Using Gifts to Cut Through the Noise, Increase Referrals, and Strengthen Retention, by John Ruhlin.
Getting Real by the people at Basecamp
Venture Deals by Brad Feld
To get the duplicate books, I run the command awk -f script.awk TestData.txt TestData.txt.
The output is -
$ awk -f script.awk TestData.txt TestData.txt
7L: The Seven Levels of Communication
Freakonomics: A Rogue Economist Explores the Hidden Side of Everything by Steven Levitt
Superfreakonomics by Steven Levitt
Moneyball by Michael Lewis
Venture Deals by Brad Feld & Jason Mendelson
The Seven Levels of Communication: Go From Relationships to Referrals by Michael J. Maher
The Power of Broke
Man's Worldly Goods: The Story of the Wealth of Nations by Leo Huberman
The Wealth of Nations by Adam Smith
A History of Central Banking and the Enslavement of Mankind
A History of Money and Banking in the United States: The Colonial Era to World War II
The History of Banking: The History of Banking and How the World of Finance Became What it is Today
The Federal Reserve: What Everyone Needs to Know
The Federal Reserve and its Founders: Money, Politics, and Power
America's Bank: The Epic Struggle to Create the Federal Reserve
The Power and Independence of the Federal Reserve
America's Money Machine: The Story of the Federal Reserve
Liar’s Poker by Michael Lewis
Sensemaking: The Power of the Humanities in the Age of the Algorithm, by Christian Madsbjerg.
Venture Deals by Brad Feld
However, I have a small problem: the output keeps the input order, so almost-duplicate titles are not next to each other.
Here,
7L: The Seven Levels of Communication AND The Seven Levels of Communication: Go From Relationships to Referrals by Michael J. Maher
are almost duplicates and should be together.
Again,
Moneyball by Michael Lewis AND Liar’s Poker by Michael Lewis
should be together.
Once more,
Venture Deals by Brad Feld & Jason Mendelson AND Venture Deals by Brad Feld
should be together. But they are not. You get the idea :)
What can I do so that the almost-duplicate books are adjacent to each other?
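One possible direction, as a minimal sketch rather than a definitive answer: treat every shared three-word run as a link between two lines, merge linked lines into groups with a small union-find, and print each group together. The function name `group_near_duplicates` and the blank line between groups are assumptions made for illustration. The grouping is transitive, so a chain of shared phrases can merge several titles into one larger group, and a title that repeats a phrase only within itself is not counted as a duplicate.

```shell
# Hypothetical helper: print almost-duplicate titles next to each other,
# one group per block, groups separated by a blank line.
# Usage (assumed): group_near_duplicates TestData.txt
group_near_duplicates() {
    awk '
    # Union-find lookup: follow parent links up to the group representative.
    function find(x) {
        while (parent[x] != x)
            x = parent[x]
        return x
    }
    {
        title[NR] = $0                 # keep the original line for printing
        parent[NR] = NR                # every line starts as its own group
        gsub("[[:punct:]]", "")
        for (i = 3; i <= NF; ++i) {
            key = $(i-2) SUBSEP $(i-1) SUBSEP $i
            if (key in first) {
                a = find(first[key])
                b = find(NR)
                # Merge the two groups; the smaller line number stays the
                # representative so groups come out in input order.
                if (a < b)
                    parent[b] = a
                else if (b < a)
                    parent[a] = b
            } else {
                first[key] = NR        # first line on which this run was seen
            }
        }
    }
    END {
        for (n = 1; n <= NR; ++n)
            size[find(n)]++
        for (n = 1; n <= NR; ++n) {
            if (find(n) != n || size[n] < 2)
                continue               # skip non-representatives and singletons
            for (m = n; m <= NR; ++m)
                if (find(m) == n)
                    print title[m]
            print ""
        }
    }' "$@"
}
```

Unlike script.awk, this reads the file only once, so it is called with a single file argument or with the list on standard input.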
bash shell-script awk
asked 18 mins ago by blueray