Showing posts with label Find. Show all posts

Saturday, March 13, 2021

Linux Find Command with 'or'



On a number of occasions I have needed to locate files by extension (e.g. header/source files). When I needed files with multiple extensions, I took the 'coward's way out': executing two 'find' commands, appending the results to a temporary output file, then using that file as input to the next command in the pipeline (e.g. grep).

e.g.

$ find . -type f -name "*.c*" > /tmp/junk

$ find . -type f -name "*.h*" >> /tmp/junk

$ grep -l SetEvent `cat /tmp/junk`


Every single time I do this, one of the voices in my head mocks me in the manner of Monty Python. Apparently one of those voices is from medieval France and has a talent for hurling insults.


So, today we begin a short journey to the correct way to utilize the 'or' functionality in a Unix find command.



$ find . -type f \( -name "*.c*" -o -name "*.h*" \) -exec grep -l SetEvent {} \;



The above command will locate all header and implementation files (e.g. *.h, *.hpp, *.c, *.cpp...) and return a list of the files that contain the SetEvent expression. Note that the parentheses are significant: they ensure the whole or'd list is passed to the exec command. Without them, 'or' binds more loosely than the implied 'and' between the other terms, so only files matching the second extension pattern would be run through the exec command.
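To see the effect of the parentheses without touching real source code, here is a small sketch. It assumes a scratch directory holding two made-up files, a.c and b.h, and uses -print in place of the grep so that the grouping is the only thing that changes.

```shell
# Scratch directory with one source file and one header (hypothetical names).
demo=$(mktemp -d)
touch "$demo/a.c" "$demo/b.h"

# No parentheses: -print is attached only to the second -name test.
without=$(cd "$demo" && find . -type f -name "*.c*" -o -name "*.h*" -print | sort)

# Parentheses: -print applies to a match on either pattern.
with=$(cd "$demo" && find . -type f \( -name "*.c*" -o -name "*.h*" \) -print | sort)

echo "without parentheses:"; echo "$without"
echo "with parentheses:";    echo "$with"

rm -rf "$demo"
```

The first listing should contain only the header, the second both files.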


Cheers.

Processing Large Quantities of Files with Find/Exec

Photo by Markus Winkler from Pexels

 

I've always found the find command to be incredibly useful, but using it with the exec command is powerful yet frustrating and confusing.  Often, patience runs thin and rather than take the time to learn how to effectively use find/exec for complex problems I bunt, returning to creating a tailor-made one-off bash script.  In the end, mission accomplished, but I always feel disappointed to have to revert to a custom bash script when I know in my heart-of-hearts it's accomplishable quickly if I only knew how to do it.

Today is the day.  I'm gonna spend some time to better understand how to use find/exec for some repeatedly necessary types of problems.

Let's start with an easy case, one that's less common, but really easy to accomplish.  We'll build from there.

Copying Files To New File Name (Prepending/Appending)

Let's say you have a list of files that you want to copy to new names by prepending or appending a substring.  For instance, say you have a hierarchy of directories with image files that you wish to copy to a *.backup filename:

$ find . -name "*.jpg" -exec cp {} {}.backup \;

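The above command will find all files named *.jpg and, for each one, execute 'cp <filename> <filename>.backup'.  In other words, when finding ./image01.jpg the exec command would be cp ./image01.jpg ./image01.jpg.backup.  This is done for every encountered file that matches the name pattern (a shell glob, not a regex).  Note that substituting a {} embedded in a longer argument such as {}.backup works with GNU find, but other implementations are not required to support it.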
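For a find that does not expand {} inside a longer argument, one possible workaround is to hand the file name to a tiny inline shell and let the shell append the suffix. The sketch below assumes a scratch directory and a made-up image01.jpg; the extra "sh" word fills $0 so the file name arrives as $1.

```shell
# Scratch tree with a single made-up image file.
demo=$(mktemp -d)
mkdir "$demo/album"
touch "$demo/album/image01.jpg"

# {} is passed as an argument of its own; the inline shell builds the new name.
find "$demo" -name "*.jpg" -exec sh -c 'cp "$1" "$1.backup"' sh {} \;

listing=$(ls "$demo/album")
echo "$listing"

rm -rf "$demo"
```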

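Prepending a string takes a little more care.  Find substitutes the full path (e.g. ./image01.jpg) for {}, so the obvious 'backup-{}' would expand to backup-./image01.jpg, which points into a non-existent 'backup-.' directory.  Splitting the path into its directory and file name with an inline shell command (more on those in the next section) gets around that:

$ find . -name "*.jpg" -exec sh -c 'cp "$0" "$(dirname "$0")/backup-$(basename "$0")"' {} \;

In this case ./image01.jpg would be copied to ./backup-image01.jpg.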

Simple, fast, but not particularly useful if you're particular about the destination file names.

Replacing File Extension 

A bit more practical scenario is to want to change file extensions.  For example, say you really prefer *.jpg but have a series of files named *.jpeg.

This one is a bit trickier, takes a little more expertise, but can readily be accomplished and understood with a bit of time.

$ find . -name "*.jpeg" -exec sh -c 'mv "$0" "${0%.jpeg}.jpg"' {} \;

The simple filename substitution (i.e. {}) doesn't cut it here like it did in the previous example because we wish to manipulate the filename.  So, we inline a shell command, one that is capable of using the incoming file name as is (i.e. $0) and of manipulating it (changing .jpeg to .jpg).  That's the brief; let's dig into it a bit to better understand what's going on.

Incoming filenames sent to the shell script will end in .jpeg (guaranteed by the find -name pattern).  The filename comes in as a parameter (i.e. $0) to the shell script, so the first half of the move command could be 'mv ./file01.jpeg ...'.  It may be worth pointing out that those who author shell scripts will be familiar with $0 being the script name and the first argument being $1.  For an inline script run with sh -c, however, the first argument after the command string is assigned to $0, which is how we are using it.
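The $0 versus $1 behaviour is easy to check on its own, without find in the picture. A minimal sketch, with "first" and "second" as placeholder arguments:

```shell
# With sh -c, the words after the command string fill $0, then $1, and so on.
args=$(sh -c 'printf "%s|%s" "$0" "$1"' first second)
echo "$args"
```

This is why many people pass a throwaway word (often "sh") ahead of {}: it occupies $0 and lets the file name land in the more familiar $1.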

How about the second half of the shell command?  While it looks like Snoopy dropping curse words, it genuinely is meaningful.


The "${0%.jpeg}.jpg" is composed of two parts; the first part, ${0%.jpeg}, is a variable pattern substitution:

${var%Pattern} Remove from $var the shortest part of $Pattern that matches the back end of $var

refer to this for details: https://tldp.org/LDP/abs/html/parameter-substitution.html

Simply put, take $0 (the filename) and keep everything up to .jpeg; image01.jpeg would be expanded to image01.

The second half of the expression simply appends .jpg, so the whole expression for image01.jpeg would be image01.jpg.  With the existing and new file names now available, pair them with a mv command and you're in business.
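The substitution can be tried in isolation before trusting it with a mv. The sketch below uses made-up names and also contrasts % (shortest match) with %% (longest match):

```shell
file="./photos/image01.jpeg"   # hypothetical path, shaped like find's output

stem="${file%.jpeg}"           # strip the trailing .jpeg
renamed="${stem}.jpg"          # and put .jpg in its place
echo "$renamed"

archive="backup.tar.gz"        # hypothetical name with two dots
short="${archive%.*}"          # % removes the shortest matching suffix
long="${archive%%.*}"          # %% removes the longest matching suffix
echo "$short $long"
```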

Removal of Spaces in File Names

 Ugh, I'd rather step in dog shit barefoot than have spaces in my file names.  I know, it's an irrational hatred but there it is.  Filenames with spaces are the bane of scripting; while they can be addressed, a simple 3 line shell script quickly becomes immensely more complex when dealing with them.  But, like bedbugs, any exposure to the outside world will likely bring them into your system.  So, you need to be prepared either to live with them or to have a quick means of removing them.  My latest headache was downloading a series of video files from a MOOC, resulting in files of the form 'index 1.mp4'.  So, we will extend our above example, but utilize bash (rather than sh) to gain some substitution features.  The "${0// /_}" (a space between the second and third '/') means replace all instances of spaces with '_'; a single leading slash, "${0/ /_}", would replace only the first space.  Note that mv needs both the old and the new name, and that the substitution applies to the whole path, so this form assumes the directory names themselves contain no spaces.

        $ find . -name "* *.mp4" -exec bash -c 'mv "$0" "${0// /_}"' {} \;
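The difference between one slash and two in bash's substitution is easy to check on its own. A small sketch with a made-up file name, passed in as $0 the same way find/exec would pass it:

```shell
name="index 1 of 3.mp4"   # hypothetical file name with several spaces

# One slash replaces only the first space; two slashes replace every space.
first_only=$(bash -c 'printf "%s" "${0/ /_}"' "$name")
all_spaces=$(bash -c 'printf "%s" "${0// /_}"' "$name")

echo "$first_only"
echo "$all_spaces"
```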

Massive Media Conversion in One Command

While we've been focusing on shell scripts using 'cp' or 'mv' commands, we aren't limited to easy commands.  Let's say we have a hierarchical folder structure of AVI files that we want to re-encode as MP4 files.

$ find . -name "*.avi" -exec sh -c 'ffmpeg -i "$0" -acodec copy "${0%.avi}.mp4"' {} \;

Cut that puppy loose on your computer and come back to a newly created list of MP4 files.
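Before cutting a long conversion loose, it can be reassuring to preview the commands that would run. One way, sketched here against a scratch directory with a made-up intro.avi, is to put echo in front of the ffmpeg invocation so nothing is actually converted:

```shell
# Scratch tree with one empty, made-up AVI file.
demo=$(mktemp -d)
mkdir "$demo/clips"
touch "$demo/clips/intro.avi"

# echo prints the command line instead of running ffmpeg.
preview=$(cd "$demo" && find . -name "*.avi" \
  -exec sh -c 'echo ffmpeg -i "$0" -acodec copy "${0%.avi}.mp4"' {} \;)
echo "$preview"

rm -rf "$demo"
```

Once the printed commands look right, dropping the echo runs the real thing.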

 

Hope this helps some of you.  I feel I understand the use of exec better having worked through this.  Cheers.

Dealing With Filenames Containing Spaces

Photo by XXSS IS BACK from Pexels

 

Since the Internet is used by a variety of users, operating systems and naming conventions it's not uncommon to get filenames that contain spaces.  Trouble is, this can wreak havoc with simple scripts that don't anticipate filenames with spaces.


for example;
With a folder containing files like this:
lipeltgm@kaylee:/var/tmp$ ls *ts
Nokomis Track - Minnebar15 - Oct 10th – Crowdcast-TohCuy6dhE00Y02iSoGYDo75LAQVfEmKmk (1).ts
Nokomis Track - Minnebar15 - Oct 10th – Crowdcast-TohCuy6dhE00Y02iSoGYDo75LAQVfEmKmk.ts
Nokomis Track - Minnebar15 - Oct 13th – Crowdcast-Qcgrm8WtcFKjmHYGs00Bg01Hy2HkkmmZYs.ts
Nokomis Track - Minnebar15 - Oct 6th – Crowdcast-OmLjaiEpTLslbQ9GHLmKITTSagJ00COuy.ts
Nokomis Track - Minnebar15 - Oct 8th – Crowdcast-JyWZ7lzwvH5MZSuGvp2uE49DVpeCVO3r.ts
Phalen Track Backup - Minnebar15 - Oct 13th – Crowdcast-dozLLcdmIlG6Bqdx2i99eNO8giYjpYsN.ts
Phalen Track - Minnebar15 - Oct 10th – Crowdcast-lzz9BX00V8cswvC2csSvecrjV00w1o80201G.ts
Phalen Track - Minnebar15 - Oct 13th – Crowdcast-vod_master.ts
Phalen Track - Minnebar15 - Oct 6th – Crowdcast-m7ItulACMOv6a6jhQuswa1nwseStEsfj.ts
Phalen Track - Minnebar15 - Oct 8th – Crowdcast-3xMlXLl8kVgTBVmbw602c78acyLDJpLcM.ts

A simple script, as follows, will split each entry on spaces and newlines, so every space-separated piece of a filename is treated as a separate file.  Yes, you can address it by changing the delimiter, but another option is to simply rename the files to something more expected.

$ cat /tmp/go
for file in `find . -name "*.ts"`; do
echo $file
done
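The splitting is easy to reproduce on a small scale. This sketch assumes a scratch directory with a single made-up file whose name contains one space, and counts how many times the loop body runs:

```shell
# One file, one space in its name (hypothetical).
demo=$(mktemp -d)
touch "$demo/Oct 10th.ts"

# The unquoted command substitution is split on whitespace, so the single
# path ./Oct 10th.ts reaches the loop as two separate words.
count=0
for file in $(cd "$demo" && find . -name "*.ts"); do
  count=$((count + 1))
done
echo "loop ran $count times for 1 file"

rm -rf "$demo"
```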

 The following find command will locate files with spaces in their names, changing the spaces to underscores ('_') to resemble something a bit easier to work with.

$ find . -type f -name "* *" | while IFS= read -r file; do mv "$file" "${file// /_}"; done
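If directory names can contain spaces as well, the one-liner above runs into trouble, because the substitution rewrites the directory part of the path too. One possible variation, sketched here against a scratch directory with made-up names, renames only the last path component and uses -depth so that a directory's contents are handled before the directory itself:

```shell
# Scratch tree where both the directory and the file contain a space.
demo=$(mktemp -d)
mkdir "$demo/my videos"
touch "$demo/my videos/index 1.mp4"

# -depth visits children first; only the basename is changed on each rename.
find "$demo" -depth -name "* *" -exec sh -c '
  dir=$(dirname "$0")
  base=$(basename "$0" | tr " " "_")
  mv "$0" "$dir/$base"
' {} \;

result=$(cd "$demo" && find . -type f)
echo "$result"

rm -rf "$demo"
```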

Hope this helps.

Monday, April 1, 2019

Linux Find -- Your Tool To Searching For Dinosaur Bones


Imagine that you have a Swiss Army knife, but you only use the file on it.  Punch a hole in a leather belt, smooth out a broken nail, open a can of beans, or fillet a trout….chances are you can get the job done with it, but you’re certainly making your job harder than it needs to be.

That’s a pretty fair analogy to how I’ve used the ‘find’ command for the majority of my life.  From ‘shoulder surfing’ colleagues, I feel that’s the norm.  I’ve only recently (say over the past couple of years) begun using some of the other find ‘blades’, which can really make your job easier.

Searching for files satisfying a file (or directory) name and finding files of a particular type (e.g. file, directory) are certainly the most common uses of find.

$ find . -name "*.cpp"
$ find . -iname "*main*.cpp"
$ find . -type f

Often, you want to act on the file list, like grepping for a specific string in each of the files.  For yeeeeeeeeeears I did that by passing the results of the find command to a new command line.  For example, if I was interested in locating the main function declarations in any C++ files found from the current directory I’d do it by either:


$ grep -l "main" `find . -name "*.cpp"`
$ grep -l "main" $(find . -name "*.cpp")

*shrug*; so….what’s wrong with that?  Whelp…a couple things: 1) it’s more complex than it needs to be and 2) it’s not uncommon for the results of find to exceed the command line length limits.  So, why in the world did I do that for literally DECADES?  Simple…I didn’t know any better and frankly the ‘-exec’ option confused my little ‘ol knowledge nugget.
Understanding the ‘-exec’ subcommand will pay dividends almost immediately, once you get over the confusing syntax.  The subcommand takes the form ‘-exec somecommand {} \;’; each file found is substituted for the braces, and the command is run once per file.  The trailing ‘\;’ marks the end of the command to execute (the backslash keeps the shell from interpreting the semicolon itself)….just get in the habit of slapping it on the end for now.
So, the equivalent of the previous commands would take the form:


$ find . -name "*.cpp" -exec grep -l "main" {} \;

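Suppose you have 2 files that satisfy this find: file1.cpp & file2.cpp.  Notionally this would result in the equivalent of running ‘grep -l "main" file1.cpp’ followed by ‘grep -l "main" file2.cpp’, which lists the same files as ‘grep -l "main" file1.cpp file2.cpp’ would.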
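How many times the command actually runs depends on the terminator. Ending -exec with \; runs it once per file, while ending it with + (also part of standard find) passes as many files as fit to a single invocation. A sketch, assuming a scratch directory with the two made-up files from the text and echo standing in for grep:

```shell
demo=$(mktemp -d)
touch "$demo/file1.cpp" "$demo/file2.cpp"

# Each invocation of echo prints one line beginning with "run".
per_file=$(find "$demo" -name "*.cpp" -exec echo run {} \; | grep -c "^run")
batched=$(find "$demo" -name "*.cpp" -exec echo run {} + | grep -c "^run")

echo "with \\; : $per_file invocations"
echo "with +  : $batched invocation"

rm -rf "$demo"
```

For grep -l the listed files come out the same either way; the + form simply starts far fewer processes on a large tree.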
Well, that doesn’t seem much simpler….why bother?  Let’s say you’re interested in searching for header and implementation files (e.g. *.h & *.cpp).  You could certainly accomplish this without the exec subcommand:


$ grep -l "main" `find . -name "*.cpp"` `find . -name "*.h"`
$ grep -l "main" $(find . -name "*.cpp") $(find . -name "*.h")

The equivalent using the exec subcommand would be:


$ find . \( -name "*.cpp" -o -name "*.h" \) -exec grep -l "main" {} \;

The first half specifies a search criterion of all files satisfying “*.h” OR “*.cpp”.

Last bit, consider how you’d find all files that reference ‘main’ but aren’t headers or implementation files.  It’s an extremely simple change for the exec subcommand example; consider the complexity of doing it without exec.



$ find . -not \( -name "*.cpp" -o -name "*.h" \) -exec grep -l "main" {} \;
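Here is the negated search tried out on a scratch directory. The three file names and their contents are made up for illustration. The sketch adds -type f so grep is never handed a directory, and spells the negation as !, which is the standard form of the -not that GNU and BSD find also accept.

```shell
demo=$(mktemp -d)
printf 'int main() { return 0; }\n' > "$demo/main.cpp"
printf 'void helper();\n'           > "$demo/util.h"
printf 'see main for details\n'     > "$demo/notes.txt"

# Everything that is NOT a .cpp or .h file, searched for "main".
hits=$(cd "$demo" && find . -type f ! \( -name "*.cpp" -o -name "*.h" \) \
  -exec grep -l "main" {} \;)
echo "$hits"

rm -rf "$demo"
```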

If you haven’t already done so, start using the special blades of your find command.

Friday, November 4, 2016

Unix Find Command -- Where'd I Put My Keys

I'll be the first to admit, I'm a bit of a dinosaur with respect to usage of IDEs.  While I'm well versed in Eclipse and other IDE's, I still find myself more efficient when using a multi-window environment, specifically 3-4 editor sessions along side debugging and execution windows.  In my native habitat, you'll find me running Linux with 3-4 terminals each with specific source files open (I'm not a tabbed session window man myself), one terminal tail'ing a redirected output file, and one where I run the application redirecting to the output file.  Certainly this depends on what I'm working on, but that's a pretty reasonable representation of how I roll.

So what?  Well, because I don't use IDEs I don't have the luxury of 'Open Declaration' style features.  In the absence of IntelliSense style functionality, I rely on the Unix find command to locate my next source code victim.  Despite that, I've had a poor understanding of how to properly use the find command, especially for more sophisticated commands.

For years, I had zero understanding of the ACTIONS options.  Casual observation of co-workers over the years seems to imply that many don't, resulting in clumsy, half-azzed usage of it.  Few understand the ramifications, specifically whenever you misuse the find command, it causes a puppy to cry.  So take heed my tech warrior and prevent the crying of puppies.


So let's set the stage for some find commands.
$ echo "hello" > hello.txt
$ echo "hello world" > helloworld.txt
$ mkdir subdir

$ echo "hello\nworld" > subdir/helloworld2.txt


So as an example of a clumsy misuse of find, for years if I wanted to display the contents of all these text files I'd do something like this:
$ more `find . -name "*.txt"`
::::::::::::::
./subdir/helloworld2.txt
::::::::::::::
hello\nworld
::::::::::::::
./helloworld.txt
::::::::::::::
hello world
::::::::::::::
./hello.txt
::::::::::::::
hello


Similarly, if I wanted to edit each of the files the clumsy command would take the form of:
$ vi `find . -name "*.txt"`


Ugh.

A key contributor to years of misuse is the God-awful syntax for the action commands, a more confusing syntax I've never met.  Albeit, incredibly powerful, but seriously.....WTF?

The first part of the command line is reasonable, straight-forward and typical:
$ find [starting-point] [expression]

$ find . -name "*.txt" will search the current directory and subdirectories for file names that match '*.txt'; straight-forward.

Say you want to display the contents of each of the text files like above, proper form is:
$ find . -name "*.txt" -exec more {} \;

Seriously; '{}' & '\;', what fresh hell is this?

I think a turning point for me was the day that I quit quibbling on how ridiculous the syntax is and simply accepted it; Admission, Surrender & Acceptance.

The '-exec' optional parameter takes the general form of '-exec [command] {} \;'.  Above we want to run more on each of the files; each file found is substituted for the braces '{}', and the escaped ';' marks the end of the command parameters.

Conceptually, the find . -name "*.txt" -exec more {} \; equates to a sequence of commands:
$ find . -name "*.txt"
./subdir/helloworld2.txt
./helloworld.txt
./hello.txt


Followed by one invocation of more per file:
$ more ./subdir/helloworld2.txt
$ more ./helloworld.txt
$ more ./hello.txt

Similarly, editing each txt file can be done by:
$ find . -name "*.txt" -exec vi {} \;

No puppies cried in the last two commands.

You're also able to chain commands together.  Say for instance you're interested in finding files that contain the word 'hello' in them; the following command will fit the bill:
$ find . -name "*.txt" -exec grep -l 'hello' {} \;

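What if, however, you want to find files that contain 'hello' and 'world', not necessarily on the same line?  You accomplish that by chaining grep commands.  Each -exec also acts as a test that passes only when its command succeeds, so the second grep runs only on files where the quiet (-q) first grep found a match:
$ find . -name "*.txt" -exec grep -q 'hello' {} \; -exec grep -l 'world' {} \;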
./subdir/helloworld2.txt
./helloworld.txt
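The chained form can be tried end to end on a throwaway copy of the stage. This sketch rebuilds the three files in a scratch directory; as an assumption that differs from the setup above, it uses printf so that helloworld2.txt really holds 'hello' and 'world' on separate lines.

```shell
demo=$(mktemp -d)
mkdir "$demo/subdir"
printf 'hello\n'        > "$demo/hello.txt"
printf 'hello world\n'  > "$demo/helloworld.txt"
printf 'hello\nworld\n' > "$demo/subdir/helloworld2.txt"

# The second -exec runs only for files where the first grep succeeded.
both=$(cd "$demo" && find . -name "*.txt" \
  -exec grep -q 'hello' {} \; -exec grep -l 'world' {} \; | sort)
echo "$both"

rm -rf "$demo"
```

hello.txt passes the first test but contains no 'world', so it drops out of the list.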