by James Tisdall
Beginning Perl for Bioinformatics
SPECIAL OFFER: Upgrade this ebook with O’Reilly
A Note Regarding Supplemental Files
Preface
What Is Bioinformatics?
What Bioinformatics Can Do
About This Book
Who This Book Is For
Why Should I Learn to Program?
Structure of This Book
Conventions Used in This Book
Comments and Questions
Acknowledgments
1. Biology and Computer Science
1.1. The Organization of DNA
1.2. The Organization of Proteins
1.3. In Silico
1.4. Limits to Computation
2. Getting Started with Perl
2.1. A Low and Long Learning Curve
2.2. Perl's Benefits
2.2.1. Ease of Programming
2.2.2. Rapid Prototyping
2.2.3. Portability, Speed, and Program Maintenance
2.2.4. Versions of Perl
2.3. Installing Perl on Your Computer
2.3.1. Perl May Already Be Installed!
2.3.2. No Internet Access?
2.3.3. Downloading
2.3.4. Binary Versus Source Code
2.3.5. Installation
2.3.5.1. Unix and Linux
2.3.5.2. Macintosh
2.3.5.3. Windows
2.4. How to Run Perl Programs
2.4.1. Unix or Linux
2.4.2. Macs
2.4.3. Windows
2.5. Text Editors
2.6. Finding Help
3. The Art of Programming
3.1. Individual Approaches to Programming
3.2. Edit—Run—Revise (and Save)
3.2.1. Saves and Backups
3.2.2. Error Messages
3.2.3. Debugging
3.3. An Environment of Programs
3.3.1. Open Source Programs
3.4. Programming Strategies
3.5. The Programming Process
3.5.1. The Design Phase
3.5.2. Algorithms
3.5.3. Pseudocode and Code
3.5.4. Comments
4. Sequences and Strings
4.1. Representing Sequence Data
4.2. A Program to Store a DNA Sequence
4.2.1. Control Flow
4.2.2. Comments Revisited
4.2.3. Command Interpretation
4.2.4. Statements
4.2.4.1. Variables
4.2.4.2. Strings
4.2.4.3. Assignment
4.2.4.4. Print
4.2.4.5. Exit
4.3. Concatenating DNA Fragments
4.4. Transcription: DNA to RNA
4.5. Using the Perl Documentation
4.6. Calculating the Reverse Complement in Perl
4.7. Proteins, Files, and Arrays
4.8. Reading Proteins in Files
4.9. Arrays
4.10. Scalar and List Context
4.11. Exercises
5. Motifs and Loops
5.1. Flow Control
5.1.1. Conditional Statements
5.1.1.1. Conditional tests and matching braces
5.1.2. Loops
5.1.2.1. open and unless
5.2. Code Layout
5.3. Finding Motifs
5.3.1. Getting User Input from the Keyboard
5.3.2. Turning Arrays into Scalars with join
5.3.3. do-until Loops
5.3.4. Regular Expressions
5.3.4.1. Regular expressions and character classes
5.3.4.2. Pattern matching with =~ and regular expressions
5.4. Counting Nucleotides
5.5. Exploding Strings into Arrays
5.6. Operating on Strings
5.7. Writing to Files
5.8. Exercises
6. Subroutines and Bugs
6.1. Subroutines
6.1.1. Advantages of Subroutines
6.1.2. Writing Subroutines
6.2. Scoping and Subroutines
6.2.1. Arguments
6.2.2. Scoping
6.3. Command-Line Arguments and Arrays
6.4. Passing Data to Subroutines
6.4.1. Subroutines: Pass by Value
6.4.2. Subroutines: Pass by Reference
6.5. Modules and Libraries of Subroutines
6.6. Fixing Bugs in Your Code
6.6.1. use warnings; and use strict;
6.6.2. Fixing Bugs with Comments and Print Statements
6.6.3. The Perl Debugger
6.6.3.1. A program with bugs
6.6.3.2. How to start and stop the debugger
6.6.3.3. Debugger command summary
6.6.3.4. Stepping through statements with the debugger
6.6.3.5. Setting breakpoints
6.6.3.6. Fixing another bug
6.6.3.7. use warnings; and use strict; redux
6.7. Exercises
7. Mutations and Randomization
7.1. Random Number Generators
7.2. A Program Using Randomization
7.2.1. Seeding the Random Number Generator
7.2.2. Control Flow
7.2.3. Making a Sentence
7.2.4. Randomly Selecting an Element of an Array
7.2.5. Formatting
7.2.6. Another Way to Calculate the Random Position
7.3. A Program to Simulate DNA Mutation
7.3.1. Pseudocode Design
7.3.1.1. Select a random position in a string
7.3.1.2. Choose a random nucleotide
7.3.1.3. Place a random nucleotide into a random position
7.3.2. Improving the Design
7.3.3. Combining the Subroutines to Simulate Mutation
7.3.4. A Bug in Your Program?
7.4. Generating Random DNA
7.4.1. Bottom-up Versus Top-down
7.4.2. Subroutines for Generating a Set of Random DNA
7.4.3. Turning the Design into Code
7.5. Analyzing DNA
7.5.1. Some Notes About the Code
7.6. Exercises
8. The Genetic Code
8.1. Hashes
8.2. Data Structures and Algorithms for Biology
8.2.1. A Gene Expression Database
8.2.2. Gene Expression Data Using Unsorted Arrays
8.2.3. Gene Expression Data Using Sorted Arrays and Binary Search
8.2.4. Gene Expression Data Using Hashes
8.2.5. Relational Databases
8.2.6. DBM
8.3. The Genetic Code
8.3.1. Background
8.3.2. Translating Codons to Amino Acids
8.3.3. The Redundancy of the Genetic Code
8.3.4. Using Hashes for the Genetic Code
8.4. Translating DNA into Proteins
8.5. Reading DNA from Files in FASTA Format
8.5.1. FASTA Format
8.5.2. A Design to Read FASTA Files
8.5.3. A Subroutine to Read FASTA Files
8.5.4. Writing Formatted Sequence Data
8.5.5. A Main Program for Reading DNA and Writing Protein
8.6. Reading Frames
8.6.1. What Are Reading Frames?
8.6.2. Translating Reading Frames
8.7. Exercises
9. Restriction Maps and Regular Expressions
9.1. Regular Expressions
9.2. Restriction Maps and Restriction Enzymes
9.2.1. Background
9.2.2. Planning the Program
9.2.3. Restriction Enzyme Data
9.2.4. Logical Operators and the Range Operator
9.2.5. Finding the Restriction Sites
9.3. Perl Operations
9.3.1. Precedence of Operations and Parentheses
9.4. Exercises
10. GenBank
10.1. GenBank Files
10.2. GenBank Libraries
10.3. Separating Sequence and Annotation
10.3.1. Using Arrays
10.3.2. Using Scalars
10.3.2.1. Pattern modifiers
10.3.2.2. Examples of pattern modifiers
10.3.2.3. Separating annotations from sequence
10.4. Parsing Annotations
10.4.1. Using Arrays
10.4.2. When to Use Regular Expressions
10.4.3. Main Program
10.4.4. Parsing Annotations at the Top Level
10.4.5. Parsing the FEATURES Table
10.4.5.1. Features
10.4.5.2. Parsing
10.5. Indexing GenBank with DBM
10.5.1. DBM Essentials
10.5.2. A DBM Database for GenBank
10.6. Exercises
11. Protein Data Bank
11.1. Overview of PDB
11.2. Files and Folders
11.2.1. Opening Directories
11.2.2. Recursion
11.2.3. Processing Many Files
11.3. PDB Files
11.3.1. PDB File Format
11.3.2. SEQRES
11.4. Parsing PDB Files
11.4.1. Extracting Primary Sequence
11.4.2. Finding Atomic Coordinates
11.5. Controlling Other Programs
11.5.1. The Stride Secondary Structure Predictor
11.5.2. Parsing Stride Output
11.6. Exercises
12. BLAST
12.1. Obtaining BLAST
12.2. String Matching and Homology
12.3. BLAST Output Files
12.4. Parsing BLAST Output
12.4.1. Extracting Annotation and Alignments
12.4.2. Parsing BLAST Alignments
12.5. Presenting Data
12.5.1. The printf Function
12.5.2. here Documents
12.5.3. format and write
12.6. Bioperl
12.6.1. Sample Modules
12.6.2. Bioperl Tutorial Script
12.7. Exercises
13. Further Topics
13.1. The Art of Program Design
13.2. Web Programming
13.3. Algorithms and Sequence Alignment
13.4. Object-Oriented Programming
13.5. Perl Modules
13.5.1. Bioperl
13.6. Complex Data Structures
13.7. Relational Databases
13.8. Microarrays and XML
13.9. Graphics Programming
13.10. Modeling Networks
13.11. DNA Computers
A. Resources
A.1. Perl
A.1.1. Web Site
A.1.2. CPAN: Comprehensive Perl Archive Network
A.1.3. FAQs: Frequently Asked Questions
A.1.3.1. Beginners
A.1.4. Online Manuals
A.1.5. Books
A.1.6. Conference
A.1.7. Newsgroups
A.2. Computer Science
A.2.1. Algorithms
A.2.2. Software Engineering
A.2.3. Theory of Computer Science
A.2.4. General Programming
A.3. Linux
A.4. Bioinformatics
A.4.1. Books
A.4.2. Governmental Organizations
A.4.3. Conferences
A.5. Molecular Biology
B. Perl Summary
B.1. Command Interpretation
B.2. Comments
B.3. Scalar Values and Scalar Variables
B.3.1. Strings
B.3.2. Numbers
B.3.3. Scalar Variables
B.4. Assignment
B.5. Statements and Blocks
B.6. Arrays
B.7. Hashes
B.8. Operators
B.9. Operator Precedence
B.10. Basic Operators
B.10.1. Arithmetic Operators
B.10.2. Bitwise Operators
B.10.3. String Operators
B.10.4. File Test Operators
B.11. Conditionals and Logical Operators
B.11.1. true and false
B.11.2. Logical Operators
B.11.3. Using Logical Operators for Control Flow
B.11.4. The if Statement
B.12. Binding Operators
B.13. Loops
B.14. Input/Output
B.14.1. Input from Files
B.14.2. Input from STDIN
B.14.3. Input from Files Named on the Command Line
B.14.4. Output Commands
B.14.4.1. Output to STDOUT, STDERR, and Files
B.15. Regular Expressions
B.15.1. Overview
B.15.2. Metacharacters
B.15.2.1. Escaping with \
B.15.2.2. Alternation with |
B.15.2.3. Grouping with ( )
B.15.2.4. Character classes
B.15.2.5. Matching any character with .
B.15.2.6. Beginning and end of strings with ^ and $
B.15.2.7. Quantifiers: * + {MIN,} {MIN,MAX} ?
B.15.2.8. Making quantifiers match minimally with ?
B.15.3. Capturing Matched Patterns
B.15.4. Metasymbols
B.15.5. Extending Regular-Expression Sequences
B.15.6. Pattern Modifiers
B.16. Scalar and List Context
B.17. Subroutines and Modules
B.18. Built-in Functions
Index
About the Author
Colophon